Latency- and cost-optimized Gemini variant with long context and multimodal input for high-volume workloads.
Specifications
- Provider: Google DeepMind
- Type: Vendor / proprietary
- Modality: Multimodal
- Category: Multimodal model
- Context window: 1M tokens
- Released: May 19, 2026
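To make the context window figure concrete, here is a minimal sketch of a pre-flight check that estimates whether a batch of documents would fit in a single request. It assumes the 1M figure counts tokens, uses a rough 4-characters-per-token heuristic (real tokenizers vary by language and content), and reserves a hypothetical output budget. The names `estimate_tokens`, `fits_in_context`, and `reserved_for_output` are illustrative and not part of any Gemini API.

```python
# Rough context-budget check. All constants below are assumptions for
# illustration, not values published for the model's tokenizer.
CONTEXT_WINDOW_TOKENS = 1_000_000  # the 1M figure from the spec list, taken as tokens
CHARS_PER_TOKEN = 4                # crude heuristic; actual ratio depends on the tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 8_192) -> bool:
    """Return True if the estimated input tokens leave room for the output budget."""
    budget = CONTEXT_WINDOW_TOKENS - reserved_for_output
    return sum(estimate_tokens(doc) for doc in documents) <= budget


if __name__ == "__main__":
    batch = ["a" * 1_000, "b" * 2_000]
    print(fits_in_context(batch))
```

An estimate like this is only a coarse filter; for exact counts, a provider's own token-counting facility would be the authoritative source.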
What it was trained for
Gemini 3.5 Flash is a fast, cost-efficient multimodal model in Google's Gemini family, built to handle high-volume tasks at low latency while retaining multimodal understanding.
Best for
- High-throughput, latency-sensitive applications
- Cost-efficient summarization and extraction
- Multimodal understanding of text and images
- Chat assistants and customer support
- Real-time content classification
Capabilities
- Low-latency inference
- Multimodal input
- Cost-efficient operation
- Long context window
- Tool and function calling
Performance & positioning
Optimized for speed and cost efficiency, offering a strong balance of quality and throughput for everyday and high-volume workloads.
