Gemini 3.5 Flash

Vendor · Google DeepMind · Multimodal model

Latency- and cost-optimized Gemini variant with long context and multimodal input for high-volume workloads.

Specifications

Provider: Google DeepMind
Type: Vendor / proprietary
Modality: Multimodal
Category: Multimodal model
Context window: 1M tokens
Released: May 19, 2026

What it was trained for

Gemini 3.5 Flash is a fast, cost-efficient multimodal model in Google's Gemini family, built to handle high-volume tasks at low latency while retaining multimodal understanding.

Best for

▸High-throughput, latency-sensitive applications
▸Cost-efficient summarization and extraction
▸Multimodal understanding of text and images
▸Chat assistants and customer support
▸Real-time content classification

Capabilities

▸Low-latency inference
▸Multimodal input
▸Cost-efficient operation
▸Long context window
▸Tool and function calling

Performance & positioning

Gemini 3.5 Flash is optimized for speed and cost efficiency, balancing quality against throughput for everyday and high-volume workloads.
