Google DeepMind's long-context multimodal flagship handling text, images, audio and video.
Specifications
- Provider: Google DeepMind
- Type: Vendor / proprietary
- Modality: Multimodal
- Category: Multimodal model
- Context window: 1M+ tokens
- License: Proprietary (API)
- Released: March 25, 2025
What it was trained for
Google's flagship Gemini 2.5 Pro is a long-context, natively multimodal model trained to reason across text, images, audio, video, and code.
Best for
- Reasoning over very large documents and codebases
- Multimodal analysis of images, audio, and video
- Complex coding and software development tasks
- Long-context retrieval and summarization
- Math and science problem solving
Capabilities
- Long context window
- Native multimodal input
- Advanced reasoning
- Code generation
- Tool and function calling
Performance & positioning
Gemini 2.5 Pro is positioned as a flagship-tier model with strong reasoning and coding quality, particularly on tasks that require very long context.
