Gemini 2.5 Flash: Cost-Effective Thinking Model by Google

Published On Sat Apr 19 2025

Google's most cost-effective AI model—Gemini 2.5 Flash—now available

Google recently unveiled its latest artificial intelligence (AI) model, Gemini 2.5 Flash, following closely on the heels of the Gemini 2.5 Pro launch. This new model is currently in preview mode and can be accessed through the Gemini API, AI Studio, and Vertex AI platforms.

Model Advancements

Gemini 2.5 Flash boasts a knowledge cutoff set at January 2025 and is equipped to process various types of data including text, images, video, and audio prompts. It is designed with a one-million-token context window, allowing for more comprehensive reasoning capabilities compared to its predecessor, Flash 2.0.

Google emphasizes that models like Gemini 2.5 Flash require additional time to interpret queries before generating responses. This approach aims to deliver more precise and tailored outputs that align better with user requirements than previous speed-centric models.

Developer Flexibility

Google positions Gemini 2.5 Flash as its most cost-effective thinking model, with developers being charged $0.15 per million input tokens. Output pricing is tiered, ranging from $0.60 per million tokens with reasoning disabled to $3.50 per million tokens with reasoning enabled.

The company highlights that developers have the flexibility to adjust the reasoning capabilities of the model to achieve optimal performance, effectively balancing cost considerations with complexity. In cases where no specific budget is set, the model autonomously determines the query complexity.

Gemini 2.5: Our newest Gemini model with thinking

If developers do not set a budget, the model decides the complexity of queries on its own.

Test Results

In a recent Humanity's Last Exam (HLE) assessment, Gemini 2.5 Flash achieved a score of 12%. This performance surpassed rival models such as Claude 3.7 Sonnet and DeepSeek R1 but fell short of OpenAI's newly introduced o4-mini model, which scored 14%.

The HLE test serves as an alternative benchmark to existing industry assessments that may no longer adequately challenge rapidly evolving models like Gemini 2.5 Flash.