Google announces Gemini 2.5 Flash preview, its first fully hybrid reasoning model
During the Cloud Next event last week, Google announced that the Gemini 2.5 Flash model was coming soon with major improvements. Today, Google announced the rollout of the Gemini 2.5 Flash preview in the Gemini API via Google AI Studio and Vertex AI. The new model is also available to Gemini app users via the model picker and can be used with Canvas to refine documents and code.
Performance Enhancements
Following in the footsteps of its predecessor, Gemini 2.0 Flash, Gemini 2.5 Flash comes with significant improvements to reasoning capabilities without incurring high costs or latency. Google claims that this new model has an excellent performance-to-cost ratio. The pricing details are below:
This is an early version of 2.5 Flash, but it already shows huge gains over 2.0 Flash.
You can fully turn off thinking if needed and use this model as a drop-in replacement for 2.0 Flash.
It’s available across the Gemini API, AI Studio, Vertex, and the Gemini app!
Hybrid Reasoning Model
Gemini 2.5 Flash is Google's first fully hybrid reasoning model, meaning developers can turn reasoning on or off. Google says this lets developers tune the trade-off between quality, cost, and latency for each use case. Check out the benchmarks for this new model below.
As shown in the table above, despite its low cost, Gemini 2.5 Flash seems to hold its own when compared to frontier models from Anthropic and xAI (Grok). OpenAI's recently released o4-mini appears to perform better than the Gemini 2.5 Flash preview, but it costs significantly more.
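For developers wondering what the reasoning switch described in this section might look like in practice, here is a minimal sketch in Python. It only builds a request body and sends nothing over the network. The helper name build_request is hypothetical, and the field names (generationConfig, thinkingConfig, thinkingBudget) are assumptions modeled on the Gemini API's REST request format rather than anything stated in the announcement, so check them against the current API reference before use.

```python
import json
from typing import Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent-style request body (hypothetical helper).

    thinking_budget=None omits the thinking settings, leaving the model's
    default behavior in place. thinking_budget=0 is assumed to turn
    thinking off; a positive value is assumed to cap the thinking tokens.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# Latency-sensitive call: thinking switched off.
fast = build_request("Summarize this paragraph.", thinking_budget=0)

# Quality-sensitive call: allow some thinking tokens.
careful = build_request("Plan a database migration.", thinking_budget=1024)

print(json.dumps(fast, indent=2))
```

Keeping the budget as an optional argument lets the same code path serve both the cheap, fast case and the slower, higher-quality case, which is the trade-off the hybrid design is meant to expose.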