Gemini 2.5 adds a feature called 'native audio' that generates speech with human-like expressiveness. The feature can be tried in Google AI Studio and on other platforms.
Gemini 2.5's Native Audio Capabilities

In testing, prompts read aloud by the model conveyed the intended emotions naturally. The intonation of Kansai dialect, however, sounded somewhat unnatural.
To try the feature, enter the text to be read in the 'Raw structure' section and the speaker's name in the designated box. Conversational audio can be generated for up to two speakers.
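As a rough illustration of preparing a two-speaker script before pasting it into the text box, the sketch below formats turns as `Name: line` text and rejects scripts with more than two speakers. The `Name: line` layout and the helper name `build_script` are assumptions made for this example; the article does not specify the exact format the 'Raw structure' section expects.

```python
from typing import List, Tuple

# The article states that conversations support up to two speakers.
MAX_SPEAKERS = 2


def build_script(turns: List[Tuple[str, str]]) -> str:
    """Format (speaker, line) pairs as 'Speaker: line', one turn per row.

    Hypothetical helper: the 'Name: line' layout is an assumption,
    not a format confirmed by the article.
    """
    speakers: List[str] = []
    for name, _ in turns:
        if name not in speakers:
            speakers.append(name)
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"found {len(speakers)} speakers, at most {MAX_SPEAKERS} are supported"
        )
    return "\n".join(f"{name}: {text}" for name, text in turns)


if __name__ == "__main__":
    print(build_script([("Alice", "Good morning!"), ("Bob", "Morning. Coffee?")]))
```

Checking the speaker count up front catches a script that exceeds the two-speaker limit before any audio is requested.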
The following video shows controllable text reading with 'Gemini 2.5' in Google AI Studio: "I tried out controllable text reading with 'Gemini 2.5' in Google AI Studio" (YouTube)
Native audio is available through the Gemini API in Google AI Studio and Vertex AI. Google states that all generated audio is watermarked with SynthID, its proprietary watermarking technology.
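For readers who want to go beyond the AI Studio interface, the sketch below builds a JSON request body for a two-speaker speech request without sending it. The field names (`responseModalities`, `speechConfig`, `multiSpeakerVoiceConfig`), the model name, and the voice names `Kore` and `Puck` are assumptions based on the public Gemini API documentation and are not taken from the article; check them against the current API reference before use.

```python
import json
from typing import Dict

# Assumed model identifier; not stated in the article.
ASSUMED_MODEL = "gemini-2.5-flash-preview-tts"


def build_tts_request(script: str, voices: Dict[str, str]) -> dict:
    """Build a request body for multi-speaker audio generation.

    `voices` maps each speaker name used in `script` to a prebuilt voice
    name. All field names below are assumptions to verify against the
    official API reference.
    """
    if not 1 <= len(voices) <= 2:
        raise ValueError("expected one or two speakers")
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }


if __name__ == "__main__":
    body = build_tts_request(
        "Alice: Good morning!\nBob: Morning. Coffee?",
        {"Alice": "Kore", "Bob": "Puck"},
    )
    print(ASSUMED_MODEL)
    print(json.dumps(body, indent=2))
```

The function only assembles the payload, so it can be inspected or logged before an API key and an HTTP client are added.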
