Empower Your Voice Agents with OpenAI's State-of-the-Art Audio Capabilities

Published on Fri Mar 21 2025

We are launching three new audio models in the API: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts. We have also updated the Agents SDK to support these models, so you can turn an existing text-based agent into a voice agent with just a few lines of code.

How to Access OpenAI's New Audio Models in the API

Speech-to-text

The gpt-4o-transcribe and gpt-4o-mini-transcribe models are aimed at use cases such as customer-service voice agents and meeting-note transcription. Both models outperform Whisper in accuracy and support bidirectional streaming for real-time transcription. The updated streaming API adds built-in noise cancellation and a semantic voice activity detector (VAD), which recognizes when the speaker has finished a thought, so transcripts are produced at natural turn boundaries rather than mid-sentence. For details, refer to our documentation.
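To make this concrete, here is a minimal Python sketch of two pieces: a one-shot file transcription and a Realtime transcription-session configuration that turns on semantic VAD and noise reduction. The `client.audio.transcriptions.create(model=..., file=...)` call comes from the official `openai` Python SDK. The helper names `transcribe_file` and `transcription_session_update` are hypothetical. The session field names (`turn_detection`, `eagerness`, `input_audio_noise_reduction`, `near_field`) reflect our reading of the Realtime API reference at launch and should be checked against the current docs before use.

```python
import json


def transcribe_file(client, path, model="gpt-4o-transcribe"):
    """Transcribe one audio file and return the recognized text.

    `client` is expected to be an `openai.OpenAI()` instance, or any object
    exposing the same `audio.transcriptions.create` method (handy for tests).
    """
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def transcription_session_update(model="gpt-4o-transcribe", eagerness="auto"):
    """Build a Realtime `transcription_session.update` event as a JSON string.

    ASSUMPTION: the field names below follow our reading of the Realtime API
    reference; verify them against the current documentation.
    """
    event = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {"model": model},
            # Semantic VAD: close the turn when the speaker seems to have finished a thought.
            "turn_detection": {"type": "semantic_vad", "eagerness": eagerness},
            # Built-in noise reduction, configured here for a close-talking microphone.
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }
    return json.dumps(event)
```

With a real client this would be called as `transcribe_file(OpenAI(), "meeting.mp3")` after `from openai import OpenAI`. The JSON string returned by `transcription_session_update()` is meant to be sent over an open Realtime API WebSocket connection.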

Text-to-speech

The new gpt-4o-mini-tts model lets you control not only what the voice says but how it says it, including the tone, emotion, and speed of the generated speech, for more natural and engaging interactions. Starting from 10 preset voices, you can adapt the speaking instructions to different scenarios, from an empathetic customer-service agent to expressive storytelling. Try the new TTS model in our OpenAI.fm demo (available under beta terms), or consult our docs to get started.
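A minimal sketch of the text-to-speech call, assuming the official `openai` Python SDK: `input` carries the words to speak, while `instructions` describes the delivery (tone, emotion, pacing). The helper `synthesize_speech` is hypothetical, and `coral` is used only as an example preset voice name. The client is passed in as a parameter so the function can be exercised without network access.

```python
def synthesize_speech(client, text, instructions, voice="coral", out_path="speech.mp3"):
    """Generate speech with gpt-4o-mini-tts and write the audio to `out_path`.

    `client` is expected to be an `openai.OpenAI()` instance, or any object
    exposing the same `audio.speech.create` method.
    """
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice=voice,                # a preset voice; "coral" is only an example
        input=text,                 # WHAT to say
        instructions=instructions,  # HOW to say it: tone, emotion, pacing
    )
    # `response.content` holds the raw audio bytes (MP3 unless another format is requested).
    with open(out_path, "wb") as fh:
        fh.write(response.content)
    return out_path
```

For example, `synthesize_speech(OpenAI(), "Your order has shipped.", "Speak in a warm, reassuring customer-service tone.")` would keep the same words while letting you swap the instructions for a more dramatic storytelling delivery.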

Agents SDK updates

With the Agents SDK, you can add speech-to-text and text-to-speech around an existing text agent with minimal effort; see the Agents SDK documentation to get started. This approach is the best fit if you already have a text-based agent, or a voice agent built on our speech-to-text and text-to-speech pipeline. For low-latency speech-to-speech experiences, we recommend the speech-to-speech models in the Realtime API instead. Stay tuned for upcoming vision support in the Realtime API!
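The chained approach described here (speech-to-text, then a text agent, then text-to-speech) can be sketched without any SDK. This is a library-free illustration of the architecture, not the Agents SDK's own voice API. The class `ChainedVoiceAgent` and its three callables are hypothetical; in a real system, `transcribe` might call gpt-4o-transcribe, `respond` might be your existing text agent, and `speak` might call gpt-4o-mini-tts.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ChainedVoiceAgent:
    """Wrap an unchanged text agent between speech-to-text and text-to-speech.

    All three callables are hypothetical stand-ins for real model calls.
    """

    transcribe: Callable[[bytes], str]  # user audio -> user text
    respond: Callable[[str], str]       # user text  -> reply text (the existing text agent)
    speak: Callable[[str], bytes]       # reply text -> reply audio

    def handle_turn(self, user_audio: bytes) -> Tuple[str, str, bytes]:
        """Run one conversational turn through the three stages in order."""
        user_text = self.transcribe(user_audio)
        reply_text = self.respond(user_text)
        reply_audio = self.speak(reply_text)
        # Returning the intermediate text makes each stage easy to log and inspect.
        return user_text, reply_text, reply_audio
```

Because the three stages run one after another, their latencies add up, which is why the Realtime API's speech-to-speech models are the recommended route when latency matters most. The advantage of the chain is that the text agent in the middle stays unchanged and every intermediate transcript is available for logging.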

Live Roleplays Powered by the OpenAI Realtime API

If you have questions about Brazilian Portuguese language support or the release date of the Realtime API upgrade, reach out for assistance. We aim to support a wide range of languages and accents.

Together, the new speech-to-text and text-to-speech models and the updated Agents SDK give developers more accurate transcription and more controllable speech for building voice agents.