AI & Video | Product Launch

OpenAI adds live voice, translation, and transcription models

OpenAI has added three new audio models to its API: GPT-Realtime-2 for real-time voice intelligence, GPT-Realtime-Translate for live translation from 70+ input languages into 13 output languages, and GPT-Realtime-Whisper for low-latency streaming speech-to-text. The models are meant to help developers build more natural, intelligent, and responsive voice applications through better reasoning, context handling, and real-time processing.

Key Takeaways

GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning, and its context window grows from 32K to 128K to support longer sessions.
GPT-Realtime-Translate supports more than 70 input languages and 13 output languages, and OpenAI cites use cases including customer support, cross-border sales, education, events, media, and creator platforms.
GPT-Realtime-Whisper is a streaming transcription model that turns speech into text live as the speaker talks.
OpenAI says the Realtime API supports EU Data Residency and includes active classifiers that can halt sessions flagged for harmful content.
Pricing starts at $32 per 1M audio input tokens for GPT-Realtime-2, $0.034 per minute for GPT-Realtime-Translate, and $0.017 per minute for GPT-Realtime-Whisper.
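
The pricing figures above can be turned into rough session estimates. The following is a minimal sketch, assuming the quoted starting prices apply flat with no tiers, minimums, or output-token charges (the summary does not say either way). The function names and the sample workload (a 60-minute session, 250,000 audio input tokens) are hypothetical and chosen only for illustration.

```python
# Starting prices as quoted in the takeaways above (USD).
# Assumption: flat rates; the source lists only "starting" prices.
REALTIME_2_PER_1M_AUDIO_INPUT_TOKENS = 32.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model for a session of the given length."""
    return minutes * rate_per_minute


def audio_input_token_cost(
    tokens: int,
    rate_per_million: float = REALTIME_2_PER_1M_AUDIO_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens billed per one million tokens."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    minutes = 60           # hypothetical session length
    tokens = 250_000       # hypothetical audio input volume

    translate = per_minute_cost(minutes, TRANSLATE_PER_MINUTE)
    whisper = per_minute_cost(minutes, WHISPER_PER_MINUTE)
    realtime = audio_input_token_cost(tokens)

    print(f"GPT-Realtime-Translate, {minutes} min: ${translate:.2f}")
    print(f"GPT-Realtime-Whisper, {minutes} min: ${whisper:.2f}")
    print(f"GPT-Realtime-2, {tokens:,} audio input tokens: ${realtime:.2f}")
```

Under these assumptions, an hour of live translation costs about twice as much as an hour of streaming transcription. The two per-minute models are directly comparable with each other, but GPT-Realtime-2 is billed per token and needs a token count rather than a duration.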

Why It Matters

OpenAI is pushing real-time audio beyond simple turn-taking toward models that can reason, translate, and transcribe while a conversation is still in progress. That matters for voice interfaces in the streaming-adjacent workflows the company explicitly lists, such as captions, live translation, support, and other spoken interactions. The ecosystem signal is that OpenAI is positioning voice as a production API layer, not just a demo feature, with named examples from Zillow, Deutsche Telekom, Priceline, Vimeo, and BolnaAI. What to watch next: whether developers adopt GPT-Realtime-2’s higher-context and adjustable reasoning modes in shipped products, not just in the Playground.

Read full article at openai.com

Related Articles

Agora: Agora Integrates OpenAI Real-Time API for Low-Latency Conversational AI

Amazon Web Services, Inc.: AWS SageMaker Adds Multi-Turn RL for Specialized AI Model Training

wTVision: wTVision Debuts CricketStats CG, Enters Cricket Graphics Market in Bangladesh
