Microsoft Unveils MAI-Voice-2 in Foundry for Multilingual Speech Generation
Microsoft is offering MAI-Voice-2 in public preview through Microsoft Foundry. The first-party AI voice model generates natural speech in more than 10 languages and supports voice cloning and voice prompting. Streaming professionals can use it to build multilingual virtual agents and audio content workflows directly with Microsoft's tools.
Key Takeaways
- MAI-Voice-2 supports voice cloning from short reference samples and voice prompting for tone and style modifications.
- The model generates natural speech across more than 10 languages, available through Microsoft Foundry.
- Developers can deploy MAI-Voice-2 directly via the Foundry catalog, integrating it with other speech and language models.
- MAI-Voice-2 is offered under Microsoft's standard billing, content safety, and responsible AI policies.
Why It Matters
Microsoft's introduction of MAI-Voice-2 simplifies the development of AI-driven voice applications for streaming. Natural, multilingual speech generation with voice cloning and voice prompting streamlines content localization and accessibility work. The move intensifies competition in the AI voice synthesis market by giving developers a voice model that sits alongside other speech and language models in the Foundry catalog. Watch for adoption rates among content platforms and for the specific applications that emerge for global audience engagement.
Read full article at azure.microsoft.com