Voice-Pro goes open source after pausing active development
The developer of Voice-Pro, a Gradio-based web UI for AI-powered audio processing, has released the project's full codebase as open source and paused active development. The tool combines YouTube video downloading, voice separation via Demucs, speech recognition using Whisper variants, multilingual translation, and text-to-speech, including zero-shot voice cloning with models like CosyVoice and F5-TTS. The software is presented as an open-source alternative to commercial services like ElevenLabs.
Key Takeaways
- The repository says all Voice-Pro code has been made open source and “completely free.”
- Active development and updates are paused because the team is working on WeConnect.
- Voice-Pro combines yt-dlp downloads, Demucs vocal separation, and Whisper, Faster-Whisper, WhisperX, and Whisper-Timestamped for speech recognition.
- The TTS stack includes Edge-TTS, kokoro, E2-TTS, F5-TTS, and CosyVoice for zero-shot voice cloning and multilingual speech generation.
- The README positions Voice-Pro as an alternative to ElevenLabs and says it supports Windows, Mac, and Linux.
Why It Matters
Voice-Pro is no longer just a packaged tool: its code is now public and, per the repository, completely free, while the project's own roadmap is on hold. That makes the repo more useful as a reference stack for dubbing, transcription, and voice cloning workflows than as a fast-moving product. The broader signal is that a single Gradio interface can stitch together YouTube ingestion, source separation, ASR, translation, and TTS without a proprietary platform. Watch the GitHub repo for issue activity, since the README explicitly directs requests there while updates remain paused.
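To make the stage chaining concrete, here is a minimal sketch of how the same kind of pipeline could be planned with the tools' public command-line interfaces. It is not taken from Voice-Pro's codebase: the `Stage` and `plan_pipeline` names, the `work` directory layout, the `small` Whisper model, and the `en-US-AriaNeural` voice are all illustrative assumptions, and Whisper's built-in `--task translate` (which outputs English only) stands in for the project's separate translation step. The function only builds the commands; it does not run them.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Stage:
    """One pipeline step: a label plus the command line that would run it."""
    name: str
    argv: List[str]


def plan_pipeline(url: str, workdir: str = "work",
                  voice: str = "en-US-AriaNeural") -> List[Stage]:
    """Plan download -> separate -> transcribe -> speak (hypothetical helper)."""
    work = Path(workdir)
    source = work / "source.wav"
    separated = work / "separated"
    # Assumption: Demucs writes stems to <out>/<model>/<track name>/vocals.wav.
    vocals = separated / "htdemucs" / "source" / "vocals.wav"
    # Assumption: Whisper names its text output after the input file stem.
    transcript = work / "vocals.txt"
    dubbed = work / "dubbed.mp3"

    return [
        # yt-dlp: extract the audio track and convert it to WAV.
        Stage("download", ["yt-dlp", "-x", "--audio-format", "wav",
                           "-o", (work / "source.%(ext)s").as_posix(), url]),
        # Demucs: split the track into vocals and everything else.
        Stage("separate", ["demucs", "-n", "htdemucs", "--two-stems=vocals",
                           "-o", separated.as_posix(), source.as_posix()]),
        # Whisper: transcribe the vocals and translate them to English.
        Stage("transcribe", ["whisper", vocals.as_posix(), "--model", "small",
                             "--task", "translate", "--output_format", "txt",
                             "--output_dir", work.as_posix()]),
        # Edge-TTS: synthesize speech from the translated transcript.
        Stage("speak", ["edge-tts", "--file", transcript.as_posix(),
                        "--voice", voice,
                        "--write-media", dubbed.as_posix()]),
    ]


if __name__ == "__main__":
    for stage in plan_pipeline("https://example.com/watch?v=demo"):
        print(f"{stage.name}: {' '.join(stage.argv)}")
```

Each planned `argv` list could be handed to `subprocess.run(stage.argv, check=True)` once the corresponding tool is installed. Keeping planning separate from execution makes the stage order easy to inspect, and a web UI would only need to wrap these steps behind form inputs.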