node-webrtc-rust brings voice agents into Node with Rust media
akirilyuk has developed node-webrtc-rust, a new Node.js library for building real-time voice agents with WebRTC transport and Rust-native media timing. The library supports embedded voice activity detection (VAD) and barge-in, and it integrates with multiple STT/TTS vendors, including a free on-device Sherpa-ONNX option. Agent logic stays in TypeScript, while audio processing and WebRTC handling are delegated to Rust, offering an alternative to standalone media servers for voice agent workloads.
Key Takeaways
- VoiceAgent runs one WebRTC conversation per connection, with one inbound track and one outbound track.
- The stack supports VAD and barge-in, emits the user_speaking_start/end and user_speech_final events, and exposes speechEvents() delivery modes.
- The Sherpa-ONNX option (local-sherpa) provides free on-device STT and TTS, configured through the SHERPA_STT_MODEL_PATH and SHERPA_TTS_MODEL_PATH env vars.
- The repo lists six prebuilt binary targets: macOS aarch64 and x86_64, Linux x64 glibc and musl, Linux arm64, and Windows x64 MSVC.
- The project’s 0.3.0 roadmap centers on VoiceAgent, VAD, barge-in, six STT/TTS vendors, and a speech event stream.
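To make the barge-in takeaway concrete, here is a minimal sketch of how application code might track turn state from the speech events named above. Only the event names (user_speaking_start, user_speaking_end, user_speech_final) come from the summary; the SpeechEvent type, the TurnState shape, and the reduceTurn function are hypothetical and are not node-webrtc-rust's API.

```typescript
// Hypothetical event shape: only the event names appear in the summary.
type SpeechEvent =
  | { type: "user_speaking_start" }
  | { type: "user_speaking_end" }
  | { type: "user_speech_final"; text: string };

// Hypothetical per-conversation state kept by the agent logic.
interface TurnState {
  agentSpeaking: boolean; // is agent playback currently active?
  userSpeaking: boolean;  // has VAD reported that the user is talking?
  interruptions: number;  // how many times the user barged in
  transcripts: string[];  // finalized user utterances, in order
}

const initialTurnState: TurnState = {
  agentSpeaking: false,
  userSpeaking: false,
  interruptions: 0,
  transcripts: [],
};

// Pure reducer: returns the next state for one incoming event.
function reduceTurn(state: TurnState, event: SpeechEvent): TurnState {
  switch (event.type) {
    case "user_speaking_start":
      // Barge-in: if the agent was speaking, count an interruption
      // and mark playback as stopped.
      return {
        ...state,
        userSpeaking: true,
        agentSpeaking: false,
        interruptions: state.interruptions + (state.agentSpeaking ? 1 : 0),
      };
    case "user_speaking_end":
      return { ...state, userSpeaking: false };
    case "user_speech_final":
      return { ...state, transcripts: [...state.transcripts, event.text] };
  }
}

// Usage: the user interrupts while the agent is mid-sentence.
const demoEvents: SpeechEvent[] = [
  { type: "user_speaking_start" },
  { type: "user_speaking_end" },
  { type: "user_speech_final", text: "stop, I have a question" },
];
const demoResult = demoEvents.reduce(reduceTurn, {
  ...initialTurnState,
  agentSpeaking: true,
});
```

Keeping the turn logic in a pure function like this is one way to test agent behavior without any audio; in the real library the playback interruption itself is described as happening on the Rust side.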
Why It Matters
This moves real-time voice-agent infrastructure closer to the application layer: Node handles session logic while Rust keeps audio timing, VAD, barge-in, and playback inside the same process. The practical effect is less dependence on a separate SFU-style media tier for agent workloads, with optional cloud STT/TTS and a local Sherpa-ONNX path for sensitive audio. For the streaming stack, the notable signal is the breadth of supported vendors and the packaging model: npm install plus prebuilt binaries across six platform targets. Watch whether the 0.3.0 release expands beyond the current voice-agent feature set and vendor matrix.
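For the local Sherpa-ONNX path, an application would likely want to check its model configuration before starting a session. The sketch below validates the two environment variables named in the takeaways; the resolveSherpaPaths function and the SherpaPaths type are hypothetical helpers, and how the library itself reads these variables is not specified in the summary.

```typescript
// Environment passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical result type, not part of node-webrtc-rust.
interface SherpaPaths {
  sttModelPath: string;
  ttsModelPath: string;
}

// Variable names as given in the summary.
const REQUIRED_VARS = ["SHERPA_STT_MODEL_PATH", "SHERPA_TTS_MODEL_PATH"] as const;

// Returns both model paths, or throws an error naming every missing variable.
function resolveSherpaPaths(env: Env): SherpaPaths {
  const missing = REQUIRED_VARS.filter(
    (name) => (env[name] ?? "").trim() === "",
  );
  if (missing.length > 0) {
    throw new Error(`Missing Sherpa-ONNX settings: ${missing.join(", ")}`);
  }
  return {
    sttModelPath: (env.SHERPA_STT_MODEL_PATH as string).trim(),
    ttsModelPath: (env.SHERPA_TTS_MODEL_PATH as string).trim(),
  };
}
```

In a Node process this could be called as resolveSherpaPaths(process.env) at startup, so a missing model path fails fast instead of surfacing mid-call.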
Read full article at github.com