| AWS Polly | TTS plugin using Amazon’s cloud-based service with natural-sounding voices and neural engine support | AWS Polly |
| Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications | Cartesia |
| Decart | Real-time AI video transformation service for applying artistic styles and effects to video streams | Decart |
| Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization | Deepgram |
| ElevenLabs | TTS plugin with highly realistic and expressive voices for conversational agents | ElevenLabs |
| Fast-Whisper | High-performance STT plugin using OpenAI’s Whisper model with CTranslate2 for fast inference | Fast-Whisper |
| Fish Audio | STT and TTS plugin with automatic language detection and voice cloning capabilities | Fish Audio |
| Gemini | Realtime API for building conversational agents with support for both voice and video | Gemini |
| HeyGen | Real-time interactive avatars powered by HeyGen | HeyGen |
| Inworld | TTS plugin with high-quality streaming voices for real-time conversational AI agents | Inworld |
| Kokoro | Local TTS engine for offline voice synthesis with low latency | Kokoro |
| Moondream | Real-time detection and VLM capabilities, available through the hosted API or run locally on CUDA devices. Vision Agents supports Moondream’s Detect, Caption, and VQA skills out of the box. | Moondream |
| OpenAI | Realtime API for building conversational agents with out-of-the-box support for real-time video directly over WebRTC, LLMs, and OpenAI TTS | OpenAI |
| Smart Turn | Advanced turn detection system combining Silero VAD, Whisper, and neural models for natural conversation flow | Smart Turn |
| Vogent | Neural turn detection system for intelligent turn-taking in voice conversations | Vogent |
| Wizper | STT plugin with real-time translation capabilities powered by Whisper v3 | Wizper |
| xAI | LLM plugin using xAI’s Grok models with advanced reasoning and real-time knowledge | xAI |