You’ll need API keys for Stream and each provider you use. Stream offers 333,000 free participant minutes monthly, plus additional credits through the Maker Program for indie developers.
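
Since every provider needs its own key, it can help to check for missing keys before starting an agent. The snippet below is a minimal sketch using only the Python standard library. The variable names in `REQUIRED_KEYS` are illustrative assumptions, not names confirmed by this README, so check each provider's documentation for the names it actually reads.

```python
import os

# Hypothetical environment variable names, used here for illustration only.
# Replace them with the names your chosen plugins actually expect.
REQUIRED_KEYS = ["STREAM_API_KEY", "STREAM_API_SECRET", "OPENAI_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the process environment; pass a dict to test the
    check without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {"STREAM_API_KEY": "abc", "OPENAI_API_KEY": ""}
print(missing_keys(REQUIRED_KEYS, example_env))
# → ['STREAM_API_SECRET', 'OPENAI_API_KEY']
```

Calling `missing_keys(REQUIRED_KEYS)` with no second argument checks the real environment. Failing fast on a non-empty result tends to give a clearer error than a provider rejecting a request mid-call.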
Available Plugins
LLMs & Realtime
| Plugin | Description | Docs |
|---|---|---|
| gemini | Realtime API (WebSocket) + LLM with function calling | Gemini |
| openai | Realtime API (WebRTC) + LLM + TTS | OpenAI |
| openrouter | Unified access to Claude, Gemini, GPT, and more | OpenRouter |
| anthropic | Claude models with function calling | — |
| xai | Grok models | xAI |
| huggingface | LLM and VLM via HuggingFace Inference API | HuggingFace |
| qwen | Qwen 3 Realtime with native audio I/O | Qwen |
| aws | Nova Realtime + Polly TTS | AWS Bedrock |
Speech (STT & TTS)
| Plugin | STT | TTS | Description | Docs |
|---|---|---|---|---|
| deepgram | ✓ | ✓ | Fast transcription with turn detection | Deepgram |
| elevenlabs | | ✓ | Expressive voices for conversational AI | ElevenLabs |
| cartesia | | ✓ | Low-latency TTS with audio markup | Cartesia |
| pocket | | ✓ | CPU-based TTS with voice cloning | |
| fish | ✓ | ✓ | Voice cloning and auto language detection | Fish Audio |
| fast_whisper | ✓ | | Local Whisper with CTranslate2 | Fast-Whisper |
| wizper | ✓ | | STT with real-time translation | Wizper |
| kokoro | | ✓ | Local TTS for offline use | Kokoro |
| inworld | | ✓ | Streaming expressive voices for realtime applications | Inworld |
Vision & Video
Turn Detection
| Plugin | Description | Docs |
|---|---|---|
| smart_turn | Neural turn detection with Silero VAD | Smart Turn |
| vogent | Intelligent turn-taking | Vogent |
Infrastructure
| Plugin | Description | Docs |
|---|---|---|
| getstream | Edge network for low-latency transport | — |
| twilio | Phone integration (inbound/outbound) | Calling Guide |
| turbopuffer | Vector search for RAG | Turbopuffer |

