Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Quick Start
TTS
Expressive text-to-speech synthesis.

| Name | Type | Default | Description |
|---|---|---|---|
| voice_id | str | "VR6AewLTigWG4xSOukaG" | ElevenLabs voice ID |
| model_id | str | "eleven_multilingual_v2" | TTS model |
| api_key | str | None | API key (defaults to ELEVENLABS_API_KEY env var) |

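
The `api_key` fallback described in the table can be sketched in plain Python. This illustrates the documented defaults only and is not the plugin's actual implementation; `TTSOptions` and `resolve_api_key` are hypothetical names introduced for this example.

```python
import os
from dataclasses import dataclass
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit key if one was passed, else ELEVENLABS_API_KEY (or None)."""
    if explicit is not None:
        return explicit
    return os.environ.get("ELEVENLABS_API_KEY")


@dataclass
class TTSOptions:
    """Hypothetical container mirroring the documented TTS defaults."""

    voice_id: str = "VR6AewLTigWG4xSOukaG"
    model_id: str = "eleven_multilingual_v2"
    api_key: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: an unset api_key falls back to the environment variable.
        self.api_key = resolve_api_key(self.api_key)
```

Under this sketch, `TTSOptions(voice_id="...")` overrides only the voice and leaves the model and key resolution at their defaults.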
STT
Real-time transcription via Scribe v2 (~150 ms latency, 99 languages).

| Name | Type | Default | Description |
|---|---|---|---|
| model_id | str | "scribe_v2_realtime" | Scribe model |
| language_code | str | "en" | Language code |
| api_key | str | None | API key (defaults to ELEVENLABS_API_KEY env var) |

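
To show how the STT defaults combine with per-call overrides, here is a small self-contained sketch. `STT_DEFAULTS` and `build_stt_options` are hypothetical helpers written for this example, not part of the plugin; only the option names and default values come from the table above.

```python
from typing import Any, Dict

# Defaults copied from the STT parameter table.
STT_DEFAULTS: Dict[str, Any] = {
    "model_id": "scribe_v2_realtime",
    "language_code": "en",
    "api_key": None,  # documented to fall back to ELEVENLABS_API_KEY
}


def build_stt_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(STT_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown STT option(s): {sorted(unknown)}")
    return {**STT_DEFAULTS, **overrides}
```

For example, `build_stt_options(language_code="es")` changes only the language and keeps the default model. Rejecting unknown names early is a design choice of this sketch, meant to catch misspelled options before they reach a constructor.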
Scribe v2 does not support turn detection. Use a separate turn detection plugin like Smart Turn if needed.

