Install Vision Agents from PyPI. We recommend uv as the package manager, with Python 3.12 (CPython) installed on your machine. For the best development experience, add our MCP server and Skill.md to your preferred coding tools.
uv add vision-agents
The SDK installs without provider packages by default. Add the ones you need:
uv add "vision-agents[getstream,gemini,deepgram,elevenlabs]" python-dotenv
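The extras syntax above generalizes to any set of plugins. As a minimal sketch, the helper below builds the `uv add` command from a list of plugin names. The function name `uv_add_command` is hypothetical, and it assumes every plugin name listed under Available Plugins is also a valid extra, which the text above confirms only for `getstream`, `gemini`, `deepgram`, and `elevenlabs`.

```python
def uv_add_command(plugins, package="vision-agents"):
    """Build a `uv add` command that installs `package` with the given extras.

    Hypothetical helper: it only formats a string and does not check that
    each name is an extra the package actually publishes.
    """
    # Strip whitespace, drop empty names, and remove duplicates while
    # keeping the order the caller chose.
    extras = list(dict.fromkeys(p.strip() for p in plugins if p.strip()))
    if not extras:
        # No extras requested: install the bare SDK.
        return f"uv add {package}"
    joined = ",".join(extras)
    # Quote the requirement so the shell does not expand the brackets.
    return f'uv add "{package}[{joined}]"'


if __name__ == "__main__":
    print(uv_add_command(["getstream", "gemini", "deepgram", "elevenlabs"]))
```

Quoting the requirement matters in shells such as zsh, where unquoted square brackets are treated as a glob pattern.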
You’ll need API keys for Stream and each provider you use. Stream offers 333,000 free participant minutes monthly, plus additional credits through the Maker Program for indie developers.
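One way to catch a missing key early is to load a `.env` file with python-dotenv and check the environment before starting an agent. The sketch below is illustrative only: the variable names in `REQUIRED_KEYS` and the helper `missing_keys` are assumptions, not names confirmed by the SDK, so check each provider's documentation for the variables it actually reads.

```python
import os

try:
    # python-dotenv is installed by the command above; the guard only keeps
    # this sketch importable in environments where it is missing.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Hypothetical variable names: check each provider's docs for the real ones.
REQUIRED_KEYS = [
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    if load_dotenv is not None:
        # Reads key=value pairs from a .env file, if one is found,
        # into the process environment.
        load_dotenv()
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Keeping the check in a small function makes it easy to trim `REQUIRED_KEYS` to the providers you actually installed.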

Available Plugins

LLMs & Realtime

| Plugin | Description | Docs |
| --- | --- | --- |
| gemini | Realtime API (WebSocket) + LLM with function calling | Gemini |
| openai | Realtime API (WebRTC) + LLM + TTS | OpenAI |
| openrouter | Unified access to Claude, Gemini, GPT, and more | OpenRouter |
| anthropic | Claude models with function calling | |
| xai | Grok models | xAI |
| huggingface | LLM and VLM via HuggingFace Inference API | HuggingFace |
| qwen | Qwen 3 Realtime with native audio I/O | Qwen |
| aws | Nova Realtime + Polly TTS | AWS Bedrock |

Speech (STT & TTS)

| Plugin | STT | TTS | Description | Docs |
| --- | --- | --- | --- | --- |
| deepgram | | | Fast transcription with turn detection | Deepgram |
| elevenlabs | | | Expressive voices for conversational AI | ElevenLabs |
| cartesia | | | Low-latency TTS with audio markup | Cartesia |
| pocket | | | CPU-based TTS with voice cloning | Pocket |
| fish | | | Voice cloning and auto language detection | Fish Audio |
| fast_whisper | | | Local Whisper with CTranslate2 | Fast-Whisper |
| wizper | | | STT with real-time translation | Wizper |
| kokoro | | | Local TTS for offline use | Kokoro |
| inworld | | | Streaming expressive voices for realtime applications | Inworld |

Vision & Video

| Plugin | Description | Docs |
| --- | --- | --- |
| nvidia | Cosmos 2 VLM for video understanding | NVIDIA |
| ultralytics | YOLO detection, pose, segmentation | Ultralytics |
| roboflow | Cloud or local detection with RF-DETR | Roboflow |
| moondream | Detection, captioning, VQA | Moondream |
| decart | Real-time video style transfer | Decart |
| heygen | Interactive avatars | HeyGen |

Turn Detection

| Plugin | Description | Docs |
| --- | --- | --- |
| smart_turn | Neural turn detection with Silero VAD | Smart Turn |
| vogent | Intelligent turn-taking | Vogent |

Infrastructure

| Plugin | Description | Docs |
| --- | --- | --- |
| getstream | Edge network for low-latency transport | |
| twilio | Phone integration (inbound/outbound) | Calling Guide |
| turbopuffer | Vector search for RAG | Turbopuffer |

Next Steps