Star Vision Agents on GitHub
Get started with examples, contribute, and stay updated
What Can You Build?
- Voice Agents — customer support, phone bots, voice assistants
- Video AI — coaching, avatars, surveillance, manufacturing
- Phone Integration — inbound and outbound calling via Twilio
- RAG Applications — knowledge-powered agents with Gemini or Turbopuffer
Key Features
| Feature | Description |
|---|---|
| 25+ Integrations | OpenAI, Gemini, Deepgram, ElevenLabs, NVIDIA, and more |
| Two Modes | Realtime (WebRTC/WebSocket) or custom STT→LLM→TTS pipelines |
| Video Processing | YOLO, Roboflow, custom processors for computer vision |
| Production Ready | HTTP server, Prometheus metrics, Docker deployment |
| Phone Support | Twilio integration for voice calls |
| RAG | Gemini FileSearch, Turbopuffer vector search |
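As a rough illustration of the custom STT→LLM→TTS mode listed in the table, the sketch below chains three stages in plain Python. It does not use the Vision Agents API: `VoicePipeline`, the `fake_*` functions, and the callable signatures are hypothetical stand-ins chosen only to show the data flow (audio in, transcript, reply text, audio out).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration; NOT the Vision Agents API.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains speech-to-text, a language model, and text-to-speech."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # audio -> text
        reply = self.llm(transcript)     # text -> reply text
        return self.tts(reply)           # reply text -> audio


# Toy stand-ins so the sketch runs without any provider SDK.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Because each stage is just a callable, one provider could be swapped for another without touching the other two stages, which is the main appeal of a pipeline over a single realtime model.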
Built-in Integrations
- LLMs: OpenAI, Gemini, Anthropic, xAI, OpenRouter, HuggingFace
- Realtime: OpenAI (WebRTC), Gemini (WebSocket), Qwen, AWS Nova
- STT: Deepgram, Fast-Whisper, Fish, Wizper
- TTS: ElevenLabs, Cartesia, Deepgram, Pocket, AWS Polly
- Vision: NVIDIA Cosmos, HuggingFace VLMs, Moondream, Roboflow, Ultralytics
- Turn Detection: Deepgram (built-in), Vogent, Smart Turn

Each integration is extensible. Build custom processors with `VideoProcessor`, add new LLM providers, or integrate any speech service.

