Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
Realtime
Native speech-to-speech with optional video over WebSocket.| Name | Type | Default | Description |
|---|---|---|---|
model | str | "gemini-2.5-flash" | Gemini model |
fps | int | 1 | Video frames per second |
api_key | str | None | API key (defaults to GOOGLE_API_KEY env var) |
LLM
Standard chat completions. Requires separate STT/TTS.Built-in Tools
Gemini provides built-in tools you can enable:| Tool | Description |
|---|---|
GoogleSearch | Ground responses with web data |
CodeExecution | Run Python code |
FileSearch | RAG over your documents |
URLContext | Read specific web pages |
File Search (RAG)
Managed RAG with automatic chunking and retrieval:Function Calling
Events
The Gemini plugin emits events for connection state and responses. Most developers should use the core events (LLMResponseCompletedEvent, etc.) for provider-agnostic code.| Event | Description |
|---|---|
GeminiConnectedEvent | Realtime connection established |
GeminiErrorEvent | Error occurred |
GeminiAudioEvent | Audio output received |
GeminiTextEvent | Text output received |
GeminiResponseEvent | Response chunk received |

