The Realtime component provides end-to-end speech-to-speech communication, combining STT, LLM, and TTS functionality in a single, optimized interface. It delivers ultra-low latency speech processing, direct audio streaming without intermediate text conversion, and support for multiple modalities (audio, video, text).
When to Use Realtime
Use a Realtime LLM when you want the lowest-latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively, so no separate STT or TTS services are required. Use the traditional STT → LLM → TTS pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don't support realtime audio.
Supported Providers
- OpenAI Realtime — WebRTC-based, supports video
- Gemini Live — WebSocket-based, supports video
- AWS Nova — WebSocket-based
- Qwen Omni — Native audio support
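
The guidance in "When to Use Realtime" and the provider list can be restated as a small lookup, sketched below. This is only an illustration: `ProviderInfo`, `PROVIDERS`, `choose_mode`, and `providers_with_video` are hypothetical names that are not part of the library, and `None` marks details the provider list does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProviderInfo:
    """Facts copied from the provider list; None means the list does not say."""
    transport: Optional[str]
    video: Optional[bool]


# Hypothetical table mirroring the provider list (not a library object).
PROVIDERS = {
    "OpenAI Realtime": ProviderInfo(transport="WebRTC", video=True),
    "Gemini Live": ProviderInfo(transport="WebSocket", video=True),
    "AWS Nova": ProviderInfo(transport="WebSocket", video=None),
    "Qwen Omni": ProviderInfo(transport=None, video=None),
}


def choose_mode(custom_voice: bool, specific_stt: bool,
                model_has_realtime_audio: bool) -> str:
    """Return "pipeline" or "realtime" following the guidance in this section."""
    # Any of these requirements points to the classic STT -> LLM -> TTS pipeline.
    if custom_voice or specific_stt or not model_has_realtime_audio:
        return "pipeline"
    return "realtime"


def providers_with_video() -> List[str]:
    """List providers that the documentation explicitly marks as video-capable."""
    return [name for name, info in PROVIDERS.items() if info.video is True]
```

For example, `choose_mode(custom_voice=True, specific_stt=False, model_has_realtime_audio=True)` selects the pipeline, because a custom voice needs a separate TTS service.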
Basic Usage
Agent methods with realtime
Use agent.simple_response(...) to inject text prompts and agent.say(...) for scripted speech. You usually do not call realtime audio methods directly from app code.
Properties
| Property | Type | Description |
|---|---|---|
| connected | bool | True if the realtime session is active |
| fps | int | Video frames per second sent to the model (default: 1) |
| session_id | str | UUID identifying the current session |
| epoch | int | Monotonic interruption counter. Increments each time interrupt() is called, allowing stale audio output events to be identified and dropped. |
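
To make the fps property concrete, here is a minimal sketch of how a cap on frames per second could be enforced. It assumes a simple timestamp-based throttle; `FrameThrottle` is a hypothetical helper for illustration only and does not describe how the library implements fps internally.

```python
from typing import Optional


class FrameThrottle:
    """Hypothetical helper: forwards at most `fps` frames per second."""

    def __init__(self, fps: int = 1) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        # Minimum spacing, in seconds, between two forwarded frames.
        self.min_interval = 1.0 / fps
        self._last_sent: Optional[float] = None

    def should_send(self, timestamp: float) -> bool:
        """Return True if a frame captured at `timestamp` (seconds) should be forwarded."""
        if self._last_sent is None or timestamp - self._last_sent >= self.min_interval:
            self._last_sent = timestamp
            return True
        return False
```

With the default of 1, a camera producing frames every 250 ms would have roughly one in four frames forwarded under this scheme.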
Realtime methods
interrupt()
Increments the epoch counter so that any in-flight audio output from a previous response is detected as stale and discarded by the Agent. The Agent calls this automatically on barge-in.
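
The epoch mechanism described above can be sketched in a few lines. This is an illustrative model, not the library's code: `EpochGate` and `AudioChunk` are hypothetical names, and the assumption is that each audio chunk is tagged with the epoch in effect when it was produced.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    epoch: int   # epoch in effect when the chunk was generated
    data: bytes  # raw audio payload


class EpochGate:
    """Hypothetical stand-in for the epoch bookkeeping described above."""

    def __init__(self) -> None:
        self.epoch = 0  # monotonic interruption counter

    def interrupt(self) -> None:
        # Everything tagged with an earlier epoch becomes stale from here on.
        self.epoch += 1

    def tag(self, data: bytes) -> AudioChunk:
        """Stamp outgoing audio with the current epoch."""
        return AudioChunk(epoch=self.epoch, data=data)

    def is_stale(self, chunk: AudioChunk) -> bool:
        """A chunk is stale if it was produced before the latest interrupt."""
        return chunk.epoch < self.epoch
```

Because the counter only ever increases, a consumer needs no per-response bookkeeping: comparing a chunk's epoch with the current one is enough to decide whether to play or drop it.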
Events
The Realtime class emits a small set of events for connection state:
| Event | Description |
|---|---|
| RealtimeConnectedEvent | Connection established with session config & capabilities |
| RealtimeDisconnectedEvent | Connection closed (includes reason and clean flag) |
UserTranscriptEvent fires in both classic STT and realtime modes.
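
As a rough picture of how application code might react to these events, the sketch below uses a tiny self-contained event bus. Everything here is a stand-in: `EventBus` is hypothetical, the event classes only mimic the names above, the `text` field on `UserTranscriptEvent` is an assumed name, and the library's real subscription API may look different.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass
class RealtimeDisconnectedEvent:
    reason: str  # why the connection closed
    clean: bool  # whether the shutdown was clean


@dataclass
class UserTranscriptEvent:
    text: str  # assumed field name for the transcribed user speech


class EventBus:
    """Hypothetical dispatcher: routes each event to handlers registered for its type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: object) -> None:
        # Only handlers registered for the exact event type are called.
        for handler in self._handlers[type(event)]:
            handler(event)
```

A handler registered for `UserTranscriptEvent` would then run the same way whether the transcript came from a separate STT service or from the realtime model, which is the point of the event firing in both modes.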
For provider-specific parameters and configuration, see the integration docs for OpenAI, Gemini, AWS Bedrock, or Qwen.

