When to Use Realtime
Use a Realtime LLM when you want the lowest-latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively, so no separate STT or TTS services are required. Use the traditional STT → LLM → TTS pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don't support realtime audio.
Supported Providers
- OpenAI Realtime — WebRTC-based, supports video
- Gemini Live — WebSocket-based, supports video
- AWS Nova — WebSocket-based
- Qwen Omni — Native audio support
Basic Usage
Methods
simple_response(text, processors=None, participant=None)
Sends a text prompt to the realtime model. The model responds with audio.
simple_audio_response(pcm, participant=None)
Sends raw PCM audio data directly to the model for processing.
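A minimal sketch of how these two methods might be called. The `StubRealtime` class below is a stand-in for illustration only, and the methods are assumed to be async coroutines; substitute a connected instance from your provider integration:

```python
import asyncio

class StubRealtime:
    """Stand-in for a provider Realtime session (illustration only)."""
    def __init__(self):
        self.connected = True
        self.sent = []

    async def simple_response(self, text, processors=None, participant=None):
        # Forward a text prompt to the model; the real class replies with audio.
        self.sent.append(("text", text))

    async def simple_audio_response(self, pcm, participant=None):
        # Forward raw PCM audio bytes to the model for processing.
        self.sent.append(("pcm", len(pcm)))

async def main():
    realtime = StubRealtime()
    await realtime.simple_response("Describe what you see on screen.")
    await realtime.simple_audio_response(b"\x00\x00" * 480)  # 480 samples of silence
    return realtime.sent

print(asyncio.run(main()))
```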
Properties
| Property | Type | Description |
|---|---|---|
| connected | bool | True if the realtime session is active |
| fps | int | Video frames per second sent to the model (default: 1) |
| session_id | str | UUID identifying the current session |
| epoch | int | Monotonic interruption counter. Increments each time interrupt() is called, allowing stale audio output events to be identified and dropped. |
Methods
interrupt()
Increments the epoch counter so that any in-flight RealtimeAudioOutputEvent from a previous response is detected as stale and discarded by the Agent. The Agent calls this automatically on barge-in.
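The epoch mechanism described above can be illustrated with a small self-contained sketch. The `AudioChunk` and `EpochGate` names are simplified assumptions, not the library's actual classes; the point is the compare-counters-and-drop logic:

```python
from dataclasses import dataclass

@dataclass
class AudioChunk:
    epoch: int   # epoch value captured when this response started
    data: bytes

class EpochGate:
    """Drops audio chunks that belong to an interrupted response."""
    def __init__(self):
        self.epoch = 0

    def interrupt(self):
        # Barge-in: bump the counter so in-flight chunks become stale.
        self.epoch += 1

    def is_stale(self, chunk: AudioChunk) -> bool:
        # A chunk tagged with an older epoch belongs to an interrupted response.
        return chunk.epoch < self.epoch

gate = EpochGate()
old = AudioChunk(epoch=gate.epoch, data=b"...")
gate.interrupt()  # user barged in
new = AudioChunk(epoch=gate.epoch, data=b"...")
print(gate.is_stale(old), gate.is_stale(new))  # True False
```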
Events
The Realtime class emits events for monitoring conversations:

| Event | Description |
|---|---|
| RealtimeConnectedEvent | Connection established with session config & capabilities |
| RealtimeDisconnectedEvent | Connection closed (includes reason and clean-close flag) |
| RealtimeUserSpeechTranscriptionEvent | Transcript of user speech |
| RealtimeAgentSpeechTranscriptionEvent | Transcript of agent speech |
| RealtimeResponseEvent | AI response text (with is_complete flag) |
| RealtimeAudioInputEvent | Audio sent to the realtime LLM |
| RealtimeAudioOutputEvent | Audio received from the realtime LLM |
| RealtimeAudioOutputDoneEvent | Audio output complete for a response |
| RealtimeConversationItemEvent | Conversation state update (message, function call, etc.) |
| RealtimeErrorEvent | Error during processing (with recoverability flag) |
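How you subscribe to these events depends on the provider integration. The tiny dispatcher below is a hypothetical stand-in (not the library's actual subscription API) showing how handlers could be routed by event-type name:

```python
from collections import defaultdict

class EventBus:
    """Hypothetical dispatcher keyed by event-type name (illustration only)."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
transcripts = []
bus.on("RealtimeUserSpeechTranscriptionEvent", transcripts.append)
bus.on("RealtimeErrorEvent", lambda e: print("error:", e))

bus.emit("RealtimeUserSpeechTranscriptionEvent", "hello there")
print(transcripts)  # ['hello there']
```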
For provider-specific parameters and configuration, see the integration docs for OpenAI, Gemini, AWS Bedrock, or Qwen.

