The Realtime component provides end-to-end speech-to-speech communication, combining STT, LLM, and TTS functionality in a single, optimized interface. It delivers ultra-low latency speech processing, direct audio streaming without intermediate text conversion, and support for multiple modalities (audio, video, text).

When to Use Realtime

Use a Realtime LLM when you want the lowest latency voice interactions. The model handles speech recognition, response generation, and speech synthesis natively—no separate STT or TTS services required. Use the traditional STT → LLM → TTS pipeline when you need custom voices (e.g., Cartesia, ElevenLabs), specific transcription providers, or models that don’t support realtime audio.
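For comparison, a pipeline-style Agent wires the stages explicitly. The sketch below is illustrative only: the `deepgram` and `elevenlabs` plugin modules and the `stt=`/`tts=` constructor parameters are assumptions for illustration, not confirmed APIs — check the integration docs for the exact names.

```python
# Hypothetical pipeline configuration (plugin names and Agent parameters
# below are illustrative assumptions, not confirmed APIs).
from vision_agents.plugins import deepgram, elevenlabs, getstream, openai
from vision_agents.core.agents import Agent
from vision_agents.core.edge.types import User

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    instructions="You're a helpful voice assistant",
    stt=deepgram.STT(),              # dedicated transcription provider
    llm=openai.LLM(model="gpt-4o"),  # text-only LLM
    tts=elevenlabs.TTS(),            # custom voice provider
    processors=[],
)
```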

Supported Providers

Realtime LLMs are available through the OpenAI, Gemini, AWS Bedrock, and Qwen integrations.

Basic Usage

from vision_agents.plugins import openai, getstream
from vision_agents.core.agents import Agent
from vision_agents.core.edge.types import User

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    instructions="You're a helpful voice assistant",
    llm=openai.Realtime(model="gpt-realtime", voice="marin"),
    processors=[]
)

Methods

simple_response(text, processors=None, participant=None)

Sends a text prompt to the realtime model. The model responds with audio.
await agent.llm.simple_response("What do you see in the video?")

simple_audio_response(pcm, participant=None)

Sends raw PCM audio data directly to the model for processing.
await agent.llm.simple_audio_response(audio_pcm_data)
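The expected PCM format (sample rate, bit depth, channel count) depends on the provider. As a minimal sketch, assuming 16 kHz mono 16-bit little-endian PCM, a test tone can be built from the standard library:

```python
import math
import struct

def make_test_tone(freq_hz: float = 440.0, sample_rate: int = 16_000,
                   duration_s: float = 1.0) -> bytes:
    """Generate mono 16-bit little-endian PCM for a sine tone."""
    n_samples = int(sample_rate * duration_s)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * t / sample_rate))
        for t in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)

# One second of audio: 16,000 samples x 2 bytes each = 32,000 bytes
audio_pcm_data = make_test_tone()
```

Verify your provider's expected sample rate and frame size before sending audio this way.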

Properties

| Property | Type | Description |
| --- | --- | --- |
| `connected` | `bool` | True if the realtime session is active |
| `fps` | `int` | Video frames per second sent to the model (default: 1) |
| `session_id` | `str` | UUID identifying the current session |
| `epoch` | `int` | Monotonic interruption counter; increments each time `interrupt()` is called, allowing stale audio output events to be identified and dropped |

Methods

interrupt()

Increments the epoch counter so that any in-flight RealtimeAudioOutputEvent from a previous response is detected as stale and discarded by the Agent. The Agent calls this automatically on barge-in.
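The epoch mechanism can be illustrated with a small self-contained sketch (not the library's actual implementation): each audio event records the epoch at which it was produced, and any event from before the most recent `interrupt()` is treated as stale.

```python
from dataclasses import dataclass

@dataclass
class AudioOutputEvent:
    epoch: int   # epoch captured when the audio was produced
    data: bytes

class RealtimeSketch:
    def __init__(self) -> None:
        self.epoch = 0  # monotonic interruption counter

    def interrupt(self) -> None:
        # Bump the epoch; anything produced earlier is now stale.
        self.epoch += 1

    def is_stale(self, event: AudioOutputEvent) -> bool:
        return event.epoch < self.epoch

rt = RealtimeSketch()
event = AudioOutputEvent(epoch=rt.epoch, data=b"\x00\x00")
rt.interrupt()        # barge-in: epoch goes from 0 to 1
rt.is_stale(event)    # the in-flight event is now stale and can be dropped
```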

Events

The Realtime class emits events for monitoring conversations:
| Event | Description |
| --- | --- |
| `RealtimeConnectedEvent` | Connection established, with session config and capabilities |
| `RealtimeDisconnectedEvent` | Connection closed (includes reason and clean-close flag) |
| `RealtimeUserSpeechTranscriptionEvent` | Transcript of user speech |
| `RealtimeAgentSpeechTranscriptionEvent` | Transcript of agent speech |
| `RealtimeResponseEvent` | AI response text (with `is_complete` flag) |
| `RealtimeAudioInputEvent` | Audio sent to the realtime LLM |
| `RealtimeAudioOutputEvent` | Audio received from the realtime LLM |
| `RealtimeAudioOutputDoneEvent` | Audio output complete for a response |
| `RealtimeConversationItemEvent` | Conversation state update (message, function call, etc.) |
| `RealtimeErrorEvent` | Error during processing (with recoverability flag) |

from vision_agents.core.llm.events import RealtimeUserSpeechTranscriptionEvent

@agent.llm.events.on(RealtimeUserSpeechTranscriptionEvent)
async def on_user_speech(event):
    print(f"User said: {event.text}")

For provider-specific parameters and configuration, see the integration docs for OpenAI, Gemini, AWS Bedrock, or Qwen.