- OpenAI Realtime - A native integration for realtime video and audio with out-of-the-box support for OpenAI’s realtime models. Stream both video and audio to OpenAI over WebRTC and receive responses in real-time. Supports MCP and function calling.
- OpenAI LLM - Access OpenAI’s language models (like gpt-4o) with full support for the Responses API. Includes conversation history management, tool calling, and streaming responses. Works with separate STT/TTS components for voice interactions.
- OpenAI Chat Completions - A flexible LLM integration that works with any OpenAI-compatible API (including OSS models). Ideal for custom models, fine-tuned models, or third-party providers that implement the OpenAI Chat Completions API.
- OpenAI TTS - A text-to-speech implementation using OpenAI’s TTS models with streaming support for low-latency audio synthesis.
Installation
Install the Stream OpenAI plugin using your preferred Python package manager.
Tutorials
The Voice AI quickstart and Video AI quickstart pages have examples to get you up and running.
Example
Check out our OpenAI example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.
OpenAI Realtime
The OpenAI Realtime plugin is used as the LLM component of an Agent for real-time speech-to-speech interactions.
Usage
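The original code example is omitted here; the sketch below shows the general shape of wiring the Realtime plugin into an Agent. The import paths and constructor names are assumptions, not the confirmed API — check the plugin's API reference for the real ones.

```python
# Illustrative sketch only - import paths and class names are assumptions.
from vision_agents.core import Agent, User           # hypothetical
from vision_agents.plugins import getstream, openai  # hypothetical

agent = Agent(
    edge=getstream.Edge(),                 # transport for the call
    agent_user=User(name="AI assistant"),  # the agent's identity in the call
    instructions="You are a friendly voice assistant.",
    llm=openai.Realtime(                   # speech-to-speech LLM component
        model="gpt-realtime",
        voice="marin",
        fps=1,                             # video frames per second, if video is enabled
    ),
)
```

The `instructions` argument on the Agent is what the Realtime session uses as its system prompt, per the note under Parameters below.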
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-realtime" | The OpenAI model to use for speech-to-speech. Supports realtime models only. |
| voice | str | "marin" | The voice to use for spoken responses (e.g., “marin”, “alloy”, “echo”). |
| fps | int | 1 | Number of video frames per second to send (for video-enabled agents). |
The API key is read from the OPENAI_API_KEY environment variable. Instructions are set via the Agent’s instructions parameter.
Methods
connect()
Establishes the WebRTC connection to OpenAI’s Realtime API. This is called automatically when the agent joins a call and should not be called directly in most cases.
simple_response(text)
Sends a text message to the OpenAI Realtime session. The model will respond with audio output.
simple_audio_response(pcm_data)
Sends raw PCM audio data to OpenAI. Audio should be in 48 kHz, 16-bit PCM format.
request_session_info()
Requests session information from the OpenAI API.
Properties
output_track
The output_track property provides access to the audio output stream from OpenAI. This is an AudioStreamTrack that contains the synthesized speech responses.
is_connected
Returns True if the realtime session is currently active.
Function calling
You can give the model the ability to call functions in your code while using the Realtime plugin via the main Agent class. Follow the instructions in the MCP tool calling guide, replacing the LLM with the OpenAI Realtime class.
Events
The OpenAI Realtime plugin emits various events during conversations that you can subscribe to. The plugin wraps OpenAI’s native events into a strongly-typed event system with better ergonomics.
OpenAI LLM
Developers can also interact with OpenAI through the conventional LLM pattern. This is useful if your model does not support connecting directly over WebRTC, or if you’re using a custom or fine-tuned model.
Usage
To use the OpenAI LLM, the plugin exposes an LLM class, which can be used directly with the Agent. In LLM mode you must also supply an STT and a TTS plugin: the user’s audio is first converted into text and sent to the model, and the model’s response is then converted back into audio.
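The STT → LLM → TTS round trip described above can be pictured as a toy pipeline, with plain functions standing in for the real plugins (this is a sketch of the data flow, not plugin code):

```python
# Toy stand-ins for the real STT, LLM, and TTS plugins, to show the data flow.
def stt(audio: bytes) -> str:
    """Speech-to-text: the user's audio becomes a transcript."""
    return audio.decode("utf-8")  # pretend the audio *is* the transcript

def llm(transcript: str) -> str:
    """The language model produces a text reply."""
    return f"You said: {transcript}"

def tts(reply: str) -> bytes:
    """Text-to-speech: the reply becomes audio for the call."""
    return reply.encode("utf-8")

def handle_turn(user_audio: bytes) -> bytes:
    # This is the round trip the Agent runs for you in LLM mode.
    return tts(llm(stt(user_audio)))

print(handle_turn(b"hello"))  # b'You said: hello'
```

In the real Agent, each stage is the plugin you pass in; the point is that text is the interchange format between them.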
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | - | The OpenAI model to use. See OpenAI models for available options. |
| api_key | Optional[str] | None | OpenAI API key. If not provided, reads from OPENAI_API_KEY environment variable. |
| base_url | Optional[str] | None | Custom base URL for OpenAI API. Useful for proxies or OpenAI-compatible endpoints. |
| client | Optional[AsyncOpenAI] | None | Custom AsyncOpenAI client instance. If not provided, a new client is created with the provided or environment API key. |
OpenAI Chat Completions
The OpenAI Chat Completions plugin provides a flexible way to use any model that implements the OpenAI Chat Completions API. This includes OpenAI models, custom fine-tuned models, and third-party providers that offer OpenAI-compatible endpoints. This is ideal for:
- Using OSS models hosted on services like Together AI, Fireworks, or Replicate
- Deploying custom fine-tuned models
- Testing different model providers without changing your code
- Using models that don’t have native realtime support
Usage
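The code for this section was omitted; as an illustrative sketch, construction looks like the LLM class with the parameters from the table below (the class name and import path here are assumptions, not the confirmed API):

```python
# Illustrative sketch - class name and import path are assumptions.
from vision_agents.plugins import openai  # hypothetical import path

llm = openai.ChatCompletionsLLM(
    model="gpt-4o",
    # api_key falls back to the OPENAI_API_KEY environment variable
)
```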
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | - | The model identifier to use (e.g., "gpt-4o", "deepseek-ai/DeepSeek-V3.1"). |
| api_key | Optional[str] | None | API key for the service. If not provided, reads from OPENAI_API_KEY environment variable. |
| base_url | Optional[str] | None | Custom base URL for the API endpoint. Required for non-OpenAI providers. |
| client | Optional[AsyncOpenAI] | None | Custom AsyncOpenAI client instance. If not provided, a new client is created with the provided or environment API key. |
Features
- Streaming responses: Real-time text generation with chunk events
- Conversation memory: Automatic conversation history management
- Event-driven: Emits LLM events for integration with other components
- Provider flexibility: Works with any OpenAI-compatible API
Methods
simple_response(text, processors, participant)
Generate a response to text input.
Example with OSS Models
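The original example was omitted; the sketch below shows the idea of pointing the plugin at an OpenAI-compatible provider via base_url, using the DeepSeek model id from the parameter table. The class name, import path, and provider endpoint are assumptions, not confirmed values — substitute your provider's documented base URL and key.

```python
# Illustrative sketch - class name, import path, and endpoint are assumptions.
import os

from vision_agents.plugins import openai  # hypothetical import path

llm = openai.ChatCompletionsLLM(
    model="deepseek-ai/DeepSeek-V3.1",       # model id from the provider's catalog
    base_url="https://api.together.xyz/v1",  # any OpenAI-compatible endpoint
    api_key=os.environ["TOGETHER_API_KEY"],  # the provider's key, not OpenAI's
)
```

Because only base_url, api_key, and model change, you can swap providers without touching the rest of your agent code.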
OpenAI TTS
The OpenAI TTS plugin provides text-to-speech synthesis using OpenAI’s TTS models. It supports streaming audio output for low-latency speech generation.
Usage
Use the OpenAI TTS plugin as the tts parameter when creating an Agent:
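The original snippet was omitted; as an illustrative sketch (import paths and names are assumptions, not the confirmed API):

```python
# Illustrative sketch - import paths and names are assumptions.
from vision_agents.core import Agent
from vision_agents.plugins import openai  # hypothetical import path

agent = Agent(
    llm=openai.LLM(model="gpt-4o"),
    tts=openai.TTS(model="gpt-4o-mini-tts", voice="alloy"),
    # ... plus an STT plugin and the other Agent components, as in the LLM section
)
```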
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| api_key | Optional[str] | None | OpenAI API key. If not provided, reads from OPENAI_API_KEY environment variable. |
| model | str | "gpt-4o-mini-tts" | The OpenAI TTS model to use. See OpenAI TTS docs for options. |
| voice | str | "alloy" | The voice to use for synthesis. Options include “alloy”, “echo”, “fable”, “onyx”, “nova”, and “shimmer”. |
| client | Optional[AsyncOpenAI] | None | Custom AsyncOpenAI client instance. If not provided, a new client is created with the provided or environment API key. |
Methods
stream_audio(text)
Synthesizes speech from text and returns PCM audio data. This method is called internally by the Agent when using agent.say().
stop_audio()
Stops any ongoing audio synthesis. For OpenAI TTS, this is a no-op, as the agent manages the output track.
Audio format
OpenAI TTS returns audio in the following format:
- Sample rate: 24,000 Hz
- Channels: 1 (mono)
- Format: 16-bit signed PCM
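Those numbers pin down the byte layout, which is useful when buffering or measuring the synthesized audio. For example, plain arithmetic (no plugin code) gives the data rate and duration:

```python
SAMPLE_RATE = 24_000   # Hz
CHANNELS = 1           # mono
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM

# 24,000 samples/s * 1 channel * 2 bytes/sample = 48,000 bytes per second.
bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
print(bytes_per_second)  # 48000

def duration_seconds(pcm: bytes) -> float:
    """Playback length of a raw PCM buffer in seconds."""
    return len(pcm) / bytes_per_second

print(duration_seconds(b"\x00" * 96_000))  # 2.0
```

Note this differs from the 48 kHz input format the Realtime plugin expects for simple_audio_response; don't mix the two rates when resampling.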

