It supports multiple languages and voices, making it ideal for real-time conversational agents, narrated content, accessibility tools, and voice-enabled applications. The ElevenLabs plugin for the Stream Python AI SDK allows you to add both TTS and STT functionality to your project.
Installation
Install the Stream ElevenLabs plugin with:
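The exact package name depends on how your SDK distribution publishes its plugins; the name below is an assumption, so check the SDK's installation guide if it differs:

```bash
# Package name is an assumption; substitute the plugin's actual PyPI name if it differs.
pip install getstream-plugins-elevenlabs
```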
Example
Check out our ElevenLabs example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.
Text-to-Speech (TTS)
Initialisation
The ElevenLabs TTS plugin exists in the form of the TTS class:
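A minimal initialisation sketch; the import path is an assumption based on the SDK's plugin layout, and the arguments shown are the documented defaults:

```python
# Import path is an assumption; adjust it to match your SDK install.
from getstream.plugins.elevenlabs import TTS

# With no api_key argument, the plugin reads ELEVENLABS_API_KEY from the environment.
tts = TTS(
    voice_id="VR6AewLTigWG4xSOukaG",    # default voice; any voice from your account works
    model_id="eleven_multilingual_v2",  # default multilingual synthesis model
)
```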
Parameters
These are the parameters available in the ElevenLabs TTS plugin for you to customise:

| Name | Type | Default | Description |
|---|---|---|---|
| api_key | str or None | None | Your ElevenLabs API key. If not provided, the plugin will look for the ELEVENLABS_API_KEY environment variable. |
| voice_id | str | "VR6AewLTigWG4xSOukaG" | The ID of the voice to use for TTS. You can use any voice from your ElevenLabs account. |
| model_id | str | "eleven_multilingual_v2" | The ID of the ElevenLabs TTS model to use. Controls the language and tone model for synthesis. |
Functionality
Send text to convert to speech
The send() method sends the text passed in for the service to synthesize. The resulting audio is then played through the configured output track.
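For instance (a sketch assuming send() is a coroutine, as is typical for this SDK's async plugins):

```python
# Synthesize a greeting; the resulting audio plays on the configured output track.
# Whether send() must be awaited is an assumption; drop the await if it is synchronous.
await tts.send("Hello! How can I help you today?")
```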
Speech-to-Text (STT)
ElevenLabs provides real-time speech-to-text capabilities through its Scribe v2 model, which offers low-latency (~150ms) transcription with support for 99 languages.
Initialisation
The ElevenLabs STT plugin uses the STT class:
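A minimal initialisation sketch; as with TTS, the import path is an assumption, and the arguments shown are the documented defaults:

```python
# Import path is an assumption; adjust it to match your SDK install.
from getstream.plugins.elevenlabs import STT

# With no api_key argument, the plugin reads ELEVENLABS_API_KEY from the environment.
stt = STT(
    model_id="scribe_v2_realtime",   # default Scribe v2 realtime model
    language_code="en",              # one of the 99 supported languages
    vad_silence_threshold_secs=1.5,  # commit a transcript after 1.5s of silence
)
```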
Parameters
These are the parameters available in the ElevenLabs STT plugin for you to customise:

| Name | Type | Default | Description |
|---|---|---|---|
| api_key | str or None | None | Your ElevenLabs API key. If not provided, the plugin will look for the ELEVENLABS_API_KEY environment variable. |
| model_id | str | "scribe_v2_realtime" | The model to use for transcription. Defaults to the Scribe v2 realtime model. |
| language_code | str | "en" | Language code for transcription (e.g., "en", "es", "fr"). Supports 99 languages. |
| vad_silence_threshold_secs | float | 1.5 | VAD silence threshold in seconds before committing a transcript. |
| vad_threshold | float | 0.4 | VAD threshold for speech detection (0.0-1.0). |
| min_speech_duration_ms | int | 100 | Minimum speech duration in milliseconds to trigger transcription. |
| min_silence_duration_ms | int | 100 | Minimum silence duration in milliseconds to detect speech boundaries. |
| audio_chunk_duration_ms | int | 100 | Duration of audio chunks to send (100-1000ms recommended). |
| client | AsyncElevenLabs or None | None | Optional pre-configured AsyncElevenLabs client instance. |
Features
- Real-time transcription: Low latency (~150ms) speech recognition
- Multi-language support: 99 languages supported
- VAD-based commit strategy: Automatic transcript segmentation based on voice activity detection (tunable; see the sketch after this list)
- Automatic reconnection: Built-in exponential backoff for connection failures
- Audio resampling: Automatically resamples audio to 16kHz mono for optimal quality
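The VAD-based commit behaviour can be tuned through the constructor parameters documented above; for example, a sketch that trades some stability for snappier transcript commits (the values are illustrative, the parameter names come from the table):

```python
# More aggressive segmentation: a shorter silence threshold and stricter speech
# detection produce faster, but potentially choppier, transcript commits.
fast_stt = STT(
    vad_silence_threshold_secs=0.8,  # commit after 0.8s of silence (default 1.5)
    vad_threshold=0.5,               # stricter speech detection (default 0.4)
    audio_chunk_duration_ms=100,     # smallest recommended chunk size
)
```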
The Scribe v2 model does not support turn detection. The turn_detection property is set to False for this implementation.
