Vision Agents requires a Stream account for real-time transport.
Get your Sarvam API key from the Sarvam dashboard.
Installation
Quick start
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "bulbul:v3" | TTS model (bulbul:v2 or bulbul:v3) |
| language | str | "hi-IN" | Target language code (e.g. hi-IN, en-IN) |
| speaker | str | "anushka" | Speaker voice ID (e.g. shubh, anushka) |
| sample_rate | int | 24000 | Output sample rate in Hz |
| pace | float | None | Speech pace (bulbul:v3 supports 0.5–2.0) |
| pitch | float | None | Speech pitch (bulbul:v2 only) |
| loudness | float | None | Speech loudness (bulbul:v2 only) |
| temperature | float | None | Sampling temperature (bulbul:v3 only) |
| enable_preprocessing | bool | True | Normalize mixed-language and numeric text |
| api_key | str | None | API key (defaults to SARVAM_API_KEY env var) |
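
Several parameters apply to only one model version, so it can help to check a configuration before passing it to the plugin. The sketch below is a standalone illustration of the constraints listed in the table. `SarvamTTSOptions`, `validate`, and `resolved_api_key` are hypothetical names invented for this example and are not part of the Vision Agents or Sarvam API; the plugin may enforce these rules differently, or not at all.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SarvamTTSOptions:
    """Hypothetical container mirroring the parameter table (not the plugin's class)."""

    model: str = "bulbul:v3"
    language: str = "hi-IN"
    speaker: str = "anushka"
    sample_rate: int = 24000
    pace: Optional[float] = None
    pitch: Optional[float] = None
    loudness: Optional[float] = None
    temperature: Optional[float] = None
    enable_preprocessing: bool = True
    api_key: Optional[str] = None

    def resolved_api_key(self) -> Optional[str]:
        # An explicit key wins; otherwise fall back to the SARVAM_API_KEY env var.
        return self.api_key or os.environ.get("SARVAM_API_KEY")

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the options look consistent."""
        problems: List[str] = []
        if self.model not in ("bulbul:v2", "bulbul:v3"):
            problems.append(f"unknown model: {self.model!r}")
        if self.model == "bulbul:v3":
            # The table lists pitch and loudness as bulbul:v2 only.
            if self.pitch is not None:
                problems.append("pitch is only supported by bulbul:v2")
            if self.loudness is not None:
                problems.append("loudness is only supported by bulbul:v2")
            # The table gives 0.5 to 2.0 as the pace range for bulbul:v3.
            if self.pace is not None and not 0.5 <= self.pace <= 2.0:
                problems.append("pace must be between 0.5 and 2.0 for bulbul:v3")
        if self.model == "bulbul:v2" and self.temperature is not None:
            # The table lists temperature as bulbul:v3 only.
            problems.append("temperature is only supported by bulbul:v3")
        return problems


if __name__ == "__main__":
    options = SarvamTTSOptions(language="en-IN", speaker="shubh", pace=1.2)
    for problem in options.validate():
        print("config problem:", problem)
```

The check returns a list of messages rather than raising, so a caller can report every mismatch at once. The pace range for bulbul:v2 is not stated in the table, so the sketch leaves it unchecked.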
Next steps
Sarvam STT
Streaming speech-to-text for Indian languages
Sarvam LLM
Chat completions with Sarvam models

