Sarvam AI provides streaming speech-to-text built for Indian languages. The plugin uses WebSocket streaming with built-in voice activity detection for turn events.
Vision Agents requires a Stream account for real-time transport. Get your Sarvam API key from the Sarvam dashboard.
Sarvam also provides text-to-speech and an LLM. You can use all three in the same agent.

Installation

uv add "vision-agents[sarvam]"

Quick start

from vision_agents.core import Agent, User
from vision_agents.plugins import sarvam, getstream, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Sarvam Agent", id="agent"),
    instructions="Reply in the same language the user speaks.",
    llm=sarvam.LLM(model="sarvam-m"),
    stt=sarvam.STT(language="hi-IN"),
    tts=sarvam.TTS(speaker="shubh"),
    turn_detection=smart_turn.TurnDetection(),
)
Set SARVAM_API_KEY in your environment or pass api_key directly.
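The lookup order described above (explicit key first, then the environment) can be made explicit in your own code, which helps fail early with a clear message. This is a minimal sketch, not the plugin's internal logic: `resolve_sarvam_api_key` is a hypothetical helper, and the assumption is only that the key is a non-empty string.

```python
import os
from typing import Mapping, Optional


def resolve_sarvam_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else SARVAM_API_KEY from the environment.

    Hypothetical helper for illustration; it is not part of the sarvam plugin.
    """
    # Fall back to the real process environment unless a mapping is injected.
    env = os.environ if env is None else env
    key = explicit or env.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError("Set SARVAM_API_KEY or pass api_key explicitly.")
    return key
```

The result can then be passed through the documented `api_key` parameter, for example `sarvam.STT(api_key=resolve_sarvam_api_key())`.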

Parameters

stt = sarvam.STT(
    model="saaras:v3",
    language="hi-IN",
    mode="transcribe",
    sample_rate=16000,
    high_vad_sensitivity=False,
)
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "saaras:v3" | Streaming model (saaras:v3, saarika:v2.5, saaras:v2.5) |
| language | str | None | Language code (e.g. hi-IN, en-IN). None for auto-detect |
| mode | str | None | transcribe, translate, verbatim, translit, or codemix |
| sample_rate | int | 16000 | Input sample rate in Hz (8000 or 16000) |
| high_vad_sensitivity | bool | False | Increase VAD sensitivity for noisy input |
| vad_signals | bool | True | Emit speech start/end events for turn detection |
| prompt | str | None | Optional biasing prompt sent after connect |
| api_key | str | None | API key (defaults to SARVAM_API_KEY env var) |
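Because several parameters accept only a fixed set of values, it can be useful to check them before opening a connection. The sketch below is one way to do that, under the assumption that the values listed in the parameter table are exhaustive; `check_stt_options` and the three constants are hypothetical names, not part of the plugin, and the helper does not check whether a given mode is supported by a given model.

```python
from typing import Optional

# Allowed values copied from the parameter table; assumed to be exhaustive.
STT_MODELS = {"saaras:v3", "saarika:v2.5", "saaras:v2.5"}
STT_MODES = {"transcribe", "translate", "verbatim", "translit", "codemix"}
STT_SAMPLE_RATES = {8000, 16000}


def check_stt_options(
    model: str = "saaras:v3",
    mode: Optional[str] = None,
    sample_rate: int = 16000,
) -> dict:
    """Validate documented STT options and return them as keyword arguments.

    Hypothetical helper for illustration; it is not part of the sarvam plugin.
    """
    if model not in STT_MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {sorted(STT_MODELS)}")
    # mode=None is the documented default, so only validate explicit values.
    if mode is not None and mode not in STT_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(STT_MODES)}")
    if sample_rate not in STT_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be 8000 or 16000, got {sample_rate}")
    return {"model": model, "mode": mode, "sample_rate": sample_rate}
```

The returned dictionary can be unpacked into the constructor, for example `sarvam.STT(language="hi-IN", **check_stt_options(mode="translate"))`.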

Next steps

Sarvam TTS: Streaming text-to-speech for Indian languages
Sarvam LLM: Chat completions with Sarvam models