Cartesia provides low-latency text-to-speech through its Sonic model. It is designed for real-time voice applications that need natural-sounding speech synthesis.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[cartesia]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import cartesia, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=cartesia.TTS(),
)
Set CARTESIA_API_KEY in your environment or pass api_key directly.
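If you prefer to fail early when no key is configured, the lookup can be made explicit. The sketch below is one possible approach, not part of Vision Agents: `resolve_cartesia_key` is a hypothetical helper that mirrors the behavior described above (an explicit `api_key` wins, otherwise `CARTESIA_API_KEY` is read from the environment).

```python
import os
from typing import Mapping, Optional


def resolve_cartesia_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else CARTESIA_API_KEY from env.

    Hypothetical helper for illustration; not part of Vision Agents.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("CARTESIA_API_KEY")
    if not key:
        # Raise before the agent starts instead of failing mid-call.
        raise RuntimeError(
            "Set CARTESIA_API_KEY or pass api_key to cartesia.TTS()."
        )
    return key


# With the plugin installed, the result can be passed straight through:
#   tts = cartesia.TTS(api_key=resolve_cartesia_key())
```

Passing `env` explicitly is only there to make the helper easy to test; in normal use, call it with no arguments.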
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model_id | str | "sonic-3" | Cartesia TTS model |
| voice_id | str | "f9836c6e-a0bd-460e-9d3c-f7299fa60f94" | Voice ID |
| sample_rate | int | 16000 | Audio sample rate in Hz |
| api_key | str | None | API key (defaults to CARTESIA_API_KEY env var) |
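To keep TTS settings in one place, the parameters above can be collected into a small options object and unpacked into the constructor. This is a minimal sketch under assumptions: `CartesiaTTSOptions` is a hypothetical class that only copies the names and defaults from the table, and `"your-voice-id"` is a placeholder rather than a real voice.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CartesiaTTSOptions:
    """Hypothetical container mirroring the parameter table above."""

    model_id: str = "sonic-3"
    voice_id: str = "f9836c6e-a0bd-460e-9d3c-f7299fa60f94"
    sample_rate: int = 16000
    api_key: Optional[str] = None

    def to_kwargs(self) -> dict:
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be a positive number of Hz")
        # Omit api_key when it is None so the plugin can fall back to
        # the CARTESIA_API_KEY environment variable.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Override only the voice; every other field keeps its documented default.
options = CartesiaTTSOptions(voice_id="your-voice-id")
tts_kwargs = options.to_kwargs()
print(tts_kwargs)

# With the plugin installed:
#   tts = cartesia.TTS(**tts_kwargs)
```

The dataclass is frozen so that a shared options object cannot be mutated after the agent has been built from it.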
Next Steps