Cartesia provides low-latency text-to-speech with the Sonic model. It is designed for real-time voice applications that need natural-sounding speech synthesis.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[cartesia]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import cartesia, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=cartesia.TTS(),
)
Set CARTESIA_API_KEY in your environment or pass api_key directly.
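
Both ways of supplying the key can be sketched as follows. The helpers `load_cartesia_key` and `make_tts` are hypothetical names used for illustration and are not part of Vision Agents; the only plugin call is `cartesia.TTS(api_key=...)`, using the `api_key` parameter documented on this page. The sketch reads the variable itself so that a missing key fails early with a clear message.

```python
import os


def load_cartesia_key(env=None):
    """Return the Cartesia API key, or raise if it is missing (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("CARTESIA_API_KEY is not set")
    return key


def make_tts(env=None):
    """Build the TTS plugin with an explicit key instead of relying on the env lookup."""
    # Imported lazily so load_cartesia_key works even without the plugin installed.
    from vision_agents.plugins import cartesia

    return cartesia.TTS(api_key=load_cartesia_key(env))
```

If the environment variable is already set, plain `cartesia.TTS()` as in the Quick Start should be enough; the explicit form is mainly useful when the key comes from a secrets manager or a test fixture.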

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model_id | str | "sonic-3" | Cartesia TTS model |
| voice_id | str | "f9836c6e-a0bd-460e-9d3c-f7299fa60f94" | Voice ID |
| sample_rate | int | 16000 | Audio sample rate in Hz |
| api_key | str | None | API key (defaults to CARTESIA_API_KEY env var) |
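
As a minimal sketch of overriding these parameters, the snippet below merges caller-supplied values over the documented defaults and rejects names that are not listed above. `DEFAULTS`, `tts_options`, and `make_custom_tts` are hypothetical helpers for illustration, not part of the plugin; the keyword names and default values are taken from the parameters on this page.

```python
# Defaults copied from the parameters listed above.
DEFAULTS = {
    "model_id": "sonic-3",
    "voice_id": "f9836c6e-a0bd-460e-9d3c-f7299fa60f94",
    "sample_rate": 16000,
    "api_key": None,  # None means: fall back to CARTESIA_API_KEY
}


def tts_options(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown TTS option(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def make_custom_tts(**overrides):
    """Construct cartesia.TTS with the merged options (hypothetical wrapper)."""
    # Imported lazily so tts_options can be used without the plugin installed.
    from vision_agents.plugins import cartesia

    return cartesia.TTS(**tts_options(**overrides))
```

For example, `make_custom_tts(sample_rate=24000)` would request 24 kHz audio, assuming the chosen model and voice support that rate; the result can be passed as `tts=` when constructing the `Agent`.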

Next Steps