Kokoro is a local TTS engine that runs entirely on your machine. No API key or internet connection required. Ideal for offline voice synthesis, privacy-sensitive applications, or prototyping.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[kokoro]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import kokoro, gemini, deepgram, getstream
agent = Agent(
    edge=getstream.Edge(),  # Stream handles real-time transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),  # speech-to-text
    tts=kokoro.TTS(),  # local text-to-speech, default settings
)
Kokoro runs locally. No API key or internet connection is required.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `voice` | `str` | `"af_heart"` | Voice preset |
| `lang_code` | `str` | `"a"` | Language code (`"a"` = American English) |
| `speed` | `float` | `1.0` | Playback speed (e.g., `0.9` slower, `1.2` faster) |
| `device` | `str` | `None` | Device (`"cuda"`, `"cpu"`, or auto-detect) |
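The parameters above can be gathered into keyword arguments before constructing the TTS instance. The following is a minimal sketch, not the plugin's own logic: `pick_device` and `kokoro_tts_kwargs` are hypothetical helpers, the PyTorch probe is only one plausible way to choose a device explicitly, and the positive-speed check is an assumption rather than documented behavior.

```python
from typing import Any, Optional


def pick_device(preferred: Optional[str] = None) -> str:
    """Hypothetical helper: return an explicit device string.

    Uses the caller's choice if given; otherwise probes PyTorch (if it is
    installed) for a CUDA GPU and falls back to "cpu".
    """
    if preferred is not None:
        return preferred
    try:
        import torch  # optional dependency, only used to probe for a GPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def kokoro_tts_kwargs(
    voice: str = "af_heart",
    lang_code: str = "a",
    speed: float = 1.0,
    device: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect the documented parameters into kwargs."""
    # Assumption: a zero or negative playback speed is never meaningful.
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "voice": voice,
        "lang_code": lang_code,
        "speed": speed,
        "device": pick_device(device),
    }


# Slightly slower speech, pinned to the CPU.
kwargs = kokoro_tts_kwargs(speed=0.9, device="cpu")
# With the plugin installed, these would be passed as: kokoro.TTS(**kwargs)
```

Leaving `device` as `None` in `kokoro.TTS()` itself lets the plugin auto-detect; the helper is only useful when you want the chosen device to be visible in logs or configuration.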
Next Steps