HeyGen provides realistic AI avatars with automatic lip-sync. Add a video avatar that speaks with natural movements synchronized to your agent's voice.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
## Installation

```bash
uv add "vision-agents[heygen]"
```

(The quotes keep your shell from interpreting the square brackets.)
## Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import heygen, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),  # Stream handles real-time transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    processors=[
        # Publishes the HeyGen avatar as the agent's video track
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH,
        )
    ],
)
```
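To see the avatar, the agent still needs to join a call. The call-creation and join methods below (`create_user`, `create_call`, `join`, `finish`) are assumptions based on the core `Agent` class and may differ between releases — treat this as a sketch and check the Vision Agents quick start for the current signatures:

```python
import asyncio
from uuid import uuid4

async def main():
    # Assumed API: register the agent user, create a call on the
    # Stream edge, then join it and run until the call ends.
    await agent.create_user()
    call = agent.create_call("default", str(uuid4()))
    with await agent.join(call):
        await agent.finish()

asyncio.run(main())
```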
Set `HEYGEN_API_KEY` in your environment or pass `api_key` directly.
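If you'd rather not rely on the environment variable, pass the key explicitly via the `api_key` parameter documented below (the key value here is a placeholder):

```python
avatar = heygen.AvatarPublisher(
    avatar_id="default",
    api_key="your-heygen-api-key",  # placeholder; prefer the env var in production
)
```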
## Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `avatar_id` | `str` | `"default"` | HeyGen avatar ID (from your HeyGen dashboard) |
| `quality` | `VideoQuality` | `HIGH` | Video quality: `LOW`, `MEDIUM`, or `HIGH` |
| `resolution` | `Tuple[int, int]` | `(1920, 1080)` | Output resolution in pixels (width, height) |
| `api_key` | `str` | `None` | API key (defaults to the `HEYGEN_API_KEY` env var) |
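Putting these together, here is a sketch of a publisher tuned for lower bandwidth (the avatar ID is a placeholder you would copy from your HeyGen dashboard):

```python
publisher = heygen.AvatarPublisher(
    avatar_id="my-avatar-id",            # placeholder: copy from the HeyGen dashboard
    quality=heygen.VideoQuality.MEDIUM,  # trade some fidelity for smoother delivery
    resolution=(1280, 720),              # smaller frames for constrained connections
)
```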
## How It Works

The avatar behaves differently depending on your LLM type:

### With Streaming LLMs (Lower Latency)

- LLM generates text → text is sent to HeyGen for lip-sync → HeyGen generates both the avatar video and the audio (this is the Quick Start configuration above)

### With Realtime LLMs

- Realtime LLM generates audio → audio is transcribed → text is sent to HeyGen for lip-sync → HeyGen generates video only (the audio comes from the LLM)
```python
# With Gemini Realtime: HeyGen lip-syncs to the LLM's own audio output
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.Realtime(),
    processors=[heygen.AvatarPublisher(avatar_id="default")],
)
```
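Note that no TTS plugin is needed in this configuration: the realtime LLM already produces the audio track, and HeyGen supplies only the synchronized video.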
## Next Steps