HeyGen provides realistic AI avatars with automatic lip-sync. Add a video avatar to your agent that speaks with natural movements synchronized to your agent’s voice.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[heygen]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import heygen, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH,
        )
    ],
)
Set HEYGEN_API_KEY in your environment or pass api_key directly.

Parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| avatar_id | str | "default" | HeyGen avatar ID (from your HeyGen dashboard) |
| quality | VideoQuality | HIGH | Video quality (LOW, MEDIUM, or HIGH) |
| resolution | Tuple[int, int] | (1920, 1080) | Output resolution (width, height) |
| api_key | str | None | HeyGen API key (defaults to the HEYGEN_API_KEY env var) |

How It Works

The avatar pipeline differs depending on your LLM type.

With streaming LLMs (lower latency):
  1. The LLM generates text
  2. The text is sent to HeyGen for lip-sync
  3. HeyGen generates the avatar video and audio

With realtime LLMs:
  1. The realtime LLM generates audio
  2. The audio is transcribed
  3. The transcript is sent to HeyGen for lip-sync
  4. HeyGen generates video only (the audio comes from the LLM)
# With Gemini Realtime
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.Realtime(),
    processors=[heygen.AvatarPublisher(avatar_id="default")],
)
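The practical difference between the two pipelines is which media HeyGen produces. A toy sketch (not the plugin's API) of that branching:

```python
# Toy illustration of the two pipelines: a streaming LLM lets HeyGen
# produce both video and audio, while a realtime LLM keeps its own audio
# and HeyGen contributes video only.
def heygen_outputs(llm_kind: str) -> set[str]:
    if llm_kind == "streaming":
        # LLM text -> HeyGen lip-sync -> avatar video + audio
        return {"video", "audio"}
    if llm_kind == "realtime":
        # LLM audio -> transcript -> HeyGen lip-sync -> video only
        return {"video"}
    raise ValueError(f"unknown LLM kind: {llm_kind}")
```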

Next Steps