HeyGen is a service that provides realistic AI avatars with automatic lip-sync. The HeyGen plugin for the Stream Python AI SDK lets you add a video avatar to your AI agent that speaks with natural movements and expressions synchronized to your agent’s voice, making interactions more engaging and human-like.
## Features
- 🎤 Automatic Lip-Sync: Avatar automatically syncs with audio
- 🚀 WebRTC Streaming: Low-latency real-time video streaming
- 🎨 Customizable: Change avatar, quality, and resolution
## Installation

Install the Stream HeyGen plugin with:

```bash
uv add "vision-agents[heygen]"
```
## Example
Check out our HeyGen examples to see working code samples using the plugin, or read on for some key details.
## Initialisation

The HeyGen plugin for Stream is exposed as the `AvatarPublisher` class:
```python
from vision_agents.plugins import heygen

avatar = heygen.AvatarPublisher(
    avatar_id="default",
    quality=heygen.VideoQuality.HIGH,
)
```
To initialise without passing in the API key, make sure `HEYGEN_API_KEY` is set as an environment variable. You can do this either by defining it in a `.env` file or by exporting it directly in your terminal.
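For example (the key value is a placeholder):

```bash
export HEYGEN_API_KEY="your-heygen-api-key"
```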
## Parameters

These are the parameters available in the HeyGen `AvatarPublisher` plugin for you to customise:

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `avatar_id` | `str` | `"default"` | HeyGen avatar ID to use for streaming. Get this from your HeyGen dashboard. |
| `quality` | `VideoQuality` | `VideoQuality.HIGH` | Video quality setting. Options: `VideoQuality.LOW`, `VideoQuality.MEDIUM`, or `VideoQuality.HIGH`. |
| `resolution` | `Tuple[int, int]` | `(1920, 1080)` | Output video resolution as `(width, height)`. |
| `api_key` | `str` or `None` | `None` | Your HeyGen API key. If not provided, the plugin looks for the `HEYGEN_API_KEY` environment variable. |
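As a quick reference, here is a constructor call with every parameter spelled out. This is a sketch; the avatar ID is a placeholder you would replace with one from your HeyGen dashboard:

```python
import os

from vision_agents.plugins import heygen

# Every parameter made explicit; avatar_id is a placeholder.
avatar = heygen.AvatarPublisher(
    avatar_id="default",
    quality=heygen.VideoQuality.HIGH,
    resolution=(1920, 1080),
    api_key=os.environ.get("HEYGEN_API_KEY"),
)
```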
## How It Works

The HeyGen avatar integration works differently depending on whether you’re using a standard streaming LLM or a Realtime LLM:
### With Streaming LLMs (Recommended for Lower Latency)
When using a standard streaming LLM (like Gemini LLM), the flow is:
1. Text Generation: Your LLM generates text responses
2. Lip-Sync: Text is sent directly to HeyGen for avatar lip-sync generation
3. Audio Synthesis: HeyGen generates both the avatar video and audio with TTS
4. Streaming: Avatar video and audio are streamed to call participants
This approach has lower latency because text goes directly to HeyGen without transcription delays.
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, heygen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Avatar Assistant"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.LLM("gemini-2.0-flash-exp"),
    stt=deepgram.STT(),
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH,
        )
    ],
)
```
### With Realtime LLMs
When using a Realtime LLM (like Gemini Realtime), the flow is:
1. Audio Generation: Realtime LLM generates audio directly
2. Transcription: Audio is transcribed to text
3. Lip-Sync: Text transcription is sent to HeyGen for avatar lip-sync
4. Video Only: HeyGen generates avatar video (audio comes from the Realtime LLM)
5. Streaming: Avatar video and LLM audio are streamed together
```python
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, heygen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Avatar Assistant"),
    instructions="You're a friendly AI assistant.",
    llm=gemini.Realtime(model="gemini-2.5-flash-native-audio-preview-09-2025"),
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",
            quality=heygen.VideoQuality.HIGH,
        )
    ],
)
```
## Usage in Agent

Add the `AvatarPublisher` to your agent’s `processors` list:
```python
from uuid import uuid4

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, heygen


async def start_avatar_agent():
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="AI Assistant with Avatar", id="agent"),
        instructions="You're a friendly AI assistant.",
        llm=gemini.LLM("gemini-2.0-flash"),
        stt=deepgram.STT(),
        processors=[
            heygen.AvatarPublisher(
                avatar_id="default",
                quality=heygen.VideoQuality.HIGH,
                resolution=(1920, 1080),
            )
        ],
    )

    call = agent.edge.client.video.call("default", str(uuid4()))
    async with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.simple_response("Hello! I'm your AI assistant with an avatar.")
        await agent.finish()
```
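To run the coroutine from a script entry point, you could use, for example:

```python
import asyncio

# Assumes start_avatar_agent from the example above is in scope.
if __name__ == "__main__":
    asyncio.run(start_avatar_agent())
```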
## Video Quality Options

Choose the appropriate quality based on your bandwidth and requirements:

- `VideoQuality.LOW`: Lower bandwidth usage, suitable for slower connections
- `VideoQuality.MEDIUM`: Balanced quality and bandwidth
- `VideoQuality.HIGH`: Best quality, requires stable high-bandwidth connection
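For a constrained connection you might pair a lower quality with a smaller output resolution. A minimal sketch (the values are illustrative, not recommendations):

```python
from vision_agents.plugins import heygen

# Illustrative low-bandwidth configuration: low quality at 720p.
avatar = heygen.AvatarPublisher(
    avatar_id="default",
    quality=heygen.VideoQuality.LOW,
    resolution=(1280, 720),
)
```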
## Getting Your Avatar ID

1. Sign up for a HeyGen account
2. Navigate to your HeyGen dashboard
3. Find your avatar ID in the avatar settings
4. Use this ID in the `avatar_id` parameter
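You can also look avatars up programmatically. The sketch below assumes HeyGen’s public v2 REST endpoint (`GET /v2/avatars` authenticated with an `X-Api-Key` header) and its response shape; these are assumptions about HeyGen’s API, not part of this plugin, so verify them against HeyGen’s API reference:

```python
import os

import requests

# Assumed endpoint and response shape from HeyGen's public v2 API;
# verify against HeyGen's API reference before relying on this.
response = requests.get(
    "https://api.heygen.com/v2/avatars",
    headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
    timeout=10,
)
response.raise_for_status()
for avatar in response.json().get("data", {}).get("avatars", []):
    print(avatar.get("avatar_id"), avatar.get("avatar_name"))
```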
## Troubleshooting

### Connection Issues
If you experience connection problems:
- Verify your HeyGen API key is valid
- Ensure network access to HeyGen’s servers
- Check firewall settings for WebRTC traffic
### Video Quality Issues

To optimize video quality:

- Use `quality=VideoQuality.HIGH` for best results
- Ensure stable internet connection
- Consider lowering resolution if bandwidth is limited
### No Avatar Appearing
- Check browser console for errors
- Verify Stream credentials are correct
- Ensure HeyGen API key has proper permissions