NVIDIA provides powerful vision language models through its NIM platform. The plugin enables real-time video understanding with models like Cosmos Reason2, including automatic frame buffering and NVCF asset management.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
## Installation

```bash
uv add vision-agents[nvidia]
```
## Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import nvidia, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="Analyze the video and answer questions.",
    llm=nvidia.VLM(
        model="nvidia/cosmos-reason2-8b",
        fps=1,
        frame_buffer_seconds=10,
    ),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)
```
Set `NVIDIA_API_KEY` in your environment or pass `api_key` directly.
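For example, both options can look like the minimal sketch below. It uses only the documented `model` and `api_key` constructor parameters; the key value and environment handling are placeholders.

```python
import os

from vision_agents.plugins import nvidia

# Option 1: rely on the NVIDIA_API_KEY environment variable
# (exported beforehand, e.g. `export NVIDIA_API_KEY=nvapi-...`)
vlm = nvidia.VLM(model="nvidia/cosmos-reason2-8b")

# Option 2: pass the key explicitly to the constructor
vlm = nvidia.VLM(
    model="nvidia/cosmos-reason2-8b",
    api_key=os.environ["NVIDIA_API_KEY"],  # or any string holding your key
)
```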
## Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `"nvidia/cosmos-reason2-8b"` | NVIDIA model ID |
| `fps` | `int` | `1` | Video frames per second to buffer |
| `frame_buffer_seconds` | `int` | `10` | Seconds of video to buffer |
| `frame_width` | `int` | `800` | Frame width in pixels |
| `frame_height` | `int` | `600` | Frame height in pixels |
| `api_key` | `str` | `None` | API key (defaults to the `NVIDIA_API_KEY` env var) |
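As a sketch of how these parameters combine, the snippet below buffers a longer window of smaller frames; the values are illustrative and only the parameters documented above are used.

```python
from vision_agents.plugins import nvidia

# Buffer 30 seconds of video at 2 frames per second,
# with each frame downscaled to 640x480 before analysis.
# (Illustrative values; all keyword arguments are the documented parameters.)
vlm = nvidia.VLM(
    model="nvidia/cosmos-reason2-8b",
    fps=2,
    frame_buffer_seconds=30,
    frame_width=640,
    frame_height=480,
)
```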
## Next Steps