NVIDIA provides vision language models (VLMs) through its NIM platform. This plugin enables real-time video understanding with models such as Cosmos Reason2, and handles frame buffering and NVCF (NVIDIA Cloud Functions) asset management automatically.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

```shell
uv add "vision-agents[nvidia]"
```

Quick Start

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import nvidia, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),  # Stream provides the real-time audio/video transport
    agent_user=User(name="Assistant", id="agent"),
    instructions="Analyze the video and answer questions.",
    llm=nvidia.VLM(
        model="nvidia/cosmos-reason2-8b",
        fps=1,  # buffer one video frame per second
        frame_buffer_seconds=10,  # buffer 10 seconds of video
    ),
    stt=deepgram.STT(),  # speech-to-text
    tts=elevenlabs.TTS(),  # text-to-speech
)
```
Set `NVIDIA_API_KEY` in your environment, or pass `api_key` to `nvidia.VLM` directly.
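The key lookup described above (an explicit `api_key` wins, otherwise the `NVIDIA_API_KEY` environment variable is used) can be sketched as follows. `resolve_nvidia_api_key` is a hypothetical helper for illustration only, not part of the plugin, and the plugin's actual error handling may differ.

```python
import os
from typing import Optional


def resolve_nvidia_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to NVIDIA_API_KEY."""
    # An explicitly passed key takes precedence over the environment.
    if api_key:
        return api_key
    # Otherwise read the environment variable the plugin documents.
    env_key = os.environ.get("NVIDIA_API_KEY")
    if not env_key:
        raise RuntimeError("Set NVIDIA_API_KEY or pass api_key explicitly.")
    return env_key
```

Passing the key explicitly is convenient in tests; in deployments, the environment variable keeps the secret out of source code.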

Parameters

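| Name | Type | Default | Description |
|------|------|---------|-------------|
| `model` | `str` | `"nvidia/cosmos-reason2-8b"` | NVIDIA model ID |
| `fps` | `int` | `1` | Video frames per second to buffer |
| `frame_buffer_seconds` | `int` | `10` | Seconds of video to buffer |
| `frame_width` | `int` | `800` | Frame width in pixels |
| `frame_height` | `int` | `600` | Frame height in pixels |
| `api_key` | `str` | `None` | API key (defaults to the `NVIDIA_API_KEY` env var) |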
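One way to picture how `fps` and `frame_buffer_seconds` interact is a rolling buffer whose capacity is `fps × frame_buffer_seconds` frames. The sketch below is a hypothetical model, not the plugin's internal implementation: `RollingFrameBuffer` and `raw_rgb_bytes` are illustrative names, and the memory estimate assumes uncompressed 8-bit RGB frames, which the plugin may not actually store.

```python
from collections import deque
from typing import Any, Deque


class RollingFrameBuffer:
    """Hypothetical rolling buffer holding the most recent fps * seconds frames."""

    def __init__(self, fps: int = 1, frame_buffer_seconds: int = 10) -> None:
        # Capacity = frames per second x seconds of history.
        self.frames: Deque[Any] = deque(maxlen=fps * frame_buffer_seconds)

    def push(self, frame: Any) -> None:
        # When the deque is full, append() silently drops the oldest frame.
        self.frames.append(frame)

    @property
    def capacity(self) -> int:
        return self.frames.maxlen or 0


def raw_rgb_bytes(capacity: int, width: int = 800, height: int = 600) -> int:
    """Memory for `capacity` uncompressed 8-bit RGB frames (3 bytes per pixel)."""
    return capacity * width * height * 3
```

With the defaults (`fps=1`, `frame_buffer_seconds=10`, 800×600), this model holds 10 frames, about 14.4 MB if kept as raw RGB. Doubling either `fps` or `frame_buffer_seconds` doubles the buffer.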

Next Steps