Vision Agents provides built-in observability through OpenTelemetry. Collect metrics and traces across all components to monitor performance, latency, and errors in your agents.

Quick Start

To enable metrics collection, configure OpenTelemetry:
# 1. Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# 2. Now import and create your agent
from vision_agents.core import Agent

agent = Agent(llm=..., stt=..., tts=...)
Metrics are now available at http://localhost:9464/metrics.
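
To sanity-check the endpoint from a script, you can fetch the page and pick out individual metrics. The helper below assumes only the standard Prometheus text exposition format; the fetch step is sketched with `urllib` and commented out since it needs the server above running (lines with optional timestamps are not handled in this sketch).

```python
from urllib.request import urlopen


def parse_metric_samples(text: str, prefix: str = "") -> dict[str, float]:
    """Parse Prometheus text-exposition lines into {metric_line: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name_part, _, value_part = line.rpartition(" ")
        if name_part.startswith(prefix):
            try:
                samples[name_part] = float(value_part)
            except ValueError:
                pass  # ignore malformed lines
    return samples


# With the exporter running, fetch and filter the live endpoint:
# text = urlopen("http://localhost:9464/metrics").read().decode()
# print(parse_metric_samples(text, prefix="llm_"))
```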

MetricsCollector

The MetricsCollector class subscribes to events from all agent components and records OpenTelemetry metrics automatically. Each Agent creates a MetricsCollector internally, so metrics collection is enabled by default. If no OpenTelemetry providers are configured, the metrics are no-ops and have no performance impact. The collector listens to events from:
  • LLM — Response latency, token usage, tool calls
  • STT — Transcription latency, audio duration
  • TTS — Synthesis latency, audio duration, characters
  • Turn Detection — Turn duration, trailing silence
  • Realtime LLM — Session metrics, audio I/O, transcriptions
  • VLM — Inference latency, token usage
  • Video Processors — Frame processing, detections

Metric Attributes

All metrics include contextual attributes:
| Attribute | Description |
|---|---|
| `provider` | The plugin name (e.g., `openai`, `deepgram`) |
| `model` | Model identifier when available |
| `error_type` | Exception class name for error metrics |
| `error_code` | Error code when available |
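
Because every sample carries these attributes, error counts can be broken down per provider when you aggregate samples yourself. A minimal sketch; the `(attributes, count)` sample shape here is an illustrative assumption, not the collector's internal format:

```python
from collections import Counter


def errors_by_provider(samples: list[tuple[dict, int]]) -> Counter:
    """Sum error counts grouped by the `provider` attribute."""
    totals = Counter()
    for attrs, count in samples:
        totals[attrs.get("provider", "unknown")] += count
    return totals


samples = [
    ({"provider": "openai", "error_type": "TimeoutError"}, 2),
    ({"provider": "deepgram", "error_type": "ConnectionError"}, 1),
    ({"provider": "openai", "error_type": "RateLimitError"}, 1),
]
# errors_by_provider(samples) → Counter({"openai": 3, "deepgram": 1})
```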

Metrics Reference

All metrics use the vision_agents.core meter namespace.

STT Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `stt.latency.ms` | Histogram | ms | Processing latency for speech-to-text |
| `stt.audio_duration.ms` | Histogram | ms | Duration of audio processed |
| `stt.errors` | Counter | — | Total STT errors |

TTS Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `tts.latency.ms` | Histogram | ms | Synthesis latency |
| `tts.audio_duration.ms` | Histogram | ms | Duration of synthesized audio |
| `tts.characters` | Counter | — | Characters synthesized |
| `tts.errors` | Counter | — | Total TTS errors |

LLM Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `llm.latency.ms` | Histogram | ms | Response latency (request to complete) |
| `llm.time_to_first_token.ms` | Histogram | ms | Time to first token (streaming) |
| `llm.tokens.input` | Counter | — | Input/prompt tokens consumed |
| `llm.tokens.output` | Counter | — | Output/completion tokens generated |
| `llm.tool_calls` | Counter | — | Tool/function calls executed |
| `llm.tool_latency.ms` | Histogram | ms | Tool execution latency |
| `llm.errors` | Counter | — | Total LLM errors |

Turn Detection Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `turn.duration.ms` | Histogram | ms | Duration of detected speech turns |
| `turn.trailing_silence.ms` | Histogram | ms | Silence duration before turn end |

Realtime LLM Metrics

For speech-to-speech models like OpenAI Realtime:
| Metric | Type | Unit | Description |
|---|---|---|---|
| `realtime.sessions` | Counter | — | Sessions started |
| `realtime.session_duration.ms` | Histogram | ms | Session duration |
| `realtime.audio.input.bytes` | Counter | bytes | Audio bytes sent to the LLM |
| `realtime.audio.output.bytes` | Counter | bytes | Audio bytes received from the LLM |
| `realtime.audio.input.duration.ms` | Counter | ms | Audio duration sent |
| `realtime.audio.output.duration.ms` | Counter | ms | Audio duration received |
| `realtime.responses` | Counter | — | Complete responses received |
| `realtime.transcriptions.user` | Counter | — | User speech transcriptions |
| `realtime.transcriptions.agent` | Counter | — | Agent speech transcriptions |
| `realtime.errors` | Counter | — | Realtime errors |
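
The byte and duration counters combine naturally into derived rates, such as the effective audio bitrate of a session. This is plain arithmetic over the two counter values; the numbers below are illustrative:

```python
def audio_bitrate_kbps(total_bytes: int, total_duration_ms: float) -> float:
    """Effective bitrate in kbit/s from the bytes and duration counters."""
    if total_duration_ms <= 0:
        return 0.0
    # bits per millisecond is numerically equal to kbit/s
    return (total_bytes * 8) / total_duration_ms


# Example: 16-bit mono PCM at 24 kHz is 48 000 bytes per second:
# audio_bitrate_kbps(48_000, 1_000) → 384.0 kbit/s
```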

VLM / Vision Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `vlm.inference.latency.ms` | Histogram | ms | VLM inference latency |
| `vlm.inferences` | Counter | — | Inference requests |
| `vlm.tokens.input` | Counter | — | Input tokens (text + image) |
| `vlm.tokens.output` | Counter | — | Output tokens |
| `vlm.errors` | Counter | — | VLM errors |

Video Processor Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `video.frames.processed` | Counter | — | Frames processed |
| `video.processing.latency.ms` | Histogram | ms | Frame processing latency |
| `video.detections` | Counter | — | Objects/items detected |

AgentMetrics

For in-process metrics without external infrastructure, access aggregated metrics directly from the agent:
# After running your agent
metrics = agent.metrics

# STT
print(f"Average STT latency: {metrics.stt_latency_ms__avg.value()} ms")
print(f"Total audio processed: {metrics.stt_audio_duration_ms__total.value()} ms")

# TTS
print(f"Average TTS latency: {metrics.tts_latency_ms__avg.value()} ms")
print(f"Characters synthesized: {metrics.tts_characters__total.value()}")

# LLM
print(f"Average LLM latency: {metrics.llm_latency_ms__avg.value()} ms")
print(f"Input tokens: {metrics.llm_input_tokens__total.value()}")
print(f"Output tokens: {metrics.llm_output_tokens__total.value()}")
print(f"Tool calls: {metrics.llm_tool_calls__total.value()}")

Available AgentMetrics

| Metric | Type | Description |
|---|---|---|
| `stt_latency_ms__avg` | Average | Average STT processing latency |
| `stt_audio_duration_ms__total` | Counter | Total audio duration processed |
| `tts_latency_ms__avg` | Average | Average TTS synthesis latency |
| `tts_audio_duration_ms__total` | Counter | Total synthesized audio duration |
| `tts_characters__total` | Counter | Total characters synthesized |
| `llm_latency_ms__avg` | Average | Average LLM response latency |
| `llm_time_to_first_token_ms__avg` | Average | Average time to first token |
| `llm_input_tokens__total` | Counter | Total input tokens |
| `llm_output_tokens__total` | Counter | Total output tokens |
| `llm_tool_calls__total` | Counter | Total tool calls |
| `llm_tool_latency_ms__avg` | Average | Average tool execution latency |
| `turn_duration_ms__avg` | Average | Average turn duration |
| `turn_trailing_silence_ms__avg` | Average | Average trailing silence |
| `realtime_audio_input_bytes__total` | Counter | Total audio bytes sent |
| `realtime_audio_output_bytes__total` | Counter | Total audio bytes received |
| `realtime_audio_input_duration_ms__total` | Counter | Total input audio duration |
| `realtime_audio_output_duration_ms__total` | Counter | Total output audio duration |
| `realtime_user_transcriptions__total` | Counter | Total user transcriptions |
| `realtime_agent_transcriptions__total` | Counter | Total agent transcriptions |
| `vlm_inference_latency_ms__avg` | Average | Average VLM inference latency |
| `vlm_inferences__total` | Counter | Total VLM inferences |
| `vlm_input_tokens__total` | Counter | Total VLM input tokens |
| `vlm_output_tokens__total` | Counter | Total VLM output tokens |
| `video_frames_processed__total` | Counter | Total frames processed |
| `video_processing_latency_ms__avg` | Average | Average frame processing latency |
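
The `__avg` / `__total` suffixes map onto two simple aggregation semantics: a running sum and a running mean. The classes below are a hypothetical re-implementation to illustrate those semantics, not the library's actual types:

```python
class TotalMetric:
    """Counter-style aggregate: value() returns the running sum."""

    def __init__(self) -> None:
        self._total = 0.0

    def record(self, v: float) -> None:
        self._total += v

    def value(self) -> float:
        return self._total


class AverageMetric:
    """Average-style aggregate: value() returns the mean of recorded samples."""

    def __init__(self) -> None:
        self._sum = 0.0
        self._count = 0

    def record(self, v: float) -> None:
        self._sum += v
        self._count += 1

    def value(self) -> float:
        return self._sum / self._count if self._count else 0.0


latency = AverageMetric()
for sample in (120.0, 80.0, 100.0):
    latency.record(sample)
# latency.value() → 100.0
```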

Prometheus Setup

Export metrics to Prometheus for monitoring dashboards and alerting.

Step 1 — Install the exporter
uv add opentelemetry-exporter-prometheus prometheus-client
Step 2 — Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Start HTTP server for Prometheus scraping
start_http_server(port=9464)

# Configure OpenTelemetry
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
Step 3 — Create and run your agent
from vision_agents.core import Agent, AgentLauncher, Runner

agent = Agent(...)
# MetricsCollector is automatically attached

# Run with CLI
Runner(AgentLauncher(create_agent=..., join_call=...)).cli()
View metrics at http://localhost:9464/metrics.
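
On the Prometheus side, point a scrape job at the exporter's port. A minimal `prometheus.yml` fragment; the job name and interval are illustrative:

```yaml
scrape_configs:
  - job_name: "vision-agents"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9464"]
```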

Tracing with Jaeger

Trace requests across components for debugging latency issues.

Step 1 — Install the exporter
uv add opentelemetry-sdk opentelemetry-exporter-otlp
Step 2 — Configure tracing
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "my-agent"})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
Step 3 — Run Jaeger
docker run --rm -it \
         -e COLLECTOR_OTLP_ENABLED=true \
         -p 16686:16686 -p 4317:4317 -p 4318:4318 \
         jaegertracing/all-in-one:1.51
View traces at http://localhost:16686.

Complete Example

"""Prometheus metrics example with Vision Agents."""

# Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# Now import agents
from vision_agents.core import Agent, User, AgentLauncher, Runner
from vision_agents.plugins import deepgram, getstream, gemini, elevenlabs


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Metrics Agent", id="agent"),
        instructions="You're a helpful voice assistant.",
        llm=gemini.LLM("gemini-2.5-flash-lite"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    # MetricsCollector is automatically attached to the agent
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Hello! Metrics are being collected.")
        await agent.finish()

    # Print summary after call
    m = agent.metrics
    print(f"LLM latency: {m.llm_latency_ms__avg.value():.1f} ms")
    print(f"Tokens: {m.llm_input_tokens__total.value()} in / {m.llm_output_tokens__total.value()} out")


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run with:
uv run agent.py --call-type default --call-id test
Metrics available at http://localhost:9464/metrics.

Best Practices

  • Configure OpenTelemetry — Set up providers to enable metric collection; without them, metrics are no-ops.
  • MetricsCollector is automatic — Each Agent creates a MetricsCollector internally, with no performance impact when no provider is configured.
  • Use AgentMetrics for simple logging — Access agent.metrics directly for in-process metrics without external infrastructure.
  • Add resource attributes — Include service name and environment in your metrics:
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "my-agent",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})
provider = MeterProvider(resource=resource, metric_readers=[reader])
