Vision Agents provides built-in observability through OpenTelemetry. Collect metrics and traces across all components to monitor performance, latency, and errors in your agents.

Quick Start

To enable metrics collection, configure OpenTelemetry:
# 1. Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# 2. Now import and create your agent
from vision_agents.core import Agent

agent = Agent(llm=..., stt=..., tts=...)
Metrics are now available at http://localhost:9464/metrics.
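
To sanity-check the endpoint from a script, you can fetch the page and pick out individual metrics. The helper below assumes only the standard Prometheus text exposition format; the fetch step is sketched with `urllib` and commented out since it needs the server above running (lines with optional timestamps are not handled in this sketch).

```python
from urllib.request import urlopen


def parse_metric_samples(text: str, prefix: str = "") -> dict[str, float]:
    """Parse Prometheus text-exposition lines into {metric_line: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name_part, _, value_part = line.rpartition(" ")
        if name_part.startswith(prefix):
            try:
                samples[name_part] = float(value_part)
            except ValueError:
                pass  # ignore malformed lines
    return samples


# With the exporter running, fetch and filter the live endpoint:
# text = urlopen("http://localhost:9464/metrics").read().decode()
# print(parse_metric_samples(text, prefix="llm_"))
```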

MetricsCollector

The MetricsCollector class subscribes to events from all agent components and records OpenTelemetry metrics automatically. Each Agent creates a MetricsCollector internally, so metrics collection is enabled by default. If no OpenTelemetry providers are configured, the metrics are no-ops and have no performance impact. The collector listens to events from:
  • LLM — Response latency, token usage, tool calls
  • STT — Transcription latency, audio duration
  • TTS — Synthesis latency, audio duration, characters
  • Turn Detection — Turn duration, trailing silence
  • Realtime LLM — Session metrics, audio I/O, transcriptions
  • VLM — Inference latency, token usage
  • Video Processors — Frame processing, detections

Metric Attributes

All metrics include contextual attributes:
| Attribute | Description |
|---|---|
| `provider` | The plugin name (e.g., `openai`, `deepgram`) |
| `model` | Model identifier when available |
| `error_type` | Exception class name for error metrics |
| `error_code` | Error code when available |
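
Because every sample carries these attributes, error counts can be broken down per provider when you aggregate samples yourself. A minimal sketch; the `(attributes, count)` sample shape here is an illustrative assumption, not the collector's internal format:

```python
from collections import Counter


def errors_by_provider(samples: list[tuple[dict, int]]) -> Counter:
    """Sum error counts grouped by the `provider` attribute."""
    totals = Counter()
    for attrs, count in samples:
        totals[attrs.get("provider", "unknown")] += count
    return totals


samples = [
    ({"provider": "openai", "error_type": "TimeoutError"}, 2),
    ({"provider": "deepgram", "error_type": "ConnectionError"}, 1),
    ({"provider": "openai", "error_type": "RateLimitError"}, 1),
]
# errors_by_provider(samples) → Counter({"openai": 3, "deepgram": 1})
```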

Metrics Reference

All metrics use the vision_agents.core meter namespace.

STT Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `stt.latency.ms` | Histogram | ms | Processing latency for speech-to-text |
| `stt.audio_duration.ms` | Histogram | ms | Duration of audio processed |
| `stt.errors` | Counter | — | Total STT errors |

TTS Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `tts.latency.ms` | Histogram | ms | Synthesis latency |
| `tts.audio_duration.ms` | Histogram | ms | Duration of synthesized audio |
| `tts.characters` | Counter | — | Characters synthesized |
| `tts.errors` | Counter | — | Total TTS errors |

LLM Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `llm.latency.ms` | Histogram | ms | Response latency (request to complete) |
| `llm.time_to_first_token.ms` | Histogram | ms | Time to first token (streaming) |
| `llm.tokens.input` | Counter | — | Input/prompt tokens consumed |
| `llm.tokens.output` | Counter | — | Output/completion tokens generated |
| `llm.tool_calls` | Counter | — | Tool/function calls executed |
| `llm.tool_latency.ms` | Histogram | ms | Tool execution latency |
| `llm.errors` | Counter | — | Total LLM errors |

Turn Detection Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `turn.duration.ms` | Histogram | ms | Duration of detected speech turns |
| `turn.trailing_silence.ms` | Histogram | ms | Silence duration before turn end |

Realtime LLM Metrics

For speech-to-speech models like OpenAI Realtime:
| Metric | Type | Unit | Description |
|---|---|---|---|
| `realtime.sessions` | Counter | — | Sessions started |
| `realtime.session_duration.ms` | Histogram | ms | Session duration |
| `realtime.audio.input.bytes` | Counter | bytes | Audio bytes sent to the LLM |
| `realtime.audio.output.bytes` | Counter | bytes | Audio bytes received from the LLM |
| `realtime.audio.input.duration.ms` | Counter | ms | Audio duration sent |
| `realtime.audio.output.duration.ms` | Counter | ms | Audio duration received |
| `realtime.responses` | Counter | — | Complete responses received |
| `realtime.transcriptions.user` | Counter | — | User speech transcriptions |
| `realtime.transcriptions.agent` | Counter | — | Agent speech transcriptions |
| `realtime.errors` | Counter | — | Realtime errors |
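
The byte and duration counters combine naturally into derived rates, such as the effective audio bitrate of a session. This is plain arithmetic over the two counter values; the numbers below are illustrative:

```python
def audio_bitrate_kbps(total_bytes: int, total_duration_ms: float) -> float:
    """Effective bitrate in kbit/s from the bytes and duration counters."""
    if total_duration_ms <= 0:
        return 0.0
    # bits per millisecond is numerically equal to kbit/s
    return (total_bytes * 8) / total_duration_ms


# Example: 16-bit mono PCM at 24 kHz is 48 000 bytes per second:
# audio_bitrate_kbps(48_000, 1_000) → 384.0 kbit/s
```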

VLM / Vision Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `vlm.inference.latency.ms` | Histogram | ms | VLM inference latency |
| `vlm.inferences` | Counter | — | Inference requests |
| `vlm.tokens.input` | Counter | — | Input tokens (text + image) |
| `vlm.tokens.output` | Counter | — | Output tokens |
| `vlm.errors` | Counter | — | VLM errors |

Video Processor Metrics

| Metric | Type | Unit | Description |
|---|---|---|---|
| `video.frames.processed` | Counter | — | Frames processed |
| `video.processing.latency.ms` | Histogram | ms | Frame processing latency |
| `video.detections` | Counter | — | Objects/items detected |

AgentMetrics

For in-process metrics without external infrastructure, access aggregated metrics directly from the agent:
# After running your agent
metrics = agent.metrics

# STT
print(f"Average STT latency: {metrics.stt_latency_ms__avg.value()} ms")
print(f"Total audio processed: {metrics.stt_audio_duration_ms__total.value()} ms")

# TTS
print(f"Average TTS latency: {metrics.tts_latency_ms__avg.value()} ms")
print(f"Characters synthesized: {metrics.tts_characters__total.value()}")

# LLM
print(f"Average LLM latency: {metrics.llm_latency_ms__avg.value()} ms")
print(f"Input tokens: {metrics.llm_input_tokens__total.value()}")
print(f"Output tokens: {metrics.llm_output_tokens__total.value()}")
print(f"Tool calls: {metrics.llm_tool_calls__total.value()}")

Available AgentMetrics

| Metric | Type | Description |
|---|---|---|
| `stt_latency_ms__avg` | Average | Average STT processing latency |
| `stt_audio_duration_ms__total` | Counter | Total audio duration processed |
| `tts_latency_ms__avg` | Average | Average TTS synthesis latency |
| `tts_audio_duration_ms__total` | Counter | Total synthesized audio duration |
| `tts_characters__total` | Counter | Total characters synthesized |
| `llm_latency_ms__avg` | Average | Average LLM response latency |
| `llm_time_to_first_token_ms__avg` | Average | Average time to first token |
| `llm_input_tokens__total` | Counter | Total input tokens |
| `llm_output_tokens__total` | Counter | Total output tokens |
| `llm_tool_calls__total` | Counter | Total tool calls |
| `llm_tool_latency_ms__avg` | Average | Average tool execution latency |
| `turn_duration_ms__avg` | Average | Average turn duration |
| `turn_trailing_silence_ms__avg` | Average | Average trailing silence |
| `realtime_audio_input_bytes__total` | Counter | Total audio bytes sent |
| `realtime_audio_output_bytes__total` | Counter | Total audio bytes received |
| `realtime_audio_input_duration_ms__total` | Counter | Total input audio duration |
| `realtime_audio_output_duration_ms__total` | Counter | Total output audio duration |
| `realtime_user_transcriptions__total` | Counter | Total user transcriptions |
| `realtime_agent_transcriptions__total` | Counter | Total agent transcriptions |
| `vlm_inference_latency_ms__avg` | Average | Average VLM inference latency |
| `vlm_inferences__total` | Counter | Total VLM inferences |
| `vlm_input_tokens__total` | Counter | Total VLM input tokens |
| `vlm_output_tokens__total` | Counter | Total VLM output tokens |
| `video_frames_processed__total` | Counter | Total frames processed |
| `video_processing_latency_ms__avg` | Average | Average frame processing latency |
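
The `__avg` / `__total` suffixes map onto two simple aggregation semantics: a running sum and a running mean. The classes below are a hypothetical re-implementation to illustrate those semantics, not the library's actual types:

```python
class TotalMetric:
    """Counter-style aggregate: value() returns the running sum."""

    def __init__(self) -> None:
        self._total = 0.0

    def record(self, v: float) -> None:
        self._total += v

    def value(self) -> float:
        return self._total


class AverageMetric:
    """Average-style aggregate: value() returns the mean of recorded samples."""

    def __init__(self) -> None:
        self._sum = 0.0
        self._count = 0

    def record(self, v: float) -> None:
        self._sum += v
        self._count += 1

    def value(self) -> float:
        return self._sum / self._count if self._count else 0.0


latency = AverageMetric()
for sample in (120.0, 80.0, 100.0):
    latency.record(sample)
# latency.value() → 100.0
```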

Prometheus Setup

Export metrics to Prometheus for monitoring dashboards and alerting.

Step 1 — Install the exporter
uv add opentelemetry-exporter-prometheus prometheus-client
Step 2 — Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Start HTTP server for Prometheus scraping
start_http_server(port=9464)

# Configure OpenTelemetry
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
Step 3 — Create and run your agent
from vision_agents.core import Agent, AgentLauncher, Runner

agent = Agent(...)
# MetricsCollector is automatically attached

# Run with CLI
Runner(AgentLauncher(create_agent=..., join_call=...)).cli()
View metrics at http://localhost:9464/metrics.
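
On the Prometheus side, point a scrape job at the exporter's port. A minimal `prometheus.yml` fragment; the job name and interval are illustrative:

```yaml
scrape_configs:
  - job_name: "vision-agents"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9464"]
```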

Tracing with Jaeger

Trace requests across components for debugging latency issues.

Step 1 — Install the exporter
uv add opentelemetry-sdk opentelemetry-exporter-otlp
Step 2 — Configure tracing
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "my-agent"})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
Step 3 — Run Jaeger
docker run --rm -it \
         -e COLLECTOR_OTLP_ENABLED=true \
         -p 16686:16686 -p 4317:4317 -p 4318:4318 \
         jaegertracing/all-in-one:1.51
View traces at http://localhost:16686.

Complete Example

"""Prometheus metrics example with Vision Agents."""

# Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# Now import agents
from vision_agents.core import Agent, User, AgentLauncher, Runner
from vision_agents.plugins import deepgram, getstream, gemini, elevenlabs


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Metrics Agent", id="agent"),
        instructions="You're a helpful voice assistant.",
        llm=gemini.LLM("gemini-2.5-flash-lite"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    # MetricsCollector is automatically attached to the agent
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Hello! Metrics are being collected.")
        await agent.finish()

    # Print summary after call
    m = agent.metrics
    print(f"LLM latency: {m.llm_latency_ms__avg.value():.1f} ms")
    print(f"Tokens: {m.llm_input_tokens__total.value()} in / {m.llm_output_tokens__total.value()} out")


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run with:
uv run agent.py --call-type default --call-id test
Metrics available at http://localhost:9464/metrics.

Best Practices

  • Configure OpenTelemetry — Set up providers to enable metric collection; without them, metrics are no-ops.
  • MetricsCollector is automatic — Each Agent creates a MetricsCollector internally, with no performance impact when no provider is configured.
  • Use AgentMetrics for simple logging — Access agent.metrics directly for in-process metrics without external infrastructure.
  • Add resource attributes — Include service name and environment in your metrics:
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "my-agent",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})
provider = MeterProvider(resource=resource, metric_readers=[reader])
