Events Reference - Vision Agents

Complete reference of all events available in Vision Agents. Events are emitted by components during agent execution and can be subscribed to using the @agent.events.subscribe decorator.

For usage patterns and examples, see the Event System Guide.

Base Event Structure

All events inherit from BaseEvent and share these common fields:

Field	Type	Description
`type`	`str`	Event type identifier (e.g., `"plugin.stt_transcript"`)
`event_id`	`str`	Unique UUID for this event instance
`timestamp`	`datetime`	When the event was created (UTC)
`session_id`	`str | None`	Current session identifier
`participant`	`Participant | None`	Participant metadata from the call

Plugin events extend PluginBaseEvent which adds:

Field	Type	Description
`plugin_name`	`str | None`	Name of the plugin that emitted the event
`plugin_version`	`str | None`	Version of the plugin
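Because every event carries these common fields, one helper can produce a uniform log line for any event type. The sketch below is illustrative only: `FakeEvent` is a hypothetical stand-in that mirrors a few of the documented field names (it is not the library's `BaseEvent`), and `describe_event` is an assumed helper name.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FakeEvent:
    """Hypothetical stand-in mirroring documented base fields (not the real BaseEvent)."""

    type: str
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    session_id: str | None = None
    plugin_name: str | None = None


def describe_event(event) -> str:
    """Build a one-line summary from the common fields, tolerating missing ones."""
    session = getattr(event, "session_id", None) or "no-session"
    plugin = getattr(event, "plugin_name", None) or "core"
    return f"[{event.timestamp.isoformat()}] {event.type} session={session} plugin={plugin}"
```

Since the helper only reads attributes, it should work with any object exposing the same field names.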

Call Session Events

Events for participant activity on calls. These come from the Stream Video SDK. Import: from vision_agents.core.events import ...

CallSessionParticipantJoinedEvent

Emitted when a participant joins the call.

from vision_agents.core.events import CallSessionParticipantJoinedEvent

@agent.events.subscribe
async def on_join(event: CallSessionParticipantJoinedEvent):
    user = event.participant.user
    print(f"{user.name} joined (id: {user.id})")

Field	Type	Description
`call_cid`	`str`	Call channel ID
`session_id`	`str`	Session identifier
`participant`	`CallParticipantResponse`	Joined participant info

CallSessionParticipantLeftEvent

Emitted when a participant leaves the call.

from vision_agents.core.events import CallSessionParticipantLeftEvent

@agent.events.subscribe
async def on_leave(event: CallSessionParticipantLeftEvent):
    print(f"{event.participant.user.name} left")
    print(f"Duration: {event.duration_seconds}s")

Field	Type	Description
`call_cid`	`str`	Call channel ID
`session_id`	`str`	Session identifier
`participant`	`CallParticipantResponse`	Left participant info
`duration_seconds`	`int`	How long participant was in call
`reason`	`str | None`	Why they left

Other Call Events

Event	Description
`CallCreatedEvent`	Call was created
`CallEndedEvent`	Call ended
`CallSessionStartedEvent`	Session started
`CallSessionEndedEvent`	Session ended
`CallSessionParticipantCountsUpdatedEvent`	Participant count changed
`CallUpdatedEvent`	Call settings updated
`CallMemberAddedEvent`	Member added to call
`CallMemberRemovedEvent`	Member removed from call
`CallRecordingStartedEvent`	Recording started
`CallRecordingStoppedEvent`	Recording stopped
`CallTranscriptionStartedEvent`	Transcription started
`CallTranscriptionStoppedEvent`	Transcription stopped
`ClosedCaptionEvent`	Closed caption received

Speech-to-Text (STT) Events

Events from speech recognition. Import: from vision_agents.core.stt.events import ...

STTTranscriptEvent

Emitted when a complete transcript is available.

from vision_agents.core.stt.events import STTTranscriptEvent

@agent.events.subscribe
async def on_transcript(event: STTTranscriptEvent):
    print(f"Text: {event.text}")
    print(f"Confidence: {event.confidence}")
    print(f"Language: {event.language}")

Field	Type	Description
`text`	`str`	Transcribed text (required, non-empty)
`confidence`	`float | None`	Recognition confidence (0.0-1.0)
`language`	`str | None`	Detected language code
`processing_time_ms`	`float | None`	Time to process audio
`audio_duration_ms`	`float | None`	Duration of audio processed
`model_name`	`str | None`	Model used for recognition

STTPartialTranscriptEvent

Emitted during speech for interim results.

Field	Type	Description
`text`	`str`	Partial transcribed text
`confidence`	`float | None`	Recognition confidence
`language`	`str | None`	Detected language
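A common way to combine interim and complete transcripts is to keep only the latest partial text and replace it once the final transcript arrives. The sketch below shows that bookkeeping in plain Python. `TranscriptBuffer` and its method names are hypothetical, and it assumes each partial result supersedes the previous one, which may vary by STT provider.

```python
class TranscriptBuffer:
    """Tracks the latest interim text and the list of finalized utterances."""

    def __init__(self) -> None:
        self.partial = ""
        self.finals = []

    def on_partial(self, text: str) -> None:
        # Assumption: each interim result replaces the previous one.
        self.partial = text

    def on_final(self, text: str) -> None:
        # A complete transcript closes the utterance and clears the interim text.
        self.finals.append(text)
        self.partial = ""

    def display(self) -> str:
        # Finalized text followed by whatever is still in flight.
        pending = [self.partial] if self.partial else []
        return " ".join(self.finals + pending)
```

In a real agent, handlers for `STTPartialTranscriptEvent` and `STTTranscriptEvent` would pass `event.text` to `on_partial` and `on_final` respectively.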

STTErrorEvent

Emitted when STT encounters an error.

from vision_agents.core.stt.events import STTErrorEvent

@agent.events.subscribe
async def on_stt_error(event: STTErrorEvent):
    print(f"Error: {event.error_message}")
    print(f"Recoverable: {event.is_recoverable}")

Field	Type	Description
`error`	`Exception | None`	The exception that occurred
`error_code`	`str | None`	Error code identifier
`context`	`str | None`	Additional context
`retry_count`	`int`	Number of retry attempts
`is_recoverable`	`bool`	Whether the error is recoverable
`error_message`	`str`	Property: human-readable error message
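The `is_recoverable` and `retry_count` fields can drive a simple retry decision. The following is a minimal sketch under stated assumptions: `FakeSTTError` is a hypothetical stand-in carrying two of the documented fields, and `should_retry` with its `max_retries` limit is an illustrative policy, not something the library prescribes.

```python
from dataclasses import dataclass


@dataclass
class FakeSTTError:
    """Hypothetical stand-in with two documented STTErrorEvent fields."""

    retry_count: int = 0
    is_recoverable: bool = True


def should_retry(event, max_retries: int = 3) -> bool:
    """Retry only recoverable errors that have not exhausted the retry budget."""
    return bool(event.is_recoverable) and event.retry_count < max_retries
```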

STTConnectionEvent

Emitted when STT connection state changes.

Field	Type	Description
`connection_state`	`ConnectionState`	New state; a `ConnectionState` value (`DISCONNECTED`, `CONNECTING`, `CONNECTED`, `RECONNECTING`, `ERROR`)
`provider`	`str | None`	STT provider name
`details`	`dict | None`	Additional connection details
`reconnect_attempts`	`int`	Number of reconnection attempts

Text-to-Speech (TTS) Events

Events from speech synthesis. Import: from vision_agents.core.tts.events import ...

TTSAudioEvent

Emitted when TTS audio data is available.

from vision_agents.core.tts.events import TTSAudioEvent

@agent.events.subscribe
async def on_audio(event: TTSAudioEvent):
    print(f"Chunk {event.chunk_index}, final: {event.is_final_chunk}")

Field	Type	Description
`data`	`PcmData | None`	Audio data
`chunk_index`	`int`	Index of this chunk
`is_final_chunk`	`bool`	Whether this is the last chunk
`text_source`	`str | None`	Original text being synthesized
`synthesis_id`	`str | None`	Unique ID for this synthesis
`epoch`	`int`	Interruption epoch counter. Increments each time `tts.interrupt()` is called. Compare against `tts.epoch` to detect stale audio events emitted before an interruption.
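The epoch comparison described for `epoch` can be sketched as follows. `FakeTTS` is a hypothetical stand-in that only models the `epoch` counter and `interrupt()` mentioned above, and `is_stale` is an assumed helper name.

```python
class FakeTTS:
    """Hypothetical stand-in exposing an epoch counter and interrupt()."""

    def __init__(self) -> None:
        self.epoch = 0

    def interrupt(self) -> None:
        # Each interruption starts a new epoch.
        self.epoch += 1


def is_stale(event_epoch: int, current_epoch: int) -> bool:
    """An audio event is stale if it was emitted under an older epoch."""
    return event_epoch < current_epoch
```

A `TTSAudioEvent` handler could call `is_stale(event.epoch, tts.epoch)` and drop the chunk when it returns `True`.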

TTSSynthesisStartEvent

Emitted when TTS synthesis begins.

Field	Type	Description
`text`	`str | None`	Text being synthesized
`synthesis_id`	`str`	Unique ID for this synthesis
`model_name`	`str | None`	TTS model name
`voice_id`	`str | None`	Voice identifier
`estimated_duration_ms`	`float | None`	Estimated audio duration

TTSSynthesisCompleteEvent

Emitted when TTS synthesis finishes.

from vision_agents.core.tts.events import TTSSynthesisCompleteEvent

@agent.events.subscribe
async def on_complete(event: TTSSynthesisCompleteEvent):
    print(f"Synthesis took {event.synthesis_time_ms}ms")
    print(f"Audio duration: {event.audio_duration_ms}ms")

Field	Type	Description
`synthesis_id`	`str | None`	Unique ID for this synthesis
`text`	`str | None`	Text that was synthesized
`total_audio_bytes`	`int`	Total bytes of audio
`synthesis_time_ms`	`float`	Processing time
`audio_duration_ms`	`float | None`	Resulting audio duration
`chunk_count`	`int`	Number of chunks produced
`real_time_factor`	`float | None`	Synthesis speed vs real-time

TTSErrorEvent

Emitted when TTS encounters an error.

Field	Type	Description
`error`	`Exception | None`	The exception that occurred
`error_code`	`str | None`	Error code identifier
`context`	`str | None`	Additional context
`text_source`	`str | None`	Text being synthesized
`synthesis_id`	`str | None`	Synthesis identifier
`is_recoverable`	`bool`	Whether the error is recoverable

TTSConnectionEvent

Emitted when TTS connection state changes.

Field	Type	Description
`connection_state`	`ConnectionState`	New connection state
`provider`	`str | None`	TTS provider name
`details`	`dict | None`	Additional details

LLM Events

Events from language model interactions. Import: from vision_agents.core.llm.events import ...

LLMResponseCompletedEvent

Emitted when the LLM finishes a response.

from vision_agents.core.llm.events import LLMResponseCompletedEvent

@agent.events.subscribe
async def on_response(event: LLMResponseCompletedEvent):
    print(f"Response: {event.text}")
    print(f"Model: {event.model}")
    print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")
    print(f"Latency: {event.latency_ms}ms")

Field	Type	Description
`text`	`str`	Complete response text
`original`	`Any`	Raw response from provider
`item_id`	`str | None`	Response item identifier
`latency_ms`	`float | None`	Total request to response time
`time_to_first_token_ms`	`float | None`	Time to first token (streaming)
`input_tokens`	`int | None`	Input/prompt tokens used
`output_tokens`	`int | None`	Output tokens generated
`total_tokens`	`int | None`	Total tokens used
`model`	`str | None`	Model identifier
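Because the token fields are optional, code that aggregates usage across responses needs to treat `None` as zero. The sketch below shows one way to do that. `UsageTotals` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals of token usage across completed responses."""

    input_tokens: int = 0
    output_tokens: int = 0
    responses: int = 0

    def add(self, input_tokens, output_tokens) -> None:
        # Providers may omit counts, so None is treated as zero.
        self.input_tokens += input_tokens or 0
        self.output_tokens += output_tokens or 0
        self.responses += 1

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```

An `LLMResponseCompletedEvent` handler could call `totals.add(event.input_tokens, event.output_tokens)`.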

LLMResponseChunkEvent

Emitted for each chunk during streaming responses.

Field	Type	Description
`delta`	`str | None`	Text delta for this chunk
`content_index`	`int | None`	Index of content part
`item_id`	`str | None`	Response item identifier
`output_index`	`int | None`	Output index
`sequence_number`	`int | None`	Sequence number
`is_first_chunk`	`bool`	Whether this is the first chunk
`time_to_first_token_ms`	`float | None`	Time to the first chunk (time to first token)
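Streaming chunks can be reassembled into full responses by grouping each `delta` under its `item_id`. The sketch below is a minimal illustration. `ChunkAssembler` is a hypothetical helper, and it assumes chunks for a given item arrive in order.

```python
from collections import defaultdict


class ChunkAssembler:
    """Collects text deltas per response item and joins them on demand."""

    def __init__(self) -> None:
        self._parts = defaultdict(list)

    def add(self, item_id, delta) -> None:
        # Skip chunks that carry no text (delta may be None or empty).
        if delta:
            self._parts[item_id].append(delta)

    def text(self, item_id) -> str:
        # Assumption: deltas were added in arrival order.
        return "".join(self._parts.get(item_id, []))
```

An `LLMResponseChunkEvent` handler could call `assembler.add(event.item_id, event.delta)`.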

LLMRequestStartedEvent

Emitted when an LLM request begins.

Field	Type	Description
`request_id`	`str`	Unique request identifier
`model`	`str | None`	Model being used
`streaming`	`bool`	Whether streaming is enabled

LLMErrorEvent

Emitted when a non-realtime LLM error occurs.

Field	Type	Description
`error`	`Exception | None`	The exception
`error_code`	`str | None`	Error code
`context`	`str | None`	Additional context
`request_id`	`str | None`	Request identifier
`is_recoverable`	`bool`	Whether error is recoverable

Realtime LLM Events

Events specific to realtime LLM connections (like OpenAI Realtime API). Import: from vision_agents.core.llm.events import ...

RealtimeConnectedEvent

Emitted when realtime connection is established.

Field	Type	Description
`provider`	`str | None`	Provider name
`session_id`	`str | None`	Session identifier
`session_config`	`dict | None`	Session configuration
`capabilities`	`list[str] | None`	Available capabilities

RealtimeDisconnectedEvent

Emitted when realtime connection closes.

Field	Type	Description
`provider`	`str | None`	Provider name
`session_id`	`str | None`	Session identifier
`reason`	`str | None`	Disconnection reason
`was_clean`	`bool`	Whether disconnect was clean

RealtimeUserSpeechTranscriptionEvent

Emitted when user speech is transcribed by the realtime API.

from vision_agents.core.llm.events import RealtimeUserSpeechTranscriptionEvent

@agent.events.subscribe
async def on_user_speech(event: RealtimeUserSpeechTranscriptionEvent):
    print(f"User said: {event.text}")

Field	Type	Description
`text`	`str`	Transcribed user speech
`original`	`Any`	Raw event from provider

RealtimeAgentSpeechTranscriptionEvent

Emitted when agent speech is transcribed by the realtime API.

from vision_agents.core.llm.events import RealtimeAgentSpeechTranscriptionEvent

@agent.events.subscribe
async def on_agent_speech(event: RealtimeAgentSpeechTranscriptionEvent):
    print(f"Agent said: {event.text}")

Field	Type	Description
`text`	`str`	Transcribed agent speech
`original`	`Any`	Raw event from provider

RealtimeAudioInputEvent

Emitted when audio is sent to the realtime session.

Field	Type	Description
`data`	`PcmData | None`	Audio data sent

RealtimeAudioOutputEvent

Emitted when audio is received from the realtime session.

Field	Type	Description
`data`	`PcmData | None`	Audio data received
`response_id`	`str | None`	Response identifier
`epoch`	`int`	Interruption epoch counter. Increments on interruption so stale audio output events from a previous response can be identified and dropped.

RealtimeResponseEvent

Emitted when the realtime session provides a response.

Field	Type	Description
`text`	`str | None`	Response text
`original`	`str | None`	Raw response
`response_id`	`str`	Response identifier
`is_complete`	`bool`	Whether response is complete
`conversation_item_id`	`str | None`	Conversation item ID

RealtimeConversationItemEvent

Emitted for conversation item updates.

Field	Type	Description
`item_id`	`str | None`	Item identifier
`item_type`	`str | None`	Type: `"message"`, `"function_call"`, `"function_call_output"`
`status`	`str | None`	Status: `"completed"`, `"in_progress"`, `"incomplete"`
`role`	`str | None`	Role: `"user"`, `"assistant"`, `"system"`
`content`	`list[dict] | None`	Item content

RealtimeErrorEvent

Emitted when a realtime error occurs.

Field	Type	Description
`error`	`Exception | None`	The exception
`error_code`	`str | None`	Error code
`context`	`str | None`	Additional context
`is_recoverable`	`bool`	Whether error is recoverable

Tool Events

Events for function calling / tool use. Import: from vision_agents.core.llm.events import ...

ToolStartEvent

Emitted when tool execution begins.

from vision_agents.core.llm.events import ToolStartEvent

@agent.events.subscribe
async def on_tool_start(event: ToolStartEvent):
    print(f"Calling {event.tool_name}")
    print(f"Args: {event.arguments}")

Field	Type	Description
`tool_name`	`str`	Name of the tool being called
`arguments`	`dict | None`	Arguments passed to the tool
`tool_call_id`	`str | None`	Unique call identifier

ToolEndEvent

Emitted when tool execution completes.

from vision_agents.core.llm.events import ToolEndEvent

@agent.events.subscribe
async def on_tool_end(event: ToolEndEvent):
    if event.success:
        print(f"{event.tool_name} returned: {event.result}")
        print(f"Took {event.execution_time_ms}ms")
    else:
        print(f"{event.tool_name} failed: {event.error}")

Field	Type	Description
`tool_name`	`str`	Name of the tool
`success`	`bool`	Whether execution succeeded
`result`	`Any`	Return value (if success)
`error`	`str | None`	Error message (if failed)
`tool_call_id`	`str | None`	Unique call identifier
`execution_time_ms`	`float | None`	Execution duration
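Since both tool events carry `tool_call_id`, start and end events can be paired to track which calls are still running and which failed. The following is a sketch only. `ToolCallTracker` and its method names are hypothetical, and it assumes the same `tool_call_id` appears on the matching start and end events.

```python
class ToolCallTracker:
    """Pairs tool start/end notifications by call ID."""

    def __init__(self) -> None:
        self.in_flight = {}  # tool_call_id -> tool_name
        self.failures = []   # (tool_name, error message) pairs

    def start(self, tool_call_id, tool_name) -> None:
        # Calls without an ID cannot be paired, so they are not tracked.
        if tool_call_id is not None:
            self.in_flight[tool_call_id] = tool_name

    def end(self, tool_call_id, tool_name, success, error=None) -> None:
        # Remove the call whether or not it was tracked at start.
        self.in_flight.pop(tool_call_id, None)
        if not success:
            self.failures.append((tool_name, error or "unknown error"))
```

A `ToolStartEvent` handler would call `start(event.tool_call_id, event.tool_name)`, and a `ToolEndEvent` handler would call `end(event.tool_call_id, event.tool_name, event.success, event.error)`.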

VLM Events

Events for vision/multimodal language models. Import: from vision_agents.core.llm.events import ...

VLMInferenceStartEvent

Emitted when a VLM (Vision Language Model) inference starts. Event Type: plugin.vlm_inference_start

Field	Type	Description
`inference_id`	`str`	Unique identifier for this inference
`model`	`str | None`	Model identifier
`frames_count`	`int`	Number of frames to process

VLMInferenceCompletedEvent

Emitted when a VLM inference completes. Contains timing metrics, token usage, and detection counts. Event Type: plugin.vlm_inference_completed

from vision_agents.core.llm.events import VLMInferenceCompletedEvent

@agent.events.subscribe
async def on_vlm_complete(event: VLMInferenceCompletedEvent):
    print(f"VLM response: {event.text}")
    print(f"Processed {event.frames_processed} frames")
    print(f"Detected {event.detections} objects")

Field	Type	Description
`inference_id`	`str | None`	Unique identifier for this inference
`model`	`str | None`	Model identifier
`text`	`str`	Generated text response
`latency_ms`	`float | None`	Total time from request to complete response
`input_tokens`	`int | None`	Number of input tokens (text + image tokens)
`output_tokens`	`int | None`	Number of output tokens generated
`frames_processed`	`int`	Number of video frames processed
`detections`	`int`	Number of objects/items detected

This event is used by MetricsCollector to record VLM metrics. See Telemetry for details.

VLMErrorEvent

Emitted when a VLM error occurs. Event Type: plugin.vlm_error

Field	Type	Description
`error`	`Exception | None`	The exception that occurred
`error_code`	`str | None`	Error code if available
`context`	`str | None`	Additional context about the error
`inference_id`	`str | None`	ID of the failed inference
`is_recoverable`	`bool`	Whether the error is recoverable

Video Processor Events

Events from video processing plugins (roboflow, ultralytics, etc.). Import: from vision_agents.core.events import VideoProcessorDetectionEvent

VideoProcessorDetectionEvent

Emitted when a video processor detects objects in a frame.

from vision_agents.core.events import VideoProcessorDetectionEvent

@agent.events.subscribe
async def on_detection(event: VideoProcessorDetectionEvent):
    print(f"Detected {event.detection_count} objects")
    print(f"Inference took {event.inference_time_ms}ms")

Field	Type	Description
`model_id`	`str | None`	Identifier of the model used
`inference_time_ms`	`float | None`	Time taken for inference
`detection_count`	`int`	Number of objects detected

This event is used by MetricsCollector to record video processing metrics. See Telemetry for details.

OpenAI Plugin Events

Events specific to the OpenAI plugin. Import: from vision_agents.plugins.openai.events import ...

OpenAIStreamEvent

Emitted when OpenAI provides a streaming chunk.

Field	Type	Description
`chunk`	`Any`	Raw streaming chunk from OpenAI

VAD Events

Voice Activity Detection events. Import: from vision_agents.core.vad.events import ...

VADSpeechStartEvent

Emitted when VAD detects the start of speech.

Field	Type	Description
`timestamp`	`datetime`	When speech started

VADSpeechEndEvent

Emitted when VAD detects the end of speech.

Field	Type	Description
`timestamp`	`datetime`	When speech ended
`duration_ms`	`float | None`	Duration of speech segment

VADErrorEvent

Emitted when VAD encounters an error.

Field	Type	Description
`error`	`Exception | None`	The exception that occurred
`error_code`	`str | None`	Error code if available
`context`	`str | None`	Additional context

Turn Detection Events

Events for detecting when speakers start and stop talking. Import: from vision_agents.core.turn_detection.events import ...

TurnStartedEvent

Emitted when a speaker starts their turn. Event Type: plugin.turn_detection.turn_started

from vision_agents.core.turn_detection.events import TurnStartedEvent

@agent.events.subscribe
async def on_turn_start(event: TurnStartedEvent):
    print(f"Turn started (confidence: {event.confidence})")

Field	Type	Description
`participant`	`Participant | None`	Who started speaking
`participant_id`	`str | None`	ID of the participant speaking
`confidence`	`float | None`	Detection confidence (0.0-1.0)
`custom`	`dict | None`	Additional metadata

TurnEndedEvent

Emitted when a speaker completes their turn. Event Type: plugin.turn_detection.turn_ended

from vision_agents.core.turn_detection.events import TurnEndedEvent

@agent.events.subscribe
async def on_turn_end(event: TurnEndedEvent):
    print(f"Turn ended after {event.duration_ms}ms")
    print(f"Silence: {event.trailing_silence_ms}ms")

Field	Type	Description
`participant`	`Participant | None`	Who stopped speaking
`participant_id`	`str | None`	ID of the participant
`confidence`	`float | None`	Detection confidence
`duration_ms`	`float | None`	Duration of the turn in milliseconds
`trailing_silence_ms`	`float | None`	Silence duration before turn end
`custom`	`dict | None`	Additional metadata
`eager_end_of_turn`	`bool`	Early end detection flag

This event is used by MetricsCollector to record turn detection metrics. See Telemetry for details.

xAI Plugin Events

Events specific to the xAI plugin. Import: from vision_agents.plugins.xai.events import ...

XAIChunkEvent

Emitted for xAI streaming response chunks.

Field	Type	Description
`chunk`	`Any`	Raw streaming chunk from xAI

Qwen Plugin Events

Events specific to the Qwen plugin. Import: from vision_agents.plugins.qwen.events import ...

QwenLLMErrorEvent

Emitted when Qwen LLM encounters an error.

Field	Type	Description
`error`	`Exception | None`	The exception that occurred
`error_code`	`str | None`	Error code if available
`context`	`str | None`	Additional context

ConnectionState Enum

Used in connection events to indicate state. Import: from vision_agents.core.events import ConnectionState

Value	Description
`DISCONNECTED`	Not connected
`CONNECTING`	Connection in progress
`CONNECTED`	Successfully connected
`RECONNECTING`	Attempting to reconnect
`ERROR`	Connection error
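Handlers for connection events often only need to know whether the connection is usable or still in transition. The sketch below groups the states accordingly. `FakeConnectionState` is a hypothetical stand-in that reuses the documented member names (the real enum's underlying values may differ), and the helper names are assumptions.

```python
from enum import Enum, auto


class FakeConnectionState(Enum):
    """Hypothetical stand-in with the documented member names."""

    DISCONNECTED = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    RECONNECTING = auto()
    ERROR = auto()


def is_usable(state) -> bool:
    """Only a fully established connection can carry traffic."""
    return state.name == "CONNECTED"


def is_transitional(state) -> bool:
    """States in which the connection may still come up without intervention."""
    return state.name in {"CONNECTING", "RECONNECTING"}
```

The helpers compare by member name, so they do not depend on the enum's underlying values.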

Subscribing to Events

All events can be subscribed to using the @agent.events.subscribe decorator:

@agent.events.subscribe
async def my_handler(event: EventType):
    # EventType is a placeholder; annotate with the event class you want to receive
    pass

Subscribe to multiple event types using union types:

from vision_agents.core.stt.events import STTPartialTranscriptEvent, STTTranscriptEvent

@agent.events.subscribe
async def my_handler(event: STTTranscriptEvent | STTPartialTranscriptEvent):
    print(f"Transcript: {event.text}")

Event handlers must be async functions; a non-async handler results in a RuntimeError.
