Documentation Index
Fetch the complete documentation index at: https://visionagents.ai/llms.txt
Use this file to discover all available pages before exploring further.
Complete reference of all events available in Vision Agents. Events are emitted by components during agent execution and can be subscribed to using the @agent.events.subscribe decorator.
Base Event Structure
All events inherit from BaseEvent and share these common fields:
| Field | Type | Description | |
|---|
type | str | Event type identifier (e.g., "plugin.stt_transcript") | |
event_id | str | Unique UUID for this event instance | |
timestamp | datetime | When the event was created (UTC) | |
session_id | `str | None` | Current session identifier |
participant | `Participant | None` | Participant metadata from the call |
Plugin events extend PluginBaseEvent which adds:
| Field | Type | Description | |
|---|
plugin_name | `str | None` | Name of the plugin that emitted the event |
plugin_version | `str | None` | Version of the plugin |
Call Session Events
Events for participant activity on calls. These come from the Stream Video SDK.
Import: from vision_agents.core.events import ...
CallSessionParticipantJoinedEvent
Emitted when a participant joins the call.
from vision_agents.core.events import CallSessionParticipantJoinedEvent
@agent.events.subscribe
async def on_join(event: CallSessionParticipantJoinedEvent):
user = event.participant.user
print(f"{user.name} joined (id: {user.id})")
| Field | Type | Description |
|---|
call_cid | str | Call channel ID |
session_id | str | Session identifier |
participant | CallParticipantResponse | Joined participant info |
CallSessionParticipantLeftEvent
Emitted when a participant leaves the call.
from vision_agents.core.events import CallSessionParticipantLeftEvent
@agent.events.subscribe
async def on_leave(event: CallSessionParticipantLeftEvent):
print(f"{event.participant.user.name} left")
print(f"Duration: {event.duration_seconds}s")
| Field | Type | Description | |
|---|
call_cid | str | Call channel ID | |
session_id | str | Session identifier | |
participant | CallParticipantResponse | Left participant info | |
duration_seconds | int | How long participant was in call | |
reason | `str | None` | Why they left |
Other Call Events
| Event | Description |
|---|
CallCreatedEvent | Call was created |
CallEndedEvent | Call ended |
CallSessionStartedEvent | Session started |
CallSessionEndedEvent | Session ended |
CallSessionParticipantCountsUpdatedEvent | Participant count changed |
CallUpdatedEvent | Call settings updated |
CallMemberAddedEvent | Member added to call |
CallMemberRemovedEvent | Member removed from call |
CallRecordingStartedEvent | Recording started |
CallRecordingStoppedEvent | Recording stopped |
CallTranscriptionStartedEvent | Transcription started |
CallTranscriptionStoppedEvent | Transcription stopped |
ClosedCaptionEvent | Closed caption received |
Speech-to-Text (STT) Events
Events from speech recognition.
Import: from vision_agents.core.stt.events import ...
STTTranscriptEvent
Emitted when a complete transcript is available.
from vision_agents.core.stt.events import STTTranscriptEvent
@agent.events.subscribe
async def on_transcript(event: STTTranscriptEvent):
print(f"Text: {event.text}")
print(f"Confidence: {event.confidence}")
print(f"Language: {event.language}")
| Field | Type | Description | |
|---|
text | str | Transcribed text (required, non-empty) | |
confidence | `float | None` | Recognition confidence (0.0-1.0) |
language | `str | None` | Detected language code |
processing_time_ms | `float | None` | Time to process audio |
audio_duration_ms | `float | None` | Duration of audio processed |
model_name | `str | None` | Model used for recognition |
STTPartialTranscriptEvent
Emitted during speech for interim results.
| Field | Type | Description | |
|---|
text | str | Partial transcribed text | |
confidence | `float | None` | Recognition confidence |
language | `str | None` | Detected language |
STTErrorEvent
Emitted when STT encounters an error.
from vision_agents.core.stt.events import STTErrorEvent
@agent.events.subscribe
async def on_stt_error(event: STTErrorEvent):
print(f"Error: {event.error_message}")
print(f"Recoverable: {event.is_recoverable}")
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception that occurred |
error_code | `str | None` | Error code identifier |
context | `str | None` | Additional context |
retry_count | int | Number of retry attempts | |
is_recoverable | bool | Whether the error is recoverable | |
error_message | str | Property: human-readable error message | |
STTConnectionEvent
Emitted when STT connection state changes.
| Field | Type | Description | |
|---|
connection_state | ConnectionState | New state (CONNECTED, DISCONNECTED, RECONNECTING, ERROR) | |
provider | `str | None` | STT provider name |
details | `dict | None` | Additional connection details |
reconnect_attempts | int | Number of reconnection attempts | |
Text-to-Speech (TTS) Events
Events from speech synthesis.
Import: from vision_agents.core.tts.events import ...
TTSAudioEvent
Emitted when TTS audio data is available.
from vision_agents.core.tts.events import TTSAudioEvent
@agent.events.subscribe
async def on_audio(event: TTSAudioEvent):
print(f"Chunk {event.chunk_index}, final: {event.is_final_chunk}")
| Field | Type | Description | |
|---|
data | `PcmData | None` | Audio data |
chunk_index | int | Index of this chunk | |
is_final_chunk | bool | Whether this is the last chunk | |
text_source | `str | None` | Original text being synthesized |
synthesis_id | `str | None` | Unique ID for this synthesis |
epoch | int | Interruption epoch counter. Increments each time tts.interrupt() is called. Compare against tts.epoch to detect stale audio events emitted before an interruption. | |
TTSSynthesisStartEvent
Emitted when TTS synthesis begins.
| Field | Type | Description | |
|---|
text | `str | None` | Text being synthesized |
synthesis_id | str | Unique ID for this synthesis | |
model_name | `str | None` | TTS model name |
voice_id | `str | None` | Voice identifier |
estimated_duration_ms | `float | None` | Estimated audio duration |
TTSSynthesisCompleteEvent
Emitted when TTS synthesis finishes.
from vision_agents.core.tts.events import TTSSynthesisCompleteEvent
@agent.events.subscribe
async def on_complete(event: TTSSynthesisCompleteEvent):
print(f"Synthesis took {event.synthesis_time_ms}ms")
print(f"Audio duration: {event.audio_duration_ms}ms")
| Field | Type | Description | |
|---|
synthesis_id | `str | None` | Unique ID for this synthesis |
text | `str | None` | Text that was synthesized |
total_audio_bytes | int | Total bytes of audio | |
synthesis_time_ms | float | Processing time | |
audio_duration_ms | `float | None` | Resulting audio duration |
chunk_count | int | Number of chunks produced | |
real_time_factor | `float | None` | Synthesis speed vs real-time |
TTSErrorEvent
Emitted when TTS encounters an error.
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception that occurred |
error_code | `str | None` | Error code identifier |
context | `str | None` | Additional context |
text_source | `str | None` | Text being synthesized |
synthesis_id | `str | None` | Synthesis identifier |
is_recoverable | bool | Whether the error is recoverable | |
TTSConnectionEvent
Emitted when TTS connection state changes.
| Field | Type | Description | |
|---|
connection_state | ConnectionState | New connection state | |
provider | `str | None` | TTS provider name |
details | `dict | None` | Additional details |
LLM Events
Events from language model interactions.
Import: from vision_agents.core.llm.events import ...
LLMResponseCompletedEvent
Emitted when the LLM finishes a response.
from vision_agents.core.llm.events import LLMResponseCompletedEvent
@agent.events.subscribe
async def on_response(event: LLMResponseCompletedEvent):
print(f"Response: {event.text}")
print(f"Model: {event.model}")
print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")
print(f"Latency: {event.latency_ms}ms")
| Field | Type | Description | |
|---|
text | str | Complete response text | |
original | Any | Raw response from provider | |
item_id | `str | None` | Response item identifier |
latency_ms | `float | None` | Total request to response time |
time_to_first_token_ms | `float | None` | Time to first token (streaming) |
input_tokens | `int | None` | Input/prompt tokens used |
output_tokens | `int | None` | Output tokens generated |
total_tokens | `int | None` | Total tokens used |
model | `str | None` | Model identifier |
LLMResponseChunkEvent
Emitted for each chunk during streaming responses.
| Field | Type | Description | |
|---|
delta | `str | None` | Text delta for this chunk |
content_index | `int | None` | Index of content part |
item_id | `str | None` | Response item identifier |
output_index | `int | None` | Output index |
sequence_number | `int | None` | Sequence number |
is_first_chunk | bool | Whether this is the first chunk | |
time_to_first_token_ms | `float | None` | Time to this first chunk |
LLMRequestStartedEvent
Emitted when an LLM request begins.
| Field | Type | Description | |
|---|
request_id | str | Unique request identifier | |
model | `str | None` | Model being used |
streaming | bool | Whether streaming is enabled | |
LLMErrorEvent
Emitted when a non-realtime LLM error occurs.
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception |
error_code | `str | None` | Error code |
context | `str | None` | Additional context |
request_id | `str | None` | Request identifier |
is_recoverable | bool | Whether error is recoverable | |
Realtime LLM Events
Events specific to realtime LLM connections (like OpenAI Realtime API).
Import: from vision_agents.core.llm.events import ...
RealtimeConnectedEvent
Emitted when realtime connection is established.
| Field | Type | Description | |
|---|
provider | `str | None` | Provider name |
session_id | `str | None` | Session identifier |
session_config | `dict | None` | Session configuration |
capabilities | `list[str] | None` | Available capabilities |
RealtimeDisconnectedEvent
Emitted when realtime connection closes.
| Field | Type | Description | |
|---|
provider | `str | None` | Provider name |
session_id | `str | None` | Session identifier |
reason | `str | None` | Disconnection reason |
was_clean | bool | Whether disconnect was clean | |
RealtimeUserSpeechTranscriptionEvent
Emitted when user speech is transcribed by the realtime API.
from vision_agents.core.llm.events import RealtimeUserSpeechTranscriptionEvent
@agent.events.subscribe
async def on_user_speech(event: RealtimeUserSpeechTranscriptionEvent):
print(f"User said: {event.text}")
| Field | Type | Description |
|---|
text | str | Transcribed user speech |
original | Any | Raw event from provider |
RealtimeAgentSpeechTranscriptionEvent
Emitted when agent speech is transcribed by the realtime API.
from vision_agents.core.llm.events import RealtimeAgentSpeechTranscriptionEvent
@agent.events.subscribe
async def on_agent_speech(event: RealtimeAgentSpeechTranscriptionEvent):
print(f"Agent said: {event.text}")
| Field | Type | Description |
|---|
text | str | Transcribed agent speech |
original | Any | Raw event from provider |
Emitted when audio is sent to the realtime session.
| Field | Type | Description | |
|---|
data | `PcmData | None` | Audio data sent |
RealtimeAudioOutputEvent
Emitted when audio is received from the realtime session.
| Field | Type | Description | |
|---|
data | `PcmData | None` | Audio data received |
response_id | `str | None` | Response identifier |
epoch | int | Interruption epoch counter. Increments on interruption so stale audio output events from a previous response can be identified and dropped. | |
RealtimeAudioOutputDoneEvent
Emitted when audio output is complete for a response. In realtime mode, this event signals the end of the model’s audio reply. When the user interrupts mid-response, the interrupted flag is True.
| Field | Type | Description | |
|---|
response_id | `str | None` | Response identifier |
interrupted | bool | True if the response was cut short by a user interruption, False if it completed normally | |
RealtimeResponseEvent
Emitted when the realtime session provides a response.
| Field | Type | Description | |
|---|
text | `str | None` | Response text |
original | `str | None` | Raw response |
response_id | str | Response identifier | |
is_complete | bool | Whether response is complete | |
conversation_item_id | `str | None` | Conversation item ID |
RealtimeConversationItemEvent
Emitted for conversation item updates.
| Field | Type | Description | |
|---|
item_id | `str | None` | Item identifier |
item_type | `str | None` | Type: "message", "function_call", "function_call_output" |
status | `str | None` | Status: "completed", "in_progress", "incomplete" |
role | `str | None` | Role: "user", "assistant", "system" |
content | `list[dict] | None` | Item content |
RealtimeErrorEvent
Emitted when a realtime error occurs.
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception |
error_code | `str | None` | Error code |
context | `str | None` | Additional context |
is_recoverable | bool | Whether error is recoverable | |
Events for function calling / tool use.
Import: from vision_agents.core.llm.events import ...
Emitted when tool execution begins.
from vision_agents.core.llm.events import ToolStartEvent
@agent.events.subscribe
async def on_tool_start(event: ToolStartEvent):
print(f"Calling {event.tool_name}")
print(f"Args: {event.arguments}")
| Field | Type | Description | |
|---|
tool_name | str | Name of the tool being called | |
arguments | `dict | None` | Arguments passed to the tool |
tool_call_id | `str | None` | Unique call identifier |
Emitted when tool execution completes.
from vision_agents.core.llm.events import ToolEndEvent
@agent.events.subscribe
async def on_tool_end(event: ToolEndEvent):
if event.success:
print(f"{event.tool_name} returned: {event.result}")
print(f"Took {event.execution_time_ms}ms")
else:
print(f"{event.tool_name} failed: {event.error}")
| Field | Type | Description | |
|---|
tool_name | str | Name of the tool | |
success | bool | Whether execution succeeded | |
result | Any | Return value (if success) | |
error | `str | None` | Error message (if failed) |
tool_call_id | `str | None` | Unique call identifier |
execution_time_ms | `float | None` | Execution duration |
VLM Events
Events for vision/multimodal language models.
Import: from vision_agents.core.llm.events import ...
VLMInferenceStartEvent
Emitted when a VLM (Vision Language Model) inference starts.
Event Type: plugin.vlm_inference_start
| Field | Type | Description | |
|---|
inference_id | str | Unique identifier for this inference | |
model | `str | None` | Model identifier |
frames_count | int | Number of frames to process | |
VLMInferenceCompletedEvent
Emitted when a VLM inference completes. Contains timing metrics, token usage, and detection counts.
Event Type: plugin.vlm_inference_completed
from vision_agents.core.llm.events import VLMInferenceCompletedEvent
@agent.events.subscribe
async def on_vlm_complete(event: VLMInferenceCompletedEvent):
print(f"VLM response: {event.text}")
print(f"Processed {event.frames_processed} frames")
print(f"Detected {event.detections} objects")
| Field | Type | Description | |
|---|
inference_id | `str | None` | Unique identifier for this inference |
model | `str | None` | Model identifier |
text | str | Generated text response | |
latency_ms | `float | None` | Total time from request to complete response |
input_tokens | `int | None` | Number of input tokens (text + image tokens) |
output_tokens | `int | None` | Number of output tokens generated |
frames_processed | int | Number of video frames processed | |
detections | int | Number of objects/items detected | |
This event is used by MetricsCollector to record VLM metrics. See Telemetry for details.
VLMErrorEvent
Emitted when a VLM error occurs.
Event Type: plugin.vlm_error
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception that occurred |
error_code | `str | None` | Error code if available |
context | `str | None` | Additional context about the error |
inference_id | `str | None` | ID of the failed inference |
is_recoverable | bool | Whether the error is recoverable | |
Video Processor Events
Events from video processing plugins (roboflow, ultralytics, etc.).
Import: from vision_agents.core.events import VideoProcessorDetectionEvent
VideoProcessorDetectionEvent
Emitted when a video processor detects objects in a frame.
from vision_agents.core.events import VideoProcessorDetectionEvent
@agent.events.subscribe
async def on_detection(event: VideoProcessorDetectionEvent):
print(f"Detected {event.detection_count} objects")
print(f"Inference took {event.inference_time_ms}ms")
| Field | Type | Description | |
|---|
model_id | `str | None` | Identifier of the model used |
inference_time_ms | `float | None` | Time taken for inference |
detection_count | int | Number of objects detected | |
This event is used by MetricsCollector to record video processing metrics. See Telemetry for details.
OpenAI Plugin Events
Events specific to the OpenAI plugin.
Import: from vision_agents.plugins.openai.events import ...
OpenAIStreamEvent
Emitted when OpenAI provides a streaming chunk.
| Field | Type | Description |
|---|
chunk | Any | Raw streaming chunk from OpenAI |
VAD Events
Voice Activity Detection events.
Import: from vision_agents.core.vad.events import ...
VADSpeechStartEvent
Emitted when VAD detects the start of speech.
| Field | Type | Description |
|---|
timestamp | datetime | When speech started |
VADSpeechEndEvent
Emitted when VAD detects the end of speech.
| Field | Type | Description | |
|---|
timestamp | datetime | When speech ended | |
duration_ms | `float | None` | Duration of speech segment |
VADErrorEvent
Emitted when VAD encounters an error.
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception that occurred |
error_code | `str | None` | Error code if available |
context | `str | None` | Additional context |
Turn Detection Events
Events for detecting when speakers start and stop talking.
Import: from vision_agents.core.turn_detection.events import ...
TurnStartedEvent
Emitted when a speaker starts their turn.
Event Type: plugin.turn_detection.turn_started
from vision_agents.core.turn_detection.events import TurnStartedEvent
@agent.events.subscribe
async def on_turn_start(event: TurnStartedEvent):
print(f"Turn started (confidence: {event.confidence})")
| Field | Type | Description | |
|---|
participant | `Participant | None` | Who started speaking |
participant_id | `str | None` | ID of the participant speaking |
confidence | `float | None` | Detection confidence (0.0-1.0) |
custom | `dict | None` | Additional metadata |
TurnEndedEvent
Emitted when a speaker completes their turn.
Event Type: plugin.turn_detection.turn_ended
from vision_agents.core.turn_detection.events import TurnEndedEvent
@agent.events.subscribe
async def on_turn_end(event: TurnEndedEvent):
print(f"Turn ended after {event.duration_ms}ms")
print(f"Silence: {event.trailing_silence_ms}ms")
| Field | Type | Description | |
|---|
participant | `Participant | None` | Who stopped speaking |
participant_id | `str | None` | ID of the participant |
confidence | `float | None` | Detection confidence |
duration_ms | `float | None` | Duration of the turn in milliseconds |
trailing_silence_ms | `float | None` | Silence duration before turn end |
custom | `dict | None` | Additional metadata |
eager_end_of_turn | bool | Early end detection flag | |
This event is used by MetricsCollector to record turn detection metrics. See Telemetry for details.
xAI Plugin Events
Events specific to the xAI plugin.
Import: from vision_agents.plugins.xai.events import ...
XAIChunkEvent
Emitted for xAI streaming response chunks.
| Field | Type | Description |
|---|
chunk | Any | Raw streaming chunk from xAI |
Qwen Plugin Events
Events specific to the Qwen plugin.
Import: from vision_agents.plugins.qwen.events import ...
QwenLLMErrorEvent
Emitted when Qwen LLM encounters an error.
| Field | Type | Description | |
|---|
error | `Exception | None` | The exception that occurred |
error_code | `str | None` | Error code if available |
context | `str | None` | Additional context |
ConnectionState Enum
Used in connection events to indicate state.
Import: from vision_agents.core.events import ConnectionState
| Value | Description |
|---|
DISCONNECTED | Not connected |
CONNECTING | Connection in progress |
CONNECTED | Successfully connected |
RECONNECTING | Attempting to reconnect |
ERROR | Connection error |
Subscribing to Events
All events can be subscribed to using the @agent.events.subscribe decorator:
@agent.events.subscribe
async def my_handler(event: EventType):
# Handle event
pass
Subscribe to multiple event types using union types:
@agent.events.subscribe
async def my_handler(event: STTTranscriptEvent | STTPartialTranscriptEvent):
print(f"Transcript: {event.text}")
Event handlers must be async functions. Non-async handlers will raise a RuntimeError.