Deepgram is a powerful Speech-to-Text (STT) platform that provides fast, accurate, and customizable transcription services. It’s designed for real-time and batch audio processing, with support for features like word-level timestamps, speaker diarization, and multilingual transcription. The Deepgram plugin in the Vision Agents SDK enables real-time transcription of voice input, making it ideal for voice agents, call analysis, meeting transcriptions, and more.

Installation

Install the Deepgram plugin with:
uv add vision-agents[deepgram]
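If you manage dependencies with pip rather than uv, the standard extras syntax should work equivalently:
pip install "vision-agents[deepgram]"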

Example

Check out our Deepgram example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.

Initialisation

The Deepgram plugin is exposed via the STT class:
from vision_agents.plugins import deepgram

# Initialize with default settings
stt = deepgram.STT()

# Or with custom options
stt = deepgram.STT(
    api_key="your-api-key",
    sample_rate=48000,
    language="en-US",
    interim_results=True
)
To initialise without passing in the API key, make sure DEEPGRAM_API_KEY is set as an environment variable. You can do this either by defining it in a .env file or exporting it directly in your terminal, as shown below.
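For example (the key value is a placeholder):
# In a .env file at your project root
DEEPGRAM_API_KEY=your-api-key

# Or exported directly in your shell
export DEEPGRAM_API_KEY=your-api-key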

Parameters

These are the parameters available on the deepgram.STT plugin for you to customise:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str or None | None | Your Deepgram API key. If not provided, the plugin will use the DEEPGRAM_API_KEY environment variable. |
| options | LiveOptions or None | None | Optional Deepgram configuration options, such as tier, model, or features like punctuation or diarization. |
| sample_rate | int | 48000 | The sample rate (in Hz) of the audio stream being transcribed. |
| language | str | "en-US" | Language code for transcription. |
| keep_alive_interval | float | 3.0 | Interval (in seconds) for sending keep-alive messages to maintain the WebSocket connection. |
| interim_results | bool | True | Whether to receive partial transcript events while speaking. |
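The options parameter gives you finer control over transcription. A minimal sketch, assuming LiveOptions is imported from the Deepgram Python SDK and that the fields shown (model, punctuate, diarize) are supported by your model tier:
from deepgram import LiveOptions
from vision_agents.plugins import deepgram

# Sketch: enable punctuation and speaker diarization via LiveOptions
stt = deepgram.STT(
    options=LiveOptions(
        model="nova-2",   # assumption: any Deepgram streaming model name works here
        punctuate=True,
        diarize=True,
    )
)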

Functionality

Process Audio

The Deepgram STT service starts automatically when you initialise it and connects to Deepgram’s streaming API. When used with an Agent, audio is automatically processed through the STT service. When used standalone, you can process audio directly:
from getstream.video.rtc.track_util import PcmData

# Process a PcmData chunk through Deepgram STT; `participant` identifies
# the speaker so transcripts can be attributed to them
await stt.process_audio(pcm_data, participant)
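For a fuller standalone flow, here is a minimal sketch; transcribe_frames, frames, and participant are illustrative names, and frames stands in for whatever produces PcmData chunks in your capture pipeline:
# Sketch: feed PcmData chunks from your own capture pipeline into the service
async def transcribe_frames(stt, frames, participant):
    for pcm_data in frames:
        await stt.process_audio(pcm_data, participant)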

Events

Transcript Event

The transcript event is triggered when a final transcript is available from Deepgram:
from vision_agents.core.stt.events import STTTranscriptEvent

@stt.events.subscribe
async def on_transcript(event: STTTranscriptEvent):
    print(f"Final transcript: {event.text}")
    print(f"User: {event.participant.user_id}")
    print(f"Confidence: {event.response.confidence}")

Partial Transcript Event

The partial transcript event is fired in real time as Deepgram generates intermediate (partial) transcriptions:
from vision_agents.core.stt.events import STTPartialTranscriptEvent

@stt.events.subscribe
async def on_partial_transcript(event: STTPartialTranscriptEvent):
    print(f"Partial transcript: {event.text}")
    print(f"User: {event.participant.user_id}")

Error Event

If an error occurs, an error event is fired:
from vision_agents.core.stt.events import STTErrorEvent

@stt.events.subscribe
async def on_stt_error(event: STTErrorEvent):
    print(f"STT error: {event.error}")
    print(f"Context: {event.context}")

Close

The Deepgram STT service automatically manages its WebSocket connection lifecycle. When used with an Agent, cleanup is handled automatically. For standalone usage, you can close the connection:
await stt.close()
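In a standalone script, it’s worth pairing this with try/finally so the connection is released even if processing raises. A short sketch:
stt = deepgram.STT()
try:
    await stt.process_audio(pcm_data, participant)
finally:
    # Always release the WebSocket connection
    await stt.close()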