Fast-Whisper is a high-performance local speech-to-text (STT) plugin built on CTranslate2. It provides 2-4x faster inference than standard Whisper, with support for both CPU and GPU acceleration.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[fast-whisper]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import fast_whisper, gemini, elevenlabs, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fast_whisper.STT(model_size="base"),
    tts=elevenlabs.TTS(),
)
Fast-Whisper runs entirely locally, so no API key is required. Models download automatically on first use.

Parameters

Name           Type   Default   Description
model_size     str    "base"    Model size ("tiny", "base", "small", "medium", "large-v3")
language       str    None      Language code, or None to auto-detect
device         str    "cpu"     Device ("cpu", "cuda", "auto")
compute_type   str    "int8"    Precision ("int8", "float16", "float32")
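As an illustration, the parameters above might be combined when the spoken language is known in advance. The values below are hypothetical choices, not defaults; only the parameter names come from the table:

```python
# Illustrative keyword arguments mirroring the parameter table.
stt_kwargs = {
    "model_size": "small",      # balanced speed/accuracy
    "language": "es",           # skip auto-detect when the language is known
    "device": "cuda",
    "compute_type": "float16",  # recommended precision on GPU
}

# With the plugin installed, these would be passed straight through:
# stt = fast_whisper.STT(**stt_kwargs)
```

Setting `language` explicitly avoids the auto-detect pass, which can shave latency off the first transcription.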

Model Sizes

Model      Speed       Use Case
tiny       Fastest     Real-time, resource-constrained
base       Very fast   General purpose
small      Fast        Balanced
medium     Moderate    Higher accuracy
large-v3   Slower      Maximum accuracy

Optimization

# CPU (default) - use int8 for best performance
stt = fast_whisper.STT(device="cpu", compute_type="int8")

# GPU - use float16 for speed and accuracy
stt = fast_whisper.STT(device="cuda", compute_type="float16")
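The two configurations above can be folded into a small helper that pairs each device with the recommended precision. This is a sketch, not part of the plugin API; only `device` and `compute_type` come from the parameter table:

```python
def stt_settings(device: str = "cpu") -> dict:
    """Return kwargs matching the optimization guidance above.

    int8 quantization is fastest on CPU; float16 keeps both speed
    and accuracy on GPU.
    """
    if device == "cuda":
        return {"device": "cuda", "compute_type": "float16"}
    if device == "cpu":
        return {"device": "cpu", "compute_type": "int8"}
    # "auto" lets Fast-Whisper choose; leave compute_type to its default.
    return {"device": device}

# With the plugin installed:
# stt = fast_whisper.STT(model_size="base", **stt_settings("cuda"))
```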

Next Steps