Fast-Whisper is a high-performance local speech-to-text (STT) plugin built on CTranslate2. It provides 2-4x faster inference than standard Whisper, with support for both CPU and GPU acceleration.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[fast-whisper]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import fast_whisper, gemini, elevenlabs, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fast_whisper.STT(model_size="base"),
    tts=elevenlabs.TTS(),
)
Fast-Whisper runs entirely locally, so no API key is required. Models download automatically on first use.

Parameters

Name           Type   Default   Description
model_size     str    "base"    Model size ("tiny", "base", "small", "medium", "large-v3")
language       str    None      Language code, or None to auto-detect
device         str    "cpu"     Device ("cpu", "cuda", "auto")
compute_type   str    "int8"    Precision ("int8", "float16", "float32")
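As an illustration, the parameters above might be combined when the spoken language is known in advance. The values below are hypothetical choices, not defaults; only the parameter names come from the table:

```python
# Illustrative keyword arguments mirroring the parameter table.
stt_kwargs = {
    "model_size": "small",      # balanced speed/accuracy
    "language": "es",           # skip auto-detect when the language is known
    "device": "cuda",
    "compute_type": "float16",  # recommended precision on GPU
}

# With the plugin installed, these would be passed straight through:
# stt = fast_whisper.STT(**stt_kwargs)
```

Setting `language` explicitly avoids the auto-detect pass, which can shave latency off the first transcription.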

Model Sizes

Model      Speed       Use Case
tiny       Fastest     Real-time, resource-constrained
base       Very fast   General purpose
small      Fast        Balanced
medium     Moderate    Higher accuracy
large-v3   Slower      Maximum accuracy

Optimization

# CPU (default) - use int8 for best performance
stt = fast_whisper.STT(device="cpu", compute_type="int8")

# GPU - use float16 for speed and accuracy
stt = fast_whisper.STT(device="cuda", compute_type="float16")
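The two configurations above can be folded into a small helper that pairs each device with the recommended precision. This is a sketch, not part of the plugin API; only `device` and `compute_type` come from the parameter table:

```python
def stt_settings(device: str = "cpu") -> dict:
    """Return kwargs matching the optimization guidance above.

    int8 quantization is fastest on CPU; float16 keeps both speed
    and accuracy on GPU.
    """
    if device == "cuda":
        return {"device": "cuda", "compute_type": "float16"}
    if device == "cpu":
        return {"device": "cpu", "compute_type": "int8"}
    # "auto" lets Fast-Whisper choose; leave compute_type to its default.
    return {"device": device}

# With the plugin installed:
# stt = fast_whisper.STT(model_size="base", **stt_settings("cuda"))
```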

Next Steps