TurboPuffer is a high-performance vector database with native hybrid search (vector + BM25). This plugin provides retrieval-augmented generation (RAG) with precise control over chunking, embeddings, and search strategies.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
## Installation

```bash
uv add "vision-agents[turbopuffer]"
```
## Quick Start

```python
from vision_agents.plugins import turbopuffer

# Initialize RAG and index a directory of documents
rag = turbopuffer.TurboPufferRAG(namespace="my-knowledge")
await rag.add_directory("./knowledge")

# Hybrid search (default)
results = await rag.search("How does the chat API work?")
```
Set `TURBO_PUFFER_KEY` and `GOOGLE_API_KEY` (used for Gemini embeddings) in your environment.
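For example, in a shell session (placeholder values; variable names as given above):

```shell
# Replace the placeholders with your actual keys.
export TURBO_PUFFER_KEY="your-turbopuffer-key"   # TurboPuffer API key
export GOOGLE_API_KEY="your-google-api-key"      # used for Gemini embeddings
```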
## Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `namespace` | `str` | Required | TurboPuffer namespace |
| `embedding_model` | `str` | `"models/gemini-embedding-001"` | Embedding model |
| `chunk_size` | `int` | `10000` | Text chunk size |
| `chunk_overlap` | `int` | `200` | Overlap between chunks |
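The `chunk_size`/`chunk_overlap` semantics can be illustrated with a minimal sliding-window sketch (an illustration of the size and overlap parameters, not the plugin's actual splitter, which may split on sentence or token boundaries):

```python
def chunk_text(text: str, chunk_size: int = 10000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars after the previous
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 25000)
# 3 chunks of 10000, 10000, and 5400 characters; adjacent chunks share 200 characters.
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both sides of the split.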
## Search Modes

```python
# Hybrid (recommended) - combines vector and BM25
results = await rag.search(query, mode="hybrid")

# Vector only - semantic similarity
results = await rag.search(query, mode="vector")

# BM25 only - keyword matching
results = await rag.search(query, mode="bm25")
```
## How Hybrid Search Works

Hybrid search combines vector and BM25 rankings using Reciprocal Rank Fusion (RRF):
- Vector search catches semantic meaning even when exact words differ
- BM25 catches exact matches (product names, SKUs, technical terms)
- RRF balances both without requiring tuning
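The fusion step can be sketched with the standard RRF formulation, where each document scores the sum of `1 / (k + rank)` across the result lists it appears in (the conventional `k = 60` constant; TurboPuffer's internal fusion may differ in details):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    A document's fused score is sum(1 / (k + rank)) over every list
    it appears in, so items ranked well by both lists rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic ranking
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # keyword ranking
fused = rrf_fuse([vector_hits, bm25_hits])
# doc_a (ranks 1 and 2) beats doc_c (ranks 3 and 1)
```

Because RRF uses only ranks, not raw scores, it needs no per-query tuning to balance the two retrievers.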
## With Function Calling

```python
@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    return await rag.search(query, top_k=5, mode="hybrid")
```
## Cache Warming

For low-latency queries, TurboPuffer supports cache warming; the plugin triggers it automatically after `add_directory()`. See the RAG Guide for more details.
## Next Steps