TurboPuffer is a high-performance vector database with native hybrid search (vector + BM25). The plugin provides RAG with precise control over chunking, embeddings, and search strategies.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[turbopuffer]"

Quick Start

from vision_agents.plugins import turbopuffer

# Initialize RAG
rag = turbopuffer.TurboPufferRAG(namespace="my-knowledge")
await rag.add_directory("./knowledge")

# Hybrid search (default)
results = await rag.search("How does the chat API work?")

Set TURBO_PUFFER_KEY and GOOGLE_API_KEY (for Gemini embeddings) in your environment before running.
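
If you want to fail fast when a credential is missing, a quick check like this works (the variable names are the ones listed above; the check itself is just a convenience, not part of the plugin):

import os

# Verify required credentials before starting; names come from the note above
for var in ("TURBO_PUFFER_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set")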

Parameters

Name             Type  Default                         Description
namespace        str   Required                        TurboPuffer namespace
embedding_model  str   "models/gemini-embedding-001"   Embedding model
chunk_size       int   10000                           Text chunk size
chunk_overlap    int   200                             Overlap between chunks
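
As a sketch, the defaults above can be overridden at construction time, assuming the parameters in the table are accepted as keyword arguments (the values here simply restate the table and are not tuning recommendations):

from vision_agents.plugins import turbopuffer

rag = turbopuffer.TurboPufferRAG(
    namespace="my-knowledge",                        # required
    embedding_model="models/gemini-embedding-001",   # default Gemini embedding model
    chunk_size=10000,                                # text chunk size
    chunk_overlap=200,                               # overlap between adjacent chunks
)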

Search Modes

# Hybrid (recommended) - combines vector and BM25
results = await rag.search(query, mode="hybrid")

# Vector only - semantic similarity
results = await rag.search(query, mode="vector")

# BM25 only - keyword matching
results = await rag.search(query, mode="bm25")

How Hybrid Search Works

Hybrid search combines vector and BM25 using Reciprocal Rank Fusion (RRF):
  • Vector search catches semantic meaning even when exact words differ
  • BM25 catches exact matches (product names, SKUs, technical terms)
  • RRF balances both without requiring tuning
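
For intuition, here is a minimal, self-contained sketch of RRF. The real fusion happens inside TurboPuffer; the constant k=60 is the value from the original RRF paper, assumed here rather than taken from the plugin:

def rrf_fuse(vector_ranked: list[str], bm25_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document IDs by summing 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked highly by either signal float to the top
    return sorted(scores, key=scores.get, reverse=True)

Because only ranks matter, RRF is insensitive to the incomparable score scales of cosine similarity and BM25, which is why no weight tuning is required.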

With Function Calling

@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    results = await rag.search(query, top_k=5, mode="hybrid")
    # Flatten the results into one string for the LLM; adjust the
    # formatting to the result objects your version returns.
    return "\n\n".join(str(r) for r in results)

Cache Warming

To keep first-query latency low, TurboPuffer supports cache warming. The plugin runs it automatically after add_directory(), but you can also trigger it manually:

await rag.warm_cache()

See the RAG Guide for more details.

Next Steps