TurboPuffer is a high-performance vector database with native hybrid search (vector + BM25). This plugin provides retrieval-augmented generation (RAG) with precise control over chunking, embeddings, and search strategies.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
## Installation

```bash
uv add "vision-agents[turbopuffer]"
```
## Quick Start

```python
from vision_agents.plugins import turbopuffer

# Initialize RAG and index a directory of documents
rag = turbopuffer.TurboPufferRAG(namespace="my-knowledge")
await rag.add_directory("./knowledge")

# Hybrid search (default)
results = await rag.search("How does the chat API work?")
```
Set `TURBO_PUFFER_KEY` and `GOOGLE_API_KEY` (used for Gemini embeddings) in your environment.
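For example, in a shell session (placeholder values; variable names as given above):

```shell
# Replace the placeholders with your actual keys.
export TURBO_PUFFER_KEY="your-turbopuffer-key"   # TurboPuffer API key
export GOOGLE_API_KEY="your-google-api-key"      # used for Gemini embeddings
```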
## Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `namespace` | `str` | Required | TurboPuffer namespace |
| `embedding_model` | `str` | `"models/gemini-embedding-001"` | Embedding model |
| `chunk_size` | `int` | `10000` | Text chunk size |
| `chunk_overlap` | `int` | `200` | Overlap between chunks |
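The `chunk_size`/`chunk_overlap` semantics can be illustrated with a minimal sliding-window sketch (an illustration of the size and overlap parameters, not the plugin's actual splitter, which may split on sentence or token boundaries):

```python
def chunk_text(text: str, chunk_size: int = 10000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars after the previous
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 25000)
# 3 chunks of 10000, 10000, and 5400 characters; adjacent chunks share 200 characters.
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both sides of the split.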
## Search Modes

```python
# Hybrid (recommended) - combines vector and BM25
results = await rag.search(query, mode="hybrid")

# Vector only - semantic similarity
results = await rag.search(query, mode="vector")

# BM25 only - keyword matching
results = await rag.search(query, mode="bm25")
```
## How Hybrid Search Works

Hybrid search combines vector and BM25 rankings using Reciprocal Rank Fusion (RRF):
- Vector search catches semantic meaning even when exact words differ
- BM25 catches exact matches (product names, SKUs, technical terms)
- RRF balances both without requiring tuning
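The fusion step can be sketched with the standard RRF formulation, where each document scores the sum of `1 / (k + rank)` across the result lists it appears in (the conventional `k = 60` constant; TurboPuffer's internal fusion may differ in details):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    A document's fused score is sum(1 / (k + rank)) over every list
    it appears in, so items ranked well by both lists rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic ranking
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # keyword ranking
fused = rrf_fuse([vector_hits, bm25_hits])
# doc_a (ranks 1 and 2) beats doc_c (ranks 3 and 1)
```

Because RRF uses only ranks, not raw scores, it needs no per-query tuning to balance the two retrievers.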
## With Function Calling

```python
@llm.register_function(description="Search the knowledge base")
async def search_knowledge(query: str) -> str:
    return await rag.search(query, top_k=5, mode="hybrid")
```
## Cache Warming

For low-latency queries, TurboPuffer supports cache warming; the plugin triggers it automatically after `add_directory()`. See the RAG Guide for more details.
## Next Steps