Running Agents as a Server

The Runner class provides two modes for running your agents: a single-agent console mode for development, and an HTTP server mode that spawns agents on demand for production deployments.

For a complete working example, see 08_agent_server_example in the Vision Agents repository.

Core Components

Running agents as a server requires four components:

create_agent() - A factory function that configures and returns an Agent instance
join_call() - Defines what happens when an agent joins a call
AgentLauncher - Responsible for running and monitoring the agents
Runner - A wrapper around AgentLauncher that provides CLI commands for console and server modes

Basic Example

import logging
from dotenv import load_dotenv
from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream

load_dotenv()
logging.basicConfig(level=logging.INFO)


async def create_agent(**kwargs) -> Agent:
    """Factory function that creates and configures an agent."""
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful voice assistant.",
        llm=gemini.LLM("gemini-2.5-flash-lite"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(eager_turn_detection=True),
    )

    @agent.llm.register_function(description="Get the current weather for a location")
    def get_weather(location: str) -> str:
        return f"The weather in {location} is sunny and 72°F."

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Called when the agent should join a call."""
    call = await agent.create_call(call_type, call_id)

    async with agent.join(call):
        await agent.simple_response("Hello! How can I help you today?")
        await agent.finish()


if __name__ == "__main__":
    runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))
    runner.cli()

Running the Server

Start the HTTP server with the serve command:

uv run <your_agent.py> serve

The server starts on http://127.0.0.1:8000 by default. The interactive API documentation can be found at http://127.0.0.1:8000/docs (Swagger UI).

CLI Options

Option	Default	Description
`--host`	`127.0.0.1`	Server host
`--port`	`8000`	Server port
`--agents-log-level`	`INFO`	Log level for agents
`--http-log-level`	`INFO`	Log level for HTTP server

uv run agent.py serve --host 0.0.0.0 --port 8000 --agents-log-level DEBUG

Console Mode

For development and testing, use console mode to run a single agent:

uv run <your_agent.py> run

API Endpoints

The server exposes these endpoints:

Method	Endpoint	Purpose
POST	`/sessions`	Spawn a new agent for a call
DELETE	`/sessions/{session_id}`	Stop an agent session
POST	`/sessions/{session_id}/close`	Stop an agent session from a browser via `navigator.sendBeacon()`, which can only send POST requests
GET	`/sessions/{session_id}`	Get session information
GET	`/sessions/{session_id}/metrics`	Real-time performance metrics
GET	`/health`	Liveness check
GET	`/ready`	Readiness check
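A deployment script or orchestrator often needs to wait for the server before sending traffic. The sketch below polls the `/ready` endpoint using only the standard library. It assumes that readiness is signalled with HTTP 200, which this page does not state explicitly; `http_status` and `wait_until_ready` are hypothetical helper names, not part of Vision Agents.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_ready(
    base_url: str,
    attempts: int = 10,
    delay_seconds: float = 0.5,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll GET /ready until it answers 200 (assumed) or attempts run out."""
    url = base_url.rstrip("/") + "/ready"
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no need to sleep after the final attempt
    return False
```

Calling `wait_until_ready("http://127.0.0.1:8000")` returns `True` once the server answers, or `False` after the last attempt. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.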

Creating a Session:

curl -X POST http://127.0.0.1:8000/sessions \
  -H "Content-Type: application/json" \
  -d '{"call_type": "default", "call_id": "my-call-123"}'

Response:

{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "session_started_at": "2025-01-15T10:30:00Z"
}
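The same request can be issued from Python. This is a minimal standard-library sketch that mirrors the curl call above and the DELETE endpoint from the table; the helper names are hypothetical, and the only assumptions are the request and response shapes shown on this page.

```python
import json
import urllib.request


def build_create_session_request(
    base_url: str, call_type: str, call_id: str
) -> urllib.request.Request:
    """Build the POST /sessions request shown in the curl example."""
    body = json.dumps({"call_type": call_type, "call_id": call_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_close_session_request(
    base_url: str, session_id: str
) -> urllib.request.Request:
    """Build the DELETE /sessions/{session_id} request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sessions/{session_id}",
        method="DELETE",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON body (empty bodies become {})."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}
```

With the server running, `send(build_create_session_request("http://127.0.0.1:8000", "default", "my-call-123"))` should return a dictionary like the response above, and its `session_id` can then be passed to `build_close_session_request`.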

Getting Session Metrics:

curl http://127.0.0.1:8000/sessions/abc-123/metrics

Response:

{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "session_started_at": "2025-01-15T10:30:00Z",
  "metrics_generated_at": "2025-01-15T10:35:00Z",
  "metrics": {
    "llm_latency_ms__avg": 245.5,
    "llm_time_to_first_token_ms__avg": 120.3,
    "llm_input_tokens__total": 1500,
    "llm_output_tokens__total": 800,
    "stt_latency_ms__avg": 85.2,
    "tts_latency_ms__avg": 95.1
  }
}
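For dashboards or alerts it can help to reduce this payload to a few numbers. The sketch below works on the example response above. The field names come from that example, while `summarize_metrics` is a hypothetical helper, and adding up the three average latencies is only a rough per-turn estimate that assumes the STT, LLM and TTS stages run one after another.

```python
import json
from datetime import datetime

# Example payload copied from the metrics response above.
SAMPLE = json.loads("""
{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "session_started_at": "2025-01-15T10:30:00Z",
  "metrics_generated_at": "2025-01-15T10:35:00Z",
  "metrics": {
    "llm_latency_ms__avg": 245.5,
    "llm_time_to_first_token_ms__avg": 120.3,
    "llm_input_tokens__total": 1500,
    "llm_output_tokens__total": 800,
    "stt_latency_ms__avg": 85.2,
    "tts_latency_ms__avg": 95.1
  }
}
""")


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def summarize_metrics(payload: dict) -> dict:
    """Reduce a metrics payload to token usage, latency and session age."""
    metrics = payload.get("metrics", {})
    total_tokens = (
        metrics.get("llm_input_tokens__total", 0)
        + metrics.get("llm_output_tokens__total", 0)
    )
    # Assumption: the stages run sequentially, so their averages add up.
    stage_keys = ("stt_latency_ms__avg", "llm_latency_ms__avg", "tts_latency_ms__avg")
    pipeline_latency_ms = sum(metrics.get(key, 0.0) for key in stage_keys)
    age = parse_utc(payload["metrics_generated_at"]) - parse_utc(
        payload["session_started_at"]
    )
    return {
        "total_tokens": total_tokens,
        "pipeline_latency_ms": pipeline_latency_ms,
        "session_age_seconds": age.total_seconds(),
    }


summary = summarize_metrics(SAMPLE)
```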

Configuration with ServeOptions

The HTTP server behavior can be customized using ServeOptions:

from vision_agents.core import Runner, AgentLauncher, ServeOptions

runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        cors_allow_origins=["https://myapp.com"],
        cors_allow_methods=["GET", "POST", "DELETE"],
        cors_allow_headers=["Authorization"],
        cors_allow_credentials=True,
    ),
)

CORS Options

Option	Description	Default
`cors_allow_origins`	Allowed origins	`["*"]`
`cors_allow_methods`	Allowed HTTP methods	`["*"]`
`cors_allow_headers`	Allowed headers	`["*"]`
`cors_allow_credentials`	Allow credentials	`True`
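In production the allowed origins usually differ per environment. As a sketch, the CORS keyword arguments from the table can be derived from an environment variable; `AGENT_CORS_ORIGINS` and `cors_options_from_env` are hypothetical names chosen for this example. Credentials are enabled only when explicit origins are configured, since browsers reject credentialed requests when the response carries a wildcard `Access-Control-Allow-Origin`.

```python
import os
from typing import Mapping


def cors_options_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build the four CORS keyword arguments from a comma-separated origin list."""
    raw = env.get("AGENT_CORS_ORIGINS", "")  # hypothetical variable name
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return {
        # Fall back to the documented default when nothing is configured.
        "cors_allow_origins": origins or ["*"],
        # The HTTP methods used by the endpoints listed on this page.
        "cors_allow_methods": ["GET", "POST", "DELETE"],
        "cors_allow_headers": ["Authorization", "Content-Type"],
        # Only allow credentials together with explicit origins.
        "cors_allow_credentials": bool(origins),
    }
```

The result can be unpacked into the options object, for example `ServeOptions(**cors_options_from_env())`.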

Authentication & Permissions

Use authentication and permission callbacks to secure your agent server and control who can start, view, or close sessions. These callbacks are standard FastAPI dependencies, giving you access to headers, query parameters, and dependency injection.

Option	Default	Description
`can_start_session`	allow all	Permission check for starting sessions
`can_close_session`	allow all	Permission check for closing sessions
`can_view_session`	allow all	Permission check for viewing sessions
`can_view_metrics`	allow all	Permission check for viewing metrics
`get_current_user`	no-op	Callable to determine current user

Custom User Identification

Define a get_current_user callback to identify the user making requests:

from dataclasses import dataclass
from typing import Optional

from fastapi import Header, HTTPException

from vision_agents.core import AgentLauncher, Runner, ServeOptions


@dataclass
class User:
    id: str
    name: str

    def has_permission(self, permission: str) -> bool:
        # Implement your permission logic here
        return True


async def get_current_user(authorization: Optional[str] = Header(None)) -> User:
    """Extract and validate the current user from the request."""
    # The Authorization header is optional at the HTTP level, so reject
    # requests that omit it.
    if not authorization:
        raise HTTPException(status_code=401, detail="Authorization required")

    # validate_token_and_get_user is a placeholder for your own logic:
    # validate the token and retrieve the user from your database.
    user = await validate_token_and_get_user(authorization)
    return user


runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        get_current_user=get_current_user,
    ),
)
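The example above calls `validate_token_and_get_user` without defining it. A minimal sketch of such a helper follows, assuming a `Bearer <token>` header and an in-memory token table. The token store, the `InvalidToken` exception and the sample user are all hypothetical; a real deployment would query a database or verify a signed token instead.

```python
import asyncio
import hmac
from dataclasses import dataclass


@dataclass
class User:
    id: str
    name: str


class InvalidToken(Exception):
    """Raised when the Authorization header cannot be mapped to a user."""


# Hypothetical in-memory token store, for illustration only.
_TOKENS = {"secret-token-alice": User(id="user-1", name="Alice")}


async def validate_token_and_get_user(authorization: str) -> User:
    """Resolve an 'Authorization: Bearer <token>' header value to a User."""
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise InvalidToken("Expected 'Bearer <token>'")
    for known_token, user in _TOKENS.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(known_token.encode(), token.encode()):
            return user
    raise InvalidToken("Unknown token")


alice = asyncio.run(validate_token_and_get_user("Bearer secret-token-alice"))
```

Inside `get_current_user`, an `InvalidToken` error would typically be caught and re-raised as `HTTPException(status_code=401)` so that the client receives a proper HTTP response.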

Permission Callbacks

Control who can start, view, and close sessions:

from typing import Optional

from fastapi import Depends, HTTPException

from vision_agents.core import AgentSession
from vision_agents.core.runner.http.dependencies import get_session


async def can_start_session(
    user: User = Depends(get_current_user),
) -> bool:
    """Check if the user can start a new agent session."""
    # Check user permissions
    if not user.has_permission("start_session"):
        raise HTTPException(status_code=403, detail="Permission denied")
    return True


# In your permission callbacks, you can access the requested AgentSession
async def can_view_session(
    user: User = Depends(get_current_user),
    session: Optional[AgentSession] = Depends(get_session),
) -> bool:
    """Check if the user can view a session."""
    if session and session.created_by != user.id:
        raise HTTPException(status_code=403, detail="Not your session")
    return True


async def can_close_session(
    user: User = Depends(get_current_user),
    session: Optional[AgentSession] = Depends(get_session),
) -> bool:
    """Only allow users to close their own sessions."""
    if session and session.created_by != user.id:
        raise HTTPException(
            status_code=403, detail="Cannot close another user's session"
        )
    return True


runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        get_current_user=get_current_user,
        can_start_session=can_start_session,
        can_view_session=can_view_session,
        can_close_session=can_close_session,
    ),
)
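The callbacks above rely on `user.has_permission(...)`, which the earlier `User` example leaves as a stub. One way to fill it in is a role-to-permission table, sketched below. Only `"start_session"` appears on this page; the role names, the other permission names and the `roles` field are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical role table; adapt the roles and permission names to your app.
ROLE_PERMISSIONS = {
    "admin": {"start_session", "view_session", "close_session", "view_metrics"},
    "member": {"start_session", "view_session", "close_session"},
    "viewer": {"view_session"},
}


@dataclass
class User:
    id: str
    name: str
    roles: FrozenSet[str] = frozenset()

    def has_permission(self, permission: str) -> bool:
        """True if any of the user's roles grants the permission."""
        return any(
            permission in ROLE_PERMISSIONS.get(role, set()) for role in self.roles
        )


def owns_session(user: User, created_by: str) -> bool:
    """Mirror the ownership check used in the callbacks above."""
    return created_by == user.id


member = User(id="user-1", name="Alice", roles=frozenset({"member"}))
viewer = User(id="user-2", name="Bob", roles=frozenset({"viewer"}))
```

With this table, `member.has_permission("start_session")` is true and `viewer.has_permission("start_session")` is false, so `can_start_session` would answer the viewer with a 403.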

Customizing the Default FastAPI App

The Runner exposes its FastAPI instance via runner.fast_api, allowing you to add custom routes, middleware, and other configuration after initialization.

from fastapi.middleware.gzip import GZipMiddleware

runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))

# Add a custom endpoint
@runner.fast_api.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}

# Add custom middleware
runner.fast_api.add_middleware(GZipMiddleware, minimum_size=1000)

Using a Custom FastAPI Instance

For full control over the FastAPI configuration, provide your own instance via ServeOptions:

from fastapi import FastAPI

app = FastAPI(
    title="My Agent Server",
    description="Custom agent server with additional features",
    version="1.0.0",
)

# Add your own routes before passing to Runner
@app.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}

runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(fast_api=app),
)

When you provide a custom FastAPI app via ServeOptions(fast_api=app), the Runner uses it as-is without any configuration. It does not register the default endpoints (/sessions, /health, /ready, etc.), nor does it apply CORS settings. You are responsible for assembling the application yourself.

Session Limits & Resource Management

AgentLauncher provides options to control session lifecycle and resource usage:

Parameter	Type	Default	Description
`max_concurrent_sessions`	`int | None`	`None`	Maximum concurrent sessions across all calls
`max_sessions_per_call`	`int | None`	`None`	Maximum sessions allowed per call_id
`max_session_duration_seconds`	`float | None`	`None`	Maximum duration in seconds before a session is auto-closed
`agent_idle_timeout`	`float`	`60.0`	Seconds the agent stays alone on a call before auto-close
`cleanup_interval`	`float`	`5.0`	Interval in seconds between cleanup checks for expired sessions

runner = Runner(
    AgentLauncher(
        create_agent=create_agent,
        join_call=join_call,
        max_concurrent_sessions=10,       # Limit total concurrent agents
        max_sessions_per_call=1,          # One agent per call
        max_session_duration_seconds=3600, # 1 hour max per session
        agent_idle_timeout=120.0,         # Disconnect after 2 min alone
    )
)

max_concurrent_sessions - Prevents resource exhaustion by capping how many agents can run simultaneously. Useful for cost control and server capacity planning.
max_sessions_per_call - Prevents duplicate agents from joining the same call. Set to 1 to ensure only one agent per conversation.
max_session_duration_seconds - Automatically terminates long-running sessions. Protects against runaway sessions that could accumulate costs.
agent_idle_timeout - Cleans up agents when all other participants have left the call. The agent disconnects after being alone for this duration.
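To make the interaction between these limits concrete, here is a small standalone model of the checks they describe. It is an illustration under assumptions and not AgentLauncher's implementation: the `SessionRecord` type, the function names and the exact boundary comparisons are hypothetical, and how the server reports a rejected request is not covered on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionRecord:
    session_id: str
    call_id: str
    started_at: float  # seconds, e.g. from time.monotonic()


def admission_error(
    sessions: List[SessionRecord],
    call_id: str,
    max_concurrent_sessions: Optional[int] = None,
    max_sessions_per_call: Optional[int] = None,
) -> Optional[str]:
    """Return the name of the limit that blocks a new session, or None."""
    if max_concurrent_sessions is not None and len(sessions) >= max_concurrent_sessions:
        return "max_concurrent_sessions"
    same_call = sum(1 for session in sessions if session.call_id == call_id)
    if max_sessions_per_call is not None and same_call >= max_sessions_per_call:
        return "max_sessions_per_call"
    return None  # a limit of None means "unlimited"


def expired_sessions(
    sessions: List[SessionRecord],
    now: float,
    max_session_duration_seconds: Optional[float] = None,
) -> List[SessionRecord]:
    """Sessions a periodic cleanup pass would close for exceeding the duration."""
    if max_session_duration_seconds is None:
        return []
    return [
        session
        for session in sessions
        if now - session.started_at >= max_session_duration_seconds
    ]


active = [
    SessionRecord("s1", "call-a", started_at=0.0),
    SessionRecord("s2", "call-b", started_at=3000.0),
]
```

With `max_sessions_per_call=1`, a second request for `call-a` is blocked while a request for a new call is admitted. With `max_session_duration_seconds=3600`, a cleanup pass at `now=3600.0` would select only `s1`.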

Next Steps

Production Deployment

Agent Server Example
