## **Serving an Agent with FastAPI**

### **What is FastAPI and Why Use It for Agents?**

FastAPI is a modern, high-performance web framework for building APIs with Python. Released in 2018, it has quickly gained popularity due to its combination of speed, ease of use, and developer-friendly features.

As its core, FastAPI is design to create REST APIs that can serve requests efficiently while providing robust validation and documentation. For AI Agent deployment, FastAPI offers several critical advantages:
- **Asynchronous Support**: AI agents often need to handle concurrent requests efficiently. FastAPI's native async/await support enables handling thousands of simultaneous connections, perfect for serving multiple agent requests in parallel without blocking.
- **Streaming Responses**: Agents frequently generate content incrementally (token by token). FastAPI's streaming response capabilities allow for real-time transmission of agent outputs as they're generated, creating a more responsive user experience.
- **Type Validation**: When working with agents, ensuring proper input formats is curcial. FastAPI uses Pydantic for automatic request validation, catching malformed inputs before they reach your agent and providing clear error messages.
- **Performance**: Built on Starlette and Uvicorn, FastAPI offers near-native performance. For compute-intensive agent applications, this means your infrastructure handles API overhead efficiently, allowing more resources for the actual agent processing.
- **Automatic Documentation**: When exposing an agent API to multiple users or teams, documentation becomes essential. FastAPI automatically generates interactive API documentation via Swagger UI and ReDoc, making it easy for others to understand and use your agent.
- **Schema Enforcement**: Pydantic models ensure that both requests to your agent and reponses from it conform to predefined schemas, making agent behavior more predictable and easier to integrate with other systems.

In this tutorial, we'll build a complete API that serves an AI Agent with both synchronous and streaming endpoints, demonstrating how FastAPI's features address the specific challenges of deploying agents in productions.


### **Prerequisite**

In [2]:
!uv add fastapi uvicorn pydantic python-dotenv sse-starlette --quiet

### **Agent Quick Recap**

Let's start by defining a simple agent that we'll expose via our API. This could be any agent implementation, but for this tutorial, we'll create a basic example that simylates an AI agent responding to user queries:

In [3]:
class SimpleAgent:
    def __init__(self, name="FastAPI Agent"):
        self.name = name

    def generate_response(self, query):
        """Generate a synchronous response to a user query."""
        return f"Agent {self.name} received: '{query}'\nResponse: This is a simulated agent response."

    async def generate_response_stream(self, query):
        """Generate a streaming response to a user query."""

        import asyncio

        prefix = f"Agent {self.name} received: '{query}'\n"
        response = "This is a simulated agent response that streams token by token."

        # Yield the prefix as a single chunk
        yield prefix

        # Stream the response token by token with small delays
        for token in response.split():
            await asyncio.sleep(0.1)
            yield token + " "

In [4]:
agent = SimpleAgent()
test_query = "Hi there, how are you?"
print(agent.generate_response(test_query))

Agent FastAPI Agent received: 'Hi there, how are you?'
Response: This is a simulated agent response.


### **Minial FastAPI App**

In [5]:
from fastapi import FastAPI

# Initialize FastAPI app
app = FastAPI(
    title = "Agent API",
    description = "A simple API that serves an AI agent.",
    version = "0.0.1"
)

# Create an instance of the agent
agent = SimpleAgent()

# Create a simple endpoint
@app.get("/")
def health_check():
    """Check if the API is running"""
    return {"status": "ok", "message": "API is operational"}

In [6]:
import threading
import uvicorn

def run_server(port):
    uvicorn.run(app, host="0.0.0.0", port=port)

def run_main(port):
    thread = threading.Thread(target=run_server, daemon=True, args=[port])
    thread.start()

### POST /agent - Synchronous Endpoint

In [7]:
from pydantic import BaseModel, ConfigDict
from typing import Optional

# Define request and response models
class QueryRequest(BaseModel):
    query: str
    context: Optional[str] = None

    model_config = ConfigDict(
        json_schema_extra = {
            "example": [
                {
                    "query": "What is FastAPI?",
                    "context": "I'm a beginner programmer."
                }
            ]
        }
    )

class QueryResponse(BaseModel):
    response: str

    model_config = ConfigDict(
        json_schema_extra = {
            "example": [
                {
                    "response": "FastAPI is a modern, high-performance web framework for building APIs with Python."
                }
            ]
        }
    )

In [8]:
# Create a synchronous endpoint for the agent
@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest):
    """Get a reponse from agent"""
    response = agent.generate_response(request.query)
    return QueryResponse(response=response)

This endpoint accepts POST requests with a JSON body containing a "query" field and an optional "context" field. It returns a JSON response with the agent's answer. 

In [9]:
run_main(5001)

INFO:     Started server process [3868]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)


### **POST /agent/stream - Token Streaming**

In [10]:
from fastapi.responses import StreamingResponse
import json

@app.post("/agent/stream")
async def stream_agent(request: QueryRequest):
    """Stream a response from the agent token by token"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            # Format in JSON Object
            data = json.dumps({"token": token})
            yield f"data: {data}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )

In [11]:
run_main(5002)

INFO:     Started server process [3868]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5002 (Press CTRL+C to quit)


In [12]:
from sse_starlette.sse import EventSourceResponse

@app.post("/agent/stream-sse")
async def stream_agent_sse(request: QueryRequest):
    """Stream a response using SSE with the sse-starlette package"""

    async def event_generator():
        async for token in agent.generate_response_stream(request.query):
            yield {"data": json.dumps({"token": token})}
            
    return EventSourceResponse(event_generator())

In [13]:
run_main(5003)

INFO:     Started server process [3868]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5003 (Press CTRL+C to quit)


### **Creating the full application**

#### 1. Imports

In [14]:
from fastapi import FastAPI, Depends, HTTPException, Header
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, ConfigDict
from typing import Optional
import json
import os
import asyncio

#### 2. Define Agent Class

In [15]:
class SimpleAgent:
    
    def __init__(self, name="FastAPI Agent"):
        self.name = name

    def generate_response(self, query):
        """Generate a synchronous response to a user query"""
        return f"Agent {self.name} received: '{query}'\nResponse: This is a simulated agent response."
    
    async def generate_reponse_stream(self, query):
        """Generate a streaming reponse to a user query"""
        
        prefix = f"Agent {self.name} is thinking about: '{query}'\n"
        response = "This is a simulated agent reponse that streams token by token."

        yield prefix

        for token in response.split():
            await asyncio.sleep(0.1)
            yield token + " "

#### 3. Define the request and response models

In [16]:
class QueryRequest(BaseModel):
    query: str
    context: Optional[str] = None

    model_config = ConfigDict(
        json_schema_extra={
            "example": [
                {
                    "query": "What is FastAPI?",
                    "context": "I'm a beginner programmer."
                }
            ]
        }
    )

class QueryResponse(BaseModel):
    response: str

    model_config = ConfigDict(
        json_schema_extra={
            "example": [
                {
                    "response": "FastAPI is a modern, high-performance web framework for building APIs with Python."
                }
            ]
        }
    )

#### 4. Application

In [17]:
app = FastAPI(
    title="Agent API",
    description="A simple API that serves an AI agent",
    version="0.1.0"
)

agent = SimpleAgent()

# Create endpoints
@app.get("/health")
def health_check():
    """Check if the API is running"""
    return {"status": "ok", "message": "API is operational"}

@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest):
    """Get a response from user input"""
    response = agent.generate_response(request.query)
    return QueryResponse(response=response)

@app.post("/agent/stream", response_model=QueryResponse)
async def stream_agent(request: QueryRequest):
    """Stream a response from the agent token by token"""
    
    async def event_generator():
        async for token in agent.generate_reponse_stream(request.query):
            data = json.dumps({"token": token})
            yield f"data: {data}\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )

#### 5. Running Server

In [None]:
import threading
import uvicorn

def run_server(port):
    uvicorn.run(app, host="0.0.0.0", port=port)

def run_main(port):
    thread = threading.Thread(target=run_server, daemon=True, args=[port])
    thread.start()

run_main(5007)

INFO:     Started server process [3868]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5007 (Press CTRL+C to quit)


INFO:     127.0.0.1:57961 - "POST /agent HTTP/1.1" 200 OK
INFO:     127.0.0.1:57983 - "POST /agent/stream HTTP/1.1" 200 OK


### **Simple Client Test**

In [19]:
import requests
import json

# Test the synchronous endpoint
response = requests.post(
    "http://localhost:5007/agent", 
    json={"query": "What is FastAPI?"}
)
print("Synchronous Response:")
print(response.json())
print("\n" + "-" * 40 + "\n")

# Test the streaming endpoint
response = requests.post(
    "http://localhost:5007/agent/stream",
    json={"query": "Tell me about streaming"},
    stream=True
)

print("Streaming Response:")
for line in response.iter_lines():
    if line:
        # Parse the SSE format
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = json.loads(line[6:])
            print(data["token"], end="")

Synchronous Response:
{'response': "Agent FastAPI Agent received: 'What is FastAPI?'\nResponse: This is a simulated agent response."}

----------------------------------------

Streaming Response:
Agent FastAPI Agent is thinking about: 'Tell me about streaming'
This is a simulated agent reponse that streams token by token. 

### **Adding Basic Auth Key (Optional)**

In [21]:
from fastapi import Depends, HTTPException, Header
import os
from dotenv import load_dotenv

load_dotenv()
os.environ["API_KEY"] = os.getenv("TEST_API_KEY")

# Function to validate the API key
async def verify_api_key(x_api_key: str = Header(None)):
    """Verify the API key provided in the X-API-Key header"""

    api_key = os.environ.get("API_KEY")

    if not api_key:
        return True
    
    if not x_api_key:
        raise HTTPException(status_code=401, detail="API_KEY is missing")
    
    if x_api_key != api_key:
        raise HTTPException(status_code=403, detail="API_KEY is invalid")
    
    return True

In [22]:
# Update endpoints to include the API key dependency
@app.post("/agent", response_model=QueryResponse)
def query_agent(request: QueryRequest, auth: bool = Depends(verify_api_key)):
    """Get a reponse from agent"""

    response = agent.generate_response(request.query)
    return QueryResponse(response=response) 

@app.post("/agent/stream")
async def stream_agent(request: QueryRequest, auth: bool = Depends(verify_api_key)):
    """Stream a response from the agent token by token"""

    async def event_generator():
        async for token in agent.generate_reponse_stream(request.query):
            data = json.dumps({"token": token})
            yield f"data: {data}\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )

In [None]:
import threading
import uvicorn

def run_server(port):
    uvicorn.run(app, host="0.0.0.0", port=port)

def run_main(port):
    thread = threading.Thread(target=run_server, daemon=True, args=[port])
    thread.start()

run_main(5011)

INFO:     Started server process [3868]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5011 (Press CTRL+C to quit)


INFO:     127.0.0.1:58009 - "POST /agent HTTP/1.1" 200 OK
INFO:     127.0.0.1:58011 - "POST /agent/stream HTTP/1.1" 200 OK


In [24]:
import requests
import json

# Test the synchronous endpoint
response = requests.post(
    "http://localhost:5011/agent", 
    json={"query": "What is FastAPI?"}
)
print("Synchronous Response:")
print(response.json())
print("\n" + "-" * 40 + "\n")

# Test the streaming endpoint
response = requests.post(
    "http://localhost:5011/agent/stream",
    json={"query": "Tell me about streaming"},
    stream=True
)

print("Streaming Response:")
for line in response.iter_lines():
    if line:
        # Parse the SSE format
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = json.loads(line[6:])
            print(data["token"], end="")

Synchronous Response:
{'response': "Agent FastAPI Agent received: 'What is FastAPI?'\nResponse: This is a simulated agent response."}

----------------------------------------

Streaming Response:
Agent FastAPI Agent is thinking about: 'Tell me about streaming'
This is a simulated agent reponse that streams token by token. 

### **Unit Tests**

### **Conclusions**

In this tutorial, we've built a FastAPI application that serves a simple AI Agent with both synchronous and streaming endpoints. We've covered the basics of setting up FastAPI, defining Pydantic models for request/response validation, implementing both synchronous and streaming endpoints, and adding simple authentication.

FastAPI's combination of performance, automatic documentation, and developer-friendly features makes it an excellent choice for serving AI agents in production. By following the patterns in this tutorial, you can create robust, production-ready APIs for your own AI agents.

### **Next steps**

Now that you have a basic FastAPI agent service running, here are some ideas for next steps:
- **Add more advanced agents**: Replace the simple agent with your production-ready agent
- **Implement authentication and rate limiting**: Add more sophisticated authentication and rate limiting for production use
- **Add middleware for logging and monitoring**: Implement middleware for request logging and performance monitoring
- **Set up deployment**: Deploy your FastAPI application to a production enviroment using Docker, Kubernetes, or a cloud service
- **Implement async database connections**: Add database integrations for storing conversation history or other data
- **Add background tasks**: Use FastAPI's background tasks for long-running operations