![Thinkube AI Lab](../icons/tk_full_logo.svg)

# Production Agent Deployment 🚀

This notebook walks through the pieces needed to deploy agent systems to production:
- FastAPI integration
- Langfuse tracing
- Error handling and resilience
- Rate limiting and quotas
- Kubernetes deployment
- Health checks and monitoring

## Production Considerations

Moving agents to production requires:

- **API Layer**: Expose agents via REST API
- **Observability**: Track all agent interactions
- **Error Handling**: Graceful failures and retries
- **Rate Limiting**: Protect against abuse
- **Async Execution**: Handle long-running tasks
- **Scalability**: Horizontal scaling
- **Security**: Authentication and authorization

## FastAPI Integration

Create API endpoints for agents:

In [None]:
# FastAPI agent endpoint
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel

# TODO: Define request model
# TODO: Define response model
# TODO: Create FastAPI app
# TODO: Create endpoint that runs agent
# TODO: Add async execution for long tasks
# TODO: Return task ID for status checking

## Langfuse Integration

Full observability in production:

In [None]:
# Add Langfuse tracing to API
from langfuse import Langfuse
from langfuse.decorators import observe

# TODO: Setup Langfuse client
# TODO: Decorate agent functions with @observe
# TODO: Create trace for each API request
# TODO: Log user ID and request metadata
# TODO: Track costs per user
# TODO: Display trace link in response
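
The import path and configuration for Langfuse's `observe` decorator depend on the installed SDK version, so the sketch below deliberately does not call Langfuse. It is a standard-library stand-in that shows the kind of record a per-request trace carries: a trace ID, the user ID, the duration, and any error. The `traced` decorator and the `TRACES` list are hypothetical names, not part of any library.

```python
import functools
import time
import uuid

TRACES: list[dict] = []  # hypothetical in-memory sink standing in for a tracing backend


def traced(name: str):
    """Record one trace entry per call, including calls that fail."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, user_id: str, **kwargs):
            trace = {"id": str(uuid.uuid4()), "name": name, "user_id": user_id, "error": None}
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                trace["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every request leaves a trace.
                trace["duration_s"] = time.perf_counter() - start
                TRACES.append(trace)
        return wrapper
    return decorator


@traced("agent-run")
def run_agent(prompt: str) -> str:
    """Hypothetical agent call."""
    return prompt.upper()


answer = run_agent("hello", user_id="user-42")
print(answer, TRACES[-1]["id"])
```

Attaching the user ID at the trace level is what later makes per-user cost and error breakdowns possible.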

## Error Handling

Robust error handling patterns:

In [None]:
# Error handling and retries
from tenacity import retry, stop_after_attempt, wait_exponential

# TODO: Add retry decorator to agent calls
# TODO: Handle specific exceptions (rate limits, timeouts)
# TODO: Implement circuit breaker pattern
# TODO: Log errors to Langfuse
# TODO: Return user-friendly error messages
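
`tenacity` covers the retry side; the circuit breaker is usually hand-rolled or taken from a separate library. Below is a minimal standard-library sketch of the pattern, assuming a simple consecutive-failure threshold and a fixed cool-down. The class name, parameters, and error message are illustrative choices, not an established API.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the cool-down can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("service temporarily unavailable")
            # Cool-down elapsed: half-open, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping the LLM client call in `breaker.call(...)` lets the API fail fast with a friendly message while the upstream service recovers, instead of stacking retries on top of an outage.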

## Rate Limiting

Protect your API:

In [None]:
# Rate limiting with Valkey
import redis
from fastapi import HTTPException

# TODO: Connect to Valkey
# TODO: Implement sliding window rate limiter
# TODO: Track requests per user
# TODO: Raise HTTPException when limit exceeded
# TODO: Add rate limit headers to response
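
The sliding-window logic itself is independent of the store. The sketch below keeps timestamps in process memory to stay self-contained; this is a stand-in, not the Valkey implementation. With Valkey the per-user timestamps would live in a shared structure so that every API replica sees the same counts. `SlidingWindowLimiter` and its return keys are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.hits = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def check(self, user_id: str) -> dict:
        """Return the allow/deny decision plus values suitable for rate limit headers."""
        now = self.clock()
        hits = self.hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        # When denied, the oldest accepted request decides when a slot frees up.
        retry_after = 0.0 if allowed else hits[0] + self.window_s - now
        return {
            "allowed": allowed,
            "remaining": self.limit - len(hits),
            "retry_after_s": retry_after,
        }
```

In a FastAPI handler, a denied check would typically become `HTTPException(status_code=429, ...)` with `retry_after_s` reported in a `Retry-After` header.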

## Async Task Queue

Handle long-running agent tasks:

In [None]:
# Async task processing with a NATS-backed task queue

# TODO: Define task queue using NATS
# TODO: Submit agent task to queue
# TODO: Return task ID immediately
# TODO: Create status endpoint
# TODO: Store results in PostgreSQL or Valkey
# TODO: Implement result retrieval
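
The submit, work, and retrieve flow can be prototyped without a broker. The sketch below uses `asyncio.Queue` as a stand-in for NATS and a dict as a stand-in for PostgreSQL or Valkey; both substitutions are assumptions made to keep the example runnable, and `run_agent`, `RESULTS`, and `demo` are hypothetical names.

```python
import asyncio
import uuid

RESULTS: dict[str, dict] = {}  # stand-in for a persistent result store


async def run_agent(prompt: str) -> str:
    """Hypothetical agent call."""
    await asyncio.sleep(0)
    return prompt[::-1]


async def submit(queue: asyncio.Queue, prompt: str) -> str:
    """Enqueue the work and hand back a task ID immediately."""
    task_id = str(uuid.uuid4())
    RESULTS[task_id] = {"status": "queued", "result": None}
    await queue.put((task_id, prompt))
    return task_id


async def worker(queue: asyncio.Queue) -> None:
    """Pull tasks forever, recording the outcome of each one."""
    while True:
        task_id, prompt = await queue.get()
        RESULTS[task_id]["status"] = "running"
        try:
            RESULTS[task_id].update(status="done", result=await run_agent(prompt))
        except Exception:
            RESULTS[task_id].update(status="failed", result=None)
        finally:
            queue.task_done()


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    task_id = await submit(queue, "abc")
    await queue.join()  # wait until the worker has processed everything queued
    worker_task.cancel()
    await asyncio.gather(worker_task, return_exceptions=True)
    return RESULTS[task_id]


outcome = asyncio.run(demo())
print(outcome)
```

The status endpoint then only needs to read from the result store, which keeps the API process free of long-running agent work.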

## Kubernetes Deployment

Deploy as a Kubernetes service:

In [None]:
# Kubernetes manifest example

# TODO: Show Dockerfile for agent API
# TODO: Define Deployment manifest
# TODO: Add resource limits (CPU, memory)
# TODO: Configure environment variables
# TODO: Add liveness and readiness probes
# TODO: Create Service and Ingress
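
Since this is a Python notebook, the Deployment can be built as a plain dict and dumped as JSON, which `kubectl apply -f` accepts alongside YAML. Everything specific in the sketch is a placeholder assumption: the image name, port 8000, the resource figures, the `LOG_LEVEL` variable, and the `/health` and `/ready` probe paths.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment with resource limits and both probes."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8000}],
        "env": [{"name": "LOG_LEVEL", "value": "info"}],
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "1", "memory": "1Gi"},
        },
        "livenessProbe": {"httpGet": {"path": "/health", "port": 8000}, "periodSeconds": 10},
        "readinessProbe": {"httpGet": {"path": "/ready", "port": 8000}, "periodSeconds": 5},
    }
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


manifest = deployment_manifest("agent-api", "registry.example.com/agent-api:0.1.0")
print(json.dumps(manifest, indent=2))
```

Secrets such as API keys would normally be referenced from a Kubernetes Secret rather than written into `env` as literal values.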

## Health Checks

Monitor service health:

In [None]:
# Health check endpoints

# TODO: Create /health endpoint (liveness)
# TODO: Create /ready endpoint (readiness)
# TODO: Check LLM connectivity
# TODO: Check vector database connectivity
# TODO: Check NATS connectivity
# TODO: Return detailed health status
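
A common split is to keep the liveness endpoint trivial (the process is up) and let the readiness endpoint probe dependencies. The sketch below shows only the aggregation step, with hypothetical probe coroutines in place of real LLM, vector database, and NATS clients. Each probe runs concurrently and is bounded by a timeout so one slow dependency cannot stall the whole check.

```python
import asyncio
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[None]]


async def check_ok() -> None:
    """Hypothetical probe that succeeds."""


async def check_down() -> None:
    """Hypothetical probe whose dependency refuses connections."""
    raise ConnectionError("connection refused")


async def check_slow() -> None:
    """Hypothetical probe that takes too long to answer."""
    await asyncio.sleep(1)


async def readiness(checks: dict[str, Check], timeout_s: float = 2.0) -> dict:
    """Run all probes concurrently and summarize the results."""
    async def run_one(check: Check) -> str:
        try:
            await asyncio.wait_for(check(), timeout=timeout_s)
            return "ok"
        except asyncio.TimeoutError:
            return "timeout"
        except Exception as exc:
            return f"error: {type(exc).__name__}"

    names = list(checks)
    outcomes = await asyncio.gather(*(run_one(checks[name]) for name in names))
    details = dict(zip(names, outcomes))
    ready = all(outcome == "ok" for outcome in details.values())
    return {"status": "ready" if ready else "not_ready", "checks": details}


report = asyncio.run(
    readiness({"llm": check_ok, "vector_db": check_down, "nats": check_slow}, timeout_s=0.05)
)
print(report)
```

A `/ready` handler would return this dict with HTTP 200 when `status` is `ready` and 503 otherwise, so Kubernetes stops routing traffic to the pod until its dependencies recover.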

## Monitoring and Alerting

Track metrics in production:

In [None]:
# Prometheus metrics
from prometheus_client import Counter, Histogram

# TODO: Define metrics (request count, latency, errors)
# TODO: Instrument agent endpoints
# TODO: Track agent execution time
# TODO: Track LLM token usage
# TODO: Expose /metrics endpoint
# TODO: Configure alerts in Prometheus
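
A minimal instrumentation sketch with `prometheus_client` follows. The metric names, label names, and the `run_agent` stub (including its token counts) are illustrative assumptions. A dedicated `CollectorRegistry` is used so that re-running the notebook cell does not fail on duplicate metric registration.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

registry = CollectorRegistry()
REQUESTS = Counter("agent_requests", "Agent requests by outcome", ["outcome"], registry=registry)
LATENCY = Histogram("agent_latency_seconds", "Agent execution time in seconds", registry=registry)
TOKENS = Counter("agent_llm_tokens", "LLM tokens consumed", ["kind"], registry=registry)


def run_agent(prompt: str) -> dict:
    """Hypothetical agent call returning text plus token counts."""
    return {
        "text": prompt.upper(),
        "prompt_tokens": len(prompt.split()),
        "completion_tokens": 1,
    }


def instrumented_run(prompt: str) -> dict:
    """Wrap the agent call with request, latency, and token metrics."""
    start = time.perf_counter()
    try:
        result = run_agent(prompt)
    except Exception:
        REQUESTS.labels(outcome="error").inc()
        raise
    finally:
        # Latency is recorded for failures as well as successes.
        LATENCY.observe(time.perf_counter() - start)
    REQUESTS.labels(outcome="success").inc()
    TOKENS.labels(kind="prompt").inc(result["prompt_tokens"])
    TOKENS.labels(kind="completion").inc(result["completion_tokens"])
    return result


instrumented_run("hello production world")
metrics_text = generate_latest(registry).decode()
print(metrics_text)
```

The text produced by `generate_latest` is what a `/metrics` endpoint would serve for Prometheus to scrape; alert rules on error rate or latency are then defined on the Prometheus side.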

## Security Considerations

Secure your agent API:

In [None]:
# Authentication and authorization

# TODO: Add Keycloak OAuth2 authentication
# TODO: Validate JWT tokens
# TODO: Implement RBAC for agent access
# TODO: Sanitize user inputs
# TODO: Add CORS configuration
# TODO: Enable HTTPS only
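
The RBAC step can be sketched separately from token validation. The code below assumes the JWT has already been verified (signature, expiry, issuer, audience) by a JWT library and that realm roles sit under the `realm_access.roles` claim, which is where Keycloak places them by default. The agent names and the role mapping are hypothetical.

```python
# Hypothetical mapping from agent name to the roles allowed to invoke it.
AGENT_ROLES = {
    "research-agent": {"analyst", "admin"},
    "deploy-agent": {"admin"},
}


def roles_from_claims(claims: dict) -> set[str]:
    """Extract realm roles from already-verified token claims."""
    return set(claims.get("realm_access", {}).get("roles", []))


def authorize(claims: dict, agent_name: str) -> None:
    """Raise PermissionError unless the caller holds a role allowed for the agent."""
    allowed = AGENT_ROLES.get(agent_name, set())  # unknown agents are denied by default
    if not roles_from_claims(claims) & allowed:
        raise PermissionError(f"access to {agent_name!r} denied")


analyst_claims = {"sub": "user-1", "realm_access": {"roles": ["analyst"]}}
authorize(analyst_claims, "research-agent")  # passes silently
```

In an API handler, `PermissionError` would be translated to an HTTP 403 response. Denying unknown agents by default means a newly added agent is inaccessible until someone grants roles to it explicitly.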

## Cost Tracking

Monitor and control costs:

In [None]:
# Cost tracking and budgets

# TODO: Track token usage per user
# TODO: Calculate costs using Langfuse
# TODO: Set user budgets in database
# TODO: Check budget before agent execution
# TODO: Send alerts when approaching limits
# TODO: Display cost dashboard
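
The budget check reduces to simple arithmetic once token counts are available. The sketch below uses made-up prices and an in-memory `Budget` object; in practice the token counts would come from the tracing backend and the budget from a database. All names and figures are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; real prices depend on model and provider.
PRICE_PER_1K = {"prompt": 0.50, "completion": 1.50}


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0
    alert_ratio: float = 0.8  # alert once this fraction of the limit is used

    def can_spend(self) -> bool:
        """Check before running the agent: is there any budget left?"""
        return self.spent_usd < self.limit_usd

    def record(self, prompt_tokens: int, completion_tokens: int) -> bool:
        """Add the cost of one call; return True once the alert threshold is reached."""
        cost = (
            prompt_tokens * PRICE_PER_1K["prompt"]
            + completion_tokens * PRICE_PER_1K["completion"]
        ) / 1000
        self.spent_usd += cost
        return self.spent_usd >= self.alert_ratio * self.limit_usd


budget = Budget(limit_usd=1.0)
if budget.can_spend():
    should_alert = budget.record(prompt_tokens=1000, completion_tokens=0)
    print(f"spent ${budget.spent_usd:.2f}, alert={should_alert}")
```

Because the cost of a call is only known after it finishes, this check can overshoot the limit by one call; a stricter variant would reserve an estimated cost up front.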

## Testing in Production

Validate deployments with integration and load tests:

In [None]:
# Integration tests

# TODO: Create test suite for API endpoints
# TODO: Test agent workflows end-to-end
# TODO: Test error scenarios
# TODO: Test rate limiting
# TODO: Load testing with locust
# TODO: Measure p95/p99 latencies
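
For the latency TODO, tail percentiles can be computed from raw timings with the nearest-rank method. The sketch below times a callable in-process; a real load test (with Locust, for example) would collect timings over HTTP instead. The `measure` helper and the choice of nearest-rank over interpolation are assumptions.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure(func, runs: int = 100) -> dict:
    """Call func repeatedly and summarize its latency distribution in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        latencies.append(time.perf_counter() - start)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


stats = measure(lambda: sum(range(1000)), runs=50)
print(stats)
```

Averages hide the slow requests that users notice most, which is why p95 and p99 are the figures usually tracked against latency targets.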

## Best Practices

- ✅ Always use async execution for long tasks
- ✅ Implement comprehensive error handling
- ✅ Add observability from day one
- ✅ Set resource limits in Kubernetes
- ✅ Use circuit breakers for external services
- ✅ Implement graceful shutdown
- ✅ Version your agent API
- ✅ Monitor costs continuously
- ✅ Test failure scenarios
- ✅ Document API thoroughly

## Resources

- Thinkube Documentation: https://docs.thinkube.com
- Langfuse Docs: https://langfuse.com/docs
- FastAPI Docs: https://fastapi.tiangolo.com