# Notebook 12: Production Deployment

## Learning Objectives
By the end of this notebook, you will be able to:
1. Deploy LangChain applications as APIs
2. Containerize LangChain applications
3. Implement production error handling and monitoring
4. Apply security best practices
5. Optimize performance and costs

## Prerequisites
- Completion of notebooks 00-11
- Understanding of REST APIs
- Basic knowledge of Docker (helpful but not required)

## Setup

In [None]:
# Install required packages
!pip install -q langchain langchain-openai fastapi uvicorn pydantic python-dotenv prometheus-client pyjwt redis python-json-logger

In [None]:
import os
import json
import logging
from typing import Dict, List, Optional, Any
from datetime import datetime, timedelta
from pydantic import BaseModel, Field
import time
from functools import wraps
import hashlib

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.schema import HumanMessage, AIMessage
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback
from langchain.schema.runnable import RunnablePassthrough

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

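# Set the OpenAI API key only if it is not already defined in the environment
# (replace the placeholder with your key, or export OPENAI_API_KEY beforehand)
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")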

## Part 1: Deploying LangChain as an API

### Instructor Activity 1: Building a Production-Ready FastAPI Application

Let me show you how to create a production-ready API service with proper request/response models, error handling, and rate limiting.

In [None]:
# Request and Response Models
class ChatRequest(BaseModel):
    """Request model for chat endpoint"""
    message: str = Field(..., description="User message")
    session_id: str = Field(..., description="Session ID for conversation tracking")
    temperature: float = Field(0.7, ge=0, le=2, description="Model temperature")
    max_tokens: Optional[int] = Field(None, description="Max tokens in response")

class ChatResponse(BaseModel):
    """Response model for chat endpoint"""
    response: str
    session_id: str
    tokens_used: int
    cost: float
    timestamp: str

class HealthCheckResponse(BaseModel):
    """Health check response"""
    status: str
    model_loaded: bool
    timestamp: str

# Rate Limiter
class RateLimiter:
    """Simple in-memory rate limiter"""
    def __init__(self, max_requests: int = 10, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = {}
    
    def is_allowed(self, key: str) -> bool:
        now = time.time()
        
        # Clean old requests
        if key in self.requests:
            self.requests[key] = [
                timestamp for timestamp in self.requests[key]
                if now - timestamp < self.window_seconds
            ]
        else:
            self.requests[key] = []
        
        # Check if allowed
        if len(self.requests[key]) < self.max_requests:
            self.requests[key].append(now)
            return True
        return False

# Production LangChain Service
class LangChainService:
    """Production-ready LangChain service"""
    
    def __init__(self):
        self.llm = None
        self.sessions = {}
        self.rate_limiter = RateLimiter(max_requests=10, window_seconds=60)
        self.initialize_model()
    
    def initialize_model(self):
        """Initialize the model with error handling"""
        try:
            self.llm = ChatOpenAI(
                model="gpt-3.5-turbo",
                temperature=0.7,
                max_retries=3,
                request_timeout=30
            )
            logger.info("Model initialized successfully")
        except Exception as e:
            logger.error(f"Failed to initialize model: {e}")
            raise
    
    def get_or_create_session(self, session_id: str) -> ConversationBufferMemory:
        """Get or create a conversation session"""
        if session_id not in self.sessions:
            self.sessions[session_id] = ConversationBufferMemory(
                return_messages=True
            )
        return self.sessions[session_id]
    
    async def process_chat(self, request: ChatRequest) -> ChatResponse:
        """Process chat request with full error handling"""
        
        # Rate limiting
        if not self.rate_limiter.is_allowed(request.session_id):
            raise Exception("Rate limit exceeded")
        
        # Get session memory
        memory = self.get_or_create_session(request.session_id)
        
        # Create prompt
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a helpful assistant."),
            MessagesPlaceholder(variable_name="history"),
            ("human", "{input}")
        ])
        
        # Create chain
        chain = prompt | self.llm
        
        # Process with token tracking (await the async call so the event loop is not blocked)
        with get_openai_callback() as cb:
            response = await chain.ainvoke({
                "input": request.message,
                "history": memory.chat_memory.messages
            })
            
            # Update memory
            memory.chat_memory.add_user_message(request.message)
            memory.chat_memory.add_ai_message(response.content)
            
            return ChatResponse(
                response=response.content,
                session_id=request.session_id,
                tokens_used=cb.total_tokens,
                cost=cb.total_cost,
                timestamp=datetime.now().isoformat()
            )

# Create FastAPI app (save as app.py)
api_code = '''
from fastapi import FastAPI, HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import uvicorn

app = FastAPI(
    title="LangChain Production API",
    version="1.0.0",
    description="Production-ready LangChain API"
)

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Demo only: list explicit origins in production
    allow_credentials=False,  # Credentials must not be combined with a wildcard origin
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize service
service = LangChainService()

@app.get("/health", response_model=HealthCheckResponse)
async def health_check():
    """Health check endpoint"""
    return HealthCheckResponse(
        status="healthy",
        model_loaded=service.llm is not None,
        timestamp=datetime.now().isoformat()
    )

@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Main chat endpoint"""
    try:
        return await service.process_chat(request)
    except Exception as e:
        logger.error(f"Chat error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/sessions/{session_id}/history")
async def get_session_history(session_id: str):
    """Get conversation history for a session"""
    if session_id in service.sessions:
        messages = service.sessions[session_id].chat_memory.messages
        return {
            "session_id": session_id,
            "messages": [
                {"role": "human" if isinstance(m, HumanMessage) else "ai",
                 "content": m.content}
                for m in messages
            ]
        }
    raise HTTPException(status_code=404, detail="Session not found")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
'''

print("FastAPI application source stored in `api_code`.")
print("\nTo run: save it as app.py together with the classes above, then run: python app.py")
print("API will be available at: http://localhost:8000")
print("Documentation at: http://localhost:8000/docs")
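
The in-memory `RateLimiter` above only answers allowed or not allowed, so the API cannot tell a client how long to wait. Below is a minimal sketch of the same sliding-window idea that also reports a retry delay. `SlidingWindowLimiter`, `retry_after`, and the injectable `clock` parameter are illustrative names introduced here, not part of the notebook's service.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window limiter that can also report how long to wait (sketch)."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so behaviour can be checked without sleeping
        self.requests: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, key: str, now: float) -> Deque[float]:
        """Drop timestamps that have fallen out of the window."""
        window = self.requests[key]
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        return window

    def is_allowed(self, key: str) -> bool:
        """Record the request and return True if the key is under its limit."""
        now = self.clock()
        window = self._prune(key, now)
        if len(window) < self.max_requests:
            window.append(now)
            return True
        return False

    def retry_after(self, key: str) -> float:
        """Seconds until the oldest recorded request leaves the window (0 if not limited)."""
        now = self.clock()
        window = self._prune(key, now)
        if len(window) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - window[0])
```

In a FastAPI handler, a rejected request could then be answered with HTTP 429 and a `Retry-After` header built from `retry_after(...)`, rather than a generic 500.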

### Learner Activity 1: Create Your Own API Service

Build a production API that includes:
1. Multiple endpoints (completion, summarization, translation)
2. API key authentication
3. Request validation and error handling

In [None]:
# TODO: Create a multi-endpoint API service
# Requirements:
# 1. Add authentication with API keys
# 2. Create endpoints for different tasks
# 3. Implement proper error handling
# 4. Add request logging

# Your code here:


In [None]:
# SOLUTION (Hidden - Run this cell to see the solution)
solution_1 = '''
from fastapi import Header, HTTPException
from typing import Optional
import secrets

# Authentication
class APIKeyManager:
    """Manage API keys"""
    def __init__(self):
        # In production, store in database
        self.api_keys = {
            "demo-key-123": {"name": "Demo User", "tier": "free"},
            "premium-key-456": {"name": "Premium User", "tier": "premium"}
        }
    
    def validate_key(self, api_key: str) -> Dict:
        if api_key in self.api_keys:
            return self.api_keys[api_key]
        raise HTTPException(status_code=401, detail="Invalid API key")

# Request Models
class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7

class SummarizationRequest(BaseModel):
    text: str
    max_length: int = 100
    style: str = "bullet_points"  # bullet_points, paragraph, key_points

class TranslationRequest(BaseModel):
    text: str
    source_language: str = "auto"
    target_language: str

# Enhanced Service
class MultiServiceAPI:
    """Multi-endpoint LangChain service"""
    
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")
        self.key_manager = APIKeyManager()
        self.request_log = []
    
    def log_request(self, endpoint: str, user: str, tokens: int):
        """Log API requests"""
        self.request_log.append({
            "timestamp": datetime.now().isoformat(),
            "endpoint": endpoint,
            "user": user,
            "tokens": tokens
        })
    
    async def complete_text(self, request: CompletionRequest, api_key: str):
        """Text completion endpoint"""
        user = self.key_manager.validate_key(api_key)
        
        prompt = ChatPromptTemplate.from_template("{prompt}")
        chain = prompt | self.llm
        
        with get_openai_callback() as cb:
            result = chain.invoke({"prompt": request.prompt})
            self.log_request("completion", user["name"], cb.total_tokens)
            
            return {
                "completion": result.content,
                "tokens_used": cb.total_tokens,
                "cost": cb.total_cost
            }
    
    async def summarize_text(self, request: SummarizationRequest, api_key: str):
        """Text summarization endpoint"""
        user = self.key_manager.validate_key(api_key)
        
        # Different prompts based on style
        prompts = {
            "bullet_points": "Summarize in bullet points:\\n{text}",
            "paragraph": "Summarize in a paragraph:\\n{text}",
            "key_points": "Extract key points:\\n{text}"
        }
        
        prompt = ChatPromptTemplate.from_template(
            prompts.get(request.style, prompts["bullet_points"])
        )
        chain = prompt | self.llm
        
        with get_openai_callback() as cb:
            result = chain.invoke({"text": request.text})
            self.log_request("summarization", user["name"], cb.total_tokens)
            
            return {
                "summary": result.content,
                "style": request.style,
                "tokens_used": cb.total_tokens
            }
    
    async def translate_text(self, request: TranslationRequest, api_key: str):
        """Text translation endpoint"""
        user = self.key_manager.validate_key(api_key)
        
        prompt = ChatPromptTemplate.from_template(
            "Translate the following text to {target_language}:\\n{text}"
        )
        chain = prompt | self.llm
        
        with get_openai_callback() as cb:
            result = chain.invoke({
                "text": request.text,
                "target_language": request.target_language
            })
            self.log_request("translation", user["name"], cb.total_tokens)
            
            return {
                "translation": result.content,
                "source_language": request.source_language,
                "target_language": request.target_language,
                "tokens_used": cb.total_tokens
            }

# FastAPI endpoints
api = MultiServiceAPI()

@app.post("/complete")
async def complete(
    request: CompletionRequest,
    api_key: str = Header(..., alias="X-API-Key")
):
    try:
        return await api.complete_text(request, api_key)
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Completion error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")

@app.post("/summarize")
async def summarize(
    request: SummarizationRequest,
    api_key: str = Header(..., alias="X-API-Key")
):
    try:
        return await api.summarize_text(request, api_key)
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Summarization error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")

@app.post("/translate")
async def translate(
    request: TranslationRequest,
    api_key: str = Header(..., alias="X-API-Key")
):
    try:
        return await api.translate_text(request, api_key)
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Translation error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")

@app.get("/usage")
async def get_usage(
    api_key: str = Header(..., alias="X-API-Key")
):
    """Get API usage statistics"""
    user = api.key_manager.validate_key(api_key)
    user_logs = [log for log in api.request_log if log["user"] == user["name"]]
    
    return {
        "user": user["name"],
        "tier": user["tier"],
        "total_requests": len(user_logs),
        "total_tokens": sum(log["tokens"] for log in user_logs),
        "recent_requests": user_logs[-10:]  # Last 10 requests
    }

print("Multi-endpoint API with authentication created!")
'''

print(solution_1 if input("Show solution? (y/n): ").lower() == 'y' else "Keep trying!")
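
The solution keeps API keys as plaintext dictionary keys, which is fine for a demo. One possible hardening step, sketched below under the assumption that keys are high-entropy random strings, is to store only SHA-256 digests and compare them with `hmac.compare_digest`. `HashedKeyStore`, `hash_key`, and `lookup` are hypothetical names for illustration.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class HashedKeyStore:
    """Keeps digests of API keys instead of the keys themselves (sketch)."""

    def __init__(self, keys: Dict[str, Dict[str, str]]):
        # Only the digests are retained; the plaintext keys are discarded.
        self._records = {hash_key(key): meta for key, meta in keys.items()}

    def lookup(self, presented_key: str) -> Optional[Dict[str, str]]:
        """Return the metadata for a valid key, or None if the key is unknown."""
        digest = hash_key(presented_key)
        for stored_digest, meta in self._records.items():
            # compare_digest avoids leaking match position through timing
            if hmac.compare_digest(stored_digest, digest):
                return meta
        return None
```

A `validate_key` method like the one in the solution could call `lookup` and raise a 401 when it returns `None`.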

## Part 2: Containerization and Deployment

### Instructor Activity 2: Docker Deployment with Best Practices

Let me demonstrate how to containerize your LangChain application with production best practices.

In [None]:
# Create Dockerfile
dockerfile_content = '''
# Multi-stage build for smaller image
FROM python:3.9-slim as builder

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .

# Install Python dependencies
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.9-slim

# Create non-root user
RUN useradd -m -u 1000 appuser

# Set working directory
WORKDIR /app

# Copy dependencies from builder
COPY --from=builder /root/.local /home/appuser/.local

# Copy application code
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

# Add user local bin to PATH
ENV PATH=/home/appuser/.local/bin:$PATH

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

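# Health check (standard library only; urlopen raises on connection errors and HTTP error statuses)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"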

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
'''

# Create docker-compose.yml
docker_compose_content = '''
version: '3.8'

services:
  langchain-api:
    build: .
    container_name: langchain-prod
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - LOG_LEVEL=INFO
      - MAX_WORKERS=4
    volumes:
      - ./logs:/app/logs
      - ./data:/app/data
    restart: unless-stopped
    networks:
      - langchain-network
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G

  redis:
    image: redis:alpine
    container_name: langchain-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - langchain-network
    command: redis-server --appendonly yes

  prometheus:
    image: prom/prometheus
    container_name: langchain-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    networks:
      - langchain-network
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

networks:
  langchain-network:
    driver: bridge

volumes:
  redis-data:
  prometheus-data:
'''

# Create requirements.txt
requirements_content = '''
langchain==0.1.0
langchain-openai==0.0.5
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
python-dotenv==1.0.0
redis==5.0.1
prometheus-client==0.19.0
httpx==0.25.2
python-multipart==0.0.6
PyJWT==2.8.0
python-json-logger==2.0.7
'''

# Environment configuration
class ProductionConfig:
    """Production configuration management"""
    
    def __init__(self):
        self.load_environment()
    
    def load_environment(self):
        """Load environment variables with validation"""
        from dotenv import load_dotenv
        load_dotenv()
        
        # Required variables
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        if not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY is required")
        
        # Optional with defaults
        self.log_level = os.getenv("LOG_LEVEL", "INFO")
        self.max_workers = int(os.getenv("MAX_WORKERS", "4"))
        self.redis_url = os.getenv("REDIS_URL", "redis://localhost:6379")
        self.enable_metrics = os.getenv("ENABLE_METRICS", "true").lower() == "true"
    
    def get_logging_config(self):
        """Get logging configuration"""
        return {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "default": {
                    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
                },
                "json": {
                    "format": "%(asctime)s %(name)s %(levelname)s %(message)s",
                    "class": "pythonjsonlogger.jsonlogger.JsonFormatter"
                }
            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "level": self.log_level,
                    "formatter": "default",
                    "stream": "ext://sys.stdout"
                },
                "file": {
                    "class": "logging.handlers.RotatingFileHandler",
                    "level": self.log_level,
                    "formatter": "json",
                    "filename": "logs/app.log",
                    "maxBytes": 10485760,  # 10MB
                    "backupCount": 5
                }
            },
            "root": {
                "level": self.log_level,
                "handlers": ["console", "file"]
            }
        }

print("Docker configuration created!")
print("\nTo build and run:")
print("1. docker-compose build")
print("2. docker-compose up -d")
print("3. docker-compose logs -f")
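
The cell above only defines the configuration as Python strings; nothing is written to disk, so `docker-compose build` has no files to work with yet. A small helper along these lines could write them out. `write_deployment_files` and the default `deploy` directory are assumptions made for this sketch.

```python
from pathlib import Path
from typing import Dict, List


def write_deployment_files(files: Dict[str, str], out_dir: str = "deploy") -> List[Path]:
    """Write each {filename: content} pair into out_dir and return the paths written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in files.items():
        path = target / name
        # Strip the leading newline left by the opening triple quote
        path.write_text(content.lstrip("\n"), encoding="utf-8")
        written.append(path)
    return written
```

For this notebook, the call might look like `write_deployment_files({"Dockerfile": dockerfile_content, "docker-compose.yml": docker_compose_content, "requirements.txt": requirements_content})`, followed by running the Docker commands from inside that directory alongside `app.py`.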

### Learner Activity 2: Build a Complete Deployment Pipeline

Create a deployment pipeline with:
1. Multi-stage Docker build
2. Environment-specific configurations
3. Kubernetes deployment manifests
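
For item 2, one possible starting point is a small settings loader that picks defaults per environment and lets individual variables override them. This is only a sketch: the `APP_ENV` variable, the `EnvSettings` class, and the default values are assumptions, while `LOG_LEVEL` and `MAX_WORKERS` mirror the variables used earlier in this notebook.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    name: str
    log_level: str
    max_workers: int
    enable_metrics: bool


# Hypothetical per-environment defaults
_DEFAULTS = {
    "development": EnvSettings("development", "DEBUG", 1, False),
    "staging": EnvSettings("staging", "INFO", 2, True),
    "production": EnvSettings("production", "INFO", 4, True),
}


def load_settings(environ: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Select defaults by APP_ENV, then apply LOG_LEVEL / MAX_WORKERS overrides."""
    env = os.environ if environ is None else environ
    name = env.get("APP_ENV", "development")
    if name not in _DEFAULTS:
        raise ValueError(f"Unknown APP_ENV: {name!r}")
    base = _DEFAULTS[name]
    return EnvSettings(
        name=base.name,
        log_level=env.get("LOG_LEVEL", base.log_level),
        max_workers=int(env.get("MAX_WORKERS", base.max_workers)),
        enable_metrics=base.enable_metrics,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.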

In [None]:
# TODO: Create Kubernetes deployment configuration
# Requirements:
# 1. Create deployment.yaml with proper resource limits
# 2. Create service.yaml for load balancing
# 3. Create configmap.yaml for configuration
# 4. Create secret.yaml for sensitive data

# Your code here:


In [None]:
# SOLUTION (Hidden - Run this cell to see the solution)
solution_2 = '''
# Kubernetes Deployment Configuration

# deployment.yaml
deployment_yaml = """\napiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-api
  labels:
    app: langchain
spec:
  replicas: 3
  selector:
    matchLabels:
      app: langchain
  template:
    metadata:
      labels:
        app: langchain
    spec:
      containers:
      - name: langchain-api
        image: langchain-api:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: langchain-secrets
              key: openai-api-key
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: langchain-config
              key: log.level
        - name: REDIS_URL
          value: redis://redis-service:6379
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
"""

# service.yaml
service_yaml = """\napiVersion: v1
kind: Service
metadata:
  name: langchain-service
  labels:
    app: langchain
spec:
  type: LoadBalancer
  selector:
    app: langchain
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
"""

# configmap.yaml
configmap_yaml = """\napiVersion: v1
kind: ConfigMap
metadata:
  name: langchain-config
data:
  log.level: "INFO"
  max.workers: "4"
  rate.limit: "100"
  cache.ttl: "3600"
"""

# secret.yaml
secret_yaml = """\napiVersion: v1
kind: Secret
metadata:
  name: langchain-secrets
type: Opaque
data:
  # Base64 encoded values
  openai-api-key: <base64-encoded-api-key>
  database-url: <base64-encoded-db-url>
"""

# hpa.yaml (Horizontal Pod Autoscaler)
hpa_yaml = """\napiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langchain-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langchain-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
"""

# ingress.yaml
ingress_yaml = """\napiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: langchain-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - api.yourdomain.com
    secretName: langchain-tls
  rules:
  - host: api.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: langchain-service
            port:
              number: 80
"""

# Helm values.yaml for more advanced deployment
helm_values = """\nreplicaCount: 3

image:
  repository: your-registry/langchain-api
  pullPolicy: IfNotPresent
  tag: "1.0.0"

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: api.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: langchain-tls
      hosts:
        - api.yourdomain.com

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  - name: LOG_LEVEL
    value: "INFO"
  - name: REDIS_URL
    value: "redis://redis:6379"

secrets:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: langchain-secrets
        key: openai-api-key
"""

print("Kubernetes deployment configuration created!")
print("\\nTo deploy:")
print("1. kubectl apply -f configmap.yaml")
print("2. kubectl apply -f secret.yaml")
print("3. kubectl apply -f deployment.yaml")
print("4. kubectl apply -f service.yaml")
print("5. kubectl apply -f hpa.yaml")
print("6. kubectl apply -f ingress.yaml")
'''

print(solution_2 if input("Show solution? (y/n): ").lower() == 'y' else "Keep trying!")
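
The `secret.yaml` in the solution leaves placeholders under `data:`, where Kubernetes expects base64-encoded values. A minimal sketch for producing and checking such values in Python follows; the helper names are illustrative.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a plaintext value for the `data:` section of a Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value taken from the `data:` section back to plaintext."""
    return base64.b64decode(encoded).decode("utf-8")
```

Base64 is an encoding, not encryption, so the manifest still needs to be kept out of version control. Kubernetes also accepts plaintext values under `stringData:` and encodes them on write.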

## Part 3: Monitoring, Security, and Optimization

### Instructor Activity 3: Production Monitoring and Security

Let me show you how to implement comprehensive monitoring, security, and optimization for production.

In [None]:
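from prometheus_client import Counter, Histogram, Gauge, generate_latest
import asyncio
import re
from typing import Callable
from functools import lru_cache
import jwt  # provided by the PyJWT package
from fastapi import HTTPException  # raised by SecurityManager.verify_jwt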

# Metrics
request_count = Counter('langchain_requests_total', 'Total requests', ['endpoint', 'status'])
request_duration = Histogram('langchain_request_duration_seconds', 'Request duration', ['endpoint'])
active_sessions = Gauge('langchain_active_sessions', 'Active chat sessions')
token_usage = Counter('langchain_tokens_total', 'Total tokens used', ['model', 'operation'])

# Security Manager
class SecurityManager:
    """Handle security concerns"""
    
    def __init__(self, secret_key: str):
        self.secret_key = secret_key
        self.blocked_ips = set()
        self.rate_limits = {}
    
    def generate_jwt(self, user_id: str, tier: str) -> str:
        """Generate JWT token"""
        from datetime import timezone
        
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        now = datetime.now(timezone.utc)
        payload = {
            "user_id": user_id,
            "tier": tier,
            "exp": now + timedelta(hours=24),
            "iat": now
        }
        return jwt.encode(payload, self.secret_key, algorithm="HS256")
    
    def verify_jwt(self, token: str) -> Dict:
        """Verify JWT token"""
        try:
            return jwt.decode(token, self.secret_key, algorithms=["HS256"])
        except jwt.ExpiredSignatureError:
            raise HTTPException(status_code=401, detail="Token expired")
        except jwt.InvalidTokenError:
            raise HTTPException(status_code=401, detail="Invalid token")
    
    def sanitize_input(self, text: str) -> str:
        """Sanitize user input"""
        # Remove potential injection patterns
        dangerous_patterns = [
            r"<script.*?>.*?</script>",
            r"javascript:",
            r"on\w+=",
            r"eval\(",
            r"exec\("
        ]
        
        import re
        for pattern in dangerous_patterns:
            text = re.sub(pattern, "", text, flags=re.IGNORECASE)
        
        return text.strip()
    
    def check_content_safety(self, text: str) -> bool:
        """Check content for safety issues"""
        # Implement content moderation
        # In production, use OpenAI's moderation API or similar
        return True

# Caching Layer
class CacheManager:
    """Manage response caching"""
    
    def __init__(self, redis_url: str = None):
        self.cache = {}
        self.ttl = 3600  # 1 hour
        
        # Use Redis in production
        if redis_url:
            import redis
            self.redis_client = redis.from_url(redis_url)
        else:
            self.redis_client = None
    
    def get_cache_key(self, prompt: str, params: Dict) -> str:
        """Generate cache key"""
        content = f"{prompt}:{json.dumps(params, sort_keys=True)}"
        # SHA-256 avoids failures on builds where MD5 is disabled (e.g. FIPS mode)
        return hashlib.sha256(content.encode()).hexdigest()
    
    async def get(self, key: str) -> Optional[str]:
        """Get cached response"""
        if self.redis_client:
            value = self.redis_client.get(key)
            return value.decode() if value else None
        return self.cache.get(key)
    
    async def set(self, key: str, value: str, ttl: int = None):
        """Set cached response"""
        ttl = ttl or self.ttl
        if self.redis_client:
            self.redis_client.setex(key, ttl, value)
        else:
            self.cache[key] = value

# Performance Optimizer
class PerformanceOptimizer:
    """Optimize LangChain performance"""
    
    def __init__(self):
        self.model_pool = {}
        self.batch_queue = []
        self.batch_size = 10
        self.batch_timeout = 1.0
    
    @lru_cache(maxsize=5)
    def get_model(self, model_name: str, temperature: float):
        """Get cached model instance"""
        return ChatOpenAI(
            model=model_name,
            temperature=temperature,
            max_retries=3
        )
    
    async def batch_process(self, requests: List[Dict]) -> List[Dict]:
        """Process requests in batches"""
        # Batch similar requests together
        batches = {}
        for req in requests:
            # Use a tuple key so model names containing ":" cannot break the grouping
            key = (req['model'], float(req['temperature']))
            batches.setdefault(key, []).append(req)
        
        results = []
        for (model_name, temp), batch in batches.items():
            model = self.get_model(model_name, temp)
            
            # Process batch
            batch_results = await asyncio.gather(*[
                self.process_single(model, req) for req in batch
            ])
            results.extend(batch_results)
        
        return results
    
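    async def process_single(self, model, request):
        """Process a single request (assumes the prompt is stored under "message")"""
        response = await model.ainvoke(request["message"])
        return {"response": response.content}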

# Complete Production System
class ProductionLangChain:
    """Production-ready LangChain system"""
    
    def __init__(self, config: ProductionConfig):
        self.config = config
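        # Read the signing key from the environment (JWT_SECRET_KEY is an assumed variable name);
        # the fallback is a placeholder for local experiments only
        self.security = SecurityManager(secret_key=os.getenv("JWT_SECRET_KEY", "your-secret-key"))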
        self.cache = CacheManager(redis_url=config.redis_url)
        self.optimizer = PerformanceOptimizer()
        self.llm = ChatOpenAI(model="gpt-3.5-turbo")
    
    async def process_request(self, request: Dict) -> Dict:
        """Process request with all production features"""
        
        # Start metrics; assume failure until a result is actually produced
        start_time = time.time()
        status = "error"
        
        try:
            # Security checks
            sanitized_input = self.security.sanitize_input(request["message"])
            if not self.security.check_content_safety(sanitized_input):
                raise ValueError("Content safety check failed")
            
            # Check cache
            cache_key = self.cache.get_cache_key(
                sanitized_input, 
                {"temperature": request.get("temperature", 0.7)}
            )
            cached_response = await self.cache.get(cache_key)
            
            if cached_response:
                logger.info("Cache hit")
                status = "success"
                # The stored payload says "cached": False, so flag it as a cache hit here
                return {**json.loads(cached_response), "cached": True}
            
            # Process with LangChain
            with get_openai_callback() as cb:
                response = await self.llm.ainvoke(sanitized_input)
                
                # Update metrics
                token_usage.labels(
                    model="gpt-3.5-turbo",
                    operation="chat"
                ).inc(cb.total_tokens)
                
                result = {
                    "response": response.content,
                    "tokens": cb.total_tokens,
                    "cost": cb.total_cost,
                    "cached": False
                }
                
                # Cache response
                await self.cache.set(cache_key, json.dumps(result))
                
                status = "success"
                return result
        
        except Exception as e:
            # Log error; the outcome is counted once in the finally block
            logger.error(f"Request processing error: {e}")
            raise
        
        finally:
            # Record duration and exactly one outcome (success or error) per request
            duration = time.time() - start_time
            request_duration.labels(endpoint="chat").observe(duration)
            request_count.labels(endpoint="chat", status=status).inc()

# Cost Optimization
class CostOptimizer:
    """Optimize API costs"""
    
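    def __init__(self):
        # Illustrative prices in USD per 1,000 tokens; check current provider pricing
        self.model_costs = {
            "gpt-4": {"input": 0.03, "output": 0.06},
            "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
            "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004}
        }
        self.usage_history = []
    
    def record_usage(self, tokens: int, cost: float) -> None:
        """Record one request so get_usage_report() has data to summarize"""
        self.usage_history.append({"tokens": tokens, "cost": cost})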
    
    def select_model(self, task_complexity: str, max_cost: float) -> str:
        """Select appropriate model based on task and budget"""
        if task_complexity == "simple" and max_cost < 0.01:
            return "gpt-3.5-turbo"
        elif task_complexity == "complex" and max_cost > 0.1:
            return "gpt-4"
        else:
            return "gpt-3.5-turbo-16k"
    
    def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost for a request"""
        costs = self.model_costs.get(model, self.model_costs["gpt-3.5-turbo"])
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (output_tokens / 1000) * costs["output"]
        return input_cost + output_cost
    
    def get_usage_report(self) -> Dict:
        """Generate usage and cost report"""
        total_cost = sum(item["cost"] for item in self.usage_history)
        total_tokens = sum(item["tokens"] for item in self.usage_history)
        
        return {
            "total_cost": total_cost,
            "total_tokens": total_tokens,
            "average_cost_per_request": total_cost / len(self.usage_history) if self.usage_history else 0,
            "requests_count": len(self.usage_history)
        }

print("Production monitoring and security system created!")
print("\nKey features implemented:")
print("- JWT authentication")
print("- Input sanitization")
print("- Response caching")
print("- Prometheus metrics")
print("- Cost optimization")
print("- Performance monitoring")
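
To make the per-1K-token arithmetic in `CostOptimizer.estimate_cost` concrete, here is a small standalone sketch. The prices are the illustrative figures from the class above, not current provider rates, and `PRICES_PER_1K` and the free function `estimate_cost` are names chosen only for this example.

```python
# Illustrative prices in USD per 1,000 tokens, copied from CostOptimizer above.
# These are example figures, not current provider pricing.
PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    prices = PRICES_PER_1K[model]
    input_cost = (input_tokens / 1000) * prices["input"]
    output_cost = (output_tokens / 1000) * prices["output"]
    return input_cost + output_cost

# 1,000 prompt tokens and 500 completion tokens on each model
gpt4_cost = estimate_cost("gpt-4", 1000, 500)           # 0.03 + 0.03
gpt35_cost = estimate_cost("gpt-3.5-turbo", 1000, 500)  # 0.001 + 0.001
print(f"gpt-4: ${gpt4_cost:.4f}, gpt-3.5-turbo: ${gpt35_cost:.4f}")
```

With these example prices the same request costs about 30 times more on `gpt-4`, which is the kind of gap that makes per-request model selection worth the effort.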

### Learner Activity 3: Build a Complete Production System

Create a production-ready system with:
1. Complete observability (logs, metrics, traces)
2. A/B testing capability
3. Automatic failover and circuit breakers
4. Cost tracking and optimization
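
As a starting hint for the logging part of observability, one common approach is to emit one JSON object per log line and attach a request ID so that lines from the same request can be correlated. The sketch below uses only the standard library; `JsonFormatter`, `build_logger`, and the `request_id` field are hypothetical names for this example, not part of LangChain or the earlier cells.

```python
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is supplied via `extra=`; it is None on other records
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

def build_logger(name: str = "chat_api") -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    log = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log.handlers = [handler]   # replace any handlers left from earlier runs
    log.setLevel(logging.INFO)
    log.propagate = False      # avoid duplicate output via the root logger
    return log

log = build_logger()
request_id = str(uuid.uuid4())  # one ID per incoming request
log.info("request received", extra={"request_id": request_id})
log.info("llm call finished", extra={"request_id": request_id})
```

The same request ID can later be attached to trace spans and metric labels so that logs, metrics, and traces for one request line up.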

In [None]:
# TODO: Build a complete production system
# Requirements:
# 1. Implement distributed tracing
# 2. Add A/B testing for model selection
# 3. Implement circuit breakers
# 4. Create cost tracking dashboard

# Your code here:


In [None]:
# SOLUTION (Hidden - Run this cell to see the solution)
solution_3 = '''
from typing import Callable
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
import random
from enum import Enum

# Circuit Breaker States
class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

# Circuit Breaker
class CircuitBreaker:
    """Circuit breaker for fault tolerance"""
    
    def __init__(self, failure_threshold: int = 5, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED
    
    async def call(self, func: Callable, *args, **kwargs):
        """Await an async function through the circuit breaker"""
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                # Timeout elapsed: let one trial call through
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            # Await inside the try block so a failing coroutine is counted here
            result = await func(*args, **kwargs)
            if self.state == CircuitState.HALF_OPEN:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
            return result
        
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
            
            raise

# A/B Testing Manager
class ABTestManager:
    """Manage A/B testing for models"""
    
    def __init__(self):
        self.experiments = {}
        self.results = {}
    
    def create_experiment(self, name: str, variants: Dict[str, Dict]):
        """Create new A/B test"""
        self.experiments[name] = {
            "variants": variants,
            "results": {v: {"count": 0, "success": 0, "tokens": 0, "cost": 0} 
                       for v in variants}
        }
    
    def select_variant(self, experiment_name: str, user_id: Optional[str] = None) -> str:
        """Select variant for user"""
        if user_id:
            # Consistent hashing for user
            hash_value = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
            variants = list(self.experiments[experiment_name]["variants"].keys())
            return variants[hash_value % len(variants)]
        else:
            # Random selection
            return random.choice(list(self.experiments[experiment_name]["variants"].keys()))
    
    def record_result(self, experiment: str, variant: str, success: bool, 
                     tokens: int, cost: float):
        """Record experiment result"""
        results = self.experiments[experiment]["results"][variant]
        results["count"] += 1
        if success:
            results["success"] += 1
        results["tokens"] += tokens
        results["cost"] += cost
    
    def get_results(self, experiment: str) -> Dict:
        """Get per-variant metrics (success rate, average tokens, average cost)"""
        results = self.experiments[experiment]["results"]
        
        # Calculate metrics
        for variant, data in results.items():
            if data["count"] > 0:
                data["success_rate"] = data["success"] / data["count"]
                data["avg_tokens"] = data["tokens"] / data["count"]
                data["avg_cost"] = data["cost"] / data["count"]
        
        return results

# Distributed Tracing
class TracingManager:
    """Manage distributed tracing"""
    
    def __init__(self):
        # Register a tracer provider; spans are not exported until a processor is added
        trace.set_tracer_provider(TracerProvider())
        self.tracer = trace.get_tracer(__name__)
        
        # In production, configure actual Jaeger endpoint
        # jaeger_exporter = JaegerExporter(
        #     agent_host_name="localhost",
        #     agent_port=6831,
        # )
        # span_processor = BatchSpanProcessor(jaeger_exporter)
        # trace.get_tracer_provider().add_span_processor(span_processor)
    
    def trace_operation(self, name: str):
        """Create trace span"""
        return self.tracer.start_as_current_span(name)

# Complete Production System
class ProductionSystemComplete:
    """Complete production-ready system"""
    
    def __init__(self):
        self.circuit_breakers = {
            "openai": CircuitBreaker(failure_threshold=5, timeout=60),
            "anthropic": CircuitBreaker(failure_threshold=3, timeout=30)
        }
        self.ab_testing = ABTestManager()
        self.tracing = TracingManager()
        self.cost_tracker = CostTracker()
        
        # Setup A/B test
        self.ab_testing.create_experiment(
            "model_selection",
            {
                "gpt-3.5": {"model": "gpt-3.5-turbo", "temperature": 0.7},
                "gpt-4": {"model": "gpt-4", "temperature": 0.5}
            }
        )
    
    async def process_with_failover(self, request: Dict) -> Dict:
        """Process with automatic failover"""
        
        with self.tracing.trace_operation("process_request"):
            # Select variant for A/B testing
            variant = self.ab_testing.select_variant(
                "model_selection", 
                request.get("user_id")
            )
            model_config = self.ab_testing.experiments["model_selection"]["variants"][variant]
            
            # Try primary provider
            try:
                with self.tracing.trace_operation("primary_call"):
                    result = await self.circuit_breakers["openai"].call(
                        self.call_openai, 
                        request["message"],
                        model_config
                    )
                    
                    # Record A/B test result
                    self.ab_testing.record_result(
                        "model_selection",
                        variant,
                        True,
                        result["tokens"],
                        result["cost"]
                    )
                    
                    # Track costs
                    self.cost_tracker.record_usage(
                        model_config["model"],
                        result["tokens"],
                        result["cost"]
                    )
                    
                    return result
            
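            except Exception as e:
                logger.warning(f"Primary provider failed: {e}")
                
                # Record the failure so the variant's success rate reflects it
                self.ab_testing.record_result(
                    "model_selection", variant, False, 0, 0.0
                )
                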
                # Failover to secondary provider
                with self.tracing.trace_operation("failover_call"):
                    try:
                        return await self.circuit_breakers["anthropic"].call(
                            self.call_anthropic,
                            request["message"]
                        )
                    except Exception as e2:
                        logger.error(f"All providers failed: {e2}")
                        raise
    
    async def call_openai(self, message: str, config: Dict) -> Dict:
        """Call OpenAI API"""
        llm = ChatOpenAI(model=config["model"], temperature=config["temperature"])
        
        with get_openai_callback() as cb:
            response = await llm.ainvoke(message)
            return {
                "response": response.content,
                "tokens": cb.total_tokens,
                "cost": cb.total_cost,
                "provider": "openai"
            }
    
    async def call_anthropic(self, message: str) -> Dict:
        """Call Anthropic API (fallback)"""
        # Implementation for Anthropic
        return {
            "response": "Fallback response",
            "tokens": 100,
            "cost": 0.01,
            "provider": "anthropic"
        }

# Cost Tracking Dashboard
class CostTracker:
    """Track and visualize costs"""
    
    def __init__(self):
        self.usage_data = []
        self.budget_limits = {
            "daily": 100.0,
            "monthly": 2000.0
        }
    
    def record_usage(self, model: str, tokens: int, cost: float):
        """Record usage data"""
        self.usage_data.append({
            "timestamp": datetime.now(),
            "model": model,
            "tokens": tokens,
            "cost": cost
        })
        
        # Check budget
        if self.is_over_budget():
            logger.warning("Budget limit exceeded!")
    
    def is_over_budget(self) -> bool:
        """Check if over budget"""
        today_cost = sum(
            item["cost"] for item in self.usage_data
            if item["timestamp"].date() == datetime.now().date()
        )
        return today_cost > self.budget_limits["daily"]
    
    def get_dashboard_data(self) -> Dict:
        """Get data for cost dashboard"""
        now = datetime.now()
        
        # Calculate metrics
        today_data = [d for d in self.usage_data if d["timestamp"].date() == now.date()]
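        # Match on year and month so the same month of an earlier year is excluded
        month_data = [
            d for d in self.usage_data
            if (d["timestamp"].year, d["timestamp"].month) == (now.year, now.month)
        ]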
        
        return {
            "today": {
                "cost": sum(d["cost"] for d in today_data),
                "tokens": sum(d["tokens"] for d in today_data),
                "requests": len(today_data),
                "budget_remaining": self.budget_limits["daily"] - sum(d["cost"] for d in today_data)
            },
            "month": {
                "cost": sum(d["cost"] for d in month_data),
                "tokens": sum(d["tokens"] for d in month_data),
                "requests": len(month_data),
                "budget_remaining": self.budget_limits["monthly"] - sum(d["cost"] for d in month_data)
            },
            "by_model": self.get_costs_by_model(),
            "hourly_trend": self.get_hourly_trend()
        }
    
    def get_costs_by_model(self) -> Dict:
        """Get costs breakdown by model"""
        costs = {}
        for item in self.usage_data:
            model = item["model"]
            if model not in costs:
                costs[model] = {"cost": 0, "tokens": 0, "count": 0}
            costs[model]["cost"] += item["cost"]
            costs[model]["tokens"] += item["tokens"]
            costs[model]["count"] += 1
        return costs
    
    def get_hourly_trend(self) -> List:
        """Get hourly usage trend"""
        hourly = {}
        for item in self.usage_data:
            hour = item["timestamp"].hour
            if hour not in hourly:
                hourly[hour] = {"cost": 0, "requests": 0}
            hourly[hour]["cost"] += item["cost"]
            hourly[hour]["requests"] += 1
        
        return [
            {"hour": h, **data} 
            for h, data in sorted(hourly.items())
        ]

# Usage example
system = ProductionSystemComplete()

# Process request with all production features
async def main():
    request = {
        "message": "Tell me about Python",
        "user_id": "user123"
    }
    
    result = await system.process_with_failover(request)
    print(f"Result: {result}")
    
    # Get A/B test results
    ab_results = system.ab_testing.get_results("model_selection")
    print(f"A/B Test Results: {ab_results}")
    
    # Get cost dashboard
    dashboard = system.cost_tracker.get_dashboard_data()
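    print(f"Cost Dashboard: {dashboard}")

# To run the example: in a notebook cell use `await main()`;
# in a script, `import asyncio` and call `asyncio.run(main())`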

print("Complete production system with:")
print("- Circuit breakers for fault tolerance")
print("- A/B testing for model selection")
print("- Distributed tracing with Jaeger")
print("- Cost tracking and budget management")
print("- Automatic failover between providers")
'''

print(solution_3 if input("Show solution? (y/n): ").lower() == 'y' else "Keep trying!")

## Summary and Next Steps

### What You've Learned
1. **API Deployment**: FastAPI with proper models, authentication, and rate limiting
2. **Containerization**: Docker best practices and Kubernetes deployment
3. **Security**: JWT authentication, input sanitization, content moderation
4. **Monitoring**: Prometheus metrics, distributed tracing, logging
5. **Optimization**: Caching, batching, model selection, cost tracking
6. **Resilience**: Circuit breakers, failover, A/B testing

### Production Checklist
- [ ] API documentation and versioning
- [ ] Comprehensive error handling
- [ ] Security measures (auth, sanitization, rate limiting)
- [ ] Monitoring and alerting
- [ ] Performance optimization
- [ ] Cost tracking and optimization
- [ ] Deployment automation (CI/CD)
- [ ] Disaster recovery plan
- [ ] Load testing and capacity planning
- [ ] Compliance and data privacy
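
For the load testing item, a rough first pass can be done with `asyncio` before reaching for a dedicated tool. The sketch below fires a fixed number of requests with bounded concurrency and reports latency percentiles. `fake_endpoint` is a hypothetical stand-in that only sleeps; in practice you would replace it with a real HTTP call to your API.

```python
import asyncio
import random
import statistics
import time

async def fake_endpoint() -> None:
    """Hypothetical stand-in for a real call to the chat API."""
    await asyncio.sleep(random.uniform(0.001, 0.01))

async def timed_call(semaphore: asyncio.Semaphore) -> float:
    """Run one call under the concurrency limit and return its latency in seconds."""
    async with semaphore:
        start = time.perf_counter()
        await fake_endpoint()
        return time.perf_counter() - start

async def run_load_test(total_requests: int = 50, concurrency: int = 10) -> dict:
    """Issue total_requests calls, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)
    latencies = await asyncio.gather(
        *(timed_call(semaphore) for _ in range(total_requests))
    )
    # 99 cut points; index 49 is the median, index 94 the 95th percentile
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "requests": len(latencies),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(latencies),
    }

# In a notebook, use `report = await run_load_test()` instead of asyncio.run
report = asyncio.run(run_load_test())
print(report)
```

Tail latency (p95 and above) is usually the more useful number for capacity planning, because averages hide the slow requests that users actually notice.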

### Best Practices
1. **Start small**: Deploy MVP, then iterate
2. **Monitor everything**: You can't improve what you don't measure
3. **Plan for failure**: Everything will fail eventually
4. **Optimize costs**: Track and optimize API usage
5. **Security first**: Never compromise on security
6. **Document everything**: Future you will thank you
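
"Plan for failure" often starts with retrying transient errors before a circuit breaker or failover takes over. The following is a minimal sketch of retry with exponential backoff and jitter, using only the standard library. `retry_with_backoff` and `flaky_call` are hypothetical names for this example, and the delay values are arbitrary.

```python
import asyncio
import random

async def retry_with_backoff(func, *, attempts=3, base_delay=0.5, max_delay=8.0):
    """Await func() up to `attempts` times, sleeping between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt, capped at max_delay; the random
            # factor (jitter) spreads out retries from many clients
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))

# Demo: a hypothetical call that fails twice, then succeeds
calls = {"count": 0}

async def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

# In a notebook, use `await retry_with_backoff(...)` instead of asyncio.run
result = asyncio.run(retry_with_backoff(flaky_call, base_delay=0.01))
print(result, calls["count"])
```

In a real service you would likely retry only on errors known to be transient (timeouts, rate limits) rather than on every exception.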

### Further Learning
- LangSmith for production monitoring
- LangServe for deployment
- Cloud provider specific optimizations (AWS, GCP, Azure)
- Advanced caching strategies
- Multi-region deployment
- GraphQL APIs for LangChain

### Final Project Ideas
1. Build a production-ready chatbot with full observability
2. Create a multi-tenant LangChain SaaS platform
3. Implement a cost-optimized document processing pipeline
4. Deploy a globally distributed AI API with <100ms latency

Congratulations! You've completed the LangChain for Beginners course and are ready to deploy production applications! 🚀