# Machine Learning Model Deployment Techniques with FastAPI

This tutorial demonstrates various techniques for deploying machine learning models in production environments using FastAPI. We'll cover the following deployment strategies:

1. Single Deployment
2. Silent Deployment
3. Canary Deployment
4. Multi-armed Bandit Deployment

We'll use a simple example scenario with two models: a current model in production and a new model we want to deploy. We'll implement these deployments using FastAPI and discuss their pros and cons.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
import random

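# Set random seeds for reproducibility (NumPy and the standard library's random module,
# which the routing logic below relies on)
np.random.seed(42)
random.seed(42)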

## Data Generation and Model Training

Let's start by generating some synthetic data and training two models: a "current" model (Logistic Regression) and a "new" model (Random Forest) that we want to deploy.

In [None]:
# Generate synthetic data
X, y = make_classification(n_samples=10000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train "current" model (Logistic Regression)
current_model = LogisticRegression(random_state=42)
current_model.fit(X_train, y_train)

# Train "new" model (Random Forest)
new_model = RandomForestClassifier(n_estimators=100, random_state=42)
new_model.fit(X_train, y_train)

print(f"Current model accuracy: {accuracy_score(y_test, current_model.predict(X_test)):.4f}")
print(f"New model accuracy: {accuracy_score(y_test, new_model.predict(X_test)):.4f}")

# Save models
joblib.dump(current_model, 'current_model.joblib')
joblib.dump(new_model, 'new_model.joblib')

## FastAPI Setup

Now, let's set up a basic FastAPI application and define a Pydantic model for our input data.

In [None]:
app = FastAPI()

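from typing import List

class InputData(BaseModel):
    # One sample's feature vector; the models above were trained on 20 features
    features: List[float]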

@app.get("/")
def read_root():
    return {"message": "Welcome to the ML Model Deployment API"}

## 1. Single Deployment

In single deployment, we replace the current model with the new model all at once: from the moment of the switch, every request is served by the new model.

In [None]:
# Load the new model
model = joblib.load('new_model.joblib')

@app.post("/predict/single")
def predict_single(data: InputData):
    try:
        features = np.array(data.features).reshape(1, -1)
        prediction = model.predict(features)[0]
        return {"prediction": int(prediction)}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

To run this single deployment:

```python
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
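
Once the server is running, a prediction can be requested by POSTing a JSON body that matches `InputData`. The sketch below uses only the standard library. It assumes the server is reachable at `http://localhost:8000` (the port used above), and the helper names `build_payload` and `request_prediction` are illustrative, not part of the tutorial's API.

```python
import json
import urllib.request

def build_payload(features):
    """Serialize a feature vector into the JSON body expected by InputData."""
    return json.dumps({"features": [float(x) for x in features]}).encode("utf-8")

def request_prediction(features, url="http://localhost:8000/predict/single"):
    """POST the features to the endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(features),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# The models above were trained on 20 features, so send 20 values.
example_features = [0.0] * 20
payload = build_payload(example_features)
```

With the server up, `request_prediction(example_features)` should return a dictionary containing a `prediction` key, mirroring the endpoint's return value.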

Pros of Single Deployment:
- Simple to implement and understand
- Quick to roll out, since there is no traffic splitting or side-by-side comparison to manage

Cons of Single Deployment:
- Risky - if the new model performs poorly, it affects all users immediately
- No gradual transition period
- Difficult to revert quickly if issues arise
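
The rollback concern can be reduced by keeping the previous model in memory and swapping references instead of reloading from disk. The following is a minimal sketch rather than part of the original code: `ModelSlot` is a hypothetical helper, and the lambda "models" stand in for objects loaded with `joblib.load`.

```python
import threading

class ModelSlot:
    """Holds the active model and remembers the previous one for rollback."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._active = model
        self._previous = None

    def deploy(self, new_model):
        """Swap in a new model, keeping the old one available."""
        with self._lock:
            self._previous = self._active
            self._active = new_model

    def rollback(self):
        """Restore the previous model; returns False if there is none."""
        with self._lock:
            if self._previous is None:
                return False
            self._active, self._previous = self._previous, None
            return True

    def predict(self, features):
        # Grab the reference under the lock, then predict outside it
        with self._lock:
            model = self._active
        return model(features)

# Stand-in "models": plain callables instead of fitted estimators.
slot = ModelSlot(lambda features: "current")
slot.deploy(lambda features: "new")
after_deploy = slot.predict([0.0])
rolled_back = slot.rollback()
after_rollback = slot.predict([0.0])
```

An endpoint would call `slot.predict(...)` instead of referencing a module-level `model`, so a revert becomes a single `slot.rollback()` call.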

## 2. Silent Deployment

In silent deployment (also known as shadow deployment), we run the new model alongside the current model on the same requests, but use its predictions only for logging and comparison. Users always receive the current model's output.

In [None]:
# Load both models
current_model = joblib.load('current_model.joblib')
new_model = joblib.load('new_model.joblib')

@app.post("/predict/silent")
def predict_silent(data: InputData):
    try:
        features = np.array(data.features).reshape(1, -1)
        current_prediction = current_model.predict(features)[0]
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

    # Run the new model separately so that its failures never affect the response
    try:
        new_prediction = new_model.predict(features)[0]
        # Log the predictions (in a real scenario, you'd use a proper logging system)
        print(f"Current model prediction: {current_prediction}, New model prediction: {new_prediction}")
    except Exception as e:
        print(f"New model failed: {e}")

    # Return only the current model's prediction
    return {"prediction": int(current_prediction)}

Pros of Silent Deployment:
- Very safe - users only ever receive the current model's predictions
- Allows thorough testing and comparison in the production environment
- Provides real-world performance data for the new model

Cons of Silent Deployment:
- Requires additional computational resources and adds latency, since every request runs both models
- Delays the benefits of the new model (if it performs better)
- May require significant logging and analysis infrastructure
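
The logged prediction pairs are only useful once they are summarized. As one possible analysis, assuming each log record is a `(current_prediction, new_prediction)` tuple (a format chosen here for illustration, with made-up sample data), the agreement rate between the two models can be computed like this:

```python
from collections import Counter

def summarize_silent_log(pairs):
    """Return the agreement rate and a count of each (current, new) outcome."""
    if not pairs:
        raise ValueError("no logged predictions to summarize")
    outcomes = Counter(pairs)
    agreed = sum(n for (cur, new), n in outcomes.items() if cur == new)
    return agreed / len(pairs), outcomes

# Made-up log records, not real results
logged = [(0, 0), (1, 1), (1, 0), (0, 0), (1, 1)]
agreement, outcomes = summarize_silent_log(logged)
print(f"Agreement rate: {agreement:.2f}")
```

Agreement alone does not say which model is right. Where ground-truth labels arrive later, the disagreeing cases are the ones worth inspecting first.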

## 3. Canary Deployment

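In canary deployment, we route a small percentage of traffic to the new model and increase that share over time if performance is satisfactory. The example below picks a model at random for each request, so the same user may be served by either model on different requests.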

In [None]:
# Load both models
current_model = joblib.load('current_model.joblib')
new_model = joblib.load('new_model.joblib')

# Canary deployment settings
canary_percentage = 0.1  # 10% of traffic goes to the new model

@app.post("/predict/canary")
def predict_canary(data: InputData):
    try:
        features = np.array(data.features).reshape(1, -1)
        
        if random.random() < canary_percentage:
            # Use new model
            prediction = new_model.predict(features)[0]
            model_used = "new"
        else:
            # Use current model
            prediction = current_model.predict(features)[0]
            model_used = "current"
        
        return {"prediction": int(prediction), "model_used": model_used}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
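
The fixed `canary_percentage` above covers only one step of a rollout. A gradual rollout can be sketched as a schedule that advances while the new model stays healthy. The stages, the error-rate threshold, and the `next_canary_percentage` helper are all assumptions made for illustration:

```python
# Hypothetical rollout stages: fraction of traffic sent to the new model.
ROLLOUT_STAGES = [0.01, 0.1, 0.25, 0.5, 1.0]

def next_canary_percentage(current, new_model_error_rate, max_error_rate=0.05):
    """Advance to the next stage if healthy, otherwise drop to 0 (rollback)."""
    if new_model_error_rate > max_error_rate:
        return 0.0
    for stage in ROLLOUT_STAGES:
        if stage > current:
            return stage
    return current  # Already at the final stage

step_up = next_canary_percentage(0.1, new_model_error_rate=0.02)
rolled_back = next_canary_percentage(0.25, new_model_error_rate=0.20)
final = next_canary_percentage(1.0, new_model_error_rate=0.0)
```

In practice the error rate would come from monitoring, and the function would be called on a timer or after a fixed number of requests.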

Pros of Canary Deployment:
- Reduces risk by limiting exposure to the new model
- Allows for gradual rollout and monitoring
- Easier to roll back if issues are detected

Cons of Canary Deployment:
- More complex to implement and manage
- May lead to inconsistent user experiences during the transition
- Requires careful monitoring and decision-making during the rollout process
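
One way to limit inconsistent experiences is to make the assignment sticky, so that a given user always hits the same model for a given percentage. The sketch below hashes a user identifier into a bucket. The `user_id` value is an assumption, since the `InputData` model above carries only features.

```python
import hashlib

def uses_new_model(user_id: str, canary_percentage: float) -> bool:
    """Deterministically map a user to the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Top 53 bits of the hash as a float in [0, 1)
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < canary_percentage

# The same user always gets the same answer for a given percentage.
assignments = [uses_new_model(f"user-{i}", 0.1) for i in range(10_000)]
canary_share = sum(assignments) / len(assignments)
```

Because each user's bucket is fixed, raising the percentage only adds users to the canary group. Nobody already on the new model is moved back to the current one.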

## 4. Multi-armed Bandit Deployment

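The multi-armed bandit approach dynamically shifts traffic toward the model that performs best, continually learning from feedback. The example below uses an epsilon-greedy policy: with probability `epsilon` it picks a model at random (exploration), and otherwise it picks the model with the higher observed success rate (exploitation).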

In [None]:
# Load both models
current_model = joblib.load('current_model.joblib')
new_model = joblib.load('new_model.joblib')

# Multi-armed bandit settings
epsilon = 0.1  # Probability of exploring (choosing a model at random)
current_model_correct = 0
new_model_correct = 0
current_model_count = 0
new_model_count = 0

@app.post("/predict/bandit")
def predict_bandit(data: InputData):
    global current_model_correct, new_model_correct, current_model_count, new_model_count
    
    try:
        features = np.array(data.features).reshape(1, -1)
        
        if random.random() < epsilon:
            # Explore: randomly choose a model
            use_new_model = random.choice([True, False])
        else:
            # Exploit: choose the model with the higher success rate
            current_rate = current_model_correct / current_model_count if current_model_count > 0 else 0
            new_rate = new_model_correct / new_model_count if new_model_count > 0 else 0
            use_new_model = new_rate >= current_rate
        
        if use_new_model:
            prediction = new_model.predict(features)[0]
            new_model_count += 1
            model_used = "new"
        else:
            prediction = current_model.predict(features)[0]
            current_model_count += 1
            model_used = "current"
        
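        # In a real-world scenario, you'd update the correct predictions based on feedback
        # (e.g., ground-truth labels that arrive later). Here, we simulate feedback with a
        # coin flip that ignores the prediction, so both models converge to about 80%.
        if random.random() < 0.8:  # Simulated 80% accuracy for whichever model was used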
            if use_new_model:
                new_model_correct += 1
            else:
                current_model_correct += 1
        
        return {"prediction": int(prediction), "model_used": model_used}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

Pros of Multi-armed Bandit Deployment:
- Automatically adapts to the best-performing model
- Balances exploration of the new model with exploitation of the best-known model
- Can handle multiple models simultaneously

Cons of Multi-armed Bandit Deployment:
- More complex to implement and understand
- Requires a feedback mechanism to update model performance
- May lead to inconsistent user experiences
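
The endpoint above simulates feedback with a coin flip, so both models look equally good. To illustrate how the policy behaves when feedback actually differs between models, here is a self-contained epsilon-greedy sketch. The `EpsilonGreedyRouter` class and the simulated accuracies (0.70 and 0.85) are made up for illustration. In production, `record_feedback` would be called from wherever ground-truth labels become available. A lock guards the counters because FastAPI runs plain `def` endpoints in a thread pool.

```python
import random
import threading

class EpsilonGreedyRouter:
    """Epsilon-greedy choice between named models based on observed success rates."""

    def __init__(self, names, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        self._lock = threading.Lock()
        # Counts are updated when feedback arrives, not when a model is chosen
        self.counts = {name: 0 for name in names}
        self.successes = {name: 0 for name in names}

    def _rate(self, name):
        n = self.counts[name]
        return self.successes[name] / n if n else 0.0

    def choose(self):
        with self._lock:
            untried = [n for n, c in self.counts.items() if c == 0]
            if untried:
                return untried[0]  # Try every model at least once
            if self._rng.random() < self.epsilon:
                return self._rng.choice(list(self.counts))  # Explore
            return max(self.counts, key=self._rate)  # Exploit

    def record_feedback(self, name, correct):
        with self._lock:
            self.counts[name] += 1
            self.successes[name] += int(correct)

# Simulation with made-up true accuracies
true_accuracy = {"current": 0.70, "new": 0.85}
router = EpsilonGreedyRouter(list(true_accuracy), epsilon=0.1, seed=42)
feedback_rng = random.Random(0)
for _ in range(5000):
    name = router.choose()
    router.record_feedback(name, feedback_rng.random() < true_accuracy[name])

print(router.counts)
```

Because the simulated "new" model is more accurate, most traffic should drift toward it over time, while the `epsilon` share keeps sampling the other model in case the ranking changes.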

## Running the FastAPI Application

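To run the FastAPI application with all deployment strategies, save the code above as a Python script and run it. Calling `uvicorn.run` from inside a notebook cell typically fails because the notebook already has an event loop running.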

In [None]:
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

## Choosing the Right Deployment Strategy

The choice of deployment strategy depends on various factors specific to your project and organization. Here are some guidelines:

1. **Single Deployment**: Use when you're very confident in the new model's performance and can tolerate a brief interruption while the model is swapped. Suitable for non-critical applications or when you have a robust, tested rollback plan.

2. **Silent Deployment**: Ideal when you want to thoroughly test a new model in a production environment without any risk. Use this when you have the resources to run both models simultaneously and can afford the time for extended testing.

3. **Canary Deployment**: Good for gradually introducing a new model while closely monitoring its performance. Use this when you want to limit potential negative impacts and have the capability to quickly adjust the traffic distribution or roll back if needed.

4. **Multi-armed Bandit**: Best when you want to dynamically optimize model selection in real-time. Use this when you have multiple models to compare, can handle some inconsistency in user experience, and have a reliable feedback mechanism to evaluate model performance quickly.

## Conclusion

In this tutorial, we've explored four different strategies for deploying machine learning models using FastAPI:

1. Single Deployment
2. Silent Deployment
3. Canary Deployment
4. Multi-armed Bandit Deployment

Each strategy has its own strengths and weaknesses, and the choice depends on your specific requirements, risk tolerance, and resources. By understanding these deployment techniques, you can make informed decisions about how to roll out new models in your production environment.

Remember that successful deployment often involves a combination of these strategies and may require additional considerations such as:

- Monitoring and logging
- A/B testing frameworks
- Automated rollback mechanisms
- Performance optimization
- Scalability and load balancing
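
As an illustration of how monitoring can feed an automated rollback, the sketch below tracks request outcomes for the new model in a sliding window and signals a rollback when the error rate crosses a threshold. The class name, window size, and threshold are hypothetical choices, not recommendations.

```python
from collections import deque

class RollbackMonitor:
    """Tracks recent outcomes and flags when the error rate is too high."""

    def __init__(self, window_size=100, max_error_rate=0.2, min_samples=20):
        self.outcomes = deque(maxlen=window_size)  # True means the request failed
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, failed):
        self.outcomes.append(bool(failed))

    def should_rollback(self):
        # Avoid reacting to a handful of early requests
        if len(self.outcomes) < self.min_samples:
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.max_error_rate

monitor = RollbackMonitor(window_size=50, max_error_rate=0.2, min_samples=10)
for _ in range(30):
    monitor.record(failed=False)
healthy = monitor.should_rollback()
for _ in range(20):
    monitor.record(failed=True)
degraded = monitor.should_rollback()
```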

As you implement these strategies in real-world scenarios, you'll likely need to adapt and combine them to suit your specific needs. The key is to prioritize safe, controlled, and reversible deployments that allow you to confidently improve your machine learning models in production.