✅ **Module 3.3: Serving a Model via REST API** 

## 🎯 **Learning Objectives**

### 1️⃣ **Log a Model with MLflow for Serving**

* **What it means:**
  Logging a model with MLflow means saving your trained model in a standardized format, making it ready to deploy or share with others.

* **Detailed Steps:**

  * Train your machine learning model (e.g., Linear Regression, Random Forest, etc.).
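  * Use a flavor-specific function such as `mlflow.sklearn.log_model()` to store the model together with its metadata (flavor, dependencies, and optionally a signature). Parameters and metrics are logged separately with `mlflow.log_param()` and `mlflow.log_metric()`.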
  * MLflow stores this information so it can later serve the model or load it for predictions.

* **Why it matters:**
  Logging ties the model to the run that produced it, so it stays reproducible, organized, and easy to retrieve by run ID when you serve it.

---

### 2️⃣ **Serve the Model Using `mlflow models serve`**

* **What it means:**
  Serving means deploying your logged model on a server, turning it into a RESTful API that accepts prediction requests and sends back predictions.

* **Detailed Steps:**

  * After logging the model, run the following command in your terminal:

    ```bash
    mlflow models serve -m runs:/<RUN_ID>/model_name -p 5001 --env-manager local
    ```
  * `runs:/<RUN_ID>/model_name` is the model URI: the run ID followed by the artifact path you passed to `log_model()`.
  * `-p 5001` sets the port the API listens on.
  * `--env-manager local` serves the model from the current Python environment instead of building a new one. MLflow 1.x used `--no-conda` for this; that flag was later replaced by `--env-manager`, so on MLflow 2.x and newer use the form shown here.

* **Why it matters:**
  Serving models through a REST API allows easy integration with web apps, mobile apps, and other software, enabling real-time predictions.
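
If you would rather launch the server from a script than type the command by hand, the same arguments can be assembled in Python. The sketch below is illustrative only: `build_serve_command` is a hypothetical helper, not part of MLflow, and the `--env-manager local` flag assumes MLflow 2.x or later (1.x used `--no-conda`).

```python
import shlex


def build_serve_command(run_id, artifact_path, port=5001, env_manager="local"):
    """Return the `mlflow models serve` invocation as an argument list.

    Hypothetical helper for illustration; it only assembles strings and
    does not check that the run or the artifact path exists.
    """
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return [
        "mlflow", "models", "serve",
        "-m", model_uri,
        "-p", str(port),
        "--env-manager", env_manager,  # assumes MLflow 2.x or later
    ]


# "abc123" is a placeholder; use the run ID printed when you logged the model.
cmd = build_serve_command("abc123", "linear_model")
print(shlex.join(cmd))
```

Passing the list to `subprocess.Popen(cmd)` starts the server without going through a shell; the process keeps running until it is terminated.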

---

### 3️⃣ **Send REST API Requests with JSON Payload**

* **What it means:**
  Once your model is served, it expects data for predictions. REST API requests are how you send this data (input features) to your model.

* **Detailed Steps:**

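  * Build the input as JSON, usually from a pandas DataFrame or Python dictionary. MLflow 2.x and later expect the rows wrapped under a key such as `dataframe_records` or `dataframe_split`; a bare list of records is rejected:

    ```python
    import requests
    import pandas as pd

    url = "http://127.0.0.1:5001/invocations"
    df = pd.DataFrame({"feature1": [0.5], "feature2": [1.5]})
    # Wrap the rows under the "dataframe_records" key expected by MLflow 2.x+
    payload = {"dataframe_records": df.to_dict(orient="records")}
    headers = {"Content-Type": "application/json"}

    # json= serializes the payload; the header is set explicitly for clarity
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    print(response.json())
    ```
  * `requests.post()` sends the data to the model endpoint and returns the server's response.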

* **Why it matters:**
  JSON payloads are a universal way to send structured data over the web. Using JSON ensures your API can easily communicate with various software systems.
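
pandas is convenient but not required for building the request body. As a rough sketch, the function below converts a list of row dictionaries into the split-oriented layout using only the standard library. It assumes the `dataframe_split` key accepted by MLflow 2.x and later; `to_dataframe_split` is a hypothetical helper name.

```python
import json


def to_dataframe_split(rows):
    """Convert a list of row dicts into a {"dataframe_split": ...} payload.

    Column order is taken from the first row; every row must have the
    same keys. Hypothetical helper, shown for illustration.
    """
    if not rows:
        raise ValueError("at least one row is required")
    columns = list(rows[0])
    for row in rows:
        if set(row) != set(columns):
            raise ValueError(f"row keys {sorted(row)} do not match {columns}")
    data = [[row[col] for col in columns] for row in rows]
    return {"dataframe_split": {"columns": columns, "data": data}}


rows = [{"feature1": 0.5, "feature2": 1.5}, {"feature1": -1.0, "feature2": 2.0}]
body = json.dumps(to_dataframe_split(rows))
print(body)
```

The split layout sends the column names once, which keeps the body smaller than the records layout when posting many rows.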

---

### 4️⃣ **Interpret API Response from `/invocations` Endpoint**

* **What it means:**
  The `/invocations` endpoint is where the REST API serves predictions. The response usually includes predictions made by the model based on the input provided.

* **Detailed Steps:**

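  * When you send data to `/invocations`, the API returns predictions as JSON. On MLflow 2.x and later the values are wrapped under a `predictions` key (1.x releases returned the bare list). For a regression model:

    ```json
    {"predictions": [20.3]}
    ```

    For a scikit-learn classifier, the served model calls `predict()`, so the response carries class labels rather than probabilities:

    ```json
    {"predictions": [1]}
    ```
  * Your application reads the `predictions` list and acts on it (e.g., showing predictions to users or making automated decisions).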

* **Why it matters:**
  Clearly interpreting API responses allows your applications or workflows to make accurate and informed decisions based on real-time model predictions.
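
Client code that may talk to servers of different MLflow versions can normalize the response before using it. A minimal sketch, assuming the two shapes described above (a `predictions` wrapper or a bare list); `extract_predictions` is a hypothetical helper, not an MLflow function.

```python
import json


def extract_predictions(body):
    """Return the list of predictions from a scoring-server response body.

    Accepts a JSON string or an already-decoded object. Handles both the
    {"predictions": [...]} wrapper and a bare list.
    """
    parsed = json.loads(body) if isinstance(body, (str, bytes)) else body
    if isinstance(parsed, dict):
        if "predictions" not in parsed:
            raise ValueError(f"no 'predictions' key in response: {parsed}")
        parsed = parsed["predictions"]
    if not isinstance(parsed, list):
        raise ValueError(f"unexpected response shape: {parsed!r}")
    return parsed


print(extract_predictions('{"predictions": [20.3]}'))
print(extract_predictions([1, 0]))
```

With `requests`, the helper can be called as `extract_predictions(response.json())`.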



```python
# 📓 Module 3.3: Serving a Model via REST API
# Goal: Serve MLflow models as REST endpoints and test them with requests

# ⚠️ This notebook shows the steps, but serving only works locally or on cloud VMs (not in Colab)

# ✅ Step 1: Train and log a model to be served
!pip install -q mlflow scikit-learn pandas

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import pandas as pd

# Generate data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)
X_df = pd.DataFrame(X, columns=["feature1", "feature2"])

# Train on the DataFrame so the model records the feature names used in requests
model = LinearRegression().fit(X_df, y)

# Log model
mlflow.set_experiment("serve-linear-model")

with mlflow.start_run():
    mlflow.sklearn.log_model(model, "linear_model")
    run_id = mlflow.active_run().info.run_id
    print("Model logged with run ID:", run_id)

# ✅ Step 2: Serve the model (run in terminal or CLI)
print("""
⚙️ To serve the model, run this command in your terminal:

mlflow models serve -m runs:/<RUN_ID>/linear_model -p 5001 --env-manager local

Replace <RUN_ID> with the actual run_id printed above.
(On MLflow 1.x, use --no-conda instead of --env-manager local.)

Then open another terminal or Python script to send a request.
""")

# ✅ Step 3: Send a request to the REST endpoint
print("""
📤 Example request (run separately in Python or terminal):

import requests
import pandas as pd

url = "http://127.0.0.1:5001/invocations"
df = pd.DataFrame({"feature1": [0.5], "feature2": [1.5]})
payload = {"dataframe_records": df.to_dict(orient="records")}
headers = {"Content-Type": "application/json"}

response = requests.post(url, json=payload, headers=headers)
print("Prediction:", response.json())
""")
```


2025/07/30 19:03:08 INFO mlflow.tracking.fluent: Experiment with name 'serve-linear-model' does not exist. Creating a new experiment.


Model logged with run ID: e1f6ac60eb134a0aa9d31d74d0ed7ecb
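
The request code from Step 3 can be tried even where `mlflow models serve` cannot run, by pointing it at a local stand-in. The sketch below is not MLflow: `StubInvocationsHandler` is a hypothetical stub that only mimics the `/invocations` route and the `{"predictions": [...]}` response shape, returning the sum of each row's values in place of a real prediction. It uses only the standard library.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubInvocationsHandler(BaseHTTPRequestHandler):
    """Stand-in for a model server: returns the sum of each row's values."""

    def do_POST(self):
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        rows = payload["dataframe_records"]
        body = json.dumps({"predictions": [sum(r.values()) for r in rows]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep notebook output quiet


def post_records(url, records):
    """POST rows to an /invocations endpoint and return the decoded JSON."""
    data = json.dumps({"dataframe_records": records}).encode()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # An empty ProxyHandler bypasses any configured HTTP proxy for the local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


# Port 0 asks the OS for any free port, so the stub never collides with 5001.
server = HTTPServer(("127.0.0.1", 0), StubInvocationsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/invocations"
result = post_records(url, [{"feature1": 0.5, "feature2": 1.5}])
print(result)

server.shutdown()
server.server_close()
```

Once a real server is running, the same `post_records` call works against `http://127.0.0.1:5001/invocations`.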


## 📝 Assessment: Serving a Model via REST API

### 📘 Multiple Choice (Correct answers in **bold**)    

**1. What is the purpose of the MLflow command `mlflow models serve`?**    
A. Register a model in the model registry    
**B. Start a REST server for the model endpoint** ✅    
C. Launch the MLflow UI    
D. Trigger a batch training job    

---

**2. Which endpoint does MLflow use for model inference by default?**   
A. `/predict`   
**B. `/invocations`** ✅   
C. `/model/serve`   
D. `/run`   

---

**3. Which content type must you specify in the REST request header when sending a pandas DataFrame in JSON?**   
A. `application/csv`   
**B. `application/json`** ✅   
C. `text/plain`   
D. `application/x-www-form-urlencoded`   

---

**4. What is the purpose of the `--no-conda` flag (written `--env-manager local` on MLflow 2.x and later) in the `mlflow models serve` command?**   
A. Disables environment logging   
**B. Serves the model from the current Python environment instead of creating a new one** ✅   
C. Skips model versioning   
D. Serves the model using Docker   

---

### ✏️ Short Answer   

**5. Why might you choose to serve your model via REST API instead of batch prediction?**   
*To enable real-time, on-demand inference where clients (e.g., web apps or microservices) can query the model interactively.*   

---

**6. What could cause a REST request to fail when serving a model? List two common reasons.**   

* The server is not running on the correct port   
* The input JSON is improperly formatted or missing required columns   
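
The second failure can be caught on the client before the request is sent. A minimal sketch, assuming the payload is a list of row dictionaries and that the caller knows the required column names (`feature1` and `feature2` here); `find_missing_columns` is a hypothetical helper.

```python
def find_missing_columns(records, required):
    """Map each row index to the required columns that row lacks."""
    problems = {}
    for index, row in enumerate(records):
        missing = [col for col in required if col not in row]
        if missing:
            problems[index] = missing
    return problems


records = [
    {"feature1": 0.5, "feature2": 1.5},
    {"feature1": 0.7},  # feature2 is missing from this row
]
problems = find_missing_columns(records, ["feature1", "feature2"])
if problems:
    print("Rows with missing columns:", problems)
```

An empty result means every row carries all required columns and the payload is safe to send.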

---

### 🧪 Mini Project

**7. Task:**   

* Log a simple classification model (e.g., logistic regression)   
* Serve it locally with `mlflow models serve`   
* Use Python’s `requests` module to POST a test sample in JSON format   
* Display the model’s prediction response   
