**Module 2.4: Model Logging with Flavors**
## 🎯 **Learning Objectives Expanded**

### 1️⃣ **Understand MLflow Model Flavors: sklearn, pyfunc, python\_function, and Custom**

* **What it means:**
  MLflow supports various formats ("flavors") to save and load models, each optimized for different model types or frameworks.

* **Detailed Explanation:**

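  * **sklearn Flavor:** For Scikit-learn estimators and pipelines; saved and loaded through `mlflow.sklearn`, which returns the original Scikit-learn object on load.
  * **pyfunc Flavor:** A generic Python-function interface; any model saved with it loads through `mlflow.pyfunc.load_model()` and exposes a uniform `predict()` method, so arbitrary Python code can be wrapped for predictions.
  * **python\_function:** The flavor name that `mlflow.pyfunc` writes into a model's `MLmodel` file; in practice the two terms refer to the same thing.
  * **Custom Flavors:** Let you define your own logic, for example by subclassing `mlflow.pyfunc.PythonModel`, to package preprocessing, postprocessing, or non-standard prediction methods with the model.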

* **Why it matters:**
  Using appropriate flavors ensures your models are portable, easily deployable, and consistent across environments.

---

### 2️⃣ **Log Models Using the Scikit-learn Flavor**

* **What it means:**
  Saving trained Scikit-learn models using MLflow’s native sklearn format.

* **Detailed Steps:**

  * Train your Scikit-learn model:

    ```python
    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression().fit(X_train, y_train)
    ```
  * Log the model with MLflow:

    ```python
    import mlflow.sklearn
    mlflow.sklearn.log_model(model, "sklearn_model")
    ```

* **Why it matters:**
  Keeps the model loadable with a single call and deployable through MLflow's standard tooling, because the flavor records how the model was serialized and which dependencies it needs.

---

### 3️⃣ **Log Manually Pickled Models as Generic Artifacts**

* **What it means:**
  Manually saving models (or any objects) using Python's pickle library and storing them as generic MLflow artifacts.

* **Detailed Steps:**

  * Serialize (pickle) your model:

    ```python
    import pickle
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
    ```
  * Log this pickle file as a generic artifact:

    ```python
    import mlflow
    mlflow.log_artifact("model.pkl", artifact_path="pickle_model")
    ```

* **Why it matters:**
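  Lets you store custom, non-standardized objects, models, or files that don't directly fit predefined MLflow flavors. The trade-off is that a plain artifact carries no `MLmodel` metadata, so MLflow cannot load or serve it as a model; you unpickle it yourself, and should only unpickle files you trust.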

---

### 4️⃣ **Reload and Use Logged Models Programmatically**

* **What it means:**
  After logging models, MLflow lets you easily retrieve and load them for predictions or further analysis.

* **Detailed Steps:**

  * Reload an sklearn flavor model:

    ```python
    import mlflow.sklearn
    # Replace <RUN_ID> with the ID of the run that logged the model.
    loaded_model = mlflow.sklearn.load_model("runs:/<RUN_ID>/sklearn_model")
    predictions = loaded_model.predict(X_test)
    ```
  * Reload a generic pickle artifact:

    ```python
    import pickle
    import mlflow
    local_path = mlflow.artifacts.download_artifacts(run_id="<RUN_ID>", artifact_path="pickle_model/model.pkl")
    with open(local_path, "rb") as f:
        model = pickle.load(f)
    predictions = model.predict(X_test)
    ```

* **Why it matters:**
  Loading a model by its run URI ties every prediction back to the exact run that produced it, which supports reproducibility, model governance, and a smooth hand-off from training to deployment.



```python
# 📓 Module 2.4: Model Logging with Flavors
# Goal: Understand how MLflow supports multiple model flavors and how to log and load them

# ✅ Step 1: Install required packages
!pip install -q mlflow scikit-learn

# ✅ Step 2: Import libraries
import mlflow
import mlflow.sklearn
import pickle
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# ✅ Step 3: Load dataset and prepare splits
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ✅ Step 4: Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# ✅ Step 5: Set experiment and log using sklearn flavor
mlflow.set_experiment("model-flavors-demo")

with mlflow.start_run() as run:
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_param("model_type", "logistic_regression")

    # Log using sklearn flavor (standard)
    mlflow.sklearn.log_model(model, artifact_path="sklearn_model")

    # Export model manually with pickle and log as a generic artifact
    # (a plain file attached to the run, not an MLflow flavor)
    with open("manual_model.pkl", "wb") as f:
        pickle.dump(model, f)
    mlflow.log_artifact("manual_model.pkl", artifact_path="pickle_model")

    print("Run logged with sklearn and pickle flavors.")

# ✅ Step 6: Load model using sklearn flavor
print("\n🔄 Loading logged model with sklearn flavor:")
sk_model = mlflow.sklearn.load_model(f"runs:/{run.info.run_id}/sklearn_model")
print(f"Reloaded model accuracy: {accuracy_score(y_test, sk_model.predict(X_test)):.4f}")
```


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.7/24.7 MB[0m [31m24.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m24.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m247.0/247.0 kB[0m [31m7.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m147.8/147.8 kB[0m [31m6.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m114.9/114.9 kB[0m [31m5.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.0/85.0 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m677.0/677.0 kB[0m [31m18.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m203.4/203.4 kB[0m [31m7.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

2025/07/30 20:42:50 INFO mlflow.tracking.fluent: Experiment with name 'model-flavors-demo' does not exist. Creating a new experiment.


Run logged with sklearn and pickle flavors.

🔄 Loading logged model with sklearn flavor:


Reloaded model accuracy: 1.0000


## 📝 Assessment: Model Logging with Flavors

### 📘 Multiple Choice (Choose the best answer)

**1. What is a “model flavor” in MLflow?**    
A. A type of dataset used in model training    
**B. A standardized way to save and load models in different ML libraries** ✅    
C. A Python script for tracking experiments    
D. A configuration file for serving models    

---

**2. Which MLflow method is used to log a Scikit-learn model with the `sklearn` flavor?**    
A. `mlflow.log_model()`    
**B. `mlflow.sklearn.log_model()`** ✅    
C. `mlflow.pyfunc.log_model()`    
D. `mlflow.save_model()`    

---

**3. What happens when you use `mlflow.log_artifact()` to store a file like a pickle model?**    
A. It converts the model to a REST API    
**B. It uploads the file to the run’s artifact directory** ✅    
C. It deploys the model automatically    
D. It encrypts the file and logs parameters    

---

**4. How do you load a model that was logged using the `sklearn` flavor?**    
A. `pickle.load()`    
B. `mlflow.load_pickle()`    
**C. `mlflow.sklearn.load_model("runs:/<run_id>/sklearn_model")`** ✅    
D. `mlflow.retrieve_model()`    

---

### ✏️ Short Answer

**5. What are the benefits of using MLflow model flavors?**    
*Flavors standardize saving and loading across libraries such as Scikit-learn, XGBoost, and PyTorch, so the same tools can deploy and serve any model consistently across platforms.*

---

**6. When might you use `mlflow.log_artifact()` instead of a flavor's `log_model()` (e.g. `mlflow.sklearn.log_model()`)?**    
*When logging custom objects or manually serialized files like pickles, logs, or preprocessing scripts.*

---

### 🧪 Mini Project    

**7. Task:**    

* Train a new `RandomForestClassifier` on the Iris dataset    
* Log it using the Scikit-learn flavor (`mlflow.sklearn.log_model()`)    
* Also export and log it using `pickle` + `mlflow.log_artifact()`    
* Load the model back using the `sklearn` flavor and confirm accuracy    
