✅ **Module 3.1: Custom PythonModel for Real-World Use**

### 1️⃣ **Use `mlflow.pyfunc.PythonModel` to Define Custom Inference Logic**

* **What it means:**
  Create a custom MLflow model class where **you control what happens during prediction**, including preprocessing, branching, transformations, or custom scoring.

* **Example Use Cases:**

  * Add business rules to prediction logic (e.g., if prediction < threshold → “reject”).
  * Bundle both preprocessing + model inference in one model.
  * Apply transformations before or after the model runs.

* **Code Example:**

  ```python
  import mlflow.pyfunc
  import numpy as np

  class MyModel(mlflow.pyfunc.PythonModel):
      def load_context(self, context):
          # Runs once at load time: restore the pickled model from its artifact path
          import joblib
          self.model = joblib.load(context.artifacts["model_path"])

      def predict(self, context, model_input):
          # Work on a copy so the caller's DataFrame is not modified
          features = model_input.copy()
          features["log_income"] = np.log1p(features["income"])
          return self.model.predict(features[["log_income"]])
  ```

* **Why it matters:**
  This lets you tailor predictions to **real-world requirements**, like handling messy inputs or packaging complex logic into one model object.

---

### 2️⃣ **Package a Custom Model with Artifacts (e.g., Parameters, Files)**

* **What it means:**
  Save any files your model depends on (e.g., scaler objects, model files, configs) and load them at runtime using `context.artifacts`.

* **Artifacts Include:**

  * Pickled models (`.pkl`)
  * Preprocessing pipelines
  * Lookup tables, threshold values, configs

* **Example in `log_model()`:**

  ```python
  mlflow.pyfunc.log_model(
      artifact_path="custom_model",
      python_model=MyModel(),
      # The key must match the one read in load_context via context.artifacts["model_path"]
      artifacts={"model_path": "model.pkl"}
  )
  ```

* **Why it matters:**
  This turns your model into a **self-contained unit**: every file it needs at prediction time is stored with the model and travels with it to deployment.

---

### 3️⃣ **Log and Load the Model Using the `pyfunc` Flavor**

* **What it means:**
  Save and retrieve the custom model in MLflow using the generic `pyfunc` interface, which is standard across deployment targets (CLI, REST, batch scoring).

* **Log:**

  ```python
  mlflow.pyfunc.log_model(...)
  ```

* **Load:**

  ```python
  loaded_model = mlflow.pyfunc.load_model("runs:/<run_id>/custom_model")
  ```

* **Why it matters:**
  This makes your custom logic deployable anywhere MLflow is supported (e.g., cloud serving, REST API, Docker, batch pipelines).

---

### 4️⃣ **Run Predictions Using Arbitrary Python Logic**

* **What it means:**
  Your `PythonModel` class can apply **any logic** to the inputs before, during, or after predictions, just like a regular Python function.

* **Example:**

  ```python
  import pandas as pd

  # `loaded_model` is the pyfunc model returned by mlflow.pyfunc.load_model(...)
  df = pd.DataFrame({"income": [10000, 20000, 30000]})
  preds = loaded_model.predict(df)
  ```

* **Why it matters:**
  Gives you **full flexibility** to handle non-standard data, apply rules, transform outputs, or simulate real-world decisions.


```python
# 📓 Module 3.1: Custom PythonModel for Real-World Use
# Goal: Build a custom MLflow model with pre-processing and custom logic using the PythonModel class

# ✅ Step 1: Install requirements
!pip install -q mlflow scikit-learn pandas

# ✅ Step 2: Import libraries
import mlflow.pyfunc
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import joblib
import os

# ✅ Step 3: Create a preprocessing + model pipeline
X = pd.DataFrame({"feature1": [1, 2, 3, 4], "feature2": [10, 20, 30, 40]})
y = [2, 4, 6, 8]

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LinearRegression())
])
pipeline.fit(X, y)

# ✅ Step 4: Save pipeline to a file for reuse as an artifact
os.makedirs("artifacts", exist_ok=True)
joblib.dump(pipeline, "artifacts/pipeline.pkl")

# ✅ Step 5: Create a custom PythonModel wrapper
class RealWorldModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib
        self.model = joblib.load(context.artifacts["model_file"])

    def predict(self, context, model_input):
        # Add a custom rule: if any input is negative, return -1
        if (model_input < 0).any().any():
            return [-1] * len(model_input)
        return self.model.predict(model_input)

# ✅ Step 6: Log the custom model with artifact
artifacts = {"model_file": "artifacts/pipeline.pkl"}
model_path = "realworld_pyfunc_model"

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path=model_path,
        python_model=RealWorldModel(),
        artifacts=artifacts
    )
    print("✅ Custom real-world model logged.")

# ✅ Step 7: Load and test the model
loaded_model = mlflow.pyfunc.load_model(f"runs:/{mlflow.last_active_run().info.run_id}/{model_path}")

test_input = pd.DataFrame({"feature1": [5, -1], "feature2": [50, 60]})
predictions = loaded_model.predict(test_input)
print("\n🔮 Predictions with custom logic:")
print(predictions)
```

**Output:**

```
✅ Custom real-world model logged.

🔮 Predictions with custom logic:
[-1, -1]
```


## 📝 Assessment: Custom PythonModel for Real-World Use

### 📘 Multiple Choice (Answers in **bold**)

**1. What does `load_context()` allow your custom `PythonModel` to do?**   
A. Set model parameters manually   
**B. Load external artifacts like pipelines or tokenizers** ✅   
C. Tune hyperparameters dynamically   
D. Fetch model inputs from MLflow UI   

---

**2. In the custom model example, what happens if a negative value appears in the input?**   
A. An exception is raised   
**B. The model returns -1 for each row** ✅   
C. The model returns NaN   
D. Prediction is skipped for that row   

---

**3. What is the correct method to store files like pickled pipelines in MLflow?**   
A. `mlflow.save_model()`   
B. `mlflow.log_file()`   
**C. `mlflow.pyfunc.log_model(..., artifacts={...})`** ✅   
D. `mlflow.register_artifact()`   
   
---

**4. Why would you use a custom `pyfunc` model over a standard flavor like `mlflow.sklearn`?**   
A. To reduce log file size   
B. To skip preprocessing   
**C. To wrap custom logic such as input checks, transformation, or ensemble voting** ✅   
D. To avoid using artifacts   

---

### ✏️ Short Answer
   
**5. What is the advantage of using a pipeline + custom logic in a `PythonModel`?**   
*Combines both feature engineering and model prediction logic in one deployable unit. This ensures consistency between training and inference environments.*   

---

**6. How do artifacts make your MLflow model more powerful and reusable?**   
*Artifacts allow models to include reusable components like encoders, scalers, vocabularies, or other external files necessary for prediction.*   

---

### 🧪 Mini Project

**7. Task:**   

* Modify the example so that instead of rejecting negative values, it replaces them with zero   
* Log and test the modified model   
* Use `mlflow.pyfunc.load_model()` and predict on `pd.DataFrame([[10, -10], [0, 100]], columns=["feature1", "feature2"])`   
* Output the predictions   

