 **Module 2.2: Parameterization in MLflow Projects** 
 

## 🎯 **Learning Objectives Expanded**

### 1️⃣ **Define and Use Multiple Parameters in an MLflow Project**

* **What it means:**
  You can specify more than one input parameter in your `MLproject` file, such as `alpha`, `max_iter`, or `model_type`, and use them in your training script.

* **Example `MLproject` file:**

  ```yaml
  name: parameterized_project

  conda_env: conda.yaml

  entry_points:
    main:
      parameters:
        alpha: {type: float, default: 0.5}
        max_iter: {type: int, default: 100}
      command: "python train.py --alpha {alpha} --max_iter {max_iter}"
  ```

* **How it works:**
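  When the project runs, MLflow replaces each `{name}` placeholder in `command` with that parameter's value (the default, or a value supplied at runtime) and then executes the resulting command line.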

* **Why it matters:**
  Defining parameters at the project level makes the project reusable across different configurations without modifying the code.
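
On the script side, the `command` above passes the values as `--alpha` and `--max_iter` flags, so `train.py` has to parse them. A minimal sketch using the standard library's `argparse`; the `parse_args` helper and its fallback defaults are illustrative assumptions, not part of MLflow:

```python
import argparse


def parse_args(argv=None):
    """Parse the flags that the MLproject `command` passes to train.py."""
    parser = argparse.ArgumentParser(description="Train with CLI hyperparameters")
    # Fallback defaults mirror the MLproject file, so the script also runs standalone.
    parser.add_argument("--alpha", type=float, default=0.5)
    parser.add_argument("--max_iter", type=int, default=100)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"alpha={args.alpha}, max_iter={args.max_iter}")
```

Named flags keep the script readable when more parameters are added later, at the cost of a few more lines than reading `sys.argv` by position.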

---

### 2️⃣ **Pass Parameters at Runtime Using `-P` Flags**

* **What it means:**
  When running your MLflow Project, you can override parameter defaults by specifying values directly from the command line.

* **Example:**

  ```bash
  mlflow run . -P alpha=0.1 -P max_iter=500
  ```

* **What happens:**
  MLflow parses those flags, substitutes the values into the entry point's `command`, and logs them as parameters of the run.

* **Why it matters:**
  This lets you try different configurations quickly, which is useful for grid searches and for tuning runs launched from the CLI or a CI/CD pipeline.
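
A small parameter sweep can be scripted around the same `-P` flags. The sketch below only builds the `mlflow run` command lines for a grid of values; the `build_run_command` helper and the grid values are hypothetical, and each command could be handed to `subprocess.run` to launch it:

```python
import itertools


def build_run_command(project_uri, params):
    """Return the argv list for `mlflow run` with one -P flag per parameter."""
    cmd = ["mlflow", "run", project_uri]
    for name, value in params.items():
        cmd += ["-P", f"{name}={value}"]
    return cmd


# Hypothetical grid: every combination of alpha and max_iter.
grid = {"alpha": [0.1, 0.5], "max_iter": [100, 500]}
commands = [
    build_run_command(".", dict(zip(grid, combo)))
    for combo in itertools.product(*grid.values())
]

for cmd in commands:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```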

---

### 3️⃣ **Understand the `MLproject` File Structure with Multiple Parameter Types**

* **What it means:**
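  The `MLproject` file defines not only the parameters but also their types (e.g., `int`, `float`, `string`, `path`) and default values. A parameter declared without a default, such as `config_path` below, must be supplied at runtime.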

* **Example with various types:**

  ```yaml
  entry_points:
    main:
      parameters:
        alpha: {type: float, default: 0.1}
        max_iter: {type: int, default: 100}
        tag: {type: string, default: "baseline"}
        config_path: {type: path}
      command: >
        python train.py --alpha {alpha} --max_iter {max_iter}
        --tag {tag} --config_path {config_path}
  ```

* **Why it matters:**
  Typed parameters let MLflow check and prepare inputs before your script runs. For example, a `float` value is validated as a number, and a `path` value is converted to an absolute local path, with remote URIs downloaded to a temporary directory first.
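
To make the role of defaults and required parameters concrete, the sketch below imitates how a `command` template could be filled in: defaults are merged with user overrides, a parameter with neither raises an error, and each value is shell-quoted before substitution. This illustrates the idea only and is not MLflow's actual implementation; `render_command` and the sample values are hypothetical.

```python
import shlex

# Parameter spec mirroring the MLproject example; config_path has no default.
PARAMETERS = {
    "alpha": {"type": "float", "default": 0.1},
    "max_iter": {"type": "int", "default": 100},
    "tag": {"type": "string", "default": "baseline"},
    "config_path": {"type": "path"},
}
COMMAND = (
    "python train.py --alpha {alpha} --max_iter {max_iter} "
    "--tag {tag} --config_path {config_path}"
)


def render_command(template, spec, overrides):
    """Merge defaults with overrides and fill the {placeholders} in template."""
    values = {}
    for name, info in spec.items():
        if name in overrides:
            values[name] = overrides[name]
        elif "default" in info:
            values[name] = info["default"]
        else:
            raise ValueError(f"No value given for required parameter '{name}'")
    # Quote each value so spaces or shell metacharacters stay in one argument.
    quoted = {name: shlex.quote(str(value)) for name, value in values.items()}
    return template.format(**quoted)


print(render_command(COMMAND, PARAMETERS, {"alpha": 0.3, "config_path": "configs/run.yaml"}))
```

Calling `render_command` without a `config_path` override raises the `ValueError`, which mirrors the failure you would expect when a required parameter is omitted.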




In [1]:
# 📓 Module 2.2: Parameterization in MLflow Projects
# Goal: Learn how to define and run MLflow Projects with multiple user-defined parameters.

# ✅ Step 1: Install MLflow
!pip install -q mlflow

# ✅ Step 2: Set up project folder
import os
project_dir = "mlflow_param_project"
os.makedirs(project_dir, exist_ok=True)

# ✅ Step 3: Write train.py with two parameters: alpha and max_iter
train_code = '''
import mlflow
import mlflow.sklearn
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import sys

# Read parameters from command-line
alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0
max_iter = int(sys.argv[2]) if len(sys.argv) > 2 else 1000

# Enable autologging
mlflow.sklearn.autolog()

with mlflow.start_run():
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = Ridge(alpha=alpha, max_iter=max_iter)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Alpha: {alpha}, Max Iter: {max_iter}, Test MSE: {mse:.4f}")
'''

with open(os.path.join(project_dir, "train.py"), "w") as f:
    f.write(train_code)

# ✅ Step 4: Write the MLproject file with multiple parameters
mlproject_content = '''
name: RidgeRegressionMultiParam

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 1.0}
      max_iter: {type: int, default: 1000}
    command: "python train.py {alpha} {max_iter}"
'''

with open(os.path.join(project_dir, "MLproject"), "w") as f:
    f.write(mlproject_content)

# ✅ Step 5: Write conda.yaml
conda_yaml = '''
name: mlflow-param-env
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn
  - pip
  - pip:
      - mlflow
'''

with open(os.path.join(project_dir, "conda.yaml"), "w") as f:
    f.write(conda_yaml)

# ✅ Step 6: Show run command (local use)
print("\n📦 Project created with multiple parameters. Run using:")
print(f"mlflow run {project_dir} -P alpha=0.3 -P max_iter=200")



📦 Project created with multiple parameters. Run using:
mlflow run mlflow_param_project -P alpha=0.3 -P max_iter=200


In [2]:
!mlflow run mlflow_param_project -P alpha=0.3 -P max_iter=200

Channels:
 - defaults
 - conda-forge
Platform: win-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Installing pip dependencies: ...working... Ran pip subprocess with arguments:
['C:\\Users\\ryass\\anaconda3\\envs\\mlflow-e1fdd3e96956f7e6a5bf9c74d87a0d7ad86c5a0b\\python.exe', '-m', 'pip', 'install', '-U', '-r', 'c:\\Users\\ryass\\OneDrive\\Documents\\GitHub\\MLflow_learn\\MLflow_step_by_step\\mlflow_param_project\\condaenv.de068x4h.requirements.txt', '--exists-action=b']
Pip subprocess output:
Collecting mlflow (from -r c:\Users\ryass\OneDrive\Documents\GitHub\MLflow_learn\MLflow_step_by_step\mlflow_param_project\condaenv.de068x4h.requirements.txt (line 1))

  Using cached mlflow-2.17.2-py3-none-any.whl.metadata (29 kB)

Collecting mlflow-skinny==2.

2025/08/03 16:27:07 INFO mlflow.utils.conda: === Creating conda environment mlflow-e1fdd3e96956f7e6a5bf9c74d87a0d7ad86c5a0b ===
2025/08/03 16:29:23 INFO mlflow.projects.utils: === Created directory C:\Users\ryass\AppData\Local\Temp\tmpzya18ko9 for downloading remote URIs passed to arguments of type 'path' ===
2025/08/03 16:29:23 INFO mlflow.projects.backend.local: === Running command 'conda activate mlflow-e1fdd3e96956f7e6a5bf9c74d87a0d7ad86c5a0b && python train.py 0.3 200' in run with ID '0516aa1921df45a8b6ea8ac53e293771' === 
Traceback (most recent call last):
  File "train.py", line 17, in <module>
    with mlflow.start_run():
  File "C:\Users\ryass\anaconda3\envs\mlflow-e1fdd3e96956f7e6a5bf9c74d87a0d7ad86c5a0b\lib\site-packages\mlflow\tracking\fluent.py", line 338, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "C:\Users\ryass\anaconda3\envs\mlflow-e1fdd3e96956f7e6a5bf9c74d87a0d7ad86c5a0b\lib\site-packages\mlflow\tracking\client.py", line 226, in get_run
 

## 📝 Assessment: Parameterization in Projects   

### 📘 Multiple Choice (Choose the best answer)   

**1. How do you pass multiple parameters when running an MLflow Project?**   
A. `mlflow run project.py alpha=0.1 max_iter=200`   
**B. `mlflow run <project_path> -P alpha=0.1 -P max_iter=200`** ✅   
C. `mlflow train --alpha 0.1 --max_iter 200`   
D. `python train.py --alpha 0.1 --max_iter 200`   

---

**2. What happens if you omit a required parameter that doesn't have a default in the `MLproject` file?**   
A. MLflow uses a random value   
B. It skips the parameter   
**C. The run fails with a missing parameter error** ✅   
D. It sets the parameter to 0 by default   

---

**3. In an MLflow `MLproject` file, how are parameter types defined?**   
A. Inside `conda.yaml`   
B. Using Python’s `type()`   
**C. As YAML under the `parameters` block with type annotations** ✅   
D. With `log_param()` during training   

---

**4. Which of the following is a valid `parameters` section in an `MLproject` file?**   

```yaml   
parameters:   
  learning_rate: {type: float, default: 0.01}   
  max_iter: {type: int, default: 100}   
```

**A. This is valid** ✅   
B. This is invalid – types must be Python types   
C. This is invalid – parameters go in `train.py`   
D. This is invalid – use `requirements.txt` for parameters   

---

### ✏️ Short Answer   

**5. Why is parameterization useful in MLflow Projects?**   
*It allows running experiments with different configurations easily and reproducibly, enabling comparison across hyperparameter settings.*   

---

**6. What’s the difference between setting default values in the `MLproject` file vs providing them at runtime?**   
*Defaults ensure fallback values are available; runtime parameters override them, giving more flexibility during experimentation.*   

---

### 🧪 Mini Project      

**7. Task:**         
Extend the Ridge Regression MLflow Project to accept a third parameter: `solver`      

* Add it to the `MLproject` and `train.py`      
* Provide a default value of `"auto"`      
* Run the project with:      

  ```bash   
  mlflow run . -P alpha=0.5 -P max_iter=500 -P solver=svd     
  ```   
* Log all parameters and MSE metric   
* Compare results across different solvers   


