# **Module 1.1: Intro to MLflow & Tracking**

## 🎯 **Learning Objectives Expanded**

### 1️⃣ **Installing MLflow**

* **What it means:**
  Getting MLflow installed and ready to use in your Python environment.

* **Detailed Steps:**

  * Using pip (most common method):

    ```bash
    pip install mlflow
    ```
  * Optional: Verify the installation by running:

    ```bash
    mlflow --version
    ```

* **Why it matters:**
  Every later step in this module (tracking runs, logging models, querying results) depends on a working MLflow installation.
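
If the `mlflow` command is not on your `PATH` (for example, inside a notebook kernel), the installation can also be checked from Python. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is an illustrative choice and not part of MLflow.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether MLflow is available in the current Python environment
mlflow_version = installed_version("mlflow")
print("mlflow:", mlflow_version or "not installed")
```

If this prints `not installed`, run `pip install mlflow` in the same environment the notebook kernel uses.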

---

### 2️⃣ **Running and Tracking a Simple Linear Regression Experiment**

* **What it means:**
  Training a basic linear regression model and using MLflow to keep track of your experiment.

* **Detailed Steps:**

  * Generate or load data:

    ```python
    from sklearn.datasets import make_regression
    X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
    ```
  * Train a simple linear model:

    ```python
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(X, y)
    ```
  * Track the experiment with MLflow:

    ```python
    import mlflow
    import mlflow.sklearn

    # The context manager ends the run automatically, even if an error occurs
    with mlflow.start_run():
        mlflow.sklearn.log_model(model, "linear_regression_model")
    ```

* **Why it matters:**
  Tracking experiments lets you easily reproduce, share, and compare results later.

---

### 3️⃣ **Logging Parameters, Metrics, and Model Artifacts**

* **What it means:**
  Saving the details about your model training, including settings (parameters), performance scores (metrics), and the trained model itself (artifacts).

* **Detailed Steps:**

  ```python
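  import mlflow
  from sklearn.metrics import mean_squared_error

  with mlflow.start_run():
      # Log the setting the model was actually trained with
      mlflow.log_param("fit_intercept", model.fit_intercept)
      predictions = model.predict(X)
      mse = mean_squared_error(y, predictions)
      mlflow.log_metric("mse", mse)
      mlflow.sklearn.log_model(model, "linear_regression_model")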
  ```

  * `log_param`: Logs hyperparameters or configuration details.
  * `log_metric`: Records performance measures like accuracy or mean squared error.
  * `log_model`: Stores the trained model file.

* **Why it matters:**
  With parameters, metrics, and the model stored together under one run, you can reproduce a result later and compare runs side by side.

---

### 4️⃣ **Viewing and Querying Results Using `mlflow.search_runs()`**

* **What it means:**
  Accessing and reviewing logged experiments directly in Python, allowing comparisons and analysis.

* **Detailed Steps:**

  * Fetching your run results:

    ```python
    # Use the name passed to mlflow.set_experiment(); without it, runs go to "Default"
    runs_df = mlflow.search_runs(experiment_names=["my-experiment"])
    print(runs_df.head())
    ```
  * Selecting the columns of interest:

    ```python
    runs_df[["run_id", "params.fit_intercept", "metrics.mse"]]
    ```

* **Why it matters:**
  Being able to easily review and query your experiments helps you quickly identify which models perform best and under what conditions.



```python
# 📓 Module 1.1: Intro to MLflow & Tracking
# Goal: Understand basic MLflow usage for tracking experiments

# ✅ Step 1: Install and import MLflow
# Install MLflow quietly if it is not already installed
!pip install -q mlflow

# Import necessary libraries
import mlflow
import mlflow.sklearn
import random
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# ✅ Step 2: Create and log a basic experiment

# Generate synthetic regression data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set the experiment name to organize all related runs
# If the experiment doesn't exist, it will be created
mlflow.set_experiment("intro-mlflow-tracking")

# Start a new MLflow run context
with mlflow.start_run():
    # Pick fit_intercept at random so that repeated runs log different values to compare
    fit_intercept = random.choice([True, False])
    mlflow.log_param("fit_intercept", fit_intercept)  # Record the parameter value used in this run

    # Train a simple linear regression model with the chosen parameter
    model = LinearRegression(fit_intercept=fit_intercept)
    model.fit(X_train, y_train)

    # Predict on the test set and calculate Mean Squared Error (MSE)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    # Log the evaluation metric
    mlflow.log_metric("mse", mse)  # Record how well the model performed

    # Save the trained model in MLflow using the sklearn flavor
    mlflow.sklearn.log_model(model, "linear_model")

    # Output the unique Run ID for reference
    print("Run ID:", mlflow.active_run().info.run_id)

# ✅ Step 3: Launch MLflow UI (for local use only)
# If you're running this notebook locally (not on Colab), you can view the MLflow UI with:
# !mlflow ui  # Serves a web UI at http://localhost:5000
# Note: the command blocks the cell until it is interrupted, so it is left commented out here.

# ✅ Step 4: View experiment results programmatically
# Retrieve all runs for the given experiment and view key columns
runs_df = mlflow.search_runs(experiment_names=["intro-mlflow-tracking"])
runs_df[['run_id', 'params.fit_intercept', 'metrics.mse']]
```


2025/08/02 21:25:59 INFO mlflow.tracking.fluent: Experiment with name 'intro-mlflow-tracking' does not exist. Creating a new experiment.


Run ID: 60a662e392614db3bce8fa6c732c31e1


| | run_id | params.fit_intercept | metrics.mse |
|---|---|---|---|
| 0 | 60a662e392614db3bce8fa6c732c31e1 | True | 104.202227 |


## 📝 Assessment Questions 
### **Multiple Choice (Choose One)**

1. What is the purpose of `mlflow.set_experiment("my-exp")`?

   * A. Launches MLflow UI
   * B. Starts a new run
   * C. Assigns all upcoming runs to a named experiment ✅
   * D. Resets all previous runs

2. Which function logs a model in MLflow?

   * A. `mlflow.log_model()`
   * B. `mlflow.model()`
   * C. `mlflow.sklearn.log_model()` ✅
   * D. `mlflow.save_model()`

3. What happens if you call `mlflow.log_param()` without first calling `mlflow.start_run()`?

   * A. Nothing is tracked
   * B. An error occurs
   * C. MLflow automatically starts a new run and logs to it ✅
   * D. MLflow creates a new experiment automatically

---

### **Short Answer**

4. What is the difference between `log_param()` and `log_metric()`?
5. How can you access the run history programmatically?

---

### **Mini Project**

Train a Decision Tree Regressor on the same dataset and log:

* Two parameters (`max_depth`, `min_samples_split`)
* Two metrics (MSE, R² score)
* The trained model artifact



## ✅ **Summary of MLflow Functions Used**

* `set_experiment()`: selects the experiment for upcoming runs, creating it if it does not exist
* `start_run()`: begins a tracking context
* `log_param()` and `log_metric()`: record inputs and results
* `log_model()`: stores the trained model artifact
* `active_run().info.run_id`: returns the ID of the currently active run
* `search_runs()`: programmatically queries past runs

