# MLflow Exercise Background

<img src="images/overview-mlflow-focus.jpg" width=800>

As shown in the figure above, the MLOps platform provides an MLflow service that stores ML model training information to a PostgreSQL database and training artifacts to a MinIO storage service. 

MLflow provides a [Python client](https://mlflow.org/docs/latest/python_api/index.html) for communicating with an MLflow service. For example, we can use the MLflow Python client to start an *MLflow run*, which is an execution of an ML training script:
```python
with mlflow.start_run():
    model = ElasticNet(alpha=..., l1_ratio=...)
    model.fit(train_x, train_y)
```
*MLflow runs* are organized into *MLflow experiments*. An MLflow experiment can be seen as a logical unit of one or more MLflow runs. For example, there can be an MLflow experiment for training an ElasticNet model, and there can be multiple MLflow runs under this experiment for exploring different hyperparameters and/or training datasets.

During an MLflow run, we can record relevant information, such as the configured hyperparameters and custom evaluation metrics. Once training has finished, we can also upload the produced model artifact to MLflow:
```python
with mlflow.start_run():
    model = ElasticNet(alpha=..., l1_ratio=...)
    model.fit(train_x, train_y)
    mlflow.log_param("alpha", ...)
    mlflow.log_param("l1_ratio", ...)
    mlflow.log_metric("rmse", ...)
    mlflow.sklearn.log_model(model, ...)
```
Below is a complete example. 

More reading material: [MLflow docs](https://mlflow.org/docs/latest/index.html)

# Check Connection to MLflow Server
Before starting the exercises, please ensure that you can connect to the MLflow server. Please use the mlops_eng environment, which has the MLflow client set up for you.

In [None]:
import requests


def test_connection():
    for url in [
        "kserve-gateway.local",
        "ml-pipeline-ui.local",
        "mlflow-server.local",
        "mlflow-minio-ui.local",
        "mlflow-minio.local",
        "prometheus-server.local",
        "grafana-server.local",
        "evidently-monitor-ui.local",
    ]:
        try:
            requests.get(f"http://{url}")
        except Exception as e:
            print(f"Failed to connect to {url}: {e}")
            raise e

test_connection()
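
The check above stops at the first unreachable host and sets no timeout. As a variation, the sketch below probes every host and reports all failures at once. It uses only the standard library; the names `http_get` and `check_hosts`, the injectable `fetch` parameter, and the 5-second timeout are illustrative choices, not part of the exercise material.

```python
import urllib.error
import urllib.request


def http_get(url, timeout):
    """Default probe: issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_hosts(hosts, fetch=http_get, timeout=5.0):
    """Probe every host and return {host: error message} for the failures.

    `fetch` is injectable so the logic can be tried without a network.
    """
    failures = {}
    for host in hosts:
        try:
            fetch(f"http://{host}", timeout)
        except urllib.error.HTTPError:
            # The server answered (e.g. with 401 or 404), so it is reachable.
            continue
        except Exception as exc:  # DNS failure, refused connection, timeout, ...
            failures[host] = str(exc)
    return failures
```

For example, `check_hosts(["mlflow-server.local", "mlflow-minio.local"])` returns an empty dict when both services respond.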


# Take a look at the dataset

In this example, we'll use sklearn to train a simple ElasticNet model that predicts red wine quality from its chemical attributes. Information about the dataset can be found [here](https://archive.ics.uci.edu/dataset/186/wine+quality).

Deepchecks also provides convenient access to this dataset:

```python

from deepchecks.tabular.datasets.regression import wine_quality
train_data, test_data = wine_quality.load_data(as_train_test=True)
```

Below, we load the prepared training and test CSV files and inspect the 10 training rows with the highest alcohol values.

In [None]:
from deepchecks.tabular import Dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import pandas as pd

# Load data
train_data = pd.read_csv('data/1_data_train.csv')
test_data  = pd.read_csv('data/1_data_test.csv')
target = 'quality'

# TRAIN
y_train = train_data[target]
X_train = train_data.drop(columns=[target])

# TEST
y_test = test_data[target]
X_test = test_data.drop(columns=[target])

train_ds = Dataset(X_train, label=y_train, cat_features=[])
test_ds  = Dataset(X_test,  label=y_test,  cat_features=[])

X_train.nlargest(10, 'alcohol') # Get top 10 rows with largest alcohol values

# Create an MLflow run

Now, let's use the loaded dataset to create an MLflow run that trains a simple model and logs relevant parameters, metrics, and artifacts to the MLflow service.

## 1: Setup
This section shows how to use the MLflow Python client to record training parameters and evaluation metrics and to upload the trained model artifact to the MLflow service.

The following code cell imports the necessary Python packages and configures the environment for MLflow.
Do not modify this cell: it sets up logging, the MLflow endpoints, and the access credentials needed for the exercises.

In [None]:
import os
import logging

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

# Set an environment variable named "MLFLOW_S3_ENDPOINT_URL" so that the MLflow client knows where to save artifacts.
# The MinIO storage service can be accessed via http://mlflow-minio.local
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://mlflow-minio.local"

# Configure the credentials needed for accessing the MinIO storage service.
# "AWS_ACCESS_KEY_ID" was configured in a ConfigMap and "AWS_SECRET_ACCESS_KEY" in a Secret in your K8s cluster when you set up the MLOps platform.
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
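
Before starting a run, it can help to confirm that the variables set above are actually present, for example after a kernel restart. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper, and its `env` argument exists only so the check can be tried on a plain dict.

```python
import os

# The three variables configured in the setup cell.
REQUIRED_VARS = (
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the real process environment; an empty list means all three variables are set.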

## 2: Define MLflow experiment function
Instead of hardcoding a model, experiment name, or model name, we will define a function run_mlflow_experiment(...) that:

- Trains a given model on the training dataset

- Evaluates it on the test dataset

- Logs parameters, metrics, and the trained model to MLflow

This allows you to easily experiment with different models and hyperparameters.

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import RandomForestRegressor

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    return rmse

def run_mlflow_experiment(
    X_train, y_train, X_test, y_test,
    model,
    experiment_name="default_experiment",
    model_name="my_model",
    tracking_uri="http://mlflow-server.local",
    params=None
):
    """Run a single MLflow experiment with a given model and log metrics + model."""
    
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run() as run:
        print("MLflow run_id:", run.info.run_id)
        
        # Train the model
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Compute metric
        rmse = eval_metrics(y_test, predictions)
        
        # Log params
        if params:
            for k, v in params.items():
                mlflow.log_param(k, v)
                
        # Log metric
        mlflow.log_metric("rmse", rmse)
        
        # Log model
        mlflow.sklearn.log_model(model, artifact_path="model", registered_model_name=model_name)
        print("Logged model artifact URI:", mlflow.get_artifact_uri("model"))
        
        return rmse


## 3: Run a sample model training experiment

We will run a simple ElasticNet model with default hyperparameters.

- This will serve as a baseline for comparison.

- The run will be recorded in MLflow so you can view the metrics and model artifact in the web UI.

- Before running, set a meaningful experiment name and model name, as these will be visible to others on the MLflow server.

In [None]:
from sklearn.linear_model import ElasticNet

default_model = ElasticNet(alpha=0.5, l1_ratio=0.5, random_state=42)
default_params = {"alpha": 0.5, "l1_ratio": 0.5}
# MLFLOW_EXPERIMENT_NAME = "YOUR_EXPERIMENT_NAME_HERE" # Remember that others on the same MLflow service may see these names so be nice.
# MODEL_NAME="YOUR_MODEL_NAME"

rmse = run_mlflow_experiment(
    X_train, y_train, X_test, y_test,
    model=default_model,
    experiment_name=MLFLOW_EXPERIMENT_NAME,
    model_name=MODEL_NAME,
    params=default_params
)
print("RMSE for default ElasticNet:", rmse)


Expected output:

```text
2025/11/30 13:10:56 INFO mlflow.tracking.fluent: Experiment with name 'wine_quality_experiment' does not exist. Creating a new experiment.
MLflow run_id: ccc5efce623f4cf69d3fb35d698221af
INFO:botocore.credentials:Found credentials in environment variables.
Successfully registered model 'elasticnet_model'.
2025/11/30 13:11:00 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: elasticnet_model, version 1
Logged model artifact URI: s3://mlflow/3/ccc5efce623f4cf69d3fb35d698221af/artifacts/model
RMSE for default ElasticNet: 0.6777347143112558
Created version '1' of model 'elasticnet_model'.
```

Navigate to the MLflow service UI at [http://127.0.0.1:5000/](http://127.0.0.1:5000/) (forwarded from http://mlflow-server.local inside the cluster),
and you should see your run under the experiment name you chose ("wine_quality_experiment" in the sample output above). You can browse the run's parameters, metrics, and artifacts. For example:

* Training hyperparameters and evaluation metrics:

<img src="images/mlflow-logging.png" width="1000"/>

You may notice that the "Metrics" and "Parameters" fields are hidden by default. You can make them visible by clicking the "Columns" tab:

<img src="images/mlflow-show-columns.png" width=1000 />

Clicking the run name shows where the model and other related files have been uploaded:

<img src="images/mlflow-uploaded-artifacts.png" width=1000 />

In this case, the model (a Pickle file) and its related files (such as the model's dependency requirements) have been uploaded to the MinIO service. Navigate to [http://127.0.0.1:9000/](http://127.0.0.1:9000/) (forwarded from http://mlflow-minio.local inside the cluster) and log in using "minioadmin" as both the username and password. You should see a bucket named "mlflow":

<img src="images/minio-bucket-ui.png" width=1000 />

After clicking the bucket (and its underlying folders), we can see that the model and its related artifacts reside in the "mlflow" bucket:

<img src="images/minio-model-artifacts.png" width=1000 />

* Finally, we can also see the model has been registered to MLflow:

<img src="images/mlflow-model.png" width="1000"/>

## 4: Try different hyperparameters

Try running the run_mlflow_experiment(...) function with different hyperparameters for the ElasticNet model.
MLflow will record each run separately, allowing you to compare results easily.

In [None]:
alphas = [] # Add your alpha values here
l1_ratios = [] # Add your l1_ratio values here

for alpha in alphas:
    for l1_ratio in l1_ratios:
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        run_mlflow_experiment(
            X_train, y_train, X_test, y_test,
            model=model,
            experiment_name=MLFLOW_EXPERIMENT_NAME,
            model_name=MODEL_NAME,
            params={"alpha": alpha, "l1_ratio": l1_ratio}
        )
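
The nested loops above visit every combination of the two lists. An equivalent sketch with `itertools.product` builds the parameter dictionaries up front, which makes it easy to see how many runs will be launched. The candidate values below are placeholders rather than recommended settings, and `build_grid` is a hypothetical helper.

```python
from itertools import product


def build_grid(**param_values):
    """Return one dict per combination of the given parameter values."""
    names = list(param_values)
    return [dict(zip(names, combo)) for combo in product(*param_values.values())]


# Placeholder candidates; choose your own values for the exercise.
grid = build_grid(alpha=[0.1, 0.5, 1.0], l1_ratio=[0.2, 0.8])
print(len(grid))  # 3 alphas x 2 l1_ratios = 6 combinations
print(grid[0])    # {'alpha': 0.1, 'l1_ratio': 0.2}
```

Each dictionary can then serve two purposes: `ElasticNet(**params, random_state=42)` builds the model, and `params=params` passes the same values to `run_mlflow_experiment` for logging.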

## 5: Try different models
Now you can experiment with a different model, such as the [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) from sklearn shown below. The goal is to see how the RandomForest model performs compared to the ElasticNet model and to log the results to MLflow as well.

In [None]:
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
rf_params = {"n_estimators": 100, "max_depth": 5}
#MLFLOW_RF_EXPERIMENT_NAME = "YOUR_RF_EXPERIMENT_NAME"
#MODEL_RF_NAME="YOUR_RF_MODEL_NAME"

rmse_rf = run_mlflow_experiment(
    X_train, y_train, X_test, y_test,
    model=rf_model,
    experiment_name=MLFLOW_RF_EXPERIMENT_NAME,
    model_name=MODEL_RF_NAME,
    params=rf_params
)
print("RMSE for RandomForest:", rmse_rf)

Please add code cells below to try different models and hyperparameters. The goal is to get the best possible RMSE on the test dataset while logging all runs to MLflow.

In [None]:
# Try different models and hyperparameters here

## 6: Explore the MLflow UI and Inspect Team Runs

- Use the MLflow web UI or CLI to inspect the runs, metrics, parameters, and logged model artifacts.

- Filter by experiment name or model name to see only your runs.

- Try loading a saved model from MLflow and making predictions on new samples.

Can you identify which model and hyperparameter combination achieved the best RMSE on the test dataset?
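
One possible way to answer this programmatically is sketched below. `mlflow.search_runs` returns a pandas DataFrame with one row per run, in which logged metrics and parameters appear as columns such as `metrics.rmse` and `params.alpha`. The helper `best_run` picks the lowest-RMSE record from such rows, and `load_best_model` shows how the winning run's model could be loaded through a `runs:/<run_id>/model` URI. All three function names are hypothetical. The sketch assumes that the model was logged under the artifact path `model`, as in `run_mlflow_experiment`, and that your MLflow version accepts the `experiment_names` argument.

```python
def best_run(records, metric="metrics.rmse"):
    """Return the record with the smallest value of `metric`.

    `records` is a list of dicts, e.g. `runs_df.to_dict("records")`.
    Records whose metric is missing (None or NaN) are skipped.
    """
    scored = [
        r for r in records
        if r.get(metric) is not None and r[metric] == r[metric]  # NaN != NaN
    ]
    if not scored:
        raise ValueError(f"no run has a value for {metric!r}")
    return min(scored, key=lambda r: r[metric])


def model_uri_for_run(run_id, artifact_path="model"):
    """Build the MLflow URI of a model logged under `artifact_path` in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_best_model(experiment_name, tracking_uri="http://mlflow-server.local"):
    """Find the lowest-RMSE run of an experiment and load its sklearn model."""
    import mlflow  # imported here so the helpers above work without MLflow
    import mlflow.sklearn

    mlflow.set_tracking_uri(tracking_uri)
    runs_df = mlflow.search_runs(experiment_names=[experiment_name])
    winner = best_run(runs_df.to_dict("records"))
    return winner, mlflow.sklearn.load_model(model_uri_for_run(winner["run_id"]))
```

With the MLflow server reachable, `winner, model = load_best_model(MLFLOW_EXPERIMENT_NAME)` followed by `model.predict(X_test.head())` would produce predictions for a few test rows.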