# MLflow + WhyLabs Integration

This tutorial shows how to use the WhyLabs integration to:
* Capture data quality metrics while training a linear regression model in `mlflow`
* Extract whylogs data from the MLflow backend back into an in-memory format
* Log the same data back to the WhyLabs platform

# Getting Started

To run this tutorial:
* Create and activate the conda environment defined in the included `environment.yml` file, then run this notebook with the new `whylabs-mlflow` environment as the kernel.
* Install pip into the conda environment using `conda install pip`.

# Setup
First, we configure logging so that only messages at the `INFO` level and above are shown, which filters out noisy debug output.

In [None]:
import logging
logging.basicConfig(level=logging.INFO)

In [None]:
import os
import random
import datetime
import time
import math
import numpy as np
import pandas as pd
import mlflow
import whylogs
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from whylogs import get_or_create_session
from dotenv import load_dotenv

In [None]:
load_dotenv(".env_mlflow")

In [None]:
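from packaging.version import Version  # assumes `packaging` is installed in the environment
assert Version(whylogs.__version__) >= Version("0.1.13")  # we need 0.1.13 or later for MLflow integration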
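
A note on version checks: version numbers are best compared numerically rather than as plain strings, because string comparison is lexicographic. The sketch below illustrates the difference. It assumes purely numeric dotted versions, and `version_tuple` is a hypothetical helper written for this illustration, not part of whylogs.

```python
def version_tuple(version: str) -> tuple:
    """Convert a purely numeric dotted version such as '0.1.13' to (0, 1, 13)."""
    return tuple(int(part) for part in version.split("."))


# Lexicographic comparison gets this wrong: the character "2" sorts after "1".
print("0.1.2" >= "0.1.13")                                # True (misleading)
# Numeric comparison orders the two versions correctly.
print(version_tuple("0.1.2") >= version_tuple("0.1.13"))  # False
```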

In [None]:
pd.options.mode.chained_assignment = None  # default='warn'

# Enable whylogs Integration

Enable whylogs in MLflow to store whylogs statistical profiles with every run. `whylogs.enable_mlflow` returns `True` if whylogs is able to patch MLflow. You can optionally pass a session, as done here with one created from `.whylogs_mlflow.yaml`.

In [None]:
session = get_or_create_session(".whylogs_mlflow.yaml")
whylogs.enable_mlflow(session)

# Dataset Preparation

Download and prepare the UCI wine quality dataset. We sample the test dataset further to represent batches of data produced every second.

In [None]:
data = pd.read_csv(os.environ["DATASET_URL"], sep=";")
data.head()
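
The cell above reads the dataset location from the `DATASET_URL` environment variable, which is expected to be set by the earlier `load_dotenv(".env_mlflow")` call. If the variable is missing, `os.environ["DATASET_URL"]` raises a `KeyError`. A minimal sketch of an explicit check follows; `missing_env_vars` is a hypothetical helper and the URL is a placeholder, not the real dataset location.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# DATASET_URL is the variable this tutorial reads; the value is a placeholder.
example_env = {"DATASET_URL": "https://example.com/winequality.csv"}
print(missing_env_vars(["DATASET_URL"], example_env))  # []
print(missing_env_vars(["DATASET_URL"], {}))           # ['DATASET_URL']
```

Running such a check right after `load_dotenv` gives a clearer error message than a bare `KeyError` later in the notebook.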

In [None]:
# Split the data into training and test sets
train, test = train_test_split(data)

In [None]:
# Relocate predicted variable "quality" to y vectors
train_x = train.drop(["quality"], axis=1).reset_index(drop=True)
test_x = test.drop(["quality"], axis=1).reset_index(drop=True)
train_y = train[["quality"]].reset_index(drop=True)
test_y = test[["quality"]].reset_index(drop=True)

## Train a model

Now, we train the model repeatedly on different splits of the data to simulate a hyperparameter optimization process, in which several models are trained in search of the best hyperparameters.

In each split, part of the data is held out to evaluate the trained model, which also exercises whylogs' ability to log dataframes.

In [None]:
# Create an MLflow experiment for our demo
experiment_name = "whylogs demo"
mlflow.set_experiment(experiment_name)

model_params = {"alpha": 1.0,
                "l1_ratio": 0.7}

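For context, `alpha` and `l1_ratio` control the regularization term of the elastic net. As documented for scikit-learn's `ElasticNet`, the penalty added to the squared-error loss is `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The sketch below computes that term in plain Python; `elastic_net_penalty` and the coefficient values are hypothetical and serve only as an illustration.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """Regularization term: alpha*l1_ratio*|w|_1 + 0.5*alpha*(1 - l1_ratio)*|w|_2^2."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared


model_params = {"alpha": 1.0, "l1_ratio": 0.7}
weights = [0.5, -2.0, 0.0]  # hypothetical coefficients, not fitted values
penalty = elastic_net_penalty(weights, **model_params)
print(round(penalty, 4))  # about 2.3875
```

With `l1_ratio` set to 0.7, most of the penalty weight goes to the L1 term, which tends to push some coefficients to exactly zero.
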
We can further evaluate model performance using 10-fold cross-validation with non-overlapping folds: each fold is held out in turn as a batch, the model is evaluated on it, and the metrics obtained across batches are averaged.
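
To make "without overlapping" concrete, here is a minimal pure-Python sketch of how contiguous folds partition the sample indices. It is meant to mirror the behavior of `KFold` without shuffling, but `kfold_indices` is a hypothetical helper written for illustration; the tutorial itself relies on scikit-learn's `KFold`.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, non-overlapping folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(kfold_indices(n_samples=23, n_splits=10))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]

# Every sample appears in exactly one test fold.
all_test = sorted(index for _, test in folds for index in test)
print(all_test == list(range(23)))  # True
```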

It's essential to associate each batch with a timestamp, because the WhyLabs platform indexes data by timestamp.
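
The training loop in this section gives each fold a timezone-aware UTC timestamp that is one day earlier than the previous fold's. A minimal sketch of that scheme in isolation follows; `batch_timestamps` is a hypothetical helper and the fixed date is arbitrary.

```python
import datetime


def batch_timestamps(n_batches, now=None):
    """Return one timezone-aware UTC timestamp per batch, each one day earlier."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return [now - datetime.timedelta(days=i) for i in range(n_batches)]


# An arbitrary fixed date keeps the example reproducible.
fixed_now = datetime.datetime(2021, 1, 10, tzinfo=datetime.timezone.utc)
for ts in batch_timestamps(3, now=fixed_now):
    print(ts.isoformat())
# 2021-01-10T00:00:00+00:00
# 2021-01-09T00:00:00+00:00
# 2021-01-08T00:00:00+00:00
```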

Note that whylogs profile data is logged automatically when `mlflow.end_run()` is called, which happens implicitly when the `with mlflow.start_run(...)` block exits.

In [None]:
n_folds = 10
kf = KFold(n_splits=n_folds)
mae_v = []
for i, (train_index, test_index) in enumerate(kf.split(train_x)):
    with mlflow.start_run(run_name=f"Run {i + 1}"):        
        X_train, X_test = train_x.loc[train_index], train_x.loc[test_index]
        y_train, y_test = train_y.loc[train_index], train_y.loc[test_index]
        
        # Train a model with each split
        lr = ElasticNet(**model_params)
        lr.fit(X_train, y_train)
        print("ElasticNet model (%s):" % model_params)
        
        # Evaluate trained model 
        predicted_output = lr.predict(X_test)
        mae = mean_absolute_error(y_test, predicted_output)

        # Log the hyperparameters and evaluation metric to MLflow
        mlflow.log_params(model_params)
        mlflow.log_metric("mae", mae)

        # Attach the metric and hyperparameters to a copy of the held-out batch;
        # assigning through train_x.iloc[test_index][k] would only modify a temporary copy
        batch = X_test.copy()
        batch["mae"] = mae
        for k, v in model_params.items():
            batch[k] = v

        # Use whylogs to log data quality metrics for the current batch,
        # backdating each batch by one day so the batches get distinct timestamps
        mlflow.whylogs.log_pandas(
            batch,
            datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=i)
        )
        print("Subset %d, mean absolute error: %s" % (i + 1, mae))
        mae_v.append(mae)
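
The loop collects one MAE per fold in `mae_v`; averaging those values gives the cross-validated estimate mentioned earlier. A minimal sketch, using hypothetical numbers in place of `mae_v` (they are not results from this tutorial), with `summarize_mae` as an illustrative helper:

```python
import statistics


def summarize_mae(mae_values):
    """Return the mean and sample standard deviation of per-fold MAE values."""
    return statistics.mean(mae_values), statistics.stdev(mae_values)


example_mae = [0.62, 0.58, 0.66, 0.60]  # hypothetical values standing in for mae_v
mean_mae, std_mae = summarize_mae(example_mae)
print("Mean MAE over %d folds: %.3f (std %.3f)" % (len(example_mae), mean_mae, std_mae))
```

In the notebook, the same call would be `summarize_mae(mae_v)` once the loop has finished.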
