# MLflow demo

## Start MLflow server

[MLflow](https://mlflow.org) is an open-source platform for managing workflows and artifacts across the machine learning lifecycle. It has built-in integrations with many popular ML libraries, but can be used with any library, algorithm, or deployment tool. It is also extensible: you can write plugins to support new workflows, libraries, and tools.

You can install MLflow in the `DataScience environment` with the following script:

In [None]:
!cat manutils/start-mlflow.sh

To run the installation, open a terminal and type `cd ~ && __MANUAL/manutils/start-mlflow.sh`. MLflow will then be installed.

## Libraries and UI access

In [None]:
import os
import numpy as np
import mlflow
import mlflow.sklearn  # explicit import of the flavor used by mlflow.sklearn.log_model below
from mlflow import log_metric, log_param, log_params, log_artifacts
from mlflow.models.signature import infer_signature
from random import random, randint
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from datetime import datetime

mlflow.set_tracking_uri(f'file:///home/jovyan/{os.environ["JUPYTERHUB_USER"]}_mlflow')
print('MLflow UI available at:',
      'https://jhas01.gsom.spbu.ru{}proxy/{}/'.format(
          os.environ['JUPYTERHUB_SERVICE_PREFIX'], 50000))

__NOTE:__ MLflow is configured to store its artifacts in the `<YOUR_LOGIN_NAME>_mlflow` directory, which is created when MLflow is started.
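
Both the tracking directory and the UI address above are derived from JupyterHub environment variables. As a minimal sketch of how they are composed, the snippet below rebuilds them with hypothetical fallback values (`demo_user` and `/user/demo_user/` are assumptions for illustration, not values from this environment), so it also runs outside JupyterHub. The helper names `tracking_uri` and `ui_url` are likewise made up for this example.

```python
import os

# Hypothetical fallbacks so the sketch also runs outside JupyterHub (assumptions).
user = os.environ.get('JUPYTERHUB_USER', 'demo_user')
prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/user/demo_user/')


def tracking_uri(user):
    """Build the file-based tracking URI for a given login name."""
    return f'file:///home/jovyan/{user}_mlflow'


def ui_url(prefix, port=50000):
    """Build the proxied MLflow UI address from the JupyterHub service prefix."""
    return f'https://jhas01.gsom.spbu.ru{prefix}proxy/{port}/'


print(tracking_uri(user))
print(ui_url(prefix))
```

The service prefix already starts and ends with a slash, which is why the URL template places it directly between the host name and `proxy/`.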

## Experiments and tracking

Let's define a function that computes the evaluation metrics for the model:

In [None]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
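
To make the three metrics concrete, here is a small sketch that computes them by hand on a toy example using only the standard library. The function name `eval_metrics_plain` and the toy values are assumptions for illustration; the notebook itself relies on the scikit-learn functions above.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R2 from two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    # Root mean squared error: square root of the average squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Mean absolute error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n
    # R2: 1 minus the ratio of residual to total sum of squares.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Toy values: the errors are 1, 0 and -2.
rmse, mae, r2 = eval_metrics_plain([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(rmse, mae, r2)  # RMSE = sqrt(5/3), MAE = 1.0, R2 = 0.375
```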

Here is a basic example of how to use MLflow for your tasks:

In [None]:
name_of_run = f'run_{datetime.now()}'

with mlflow.start_run(run_name=name_of_run) as run:
    # Load the diabetes dataset
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
    
    # Random set of parameters
    n_estimators = randint(20, 200)
    max_depth = randint(2, 10)

    # Create and train the model
    rf = RandomForestRegressor(
        n_estimators=n_estimators, 
        max_depth=max_depth
    )
    rf.fit(X_train, y_train)

    # Use the model to make predictions on the test dataset
    predictions = rf.predict(X_test)
    signature = infer_signature(X_test, predictions)
    mlflow.sklearn.log_model(
        rf, 
        'model',
        registered_model_name='RandomForestRegressorModel', 
        signature=signature
    )
    
    # Evaluate metrics
    (rmse, mae, r2) = eval_metrics(y_test, predictions)
    
    print('Run:', name_of_run)
    print('Run ID: {}'.format(run.info.run_id))
    print(f'  RMSE: {rmse}')
    print(f'  MAE: {mae}')
    print(f'  R2: {r2}')

    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_metric('rmse', rmse)
    mlflow.log_metric('r2', r2)
    mlflow.log_metric('mae', mae)    

You can now view all your experiments and models in the MLflow UI at the URL printed above.

## Fetching a model from the model registry

After you have registered an MLflow model, you can [fetch](https://mlflow.org/docs/latest/model-registry.html#fetching-an-mlflow-model-from-the-model-registry) that model back for predictions:

In [None]:
import mlflow.pyfunc

In [None]:
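model_name = 'RandomForestRegressorModel'
# Each run above registers a new version of the model,
# so set this to a version number that exists in your registry.
model_version = 4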

model = mlflow.pyfunc.load_model(model_uri=f'models:/{model_name}/{model_version}')
preds = model.predict(X_test)
print(preds.shape)

In [None]:
print('Here are predictions:\n', preds)
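
The `models:/` URI used above pins one specific version number. As a small sketch, the hypothetical helper below (`registry_model_uri` is not part of MLflow) builds either a pinned URI or the `models:/<name>/latest` form, which MLflow resolves to the newest registered version.

```python
def registry_model_uri(name, version=None):
    """Return a model registry URI; fall back to 'latest' when no version is given."""
    suffix = 'latest' if version is None else str(version)
    return f'models:/{name}/{suffix}'


print(registry_model_uri('RandomForestRegressorModel', 4))
# models:/RandomForestRegressorModel/4
print(registry_model_uri('RandomForestRegressorModel'))
# models:/RandomForestRegressorModel/latest
```

Passing the second URI to `mlflow.pyfunc.load_model` avoids editing `model_version` after every new run, at the cost of not knowing in advance exactly which version is loaded.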