# MLflow

In a nutshell, MLflow is a tool for tracking machine learning experiments: it keeps runs, parameters and metrics organised while requiring minimal setup.

## Installation

**In terminal:**
```bash
pipenv install mlflow
```

## Usage

MLflow's UI runs in a browser on a specific port, the same way Jupyter does. Unless you are running Jupyter in a detached Docker container, you will need to open a second terminal to run MLflow:

**In terminal:**
```bash
pipenv run mlflow ui
```

This prints the URI you need both to open the UI in a browser and to attach your experiments to. MLflow refers to it as the Tracking URI, and it will look like one of these:

```
http://kubernetes.docker.internal:5000
```
```
http://localhost:5000
```
```
http://127.0.0.1:5000
```

**In notebook:**

In [None]:
import mlflow

mlflow.set_tracking_uri('http://localhost:5000')
mlflow.set_experiment(experiment_name='Name of this experiment')

with mlflow.start_run(run_name='Name of this experiment run'):
    mlflow.log_param('Any Param 1', 'Any value')
    mlflow.log_param('Any Param 2', 1_234)
    mlflow.log_param('Any Param 3', False)
    mlflow.log_metric('Any Float Metric', 1.23)
    mlflow.log_metric('Any Float 2', 9.555)
    mlflow.log_metric('Any Float 2', 5.555)
    mlflow.log_metric('Any Float 2', 7.555)
    mlflow.log_metric('Any Float 2', 3.555)  # Every value is kept in the metric's history; the latest one is shown as the current value
    # No mlflow.end_run() needed: the with block ends the run on exit

## MLflow UI

Run the code above, then refresh the page at the MLflow Tracking URI (e.g. http://localhost:5000). The experiment will appear in the left sidebar:

![MLflow sidebar experiment](https://i.snipboard.io/WzZqKp.jpg)

The experiment run will appear in the table of runs, showing the details it ran with:

![MLflow table of runs](https://i.snipboard.io/gsJICz.jpg)

Clicking into the run will show more details, including the run time, tags and artifacts:

![MLflow run details page](https://i.snipboard.io/ktwib9.jpg)

The page will also link to charts of the metrics that were logged:

![MLflow metrics chart](https://i.snipboard.io/9leib2.jpg)

## Basic notebook example usage

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from mlflow import set_tracking_uri, set_experiment, start_run, log_metric, log_param

set_tracking_uri('http://localhost:5000')
set_experiment(experiment_name='kaggle-nba')

with start_run(run_name='Logistic Regression penalty and C test'):
    # Load the training data and separate the target from the features
    df = pd.read_csv('../data/raw/train.csv')
    target = df.pop('TARGET_5Yrs')
    df = df.loc[:, 'GP':'TOV']  # keep the feature columns from 'GP' through 'TOV'

    # Hyperparameters under test
    penalty = 'l2'
    C = 1.4

    log_param('penalty', penalty)
    log_param('C', C)

    clf = LogisticRegression(penalty=penalty, C=C, max_iter=10_000)
    clf.fit(df, target)
    accuracy = clf.score(df, target)  # accuracy on the training data itself

    log_metric('accuracy', accuracy)

    # No end_run() needed: the with block ends the run on exit