# MLFlow Tracking 

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-tracking.png" width=600>

## What you will learn in this course

Let's start with the simplest part of the API: Tracking. In this course, you will learn:

* How to install MLFlow
* What MLFlow Tracking is
* How to monitor your ML workflow
* How to visualise it with the MLFlow UI

## Set up the environment

We have already seen how to create a virtual environment using `virtualenv`, but it isn't the only tool for building and managing environments: you can also use Conda. We assume <a href="https://docs.anaconda.com/anaconda/install/" target="_blank">Anaconda</a> is already set up on your machine.

You can refer to the last lecture _Setup Conda environment_ for more details about Conda.

### Start a new environment

In your terminal (MacOS, Linux) or Anaconda Powershell Prompt (Windows), create a new environment for this project:

```shell
$ conda create -n mlflow
```

You can name it whatever you like; here we named it `mlflow`. To activate it, enter:

```shell
$ conda activate mlflow
```

You should now see `(mlflow)` at the beginning of your prompt line.

### Set up MLFlow

All these steps should be done inside your activated conda environment.

#### On Windows

For now, if you want to be able to run `mlflow ui` on Windows, you need to follow these steps. In your Anaconda Powershell Prompt, first install all the dependencies using conda:

```shell
$ conda install -c conda-forge mlflow --only-deps
```

Once it finishes, install MLFlow itself with `pip`:

```shell
$ pip install mlflow
```

And you are done!

#### On MacOS or Linux

Installing on MacOS or Linux is pretty straightforward and should not be a problem for you:

```shell
$ conda install -c conda-forge mlflow
```

Confirm when conda asks whether to proceed and you should be good to go!

### Check MLFlow

You can check that everything went well:

```shell
$ mlflow --version
mlflow, version 1.12.0
```
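
The same check can be done from Python, which is handy inside a notebook. Here is a minimal sketch that uses only the standard library; the helper name `installed_version` is ours and is not part of MLFlow:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("mlflow")
if version is None:
    print("mlflow is not installed in this environment")
else:
    print("mlflow, version {}".format(version))
```

If the first message appears, you are probably not inside the activated conda environment.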

## What is MLFlow Tracking?

MLFlow Tracking is here to help you:

* Monitor your ML training runs,
* Log parameters for hyper-parameter tuning,
* Log metrics to assess model performance.

When you work in a team, an MLFlow tracking server is usually set up, and all data scientists log to it while they build their models.

In our case, we will use our local computer to monitor our project.

## Use MLFlow Tracking 

### Our project

Let's start by simply loading <a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris" target="_blank">some data</a> from `sklearn`:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load Iris dataset
iris = load_iris()

# Split dataset into X features and target variable
X = pd.DataFrame(data=iris["data"], columns=iris["feature_names"])
y = pd.DataFrame(data=iris["target"], columns=["target"])

# Split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Visualize dataset
X_train.head()
```

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 33 | 5.5 | 4.2 | 1.4 | 0.2 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 |
| 82 | 5.8 | 2.7 | 3.9 | 1.2 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 |
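
Note that `train_test_split` shuffles the rows at random, so the rows you see and the accuracy you get later may differ from the ones shown in this lecture. If you want repeatable runs, one option is to fix `random_state`. This is an addition to the course code, not something it requires, and the value `42` is arbitrary:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(data=iris["data"], columns=iris["feature_names"])
y = pd.DataFrame(data=iris["target"], columns=["target"])

# Fixing random_state makes the shuffle, and therefore the split, repeatable.
# stratify=y keeps the three classes in similar proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# With the default test size of 25%, 150 rows give 112 train and 38 test rows.
print(X_train.shape, X_test.shape)
```

Repeatable splits make it easier to compare two tracked runs, because a change in accuracy then comes from the model and not from the shuffle.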


Now, to track your training, all you need to do is import `mlflow` and run your code inside an MLFlow run:

```python
import mlflow

mlflow.end_run()  # Make sure no MLFlow run is already in progress
```

```python
# With `with`, the run ends automatically when the block exits,
# so there is no need to call mlflow.end_run() ourselves
with mlflow.start_run():
    # Instantiate and fit the model
    # ravel() gives sklearn the 1-D target array it expects
    lr = LogisticRegression()
    lr.fit(X_train.values, y_train.values.ravel())

    # Compute predictions and accuracy
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))
```

```
LogisticRegression model
Accuracy: 1.0
```


Finally, all that is left is to visualize the results with the MLFlow UI. Go to your terminal and type:

```shell 
$ mlflow ui 
```

This starts a local server (`http://127.0.0.1:5000`) where you can check what happened. Meanwhile, your project directory should now look something like this:

```
├── mlruns
│   └── 0
│       ├── 0a2b502f674949b4acb8dfce6549a7fb
│       │   ├── artifacts
│       │   │   └── model
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   ├── params
│       │   └── tags
│       │       ├── mlflow.log-model.history
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
└── train.ipynb ← This notebook for example
```

We will talk about this directory in the next lecture; it is going to be very useful.
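
The folders in this tree are plain files that you can inspect without the UI. As a rough sketch, and assuming the local file layout in which each file under `metrics` holds lines of the form `timestamp value step` and each file under `params` holds the raw value, a run folder could be read like this. The helper `read_run` and the toy folder are illustrative only, and the values in it are made up.

```python
import tempfile
from pathlib import Path


def read_run(run_dir):
    """Collect the params and latest metric values stored in one run folder."""
    run_dir = Path(run_dir)
    params = {p.name: p.read_text().strip() for p in (run_dir / "params").glob("*")}
    metrics = {}
    for metric_file in (run_dir / "metrics").glob("*"):
        # Assumed line format: "<timestamp> <value> <step>"; keep the last line.
        last_line = metric_file.read_text().strip().splitlines()[-1]
        metrics[metric_file.name] = float(last_line.split()[1])
    return {"params": params, "metrics": metrics}


# Toy run folder built by hand so the sketch runs anywhere.
toy_run = Path(tempfile.mkdtemp()) / "0" / "toy_run_id"
(toy_run / "params").mkdir(parents=True)
(toy_run / "metrics").mkdir()
(toy_run / "params" / "C").write_text("0.5")
(toy_run / "metrics" / "Accuracy").write_text("1600000000000 0.95 0\n")

print(read_run(toy_run))
```

For real projects, reading runs through MLFlow's own API is the safer route, since the on-disk layout is an implementation detail.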

## Log Metrics 

As of now, we don't have much in our MLFlow UI or in our project directory. This is because we haven't logged anything yet. Let's start by logging a metric.

Do you remember? A metric is something you use to assess the performance of your model. In our case, we use the `accuracy`.

To log a metric, we call:

```python
mlflow.log_metric("METRIC_NAME", metric)
```

That's all. Here is how it looks in the code above:

```python
with mlflow.start_run():
    # Instantiate and fit the model
    # ravel() gives sklearn the 1-D target array it expects
    lr = LogisticRegression()
    lr.fit(X_train.values, y_train.values.ravel())

    # Compute predictions and accuracy
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log metric
    mlflow.log_metric("Accuracy", accuracy)
```

```
LogisticRegression model
Accuracy: 1.0
```


Run this code, then launch the MLFlow UI from your terminal to see the metric appear in your new run.

```shell
$ mlflow ui
```

## Log Parameters 

You can also log your model's parameters to see which ones helped improve its performance. Just as with metrics, you log a parameter with:

```python
mlflow.log_param("PARAM_NAME", param)
```

```python
# Requires `import mlflow` and the data split from the cells above
with mlflow.start_run():
    # Specified parameters
    c = 0.5

    # Instantiate and fit the model
    # ravel() gives sklearn the 1-D target array it expects
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values.ravel())

    # Compute predictions and accuracy
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log metric
    mlflow.log_metric("Accuracy", accuracy)

    # Log param
    mlflow.log_param("C", c)
```

If you go to your MLFlow UI again, you should see the following screen: 

![](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-ui-log-params.png)

Congratulations! You know how to track your models. This could be useful for your future projects 😉.

## Resources 📚

* <a href="https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html" target="_blank">MLFlow Tutorial</a>
* <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" target="_blank">Logistic Regression</a>