# MLFlow Projects

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-project.png" width=600>

## What you will learn in this course 

A standard way of organizing your ML projects makes training runs easy to launch and reproduce. MLFlow Projects gives you exactly that. In this course, you will learn: 

* What MLFlow Projects is
* How to organize an MLFlow project
* How to read the config files in an MLFlow project

## What is MLFlow Projects? 🤔🤔

An MLFlow Project is a standard way to package your code so that it can be run with any kind of technology and so that your models can be trained remotely.

## How an MLFlow Project is structured 🗂️🗂️

Now that you have logged your metrics and your model, your project should look something like this: 

```shell 
├── mlruns
│   └── 0
│       ├── 0a2b502f674949b4acb8dfce6549a7fb
│       │   ├── artifacts
│       │   │   └── model
│       │   │       ├── MLmodel
│       │   │       ├── conda.yaml
│       │   │       └── model.pkl
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   │   └── Accuracy
│       │   ├── params
│       │   │   └── C
│       │   └── tags
│       │       ├── mlflow.log-model.history
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
├── train.ipynb
```

In this structure, the important parts to understand are: 

- `artifacts` folder: where the information needed to deploy your model is stored.
- `meta.yaml` file: where all the information about your run is stored.
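To make the folder layout more concrete, here is a minimal sketch that reads the `params` and `metrics` files of a single run using only the standard library. The helper name `read_run_values` is hypothetical, and the assumed metric line layout (`<timestamp> <value> <step>`) may differ between MLFlow versions. The run folder below is a fake one created on the fly so the snippet runs without MLFlow installed.

```python
from pathlib import Path
import tempfile


def read_run_values(run_dir):
    """Collect params and the latest value of each metric from one run folder."""
    run_dir = Path(run_dir)
    params, metrics = {}, {}
    for f in sorted((run_dir / "params").glob("*")):
        # Each param file holds the raw value as text
        params[f.name] = f.read_text().strip()
    for f in sorted((run_dir / "metrics").glob("*")):
        # Assumed line layout: "<timestamp> <value> <step>"; keep the last line
        lines = f.read_text().strip().splitlines()
        metrics[f.name] = float(lines[-1].split()[1])
    return params, metrics


# Build a tiny fake run folder (made-up values, for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "mlruns" / "0" / "0a2b502f674949b4acb8dfce6549a7fb"
    (run / "params").mkdir(parents=True)
    (run / "metrics").mkdir()
    (run / "params" / "C").write_text("0.5")
    (run / "metrics" / "Accuracy").write_text("1592155419122 0.947 0\n")
    params, metrics = read_run_values(run)
    print(params, metrics)
```

In practice you would usually query runs through the MLFlow UI or its Python client rather than reading these files by hand; the sketch only illustrates what the folders contain.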

## Artifacts 🏛️🏛️

The `artifacts` folder holds the persisted model together with a description of the environment in which it was trained. In particular, it contains three files: 

* `MLmodel`
* `conda.yaml`
* `model.pkl`, if you persisted a sklearn model. You will get other types of files if you persisted a TensorFlow, PyTorch or any other type of model.

### `MLmodel` 

An `MLmodel` file should look something like this: 

```yaml
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7.3
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.23.1
run_id: 0a2b502f674949b4acb8dfce6549a7fb
utc_time_created: '2020-06-14 17:23:39.122114'
```

It gives all the information needed to run your model. Pay particular attention to: 

- `env`: by default you'll get a `conda` environment, but you can set up a `Docker` environment instead,
- `sklearn_version`: be really careful with the version recorded here, as it might not be available on your servers.
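One way to act on the version warning above is to compare the version recorded in the `MLmodel` file with what is installed on the target machine. The sketch below is only an illustration: the helpers `recorded_version` and `installed_version` are hypothetical names, and the regular expression is a shortcut that works for simple `key: value` lines (a real YAML parser would be more robust).

```python
import re
from importlib import metadata


def recorded_version(mlmodel_text, key):
    """Return the value recorded for `key` (e.g. 'sklearn_version'), or None."""
    pattern = rf"^\s*{re.escape(key)}:\s*(\S+)\s*$"
    match = re.search(pattern, mlmodel_text, re.MULTILINE)
    return match.group(1) if match else None


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Excerpt of the MLmodel file shown above
sample = """\
flavors:
  python_function:
    python_version: 3.7.3
  sklearn:
    sklearn_version: 0.23.1
"""

expected = recorded_version(sample, "sklearn_version")
found = installed_version("scikit-learn")
if found != expected:
    print(f"scikit-learn mismatch: model expects {expected}, environment has {found}")
```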

### `conda.yaml`

As stated before, MLFlow Projects will create a conda environment so that you can run your project on any server. A `conda.yaml` file looks like this: 

```yaml
channels:
- defaults
dependencies:
- python=3.7.3
- scikit-learn=0.23.1
- pip
- pip:
  - mlflow
  - cloudpickle==1.4.1
name: mlflow-env
```

As you can see, all the dependencies are listed here. Again, be careful with the versions pinned in your YAML file, as some servers might not be able to install them.

## Remote training 📺📺

Now that we have a better understanding of our project, we can train a model remotely simply by organizing it the following way:

```shell 
├── MLproject
├── conda.yaml
└── train.py
```

### Entry Point file 

`train.py` is called the entry point file. The entry point must be a script (`.py`) or an `.sh` file, because notebooks are not accepted. You therefore need to put your whole training process into this file. Its content looks like this:

```python
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()

X = pd.DataFrame(data = iris["data"], columns= iris["feature_names"])
y = pd.DataFrame(data = iris["target"], columns=["target"])

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Mlflow tracking
import sys 

with mlflow.start_run():
    
    # Specified Parameters 
    # Let the user specify C via the CLI, e.g. `python train.py -c 0.1`
    # (sys.argv[2] is the value that follows the `-c` flag)
    c = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    # Instantiate and fit the model 
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values.ravel())

    # Store metrics 
    predicted_qualities = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)

    # Print results 
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))

    # Log Metric 
    mlflow.log_metric("Accuracy", accuracy)

    # Log Param
    mlflow.log_param("C", c)
```
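Reading `sys.argv` by position works for a single argument, but it becomes fragile as soon as you add more parameters. As an alternative, here is a minimal sketch that parses the same `-c` flag with the standard `argparse` module. The `parse_args` helper is a hypothetical name, and the default of `0.5` simply mirrors the script above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `argv=None` means 'use the real sys.argv'."""
    parser = argparse.ArgumentParser(description="Train a LogisticRegression model")
    parser.add_argument("-c", type=float, default=0.5,
                        help="Inverse regularization strength C")
    return parser.parse_args(argv)


# Explicit argument lists are used here so the sketch runs anywhere
print(parse_args(["-c", "0.1"]).c)  # value given by the user
print(parse_args([]).c)             # falls back to the default
```

Inside `train.py` you would call `parse_args()` without arguments and pass `args.c` to `LogisticRegression`.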

### `conda.yaml`

`conda.yaml` won't change from what we showed you above.

### `MLproject`

In your `MLproject` file, you can specify a few additional things, such as parameters that the user is allowed to set, and you need to declare your entry point (here `train.py`):

```yaml
# Environment used to run the entry point
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      c: {type: float, default: 0.8}
    command: "python train.py -c {c}"
```

As you can see, in the `entry_points:` section we declared a parameter `c` and provided the command `python train.py -c {c}`, which is the command that will be run when the project is launched.
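To see how the `{c}` placeholder relates to the declared parameter, here is a conceptual sketch of the substitution: defaults are used first, then any value supplied by the user overrides them. This is only an illustration of the idea, not MLFlow's actual implementation, and `resolve_command` is a hypothetical helper.

```python
def resolve_command(template, declared, overrides=None):
    """Fill `{name}` placeholders using defaults, then user overrides."""
    values = {name: spec["default"]
              for name, spec in declared.items() if "default" in spec}
    values.update(overrides or {})
    missing = [name for name in declared if name not in values]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return template.format(**values)


# Mirrors the `entry_points` section shown above
declared = {"c": {"type": "float", "default": 0.8}}
template = "python train.py -c {c}"

print(resolve_command(template, declared))              # uses the default
print(resolve_command(template, declared, {"c": 0.1}))  # user override
```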

## Run a training process 🏃‍♀️🏃‍♀️

Finally, once all these files are ready, you can launch a training run with a single CLI command from any server: 

```shell 
$ mlflow run path_to_your_project
```

> NB: if you are already in the project folder, you can simply do:<br>
> ```shell 
> $ mlflow run . 
> ```

To override parameters, use the `-P` flag:
```shell
$ mlflow run path_to_your_project -P c=0.1
```
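If you want to launch the same command from a Python script (for example to loop over several values of `c`), one option is to build the CLI call and hand it to `subprocess`. The sketch below assumes the `mlflow` executable is on your `PATH`; `build_run_command` is a hypothetical helper, and the actual launch is left commented out so the snippet runs without MLFlow installed.

```python
import subprocess


def build_run_command(project_path, params=None):
    """Build the `mlflow run` argument list, adding one `-P name=value` per parameter."""
    cmd = ["mlflow", "run", str(project_path)]
    for name, value in (params or {}).items():
        cmd += ["-P", f"{name}={value}"]
    return cmd


cmd = build_run_command(".", {"c": 0.1})
print(" ".join(cmd))

# To actually launch the run (requires the `mlflow` CLI to be installed):
# subprocess.run(cmd, check=True)
```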

## Resources 

* <a href="https://mlflow.org/docs/latest/projects.html" target="_blank">MLflow Projects</a>
* <a href="https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html" target="_blank">MLflow Tutorial</a>