# MLflow
First, create a virtual environment and install the required dependencies; the instructions are in the `README.md` file.
After installing the dependencies, start JupyterLab (`jupyter-lab`).

In [22]:
!python -c "import sys; print(sys.executable)"

/home/all/Documents/mlops-task/mlops-student/bin/python



# MLflow lab

In [23]:
import pandas as pd

In [24]:
import mlflow

In [25]:
pd.__version__

'2.0.1'

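### Setting up the MLflow tracking server

The command below starts the tracking server with an explicit backend store URI (a SQLite database for run metadata) and a default artifact root (the folder where artifacts are written). This makes it possible to store models.

After running this command, the tracking server will be accessible at `localhost:5000`.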

In [26]:
%%bash --bg

mlflow server --host 0.0.0.0 \
    --port 5000 \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns    # folder where artifacts are stored
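
Because the server is started in the background, it can be useful to confirm that it is up before continuing. The following is a minimal sketch using only the standard library. It assumes the server exposes a `/health` endpoint, as recent MLflow versions do; the helper name `tracking_server_is_up` is hypothetical and not part of MLflow.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(base_url="http://localhost:5000", timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the tracking server serves a /health endpoint.
    Any connection error, timeout, or HTTP error status counts as "not up".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `tracking_server_is_up()` in a notebook cell right after starting the server may return `False` for the first second or two, so retrying a few times is reasonable.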

### MLProject file

This file configures the steps of an MLflow project.

In `MLproject` we define the project's pipeline steps, called *entry points*.

Each entry point in this file corresponds to a shell command.

Entry points can be run using

```
mlflow run <PROJECT_URI> -e <ENTRY_POINT>
```

where `<PROJECT_URI>` is the project directory (for example `.`). By default, `mlflow run` runs the `main` entry point.
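
To make the shape of the command concrete, here is a small sketch that assembles the `mlflow run` argument list in Python. The helper `build_mlflow_run_command` is hypothetical. The `-P name=value` flag is how `mlflow run` accepts parameters, but the parameter `k` used here is only an illustration: the `MLproject` file in this lab declares no parameters.

```python
import shlex


def build_mlflow_run_command(project_uri=".", entry_point="main", params=None):
    """Build the argument list for `mlflow run` (hypothetical helper)."""
    cmd = ["mlflow", "run", project_uri, "-e", entry_point]
    # Each parameter is passed as a separate `-P name=value` pair.
    for name, value in (params or {}).items():
        cmd += ["-P", f"{name}={value}"]
    return cmd


# `k` is an illustrative parameter name, not one defined in this project.
print(shlex.join(build_mlflow_run_command(".", "main", {"k": 5})))
# prints: mlflow run . -e main -P k=5
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the entry point from Python instead of a `%%bash` cell.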

In [27]:
%cat MLproject

name: basic_mlflow

# this file is used to configure Python package dependencies.
# it uses Anaconda, but it can be also alternatively configured to use pip.
conda_env: conda.yaml

# entry points can be ran using `mlflow run <project_name> -e <entry_point_name>
entry_points:
  
  main:
    # parameters is a key-value collection.
   
    command: "python train.py"



First we need to download the data. We will use the weather data from the previous machine learning tutorial.

## Training

Now we can train the models. See `train.py`.
It contains the code from the supervised machine learning tutorial, with tracking of the metrics and the model added.

We will train kNN models for $k \in \{1, 2, ..., 10\}$ using the *temperature* and *casual* features.

After running the `mlflow run` command below, you can go to `localhost:5000` and see the trained models.

In [28]:
import sklearn

In [29]:
sklearn.__version__

'1.2.2'

In [30]:
%%bash
source mlflow_env_vars.sh
mlflow run .   # "." is the project directory (the folder containing MLproject); give another path if the project lives elsewhere

2023/05/08 12:37:20 INFO mlflow.utils.conda: Conda environment mlflow-dd0fbdd40ba98798131458f29496394bd1a3fb33 already exists.
2023/05/08 12:37:20 INFO mlflow.projects.utils: === Created directory /tmp/tmpitiiw3i2 for downloading remote URIs passed to arguments of type 'path' ===
2023/05/08 12:37:20 INFO mlflow.projects.backend.local: === Running command 'source /home/all/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-dd0fbdd40ba98798131458f29496394bd1a3fb33 1>&2 && python train.py' in run with ID 'a9a74d7fc75f4bc7b8b857a6a7537f54' === 
Registered model 'sklearn_rf' already exists. Creating a new version of this model...
2023/05/08 12:37:23 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: sklearn_rf, version 110
Created version '110' of model 'sklearn_rf'.
2023/05/08 12:37:23 INFO mlflow.projects: === Run (ID 'a9a74d7fc75f4bc7b8b857a6a7537f54') succeeded ===



## Inspecting stored models

The trained models are stored under `mlruns/0`, the artifact folder of the default experiment (ID 0).

Each run directory there contains the artifacts and configuration needed to serve the model.

In [31]:
%%bash
last_model_path=$(ls -tr mlruns/0/ | tail -1)
cat mlruns/0/$last_model_path/artifacts/model/MLmodel

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.6
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.2.2
mlflow_version: 2.3.1
model_uuid: 838d7f3e6391451eb61b4b3bfc43d700
run_id: a9a74d7fc75f4bc7b8b857a6a7537f54
utc_time_created: '2023-05-08 07:37:22.194208'
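
The shell cell above picks the newest entry in `mlruns/0` by modification time. A rough Python equivalent is sketched below with `pathlib`. The helper name `latest_mlmodel_path` is hypothetical, and unlike `ls -tr | tail -1` it only considers directories, which is an assumption about what the run folders look like.

```python
from pathlib import Path


def latest_mlmodel_path(experiment_dir="mlruns/0"):
    """Return the MLmodel path of the most recently modified run directory.

    Returns None if the experiment directory is missing or has no run
    directories. The file itself is not checked for existence.
    """
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Newest run = directory with the largest modification time.
    newest = max(run_dirs, key=lambda p: p.stat().st_mtime)
    return newest / "artifacts" / "model" / "MLmodel"
```

For example, `print(latest_mlmodel_path().read_text())` would show the same `MLmodel` file as the cell above, provided at least one run has logged a model under the `model` artifact path.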


In [32]:
import mlflow

In [33]:
mlflow.__version__

'2.3.1'

## Serving model

Now that we have trained our models, we can go to the *Models* page in the MLflow UI (http://localhost:5000/#/models).

Click *sklearn_rf* on this page, choose a model version and move it to the *Production* stage.

The following cell will serve the model on localhost at port 5009.

In [34]:
%%bash --bg
source mlflow_env_vars.sh
mlflow --version
mlflow models serve -m models:/sklearn_rf/Production -p 5009 --env-manager=conda 

## Prediction

In [39]:
%%bash 
data='[[13.049, 0.399, 2.43, 21.68, 71.78, 2.33, 1.8699, 0.09, 1.3, 1.06, 1.13, 2.45, 96.79]]'
echo $data

curl -d "{\"inputs\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5009/invocations

[[13.049, 0.399, 2.43, 21.68, 71.78, 2.33, 1.8699, 0.09, 1.3, 1.06, 1.13, 2.45, 96.79]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   121  100    22  100    99   1554   6993 --:--:-- --:--:-- --:--:--  8642


{"predictions": [1.0]}

Voilà! The served model responds with a prediction for the input row.
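
The same request can be sent from Python instead of `curl`. The sketch below uses only the standard library and mirrors the payload shown above: a JSON object with an `inputs` key, posted to `/invocations`. The helper names `build_invocation_request` and `predict` are hypothetical, and the default URL assumes the model is served on port 5009 as in this lab.

```python
import json
import urllib.request

# Assumption: the model server from the previous section is listening here.
DEFAULT_URL = "http://127.0.0.1:5009/invocations"


def build_invocation_request(rows, url=DEFAULT_URL):
    """Build a POST request carrying {"inputs": rows} as JSON."""
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(rows, url=DEFAULT_URL, timeout=10.0):
    """Send the rows to the model server and return its "predictions" list."""
    request = build_invocation_request(rows, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]
```

With the server running, `predict([[13.049, 0.399, 2.43, 21.68, 71.78, 2.33, 1.8699, 0.09, 1.3, 1.06, 1.13, 2.45, 96.79]])` sends the same row as the `curl` cell. Splitting request construction from sending keeps the payload easy to inspect without a live server.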