In [1]:
!python -c "import sys; print(sys.executable)"

/home/muhammadtalha/anaconda3/bin/python


# MLflow lab

In [2]:
import pandas as pd
import mlflow

In [3]:
pd.__version__

'2.0.0'

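### Setting up the MLflow tracking server

The command below starts the tracking server with an explicit backend store URI (a SQLite database) and a default artifact root (a local directory). This makes it possible to store models.

After running this command, the tracking server will be accessible at `localhost:5000`.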

In [4]:
%%bash --bg

mlflow server --host 0.0.0.0 \
    --port 5000 \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns

### MLproject file

The `MLproject` file configures the steps of an MLflow project.

It defines the project's pipeline steps, called *entry points*.

Each entry point in this file corresponds to a shell command.

Entry points can be run using

```
mlflow run <PROJECT_URI> -e <ENTRY_POINT>
```

By default, `mlflow run` runs the `main` entry point.

In [5]:
%cat MLproject

name: basic_mlflow

# this file is used to configure Python package dependencies.
# it uses Anaconda, but it can be also alternatively configured to use pip.
conda_env: conda.yaml

# entry points can be ran using `mlflow run <project_name> -e <entry_point_name>
entry_points:
  download_data:
    # you can run any command using MLFlow
    command: "bash download_data.sh"
  # MLproject file has to have main entry_point. It can be toggled without using -e option.
  main:
    # parameters is a key-value collection.
    parameters:
      file_name:
        type: str
        default: "day.csv"
      max_k:
        type: int
        default: 10
    command: "python train.py {file_name} {max_k}"
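The entry points and parameters in this file map directly onto `mlflow run` command lines. As a minimal sketch, the helper below (`build_mlflow_run_command` is a hypothetical name, not part of MLflow) assembles the argument list for an entry point and passes parameters with the CLI's `-P key=value` option. It only builds and prints the commands; it does not execute them.

```python
import shlex


def build_mlflow_run_command(project_uri=".", entry_point="main", params=None):
    """Return the argv list for an `mlflow run` invocation (hypothetical helper)."""
    argv = ["mlflow", "run", project_uri, "-e", entry_point]
    # Each parameter is passed as a separate `-P key=value` pair.
    for key, value in (params or {}).items():
        argv += ["-P", f"{key}={value}"]
    return argv


# The two entry points defined in the MLproject file above.
download = build_mlflow_run_command(entry_point="download_data")
train = build_mlflow_run_command(params={"file_name": "day.csv", "max_k": 10})

print(shlex.join(download))
print(shlex.join(train))
```

To actually launch a step, the list could be handed to `subprocess.run(train, check=True)` from an environment where `mlflow` is on the `PATH`.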


First we need to download the data, which the `download_data` entry point does by running `download_data.sh`. We will use the weather data from the previous machine learning tutorial.

## Training

Now we can train the models. See `train.py`.
It contains the code from the supervised machine learning tutorial, extended to track the metrics and the model.

We will train kNN models for $k \in \{1, 2, \ldots, 10\}$ using the *temperature* and *casual* features.

After running this command, you can go to `localhost:5000` and see the trained models.
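The `main` entry point passes its parameters to the script as positional arguments (`python train.py {file_name} {max_k}`). The contents of `train.py` are not shown in this notebook, so the following is only a hypothetical sketch of how such a script might read those two arguments and derive the list of $k$ values to train; the function name `parse_args` is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments supplied by the `main` entry point."""
    parser = argparse.ArgumentParser(description="Train kNN models for k = 1..max_k")
    parser.add_argument("file_name", help="CSV file with the training data")
    parser.add_argument("max_k", type=int, help="largest k to try")
    return parser.parse_args(argv)


# Simulate `python train.py day.csv 10` (the defaults from MLproject).
args = parse_args(["day.csv", "10"])
k_values = list(range(1, args.max_k + 1))
print(args.file_name, k_values)
```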

In [7]:
import sklearn

In [8]:
sklearn.__version__

'1.2.1'

In [20]:
%%bash
source mlflow_env_vars.sh
mlflow run . 

Traceback (most recent call last):
  File "/home/muhammadtalha/anaconda3/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/mlflow/cli.py", line 202, in run
    projects.run(
  File "/home/muhammadt

CalledProcessError: Command 'b'source mlflow_env_vars.sh\nmlflow run . \n'' returned non-zero exit status 1.

## Inspecting stored models

The trained models are stored in `mlruns/0`, with one subdirectory per run.

Each run directory contains the artifacts and configuration needed to serve the model.

In [11]:
%%bash
last_model_path=$(ls -tr mlruns/0/ | tail -1)
cat mlruns/0/$last_model_path/artifacts/model/MLmodel

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.9
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.2.1
mlflow_version: 2.3.1
model_uuid: 25e67b71408c47338013e09b1535d0ac
run_id: 5c91048d5cd44db99472a6dae8489e7e
utc_time_created: '2023-05-04 03:32:47.131806'
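The same lookup can be sketched in Python. The helper below (`latest_mlmodel` is a hypothetical name) picks the most recently modified run directory, as `ls -tr | tail -1` does, but considers only directories, so a stray file in `mlruns/0` is never mistaken for a run. It assumes the `mlruns/0/<run_id>/artifacts/model/MLmodel` layout used by the cell above.

```python
from pathlib import Path


def latest_mlmodel(experiment_dir="mlruns/0"):
    """Return the MLmodel path of the most recently modified run, or None."""
    run_dirs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Directory modification time is used as a proxy for "latest run".
    newest = max(run_dirs, key=lambda p: p.stat().st_mtime)
    return newest / "artifacts" / "model" / "MLmodel"


# Print the file only if the experiment directory exists where we expect it.
if Path("mlruns/0").is_dir():
    path = latest_mlmodel()
    if path is not None and path.is_file():
        print(path.read_text())
```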


In [13]:
import mlflow

In [14]:
mlflow.__version__

'2.3.1'

## Serving model

Now that we have trained our models, we can go to the *Models* page in the MLflow UI (http://localhost:5000/#/models).

Click *sklearn_knn* on this page, choose a model version, and move it to the *Production* stage.

The following cell will serve the model on localhost at port 5001.

In [18]:
%%bash 
source mlflow_env_vars.sh
mlflow --version
mlflow models serve -m models:/sklearn_knn/Production -p 5001 --env-manager=conda 


mlflow, version 2.3.1


  value = self.callback(ctx, self, value)
Traceback (most recent call last):
  File "/home/muhammadtalha/anaconda3/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/muhammadtalha/anaconda3/lib/python3.10/site-packages/click/core

CalledProcessError: Command 'b'source mlflow_env_vars.sh\nmlflow --version\nmlflow models serve -m models:/sklearn_knn/Production -p 5001 --env-manager=conda \n'' returned non-zero exit status 1.

## Prediction

We'll load data that we can feed into the prediction server.

In [14]:
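# Assumes download_data.sh saved day.csv (the MLproject default) in the working directory.
df = pd.read_csv("day.csv")[["temp", "casual", "season"]]
df["is_winter"] = df["season"] == 1

df[~df["is_winter"]].head()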

Unnamed: 0,temp,casual,season,is_winter
79,0.430435,401,2,False
80,0.441667,460,2,False
81,0.346957,203,2,False
82,0.285,166,2,False
83,0.264167,300,2,False


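Let's predict for the first winter day and the first non-winter day. Each input is a `[temp, casual]` pair: `[0.344, 331]` for the winter day and `[0.43, 401]` for the non-winter day (row 79 above).

**Warning: this might fail at first if the prediction server has not finished starting; in that case, wait a minute and retry.**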
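Instead of waiting a fixed amount of time, one could poll the port until it accepts connections. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the host and port are the ones used in this notebook. An open port only shows that the server process is listening, so the very first request may still be slow.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5001, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before sending the request blocks for at most about a minute and reports whether the server became reachable.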

In [17]:
%%bash
data='[[0.344,331], [0.43, 401]]'
echo $data

curl -d "{\"inputs\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

[[0.344,331], [0.43, 401]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 127.0.0.1 port 5001 after 0 ms: Couldn't connect to server


CalledProcessError: Command 'b'data=\'[[0.344,331], [0.43, 401]]\'\necho $data\n\ncurl -d "{\\"inputs\\": $data}" -H \'Content-Type: application/json\' 127.0.0.1:5001/invocations\n'' returned non-zero exit status 7.

In [2]:
%%bash
data='[[0.344,331], [0.43, 401]]'
echo $data

curl -d "{\"instances\": $data}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

[[0.344,331], [0.43, 401]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    71  100    30  100    41   9606  13128 --:--:-- --:--:-- --:--:-- 23666


{"predictions": [true, false]}

In [3]:
%%bash
data='[[0.344,331], [0.43, 401]]'
columns='["temp","casual"]'
echo $data

curl -d "{\"dataframe_split\":{\"columns\":[\"temp\",\"casual\"],\"data\": $data}}" -H 'Content-Type: application/json' 127.0.0.1:5001/invocations

[[0.344,331], [0.43, 401]]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   114  100    30  100    84   8888  24888 --:--:-- --:--:-- --:--:-- 38000


{"predictions": [true, false]}

Voilà! The model predicts `true` for the winter day and `false` for the non-winter day, so both predictions are correct.
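The `curl` requests above can also be sent from Python. This is a sketch under the same assumptions as the cells above (server at `127.0.0.1:5001`, features `temp` and `casual`); `build_payload` and `predict` are hypothetical helper names. It builds either an `inputs` body or a `dataframe_split` body, the two shapes used in this section, and prints the body without contacting the server.

```python
import json
import urllib.request


def build_payload(rows, columns=None):
    """Build a JSON request body for the /invocations endpoint."""
    if columns is None:
        return json.dumps({"inputs": rows})
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})


def predict(body, url="http://127.0.0.1:5001/invocations"):
    """POST a JSON body to the scoring server and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


rows = [[0.344, 331], [0.43, 401]]
body = build_payload(rows, columns=["temp", "casual"])
print(body)
```

With the prediction server running, `predict(body)` sends the request and returns the parsed JSON response.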