# Exercise 01 : Track logs and metrics

In this exercise, we track logs and metrics during training, using Azure Machine Learning as the MLflow tracking backend. All data tracked by MLflow is collected in Azure Machine Learning experiments.

![AML tracking](./images/aml_tracking.png)

First, get the MLflow tracking URI for your Azure Machine Learning workspace and set it as the current MLflow tracking URI.

In [1]:
from azureml.core import Workspace
import mlflow

ws = Workspace.get(
    name = "<FILL-AML-WORKSPACE-NAME>",
    subscription_id = "<FILL-AZURE-SUBSCRIPTION-ID>",
    resource_group = "<FILL-RESOURCE-GROUP-NAME>")
tracking_uri = ws.get_mlflow_tracking_uri()
mlflow.set_tracking_uri(tracking_uri)

To check which tracking scheme is used in the current context, run the following.

In [3]:
from urllib.parse import urlparse

urlparse(mlflow.get_tracking_uri()).scheme

'azureml'
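
The scheme is simply the part of the tracking URI before `://`, so the same check distinguishes other tracking backends as well. The sketch below uses only the standard library; the three URIs are hypothetical placeholders chosen for illustration, not values returned by a real workspace.

```python
from urllib.parse import urlparse

# Hypothetical example URIs (assumptions for illustration only).
example_uris = {
    "local directory": "file:///tmp/mlruns",
    "self-hosted server": "http://localhost:5000",
    "Azure Machine Learning": "azureml://example-region.api.azureml.ms/mlflow/v1.0/placeholder",
}

# Extract the scheme of each URI, exactly as done for mlflow.get_tracking_uri() above.
schemes = {label: urlparse(uri).scheme for label, uri in example_uris.items()}

for label, scheme in schemes.items():
    print(f"{label}: {scheme}")
```

A scheme other than `azureml` at this point would suggest that `mlflow.set_tracking_uri()` was not applied in the current session.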

Set the experiment name for the runs that follow.

In [4]:
experimentName = "exercise01-sklearn-autolog-test"
mlflow.set_experiment(experimentName)

2022/03/10 05:33:52 INFO mlflow.tracking.fluent: Experiment with name 'exercise01-sklearn-autolog-test' does not exist. Creating a new experiment.


<Experiment: artifact_location='', experiment_id='24c883f6-12ae-4426-9538-f0f00f495ccd', lifecycle_stage='active', name='exercise01-sklearn-autolog-test', tags={}>

Now we build the training script. (This source code is almost the same as the [MLflow tutorial sample](https://www.mlflow.org/docs/latest/tutorials-and-examples/tutorial.html).)<br>
As you can see below, parameters, metrics, and the trained model are tracked automatically by MLflow's ```autolog()```.

In [5]:
import os
import sys

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import ElasticNet
import mlflow

def train_model(alpha=0.5, l1_ratio=0.5):
    # Read the wine-quality csv file from the URL
    csv_url = (
        "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    )
    data = pd.read_csv(csv_url, sep=";")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    mlflow.sklearn.autolog()

    with mlflow.start_run() as my_run:
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)
        rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
        print("RMSE: %s" % rmse)

    print("MLflow run id: %s" % my_run.info.run_id)
    return my_run

Now let's start training!

In [6]:
run = train_model()

RMSE: 0.7434638228253136
MLflow run id: 00a6654a-470f-44bf-85b5-d69b492617df


Log in to the [Azure Machine Learning (AML) studio UI](https://ml.azure.com/).<br>
Click "Experiments" to see the logged run.

In the AML studio UI, you will see the params and metrics tracked by MLflow's ```autolog()```.

![Params and metrics](./images/params_metrics.png)

You can also retrieve the params and metrics with the MLflow API.

In [7]:
from mlflow.tracking import MlflowClient

run_result = MlflowClient().get_run(run.info.run_id)
print("params: {}".format(run_result.data.params))
print("metrics: {}".format(run_result.data.metrics))

params: {'alpha': '0.5', 'copy_X': 'True', 'fit_intercept': 'True', 'l1_ratio': '0.5', 'max_iter': '1000', 'normalize': 'deprecated', 'positive': 'False', 'precompute': 'False', 'random_state': '42', 'selection': 'cyclic', 'tol': '0.0001', 'warm_start': 'False'}
metrics: {'training_mae': 0.6108990512309974, 'training_mse': 0.5707319629401092, 'training_score': 0.12745381890406593, 'training_r2_score': 0.12745381890406593, 'training_rmse': 0.7554680423023261}
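
Two details in this output are easy to miss. MLflow returns every param as a string, so values need to be converted before numeric use. Also, the autologged metrics are computed on the training data, which is why `training_rmse` differs from the test-set RMSE printed by `train_model()`. The following is a minimal sketch using a subset of the values copied from the output above; the variable names are illustrative only.

```python
import math

# Values copied from the output above; params arrive as strings.
params = {"alpha": "0.5", "l1_ratio": "0.5", "max_iter": "1000", "fit_intercept": "True"}
metrics = {"training_mse": 0.5707319629401092, "training_rmse": 0.7554680423023261}

# Convert string params back to typed values before using them.
alpha = float(params["alpha"])
max_iter = int(params["max_iter"])
fit_intercept = params["fit_intercept"] == "True"

# Sanity check: the logged RMSE should be the square root of the logged MSE.
rmse_from_mse = math.sqrt(metrics["training_mse"])
consistent = math.isclose(rmse_from_mse, metrics["training_rmse"], rel_tol=1e-6)

print(alpha, max_iter, fit_intercept, consistent)
```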


In the run's outputs, you will find that the trained model is saved automatically in the ```model``` folder.

![Model output](./images/model_output.png)

You can also list the artifacts in the output folder with the MLflow API. (See "[Exercise04 : Model management and deployment](./04_model_deploy)" for the ```MLmodel``` file.)

In [8]:
artifacts = [f.path for f in MlflowClient().list_artifacts(run_result.info.run_id, "model")]
print("artifacts: {}".format(artifacts))

artifacts: ['model/MLmodel', 'model/conda.yaml', 'model/model.pkl', 'model/requirements.txt']
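
The `model` folder listed above can be addressed with an MLflow `runs:/` URI, which combines the run ID and the artifact path. The helper below, `build_model_uri`, is a hypothetical name introduced here for illustration; it only builds the string and does not contact the tracking server.

```python
def build_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a 'runs:/<run_id>/<artifact_path>' URI (hypothetical helper)."""
    return f"runs:/{run_id}/{artifact_path}"

# Run ID copied from the training output above.
model_uri = build_model_uri("00a6654a-470f-44bf-85b5-d69b492617df")
print(model_uri)

# With the tracking URI set as in this exercise, the URI could then be
# passed to MLflow to reload the model (not executed in this sketch):
# loaded_model = mlflow.sklearn.load_model(model_uri)
```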
