# Hello MLflow

MLflow is an open-source platform for managing the machine learning lifecycle. It lets users track experiments, package code, and deploy models across different frameworks. This notebook looks at the following MLflow components:
- MLflow Tracking - used for **tracking** machine learning experiments
- MLflow Model Registry - used for **versioning** MLflow models
- MLflow Models - used for **packaging** machine learning models

To begin, let's first install the required dependencies:

```bash
pip install -r requirements.txt
```

## MLflow Tracking

In [1]:
import mlflow


MLFLOW_TRACKING_URI = "https://mlflow.test-data-cluster.tiket.com/"

# Tell MLflow which tracking server to log to.
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

### Experiments & Runs

A **run** represents a single code execution within a machine learning project. These runs are grouped together inside an **experiment**.

In [8]:
MLFLOW_EXPERIMENT = "workshop-playground"

# Create a new experiment. This raises an exception if an experiment
# with the same name already exists, so run it only once.
mlflow.create_experiment(MLFLOW_EXPERIMENT)

# Set the active experiment for subsequent runs. If the experiment
# does not exist, MLflow creates it automatically.
mlflow.set_experiment(MLFLOW_EXPERIMENT)

<Experiment: artifact_location='gs://dev_caelum_model_repo/mlflow/14', creation_time=1689264199195, experiment_id='14', last_update_time=1689264199195, lifecycle_stage='active', name='workshop-playground', tags={}>

In [9]:
# this will create a single run that does not contain any data
with mlflow.start_run():
    pass

A new run should now appear in the tracking server UI under the `workshop-playground` experiment.

### Logging Parameters and Metrics

The term **logging** simply means to record or save information. In this example, we will be saving the parameters and metrics used during the training of a machine learning model.

In [10]:
# Parameters and metrics do not have to be ML model parameters and metrics.
# They can be any configuration used during training and any value you want to track.
parameters = {"lr": 0.005, "dataset_version": "v1.0"}
metrics = {"loss": 0.0001, "accuracy": 0.98}

with mlflow.start_run(run_name="logging-metadata-1"):
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)

### Logging Models

Next, let's log a simple scikit-learn model.

In [11]:
import pandas as pd
import numpy as np

# Sample Data
rows = 1000
data = pd.DataFrame(
    {
        "x0": np.random.random(rows) * 10,
        "x1": np.random.random(rows) * 10,
    }
)
A, B = 2.5, -1.33
data["y"] = A * data["x0"] + B * data["x1"] + np.random.random(rows)
print(data.head(3).to_markdown())

|    |      x0 |      x1 |       y |
|---:|--------:|--------:|--------:|
|  0 | 5.47231 | 1.12251 | 12.923  |
|  1 | 8.96003 | 7.0727  | 13.0109 |
|  2 | 7.49247 | 6.34766 | 10.6685 |


In [12]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error


with mlflow.start_run(run_name="sklearn-linear-2"):
    # Train the model. Note that positive=True constrains the coefficients
    # to be non-negative, so the fit cannot recover the negative B.
    parameters = {"positive": True}
    model = LinearRegression(**parameters)

    X, y = data[["x0", "x1"]], data["y"]
    model.fit(X, y)

    # Evaluate on the training data (y_true first, then y_pred)
    y_pred = model.predict(X)
    metrics = {"mae": mean_absolute_error(y, y_pred)}

    # Log the parameters, metrics, and the fitted model
    mlflow.log_params(parameters)
    mlflow.log_metrics(metrics)
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model-linear-regression"
    )
    