# Exercise09 : ML Pipeline

With an AML pipeline, you can create ML workflows for purposes such as the following.

- You can build a retraining pipeline for MLOps integration.
- You can build a batch-scoring pipeline, instead of the real-time scoring in "[Exercise08 : Publish as a Web Service](./exercise08_publish_model.ipynb)".

> Note : See [here](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/ai/mlops-python) for a reference architecture that integrates with CI/CD tools.

An ML pipeline can be invoked in the following ways.

- Time-based schedule invocation
- On-demand invocation by the published endpoint (REST)
- Trigger-based invocation, such as on a file change or other combined events (with Azure Event Grid, Azure Logic Apps, etc.)

In this exercise, we create a simple training pipeline that returns model metrics in its top-level (pipeline) outputs.

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## 1. Initialize MLClient

Replace the strings in braces below with your subscription id, resource group name, and AML workspace name.<br>
(Note that creating ```MLClient``` does not connect to the AML workspace, because the client initialization is lazy.)

By using ```ClientSecretCredential()``` instead, you can run the ML pipeline from CI/CD tools (such as GitHub Actions) without an interactive login UI.

In [1]:
from azure.ai.ml import MLClient
from azure.identity import DeviceCodeCredential, TokenCachePersistenceOptions

# When you run on remote
cache_opt = TokenCachePersistenceOptions(allow_unencrypted_storage=True)
cred = DeviceCodeCredential(cache_persistence_options=cache_opt)

# # When you run on Azure ML Notebook
# from azure.identity import DefaultAzureCredential
# cred = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=cred,
    subscription_id="{SUBSCRIPTION ID}",
    resource_group_name="{RESOURCE GROUP NAME}",
    workspace_name="{AML WORKSPACE NAME}",
)
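If you run this from a CI/CD tool, you could build a ```ClientSecretCredential``` in place of the ```DeviceCodeCredential``` above. The following is a minimal sketch. It assumes a service principal whose values are exported as the environment variables ```AZURE_TENANT_ID```, ```AZURE_CLIENT_ID```, and ```AZURE_CLIENT_SECRET```, which are the same names that azure-identity's ```EnvironmentCredential``` reads. The helpers ```load_sp_settings``` and ```make_credential``` are hypothetical and not part of the SDK.

```python
import os
from typing import Mapping

# Assumed variable names (the ones azure-identity's EnvironmentCredential also reads)
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def load_sp_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect service principal settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "tenant_id": env["AZURE_TENANT_ID"],
        "client_id": env["AZURE_CLIENT_ID"],
        "client_secret": env["AZURE_CLIENT_SECRET"],
    }


def make_credential(settings: dict):
    """Build a ClientSecretCredential (imported lazily, so load_sp_settings has no Azure dependency)."""
    from azure.identity import ClientSecretCredential
    return ClientSecretCredential(**settings)
```

You would then pass ```make_credential(load_sp_settings())``` as ```credential``` to ```MLClient```. In GitHub Actions, these variables would typically be populated from repository secrets.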

## 2. Create compute

Create a new AML compute cluster to run the pipeline.

When the pipeline is invoked, the compute is started. When the pipeline completes and the compute becomes idle, it is automatically scaled down to zero nodes (because ```min_instances=0```).

In [2]:
from azure.ai.ml.entities import AmlCompute

try:
    compute_target = ml_client.compute.get("mycluster01")
    print("found existing: ", compute_target.name)
except Exception:
    print("creating new.")
    compute_target = AmlCompute(
        name="mycluster01",
        type="amlcompute",
        size="Standard_D2_v2",
        min_instances=0,
        max_instances=1,
        tier="Dedicated",
    )
    # begin_create_or_update() returns a poller; result() waits for provisioning to finish
    compute_target = ml_client.begin_create_or_update(compute_target).result()

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ADSC82ZC9 to authenticate.
found existing:  mycluster01


## 3. Create an environment

First, create a custom environment (with TensorFlow) to run the scripts.

In [3]:
%%writefile 09_conda_pydata.yml
name: project_environment
dependencies:
- python=3.8
- pip:
  - tensorflow==2.10.0
channels:
- anaconda
- conda-forge

Writing 09_conda_pydata.yml


In [4]:
from azure.ai.ml.entities import Environment

myenv = Environment(
    name="test-remote-cpu-env",
    description="This is example",
    conda_file="09_conda_pydata.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
)
myenv = ml_client.environments.create_or_update(myenv)

Go to [AML Studio UI](https://ml.azure.com/) and click "Environments". Next, click the "Custom environments" tab and select the above environment.<br>
Please wait until the environment image build has succeeded.

![Environment status](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20221220_Environment_Status.jpg)
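If you would rather wait in code than watch the UI, a generic polling loop like the following could be used. This is a sketch under assumptions. ```get_status``` is a hypothetical callable that you supply to return the current build status as a string. The status strings "Succeeded" and "Failed" are also assumptions, and ```wait_for_status``` is not an SDK function.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str],
                    success: str = "Succeeded",
                    failure: str = "Failed",
                    interval_sec: float = 30.0,
                    timeout_sec: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it reports success or failure, or the timeout passes."""
    deadline = time.monotonic() + timeout_sec
    while True:
        status = get_status()
        if status == success:
            return status
        if status == failure:
            raise RuntimeError(f"build ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_sec} seconds")
        sleep(interval_sec)  # injectable so tests can skip the real wait
```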

## 4. Save scripts

In this example, I create a pipeline for model training and evaluation.<br>
In this pipeline, the following steps are executed.

1. The model is trained.
2. The model accuracy is evaluated, and the model metrics are set as the pipeline's output.

The source code for each step is saved as follows.

- training script ```./pipeline_script/train.py```
- evaluation script ```./pipeline_script/evaluate.py```

In [5]:
import os
script_folder = './pipeline_script'
os.makedirs(script_folder, exist_ok=True)

In [6]:
%%writefile pipeline_script/train.py
import os
import argparse
import tensorflow as tf

# device test
print("##### List of available GPU #####")
print(tf.config.list_physical_devices("GPU"))

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data_folder",
    type=str,
    default="./data/train",
    help="Folder path for input data")
parser.add_argument(
    "--model_folder",
    type=str,
    default="./outputs",  # AML experiments outputs folder
    help="Folder path for model output")
parser.add_argument(
    "--learning_rate",
    type=float,
    default=0.001,
    help="Learning Rate")
parser.add_argument(
    "--first_layer",
    type=int,
    default=128,
    help="Neuron number for the first hidden layer")
parser.add_argument(
    "--second_layer",
    type=int,
    default=64,
    help="Neuron number for the second hidden layer")
parser.add_argument(
    "--epochs_num",
    type=int,
    default=6,
    help="Number of epochs")
FLAGS, unparsed = parser.parse_known_args()

# build model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(FLAGS.first_layer, activation="relu"),
    tf.keras.layers.Dense(FLAGS.second_layer, activation="relu"),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(FLAGS.learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# run training
train_data = tf.data.experimental.load(FLAGS.data_folder)
model.fit(
    train_data.shuffle(1000).batch(128).prefetch(tf.data.AUTOTUNE),
    epochs=FLAGS.epochs_num
)

# save model and variables
model_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
model.save(model_path)
print("current working directory : ", os.getcwd())
print("model folder : ", model_path)

Writing pipeline_script/train.py
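In the pipeline, ```train.py``` receives its settings as command-line arguments. ```parse_known_args()``` returns any arguments it doesn't recognize in a separate list instead of failing. The following sketch illustrates this with a trimmed-down copy of the parser (only two of its arguments) and a hypothetical command line. The mount path and ```--extra_flag``` are made up for illustration.

```python
import argparse

# Trimmed-down copy of the train.py parser (two of its arguments)
parser = argparse.ArgumentParser()
parser.add_argument("--data_folder", type=str, default="./data/train")
parser.add_argument("--epochs_num", type=int, default=6)

# Hypothetical arguments, as the pipeline might pass them after resolving paths
argv = ["--data_folder", "/mnt/inputs/tf_dataset/train", "--extra_flag"]

# parse_known_args() returns (namespace, leftover args) instead of erroring on unknown flags
FLAGS, unparsed = parser.parse_known_args(argv)
print(FLAGS.data_folder)  # /mnt/inputs/tf_dataset/train
print(FLAGS.epochs_num)   # 6 (the default)
print(unparsed)           # ['--extra_flag']
```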


In [7]:
%%writefile pipeline_script/evaluate.py
import os
import json
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument(
    '--data_folder',
    type=str,
    default='./data/test',
    help='Folder path for input data')
parser.add_argument(
    '--model_folder',
    type=str,
    default='./model',
    help='Folder path for model base dir')
parser.add_argument(
    '--output_info',
    type=str,
    default='./output_info',
    help='File path for the output metrics (JSON)')
FLAGS, unparsed = parser.parse_known_args()

# load data
test_data = tf.data.experimental.load(FLAGS.data_folder)

# load model
model_folder_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
loaded_model = tf.keras.models.load_model(model_folder_path)

# evaluate
results = loaded_model.evaluate(test_data.batch(128))
print("Loss: {}, Accuracy: {}".format(results[0], results[1]))

# write result (metrics) as JSON
output_info = {
    "accuracy": float(results[1]),
    "loss": float(results[0])
}
output_json = json.dumps(output_info)
with open(FLAGS.output_info, "w") as f:
    f.write(output_json)

Writing pipeline_script/evaluate.py


## 5. Build and Run ML pipeline

Now let's compose the pipeline with the Python SDK (```azure.ai.ml```) and submit a job for the generated pipeline.

First, define the command objects that are used in the pipeline.

> Note : In this example, I also use the registered data asset named ```mnist_data```, which is mounted on your compute target. Run "[Exercise02 : Prepare Data](./exercise02_prepare_data.ipynb)" to prepare this dataset.

In [8]:
from azure.ai.ml import command, Input, Output

# 1. Create a command to train model
train_model_command = command(
    name="train_model",
    display_name="train_model",
    code="./pipeline_script",
    command="python train.py --data_folder ${{inputs.tf_dataset}}/train --model_folder ${{outputs.model_dir}}",
    environment="test-remote-cpu-env@latest",
    inputs={
        "tf_dataset": Input(type="uri_folder"),
    },
    outputs={
        "model_dir": Output(type="uri_folder"),
    },
)

# 2. Create a command to evaluate model
evaluate_model_command = command(
    name="evaluate_model",
    display_name="evaluate_model",
    code="./pipeline_script",
    command="python evaluate.py --data_folder ${{inputs.tf_dataset}}/test --model_folder ${{inputs.model_dir}} --output_info ${{outputs.model_info}}/metrics.txt",
    environment="test-remote-cpu-env@latest",
    inputs={
        "tf_dataset": Input(type="uri_folder"),
        "model_dir": Input(type="uri_folder"),
    },
    outputs={
        "model_info": Output(type="uri_folder"),
    },
)
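The ```${{inputs.<name>}}``` and ```${{outputs.<name>}}``` expressions in each ```command``` string are placeholders that Azure ML replaces with actual paths at run time. The following is only a local illustration of that substitution, written with a regular expression. ```render_command``` and the example paths are hypothetical and do not reflect how the SDK implements it.

```python
import re

# Matches ${{inputs.name}} or ${{outputs.name}}
PLACEHOLDER = re.compile(r"\$\{\{(inputs|outputs)\.(\w+)\}\}")


def render_command(template: str, inputs: dict, outputs: dict) -> str:
    """Replace each ${{inputs.x}} / ${{outputs.x}} with the matching path."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        return values[kind][name]  # KeyError if the name was not declared

    return PLACEHOLDER.sub(substitute, template)


template = ("python train.py --data_folder ${{inputs.tf_dataset}}/train "
            "--model_folder ${{outputs.model_dir}}")
print(render_command(
    template,
    inputs={"tf_dataset": "/mnt/data/mnist"},
    outputs={"model_dir": "/mnt/outputs/model_dir"},
))
# python train.py --data_folder /mnt/data/mnist/train --model_folder /mnt/outputs/model_dir
```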

Next, build the pipeline with the above commands (```train_model_command``` and ```evaluate_model_command```).

In [9]:
from azure.ai.ml.dsl import pipeline

@pipeline(default_compute="mycluster01")
def training_pipeline(training_input):
    train_node = train_model_command(
        tf_dataset=training_input
    )
    eval_node = evaluate_model_command(
        tf_dataset=training_input,
        model_dir=train_node.outputs.model_dir
    )
    return {"output_folder": eval_node.outputs.model_info}

pipeline_job = training_pipeline(
    training_input=Input(
        type="uri_folder",
        path="mnist_data@latest",
    )
)
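In the pipeline above, ```eval_node``` consumes ```train_node.outputs.model_dir```. This data dependency is what makes Azure ML run evaluation after training. The following is a local illustration of deriving that order, using the standard-library ```graphlib``` module (Python 3.9+, so run it locally rather than in the 3.8 environment above). The node names mirror the pipeline, but this is not how the SDK represents the graph internally.

```python
from graphlib import TopologicalSorter

# node -> set of nodes whose outputs it consumes
# (eval_node takes train_node.outputs.model_dir as its model_dir input)
dependencies = {
    "train_node": set(),
    "eval_node": {"train_node"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # nodes whose inputs are all available
    stages.append(ready)
    sorter.done(*ready)

print(stages)  # [['train_node'], ['eval_node']]
```

Nodes that become ready in the same stage have no dependency on each other, so a scheduler is free to run them in parallel.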

Submit a job to run this pipeline.

In [10]:
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="training-pipeline01"
)

Uploading pipeline_script (0.0 MBs): 100%|█████████████████████████████████| 2936/2936 [00:00<00:00, 92624.17it/s]



Go to [AML studio UI](https://ml.azure.com/) and see the pipeline results in "Jobs". (See below.)

![Pipeline results](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20220225_Experiment_Pipeline.jpg)

You can extract the model metrics from the pipeline outputs.<br>
If the model passes your criteria in this training pipeline, you can then invoke the next stage of your MLOps integration.
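A minimal sketch of such a check follows. It assumes you have already downloaded the ```metrics.txt``` file that ```evaluate.py``` writes to the pipeline's ```output_folder``` output. The helper name ```passes_gate``` and the 0.95 accuracy threshold are hypothetical, so pick a criterion that fits your model.

```python
import json
import tempfile
from pathlib import Path


def passes_gate(metrics_path: str, min_accuracy: float = 0.95) -> bool:
    """Return True if the accuracy written by evaluate.py meets the threshold."""
    metrics = json.loads(Path(metrics_path).read_text())
    return float(metrics["accuracy"]) >= min_accuracy


# Example with the same JSON layout that evaluate.py writes (values are made up)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metrics.txt"
    path.write_text(json.dumps({"accuracy": 0.97, "loss": 0.1}))
    print(passes_gate(str(path)))                     # True
    print(passes_gate(str(path), min_accuracy=0.99))  # False
```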

## 6. Remove Compute

You don't need to remove your AML compute to save money, because its nodes are automatically scaled down to zero when it's inactive.<br>
But if you want to clean up, run the following.

In [14]:
ml_client.compute.begin_delete("mycluster01")

Deleting compute mycluster01 


.....................................

Done.
(3m 7s)

