# Exercise09 : ML Pipeline

With an AML pipeline, you can build ML workflows for purposes such as the following.

- You can build a retraining pipeline for MLOps integration.
- You can build a batch-scoring pipeline, as an alternative to the real-time scoring in "[Exercise08 : Publish as a Web Service](./exercise08_publish_model.ipynb)".

> Note : See [here](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/ai/mlops-python) for a reference architecture that integrates with CI/CD tools.

An ML pipeline can be invoked in the following ways.

- Time-based schedule invocation
- On-demand invocation through the published endpoint (REST)
- Trigger-based invocation, such as a file change or a combination of events (with Azure Event Grid, Azure Logic Apps, etc.)

In this exercise, we create a simple training pipeline that returns model metrics as a top-level (pipeline) output.

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## 1. Set Variables

Replace the bracketed strings below to set the required variables.

By using ```az login --service-principal```, you can run the ML pipeline from CI/CD tools (such as GitHub Actions) without an interactive login UI.

> Note : After running the following ```az configure --defaults```, you can omit the ```--resource-group``` and ```--workspace-name``` options in each ```az ml``` command.<br>
> ```az configure --defaults group=$my_resource_group workspace=$my_workspace```
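
For unattended runs (for example, in a CI/CD job), the service principal login command could be assembled from environment variables. The following is a minimal sketch and not part of the original exercise: the environment variable names (`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`) and the helper name `build_sp_login_command` are assumptions, and the command is only built here, not executed.

```python
import os

def build_sp_login_command(env=None):
    """Build the argument list for `az login --service-principal`.

    The environment variable names are hypothetical; use whatever
    names your CI/CD system provides for the service principal.
    """
    env = os.environ if env is None else env
    required = ["AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return [
        "az", "login", "--service-principal",
        "--username", env["AZURE_CLIENT_ID"],
        "--password", env["AZURE_CLIENT_SECRET"],
        "--tenant", env["AZURE_TENANT_ID"],
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in a CI step, which keeps the secret out of the notebook itself.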

In [1]:
my_resource_group = "{AML-RESOURCE-GROUP-NAME}"
my_workspace = "{AML-WORKSPACE-NAME}"

## 2. Create compute

Create a new AML compute cluster to run the pipeline.

When the pipeline is invoked, the compute nodes are started. After the pipeline completes, the cluster automatically scales down to zero nodes.

In [2]:
!az ml compute create --name mycluster01 \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --type amlcompute \
  --min-instances 0 \
  --max-instances 1 \
  --size Standard_D2_v2

{
  "id": "/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/computes/mycluster01",
  "idle_time_before_scale_down": 120,
  "location": "eastus",
  "max_instances": 1,
  "min_instances": 0,
  "name": "mycluster01",
  "network_settings": {},
  "provisioning_state": "Succeeded",
  "resourceGroup": "AML-rg",
  "size": "STANDARD_D2_V2",
  "ssh_public_access_enabled": true,
  "tier": "dedicated",
  "type": "amlcompute"
}
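
The JSON printed above can also be checked programmatically, for example to confirm that the cluster is allowed to scale down to zero nodes when idle. The following is a minimal sketch, assuming the JSON text has been captured into a string; the helper name `scales_to_zero` is hypothetical, and the field names are the ones shown in the output above.

```python
import json

def scales_to_zero(compute_json: str) -> bool:
    """Return True if the cluster description allows scaling down to zero nodes."""
    compute = json.loads(compute_json)
    return compute.get("type") == "amlcompute" and compute.get("min_instances") == 0

# Reduced copy of the fields printed by the command above
sample = (
    '{"type": "amlcompute", "min_instances": 0, '
    '"max_instances": 1, "idle_time_before_scale_down": 120}'
)
print(scales_to_zero(sample))
```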

## 3. Create an environment

First, create a custom environment (with TensorFlow) in which the scripts will run.

In [3]:
%%writefile 09_conda_pydata.yml
name: project_environment
dependencies:
- python=3.8
- pip:
  - tensorflow==2.10.0
channels:
- anaconda
- conda-forge

Writing 09_conda_pydata.yml


In [4]:
%%writefile 09_env_register.yml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: test-remote-cpu-env
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: 09_conda_pydata.yml
description: This is example

Writing 09_env_register.yml


In [5]:
!az ml environment create --file 09_env_register.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

{
  "conda_file": {
    "channels": [
      "anaconda",
      "conda-forge"
    ],
    "dependencies": [
      "python=3.8",
      {
        "pip": [
          "tensorflow==2.10.0"
        ]
      }
    ],
    "name": "project_environment"
  },
  "creation_context": {
    "created_at": "2022-10-04T08:19:55.676910+00:00",
    "created_by": "Tsuyoshi Matsuzaki",
    "created_by_type": "User",
    "last_modified_at": "2022-10-04T08:19:55.676910+00:00",
    "last_modified_by": "Tsuyoshi Matsuzaki",
    "last_modified_by_type": "User"
  },
  "description": "This is example",
  "id": "azureml:/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/environments/test-remote-cpu-env/versions/1",
  "image": "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
  "name": "test-remote-cpu-env",
  "os_type": "linux",
  "resourceGroup": "AML-rg",
  "tags": {},
  "version": "1"
}

## 4. Save scripts

In this example, I create a pipeline for model training and evaluation.<br>
The pipeline executes the following steps.

1. The model is trained.
2. The model accuracy is evaluated, and the model metrics are written to the pipeline's output.

The source code for each step is saved as follows.

- training script ```./pipeline_script/train.py```
- evaluation script ```./pipeline_script/evaluate.py```

In [7]:
import os
script_folder = './pipeline_script'
os.makedirs(script_folder, exist_ok=True)

In [8]:
%%writefile pipeline_script/train.py
import os
import argparse
import tensorflow as tf

# device test
print("##### List of available GPU #####")
print(tf.config.list_physical_devices("GPU"))

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data_folder",
    type=str,
    default="./data/train",
    help="Folder path for input data")
parser.add_argument(
    "--model_folder",
    type=str,
    default="./outputs",  # AML experiments outputs folder
    help="Folder path for model output")
parser.add_argument(
    "--learning_rate",
    type=float,
    default="0.001",
    help="Learning Rate")
parser.add_argument(
    "--first_layer",
    type=int,
    default="128",
    help="Neuron number for the first hidden layer")
parser.add_argument(
    "--second_layer",
    type=int,
    default="64",
    help="Neuron number for the second hidden layer")
parser.add_argument(
    "--epochs_num",
    type=int,
    default="6",
    help="Number of epochs")
FLAGS, unparsed = parser.parse_known_args()

# build model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(FLAGS.first_layer, activation="relu"),
    tf.keras.layers.Dense(FLAGS.second_layer, activation="relu"),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(FLAGS.learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# run training
train_data = tf.data.experimental.load(FLAGS.data_folder)
model.fit(
    train_data.shuffle(1000).batch(128).prefetch(tf.data.AUTOTUNE),
    epochs=FLAGS.epochs_num
)

# save model and variables
model_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
model.save(model_path)
print("current working directory : ", os.getcwd())
print("model folder : ", model_path)

Writing pipeline_script/train.py
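
Two `argparse` behaviors that `train.py` relies on are worth noting: the `type` function is applied to string defaults, and `parse_known_args` returns unrecognized arguments instead of exiting with an error. The following standalone sketch (not part of the pipeline) demonstrates both with a reduced parser; the `--extra` argument is made up for the demonstration.

```python
import argparse

parser = argparse.ArgumentParser()
# String defaults are converted with `type`, just like command-line values.
parser.add_argument("--learning_rate", type=float, default="0.001")
parser.add_argument("--epochs_num", type=int, default="6")

# Unknown arguments are collected in `unparsed` rather than causing an exit.
flags, unparsed = parser.parse_known_args(["--epochs_num", "3", "--extra", "x"])

print(flags.learning_rate, type(flags.learning_rate).__name__)
print(flags.epochs_num, unparsed)
```

This tolerance for unknown arguments means that extra options on the command line do not make the script fail.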


In [9]:
%%writefile pipeline_script/evaluate.py
import os
import json
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument(
    '--data_folder',
    type=str,
    default='./data/test',
    help='Folder path for input data')
parser.add_argument(
    '--model_folder',
    type=str,
    default='./model',
    help='Folder path for model base dir')
parser.add_argument(
    '--output_info',
    type=str,
    default='./output_info',
    help='File path for model metrics output')
FLAGS, unparsed = parser.parse_known_args()

# load data
test_data = tf.data.experimental.load(FLAGS.data_folder)

# load model
model_folder_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
loaded_model = tf.keras.models.load_model(model_folder_path)

# evaluate
results = loaded_model.evaluate(test_data.batch(128))
print("Loss: {}, Accuracy: {}".format(results[0], results[1]))

# write result (metrics)
output_info = {
    "accuracy" : float(results[1]),
    "loss" : float(results[0])
}
# the context manager closes the file even if writing fails
with open(FLAGS.output_info, "w") as f:
    json.dump(output_info, f)

Writing pipeline_script/evaluate.py


## 5. Build and Run ML pipeline

Now let's compose the pipeline in YAML and submit a job that runs it.

> Note : This example also mounts the registered data asset named ```mnist_data``` on the compute target. Run "[Exercise02 : Prepare Data](./exercise02_prepare_data.ipynb)" beforehand to prepare the dataset.

In [10]:
%%writefile 09_training_pipeline_job.yml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: training-pipeline01
experiment_name: training-pipeline01
compute: azureml:mycluster01
inputs:
  mnist_tf:
    type: uri_folder
    path: azureml:mnist_data@latest
outputs:
  model_metrics:
jobs:
  train_model:
    name: train_model
    display_name: train_model
    command: >-
      python train.py
      --data_folder ${{inputs.tf_dataset}}/train
      --model_folder ${{outputs.model_dir}}
    code: pipeline_script
    environment: azureml:test-remote-cpu-env@latest
    inputs:
      tf_dataset: ${{parent.inputs.mnist_tf}}
    outputs:
      model_dir:
  evaluate_model:
    name: evaluate_model
    display_name: evaluate_model
    command: >-
      python evaluate.py
      --data_folder ${{inputs.tf_dataset}}/test
      --model_folder ${{inputs.model_dir}}
      --output_info ${{outputs.model_info}}/metrics.txt
    code: pipeline_script
    environment: azureml:test-remote-cpu-env@latest
    inputs:
      tf_dataset: ${{parent.inputs.mnist_tf}}
      model_dir: ${{parent.jobs.train_model.outputs.model_dir}}
    outputs:
      model_info: ${{parent.outputs.model_metrics}}

Writing 09_training_pipeline_job.yml
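
The order of the steps is not declared explicitly; it follows from the binding expressions. Because `evaluate_model` takes `${{parent.jobs.train_model.outputs.model_dir}}` as an input, it runs after `train_model`. The sketch below only illustrates this idea and is not how Azure ML resolves bindings internally: it scans the input bindings, copied by hand from the YAML above into a dictionary, and lists the upstream step of each job. The names `JOB_OUTPUT` and `upstream_jobs` are hypothetical.

```python
import re

# Input bindings copied by hand from 09_training_pipeline_job.yml
job_inputs = {
    "train_model": {"tf_dataset": "${{parent.inputs.mnist_tf}}"},
    "evaluate_model": {
        "tf_dataset": "${{parent.inputs.mnist_tf}}",
        "model_dir": "${{parent.jobs.train_model.outputs.model_dir}}",
    },
}

# Matches bindings that refer to another job's output.
JOB_OUTPUT = re.compile(r"\$\{\{parent\.jobs\.(\w+)\.outputs\.(\w+)\}\}")

def upstream_jobs(inputs):
    """Return the sorted names of jobs whose outputs these inputs reference."""
    found = set()
    for binding in inputs.values():
        match = JOB_OUTPUT.fullmatch(binding)
        if match:
            found.add(match.group(1))
    return sorted(found)

dependencies = {name: upstream_jobs(inputs) for name, inputs in job_inputs.items()}
print(dependencies)
```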


Submit a job to run this pipeline.

In [11]:
!az ml job create --file 09_training_pipeline_job.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

Uploading pipeline_script (0.0 MBs): 100%|█| 2937/2937 [00:00<00:00, 76361.24it/

{
  "compute": "azureml:mycluster01",
  "creation_context": {
    "created_at": "2022-10-04T09:17:35.506408+00:00",
    "created_by": "Tsuyoshi Matsuzaki",
    "created_by_type": "User"
  },
  "display_name": "training-pipeline01",
  "experiment_name": "training-pipeline01",
  "id": "azureml:/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/jobs/mango_neck_x0skq5fhvd",
  "inputs": {
    "mnist_tf": {
      "mode": "ro_mount",
      "path": "azureml:mnist_data:1",
      "type": "uri_folder"
    }
  },
  "jobs": {
    "evaluate_model": {
      "$schema": "{}",
      "component": "azureml:azureml_anonymous:6ce01501-f9bf-42d0-b26f-7e8b9ce09ce6",
      "environment_variables": {},
      "inputs": {
        "model_dir": {
          "path": "${{parent.jobs.train_model.outputs.model_dir}}"
        },
        "tf_dat

Go to the [AML studio UI](https://ml.azure.com/) and view the pipeline results under Jobs (see below).

![Pipeline results](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20220225_Experiment_Pipeline.jpg)

You can extract the model metrics from the pipeline outputs.<br>
If the metrics meet your criteria, you can then invoke the next stage of your MLOps integration.
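
The check that decides whether to move on to the next stage could look like the following minimal sketch. It assumes the `metrics.txt` file written by `evaluate.py` has already been downloaded locally (for example, from the job's outputs in AML studio). The function name `passes_gate` and the 0.95 threshold are illustrative assumptions, not values from this exercise, and the metric values in the demo are made up.

```python
import json
import os
import tempfile

def passes_gate(metrics_path, min_accuracy=0.95):
    """Return True if the accuracy stored in the metrics file reaches the threshold."""
    with open(metrics_path) as f:
        metrics = json.load(f)
    return metrics["accuracy"] >= min_accuracy

# Demo with a temporary file in the same JSON format that evaluate.py writes.
# The numbers are made-up sample values, not real results.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.txt")
    with open(path, "w") as f:
        json.dump({"accuracy": 0.97, "loss": 0.09}, f)
    default_gate = passes_gate(path)
    strict_gate = passes_gate(path, min_accuracy=0.99)

print(default_gate, strict_gate)
```

In a CI/CD job, the result could be turned into the process exit code so that later stages run only when the gate passes.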

## 6. Remove Compute

You don't need to remove the AML compute to save money, because its nodes are terminated automatically when the cluster is inactive.<br>
If you still want to clean up, run the following command.

In [10]:
!az ml compute delete --name mycluster01 \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --yes

Deleting compute mycluster01 
.................................................Done.
(4m 8s)
