# Build AML Pipeline with AML Module 

In this tutorial you will learn how to work with Azure ML Module:</br>
1. Setup enrivonment - install module CLI and module/pipeline SDK </br>
2. Register a few sample modules into your aml workspace using CLI </br>
3. Use module/pipeline SDK to create a pipeline with modules registered in step 2 </br>

## Prerequisite
* Install Azure CLI, please follow [the Azure CLI installation instructions](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) to install.

## Setup environment
* Install Azure CLI AML extension which includes the _module_ command group
* Install Azure ML SDK including the APIs to work with _module_ and _pipeline_

In [None]:
# Uninstall azure-cli-ml (the `az ml` commands)
!az extension remove -n azure-cli-ml

# Install local version of azure-cli-ml (which includes `az ml module` commands)
!az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13349440/azure_cli_ml-0.1.0.13349440-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13349440 --yes

In [1]:
# Verify the availability of `az ml module` commands
!az ml module -h


Group
    az ml module : Commands to manage modules.

Commands:
    disable             : Disable a module.
    download            : Download a module to a specified directory.
    enable              : Enable a module.
    list                : List modules in a workspace.
    register            : Create or upgrade a module.
    set-default-version : Set default version of a module.
    show                : Show detail information of a module.
    validate-spec       : Validate module spec file.

[1mFor more specific examples, use: az find "az ml module"[0m

[33m[1mPlease let us know how we are doing: [34mhttps://aka.ms/clihats[0m
[0m

In [2]:
# Install azureml-sdk with Pipeline, Module
%config IPCompleter.greedy=True
!pip install azureml-pipeline-wrapper==0.1.0.13397231 --extra-index-url https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13397231 --user --upgrade

Looking in indexes: https://pypi.org/simple, https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13397231
Requirement already up-to-date: azureml-pipeline-wrapper==0.1.0.13397231 in ./.local/lib/python3.7/site-packages (0.1.0.13397231)


## Register azureml module
You can register AML module through azure-cli-ml or [ml.azure.com](https://ml.azure.com/). Module could be registered from: <br>
- local path
- public Github url
- Azure DevOps build artifacts

In [3]:
# you need to configure your ws information here

subscription_id = '74eccef0-4b8d-4f83-b5f9-fa100d155b22'
workspace_name = 'lisal-amlservice'
resource_group = 'lisal-dev'

In [4]:
# Configure your aml workspace 
!az login
!az account set -s $subscription_id
!az ml folder attach -w $workspace_name -g $resource_group
!az ml module list -o table

[33mTo sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code A33CYXNTX to authenticate.[0m
[33mThe following tenants don't contain accessible subscriptions. Use 'az login --allow-no-subscriptions' to have tenant level access.[0m
[33m415bf3ca-9434-4156-8b6c-80442d16bfbb[0m
[
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "id": "ef3a6d54-2b53-49f6-b905-7c7ec83487a6",
    "isDefault": false,
    "managedByTenants": [],
    "name": "Visual Studio Enterprise",
    "state": "Enabled",
    "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "user": {
      "name": "lisal@microsoft.com",
      "type": "user"
    }
  },
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "id": "c3f24a87-02ce-4580-8eda-8189454f1ba2",
    "isDefault": false,
    "managedByTenants": [
      {
        "tenantId": "2f4a9838-26b7-47ee-be60-ccc1fdec5953"
      }
    ],

In [None]:
# Register azureml modules from github url

!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/mpi_train.yaml
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/score.yaml
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/eval.yaml
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml

In [None]:
# list available custom module in aml workspace
!az ml module list -o table

## Create pipeline
You can build pipeline through SDK experience, or drag-n-drop way through [Designer](https://ml.azure.com/visualinterface?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/kubeflow-demo/workspaces/kubeflow_ws_1&flight=cm,nml,newGraphDetail,newGraphAuthoring,all&tid=72f988bf-86f1-41af-91ab-2d7cd011db47) in workspace portal


In [5]:
from azureml.core import Workspace, Run, Dataset
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)

# get modules
train_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='MPI Train')
score_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Score')
eval_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Evaluate')
compare_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Compare 2 Models')

# if you want to test a local module, below function allow user to test as anounymous module
# compare_module_func = Module.load_from_yaml(ws, yaml_file='./CompareModdels/compare2.yaml')

# get dataset
try:
    Dataset.get_by_name(ws, 'aml_module_training_data')
    print('Training dataset found in workspace')
except:
    print('Registering a training dataset for sample pipeline ...')
    training_dataset = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    training_dataset.register(workspace=ws, name='aml_module_training_data', description='Training data (just for illustrative purpose)')
    print("Registerd")
try:
    Dataset.get_by_name(ws, 'aml_module_test_data')
    print('Test dataset found in workspace')
except:
    print('Registering a test dataset for sample pipeline ...')
    test_dataset = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    test_dataset.register(workspace=ws, name='aml_module_test_data', description='Test data (just for illustrative purpose)')
    print('Registered')

    
train_data = Dataset.get_by_name(ws, 'aml_module_training_data')
test_data = Dataset.get_by_name(ws, 'aml_module_test_data')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Training dataset found in workspace
Test dataset found in workspace


In [6]:
# get available compute target

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
pipeline_compute = "always-on-ds2v2"

# Verify that cluster does not exist already
try:
    compute = ComputeTarget(workspace=ws, name=pipeline_compute)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    compute = ComputeTarget.create(ws, pipeline_compute, compute_config)

    compute.wait_for_completion(show_output=True)

Found existing cluster, use it.


In [7]:
# define a sub pipeline
@dsl.pipeline(name = 'A sub pipeline including train/score/eval', 
              description = 'train model and evaluate model perf')
def training_pipeline(input_data, learning_rate):
    train = train_module_func(
        training_data=input_data, 
        max_epochs=5, 
        learning_rate=learning_rate)
   
    train.runsettings.configure(process_count_per_node = 2, node_count = 2)

    score = score_module_func(
        model_input=train.outputs.model_output, 
        test_data=test_data)

    eval = eval_module_func(scoring_result=score.outputs.score_output)
    
    return {'eval_output': eval.outputs.eval_output, 'model_output': train.outputs.model_output}


In [8]:
# define pipeline with sub pipeline
@dsl.pipeline(name = 'A dummy pipeline that trains multiple models and output the best one', 
              description = 'select best model trained with different learning rate')
def dummy_automl_pipeline():
    train_and_evalute_model1 = training_pipeline(train_data, 0.01)
    train_and_evalute_model2 = training_pipeline(train_data, 0.02)
    
    compare = compare_module_func(
        model1=train_and_evalute_model1.outputs.model_output, 
        eval_result1=train_and_evalute_model1.outputs.eval_output,
        model2=train_and_evalute_model2.outputs.model_output,
        eval_result2=train_and_evalute_model2.outputs.eval_output
    )

    return {**compare.outputs}

# create a pipeline
pipeline = dummy_automl_pipeline()

In [9]:
# validate pipeline and visualize the graph
pipeline.validate(ws, default_compute_target=pipeline_compute)

{'result': 'validation passed', 'errors': []}

In [None]:
# Submit a pipeline run
run = pipeline.submit_run(
    ws,
    default_compute_target = pipeline_compute,
    experiment_name = 'pipeline-with-azureml-module'
)

run.wait_for_completion()

Submitted PipelineRun 0cf90a62-a8e2-4992-9b0a-6f9c0e5386f7
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-with-azureml-module/runs/0cf90a62-a8e2-4992-9b0a-6f9c0e5386f7?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/lisal-dev/workspaces/lisal-amlservice
PipelineRunId: 0cf90a62-a8e2-4992-9b0a-6f9c0e5386f7
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-with-azureml-module/runs/0cf90a62-a8e2-4992-9b0a-6f9c0e5386f7?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/lisal-dev/workspaces/lisal-amlservice
PipelineRun Status: NotStarted
PipelineRun Status: Running


StepRunId: 1b138723-dd2e-40e4-b4d3-1de1c3e7829a
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-with-azureml-module/runs/1b138723-dd2e-40e4-b4d3-1de1c3e7829a?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/lisal-dev/workspaces/lisal-amlservice
StepRun( MPI Train ) 