# Build AML Pipeline with azureml modules

In this tutorial you will learn how to work with Azure ML modules:

1. Set up the environment: install the module CLI and the module/pipeline SDK
2. Register a few sample modules into your AML workspace using the CLI
3. Use the module/pipeline SDK to create a pipeline from the modules registered in step 2

## Prerequisite
* Install the Azure CLI by following [the Azure CLI installation instructions](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

## Setup environment
* Install the Azure CLI AML extension, which includes the _module_ command group
* Install the Azure ML SDK, including the APIs to work with _module_ and _pipeline_

In [None]:
# Uninstall azure-cli-ml (the `az ml` commands)
!az extension remove -n azure-cli-ml 

# Install local version of azure-cli-ml (which includes `az ml module` commands)
!az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/15254362/azure_cli_ml-0.1.0.15254362-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/15254362 --yes --debug

In [None]:
# Verify the availability of `az ml module` commands
#!az ml pipeline -h
!az ml module -h

In [None]:
# Install azureml-sdk with Pipeline and Module support
# Important: after the install succeeds, restart the kernel

%config IPCompleter.greedy=True 
!pip install azureml-pipeline-wrapper[steps,notebooks]==0.1.0.15254362 --extra-index-url https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/15254362 --user --upgrade
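
After restarting the kernel, it can help to confirm that the packages are actually importable before moving on. The following is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper name, and the module paths checked are the ones imported later in this notebook.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package raises ModuleNotFoundError
        # (a subclass of ImportError) instead of returning None.
        return False


for name in ("azureml.core", "azureml.pipeline.wrapper"):
    status = "OK" if is_importable(name) else "missing (install it, then restart the kernel)"
    print(f"{name}: {status}")
```

If either line reports `missing`, rerun the install cell and restart the kernel again.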

## Register azureml module

You can manage AML modules through [azure-cli-ml](https://aka.ms/moduledoc) or [ml.azure.com](https://ml.azure.com/).

Modules can be registered from:
- a local path
- a public GitHub URL
- Azure DevOps build artifacts

Azure ML modules support multiple module types:
- Basic Python module
- MPI module
- Parallel run module
- HDI module (pending backend support)

In [1]:
# Configure your workspace information here

subscription_id = '74eccef0-4b8d-4f83-b5f9-fa100d155b22'
workspace_name = 'lisal-amlservice'
resource_group = 'lisal-dev'

# Specify an available AML compute target in the workspace
pipeline_compute = "always-on-ds2v2"

In [None]:
# Configure your aml workspace 

!az login 
!az account set -s $subscription_id 
!az ml folder attach -w $workspace_name -g $resource_group 

# Configure a global .amlignore; it is intended for registering modules from a local development environment
# !az configure --defaults module_amlignore_file=./.amlignore

In [None]:
# Register azureml modules from GitHub URLs

!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/mpi_train.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/score.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/eval.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml --set-as-default-version
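
The four registration commands above differ only in the spec file name, so they can be generated from a list. The sketch below only builds and prints the command lines; it uses the same `--spec-file` and `--set-as-default-version` flags shown above, while `build_register_command`, `SPEC_BASE`, and `SPEC_FILES` are hypothetical names introduced for illustration.

```python
import shlex

# Base URL and file names taken from the registration cell above.
SPEC_BASE = "https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval"
SPEC_FILES = ["mpi_train.yaml", "score.yaml", "eval.yaml", "compare2.yaml"]


def build_register_command(spec_url, set_default=True):
    """Return the argument list for one `az ml module register` call."""
    args = ["az", "ml", "module", "register", f"--spec-file={spec_url}"]
    if set_default:
        args.append("--set-as-default-version")
    return args


commands = [build_register_command(f"{SPEC_BASE}/{name}") for name in SPEC_FILES]
for args in commands:
    # shlex.join (Python 3.8+) renders the list as a shell-safe command line.
    print(shlex.join(args))
```

To execute the commands rather than print them, each argument list could be passed to `subprocess.run(args, check=True)`, assuming the `az` executable is on the path.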

In [None]:
# List the custom modules available in the AML workspace
!az ml module list -o table 

## Create pipeline
You can build a pipeline with the SDK, or by drag-and-drop in the [Designer](https://ml.azure.com/visualinterface?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/kubeflow-demo/workspaces/kubeflow_ws_1&flight=cm,nml,newGraphDetail,newGraphAuthoring,all&tid=72f988bf-86f1-41af-91ab-2d7cd011db47) in the workspace portal.

The new SDK:
* Simplifies the syntax to provide an experience consistent with drag-and-drop
* Supports IntelliSense and docstrings, so you no longer have to work with dicts all the time
* Supports creating a pipeline with unpublished modules

In [2]:
from azureml.core import Workspace, Run, Dataset
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)

# get modules
train_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='MPI Train')
score_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Score')
eval_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Evaluate')
compare_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Compare 2 Models')

# If you have an unpublished module locally or on GitHub, the functions below let you test it as an anonymous module
# compare_module_func = Module.from_yaml(ws, yaml_file='./CompareModdels/compare2.yaml')
# compare_module_func = Module.from_yaml(ws, yaml_file='https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml')

# get dataset
training_data_name = 'aml_module_training_data'
test_data_name = 'aml_module_test_data'

if training_data_name not in ws.datasets:
    print('Registering a training dataset for sample pipeline ...')
    train_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    train_data.register(workspace = ws, 
                        name = training_data_name, 
                        description = 'Training data (just for illustrative purpose)')
    print('Registered')
else:
    train_data = ws.datasets[training_data_name]
    print('Training dataset found in workspace')

if test_data_name not in ws.datasets:
    print('Registering a test dataset for sample pipeline ...')
    test_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    test_data.register(workspace = ws, 
                          name = test_data_name, 
                          description = 'Test data (just for illustrative purpose)')
    print('Registered')
else:
    test_data = ws.datasets[test_data_name]    
    print('Test dataset found in workspace')


If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
Training dataset found in workspace
Test dataset found in workspace
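
The message above recommends `ServicePrincipalAuthentication` for unattended runs. The following is a minimal sketch of that approach, assuming the service principal credentials are supplied through environment variables. The variable names (`AML_TENANT_ID`, `AML_SP_ID`, `AML_SP_PASSWORD`) and both helper functions are hypothetical and not part of this notebook; only `ServicePrincipalAuthentication` and `Workspace.get` come from azureml-sdk.

```python
import os


def load_sp_credentials(env=os.environ):
    """Read service principal settings from environment variables (hypothetical names)."""
    names = {
        "tenant_id": "AML_TENANT_ID",
        "service_principal_id": "AML_SP_ID",
        "service_principal_password": "AML_SP_PASSWORD",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in names.items()}


def get_workspace_unattended(workspace_name, subscription_id, resource_group, env=os.environ):
    """Connect to the workspace without an interactive login prompt."""
    # Imported lazily so the credential helper works even without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    auth = ServicePrincipalAuthentication(**load_sp_credentials(env))
    return Workspace.get(name=workspace_name, auth=auth,
                         subscription_id=subscription_id, resource_group=resource_group)
```

With the variables set, `ws = get_workspace_unattended(workspace_name, subscription_id, resource_group)` could replace the interactive `Workspace.get` call in cell `In [2]`.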


### dsl pipeline
* A 'pipeline parameter' is exposed as an input parameter of the pipeline function
* The 'pipeline output' is the return value of the pipeline function

### module function
* A 'module input' can be set through set_inputs() or the module initialization function
* A 'module parameter' can be set through set_parameters() or the module initialization function
* 'Module runsettings', including compute, datastore, data mode, and other runtime parameters, are set through runsettings.configure()
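
The pipeline conventions above can be illustrated with a toy decorator written in plain Python. This is only a sketch of the idea (function arguments become pipeline parameters, the returned dict becomes the pipeline outputs) and is not how `dsl.pipeline` is implemented; `toy_pipeline` and `training` are hypothetical names.

```python
import inspect
from functools import wraps


def toy_pipeline(name):
    """Toy stand-in for a pipeline decorator; NOT the azureml implementation."""
    def decorator(func):
        @wraps(func)
        def build(*args, **kwargs):
            # Bind the call arguments to parameter names, filling in defaults.
            bound = inspect.signature(func).bind(*args, **kwargs)
            bound.apply_defaults()
            # The function's return value plays the role of the pipeline outputs.
            outputs = func(*args, **kwargs) or {}
            return {"name": name, "parameters": dict(bound.arguments), "outputs": outputs}
        return build
    return decorator


@toy_pipeline(name="toy training pipeline")
def training(input_data, learning_rate=0.01):
    model = f"model({input_data}, lr={learning_rate})"
    return {"model_output": model}


graph = training("titanic", learning_rate=0.02)
print(graph["parameters"])
print(graph["outputs"])
```

The real SDK builds a graph of module nodes rather than a plain dict, but the mapping from function signature to pipeline parameters follows the same pattern described in the bullets.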


In [3]:
# define a sub pipeline
@dsl.pipeline(name = 'A sub pipeline including train/score/eval', 
              description = 'train model and evaluate model perf')
def training_pipeline(input_data, test_data, learning_rate):
    train = train_module_func(
        training_data=input_data, 
        max_epochs=5, 
        learning_rate=learning_rate)
    
    # Alternatively, create the module first and set its inputs and parameters afterwards:
    #    train = train_module_func()
    #    train.set_inputs(training_data=input_data)
    #    train.set_parameters(learning_rate=learning_rate)

    train.runsettings.configure(process_count_per_node = 2, node_count = 2)

    score = score_module_func(
        model_input=train.outputs.model_output, 
        test_data=test_data)

    # Named `evaluate` to avoid shadowing the built-in eval()
    evaluate = eval_module_func(scoring_result=score.outputs.score_output)

    return {'eval_output': evaluate.outputs.eval_output, 'model_output': train.outputs.model_output}

In [4]:
# define pipeline with sub pipeline
@dsl.pipeline(name = 'A dummy pipeline that trains multiple models and output the best one', 
              description = 'select best model trained with different learning rate',
              default_compute_target = pipeline_compute)
def dummy_automl_pipeline(input_data, test_data):
    train_and_evaluate_model1 = training_pipeline(input_data, test_data, 0.01)
    train_and_evaluate_model2 = training_pipeline(input_data, test_data, 0.02)
    
    compare = compare_module_func(
        model1=train_and_evaluate_model1.outputs.model_output, 
        eval_result1=train_and_evaluate_model1.outputs.eval_output,
        model2=train_and_evaluate_model2.outputs.model_output,
        eval_result2=train_and_evaluate_model2.outputs.eval_output
    )

    return {**compare.outputs}

# create a pipeline
pipeline = dummy_automl_pipeline(train_data, test_data)

In [5]:
# validate pipeline and visualize the graph
pipeline.validate()

{'result': 'validation passed', 'errors': []}
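
In a script, the validation result can be checked programmatically before saving or submitting. The sketch below assumes `validate()` returns a dict of the shape shown in the output above, with an `errors` list; `ensure_valid` is a hypothetical helper, and the sample dict is copied from that output.

```python
def ensure_valid(validation):
    """Raise if the validation result reports any errors; otherwise return it unchanged."""
    errors = validation.get("errors") or []
    if errors:
        raise ValueError(f"Pipeline validation failed with {len(errors)} error(s): {errors}")
    return validation


# Sample result copied from the cell output above.
sample = {'result': 'validation passed', 'errors': []}
ensure_valid(sample)
print(sample["result"])
```

In the notebook this would be used as `ensure_valid(pipeline.validate())`.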

In [6]:
# save as a draft
pipeline.save(experiment_name = 'pipeline-with-azureml-module')

| Name | Id | Details page | Pipeline type | Updated on | Created by | Tags |
| --- | --- | --- | --- | --- | --- | --- |
| A dummy pipeline that trains multiple models and output the best one | 8a3a8b98-8839-4bd6-a975-2c3527805460 | Link | TrainingPipeline | June 04, 2020 04:41 PM | Lisa Li (STC) | azureml.Designer: true |


In [7]:
# Pipeline parameters can be overridden when submitting the pipeline
run = pipeline.submit(experiment_name='pipeline-with-azureml-module', tags={'mode':'module-SDK'}, pipeline_parameters={'input_data':test_data,'test_data':train_data})

Submitted PipelineRun 088c3aca-a567-4ca5-9777-7d9db952d911
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-with-azureml-module/runs/088c3aca-a567-4ca5-9777-7d9db952d911?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/lisal-dev/workspaces/lisal-amlservice
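
Because the keys of `pipeline_parameters` must match the pipeline function's parameter names, a quick check before submitting can catch typos early. This is a sketch in plain Python: `check_overrides` is a hypothetical helper, and the stand-in function below only mirrors the signature of `dummy_automl_pipeline` from cell `In [4]`. Whether the decorated function in the SDK preserves its signature for `inspect` is an assumption not verified here.

```python
import inspect


def check_overrides(pipeline_func, overrides):
    """Return a copy of `overrides` after checking every key is a parameter of `pipeline_func`."""
    accepted = set(inspect.signature(pipeline_func).parameters)
    unknown = sorted(set(overrides) - accepted)
    if unknown:
        raise KeyError(f"Unknown pipeline parameter(s): {unknown}; accepted: {sorted(accepted)}")
    return dict(overrides)


# Stand-in with the same parameter names as the pipeline defined in In [4].
def dummy_automl_pipeline(input_data, test_data):
    return {}


checked = check_overrides(dummy_automl_pipeline,
                          {"input_data": "dataset-a", "test_data": "dataset-b"})
print(sorted(checked))
```

A misspelled key such as `'test_dataset'` would raise a `KeyError` listing the accepted names.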


In [21]:
run.wait_for_completion()
