# Build AML Pipeline with custom module and built-in module

In this tutorial you will learn how to combine Designer built-in modules and custom modules in a single pipeline.

1. Set up the environment: install the module CLI and the module/pipeline SDK
2. Register custom modules into your AML workspace using the CLI
3. Use the module/pipeline SDK to create a pipeline from the modules registered in step 2 and the built-in modules available in AML designer

## Prerequisite
* Install the Azure CLI by following [the Azure CLI installation instructions](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

## Set up environment
* Install the Azure CLI AML extension, which includes the _module_ command group
* Install the Azure ML SDK, including the APIs to work with _module_ and _pipeline_

In [None]:
# Uninstall azure-cli-ml (the `az ml` commands)
!az extension remove -n azure-cli-ml 

# Install the build of azure-cli-ml that includes the `az ml module` commands
!az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063/azure_cli_ml-0.1.0.13766063-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063 --yes 

In [None]:
# Verify the availability of `az ml module` commands
!az ml module -h

In [None]:
# Install the Azure ML SDK package with Pipeline and Module support
# Important: restart the kernel after the installation succeeds

%config IPCompleter.greedy=True 
!pip install azureml-pipeline-wrapper==0.1.0.13912229 --extra-index-url https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13912229 --user --upgrade

In [None]:
# Configure your workspace information here

subscription_id = '74eccef0-4b8d-4f83-b5f9-fa100d155b22'
workspace_name = 'lisal-amlservice'
resource_group = 'lisal-dev'

# Specify an available AML compute target in the workspace
pipeline_compute = "always-on-ds2v2"
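
The cell above hard-codes the workspace details. As an alternative, here is a minimal sketch that reads the same four settings from environment variables, which keeps IDs out of a shared notebook. The environment variable names and the `load_workspace_settings` helper are hypothetical and not part of the Azure ML SDK.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
SETTING_VARS = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "workspace_name": "AML_WORKSPACE_NAME",
    "resource_group": "AML_RESOURCE_GROUP",
    "pipeline_compute": "AML_PIPELINE_COMPUTE",
}


def load_workspace_settings(env=None):
    """Return the workspace settings as a dict, failing early if any are missing."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [var for var in SETTING_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in SETTING_VARS.items()}
```

With this in place, `settings = load_workspace_settings()` followed by `subscription_id = settings["subscription_id"]` (and likewise for the other three names) leaves the rest of the notebook unchanged.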

## Register azureml module

You can manage AML modules through [azure-cli-ml](https://aka.ms/moduledoc) or [ml.azure.com](https://ml.azure.com/).

A module can be registered from:
- a local path
- a public GitHub URL
- Azure DevOps build artifacts

In [None]:
# Log in, select the subscription, and attach this folder to your AML workspace

!az login 
!az account set -s $subscription_id 
!az ml folder attach -w $workspace_name -g $resource_group 

In [None]:
# register a custom module

!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/eval.yaml

In [None]:
import json
from azureml.data.data_reference import DataReference
from azureml.core import Workspace, Run, Dataset, Datastore
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)

# Get the built-in modules
select_column_func = Module.load(ws, namespace='azureml', name='Select Columns in Dataset')
clean_data_func = Module.load(ws, namespace='azureml', name='Clean Missing Data')
split_data_func = Module.load(ws, namespace='azureml', name='Split Data')
linear_regression_func = Module.load(ws, namespace='azureml', name='Linear Regression')
train_func = Module.load(ws, namespace='azureml', name='Train Model')
score_func = Module.load(ws, namespace='azureml', name='Score Model')
eval_func = Module.load(ws, namespace='azureml', name='Evaluate Model')

# Get the custom module registered above
my_eval = Module.load(ws, namespace='microsoft.com/aml/samples', name='Evaluate')

# Get a sample dataset from the global datasets datastore
def get_global_dataset_by_path(ws: Workspace, name, path):
    global_dataset_datastore = Datastore(ws, name="azureml_globaldatasets")
    blob_input_data = DataReference(
        datastore=global_dataset_datastore,
        data_reference_name=name,
        path_on_datastore=path,
    )
    return blob_input_data

blob_input_data = get_global_dataset_by_path(ws, 'Automobile_price_data', 'GenericCSV/Automobile_price_data_(Raw)')
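
The pipeline definition in the next cell passes column selections to the built-in modules as hand-escaped JSON strings, which are easy to mistype. As a minimal sketch, the same strings can be generated with the standard `json` module. The helper names `column_rule` and `column_selection` are hypothetical, and the rule schema is inferred only from the literal strings used in this notebook, not from a documented specification.

```python
import json


def column_rule(columns=None, exclude=False):
    """Build one selection rule: all columns when `columns` is None, else the named ones."""
    if columns is None:
        return {"exclude": exclude, "ruleType": "AllColumns"}
    return {"exclude": exclude, "ruleType": "ColumnNames", "columns": list(columns)}


def column_selection(*rules):
    """Serialize the rules into the compact JSON form used by the module parameters."""
    payload = {"isFilter": True, "rules": list(rules)}
    # Compact separators reproduce the whitespace-free strings used in this notebook.
    return json.dumps(payload, separators=(",", ":"))


# All columns except "normalized-losses" (the Select_Columns value used later).
select_columns = column_selection(
    column_rule(),
    column_rule(["normalized-losses"], exclude=True),
)

# Only the "price" column (the Label_column value used later).
label_column = column_selection(column_rule(["price"]))
```

Passing `Select_Columns=select_columns` or `Label_column=label_column` would then replace the escaped literals, assuming the modules accept any valid JSON with this structure.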

In [None]:
# define your pipeline

@dsl.pipeline(name = 'Designer Sample 1', 
              description = 'Regression - Automobile Price Prediction',
              default_compute_target = pipeline_compute)
def sample1_pipeline():
    select = select_column_func(
        Dataset=blob_input_data, 
        Select_Columns="{\"isFilter\":true,\"rules\":[{\"exclude\":false,\"ruleType\":\"AllColumns\"},"
                          "{\"exclude\":true,\"ruleType\":\"ColumnNames\",\"columns\":[\"normalized-losses\"]}]}"
    )   
    
    clean = clean_data_func(
        Dataset=select.outputs.Results_dataset,
        Columns_to_be_cleaned="{\"isFilter\":true,\"rules\":[{\"ruleType\":\"AllColumns\",\"exclude\":false}]}",
        Minimum_missing_value_ratio=0.0,
        Maximum_missing_value_ratio=1.0,
        Cleaning_mode='Remove entire row'
    )
    
    split = split_data_func(
        Dataset=clean.outputs.Cleaned_dataset,
        Splitting_mode='Split Rows',
        Fraction_of_rows_in_the_first_output_dataset=0.7,
        Randomized_split='True',
        Stratified_split='False',
        Stratification_key_column = "{\"isFilter\":true,\"rules\":"
                        "[{\"exclude\":false,\"ruleType\":\"ColumnNames\",\"columns\":[\"make\"]}]}",
        Regular_expression = '\\"column name" ^start',
        Relational_expression = '\\"column name" > 3'
    )
    
    algo = linear_regression_func(
        Solution_method='Ordinary Least Squares',
        L2_regularization_weight=0.001,
        Include_intercept_term='True',
        Random_number_seed=0
    )
    
    train = train_func(
        Dataset=split.outputs.Results_dataset1,
        Untrained_model=algo.outputs.Untrained_model,
        Label_column="{\"isFilter\":true,\"rules\":"
                        "[{\"exclude\":false,\"ruleType\":\"ColumnNames\",\"columns\":[\"price\"]}]}"
    )
    
    score = score_func(
        Trained_model=train.outputs.Trained_model,
        Dataset=split.outputs.Results_dataset2,
        Append_score_columns_to_output='True'
    )
    
    eval_1 = my_eval(
        scoring_result=score.outputs.Scored_dataset
    )
       
    eval_2 = eval_func(
        Scored_dataset = score.outputs.Scored_dataset
    ) 
    
    return {**eval_1.outputs, **eval_2.outputs}
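
The `return` statement above merges the outputs of both evaluation modules with `**` unpacking. With plain Python dictionaries, a key present on both sides silently keeps the later value, so two outputs sharing a name would collapse into one. The sketch below shows a stricter merge using ordinary dicts as stand-ins for `eval_1.outputs` and `eval_2.outputs`. The `merge_outputs` helper and the output names are hypothetical, and it is an assumption that the SDK's outputs objects behave like standard mappings here.

```python
def merge_outputs(*output_maps):
    """Merge several name-to-output mappings, raising if any name appears twice."""
    merged = {}
    for outputs in output_maps:
        overlap = merged.keys() & outputs.keys()
        if overlap:
            raise ValueError("Duplicate output names: " + ", ".join(sorted(overlap)))
        merged.update(outputs)
    return merged


# Plain dicts standing in for module outputs; names and values are illustrative only.
custom_eval_outputs = {"eval_output": "custom-eval-port"}
builtin_eval_outputs = {"Evaluation_results": "builtin-eval-port"}

combined = merge_outputs(custom_eval_outputs, builtin_eval_outputs)
```

Plain unpacking is sufficient when the output names are known to differ; the explicit check only matters when modules are swapped in and out and name clashes become possible.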

In [None]:
# create a pipeline
pipeline = sample1_pipeline()

In [None]:
# validate pipeline and visualize the graph
pipeline.validate()

In [None]:
# save as a draft
pipeline.save(experiment_name = 'SDK-Created', id='ef7acf77-cd4b-4336-b4a2-c85536faa778')

In [None]:
# submit the pipeline as a run and wait for it to finish
run = pipeline.submit_run(experiment_name = 'SDK-Created')
run.wait_for_completion()