Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.


# Showcasing Dataset and Pipeline Parameter

This notebook demonstrates the use of [**Dataset**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset(class)?view=azure-ml-py) and **Pipeline Parameters** in AML Pipelines. You will learn how Datasets and other parameters are submitted to AML Pipelines via **Pipeline Parameters**.

* [How to create a Pipeline with Pipeline Parameter](#index1)
* [How to submit a Pipeline with Pipeline Parameter](#index2)
* [How to submit a Pipeline and change the Pipeline Parameter value from the SDK](#index3)
* [How to submit a Pipeline and change the Pipeline Parameter value using a REST call](#index4)

## Azure Machine Learning and Pipeline SDK-specific imports

In [None]:
import azureml.core  # needed for azureml.core.VERSION below
from azureml.core import Dataset
from azureml.core.compute import ComputeTarget, AmlCompute

from azureml.pipeline.wrapper import Module, Pipeline, PipelineRun, dsl

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Initialize Workspace

Log in to Azure with the CLI and set the default workspace using the `az ml folder attach` command.

After this step, the workspace can be retrieved with `Workspace.from_config()` for SDK usage.

In [None]:
# NOTE: Update the following information with your environment

SUBSCRIPTION_ID = '<your subscription ID>'
WORKSPACE_NAME = '<your workspace name>'
RESOURCE_GROUP_NAME = '<your resource group>'

In [None]:
!az login -o none 
!az account set -s $SUBSCRIPTION_ID 
!az ml folder attach -w $WORKSPACE_NAME -g $RESOURCE_GROUP_NAME 

In [None]:
from azureml.core import Workspace

workspace = Workspace.from_config()
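`Workspace.from_config()` reads the workspace configuration file that `az ml folder attach` writes. To make that lookup concrete, here is a simplified sketch that walks upward from a starting directory and checks `.azureml/config.json` and `config.json` at each level. The helper names (`find_workspace_config`, `load_workspace_config`) are hypothetical, and the search order and required keys are assumptions that approximate, but are not, the SDK's own implementation.

```python
import json
from pathlib import Path
from typing import Optional

CONFIG_NAME = "config.json"

def find_workspace_config(start: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical helper: walk from `start` (default: cwd) up to the filesystem root,
    checking `.azureml/config.json` and then `config.json` in each directory."""
    current = (start or Path.cwd()).resolve()
    for directory in [current, *current.parents]:
        for candidate in (directory / ".azureml" / CONFIG_NAME, directory / CONFIG_NAME):
            if candidate.is_file():
                return candidate
    return None

def load_workspace_config(start: Optional[Path] = None) -> dict:
    """Hypothetical helper: load the first config file found and check its keys."""
    path = find_workspace_config(start)
    if path is None:
        raise FileNotFoundError("No workspace config.json found; run `az ml folder attach` first.")
    with path.open() as f:
        config = json.load(f)
    # Assumed keys identifying the workspace.
    missing = {"subscription_id", "resource_group", "workspace_name"} - config.keys()
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    return config
```

Because the search walks up through parent directories, a notebook in a subfolder of the attached folder can still find the same configuration.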

## Retrieve or create an Azure Machine Learning compute target

In [None]:
from azureml.core.compute_target import ComputeTargetException

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
try:
    compute_target = ComputeTarget(workspace=workspace, name=cluster_name)
    print('Found existing compute target {}.'.format(cluster_name))
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2",
                                                           max_nodes=4)

    compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)

print("Azure Machine Learning Compute attached")
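The cell above follows a retrieve-or-create pattern: look the compute target up by name and provision it only if the lookup fails. A minimal, SDK-free sketch of that pattern follows. `ResourceNotFound`, `get_or_create`, and the in-memory `registry` are hypothetical stand-ins for `ComputeTargetException`, `ComputeTarget(...)`, and `ComputeTarget.create(...)`.

```python
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")

class ResourceNotFound(Exception):
    """Hypothetical stand-in for a 'not found' error such as ComputeTargetException."""

def get_or_create(name: str,
                  get: Callable[[str], T],
                  create: Callable[[str], T]) -> Tuple[T, bool]:
    """Return (resource, created). Try `get` first; call `create` only if lookup fails."""
    try:
        return get(name), False
    except ResourceNotFound:
        return create(name), True

# Toy in-memory registry used to exercise the pattern.
registry: Dict[str, dict] = {}

def lookup(name: str) -> dict:
    if name not in registry:
        raise ResourceNotFound(name)
    return registry[name]

def provision(name: str) -> dict:
    # Same settings as the provisioning_configuration call above.
    registry[name] = {"name": name, "vm_size": "Standard_D2_v2", "max_nodes": 4}
    return registry[name]
```

Calling `get_or_create("cpu-cluster", lookup, provision)` twice provisions on the first call and reuses the existing entry on the second, which is why rerunning the notebook cell does not create a second cluster.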

## Data and arguments setup

We use the `titanic-cleaned.csv` file dataset, which must already be registered in the workspace under that name, as the sample dataset, and the **Select Columns in Dataset** module as the sample module to illustrate how a pipeline is created.

In [None]:
# get dataset
dataset_name = 'titanic-cleaned.csv'
titanic_dataset = Dataset.get_by_name(workspace, dataset_name)

# get 'Select Columns in Dataset' module function
select_column_module_func = Module.load(workspace, namespace='azureml', name='Select Columns in Dataset')

# define module parameters
select_columns = "{\"isFilter\":true,\"rules\":[{\"exclude\":false,\"ruleType\":\"AllColumns\"}]}"

# choose a name for the run history container in the workspace.
experiment_name = 'showcasing-Dataset-PipelineParameter'
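The `select_columns` argument is a JSON-encoded rule set, and escaping it by hand is error-prone. As a sketch, the same string can be produced from a Python dict with `json.dumps`. Only the `AllColumns` rule shown above is reproduced here; other rule types are not covered.

```python
import json

# Same rule set as the hand-escaped `select_columns` string above.
select_columns_rules = {
    "isFilter": True,
    "rules": [{"exclude": False, "ruleType": "AllColumns"}],
}
# Compact separators reproduce the original string exactly (no spaces).
select_columns_from_dict = json.dumps(select_columns_rules, separators=(",", ":"))
print(select_columns_from_dict)
# prints {"isFilter":true,"rules":[{"exclude":false,"ruleType":"AllColumns"}]}
```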

<a id='index1'></a>

## Create a Pipeline with Pipeline Parameter

First, create a module by assigning its parameters directly, without using pipeline parameters.

In [None]:
# define a module
select_column_module = select_column_module_func(dataset=titanic_dataset, select_columns=select_columns)

In [None]:
# define a pipeline
pipeline = Pipeline(nodes=[select_column_module],
                    workspace=workspace,
                    name="select-column-sample-pipeline",
                    description="pipeline for Dataset-PipelineParameter sample usage",
                    default_compute_target=cluster_name)

In [None]:
# submit pipeline
pipeline_run = pipeline.submit(experiment_name=experiment_name)
pipeline_run.wait_for_completion()

Next, create a pipeline that uses pipeline parameters. In the sample pipeline function below, the pipeline parameters are:
* input
* _select_columns

In [None]:
# define a pipeline function
@dsl.pipeline(name='select-column-sample-pipeline',
              description='pipeline for GlobalDataset-PipelineParameter sample usage',
              default_compute_target=cluster_name)
def sample_pipeline(input, _select_columns):
    print('Datatype of input: {}'.format(type(input)))
    print('Datatype of _select_columns: {}'.format(type(_select_columns)))

    select_column_module = select_column_module_func(dataset=input,
                                                     select_columns=_select_columns)
    return select_column_module.outputs

In [None]:
# create a pipeline using pipeline parameter
# dsl.pipeline converts the function's input parameters into the PipelineParameter type automatically.
pipeline = sample_pipeline(input=titanic_dataset, _select_columns=select_columns)
print("Pipeline is created")
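The `print` calls inside `sample_pipeline` show that its arguments arrive as pipeline parameters rather than as the raw values passed in. The toy decorator below models that idea with the standard library only: it binds the call's arguments to the function signature and hands each one to the function body wrapped in a named parameter object. `ToyPipelineParameter` and `toy_pipeline` are hypothetical and say nothing about how `dsl.pipeline` is actually implemented.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Any

@dataclass
class ToyPipelineParameter:
    """Hypothetical stand-in for a pipeline parameter: a name plus a default value."""
    name: str
    default_value: Any

def toy_pipeline(func):
    """Hypothetical decorator: wrap every argument as a ToyPipelineParameter
    named after the corresponding function parameter."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def build(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        params = {name: ToyPipelineParameter(name, value)
                  for name, value in bound.arguments.items()}
        return func(**params)

    return build

@toy_pipeline
def toy_sample_pipeline(input, _select_columns):
    # Inside the body, both arguments are ToyPipelineParameter objects.
    return {"dataset": input, "select_columns": _select_columns}
```

The values passed at build time become the parameters' defaults, which is what the next section relies on when submitting without overrides.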

<a id='index2'></a>

## Submit a Pipeline with default Pipeline Parameters

Pipelines can be submitted with the default values of their Pipeline Parameters by not specifying any parameters at submission.

In [None]:
pipeline_run = pipeline.submit(experiment_name=experiment_name)
print("Pipeline is submitted for execution")

In [None]:
pipeline_run

In [None]:
pipeline_run.wait_for_completion()

<a id='index3'></a>

## Submit a Pipeline and change the Pipeline Parameter values from the SDK

Alternatively, Pipelines can be submitted with non-default values by passing `pipeline_parameters` to `submit`.

In [None]:
# define a pipeline with None input
pipeline = sample_pipeline(input=None, _select_columns=None)
# override the pipeline parameter values at submission time with 'pipeline_parameters'
pipeline_run_with_params = pipeline.submit(experiment_name=experiment_name,
                                           pipeline_parameters={'input': titanic_dataset,
                                                                '_select_columns': select_columns})
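Conceptually, the values passed in `pipeline_parameters` take precedence over the defaults captured when the pipeline was built, and omitting `pipeline_parameters` keeps those defaults. The sketch below models that resolution step. `resolve_parameters` is hypothetical; it rejects unknown parameter names by choice, and whether the SDK does the same is not covered here.

```python
from typing import Any, Dict, Mapping, Optional

def resolve_parameters(defaults: Mapping[str, Any],
                       overrides: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical model of submit-time resolution: start from the build-time
    defaults and let each submitted value replace its default."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown pipeline parameters: {sorted(unknown)}")
    resolved = dict(defaults)
    resolved.update(overrides)
    return resolved
```

With the pipeline built from `sample_pipeline(input=None, _select_columns=None)`, the defaults are `None`, so the submitted `pipeline_parameters` supply the actual dataset and column rules.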

In [None]:
pipeline_run_with_params.wait_for_completion()

In [None]:
pipeline_run_with_params

<a id='index4'></a>

## Submit a Pipeline and change the PipelineParameter value using a REST call

Let's publish the pipeline so that it can be triggered through its REST endpoint. We publish the pipeline using a **PipelineEndpoint**.

In [None]:
from azureml.pipeline.wrapper import PipelineEndpoint

pipeline = sample_pipeline(input=titanic_dataset, _select_columns=select_columns)

# publish the pipeline to an endpoint named "PipelineParameterTest" and make it the default version.
pipeline_endpoint = PipelineEndpoint.publish(workspace=workspace, name="PipelineParameterTest",
                                             pipeline=pipeline, description="Test description Notebook", 
                                             set_as_default=True)

pipeline_endpoint

In [None]:
# fetch the published pipeline by default version
default_version = pipeline_endpoint.default_version
print('Default version of "PipelineParameterTest" endpoint is: {} \n'.format(default_version))

# get the list of active published pipelines in the PipelineParameterTest endpoint
published_pipeline_list = pipeline_endpoint.list_pipelines(active_only=True)
print('Published pipeline list:\n {}'.format(published_pipeline_list))

# fetch the published pipeline
published_pipeline = published_pipeline_list[default_version]
published_pipeline

In [None]:
from azureml.core.authentication import InteractiveLoginAuthentication
import requests

auth = InteractiveLoginAuthentication()
aad_token = auth.get_authentication_header()

rest_endpoint = published_pipeline.endpoint

print("You can perform HTTP POST on URL {} to trigger this pipeline".format(rest_endpoint))

In [None]:
def_blob_store = workspace.get_default_datastore()
print("Default datastore's name: {}".format(def_blob_store.name))

# specify the param when running the pipeline
response = requests.post(rest_endpoint, 
                         headers=aad_token, 
                         json={"ExperimentName": "MyRestPipeline",
                               "RunSource": "SDK",
                               "DataPathAssignments": {
                                   "input": { 
                                       "DataStoreName": def_blob_store.name
                                   }
                               },
                               "ParameterAssignments": {"_select_columns": select_columns}
                              }
                        )
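To see the raw HTTP request that the `requests.post` call above sends, here is a standard-library sketch that builds, but does not send, an equivalent POST request. The JSON keys are copied from the call above; `build_pipeline_request` is a hypothetical helper, and the explicit `Content-Type` header is added because `requests` sets it automatically when `json=` is used while `urllib` does not.

```python
import json
import urllib.request
from typing import Any, Dict, Mapping

def build_pipeline_request(endpoint: str,
                           auth_header: Mapping[str, str],
                           experiment_name: str,
                           datastore_name: str,
                           parameter_assignments: Mapping[str, Any]) -> urllib.request.Request:
    """Hypothetical helper: build a POST request whose JSON body mirrors the
    requests.post call above (body keys copied from that call)."""
    body: Dict[str, Any] = {
        "ExperimentName": experiment_name,
        "RunSource": "SDK",
        "DataPathAssignments": {"input": {"DataStoreName": datastore_name}},
        "ParameterAssignments": dict(parameter_assignments),
    }
    # Merge the AAD authorization header with an explicit JSON content type.
    headers = {**auth_header, "Content-Type": "application/json"}
    return urllib.request.Request(endpoint,
                                  data=json.dumps(body).encode("utf-8"),
                                  headers=headers,
                                  method="POST")
```

For example, `build_pipeline_request(rest_endpoint, aad_token, "MyRestPipeline", def_blob_store.name, {"_select_columns": select_columns})` builds the same request, which `urllib.request.urlopen` would then send.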

In [None]:
try:
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    raise Exception('Received bad response from the endpoint: {}\n'
                    'Response Code: {}\n'
                    'Headers: {}\n'
                    'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content)) from e

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)
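The cell above checks the HTTP status and then reads the run ID from the `Id` field of the JSON response. A small, testable version of that logic follows. `extract_run_id` is hypothetical; unlike `.get('Id')`, which silently returns `None`, it fails loudly when the field is missing, and it treats any non-2xx status as a failure, which is slightly stricter than `raise_for_status`, which raises only for 4xx and 5xx codes.

```python
import json

def extract_run_id(status_code: int, body: str) -> str:
    """Hypothetical helper: raise on a non-2xx status, otherwise return the
    'Id' field from the JSON response body."""
    if not 200 <= status_code < 300:
        raise RuntimeError(f"Received bad response (status {status_code}): {body}")
    run_id = json.loads(body).get("Id")
    if not run_id:
        raise ValueError("response JSON has no 'Id' field")
    return run_id
```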

In [None]:
published_pipeline_run_via_rest = PipelineRun(workspace.experiments["MyRestPipeline"], run_id)
published_pipeline_run_via_rest

In [None]:
published_pipeline_run_via_rest.wait_for_completion()

## Finish

Disable the PipelineEndpoint and PublishedPipeline created in this notebook.

In [None]:
pipeline_endpoint.disable()
published_pipeline.disable()