# Pipeline for Question Understanding

This notebook tests each step of the question understanding pipeline individually. Its contents mirror `pipeline.py`.

See the notebook `sagemaker-pipelines-project.ipynb` for an end-to-end pipeline.

## Create SageMaker Clients and Session

First, we create a new SageMaker Session in the current AWS region. We also retrieve the role ARN for the session.

This role ARN should be the execution role ARN that you set up in the Prerequisites section of this notebook.

In [1]:
import logging
import os

import boto3
import pandas as pd
import sagemaker
from botocore.exceptions import ClientError

sess = sagemaker.Session()
bucket = 'sm-nlp-data'
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name="sagemaker", region_name=region)

## Track the Pipeline as an `Experiment`

In [23]:
import time
timestamp = int(time.time())
pipeline_name = "qa-pipeline-{}".format(timestamp)
%store pipeline_name

Stored 'pipeline_name' (str)


In [4]:
from smexperiments.experiment import Experiment

pipeline_experiment = Experiment.create(
    experiment_name=pipeline_name,
    description="Extract intention and corresponding slots from questions",
    sagemaker_boto_client=sm,
)

pipeline_experiment_name = pipeline_experiment.experiment_name
print("Pipeline experiment name: {}".format(pipeline_experiment_name))

Pipeline experiment name: qa-pipeline-1631956944


In [5]:
%store pipeline_experiment_name

Stored 'pipeline_experiment_name' (str)


## Create the `Trial`

In [6]:
from smexperiments.trial import Trial

pipeline_trial = Trial.create(
    trial_name="trial-qa-{}".format(timestamp), experiment_name=pipeline_experiment_name, sagemaker_boto_client=sm
)
pipeline_trial_name = pipeline_trial.trial_name
print("Trial name: {}".format(pipeline_trial_name))

%store pipeline_trial_name

Trial name: trial-qa-1631956944
Stored 'pipeline_trial_name' (str)


## Define Parameters to Parametrize Pipeline Execution

We define workflow parameters so that we can vary the values injected into pipeline executions and schedules without modifying the pipeline definition.


In [2]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)
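To make the override mechanism concrete, here is a minimal sketch (not SageMaker's implementation) of how execution-time values could be merged with declared defaults. It assumes the `Parameters` layout (`Name`, `Type`, `DefaultValue`) that `pipeline.definition()` prints later in this notebook. `resolve_parameters` is a hypothetical helper introduced only for illustration.

```python
from typing import Any, Dict, List

# Map the type names used in the pipeline definition to Python types.
_TYPES = {"String": str, "Integer": int, "Float": float}


def resolve_parameters(
    declared: List[Dict[str, Any]], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge execution-time overrides with declared defaults (hypothetical helper)."""
    by_name = {param["Name"]: param for param in declared}

    # Reject overrides for parameters the pipeline never declared.
    unknown = set(overrides) - set(by_name)
    if unknown:
        raise KeyError(f"Undeclared parameters: {sorted(unknown)}")

    resolved = {}
    for name, spec in by_name.items():
        value = overrides.get(name, spec.get("DefaultValue"))
        if not isinstance(value, _TYPES[spec["Type"]]):
            raise TypeError(f"{name} expects a {spec['Type']} value, got {value!r}")
        resolved[name] = value
    return resolved


# Two of the parameters declared in this notebook, in definition format.
declared = [
    {"Name": "ValidationSplit", "Type": "String", "DefaultValue": "0.1"},
    {"Name": "ProcessingInstanceCount", "Type": "Integer", "DefaultValue": 1},
]
resolved = resolve_parameters(declared, {"ValidationSplit": "0.2"})
print(resolved)
```

With the SDK itself, the same override would be supplied when starting an execution, for example `pipeline.start(parameters={"ValidationSplit": "0.2"})`.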

### Experiment Parameters

In [8]:
exp_name = ParameterString(
    name="ExperimentName",
    default_value=pipeline_experiment_name,
)

# Preprocess Step

Run the following cell if your training data is not yet in S3.

In [144]:
!aws s3 cp data/qa_raw.zip s3://$bucket/nlu/data/qa_raw.zip

In [9]:
raw_input_data_s3_uri = "s3://{}/nlu/data/qa_raw.zip".format(bucket)
processed_data_s3_uri = "s3://{}/nlu/data/processed/".format(bucket)

print('Input:', raw_input_data_s3_uri)
# check existence
!aws s3 ls $raw_input_data_s3_uri
print('Output:', processed_data_s3_uri)
!aws s3 ls $processed_data_s3_uri

Input: s3://sm-nlp-data/nlu/data/qa_raw.zip
2021-09-08 11:33:36      48140 qa_raw.zip
Output: s3://sm-nlp-data/nlu/data/processed/
                           PRE dev/
                           PRE schema/
                           PRE test/
                           PRE train/
2021-09-03 06:45:26          0 
2021-09-08 11:41:39        107 intent_label.txt
2021-09-08 11:41:39         90 slot_label.txt


Input parameters for preprocess step:

In [10]:
input_data = ParameterString(name="InputData", default_value=raw_input_data_s3_uri)
output_dir = ParameterString(name="OutputData", default_value=processed_data_s3_uri)
validation_split = ParameterString(name="ValidationSplit", default_value='0.1')
test_split = ParameterString(name="TestSplit", default_value='0.1')
processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(name="ProcessingInstanceType", default_value="ml.c5.2xlarge")

We create an `SKLearnProcessor` instance and use it in our `ProcessingStep`.

In [11]:
from sagemaker.sklearn.processing import SKLearnProcessor

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    env={"AWS_DEFAULT_REGION": region},
)

INFO:botocore.credentials:Credentials found in config file: ~/.aws/config
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker.image_uris:Defaulting to only available Python version: py3


In [12]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

processing_inputs = [
    ProcessingInput(
        input_name="raw",
        source=input_data,
        destination="/opt/ml/processing/qa/data/raw",
        s3_data_distribution_type="ShardedByS3Key",
    )
]

processing_outputs = [
    ProcessingOutput(
        output_name="train",
        destination = output_dir,
        s3_upload_mode="EndOfJob",
        source="/opt/ml/processing/qa/data/processed", # processed data in preprocessing should be saved to this folder
    )
]

processing_step = ProcessingStep(
    name="Processing",
    code="preprocess.py",
    processor=processor,
    inputs=processing_inputs,
    outputs=processing_outputs,
    job_arguments=[
        "--input-data",
        processing_inputs[0].destination,  # /opt/ml/processing/qa/data/raw
        "--validation-split",
        validation_split,
        "--test-split",
        test_split
    ],
)

print(processing_step)

ProcessingStep(name='Processing', step_type=<StepTypeEnum.PROCESSING: 'Processing'>, depends_on=None)
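The contents of `preprocess.py` are not shown in this notebook. As a rough sketch of the script side of this step, the block below parses the three flags passed through `job_arguments` and performs a seeded train/dev/test split. The split logic, the helper names `parse_args` and `split_examples`, and the sample data are all assumptions made for illustration.

```python
import argparse
import random
from typing import List, Sequence, Tuple


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse the flags that the ProcessingStep passes via job_arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", type=str, required=True)
    # The pipeline passes the ratios as strings; argparse converts them to float.
    parser.add_argument("--validation-split", type=float, default=0.1)
    parser.add_argument("--test-split", type=float, default=0.1)
    return parser.parse_args(argv)


def split_examples(
    examples: List[str], validation_split: float, test_split: float, seed: int = 42
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle and split examples into train/dev/test (hypothetical logic)."""
    if validation_split + test_split >= 1.0:
        raise ValueError("validation and test splits must sum to less than 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * validation_split)
    n_test = round(len(shuffled) * test_split)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


# Same argument list that the step above would hand to the script.
args = parse_args([
    "--input-data", "/opt/ml/processing/qa/data/raw",
    "--validation-split", "0.1",
    "--test-split", "0.1",
])
examples = [f"question {i}" for i in range(100)]  # placeholder data
train, dev, test = split_examples(examples, args.validation_split, args.test_split)
print(len(train), len(dev), len(test))  # 80 10 10
```

A real script would write each split under `/opt/ml/processing/qa/data/processed`, since that is the directory the `ProcessingOutput` uploads at the end of the job.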


### Test Preprocess Step

In [17]:
from sagemaker.workflow.pipeline import Pipeline
import json
from pprint import pprint

pipeline_name = f"test-preprocess-{int(time.time())}"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        input_data,
        output_dir,
        validation_split,
        test_split,
        processing_instance_count,
        processing_instance_type,
    ],
    steps=[processing_step],
    sagemaker_session=sess
)

definition = json.loads(pipeline.definition())
pprint(definition)

{'Metadata': {},
 'Parameters': [{'DefaultValue': 's3://sm-nlp-data/nlu/data/qa_raw.zip',
                 'Name': 'InputData',
                 'Type': 'String'},
                {'DefaultValue': 's3://sm-nlp-data/nlu/data/processed/',
                 'Name': 'OutputData',
                 'Type': 'String'},
                {'DefaultValue': '0.1',
                 'Name': 'ValidationSplit',
                 'Type': 'String'},
                {'DefaultValue': '0.1', 'Name': 'TestSplit', 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'ProcessingInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': 'ml.c5.2xlarge',
                 'Name': 'ProcessingInstanceType',
                 'Type': 'String'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
                              'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Arguments': {'AppSpecification': {'Containe

In [18]:
response = pipeline.create(role_arn=role)
pipeline_arn = response["PipelineArn"]
print(pipeline_arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-preprocess-1631957033


In [19]:
execution = pipeline.start()
print(execution.arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-preprocess-1631957033/execution/xrf8clzby5ru


In [20]:
execution_run = execution.describe()
pprint(execution_run)

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2021, 9, 18, 9, 24, 0, 355000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2021, 9, 18, 9, 24, 0, 355000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-preprocess-1631957033',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-preprocess-1631957033/execution/xrf8clzby5ru',
 'PipelineExecutionDisplayName': 'execution-1631957040498',
 'PipelineExecutionStatus': 'Executing',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '519',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sat, 18 Sep 2021 09:24:02 GMT',
                                      'x-amzn-requestid': '4c718bfb-9cb1-4024-b5a6-c80ff96692b5'},
                      'HTTPStatusCode': 200,
                      'RequestId': '4c718bfb-9cb1-4024-b5a6-c80ff96692b5',
               

# Train Step

In [20]:
train_instance_type = ParameterString(name="TrainInstanceType", default_value="ml.g4dn.4xlarge")
train_instance_count = ParameterInteger(name="TrainInstanceCount", default_value=1)
epochs = ParameterString(name="Epochs", default_value='10')
learning_rate = ParameterString(name="LearningRate", default_value='5e-5')
batch_size = ParameterString(name="BatchSize", default_value='64')
max_seq_len = ParameterString(name="MaxSeqLength", default_value='50')

In [21]:
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.debugger import TensorBoardOutputConfig
import os

tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path='s3://sm-nlp-data/nlu/outputs/tb',
    container_local_output_path='/root/ckgqa-p-kiqtyrraeiec/sagemaker-ckgqa-p-kiqtyrraeiec-modelbuild/pipelines/question_ansering/output/tb'
)

# Filter out metrics from output
metric_definitions = [
    {'Name': 'eval:intent_acc', 'Regex': 'intent_acc = ([0-9\\.]+)'},
    {'Name': 'eval:loss', 'Regex': 'loss = ([0-9\\.]+)'},
    {'Name': 'eval:semantic_frame_acc', 'Regex': 'sementic_frame_acc = ([0-9\\.]+)'},
    {'Name': 'eval:slot_f1', 'Regex': 'slot_f1 = ([0-9\\.]+)'},
    {'Name': 'eval:slot_precision', 'Regex': 'slot_precision = ([0-9\\.]+)'},
    {'Name': 'eval:slot_recall', 'Regex': 'slot_recall = ([0-9\\.]+)'}
]

estimator = PyTorch(
    entry_point = 'train.py',
    role=role,
    instance_type=train_instance_type, # ml.c5.4xlarge, ml.g4dn.4xlarge
    instance_count=train_instance_count,  # use the TrainInstanceCount pipeline parameter
    framework_version='1.8.1',
    py_version='py3',
    source_dir='./',
    output_path=f"s3://{bucket}/nlu/outputs",
    code_location=f"s3://{bucket}/nlu/source/train",
    metric_definitions = metric_definitions,
    hyperparameters={
        'task': 'naive',
        'model_type': 'bert',
        'train_batch_size': batch_size,
        'max_seq_len': max_seq_len,
        'learning_rate': learning_rate,
        'num_train_epochs': epochs
    }
)
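To see what the metric regexes capture, the sketch below applies two of them to a few log lines using Python's `re` module. The log lines are hypothetical, since the real format depends on what `train.py` prints, and `extract_metrics` is an illustrative helper rather than part of the SageMaker SDK.

```python
import re
from typing import Dict, List

# Two of the definitions from the cell above, copied verbatim.
metric_definitions = [
    {"Name": "eval:intent_acc", "Regex": "intent_acc = ([0-9\\.]+)"},
    {"Name": "eval:slot_f1", "Regex": "slot_f1 = ([0-9\\.]+)"},
]


def extract_metrics(log_lines: List[str]) -> Dict[str, float]:
    """Return the last value captured for each metric (illustration only)."""
    found: Dict[str, float] = {}
    for line in log_lines:
        for definition in metric_definitions:
            match = re.search(definition["Regex"], line)
            if match:
                found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical training output.
sample_log = [
    "***** Eval results *****",
    "  intent_acc = 0.9712",
    "  slot_f1 = 0.9530",
]
metrics = extract_metrics(sample_log)
print(metrics)
```

Note that the character class `[0-9\.]` matches only digits and dots, so a value printed in scientific notation (for example `loss = 1e-05`) would be captured as `1`. If the training script can emit such values, the regex may need widening.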

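Set up pipeline step caching:

With caching enabled, Pipelines looks for a previous execution of the step that was called with the same arguments. If it finds one that has not expired (`expire_after`, here `PT1H`, i.e. one hour), it registers a cache hit and propagates the cached outputs instead of recomputing the step.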

In [22]:
from sagemaker.workflow.steps import CacheConfig
cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")
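The `expire_after` value `"PT1H"` is an ISO 8601 duration meaning one hour. As a rough illustration of how such a string maps to a time window, the sketch below parses a small subset of the format (days, hours, minutes, seconds) into a `timedelta`. Both `parse_duration` and `is_cache_entry_fresh` are hypothetical helpers; the service's own parsing and expiry logic is not shown in this notebook.

```python
import re
from datetime import datetime, timedelta

# Subset of ISO 8601 durations: P[nD][T[nH][nM][nS]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value: str) -> timedelta:
    """Convert a duration such as 'PT1H' or 'P30D' into a timedelta."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    parts = {key: int(num) for key, num in match.groupdict().items() if num is not None}
    if not parts:
        raise ValueError(f"Duration has no components: {value!r}")
    return timedelta(**parts)


def is_cache_entry_fresh(finished_at: datetime, now: datetime, expire_after: str) -> bool:
    """Return True if a previous run is still inside the expiry window."""
    return now - finished_at <= parse_duration(expire_after)


expiry = parse_duration("PT1H")
print(expiry)  # 1:00:00

finished = datetime(2021, 9, 18, 13, 0, 0)
print(is_cache_entry_fresh(finished, datetime(2021, 9, 18, 13, 45, 0), "PT1H"))  # True
print(is_cache_entry_fresh(finished, datetime(2021, 9, 18, 14, 30, 0), "PT1H"))  # False
```

A short window such as one hour suits iterative testing like this notebook; a longer window would reuse training results across more runs at the risk of serving stale artifacts.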

In [23]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

training_step = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={
        "train": TrainingInput(
#             s3_data=processing_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            s3_data='s3://sm-nlp-data/nlu/data/processed/',
            content_type="application/json"
        ),
    },
    cache_config=cache_config,
)

print(training_step)

TrainingStep(name='Train', step_type=<StepTypeEnum.TRAINING: 'Training'>, depends_on=None)


### Test Train Step

In [24]:
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=f"test-train-step-{int(time.time())}",
    parameters=[
        train_instance_type,
        train_instance_count,
        epochs,
        learning_rate,
        batch_size,
        max_seq_len
    ],
    steps=[training_step],
    sagemaker_session=sess,
)

definition = json.loads(pipeline.definition())
pprint(definition)

{'Metadata': {},
 'Parameters': [{'DefaultValue': 'ml.g4dn.4xlarge',
                 'Name': 'TrainInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'TrainInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': '10', 'Name': 'Epochs', 'Type': 'String'},
                {'DefaultValue': '5e-5',
                 'Name': 'LearningRate',
                 'Type': 'String'},
                {'DefaultValue': '64', 'Name': 'BatchSize', 'Type': 'String'},
                {'DefaultValue': '50',
                 'Name': 'MaxSeqLength',
                 'Type': 'String'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
                              'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Arguments': {'AlgorithmSpecification': {'EnableSageMakerMetricsTimeSeries': True,
                                                     'MetricDefinitions': [

In [25]:
response = pipeline.create(role_arn=role)
pipeline_arn = response["PipelineArn"]
print(pipeline_arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-train-step-1631971794


In [26]:
execution = pipeline.start()
print(execution.arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-train-step-1631971794/execution/gzixmcw9w2jh


In [27]:
execution_run = execution.describe()
pprint(execution_run)

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2021, 9, 18, 13, 29, 56, 414000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2021, 9, 18, 13, 29, 56, 414000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-train-step-1631971794',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/test-train-step-1631971794/execution/gzixmcw9w2jh',
 'PipelineExecutionDisplayName': 'execution-1631971796496',
 'PipelineExecutionStatus': 'Executing',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '417',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sat, 18 Sep 2021 13:29:55 GMT',
                                      'x-amzn-requestid': 'a6ef10bd-6c09-4121-97a9-1cc38684cd46'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a6ef10bd-6c09-4121-97a9-1cc38684cd46',
           

# Evaluation Step

The evaluation script:

* loads in the model
* reads in the test data
* runs predictions against the test data
* builds a classification report, including accuracy
* saves the evaluation report to the evaluation directory
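The last two bullets can be sketched as follows. This is not the actual `evaluate.py`, which is not shown here. It assumes the report is a flat JSON object with top-level `intent_acc` and `slot_f1` keys, which is what the condition step later in this notebook reads, and it writes to a temporary directory instead of `/opt/ml/processing/output/metrics/`. The predictions, labels, and the `slot_f1` value are placeholders.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of predictions that equal their label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return correct / len(labels)


def write_report(output_dir: Path, intent_acc: float, slot_f1: float) -> Path:
    """Write evaluation.json with the keys the condition step looks up."""
    output_dir.mkdir(parents=True, exist_ok=True)
    report_path = output_dir / "evaluation.json"
    report = {"intent_acc": intent_acc, "slot_f1": slot_f1}
    report_path.write_text(json.dumps(report, indent=2))
    return report_path


# Placeholder intent predictions and gold labels.
intent_acc = accuracy(
    ["greet", "ask_price", "ask_price", "bye"],
    ["greet", "ask_price", "ask_time", "bye"],
)

with tempfile.TemporaryDirectory() as tmp:
    # slot_f1 is a placeholder; a real script would compute it from slot predictions.
    report_path = write_report(Path(tmp) / "metrics", intent_acc, slot_f1=0.9)
    report = json.loads(report_path.read_text())

print(report)  # {'intent_acc': 0.75, 'slot_f1': 0.9}
```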

In [11]:
evaluation_instance_count = ParameterInteger(name="EvaluationInstanceCount", default_value=1)
evaluation_instance_type = ParameterString(name="EvaluationInstanceType", default_value="ml.c5.2xlarge")

In [28]:
trained_model_s3 = 's3://sm-nlp-data/nlu/outputs/pipelines-4w1w374i6osy-Train-Zm2plWqv9u/output/model.tar.gz'

In [29]:
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.properties import PropertyFile
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

evaluation_processor = SKLearnProcessor(
    role=role,
    framework_version="0.23-1",
    instance_type=evaluation_instance_type,
    instance_count=evaluation_instance_count,
    env={"AWS_DEFAULT_REGION": region},
    max_runtime_in_seconds=7200,
)

evaluation_report = PropertyFile(name="EvaluationReport", output_name="metrics", path="evaluation.json")
evaluation_step = ProcessingStep(
    name="EvaluateModel",
    processor=evaluation_processor,
    code="evaluate.py",
    inputs=[
        ProcessingInput(
            input_name='model',
#             source=dependencies['step_train'].properties.ModelArtifacts.S3ModelArtifacts,
            source=trained_model_s3,
            destination="/opt/ml/processing/input/model",
        ),
        ProcessingInput(
            input_name='data',
#             source=dependencies['step_process'].properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            source='s3://sm-nlp-data/nlu/data/processed/',
            destination="/opt/ml/processing/input/data",
        ),
#         ProcessingInput(
#             input_name='source',
#             source=dependencies['step_train'].arguments['HyperParameters']['sagemaker_submit_directory'][1:-1],
#             destination="/opt/ml/processing/input/source/train"
#         )
    ],
    outputs=[
        ProcessingOutput(
            output_name="metrics", s3_upload_mode="EndOfJob", source="/opt/ml/processing/output/metrics/"
        ),
    ],
    job_arguments=[
        "--model_dir", "/opt/ml/processing/input/model/",
        "--task", "naive",
        "--model_type", "bert",
        "--output_data_dir", "/opt/ml/processing/output/metrics/",
        "--data_dir", "/opt/ml/processing/input/data"
    ],
    property_files=[evaluation_report],
)

### Test Evaluation Step

In [30]:
import time
import json
from pprint import pprint
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=f"qa-eval-step-{int(time.time())}",
    parameters=[
        evaluation_instance_type,
        evaluation_instance_count,
    ],
    steps=[evaluation_step],
    sagemaker_session=sess,
)

definition = json.loads(pipeline.definition())
pprint(definition)

{'Metadata': {},
 'Parameters': [{'DefaultValue': 'ml.c5.2xlarge',
                 'Name': 'EvaluationInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'EvaluationInstanceCount',
                 'Type': 'Integer'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
                              'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Arguments': {'AppSpecification': {'ContainerArguments': ['--model_dir',
                                                                      '/opt/ml/processing/input/model/',
                                                                      '--task',
                                                                      'naive',
                                                                      '--model_type',
                                                                      'bert',
                                      

In [31]:
response = pipeline.create(role_arn=role)
pipeline_arn = response["PipelineArn"]
print(pipeline_arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-eval-step-1631971941


In [32]:
execution = pipeline.start()
print(execution.arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-eval-step-1631971941/execution/37k2woo7xag0


In [33]:
execution_run = execution.describe()
pprint(execution_run)

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2021, 9, 18, 13, 32, 25, 229000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2021, 9, 18, 13, 32, 25, 229000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-eval-step-1631971941',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-eval-step-1631971941/execution/37k2woo7xag0',
 'PipelineExecutionDisplayName': 'execution-1631971945335',
 'PipelineExecutionStatus': 'Executing',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '411',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sat, 18 Sep 2021 13:32:25 GMT',
                                      'x-amzn-requestid': 'b09a2d93-1f0a-43d9-a396-b8d3da22496e'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'b09a2d93-1f0a-43d9-a396-b8d3da22496e',
                 

# Register Model Step

In [170]:
model_approval_status = ParameterString(name="ModelApprovalStatus", default_value="PendingManualApproval")
deploy_instance_type = ParameterString(name="DeployInstanceType", default_value="ml.m4.xlarge")
deploy_instance_count = ParameterInteger(name="DeployInstanceCount", default_value=1)

In [171]:
model_package_group_name = f"Question-Understanding-Models-{timestamp}"
print(model_package_group_name)

Question-Understanding-Models-1631090660


In [172]:
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel

model_s3 = 's3://sm-nlp-data/nlu/outputs/pipelines-2qjkjdrs52ev-Train-S78ARZPZst/output/model.tar.gz'

model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="{}/evaluation.json".format(
            evaluation_step.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
        ),
#         s3_uri='s3://sagemaker-us-east-1-093729152554/sagemaker-scikit-learn-2021-09-07-05-10-04-516/output/metrics/evaluation.json',
        content_type="application/json",
    )
)
step_register = RegisterModel(
    name="QARegisterModel",
    estimator=estimator,
#     model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    model_data=model_s3,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.c5.4xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)

# Create Model Step

In [173]:
from sagemaker.pytorch import PyTorchModel
import time

model_s3 = 's3://sm-nlp-data/nlu/outputs/pipelines-2qjkjdrs52ev-Train-S78ARZPZst/output/model.tar.gz'
# model_s3 = 's3://sm-nlp-data/ie-baseline/outputs/pipelines-k6maywbudevl-Train-VXbSKBTZ1l/output/model.tar.gz'

# model_data = training_step.properties.ModelArtifacts.S3ModelArtifacts
model_name = "qa-model-{}".format(timestamp)

model = PyTorchModel(
    name=model_name,
#     model_data=dependencies['step_train'].properties.ModelArtifacts.S3ModelArtifacts,
    model_data=model_s3,
    framework_version='1.3.1',
    py_version='py3',
    role=role,
    entry_point='inference.py',
    source_dir='./',
    sagemaker_session=sess
)

In [174]:
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.steps import CreateModelStep

create_inputs = CreateModelInput(
    instance_type="ml.c5.4xlarge",
    accelerator_type="ml.eia1.medium",
)
step_create_model = CreateModelStep(
    name="CreateQAModel",
    model=model,
    inputs=create_inputs,
)

# Condition Step

In [175]:
min_intent_acc = ParameterFloat(name="MinIntentAccuracy", default_value=0.9)
min_slot_f1 = ParameterFloat(name="MinSlotF1", default_value=0.95)

In [176]:
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import (
    ConditionStep,
    JsonGet,
)

min_intent_acc_condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step=evaluation_step,
        property_file=evaluation_report,
        json_path="intent_acc",
    ),
    right=min_intent_acc,  # minimum intent accuracy
)

min_slot_f1_condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step=evaluation_step,
        property_file=evaluation_report,
        json_path="slot_f1",
    ),
    right=min_slot_f1,  # minimum slot F1
)

condition_step = ConditionStep(
    name="IntentAndSlotCondition",
    conditions=[min_intent_acc_condition, min_slot_f1_condition],
    if_steps=[step_register, step_create_model],  # success, continue with model registration
    else_steps=[],  # fail, end the pipeline
)

See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
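The gate defined by the condition step can be read as a plain boolean check: the `if_steps` run only when every condition in the list holds. The sketch below mirrors that logic in ordinary Python, using the default thresholds from the parameters above. `passes_quality_gate` is a hypothetical name, and the report values are made up.

```python
from typing import Dict


def passes_quality_gate(
    report: Dict[str, float],
    min_intent_acc: float = 0.9,
    min_slot_f1: float = 0.95,
) -> bool:
    """Return True only if both metrics meet their thresholds (logical AND)."""
    return report["intent_acc"] >= min_intent_acc and report["slot_f1"] >= min_slot_f1


# Both thresholds met: registration and model creation would proceed.
print(passes_quality_gate({"intent_acc": 0.97, "slot_f1": 0.96}))  # True

# Slot F1 below 0.95: the pipeline would take the (empty) else branch.
print(passes_quality_gate({"intent_acc": 0.97, "slot_f1": 0.90}))  # False
```

Because the thresholds are pipeline parameters, they can be relaxed or tightened per execution without editing the condition itself.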


### Test Condition Step

In [177]:
pipeline = Pipeline(
    name=f"qa-condition-step-{int(time.time())}",
    parameters=[
        evaluation_instance_type,
        evaluation_instance_count,
        
        model_approval_status,
        deploy_instance_type,
        deploy_instance_count,
        
        min_intent_acc,
        min_slot_f1
    ],
    steps=[evaluation_step, condition_step],
    sagemaker_session=sess,
)

definition = json.loads(pipeline.definition())
pprint(definition)



{'Metadata': {},
 'Parameters': [{'DefaultValue': 'ml.c5.2xlarge',
                 'Name': 'EvaluationInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'EvaluationInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': 'PendingManualApproval',
                 'Name': 'ModelApprovalStatus',
                 'Type': 'String'},
                {'DefaultValue': 'ml.m4.xlarge',
                 'Name': 'DeployInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'DeployInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': 0.9,
                 'Name': 'MinIntentAccuracy',
                 'Type': 'Float'},
                {'DefaultValue': 0.95, 'Name': 'MinSlotF1', 'Type': 'Float'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
                              'TrialName': {'

## Test All Steps

In [178]:
pipeline = Pipeline(
    name=f"qa-pipeline-{int(time.time())}",
    parameters=[
        evaluation_instance_type,
        evaluation_instance_count,
        
        model_approval_status,
        deploy_instance_type,
        deploy_instance_count,
        
        min_intent_acc,
        min_slot_f1
    ],
    steps=[evaluation_step, condition_step],
    sagemaker_session=sess,
)

definition = json.loads(pipeline.definition())
pprint(definition)



{'Metadata': {},
 'Parameters': [{'DefaultValue': 'ml.c5.2xlarge',
                 'Name': 'EvaluationInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'EvaluationInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': 'PendingManualApproval',
                 'Name': 'ModelApprovalStatus',
                 'Type': 'String'},
                {'DefaultValue': 'ml.m4.xlarge',
                 'Name': 'DeployInstanceType',
                 'Type': 'String'},
                {'DefaultValue': 1,
                 'Name': 'DeployInstanceCount',
                 'Type': 'Integer'},
                {'DefaultValue': 0.9,
                 'Name': 'MinIntentAccuracy',
                 'Type': 'Float'},
                {'DefaultValue': 0.95, 'Name': 'MinSlotF1', 'Type': 'Float'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
                              'TrialName': {'

In [179]:
response = pipeline.create(role_arn=role)
pipeline_arn = response["PipelineArn"]
print(pipeline_arn)



arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-pipeline-1631090731


In [180]:
execution = pipeline.start()
print(execution.arn)

arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-pipeline-1631090731/execution/xr9k8mchhvc3


In [181]:
from pprint import pprint

execution_run = execution.describe()
pprint(execution_run)

{'CreatedBy': {},
 'CreationTime': datetime.datetime(2021, 9, 8, 8, 46, 31, 517000, tzinfo=tzlocal()),
 'LastModifiedBy': {},
 'LastModifiedTime': datetime.datetime(2021, 9, 8, 8, 46, 31, 517000, tzinfo=tzlocal()),
 'PipelineArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-pipeline-1631090731',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:093729152554:pipeline/qa-pipeline-1631090731/execution/xr9k8mchhvc3',
 'PipelineExecutionDisplayName': 'execution-1631090791616',
 'PipelineExecutionStatus': 'Executing',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '409',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 08 Sep 2021 08:46:30 GMT',
                                      'x-amzn-requestid': '70932247-e335-46e6-8f28-8d7b067780db'},
                      'HTTPStatusCode': 200,
                      'RequestId': '70932247-e335-46e6-8f28-8d7b067780db',
                      '
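Each `describe()` call above returns a snapshot, and every snapshot in this notebook shows the status `Executing`. A minimal polling sketch for waiting until an execution finishes is given below. It accepts any callable that returns a dict containing `PipelineExecutionStatus`, so it can be tried without AWS access. The function name, the polling defaults, and the simulated responses are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which a pipeline execution no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_execution(
    describe: Callable[[], Dict[str, Any]],
    poll_seconds: float = 30.0,
    max_attempts: int = 60,
) -> str:
    """Poll describe() until the execution reaches a terminal status."""
    for _ in range(max_attempts):
        status = describe()["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Execution still running after {max_attempts} polls")


# Simulated describe() responses standing in for a real execution.
responses = iter([
    {"PipelineExecutionStatus": "Executing"},
    {"PipelineExecutionStatus": "Executing"},
    {"PipelineExecutionStatus": "Succeeded"},
])
final_status = wait_for_execution(lambda: next(responses), poll_seconds=0.0)
print(final_status)  # Succeeded
```

In the notebook this would be called as `wait_for_execution(execution.describe)`. The SDK's execution object also provides `execution.wait()` for blocking until completion and `execution.list_steps()` for per-step status.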