# Targeting Direct Marketing Model Training ML Pipeline [manual]

---

Once you are familiar with using the Amazon SageMaker built-in [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) algorithm for [targeting direct marketing model training](./01_xgboost_direct_marketing_sagemaker.ipynb), we will build an ML pipeline that automates the workflow with the [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io).

The pipeline design consists of:
* A preprocessing job for feature engineering
* Model training with tuned hyperparameters
  * For example, you may take the hyperparameters from the best candidate of an HPO job.
* Hyperparameter optimization (HPO), which is optional

In this notebook, we demonstrate how to create the workflow step by step. The diagram below shows the Step Functions workflow for the ML pipeline when HPO is skipped and a trained model is used:

![Direct Marketing](./images/dm_ml_pipeline.png)

## ML Pipeline Creation
---
To create the ML pipeline, we will use Step Functions Data Science SDK v2.0.0rc1, which is compatible with SageMaker SDK 2.x.

Pipeline creation covers the following:
* Environment initialization
* Create ML Pipeline with Step Functions Data Science SDK (v2.0.0rc1)

### Initialize Environment

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=2.0.0" # 2.0.0
!{sys.executable} -m pip install -qU "stepfunctions==2.0.0rc1"
!{sys.executable} -m pip install sagemaker-experiments

In [None]:
import boto3
import time
import re
import uuid

import stepfunctions
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps.sagemaker import *
from stepfunctions.steps.states import *
from stepfunctions.steps.compute import *
from stepfunctions.workflow import Workflow
from stepfunctions.steps import *
from IPython.display import display, HTML, Javascript

import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.analytics import ExperimentAnalytics

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker
from smexperiments.search_expression import Filter, Operator, SearchExpression

session = boto3.Session()
sm = session.client('sagemaker')
region = boto3.Session().region_name

role = get_execution_role()
sagemaker_session = sagemaker.Session()
bucket_name = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-xgboost-dm/manual_pipeline'
account_id = session.client('sts').get_caller_identity().get('Account')

Set up the workflow execution role. For the role ARN, refer to the Outputs tab of the CloudFormation stack.

In [None]:
# ssm = boto3.client('ssm')
# response = ssm.get_parameter(Name = "/directmarketing/ml_pipeline/workflow_execution_role")
# WORKFLOW_EXECUTION_ROLE = response['Parameter']['Value']

WORKFLOW_EXECUTION_ROLE = "arn:aws:iam::593380422482:role/StepFunctionsWorkflowExecutionRole"

In [None]:
if not WORKFLOW_EXECUTION_ROLE:
    raise Exception("ML Pipeline parameters in Systems Manager are not set up properly. Please check whether the ml-pipeline stack has been created.")
else:
    print(f"Workflow execution IAM service role: {WORKFLOW_EXECUTION_ROLE}")

In [None]:
EXISTING_MODEL_URI = "s3://sagemaker-ap-southeast-2-593380422482/sagemaker/DEMO-xgboost-dm/output/xgboost-201120-0017-007-fc507e21/output/model.tar.gz"

In [None]:
from datetime import datetime
suffix = datetime.now().strftime("%y%m%d-%H%M")

## Prepare data

In [None]:
!wget https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip

In [None]:
import zipfile
with zipfile.ZipFile("./bank-additional.zip", 'r') as zip_ref:
    zip_ref.extractall(".")

### Create ML Pipeline with Step Functions Data Science SDK (v2.0.0rc1)

---

#### Create Processing Step for data preprocessing

We will now create the [ProcessingStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/sagemaker.html#stepfunctions.steps.sagemaker.ProcessingStep) that will launch a SageMaker Processing Job.

The processing job script `./pipeline/preprocessing.py` performs the following actions:

* Feature engineering on the dataset
* Split training and test data 
* Store the data on S3 buckets.
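
The contents of `preprocessing.py` are not shown in this notebook. As a rough illustration of the splitting action only, the sketch below shuffles rows and divides them into training, validation, and test sets. The function name `split_rows`, the 70/20/10 proportions, and the seed are assumptions made for this example, not values taken from the actual script.

```python
import random


def split_rows(rows, train_pct=70, validation_pct=20, seed=42):
    """Shuffle rows and split them into train, validation and test lists.

    Percentages are integers; whatever remains after the train and
    validation slices goes to the test set. (Hypothetical helper.)
    """
    if train_pct <= 0 or validation_pct <= 0 or train_pct + validation_pct >= 100:
        raise ValueError("train_pct and validation_pct must be positive and sum to less than 100")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


train, validation, test = split_rows(list(range(100)))
print(len(train), len(validation), len(test))
```

In the processing job, each of the three sets would then be written under `/opt/ml/processing/output/`, from where SageMaker copies them to the S3 destinations configured for the job.
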

Upload the preprocessing script.

In [None]:
PREPROCESSING_SCRIPT_LOCATION = "./pipeline/preprocessing.py"
input_code_uri = sagemaker_session.upload_data(
    PREPROCESSING_SCRIPT_LOCATION,
    bucket = bucket_name,
    key_prefix = f"{prefix}/preprocessing/code",
)

The `SKLearnProcessor` class lets you run a command inside the container, which you can use to run your own script.

In [None]:
preprocessing_processor = SKLearnProcessor(
    framework_version='0.20.0',
    role = role,
    instance_count = 1,
    instance_type = 'ml.m5.xlarge',
    max_runtime_in_seconds = 1200
)

S3 locations of the preprocessing input and of the preprocessing output with training, validation, and test data.

In [None]:
output_data = f"s3://{bucket_name}/{prefix}/preprocessing/output"
processing_input_path = f's3://{bucket_name}/{prefix}/preprocessing/input'

In [None]:
local_data_file = './bank-additional/bank-additional-full.csv'
sagemaker.s3.S3Uploader.upload(local_data_file, processing_input_path, sagemaker_session = sagemaker_session)
input_data = f'{processing_input_path}/bank-additional-full.csv'

This step uses the `SKLearnProcessor` defined above, along with the input and output objects defined in the next cell.

In [None]:
inputs = [
    ProcessingInput(
        input_name = "code",
        source = input_code_uri,
        destination = "/opt/ml/processing/input/code"
    ),
    ProcessingInput(
        input_name = "input_data",
        source = input_data,
        destination='/opt/ml/processing/input'
    )
]

outputs = [
    ProcessingOutput(
        output_name = "train_data",
        source = "/opt/ml/processing/output/train",
        destination = f"{output_data}/train"
    ),
    ProcessingOutput(
        output_name = "validation_data",
        source = "/opt/ml/processing/output/validation",
        destination = f"{output_data}/validation"
    ),
    ProcessingOutput(
        output_name = "test_data",
        source = "/opt/ml/processing/output/test",
        destination = f"{output_data}/test"
    )
]

In [None]:
# Workflow Execution parameters
execution_input = ExecutionInput(
    schema = {
        "PreprocessingJobName": str,
        "ToDoHPO": bool,
        "ToDoTraining": bool,
        "TrainingJobName": str,
        "TuningJobName": str,
        "ModelName": str,
        "EndpointName": str,
        "LambdaFunctionNameOfQueryEndpoint": str,
        "LambdaFunctionNameOfQueryHpoJob": str
    }
)

In [None]:
# Create Experiment
experiment = Experiment.create(
    experiment_name = f"xgboost-target-direct-marketing-{suffix}", 
    description = "Classification of target direct marketing", 
    sagemaker_boto_client = sm)
print(experiment)

In [None]:
trial_name = f"xgb-processing-job-{suffix}"
xgb_trial = Trial.create(
    trial_name = trial_name, 
    experiment_name = experiment.experiment_name,
    sagemaker_boto_client = sm,
)

`ProcessingStep` runs the preprocessing script as a SageMaker Processing job, using the processor, inputs, and outputs defined above. The job name is supplied at execution time through `execution_input`, and the job is tracked under the experiment trial created above.

In [None]:
processing_step = ProcessingStep(
    "DM Preprocessing Step",
    processor = preprocessing_processor,
    job_name = execution_input["PreprocessingJobName"],
    inputs = inputs,
    outputs = outputs,
    container_arguments = ["--data-file", "bank-additional-full.csv"],
    container_entrypoint = ["python3", "/opt/ml/processing/input/code/preprocessing.py"],
    experiment_config = {
        "TrialName": xgb_trial.trial_name,
        "TrialComponentDisplayName": "Processing",
    },
)

#### Create Hyperparameter Tuning Step

Set up the tuning step. A choice state, configured later, decides whether HPO should run.

In [None]:
tuning_output_path = f's3://{bucket_name}/{prefix}/tuning/output'
image_uri = sagemaker.image_uris.retrieve(region = region, framework='xgboost', version='latest')

tuning_estimator = sagemaker.estimator.Estimator(
    image_uri,
    role, 
    instance_count = 1, 
    instance_type = 'ml.m5.xlarge',
    output_path = tuning_output_path,
    sagemaker_session = sagemaker_session
)

#### Set static hyperparameters
The static parameters are the ones we know to be the best based on previously run HPO jobs, as well as parameters that we do not tune here, such as `objective` and `num_round`, which are set according to requirements.

In [None]:
hpo = dict(
    max_depth = 5,
    eta = 0.2,
    gamma = 4,
    min_child_weight = 6,
    subsample = 0.8,
    silent = 0,
    objective = 'binary:logistic',
    num_round = 100
)

##### Set hyper-parameter ranges
The hyperparameter ranges define the parameters we want the tuner to search across.

> Explore: Look in the [user guide](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html) for XGBoost.

In [None]:
hyperparameter_ranges = {
    'eta': ContinuousParameter(0, 1),
    'min_child_weight': ContinuousParameter(1, 10),
    'alpha': ContinuousParameter(0, 2),
    'max_depth': IntegerParameter(1, 10)
}

##### Create HPO tuning job step
Once we have the HPO tuner defined, we can define the tuning step.

In [None]:
tuning_estimator.set_hyperparameters(**hpo)

objective_metric_name = 'validation:auc'

hpo_tuner = HyperparameterTuner(
    tuning_estimator,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=3
)

s3_input_train = sagemaker.inputs.TrainingInput(s3_data=f'{output_data}/train', content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=f'{output_data}/validation', content_type='csv')

hpo_data = dict(
    train = s3_input_train,
    validation = s3_input_validation
)

# as long as HPO is selected, wait for completion.
tuning_step = TuningStep(
    "HPO Step",
    tuner = hpo_tuner,
    job_name = execution_input["TuningJobName"],
    data = hpo_data,
    wait_for_completion = True
)

In [None]:
# lambda function
import zipfile
from sagemaker.s3 import S3Uploader
zip_name = 'query_hpo_job.zip'
lambda_source_code = './code/query_hpo_job.py'

zf = zipfile.ZipFile(zip_name, mode='w')
zf.write(lambda_source_code, arcname=lambda_source_code.split('/')[-1])
zf.close()
S3Uploader.upload(local_path = zip_name, 
                  desired_s3_uri = f"s3://{bucket_name}/{prefix}/code",
                  sagemaker_session = sagemaker_session)


In [None]:
lambda_client = boto3.client('lambda')

lambda_function_query_hpo_job = 'query_hpo_job'
response = lambda_client.create_function(
    FunctionName = lambda_function_query_hpo_job,
    Runtime = 'python3.7',
    Role = role,
    Handler = 'query_hpo_job.lambda_handler',
    Code={
        'S3Bucket': bucket_name,
        'S3Key': f'{prefix}/code/{zip_name}'
    },
    Description='Queries SageMaker HPO Job.',
    Timeout=15,
    MemorySize=128
)


In [None]:
query_hpo_job_lambda_step = LambdaStep(
    'Query HPO Job',
    parameters = {  
        "FunctionName": execution_input['LambdaFunctionNameOfQueryHpoJob'],
        'Payload':{
            "HpoJobName.$": "$$.Execution.Input['TuningJobName']"
        }
    }
)


In [None]:
# Create SNS Topic and Complete Subscription
sns = boto3.client('sns')
topic_name = 'dm-model-training-notification-topic'
response = sns.create_topic(Name = topic_name)

topic_arn = response['TopicArn']
email_id = 'tomlu@amazon.com'

response = sns.subscribe(
    TopicArn = topic_arn,
    Protocol = 'email',
    Endpoint = email_id
)

In [None]:
hpo_job_sns_step = SnsPublishStep(
    state_id = 'SNS Notification - HPO Job',
    parameters = {
        'TopicArn': topic_arn,
        'Message': query_hpo_job_lambda_step.output()['Payload']['bestTrainingJob']
    }
)

In [None]:
tuning_step.next(query_hpo_job_lambda_step)
query_hpo_job_lambda_step.next(hpo_job_sns_step)

#### Create Model Training Step

We create an XGBoost estimator, which we will use to run a training job. This will be used to create a TrainingStep for the workflow.

##### Setup the training job step

In [None]:
training_output_path = f's3://{bucket_name}/{prefix}/training/output'
training_estimator = sagemaker.estimator.Estimator(
    image_uri,
    role, 
    instance_count = 1, 
    instance_type = 'ml.m5.xlarge',
    output_path = training_output_path,
    sagemaker_session = sagemaker_session
)

In [None]:
{'_tuning_objective_metric': 'validation:auc',
 'alpha': '1.9167548939755026',
 'eta': '0.2513705646042541',
 'gamma': '4',
 'max_depth': '4',
 'min_child_weight': '2.561240034842159',
 'num_round': '100',
 'objective': 'binary:logistic',
 'silent': '0',
 'subsample': '0.8'}
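
The dictionary above has the shape of the hyperparameters reported for the best training job of a tuning run: every value is a string, and the tuner adds the `_tuning_objective_metric` key. If you want to feed such a result back into `set_hyperparameters`, a small conversion helper may be convenient. The sketch below is one possible approach; the function name `to_static_hyperparameters` is hypothetical and not part of any SDK.

```python
def to_static_hyperparameters(best_job_hyperparameters):
    """Drop tuner-internal keys and convert numeric strings to numbers.

    Keys starting with an underscore (such as '_tuning_objective_metric')
    are skipped. Values are tried as int first, then float, and are kept
    as strings when neither conversion works. (Hypothetical helper.)
    """
    converted = {}
    for key, value in best_job_hyperparameters.items():
        if key.startswith("_"):
            continue
        try:
            converted[key] = int(value)
        except ValueError:
            try:
                converted[key] = float(value)
            except ValueError:
                converted[key] = value
    return converted


best = {
    "_tuning_objective_metric": "validation:auc",
    "eta": "0.2513705646042541",
    "max_depth": "4",
    "objective": "binary:logistic",
}
print(to_static_hyperparameters(best))
```
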

In [None]:
# static hyperparameters for the training job (for example, taken from the best HPO candidate)
hpo = dict(
    max_depth = 5,
    eta = 0.2,
    gamma = 4,
    min_child_weight = 6,
    subsample = 0.8,
    silent = 0,
    objective = 'binary:logistic',
    num_round = 100
)
training_estimator.set_hyperparameters(**hpo)

In [None]:
s3_input_train = sagemaker.inputs.TrainingInput(s3_data=f'{output_data}/train', content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=f'{output_data}/validation', content_type='csv')

training_data = dict(
    train = s3_input_train,
    validation = s3_input_validation
)

trial_name = f"xgb-training-job-{int(time.time())}"
xgb_trial = Trial.create(
    trial_name = trial_name, 
    experiment_name = experiment.experiment_name,
    sagemaker_boto_client = sm,
)

training_step = TrainingStep(
    "Training Step",
    estimator = training_estimator,
    data = training_data,
    job_name = execution_input["TrainingJobName"],
    wait_for_completion = True,
    experiment_config = {
        "TrialName": xgb_trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
)

#### Create Model Step

In the following cell, we define a model step that will create a model in Amazon SageMaker using the artifacts created during the TrainingStep. See  [ModelStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.ModelStep) in the AWS Step Functions Data Science SDK documentation to learn more.

The model creation step typically follows the training step. The Step Functions SDK provides the [get_expected_model](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.TrainingStep.get_expected_model) method in the TrainingStep class to provide a reference for the trained model artifacts. Please note that this method is only useful when the ModelStep directly follows the TrainingStep.

In [None]:
model_step = ModelStep(
    "Save Model",
    model = training_step.get_expected_model(),
    model_name = execution_input["ModelName"],
    result_path = "$.ModelStepResults"
)

# for deploying existing model
existing_model_name = f"dm-model-{uuid.uuid1().hex}"
existing_model = sagemaker.model.Model(  # Model is not imported by name above, so use the qualified path
    model_data = EXISTING_MODEL_URI,
    image_uri = image_uri,
    role = role,
    name = existing_model_name
)
existing_model_step = ModelStep(
    "Existing Model",
    model = existing_model,
    model_name = execution_input["ModelName"]
)

#### Create Endpoint Configuration Step

> The Endpoint Configuration Step won't be used in the workflow, as we demo Batch Transform in the lab.

In the following cell we create an endpoint configuration step. See [EndpointConfigStep](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/sagemaker.html#stepfunctions.steps.sagemaker.EndpointConfigStep) in the AWS Step Functions Data Science SDK documentation to learn more.

In [None]:
endpoint_config_step = EndpointConfigStep(
    "Create Model Endpoint Config",
    endpoint_config_name = execution_input["ModelName"],
    model_name = execution_input["ModelName"],
    initial_instance_count = 1,
    instance_type = 'ml.m5.xlarge'
)

#### Lambda function to check whether the Endpoint exists

In [None]:
import zipfile
from sagemaker.s3 import S3Uploader
zip_name = 'query_endpoint_existence.zip'
lambda_source_code = './code/query_endpoint_existence.py'



zf = zipfile.ZipFile(zip_name, mode='w')
zf.write(lambda_source_code, arcname=lambda_source_code.split('/')[-1])
zf.close()


S3Uploader.upload(local_path = zip_name, 
                  desired_s3_uri = f"s3://{bucket_name}/{prefix}/code",
                  sagemaker_session = sagemaker_session)

In [None]:
lambda_client = boto3.client('lambda')

lambda_function_query_endpoint = 'query_endpoint'
response = lambda_client.create_function(
    FunctionName = lambda_function_query_endpoint,
    Runtime = 'python3.7',
    Role = role,
    Handler = 'query_endpoint_existence.lambda_handler',
    Code={
        'S3Bucket': bucket_name,
        'S3Key': f'{prefix}/code/{zip_name}'
    },
    Description='Queries a SageMaker Endpoint existence.',
    Timeout=15,
    MemorySize=128
)

In [None]:
query_endpoint_lambda_step = LambdaStep(
    'Query Endpoint Existence',
    parameters = {  
        "FunctionName": execution_input['LambdaFunctionNameOfQueryEndpoint'],
        'Payload':{
            "EndpointName.$": "$$.Execution.Input['EndpointName']"
        }
    }
)

deployed_endpoint_completed_lambda_step = LambdaStep(
    'Query Deployed Endpoint Status',
    parameters = {  
        "FunctionName": execution_input['LambdaFunctionNameOfQueryEndpoint'],
        'Payload':{
            "EndpointName.$": "$$.Execution.Input['EndpointName']"
        }
    }
)

#### Create Endpoint Step

> Endpoint Step won't be used in workflow as we demo Batch Transform in the lab.

In the following cells, we create the Endpoint steps that deploy the new model as a managed API endpoint, either creating a new endpoint or updating an existing SageMaker endpoint when the choice state finds it in service.

In [None]:
endpoint_creation_step = EndpointStep(
    "Create Endpoint",
    endpoint_name = execution_input["EndpointName"],
    endpoint_config_name = execution_input["ModelName"],
    update = False
)

In [None]:
endpoint_update_step = EndpointStep(
    "Update Endpoint",
    endpoint_name = execution_input["EndpointName"],
    endpoint_config_name = execution_input["ModelName"],
    update = True
)

In [None]:
check_endpoint_status_step = Choice('Endpoint is InService?')

endpoint_in_service_rule = ChoiceRule.StringEquals(variable = query_endpoint_lambda_step.output()['Payload']['endpoint_status'], value = 'InService')
check_endpoint_status_step.add_choice(rule = endpoint_in_service_rule, next_step = endpoint_update_step)

wait_step = Wait(state_id = "Wait Until Endpoint becomes InService", seconds = 20)
wait_step.next(query_endpoint_lambda_step)

check_endpoint_status_step.default_choice(next_step = wait_step)

In [None]:
check_endpoint_existence_step = Choice(
    'Endpoint Existed?'
)

endpoint_existed_rule = ChoiceRule.BooleanEquals(variable = query_endpoint_lambda_step.output()['Payload']['endpoint_existed'], value = True)
check_endpoint_existence_step.add_choice(rule = endpoint_existed_rule, next_step = check_endpoint_status_step)

check_endpoint_existence_step.default_choice(next_step = endpoint_creation_step)

In [None]:
# check endpoint readiness
deployed_endpoint_updating_step = Choice('Deployed Endpoint Status Updating?')

wait_deployment_step = Wait(state_id = "Wait Until Endpoint Deployment Completed", seconds = 20)
wait_deployment_step.next(deployed_endpoint_completed_lambda_step)

deployed_endpoint_updating_rule = ChoiceRule.StringEquals(variable = deployed_endpoint_completed_lambda_step.output()['Payload']['endpoint_status'], value = 'Updating')
deployed_endpoint_updating_step.add_choice(rule = deployed_endpoint_updating_rule, next_step = wait_deployment_step)

final_step = Pass(state_id = 'Pass Step')

deployed_endpoint_updating_step.default_choice(next_step = final_step)

deployed_endpoint_completed_lambda_step.next(deployed_endpoint_updating_step)
endpoint_creation_step.next(deployed_endpoint_completed_lambda_step)
endpoint_update_step.next(deployed_endpoint_completed_lambda_step)
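
The choice and wait states wired above can be hard to follow from the `add_choice` calls alone. The sketch below mirrors the same branching in plain Python so it can be read top to bottom. The function names are hypothetical, the returned strings are the state names used in this notebook, and the payload keys `endpoint_existed` and `endpoint_status` are the ones the choice rules read from the Lambda output.

```python
def next_step_before_deployment(payload):
    """Mirror of 'Endpoint Existed?' followed by 'Endpoint is InService?'."""
    if not payload.get("endpoint_existed"):
        # Default choice of 'Endpoint Existed?'
        return "Create Endpoint"
    if payload.get("endpoint_status") == "InService":
        return "Update Endpoint"
    # Default choice of 'Endpoint is InService?': wait, then query again
    return "Wait Until Endpoint becomes InService"


def next_step_after_deployment(payload):
    """Mirror of 'Deployed Endpoint Status Updating?'."""
    if payload.get("endpoint_status") == "Updating":
        return "Wait Until Endpoint Deployment Completed"
    return "Pass Step"


print(next_step_before_deployment({"endpoint_existed": False}))
print(next_step_before_deployment({"endpoint_existed": True, "endpoint_status": "InService"}))
print(next_step_after_deployment({"endpoint_status": "Updating"}))
```
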

#### Setup Workflow Process

Create a `Fail` state to mark the workflow as failed in case any of the steps fails.

In [None]:
failed_state_sagemaker_pipeline_step_failure = Fail(
    "ML Workflow Failed", cause = "SageMakerPipelineStepFailed"
)

In [None]:
training_path = Chain([training_step, model_step, endpoint_config_step, query_endpoint_lambda_step, check_endpoint_existence_step])
deploy_existing_model_path = Chain([existing_model_step, endpoint_config_step, query_endpoint_lambda_step, check_endpoint_existence_step])

##### Choice Step Configuration

Now we need to set up choice states that decide whether to run HPO and whether to run training. See *Choice Rules* in the [AWS Step Functions Data Science SDK documentation](https://aws-step-functions-data-science-sdk.readthedocs.io).

In [None]:
hpo_choice = Choice(
    "To do HPO?"
)
training_choice = Choice(
    "To do Model Training?"
)

# refer to execution input variable with required format - not user friendly.
hpo_choice.add_choice(
    rule = ChoiceRule.BooleanEquals(variable = "$$.Execution.Input['ToDoHPO']", value = True),
    next_step = tuning_step                 
)
hpo_choice.add_choice(
    rule = ChoiceRule.BooleanEquals(variable = "$$.Execution.Input['ToDoHPO']", value = False),
    next_step = training_choice
)
training_choice.add_choice(
    rule = ChoiceRule.BooleanEquals(variable = "$$.Execution.Input['ToDoTraining']", value = True),
    next_step = training_path
)
training_choice.add_choice(
    rule = ChoiceRule.BooleanEquals(variable = "$$.Execution.Input['ToDoTraining']", value = False),
    next_step = deploy_existing_model_path
)
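
For orientation, a `BooleanEquals` rule corresponds to a comparison entry inside a Choice state in the Amazon States Language. The sketch below builds, as a plain dictionary, roughly what the "To do HPO?" state could look like once rendered. It is an approximation for illustration: the helper `boolean_choice_rule` is hypothetical, and the definition generated by the SDK may differ in detail.

```python
import json


def boolean_choice_rule(variable, value, next_state):
    """Build one Choice-state comparison entry (hypothetical helper)."""
    return {"Variable": variable, "BooleanEquals": value, "Next": next_state}


# Approximate shape of the "To do HPO?" state; state names are the ones
# used in this notebook.
hpo_choice_state = {
    "Type": "Choice",
    "Choices": [
        boolean_choice_rule("$$.Execution.Input['ToDoHPO']", True, "HPO Step"),
        boolean_choice_rule("$$.Execution.Input['ToDoHPO']", False, "To do Model Training?"),
    ],
}

print(json.dumps(hpo_choice_state, indent=2))
```

The `$$` prefix refers to the execution context object, which is why the rules can read the execution input directly instead of the output of the previous state.
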

##### Error Handling in the Workflow

In [None]:
catch_state_processing = Catch(
    error_equals = ["States.TaskFailed"],
    next_step = failed_state_sagemaker_pipeline_step_failure   
)
processing_step.add_catch(catch_state_processing)
tuning_step.add_catch(catch_state_processing)
training_step.add_catch(catch_state_processing)
model_step.add_catch(catch_state_processing)
endpoint_config_step.add_catch(catch_state_processing)
endpoint_creation_step.add_catch(catch_state_processing)
endpoint_update_step.add_catch(catch_state_processing)
existing_model_step.add_catch(catch_state_processing)
query_endpoint_lambda_step.add_catch(catch_state_processing)
deployed_endpoint_completed_lambda_step.add_catch(catch_state_processing)

#### Create and execute the Workflow

In [None]:
suffix = datetime.now().strftime("%y%m%d-%H%M")

# execution input parameter values
preprocessing_job_name = f"dm-preprocessing-{uuid.uuid1().hex}"
tuning_job_name = f"dm-tuning-{suffix}"
training_job_name = f"dm-training-{uuid.uuid1().hex}"
model_job_name = f"dm-model-{suffix}"
endpoint_job_name = f"dm-endpoint-manual"

In [None]:
# variables
WORKFLOW_NAME = "manaul-dm-ml-pipeline"
TO_DO_HPO = False
TO_DO_TRAINING = True

In [None]:
sfn_client = boto3.client('stepfunctions')

workflow_role_arn = f"arn:aws:states:{region}:{account_id}:stateMachine:{WORKFLOW_NAME}"

try:
    response = sfn_client.describe_state_machine(
        stateMachineArn = workflow_role_arn
    )
    existing_workflow = True
except sfn_client.exceptions.StateMachineDoesNotExist:
    existing_workflow = False
    

In [None]:
workflow_graph = Chain([processing_step, hpo_choice])
# workflow_graph = Chain([hpo_choice])
if existing_workflow:
    # To update SFN workflow, need to do 'attach' & 'update' together.
    workflow = Workflow.attach(state_machine_arn = workflow_role_arn)
    workflow.update(definition = workflow_graph, role = WORKFLOW_EXECUTION_ROLE) 
    # Wait for 10s so that the update is completed before executing workflow
    time.sleep(10)
else:
    workflow = Workflow(
        name = WORKFLOW_NAME,
        definition = workflow_graph,
        role = WORKFLOW_EXECUTION_ROLE
    )
    workflow.create()

In [None]:
# execute workflow
execution = workflow.execute(
    inputs = {
        "PreprocessingJobName": preprocessing_job_name,
        "ToDoHPO": TO_DO_HPO,
        "ToDoTraining": TO_DO_TRAINING,
        "TrainingJobName": training_job_name,
        "TuningJobName": tuning_job_name,
        "ModelName": model_job_name,
        "EndpointName": endpoint_job_name,
        "LambdaFunctionNameOfQueryEndpoint": lambda_function_query_endpoint,
        "LambdaFunctionNameOfQueryHpoJob": lambda_function_query_hpo_job
    }
)

In [None]:
def display_state_machine_advice(workflow_name, execution_id):
    display(HTML(f'''<br>The Step Function workflow "{workflow_name}" is now executing... 
            <br>To view state machine in the console click 
            <a target="_blank" href="https://{region}.console.aws.amazon.com/states/home?region={region}#/statemachines/view/arn:aws:states:{region}:{account_id}:stateMachine:{workflow_name}">State Machine</a> 
            <br>To view execution in the console click 
            <a target="_blank" href="https://{region}.console.aws.amazon.com/states/home?region={region}#/executions/details/arn:aws:states:{region}:{account_id}:execution:{workflow_name}:{execution_id}">Execution</a>.
        '''))


In [None]:
response = execution.describe()
execution_id = response['name']
# advice state machine console link
display_state_machine_advice(WORKFLOW_NAME, execution_id)

Run the cell below multiple times to observe the workflow execution progress. Please note that the execution may take 15-20 minutes when using an existing model for batch transform.

In [None]:
execution.render_progress(portrait = True)
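
As an alternative to re-running the cell by hand, the execution status can be polled until it reaches a terminal state. The sketch below is a minimal, generic polling loop. It assumes that the callable passed as `describe` returns a dictionary with a `status` key, as the `execution.describe()` response used earlier does. The function name `wait_for_execution` and its default intervals are hypothetical choices for this example.

```python
import time

# Status values after which a Step Functions execution no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(describe, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Call describe() until the status is terminal or max_polls is reached.

    Returns the last status seen. `sleep` is injectable so the loop can be
    exercised without real delays. (Hypothetical helper.)
    """
    status = None
    for _ in range(max_polls):
        status = describe()["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return status


# Example with a stand-in for execution.describe:
fake_responses = iter([{"status": "RUNNING"}, {"status": "SUCCEEDED"}])
print(wait_for_execution(lambda: next(fake_responses), poll_seconds=0))
```

In the notebook, passing `execution.describe` as the `describe` argument would poll the running workflow. With the defaults above the loop gives up after about 30 minutes.
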