# Triggers

## Scheduled Trigger

Check created events in AmazonEventBridge->Events->Rules in your AWS console.

**Parameters for scheduled trigger**

In [11]:
rule_name = 'DailyKGPipelineTrigger'
pipeline_id = "Id26b29827-319a-45b4-8726-f33a37e5f22b"
pipeline_arn = "arn:aws:sagemaker:us-east-1:093729152554:pipeline/ckgqa-p-kiqtyrraeiec1631235879"
run_pipeline_role_arn = "arn:aws:iam::093729152554:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole"

Create a rule

In [8]:
import boto3

events = boto3.client('events')

# use the same name will update the trigger
events.put_rule(
    Name=rule_name,
    ScheduleExpression='rate(1 day)',
    State='DISABLED',
    Description='Daily re-run the knowledge graph generation pipeline',
    EventBusName='default'
)

{'RuleArn': 'arn:aws:events:us-east-1:093729152554:rule/DailyKGPipelineTrigger',
 'ResponseMetadata': {'RequestId': 'ee572c89-eee7-4e29-8dde-a0cad909eb8e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ee572c89-eee7-4e29-8dde-a0cad909eb8e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '79',
   'date': 'Tue, 14 Sep 2021 05:55:07 GMT'},
  'RetryAttempts': 0}}

Add the code pipeline as a target:

In [12]:
events.put_targets(
    Rule=rule_name,
    EventBusName='default',
    Targets=[
        {
            "Id": pipeline_id,
            "Arn": pipeline_arn,
            "RoleArn": run_pipeline_role_arn
        }
    ]
)

{'FailedEntryCount': 0,
 'FailedEntries': [],
 'ResponseMetadata': {'RequestId': '818aed89-52aa-4743-b5d5-71fcdc958ef1',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '818aed89-52aa-4743-b5d5-71fcdc958ef1',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '41',
   'date': 'Tue, 14 Sep 2021 06:06:56 GMT'},
  'RetryAttempts': 0}}

Check events with aws cli

In [23]:
!echo 'Rule description:'
!aws events describe-rule --name $rule_name
!echo 'Targets associated:'
!aws events list-targets-by-rule --rule $rule_name

Rule description:
{
    "Name": "DailyKGPipelineTrigger",
    "Arn": "arn:aws:events:us-east-1:093729152554:rule/DailyKGPipelineTrigger",
    "ScheduleExpression": "rate(1 day)",
    "State": "ENABLED",
    "Description": "Daily re-run the knowledge graph generation pipeline",
    "EventBusName": "default",
    "CreatedBy": "093729152554"
}
Targets associated:
{
    "Targets": [
        {
            "Id": "Id26b29827-319a-45b4-8726-f33a37e5f22b",
            "Arn": "arn:aws:sagemaker:us-east-1:093729152554:pipeline/ckgqa-p-kiqtyrraeiec1631235879",
            "RoleArn": "arn:aws:iam::093729152554:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole"
        }
    ]
}


## S3 Trigger

This step follows instructions from [Automate Pipeline BERT S3 Trigger](https://github.com/data-science-on-aws/workshop/blob/dcb1c95a612d0caf9217c19d639dca16261088bc/10_pipeline/stepfunctions/03_Automate_Pipeline_Train_and_Deploy_Reviews_BERT_TensorFlow_S3_Trigger.ipynb)

Parameters for S3 trigger:

In [88]:
watched_bucket = 'sm-nlp-data'
watched_prefix = 'ie-baseline/raw/DuIE_2_0.zip'
trail_name = 'WatchKGInputDataset'
s3_rule_name = 'S3-Trigger'

### 1. Attach required bucket policy to allow CloudTrail to watch it. 
    
    Check [permission for cloudtrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/create-s3-bucket-policy-for-cloudtrail.html?icmpid=docs_cloudtrail_console) for more detail.

    Note: you might want to keep original policy statements. But for the sake of simplicity, here I just overwrite the bucket policy. You can check the original policy with the following code.

In [65]:
import boto3
from pprint import pprint
# Retrieve the policy of the specified bucket
s3 = boto3.client('s3')
result = s3.get_bucket_policy(Bucket=watched_bucket)
print(result['Policy'])
# origin_statements = json.loads(result['Policy'])['Statement']
# pprint(origin_statements)

[{'Action': 's3:GetBucketAcl',
  'Effect': 'Allow',
  'Principal': {'Service': 'cloudtrail.amazonaws.com'},
  'Resource': 'arn:aws:s3:::sm-nlp-data',
  'Sid': 'AWSCloudTrailAclCheck20150319'},
 {'Action': 's3:PutObject',
  'Condition': {'StringEquals': {'s3:x-amz-acl': 'bucket-owner-full-control'}},
  'Effect': 'Allow',
  'Principal': {'Service': 'cloudtrail.amazonaws.com'},
  'Resource': 'arn:aws:s3:::sm-nlp-data/AWSLogs/093729152554/*',
  'Sid': 'AWSCloudTrailWrite20150319'}]


In [64]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSCloudTrailAclCheck20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": f"arn:aws:s3:::{watched_bucket}"
        },
        {
            "Sid": "AWSCloudTrailWrite20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{watched_bucket}/AWSLogs/{account_id}/*",
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
        }
    ]
}

bucket_policy = json.dumps(bucket_policy)
s3.put_bucket_policy(Bucket=watched_bucket, Policy=bucket_policy)

{'ResponseMetadata': {'RequestId': '2KFPW3RQ79HHX5JZ',
  'HostId': 'Y0L1klu2XzCYlcVREooZqBaK0GCyz9oRhIW+uIaOBJU0zqhlkLACioYDrHfCekYB2a1Hl3OXPbc=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'Y0L1klu2XzCYlcVREooZqBaK0GCyz9oRhIW+uIaOBJU0zqhlkLACioYDrHfCekYB2a1Hl3OXPbc=',
   'x-amz-request-id': '2KFPW3RQ79HHX5JZ',
   'date': 'Tue, 14 Sep 2021 07:26:55 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

### 2. Create a trail to log S3 events (check [`create_trail`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudtrail.html#CloudTrail.Client.create_trail)):

In [None]:
cloudtrail = boto3.client('cloudtrail')
cloudtrail.create_trail(
    Name=trail_name,
    S3BucketName=watched_bucket
)

### 3. Define event to with CloudTrail logging on S3 bucket

In [77]:
watched_bucket_arn = "arn:aws:s3:::{}/".format(watched_bucket)
event_selector = [
    { 
        "ReadWriteType": "WriteOnly", 
        "IncludeManagementEvents":True, 
        "DataResources": 
            [
                { 
                    "Type": "AWS::S3::Object", 
                    "Values": [watched_bucket_arn] 
                }
            ] 
    }
]

In [79]:
cloudtrail.put_event_selectors(
    TrailName=trail_name,
    EventSelectors=event_selector
)

{'TrailARN': 'arn:aws:cloudtrail:us-east-1:093729152554:trail/WatchKGInputDataset',
 'EventSelectors': [{'ReadWriteType': 'WriteOnly',
   'IncludeManagementEvents': True,
   'DataResources': [{'Type': 'AWS::S3::Object',
     'Values': ['arn:aws:s3:::sm-nlp-data/']}],
   'ExcludeManagementEventSources': []}],
 'ResponseMetadata': {'RequestId': '65276a56-58f4-4a63-a15d-d2c1e0e9d540',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '65276a56-58f4-4a63-a15d-d2c1e0e9d540',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '281',
   'date': 'Tue, 14 Sep 2021 07:42:22 GMT'},
  'RetryAttempts': 0}}

### 4. Create EventBridge rule that can trigger SageMaker pipeline.

In [84]:
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject", "CompleteMultipartUpload", "CopyObject"],
        "requestParameters": {"bucketName": ["{}".format(watched_bucket)]},
    },
}

pattern_json = json.dumps(pattern)
print(pattern_json)

{"source": ["aws.s3"], "detail-type": ["AWS API Call via CloudTrail"], "detail": {"eventSource": ["s3.amazonaws.com"], "eventName": ["PutObject", "CompleteMultipartUpload", "CopyObject"], "requestParameters": {"bucketName": ["sm-nlp-data"]}}}


In [86]:
response = events.put_rule(
    Name=s3_rule_name,
    EventPattern=pattern_json,
    State="ENABLED",
    Description="Triggers an event on S3 PUT",
    EventBusName="default",
)
print(response)

{'RuleArn': 'arn:aws:events:us-east-1:093729152554:rule/S3-Trigger', 'ResponseMetadata': {'RequestId': '8fa3c9b3-fb58-415d-90f7-d5bddc09c33f', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '8fa3c9b3-fb58-415d-90f7-d5bddc09c33f', 'content-type': 'application/x-amz-json-1.1', 'content-length': '67', 'date': 'Tue, 14 Sep 2021 07:48:47 GMT'}, 'RetryAttempts': 0}}


In [87]:
rule_arn = response["RuleArn"]
print(rule_arn)

arn:aws:events:us-east-1:093729152554:rule/S3-Trigger


### 5. Add pipeline as target to the rule

In [121]:
lambda_fn_name = 'invoke-kg-pipeline'

#### 5.1 Create an IAM role that enables event bridge to trigger pipeline

In [89]:
iam = boto3.client("iam")
iam_role_name_eventbridge = "EventBridge_Invoke_Pipeline"

Create AssumeRolePolicyDocument

In [143]:
from botocore.exceptions import ClientError

assume_role_policy_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow", 
            "Principal": {"Service": "events.amazonaws.com"}, 
            "Action": "sts:AssumeRole"
        },
        {
          "Effect": "Allow",
          "Principal": {"Service": "lambda.amazonaws.com"},
          "Action": "sts:AssumeRole"
        }
    ],
}

try:
    iam_role_eventbridge = iam.create_role(
        RoleName=iam_role_name_eventbridge,
        AssumeRolePolicyDocument=json.dumps(assume_role_policy_doc),
        Description="DSOAWS EventBridge Role",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Role already exists")
    else:
        print("Unexpected error: %s" % e)

Get the Role ARN

In [144]:
role_eventbridge = iam.get_role(RoleName=iam_role_name_eventbridge)
iam_role_eventbridge_arn = role_eventbridge["Role"]["Arn"]
print(iam_role_eventbridge_arn)

arn:aws:iam::093729152554:role/EventBridge_Invoke_Pipeline


#### 5.2 Define and Create Eventbridge Policy

In [145]:
eventbridge_sfn_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "VisualEditor0", "Effect": "Allow", "Action": "states:StartExecution", "Resource": "*"}
    ],
}


try:
    policy_eventbridge_sfn = iam.create_policy(
        PolicyName="EventBridgeInvokePipeline", PolicyDocument=json.dumps(eventbridge_sfn_policy)
    )
    print("Done.")
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy already exists")
        policy_eventbridge_sfn_arn = f"arn:aws:iam::{account_id}:policy/EventBridgeInvokePipeline"
        iam.create_policy_version(
            PolicyArn=policy_eventbridge_sfn_arn, PolicyDocument=json.dumps(eventbridge_sfn_policy), SetAsDefault=True
        )
        print("Policy updated.")
    else:
        print("Unexpected error: %s" % e)

# Get policy ARN        
policy_eventbridge_sfn_arn = f"arn:aws:iam::{account_id}:policy/EventBridgeInvokePipeline"
print(policy_eventbridge_sfn_arn)

Policy already exists
Policy updated.
arn:aws:iam::093729152554:policy/EventBridgeInvokePipeline


#### 5.3 Attach Policy To Role

In [146]:
try:
    response = iam.attach_role_policy(PolicyArn=policy_eventbridge_sfn_arn, RoleName=iam_role_name_eventbridge)
    print("Done.")
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy is already attached. This is ok.")
    else:
        print("Unexpected error: %s" % e)

Done.


#### 5.4 Define a Lambda to execute pipeline

In [149]:
pipeline_name = 'KGPipeline1631239572'

In [98]:
lambda_script = '''
import json
import os
import time
import sys
from pip._internal import main

main(['install', '-I', '-q', 'boto3==1.16.47', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])
sys.path.insert(0,'/tmp/')

import boto3

region = boto3.Session().region_name
s3 = boto3.client('s3', region_name=region)
sm = boto3.client('sagemaker', region_name=region)

# Need to set the Pipeline Name as Lambda environment variable
PIPELINE_NAME = os.environ['PIPELINE_NAME']
print('Pipeline Name: {}'.format(PIPELINE_NAME))

timestamp = int(time.time())

def lambda_handler(event, context):
    print('boto3: {}'.format(boto3.__version__))
    print('Starting execution of pipeline {}...'.format(PIPELINE_NAME))

    response = sm.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineExecutionDisplayName='trigger-{}'.format(timestamp),
        PipelineParameters=[
        ],
        PipelineExecutionDescription= PIPELINE_NAME,
        # ClientRequestToken='string'
    )

    print('Response: {}'.format(response))

    execution_arn=response['PipelineExecutionArn']
    print('Pipeline execution started with execution ARN: {}'.format(execution_arn))
    print('Done.')
'''

In [159]:
fn_bucket = watched_bucket
fn_key = 'ie-baseline/lambda/'
upload_name = 'lambda_function.zip'
!apt-get update
!apt-get install zip
!echo "$lambda_script" >> lambda_function.py
!zip $upload_name lambda_function.py
!aws s3 cp $upload_name s3://$fn_bucket/$fn_key
!rm lambda_function.py $upload_name

Hit:1 http://security.debian.org/debian-security buster/updates InRelease
Hit:2 http://deb.debian.org/debian buster InRelease
Hit:3 http://deb.debian.org/debian buster-updates InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
zip is already the newest version (3.0-11+b1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.
  adding: lambda_function.py (deflated 54%)
upload: ./lambda_function.zip to s3://sm-nlp-data/ie-baseline/lambda/lambda_function.zip


Use `!aws lambda delete-function --function-name invoke-kg-pipeline` to remove the pipeline if it already exists.

In [163]:
lmd = boto3.client('lambda')

response = lmd.create_function(
    FunctionName=lambda_fn_name,
    Runtime='python3.9',
    Role=iam_role_eventbridge_arn,
    Handler='lambda_handler',
    Code={
        'S3Bucket': fn_bucket,
        'S3Key': fn_key+upload_name,
    },
    Environment={
        'Variables': {
            'PIPELINE_NAME': pipeline_name
        }
    },
)
response

{'ResponseMetadata': {'RequestId': '9f68503c-af4c-4f41-b858-5e0e25883ec6',
  'HTTPStatusCode': 201,
  'HTTPHeaders': {'date': 'Tue, 14 Sep 2021 09:24:52 GMT',
   'content-type': 'application/json',
   'content-length': '965',
   'connection': 'keep-alive',
   'x-amzn-requestid': '9f68503c-af4c-4f41-b858-5e0e25883ec6'},
  'RetryAttempts': 0},
 'FunctionName': 'invoke-kg-pipeline',
 'FunctionArn': 'arn:aws:lambda:us-east-1:093729152554:function:invoke-kg-pipeline',
 'Runtime': 'python3.9',
 'Role': 'arn:aws:iam::093729152554:role/EventBridge_Invoke_Pipeline',
 'Handler': 'lambda_handler',
 'CodeSize': 737,
 'Description': '',
 'Timeout': 3,
 'MemorySize': 128,
 'LastModified': '2021-09-14T09:24:52.641+0000',
 'CodeSha256': 'GxYl3aHk/JI+8XmIDtjjHlfU2NkVUHoKZDPH5arI6+M=',
 'Version': '$LATEST',
 'Environment': {'Variables': {'PIPELINE_NAME': 'KGPipeline1631239572'}},
 'TracingConfig': {'Mode': 'PassThrough'},
 'RevisionId': 'a476ed84-e936-4ad6-8cca-6f4872999154',
 'State': 'Active',
 'Last

In [165]:
lambda_fn_arn = response['FunctionArn']
lambda_fn_arn

'arn:aws:lambda:us-east-1:093729152554:function:invoke-kg-pipeline'

#### 5.5 Set the Lambda function as a target of the EventBridge rule

In [None]:
import uuid

response = events.put_targets(
    Rule="S3-Trigger",
    EventBusName="default",
    Targets=[
        {
            "Id": 'Id'+str(uuid.uuid1()), 
            "Arn": lambda_fn_arn
        }
    ],
)

response

### 6. Trigger pipeline by writing to the watched location