# Triggers

## Scheduled Trigger

Check created events in AmazonEventBridge->Events->Rules in your AWS console.

**Parameters for scheduled trigger**

You can list all your pipelines with: </br>
`$ aws sagemaker --list-pipelines`

If you already know your pipeline name, you can get its detail information with: </br>
`$ aws sagemaker --list-pipelines --pipeline-name [your-pipeline-name]`

For `pipeline_id`, just use a memorable and unique string.

For `run_pipeline_role_arn`, you can try get one with `sagemaker.get_execution_role()`

In [19]:
rule_name = 'DailyKGPipelineTrigger'
pipeline_name = 'KGPipeline1631789947'
pipeline_id = 'Id26b29827-319a-45b4-8726-f33a37e5f22b' # randomly assigned

Run following code to get `pipeline_arn` and `run_pipeline_role_arn` if you only know the pipeline name.

In [20]:
import boto3
from pprint import pprint

sm = boto3.client('sagemaker')

pipeline = sm.describe_pipeline(PipelineName=pipeline_name)
pipeline_arn = pipeline['PipelineArn']
run_pipeline_role_arn = pipeline['RoleArn']

print(f"pipeline_arn:\t\t {pipeline_arn}")
print(f"run_pipeline_role_arn:\t {run_pipeline_role_arn}")

pipeline_arn:		 arn:aws:sagemaker:us-east-1:093729152554:pipeline/kgpipeline1631789947
run_pipeline_role_arn:	 arn:aws:iam::093729152554:role/service-role/AWSNeptuneNotebookRole-NepTestRole


Create a rule that rules every 1 day:

In [8]:
import boto3

events = boto3.client('events')

# use the same name will update the trigger
events.put_rule(
    Name=rule_name,
    ScheduleExpression='rate(1 day)',
    State='DISABLED',
    Description='Daily re-run the knowledge graph generation pipeline',
    EventBusName='default'
)

{'RuleArn': 'arn:aws:events:us-east-1:093729152554:rule/DailyKGPipelineTrigger',
 'ResponseMetadata': {'RequestId': 'ee572c89-eee7-4e29-8dde-a0cad909eb8e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ee572c89-eee7-4e29-8dde-a0cad909eb8e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '79',
   'date': 'Tue, 14 Sep 2021 05:55:07 GMT'},
  'RetryAttempts': 0}}

Add the code pipeline as a target:</br>
Check docs for [put_targets](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/events.html#EventBridge.Client.put_targets)

In [29]:
events.put_targets(
    Rule=rule_name,
    EventBusName='default',
    Targets=[
        {
            "Id": pipeline_id,
            "Arn": pipeline_arn,
            "RoleArn": run_pipeline_role_arn
        }
    ]
)

{'FailedEntryCount': 0,
 'FailedEntries': [],
 'ResponseMetadata': {'RequestId': 'f2dc35f0-fb5b-4a08-9bf2-db697467c0cb',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f2dc35f0-fb5b-4a08-9bf2-db697467c0cb',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '41',
   'date': 'Fri, 17 Sep 2021 10:51:46 GMT'},
  'RetryAttempts': 0}}

Check events with aws cli

In [23]:
!echo 'Rule description:'
!aws events describe-rule --name $rule_name
!echo 'Targets associated:'
!aws events list-targets-by-rule --rule $rule_name

Rule description:
{
    "Name": "DailyKGPipelineTrigger",
    "Arn": "arn:aws:events:us-east-1:093729152554:rule/DailyKGPipelineTrigger",
    "ScheduleExpression": "rate(1 day)",
    "State": "ENABLED",
    "Description": "Daily re-run the knowledge graph generation pipeline",
    "EventBusName": "default",
    "CreatedBy": "093729152554"
}
Targets associated:
{
    "Targets": [
        {
            "Id": "Id26b29827-319a-45b4-8726-f33a37e5f22b",
            "Arn": "arn:aws:sagemaker:us-east-1:093729152554:pipeline/ckgqa-p-kiqtyrraeiec1631235879",
            "RoleArn": "arn:aws:iam::093729152554:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole"
        }
    ]
}


## S3 Trigger

This step follows instructions from [Automate Pipeline BERT S3 Trigger](https://github.com/data-science-on-aws/workshop/blob/dcb1c95a612d0caf9217c19d639dca16261088bc/10_pipeline/stepfunctions/03_Automate_Pipeline_Train_and_Deploy_Reviews_BERT_TensorFlow_S3_Trigger.ipynb)

Also, check this tutorial: [Log Amazon S3 object-level operations using EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-log-s3-data-events.html)

Object-level **data events** are not logged by default, check these docs to know more about data events and **management events**:</br>
- [What are data events](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-concepts.html#cloudtrail-concepts-data-events)
- [How do you log management and data events](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-concepts.html#understanding-event-selectors)
- [Logging data events for trails](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html#logging-data-events)
- [Examples: Logging data events for Amazon S3 objects](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html#logging-data-events-examples)


Parameters for S3 trigger:

In [22]:
watched_bucket = 'sm-nlp-data'
watched_prefix = 'ie-baseline/raw/DuIE_2_0.zip' # set watchted prefix to '' if you want to watch the whole bucket
trail_name = 'WatchKGInputDataset'
s3_rule_name = 'KG-S3-Trigger'
s3_rule_description = 'Run knowledge graph generation pipeline every time new data uploaded to specified location.'

# We use default_bucket to save logs as it's better to store logs in a different bucket to avoid spin triggering.
import sagemaker
default_bucket = sagemaker.session.Session().default_bucket()
default_bucket

'sagemaker-us-east-1-093729152554'

### 1. Attach policy to S3 bucket to receive the log files 


    
Check [permission for cloudtrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/create-s3-bucket-policy-for-cloudtrail.html?icmpid=docs_cloudtrail_console) for more detail.

You can delete bucket policy to revoke permission using:</br>
`$ aws s3api delete-bucket-policy --bucket [your-bucket]`

Note: you might want to keep original policy statements. But for the sake of simplicity, here I just overwrite the bucket policy. You can check the original policy with the following code.

In [9]:
import boto3
import json
from pprint import pprint

# Retrieve the original policy of the specified bucket (this would be overwritten by new policies)
s3 = boto3.client('s3')
result = s3.get_bucket_policy(Bucket=default_bucket)
pprint(json.loads(result['Policy']))

{'Statement': [{'Action': 's3:GetBucketAcl',
                'Effect': 'Allow',
                'Principal': {'Service': 'cloudtrail.amazonaws.com'},
                'Resource': 'arn:aws:s3:::sagemaker-us-east-1-093729152554',
                'Sid': 'AWSCloudTrailAclCheck20150319'},
               {'Action': 's3:PutObject',
                'Condition': {'StringEquals': {'s3:x-amz-acl': 'bucket-owner-full-control'}},
                'Effect': 'Allow',
                'Principal': {'Service': 'cloudtrail.amazonaws.com'},
                'Resource': 'arn:aws:s3:::sagemaker-us-east-1-093729152554/AWSLogs/093729152554/*',
                'Sid': 'AWSCloudTrailWrite20150319'}],
 'Version': '2012-10-17'}


In [70]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
log_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSCloudTrailAclCheck20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": f"arn:aws:s3:::{default_bucket}"
        },
        {
            "Sid": "AWSCloudTrailWrite20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{default_bucket}/AWSLogs/{account_id}/*",
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
        }
    ]
}

log_bucket_policy = json.dumps(log_bucket_policy)
s3.put_bucket_policy(Bucket=default_bucket, Policy=log_bucket_policy)

{'ResponseMetadata': {'RequestId': 'S6S2R0GVZ6QY2JZ4',
  'HostId': 'h66G1+6NrZ04n97NaCIkB8Lj2tZMAf+dhJ+7FQps9HCtOSmo0qHAGydgXBuXBf+LnlJoZlqXIug=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'h66G1+6NrZ04n97NaCIkB8Lj2tZMAf+dhJ+7FQps9HCtOSmo0qHAGydgXBuXBf+LnlJoZlqXIug=',
   'x-amz-request-id': 'S6S2R0GVZ6QY2JZ4',
   'date': 'Fri, 17 Sep 2021 12:54:45 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

### 2. Create a trail to log S3 events (check [`create_trail`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudtrail.html#CloudTrail.Client.create_trail)):

A trail captures API calls and related events in your account and then delivers the log files to an S3 bucket that you specify.

You can delete a previously defined trail by running (the default trail name we defined is `WatchKGInputDataset`):</br>
`$ aws cloudtrail delete-trail --name [your-trail-name]`

In [24]:
#!aws cloudtrail delete-trail --name $trail_name

In [25]:
cloudtrail = boto3.client('cloudtrail')

In [26]:
cloudtrail.create_trail(
    Name=trail_name,
    S3BucketName=default_bucket, # this specifies the bucket to save logs
    TagsList=[
        {
            'Key': 'event',
            'Value': 'kg-dataset-update'
        }
    ]
)

{'Name': 'WatchKGInputDataset',
 'S3BucketName': 'sagemaker-us-east-1-093729152554',
 'IncludeGlobalServiceEvents': True,
 'IsMultiRegionTrail': False,
 'TrailARN': 'arn:aws:cloudtrail:us-east-1:093729152554:trail/WatchKGInputDataset',
 'LogFileValidationEnabled': False,
 'IsOrganizationTrail': False,
 'ResponseMetadata': {'RequestId': '1209d4a6-0c46-4651-a80e-4fafdb2fb025',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '1209d4a6-0c46-4651-a80e-4fafdb2fb025',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '283',
   'date': 'Sun, 26 Sep 2021 04:30:41 GMT'},
  'RetryAttempts': 0}}

### 3. Define event selector for CloudTrail

Use event selectors or advanced event selectors to specify management and data event settings for your trail. For each trail, if the event matches any event selector, the trail processes and logs the event.

Learn more about event selectors: [put_event_selector](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudtrail.html#CloudTrail.Client.put_event_selectors)

In [27]:
watched_s3_resource_arn = "arn:aws:s3:::{}/{}".format(watched_bucket, watched_prefix)
event_selector = [
    { 
        "ReadWriteType": "WriteOnly", 
        "IncludeManagementEvents":False, 
        "DataResources": 
            [
                { 
                    "Type": "AWS::S3::Object", 
                    "Values": [watched_s3_resource_arn] 
                }
            ]
    }
]

In [28]:
cloudtrail.put_event_selectors(
    TrailName=trail_name,
    EventSelectors=event_selector
)

{'TrailARN': 'arn:aws:cloudtrail:us-east-1:093729152554:trail/WatchKGInputDataset',
 'EventSelectors': [{'ReadWriteType': 'WriteOnly',
   'IncludeManagementEvents': False,
   'DataResources': [{'Type': 'AWS::S3::Object',
     'Values': ['arn:aws:s3:::sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip']}],
   'ExcludeManagementEventSources': []}],
 'ResponseMetadata': {'RequestId': 'ffe10f7f-83d6-40b9-8af4-c871e2e01dff',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ffe10f7f-83d6-40b9-8af4-c871e2e01dff',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '310',
   'date': 'Sun, 26 Sep 2021 04:32:00 GMT'},
  'RetryAttempts': 0}}

In [29]:
cloudtrail.start_logging(
    Name=trail_name
)

{'ResponseMetadata': {'RequestId': 'cedcc533-62a8-434a-b163-b00b5afd6419',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'cedcc533-62a8-434a-b163-b00b5afd6419',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '2',
   'date': 'Sun, 26 Sep 2021 04:32:06 GMT'},
  'RetryAttempts': 0}}

### 4. Create EventBridge rule that can trigger SageMaker pipeline.

**A Question:**

How is this step associated with the `event_selector` defined above?

In my understanding, CloudTrail puts an event to the `default` event bus, this matches the pattern defined by the EventBridge rule. Therefore, the EventBridge rule is triggered to invoke further steps.

A sample of original logged data event by S3 looks like this:
```json
{
    "eventVersion": "1.08",
    "userIdentity": {
        "accountId": "093729152554",
        "userName": "edXSageMakerUser",
        ...
    },
    "eventTime": "2021-09-26T04:35:07Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "PutObject",
    "readOnly": false,
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip"
        },
        {
            "accountId": "093729152554",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sm-nlp-data"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": false,
    "recipientAccountId": "093729152554",
    "eventCategory": "Data"
}
```

A call send to EventBridge looks like this (This is what the pattern should match)</br>
I followed this [tutorial](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-log-s3-data-events.html) to create a lambda function to capture this event json data
```json
{
    "version": "0",
    "id": "6ac6b633-d9ff-7bb3-5846-b5e40f81ea33",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "093729152554",
    "time": "2021-09-26T06:39:17Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventVersion": "1.08",
        "userIdentity": {
            "type": "IAMUser",
            "accountId": "093729152554",
            ...
        },
        "eventTime": "2021-09-26T06:39:17Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "52.201.136.146",
        "requestParameters": {
            "bucketName": "sm-nlp-data",
            "Host": "sm-nlp-data.s3.amazonaws.com",
            "key": "ie-baseline/raw/DuIE_2_0.zip"
        },
        ...
        "eventID": "10cdc61c-7934-41a3-8a2b-1be8e0a7346c",
        "readOnly": false,
        "resources": [
            {
                "type": "AWS::S3::Object",
                "ARN": "arn:aws:s3:::sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip"
            },
            {
                "accountId": "093729152554",
                "type": "AWS::S3::Bucket",
                "ARN": "arn:aws:s3:::sm-nlp-data"
            }
        ],
        "eventType": "AwsApiCall",
        "managementEvent": false,
        "recipientAccountId": "093729152554",
        "eventCategory": "Data"
    }
}
```

We create a pattern to filter events (know more at [Event Patterns](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-patterns.html)):

In [16]:
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject", "CompleteMultipartUpload", "CopyObject"],
        "requestParameters": {
            "bucketName": ["{}".format(watched_bucket)],
            "key": [watched_prefix]
        },
    },
}

pattern_json = json.dumps(pattern)
pprint(pattern)

{'detail': {'eventName': ['PutObject', 'CompleteMultipartUpload', 'CopyObject'],
            'eventSource': ['s3.amazonaws.com'],
            'requestParameters': {'bucketName': ['sm-nlp-data'],
                                  'key': ['ie-baseline/raw/DuIE_2_0.zip']}},
 'detail-type': ['AWS API Call via CloudTrail'],
 'source': ['aws.s3']}


`put_rule` API reference: [put_rule](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/events.html#EventBridge.Client.put_rule)

In [17]:
import boto3

events = boto3.client('events')

response = events.put_rule(
    Name=s3_rule_name,
    EventPattern=pattern_json,
    State="ENABLED",
    Description=s3_rule_description,
    EventBusName="default",
    Tags=[
        {
            'Key': 'event',
            'Value': 'kg-dataset-update'
        },
    ],
)
response

{'RuleArn': 'arn:aws:events:us-east-1:093729152554:rule/S3-Trigger',
 'ResponseMetadata': {'RequestId': '207ed37d-a8a6-426d-9d39-0b5435ca0b7f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '207ed37d-a8a6-426d-9d39-0b5435ca0b7f',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '67',
   'date': 'Sun, 26 Sep 2021 04:19:50 GMT'},
  'RetryAttempts': 0}}

In [18]:
rule_arn = response["RuleArn"]
print(rule_arn)

arn:aws:events:us-east-1:093729152554:rule/S3-Trigger


### 5. Add pipeline as target to the rule

**Parameters:**

In [23]:
lambda_fn_name = 'invoke-kg-pipeline'

### Option 1: Directly set the CodePipeline as the target event (least operational overhead)

In [129]:
response = events.put_targets(
    Rule=s3_rule_name,
    EventBusName='default',
    Targets=[
        {
            "Id": pipeline_id,
            "Arn": pipeline_arn,
            "RoleArn": run_pipeline_role_arn
        }
    ]
)
response

{'FailedEntryCount': 0,
 'FailedEntries': [],
 'ResponseMetadata': {'RequestId': 'b4775a1a-c333-4deb-b9ae-f2c0526c6eb2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'b4775a1a-c333-4deb-b9ae-f2c0526c6eb2',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '41',
   'date': 'Sat, 18 Sep 2021 02:46:32 GMT'},
  'RetryAttempts': 0}}

### Option 2: Define a Lambda function to run pipeline on your behalf

This is troublesome and prone to errors. Don't run 5.1~5.6 if you opt the first option.

#### 5.1 Create an IAM role that enables event bridge to trigger pipeline

In [38]:
iam = boto3.client("iam")
iam_role_name_eventbridge = "EventBridge_Invoke_Pipeline"

Create AssumeRolePolicyDocument

In [39]:
from botocore.exceptions import ClientError

assume_role_policy_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow", 
            "Principal": {"Service": "events.amazonaws.com"}, 
            "Action": "sts:AssumeRole"
        },
        {
          "Effect": "Allow",
          "Principal": {"Service": "lambda.amazonaws.com"},
          "Action": "sts:AssumeRole"
        }
    ],
}

try:
    iam_role_eventbridge = iam.create_role(
        RoleName=iam_role_name_eventbridge,
        AssumeRolePolicyDocument=json.dumps(assume_role_policy_doc),
        Description="DSOAWS EventBridge Role",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Role already exists")
    else:
        print("Unexpected error: %s" % e)

Role already exists


Get the Role ARN

In [40]:
role_eventbridge = iam.get_role(RoleName=iam_role_name_eventbridge)
iam_role_eventbridge_arn = role_eventbridge["Role"]["Arn"]
print(iam_role_eventbridge_arn)

arn:aws:iam::093729152554:role/EventBridge_Invoke_Pipeline


#### 5.2 Define and Create Eventbridge Policy

In [43]:
eventbridge_sfn_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "VisualEditor0", "Effect": "Allow", "Action": "states:StartExecution", "Resource": "*"}
    ],
}


try:
    policy_eventbridge_sfn = iam.create_policy(
        PolicyName="EventBridgeInvokePipeline", PolicyDocument=json.dumps(eventbridge_sfn_policy)
    )
    print("Done.")
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy already exists")
        policy_eventbridge_sfn_arn = f"arn:aws:iam::{account_id}:policy/EventBridgeInvokePipeline"
        iam.create_policy_version(
            PolicyArn=policy_eventbridge_sfn_arn, PolicyDocument=json.dumps(eventbridge_sfn_policy), SetAsDefault=True
        )
        print("Policy updated.")
    else:
        print("Unexpected error: %s" % e)

# Get policy ARN        
policy_eventbridge_sfn_arn = f"arn:aws:iam::{account_id}:policy/EventBridgeInvokePipeline"
print(policy_eventbridge_sfn_arn)

Policy already exists
Policy updated.
arn:aws:iam::093729152554:policy/EventBridgeInvokePipeline


#### 5.3 Attach Policy To Role

In [44]:
try:
    response = iam.attach_role_policy(PolicyArn=policy_eventbridge_sfn_arn, RoleName=iam_role_name_eventbridge)
    print("Done.")
except ClientError as e:
    if e.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy is already attached. This is ok.")
    else:
        print("Unexpected error: %s" % e)

Done.


#### 5.4 Define a Lambda to execute pipeline

In [33]:
lambda_script = '''
import json
import os
import time
import sys
from pip._internal import main

main(['install', '-I', '-q', 'boto3==1.16.47', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])
sys.path.insert(0,'/tmp/')

import boto3

region = boto3.Session().region_name
s3 = boto3.client('s3', region_name=region)
sm = boto3.client('sagemaker', region_name=region)

# Need to set the Pipeline Name as Lambda environment variable
PIPELINE_NAME = os.environ['PIPELINE_NAME']
print('Pipeline Name: {}'.format(PIPELINE_NAME))

timestamp = int(time.time())

def lambda_handler(event, context):
    print('boto3: {}'.format(boto3.__version__))
    print('Starting execution of pipeline {}...'.format(PIPELINE_NAME))

    response = sm.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineExecutionDisplayName='trigger-{}'.format(timestamp),
        PipelineParameters=[
        ],
        PipelineExecutionDescription= PIPELINE_NAME,
        # ClientRequestToken='string'
    )

    print('Response: {}'.format(response))

    execution_arn=response['PipelineExecutionArn']
    print('Pipeline execution started with execution ARN: {}'.format(execution_arn))
    print('Done.')
'''

In [34]:
fn_bucket = watched_bucket
fn_key = 'ie-baseline/lambda/'
upload_name = 'lambda_function.zip'
!apt-get update
!apt-get install zip
!echo "$lambda_script" >> lambda_function.py
!zip $upload_name lambda_function.py
!aws s3 cp $upload_name s3://$fn_bucket/$fn_key
!rm lambda_function.py $upload_name

Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [303 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [15.2 kB]
Fetched 8465 kB in 2s (4230 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  zip
0 upgraded, 1 newly installed, 0 to remove and 34 not upgraded.
Need to get 234 kB of archives.
After this operation, 623 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 zip amd64 3.0-11+b1 [234 kB]
Fetched 234 kB in 0s (22.2 MB/s)
debconf: delaying package configuration, since ap

Use `!aws lambda delete-function --function-name invoke-kg-pipeline` to remove the pipeline if it already exists.

In [45]:
lmd = boto3.client('lambda')

response = lmd.create_function(
    FunctionName=lambda_fn_name,
    Runtime='python3.9',
    Role=iam_role_eventbridge_arn,
    Handler='lambda_handler',
    Code={
        'S3Bucket': fn_bucket,
        'S3Key': fn_key+upload_name,
    },
    Environment={
        'Variables': {
            'PIPELINE_NAME': pipeline_name
        }
    },
)
response

{'ResponseMetadata': {'RequestId': '2f6e3576-a176-4de7-9f8a-2dc6ad7df085',
  'HTTPStatusCode': 201,
  'HTTPHeaders': {'date': 'Wed, 15 Sep 2021 06:46:38 GMT',
   'content-type': 'application/json',
   'content-length': '965',
   'connection': 'keep-alive',
   'x-amzn-requestid': '2f6e3576-a176-4de7-9f8a-2dc6ad7df085'},
  'RetryAttempts': 0},
 'FunctionName': 'invoke-kg-pipeline',
 'FunctionArn': 'arn:aws:lambda:us-east-1:093729152554:function:invoke-kg-pipeline',
 'Runtime': 'python3.9',
 'Role': 'arn:aws:iam::093729152554:role/EventBridge_Invoke_Pipeline',
 'Handler': 'lambda_handler',
 'CodeSize': 737,
 'Description': '',
 'Timeout': 3,
 'MemorySize': 128,
 'LastModified': '2021-09-15T06:46:38.341+0000',
 'CodeSha256': 'obHM5qcxtY+LY/pRNtpT7uyNKcNlV2BhYqdXNOuELVk=',
 'Version': '$LATEST',
 'Environment': {'Variables': {'PIPELINE_NAME': 'KGPipeline1631239572'}},
 'TracingConfig': {'Mode': 'PassThrough'},
 'RevisionId': 'f6c4aa4a-66c5-4f3c-8cbc-7eb65c93d674',
 'State': 'Active',
 'Last

In [46]:
lambda_fn_arn = response['FunctionArn']
lambda_fn_arn

'arn:aws:lambda:us-east-1:093729152554:function:invoke-kg-pipeline'

#### 5.5 Set the Lambda function as a target of the EventBridge rule

In [47]:
import uuid

response = events.put_targets(
    Rule="S3-Trigger",
    EventBusName="default",
    Targets=[
        {
            "Id": 'Id'+str(uuid.uuid1()), 
            "Arn": lambda_fn_arn
        }
    ],
)

response

{'FailedEntryCount': 0,
 'FailedEntries': [],
 'ResponseMetadata': {'RequestId': '650e8af7-d5d8-4fa7-8c4f-d52a07d60a06',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '650e8af7-d5d8-4fa7-8c4f-d52a07d60a06',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '41',
   'date': 'Wed, 15 Sep 2021 06:46:47 GMT'},
  'RetryAttempts': 0}}

### 6. Trigger pipeline by writing to the watched location

To check whether a data event is logged, go to CloudTrail console, select the trail you created. At the CloudWatch Logs section, create a CloudWatch group for this trail. Then, go to CloudWatch and check the stream logs of this group.

In [103]:
# download a fake data
!aws s3 cp s3://$watched_bucket/psudo/DuIE_2_0.zip data/pseudo/
# save a copy of real data
!aws s3 cp s3://$watched_bucket/ie-baseline/raw/DuIE_2_0.zip data/real/

download: s3://sm-nlp-data/psudo/DuIE_2_0.zip to data/pseudo/DuIE_2_0.zip
download: s3://sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip to data/real/DuIE_2_0.zip


In [32]:
# upload fake data to the watched location
!aws s3 cp data/pseudo/DuIE_2_0.zip s3://$watched_bucket/ie-baseline/raw/DuIE_2_0.zip

upload: data/pseudo/DuIE_2_0.zip to s3://sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip


Now go to the CloudTrail console and EventBridge console to check whether there are something happening!

And check whether there is a new pipeline running as well!

If everything goes well, well, we revert the pseudo data to its original version.

In [33]:
!aws s3 cp data/real/DuIE_2_0.zip s3://$watched_bucket/ie-baseline/raw/

upload: data/real/DuIE_2_0.zip to s3://sm-nlp-data/ie-baseline/raw/DuIE_2_0.zip
