
Introduction

This guide will help you set up the lab environment for the Real-Time Clickstream Anomaly Detection Amazon Kinesis Data Analytics lab.

The AWS CloudFormation template Kinesis_Pre_Lab.json included with this lab deploys the following architecture, except for the highlighted components, which you will set up manually.

After you deploy the CloudFormation template, sign in to your account to view the following resources:

• Two Amazon Simple Storage Service (Amazon S3) buckets: You will use these buckets to persist raw and processed data.

• One AWS Lambda function: This Lambda function is triggered when an anomaly is detected.

• One Amazon Simple Notification Service (Amazon SNS) topic with an email address and phone number subscribed to it: The Lambda function publishes to this topic when an anomaly is detected.

• Amazon Cognito user credentials: You will use these credentials to log in to the Kinesis Data Generator and send records to your Amazon Kinesis Data Firehose delivery stream.

Image (lab architecture diagram)

Download Lab Files

Download the zip file to your machine:

Analyzing Data Streams Lab Files (zipped)

The zip has three files you will be using in different steps in the lab:

• Kinesis_Pre_Lab.json: CloudFormation template

• anomaly_detection.sql: Anomaly detection SQL code

• anomaly_detection_lambda.js: Anomaly detection Lambda function

CloudFormation Stack Deployment

Make sure you are in the US West (Oregon) Region (us-west-2).

  1. In your AWS account, navigate to the CloudFormation console.
  2. On the CloudFormation console, click Create new Stack.

Image

  3. In the Select Template section, select Upload a template to Amazon S3. Then browse to the Kinesis_Pre_Lab.json file provided with your lab package.

Image

  4. Click Next at the bottom of the Select Template page, as shown in the screenshot above.
  5. In the Specify Details section, for Stack name, type kinesis-pre-lab.
  6. In the Parameters section, fill in the following fields:

• Username: This is your username for logging in to the Kinesis Data Generator.

• Password: This is your password for the Kinesis Data Generator. The password must be at least six alphanumeric characters and contain at least one number and one capital letter.

• Email: Type an email address that you can access. The SNS topic sends a confirmation to this address.

• SMS: Type a phone number (+1XXXXXXXXXX) where you can receive text messages from the SNS topic.
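If you want to sanity-check the Password and SMS values before creating the stack, the Python sketch below encodes the rules as stated above. It is an approximation, not the lab's validation: the policy that is actually enforced comes from the CloudFormation template, and the function names are hypothetical.

```python
import re

# Hypothetical local pre-checks; the stack and Cognito do the real validation.
US_E164 = re.compile(r"\+1\d{10}")  # +1 followed by a 10-digit US number


def password_meets_lab_rule(password: str) -> bool:
    """Approximate the stated rule: at least six alphanumeric characters,
    including at least one digit and one capital letter."""
    alnum = [c for c in password if c.isascii() and c.isalnum()]
    return (
        len(alnum) >= 6
        and any(c.isdigit() for c in alnum)
        and any(c.isupper() for c in alnum)
    )


def is_us_phone_number(phone: str) -> bool:
    """True for a US number written as +1XXXXXXXXXX."""
    return US_E164.fullmatch(phone) is not None


if __name__ == "__main__":
    print(password_meets_lab_rule("Kinesis1"))  # True
    print(password_meets_lab_rule("kinesis1"))  # False: no capital letter
    print(is_us_phone_number("+12065550100"))   # True
```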

Image

  7. In the Options section, keep the default values.
  8. In the Review section, select the check box marked I acknowledge that AWS CloudFormation might create IAM resources.

Image

  9. Click Create. CloudFormation redirects you to your existing stacks, where kinesis-pre-lab displays a CREATE_IN_PROGRESS status.

Image

  10. Once your stack is deployed, click the Outputs tab to view more information:
    • KinesisDataGeneratorUrl: The Kinesis Data Generator (KDG) URL.
    • RawBucketName: Stores raw data coming from the KDG.
    • ProcessedBucketName: Stores transformed data.
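You can also read these outputs programmatically. The Python sketch below assumes a boto3 CloudFormation client (passed in, so the function itself has no AWS dependency) and turns the stack's Outputs into a dictionary keyed by the output names listed above; stack_outputs is a hypothetical helper.

```python
from typing import Any, Dict


def stack_outputs(cfn_client: Any, stack_name: str = "kinesis-pre-lab") -> Dict[str, str]:
    """Return the stack's Outputs as an {OutputKey: OutputValue} dict.

    cfn_client is expected to behave like a boto3 CloudFormation client,
    for example boto3.client("cloudformation", region_name="us-west-2").
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    stack = response["Stacks"][0]
    # Outputs may be absent until the stack has finished creating them.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   outputs = stack_outputs(boto3.client("cloudformation", region_name="us-west-2"))
#   print(outputs["KinesisDataGeneratorUrl"])
```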

Image

Congratulations! You are all done with the CloudFormation deployment.

Set up the Amazon Kinesis Data Generator

On the Outputs tab, notice the Kinesis Data Generator URL. Navigate to this URL to log in to the Amazon Kinesis Data Generator (Amazon KDG).

The KDG simplifies the task of generating data and sending it to Amazon Kinesis. The tool provides a user-friendly UI that runs directly in your browser. With the KDG, you can do the following tasks:

• Create templates that represent records for your specific use cases

• Populate the templates with fixed data or random data

• Save the templates for future use

• Continuously send thousands of records per second to your Amazon Kinesis stream or Firehose delivery stream

Let’s test your Cognito user in the Kinesis Data Generator.

  1. On the Outputs tab, click the KinesisDataGeneratorUrl.

Image

  2. Sign in using the username and password you entered in the CloudFormation console.

Image

  3. After you sign in, you should see the KDG console. You need to set up some templates to mimic the clickstream web payload.

Image

Create the following templates, but don't click Send data yet. You will do that during the main lab:
    a. Schema Discovery Payload: {"browseraction":"DiscoveryKinesisTest", "site": "yourwebsiteurl.domain.com"}
    b. Click Payload: {"browseraction":"Click", "site": "yourwebsiteurl.domain.com"}
    c. Impression Payload: {"browseraction":"Impression", "site": "yourwebsiteurl.domain.com"}
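All three templates share the same two fields (browseraction and site), so the schema that Kinesis Data Analytics discovers from the Schema Discovery Payload should also fit the Click and Impression records. The Python sketch below writes the templates as dicts and checks that property. encode_record and same_fields are hypothetical helpers, and the trailing newline is a choice of this sketch (it keeps records separable once Firehose concatenates them into S3 objects), not something the KDG requires.

```python
import json
from typing import Dict

SITE = "yourwebsiteurl.domain.com"  # placeholder value copied from the templates

# The three KDG templates from this step, expressed as Python dicts.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "Schema Discovery Payload": {"browseraction": "DiscoveryKinesisTest", "site": SITE},
    "Click Payload": {"browseraction": "Click", "site": SITE},
    "Impression Payload": {"browseraction": "Impression", "site": SITE},
}


def encode_record(payload: Dict[str, str]) -> bytes:
    """Serialize one payload as a newline-terminated JSON record."""
    return (json.dumps(payload) + "\n").encode("utf-8")


def same_fields(templates: Dict[str, Dict[str, str]]) -> bool:
    """True if every template has exactly the same field names."""
    field_sets = {frozenset(payload) for payload in templates.values()}
    return len(field_sets) == 1


if __name__ == "__main__":
    for name, payload in TEMPLATES.items():
        print(name, encode_record(payload))
    print(same_fields(TEMPLATES))  # True
```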

Note that your Kinesis Data Firehose delivery stream is deployed in us-west-2.

Your Amazon Kinesis Data Generator console should look similar to this example.

Image

Set up Email and SMS Subscription

  1. Go to the SNS service in the AWS console.
  2. On the Amazon SNS navigation menu, select Topics. An SNS topic whose name starts with kinesis-pre-lab-CSEClickStreamEvent-2 appears in the list.

Image

  3. Click the topic name. The Topic details screen appears, listing the email/SMS subscriptions as pending confirmation. Take note of the Topic ARN value, because you need it in the next section.

Image

Note: Select the corresponding subscription endpoint and click Request confirmations to confirm your email/SMS subscription. Make sure to check your email junk folder for the confirmation link.
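To check which subscriptions are still unconfirmed without clicking through the console, you can list them by topic. This Python sketch assumes a boto3 SNS client is passed in and relies on SNS reporting unconfirmed subscriptions with the literal SubscriptionArn value PendingConfirmation; pending_subscriptions is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def pending_subscriptions(sns_client: Any, topic_arn: str) -> List[Tuple[str, str]]:
    """List (protocol, endpoint) pairs for subscriptions not yet confirmed.

    sns_client is expected to behave like boto3.client("sns", region_name="us-west-2").
    """
    pending: List[Tuple[str, str]] = []
    kwargs: Dict[str, str] = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            # Unconfirmed subscriptions carry this placeholder instead of a real ARN.
            if sub.get("SubscriptionArn") == "PendingConfirmation":
                pending.append((sub["Protocol"], sub["Endpoint"]))
        next_token = page.get("NextToken")
        if not next_token:
            return pending
        kwargs["NextToken"] = next_token  # fetch the next page of results
```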

Observe the AWS Lambda Anomaly Function

  1. In the console, navigate to AWS Lambda.
  2. In the AWS Lambda navigation pane, select Functions.
  3. A Lambda function whose name starts with kinesis-pre-lab-CSEBeconAnomalyResponse appears in the Functions panel.

Image

  4. Click the function name.
  5. On the next page, scroll down to the Function Code section.

Image

  6. Read through the code in the Lambda code editor. Notice the TopicArn value: it is the Topic ARN you recorded in the Email/SMS subscription step. The Lambda function sends its notification message to this topic.

  7. You can also review the code in anomaly_detection_lambda.js, provided with your lab package.
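The lab's function is JavaScript, and its exact message format is defined in anomaly_detection_lambda.js. For comparison only, the publish step could look like the Python sketch below, which assumes a boto3 SNS client is passed in. notify_anomaly, the subject line, and the message wording are hypothetical, not taken from the lab package.

```python
import json
from typing import Any, Dict


def notify_anomaly(sns_client: Any, topic_arn: str, anomaly: Dict[str, Any]) -> str:
    """Publish one anomaly to the SNS topic and return the SNS MessageId."""
    response = sns_client.publish(
        TopicArn=topic_arn,
        # SNS uses Subject as the subject line for email endpoints.
        Subject="Clickstream anomaly detected",
        Message="Anomaly detected: " + json.dumps(anomaly, sort_keys=True),
    )
    return response["MessageId"]
```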

Set up an Analytics Pipeline Application

Make sure you are in the US West (Oregon) Region (us-west-2).

  1. Navigate to the Amazon Kinesis console.
  2. Click Get started and then click Create analytics application.

Image

  3. On the Create application page, fill in the fields as follows:
    a. For Application name, type anomaly-detection-application.
    b. For Description, type a description for your application.
    c. Leave SQL selected (the default).

Image

  4. Click Create application.

  5. On the application page, click Connect streaming data.

Image

  6. Select Choose source, and make the following selections:
    a. For Source, choose Kinesis Firehose delivery stream.
    b. From the Kinesis Firehose delivery stream drop-down, select the stream whose name starts with kinesis-pre-lab-FirehoseDeliveryStream. This is the Firehose delivery stream that CloudFormation created earlier.

Image

  7. In the Record pre-processing with AWS Lambda section, choose Disabled.
  8. In the Access to chosen resources section, select Choose from IAM roles that Kinesis Analytics can assume.
  9. In the IAM role box, search for the role whose name contains CSEKinesisAnalyticsRole.

Image

You have set up the Kinesis Data Analytics application to receive data from the Kinesis Data Firehose delivery stream and to use an IAM role from the pre-lab. However, you need to start sending data to the delivery stream before you click Discover schema in your application.

Navigate to the Amazon Kinesis Data Generator (Amazon KDG) that you set up in the pre-lab, and start sending the Schema Discovery Payload at 1 record per second by clicking Send data. Make sure to select the us-west-2 Region.
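The KDG is the intended tool here. If you prefer a script, a rough equivalent of "send this payload at 1 record per second" is sketched below, assuming a boto3 Firehose client is passed in. send_at_rate is hypothetical, and the stream name in the usage comment is a placeholder because CloudFormation appends a generated suffix.

```python
import json
import time
from typing import Any, Callable, Dict


def send_at_rate(
    firehose_client: Any,
    stream_name: str,
    payload: Dict[str, str],
    records_per_second: float = 1.0,
    duration_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `payload` repeatedly with a fixed pause between records.

    The pause ignores time spent inside put_record, so the real rate is
    slightly below records_per_second. Returns the number of records sent.
    """
    data = (json.dumps(payload) + "\n").encode("utf-8")
    interval = 1.0 / records_per_second
    total = int(records_per_second * duration_seconds)
    for _ in range(total):
        firehose_client.put_record(DeliveryStreamName=stream_name, Record={"Data": data})
        sleep(interval)
    return total


# Usage (requires boto3; replace the placeholder stream name):
#   import boto3
#   firehose = boto3.client("firehose", region_name="us-west-2")
#   send_at_rate(firehose, "kinesis-pre-lab-FirehoseDeliveryStream-XXXX",
#                {"browseraction": "DiscoveryKinesisTest", "site": "yourwebsiteurl.domain.com"})
```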

Image

Image

Now that your Kinesis Data Firehose is receiving data, you can continue configuring the Kinesis Data Analytics Application.

  10. On the application page, click Discover schema. (Make sure your KDG is sending data to your Kinesis Data Firehose delivery stream.)

Image

  11. Click Save and continue. Your Kinesis Data Analytics application is created with an input stream.

Image

Now you can add SQL queries to analyze the data that is being fed into the stream.

  12. In the Real time analytics section, click Go to SQL editor.

  13. Click Yes, start application to start your Kinesis Data Analytics application.

Image

  14. Erase the placeholder text in the SQL editor.
  15. Copy the contents of the anomaly_detection.sql file from your lab package and paste them into the SQL editor.

Image

Image

  16. Click Save and run SQL. The analytics application starts and runs your SQL query. (You can find the SQL query in Appendix A.)

To learn more about the SQL logic, see the Analytics application section in the following blog post: https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/

  17. On the Source data tab, observe the input stream data named SOURCE_SQL_STREAM_001.

Image

If you click the Real-time analytics tab, you will notice multiple in-application streams. You will populate data in these streams later in the lab.
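To build intuition for the kind of aggregation these in-application streams perform, here is a stdlib-only Python illustration of tumbling-window counting. It is not the lab's SQL and does not reproduce its anomaly scoring; it only counts Click and Impression records per window and divides them. The 10-second window and the function name ratio_per_window are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def ratio_per_window(
    events: Iterable[Tuple[float, str]], window_seconds: int = 10
) -> List[Tuple[int, float]]:
    """Return (window_start, clicks / impressions) for each tumbling window.

    events holds (timestamp_in_seconds, browseraction) pairs. Windows with
    no Impression records are skipped to avoid dividing by zero.
    """
    counts: Dict[int, Counter] = {}
    for ts, action in events:
        # Each event falls into exactly one non-overlapping window.
        start = int(ts // window_seconds) * window_seconds
        counts.setdefault(start, Counter())[action] += 1
    ratios = []
    for start in sorted(counts):
        window = counts[start]
        if window["Impression"]:
            ratios.append((start, window["Click"] / window["Impression"]))
    return ratios
```

For example, with one Impression sender and one Click sender at 1 record per second, plus three extra Click senders covering one full 10-second window, that window's ratio rises from 1.0 to 4.0, which is the kind of spike the test traffic later in this lab is designed to produce.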

Image

Connect Lambda as a Destination to the Analytics Pipeline

Now that the logic to detect anomalies is in the Kinesis Data Analytics application, you must connect it to a destination (an AWS Lambda function) that notifies you when there is an anomaly.

  1. Click the Destination tab and click Connect to a Destination.
  2. For Destination, choose AWS Lambda function.
  3. In the Deliver records to AWS Lambda section, make the following selections:
    a. For Lambda function, choose CSEBeconAnomalyResponse.
    b. For Lambda function version, choose $LATEST.
  4. In the In-application stream section, make the following selections:
    a. Select Choose an existing in-application stream.
    b. For In-application stream name, choose DESTINATION_SQL_STREAM.
    c. For Output format, choose JSON.
  5. In the Access to chosen resources section, make the following selections:
    a. Select Choose from IAM roles that Kinesis Analytics can assume.
    b. For IAM role, choose pre-lab-CSEKinesisAnalyticsRole-RANDOMSTRING.

Your parameters should look like the following image. This configuration allows your Kinesis Data Analytics Application to invoke your anomaly Lambda function and notify you when any anomalies are detected.
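With JSON output, Kinesis Data Analytics invokes the function with a batch of base64-encoded records and expects a result for each recordId. The Python sketch below illustrates that contract based on the Kinesis Data Analytics (SQL) documentation for Lambda destinations; verify the field names there before relying on it. handle_kda_output and the injected notify callback are hypothetical, and the lab's own function is JavaScript.

```python
import base64
import json
from typing import Any, Callable, Dict, List


def handle_kda_output(
    event: Dict[str, Any], notify: Callable[[Dict[str, Any]], None]
) -> Dict[str, List[Dict[str, str]]]:
    """Decode each delivered row, pass it to `notify`, and report per-record results."""
    results = []
    for record in event.get("records", []):
        try:
            row = json.loads(base64.b64decode(record["data"]))
            notify(row)  # for example, publish the row to the SNS topic
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Broad on purpose: decode errors and notification errors both mean
            # this record was not delivered.
            results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
    return {"records": results}
```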

Image

Image

Now that all of the components are in place, you can test your analytics application. For this part of the lab, you will use your Kinesis Data Generator in five separate browser windows to replicate clickstream traffic. Each window sends its own stream of payloads to your Kinesis Data Firehose delivery stream.

  1. Open your KDG in five separate browser windows and sign in as the same user. Note: Make sure to select the us-west-2 Region. Do not accept the default Region.
  2. In one browser window, start sending the Impression payload at a rate of 1 record per second (keep this running).
  3. In another browser window, start sending the Click payload at a rate of 1 record per second (keep this running).
  4. In the remaining three browser windows, start sending the Click payload at a rate of 1 record per second for a period of 20 seconds.
    If you did not receive an anomaly email, open another KDG window and send additional concurrent Click payloads. Do not let these extra senders run for more than 10 to 20 seconds at a time; the more anomalies you create, the more emails AWS Lambda may send you.
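If you would rather script the short Click bursts from step 4 than juggle browser windows, the Python sketch below runs several 1-record-per-second Click senders concurrently for a fixed 20 seconds, matching the time limit above. It assumes the Impression and baseline Click senders are already running (for example, in the KDG) and that a boto3 Firehose client is passed in; boto3 low-level clients can be shared across threads. burst_clicks and _send_for are hypothetical helpers.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

CLICK = {"browseraction": "Click", "site": "yourwebsiteurl.domain.com"}


def _send_for(client: Any, stream: str, payload: Dict[str, str], seconds: int,
              sleep: Callable[[float], None]) -> int:
    """One simulated KDG window: 1 record per second for `seconds` seconds."""
    data = (json.dumps(payload) + "\n").encode("utf-8")
    for _ in range(seconds):
        client.put_record(DeliveryStreamName=stream, Record={"Data": data})
        sleep(1.0)
    return seconds


def burst_clicks(client: Any, stream: str, senders: int = 3, seconds: int = 20,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `senders` concurrent Click senders and return the total records sent."""
    with ThreadPoolExecutor(max_workers=senders) as pool:
        futures = [pool.submit(_send_for, client, stream, CLICK, seconds, sleep)
                   for _ in range(senders)]
        return sum(f.result() for f in futures)
```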

You can monitor anomalies on the Real-time analytics tab in the DESTINATION_SQL_STREAM table. If an anomaly is detected, it displays in that table.

Image

Make sure to click other streams and review the data.

Once your application detects an anomaly, you will receive an email and a text message at the email address and phone number you specified.

Email Snapshot:

Image

SMS Snapshot:

Image

After you have completed the lab, click Actions > Stop Application to stop your application and avoid a flood of SMS and email messages.
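The same stop can be scripted against the Kinesis Data Analytics (SQL) API. This Python sketch assumes a boto3 "kinesisanalytics" client is passed in and treats the READY status as "created but not running"; stop_and_wait and its polling parameters are hypothetical.

```python
import time
from typing import Any, Callable


def stop_and_wait(
    kda_client: Any,
    app_name: str = "anomaly-detection-application",
    poll_seconds: float = 10.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a stop, then poll until the application reports READY."""
    kda_client.stop_application(ApplicationName=app_name)
    waited = 0.0
    while True:
        detail = kda_client.describe_application(ApplicationName=app_name)["ApplicationDetail"]
        status = detail["ApplicationStatus"]
        if status == "READY":  # the application exists but is no longer running
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{app_name} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires boto3):
#   import boto3
#   stop_and_wait(boto3.client("kinesisanalytics", region_name="us-west-2"))
```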

Image

Environment Cleanup

To avoid ongoing costs, delete the environment you created during this lab. Before you follow the steps below, empty the S3 buckets from the console, because CloudFormation cannot delete a bucket that still contains objects:

  1. In your AWS account, navigate to the CloudFormation console.

  2. On the CloudFormation console, select the stack you created during the pre-lab.

  3. Click the Actions drop-down and select Delete stack, as shown in the screenshot below.

    Image

  4. Because you created the Kinesis Data Analytics application manually, you need to delete it yourself: select your analytics application, click the Actions drop-down, and select Delete application.

Image

  5. Go to Amazon Cognito and delete the user pool that was created for the lab.
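The bucket-emptying and stack-deletion steps can also be scripted. The Python sketch below assumes boto3 S3 and CloudFormation clients are passed in and that you supply the RawBucketName and ProcessedBucketName values from the stack outputs. It deletes only current objects (a versioned bucket would also need its old versions removed), and it does not delete the Kinesis Data Analytics application or the Cognito user pool, which steps 4 and 5 cover. empty_bucket and cleanup are hypothetical helpers.

```python
from typing import Any, Iterable


def empty_bucket(s3_client: Any, bucket: str) -> int:
    """Delete every current object in `bucket`; return the number deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1,000 keys, which is also
            # the per-request limit of delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
            deleted += len(keys)
    return deleted


def cleanup(s3_client: Any, cfn_client: Any, buckets: Iterable[str],
            stack_name: str = "kinesis-pre-lab") -> int:
    """Empty the lab buckets, then request deletion of the CloudFormation stack."""
    total = sum(empty_bucket(s3_client, b) for b in buckets)
    cfn_client.delete_stack(StackName=stack_name)
    return total
```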