# Customer Review Processing Pipeline with Firehose Data Streams
1. Reviews are submitted to a Firehose delivery stream.
2. Firehose invokes a Lambda function to transform the data.
3. The Lambda function calls Comprehend to assess sentiment and adds the result to the JSON record.
4. Firehose collects the transformed records and stores them in S3.
5. With this pipeline in place, Firehose can ingest streaming data continuously, process it, and deliver it to S3.
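
Steps 2 to 4 can be sketched as a Firehose transformation Lambda. This is a minimal sketch, not the function deployed for this lab: the output field names `sentiment` and `sentiment_score`, the helper names `add_sentiment` and `transform`, and the fixed `LanguageCode='en'` are assumptions made for illustration. The Comprehend client is passed in as a parameter so that the logic can be exercised without AWS access.

```python
import base64
import json


def add_sentiment(review, comprehend):
    """Return a copy of the review dict with Comprehend sentiment fields added."""
    text = review.get('reviews.text') or ' '
    result = comprehend.detect_sentiment(Text=text, LanguageCode='en')
    enriched = dict(review)
    enriched['sentiment'] = result['Sentiment']            # assumed field name
    enriched['sentiment_score'] = result['SentimentScore']  # assumed field name
    return enriched


def transform(event, comprehend):
    """Process one Firehose transformation event and build the response."""
    output = []
    for record in event['records']:
        try:
            # Firehose passes each record to Lambda as base64-encoded bytes.
            review = json.loads(base64.b64decode(record['data']))
            enriched = add_sentiment(review, comprehend)
            # Keep the trailing newline so each record lands on its own line in S3.
            data = (json.dumps(enriched) + '\n').encode('utf-8')
            output.append({
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(data).decode('utf-8'),
            })
        except Exception:
            # Hand the record back unchanged and mark it as failed.
            output.append({
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data'],
            })
    return {'records': output}


def lambda_handler(event, context):
    # Imported here so the helpers above can be tested without boto3 installed.
    import boto3
    return transform(event, boto3.client('comprehend'))
```

Every `recordId` received must appear in the response with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, which is why the failure branch still returns the record.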

Objective: Use the Comprehend service to detect the sentiment of each review

Input: Customer Review
Output: Overall sentiment and scores for Positive, Negative, Neutral, Mixed  

https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html  
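
To make the output concrete, the sketch below flattens a `detect_sentiment` response into the overall sentiment and the four scores. The helper names, the flattened key names, and the numbers in `sample_response` are illustrative assumptions, not values from this lab. The 5000-byte cap in `truncate_utf8` is assumed from the documented request size limit and should be checked against the Comprehend documentation linked above.

```python
MAX_BYTES = 5000  # assumed request size limit for a single document


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Trim text so that its UTF-8 encoding fits within max_bytes."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # errors='ignore' drops a multi-byte character cut in half at the boundary.
    return encoded[:max_bytes].decode('utf-8', errors='ignore')


def flatten_sentiment(response):
    """Pick the overall sentiment and the four scores out of a response dict."""
    scores = response['SentimentScore']
    return {
        'sentiment': response['Sentiment'],
        'positive': scores['Positive'],
        'negative': scores['Negative'],
        'neutral': scores['Neutral'],
        'mixed': scores['Mixed'],
    }


# Illustrative response shape; the numbers are made up.
sample_response = {
    'Sentiment': 'NEGATIVE',
    'SentimentScore': {'Positive': 0.05, 'Negative': 0.80, 'Neutral': 0.10, 'Mixed': 0.05},
}
print(flatten_sentiment(sample_response))
```

With a real client, the response would come from `comprehend.detect_sentiment(Text=truncate_utf8(review_text), LanguageCode='en')`.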

Dataset and Problem Description: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products/

File: s3://amazon-reviews-multilingual/Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re



### Customer Reviews of Amazon Products

### Load the Review Data

In [2]:
bucket='amazon-reviews-multilingual' # Or whatever you called your bucket
data_key = 'Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv' # Where the file is within your bucket
data_location = 's3://{}/{}'.format(bucket, data_key)
df = pd.read_csv(data_location)

In [3]:
print('Rows: {0}, Columns: {1}'.format(df.shape[0],df.shape[1]))

Rows: 5000, Columns: 24


In [4]:
df.head()

Unnamed: 0,id,dateAdded,dateUpdated,name,asins,brand,categories,primaryCategories,imageURLs,keys,...,reviews.dateSeen,reviews.doRecommend,reviews.id,reviews.numHelpful,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.username,sourceURLs
0,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,"2018-05-27T00:00:00Z,2017-09-18T00:00:00Z,2017...",False,,0,3,http://reviews.bestbuy.com/3545/5442403/review...,I thought it would be as big as small paper bu...,Too small,llyyue,https://www.newegg.com/Product/Product.aspx%25...
1,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,"2018-05-27T00:00:00Z,2017-07-07T00:00:00Z,2017...",True,,0,5,http://reviews.bestbuy.com/3545/5442403/review...,This kindle is light and easy to use especiall...,Great light reader. Easy to use at the beach,Charmi,https://www.newegg.com/Product/Product.aspx%25...
2,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,2018-05-27T00:00:00Z,True,,0,4,https://reviews.bestbuy.com/3545/5442403/revie...,Didnt know how much i'd use a kindle so went f...,Great for the price,johnnyjojojo,https://www.newegg.com/Product/Product.aspx%25...
3,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,2018-10-09T00:00:00Z,True,177283626.0,3,5,https://redsky.target.com/groot-domain-api/v1/...,I am 100 happy with my purchase. I caught it o...,A Great Buy,Kdperry,https://www.newegg.com/Product/Product.aspx%25...
4,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,2018-05-27T00:00:00Z,True,,0,5,https://reviews.bestbuy.com/3545/5442403/revie...,Solid entry level Kindle. Great for kids. Gift...,Solid entry-level Kindle. Great for kids,Johnnyblack,https://www.newegg.com/Product/Product.aspx%25...


In [5]:
df['reviews.title'] = df['reviews.title'].fillna(' ')
df['reviews.text'] = df['reviews.text'].fillna(' ')

In [6]:
# Pattern that matches embedded newlines, tabs, and carriage returns;
# it is used below to replace them with a single space
pattern = r'[\n\t\r]+'
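
As a quick check of what this pattern does, the sketch below applies it to a made-up review string. `clean_text` is a hypothetical helper name, not part of the notebook.

```python
import re

pattern = r'[\n\t\r]+'


def clean_text(text):
    """Replace each run of newlines, tabs, and carriage returns with one space."""
    return re.sub(pattern, ' ', text)


sample = "Great\n\tproduct,\r\nfast delivery"
print(clean_text(sample))  # Great product, fast delivery
```

Because the character class is followed by `+`, a run such as `\r\n` collapses to a single space rather than two.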

### Submit review to Firehose Stream

In [13]:
import boto3

In [20]:
session = boto3.Session(region_name='us-east-2')

In [21]:
client_firehose = session.client('firehose')

In [22]:
kinesis_delivery_stream_name = 'CustomerReviewStream'

### Warning: Sending all 5000 reviews would incur a cost of about USD 3.5 for sentiment analysis.
### In this lab, we send only the first 10 reviews.

In [23]:
# Push reviews to Firehose.
# For reading the JSON that Firehose writes to S3, see:
# https://stackoverflow.com/questions/34468319/reading-the-data-written-to-s3-by-amazon-kinesis-firehose-stream/49417680#49417680

for i in range(10):
    # Replace any newline, tab, or carriage return in the JSON payload with a space.
    # Append a newline so that Firehose writes each JSON record on its own line;
    # without it, Firehose places all records on a single line in S3.
    payload = re.sub(pattern, ' ', df.iloc[i].to_json()) + "\n"

    print(payload)
    response = client_firehose.put_record(
        DeliveryStreamName=kinesis_delivery_stream_name,
        Record={
            # Send the payload as explicit UTF-8 bytes
            'Data': payload.encode('utf-8')
        }
    )

    # An HTTP status code of 200 means Firehose accepted the record
    print('Response', response['ResponseMetadata']['HTTPStatusCode'])
    print()

{"id":"AVqVGZNvQMlgsOJE6eUY","dateAdded":"2017-03-03T16:56:05Z","dateUpdated":"2018-10-25T16:36:31Z","name":"Amazon Kindle E-Reader 6\" Wifi (8th Generation, 2016)","asins":"B00ZV9PXP2","brand":"Amazon","categories":"Computers,Electronics Features,Tablets,Electronics,iPad & Tablets,Kindle E-readers,iPad Accessories,Used:Tablets,E-Readers,E-Readers & Accessories,Computers\/Tablets & Networking,Used:Computers Accessories,iPads Tablets,All Tablets,Tablets & E-readers,Computers & Tablets,Amazon,Tablets & eBook Readers","primaryCategories":"Electronics","imageURLs":"https:\/\/pisces.bbystatic.com\/image2\/BestBuy_US\/images\/products\/5442\/5442403_sd.jpg,https:\/\/c1.neweggimages.com\/NeweggImage\/ProductImage\/A3FA_1_201801081360871160.jpg,https:\/\/i.ebayimg.com\/thumbs\/images\/g\/N4IAAOSwoA9Zgkso\/s-l96.jpg,http:\/\/i.ebayimg.com\/thumbs\/images\/g\/dpkAAOSwfpVZFKHy\/s-l200.jpg,http:\/\/i.ebayimg.com\/thumbs\/images\/g\/PJgAAOSwiDFYPE8h\/s-l200.jpg,http:\/\/i.ebayimg.com\/thumbs\/image
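
The loop above makes one `put_record` call per review. For larger volumes, the records could instead be grouped with `put_record_batch`. The following is a minimal sketch under assumptions: `chunks` and `send_batches` are hypothetical helpers, and the default batch size of 500 is taken to be the per-call record limit, which should be confirmed in the Firehose documentation. Records reported as failed are only counted here, not retried.

```python
def chunks(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batches(client, stream_name, payloads, batch_size=500):
    """Send payload strings in batches and return the number of failed records."""
    failed = 0
    for batch in chunks(payloads, batch_size):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{'Data': p.encode('utf-8')} for p in batch],
        )
        # FailedPutCount reports how many records in this batch were rejected.
        failed += response['FailedPutCount']
    return failed
```

In this notebook it could be called as `send_batches(client_firehose, kinesis_delivery_stream_name, payloads)`, where `payloads` is a list of the newline-terminated JSON strings built as in the loop above.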

### Verify the CloudWatch logs for the Lambda function to confirm that the reviews were processed, and check the S3 bucket for the delivered data
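
Once Firehose has delivered an object, it can be read back as newline-delimited JSON. The sketch below assumes that each record was written with a trailing newline, as in the loop above. `parse_delivered_records` and `read_delivered_object` are hypothetical helpers, and the bucket and key must be replaced with the values configured for the delivery stream.

```python
import json


def parse_delivered_records(body):
    """Parse newline-delimited JSON text, skipping blank lines."""
    return [json.loads(line) for line in body.split('\n') if line.strip()]


def read_delivered_object(s3_client, bucket, key):
    """Download one delivered object and parse its records."""
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return parse_delivered_records(obj['Body'].read().decode('utf-8'))
```

For example, `read_delivered_object(session.client('s3'), bucket_name, object_key)` would return one dict per review, with `bucket_name` and `object_key` standing in for the delivery destination.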