### Amazon Comprehend Demo

***
Copyright [2019] Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
***

### Prerequisites

#### Identity and Access Management

The user or role that runs this notebook must have AWS Identity and Access Management (IAM) permissions to call Amazon Comprehend. AWS provides managed policies that help you get started quickly. For this example, attach the following managed policy to your user or role:

    ComprehendReadOnly

For production implementations, follow AWS IAM best practices; they are out of scope for this workshop.
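
If you want something narrower than the managed policy, one option is an inline policy that allows only the single Comprehend action this notebook calls. The sketch below builds such a policy document as a Python dictionary; it assumes you only use `DetectSentiment` with the default model (no custom endpoint), and it does not show how to attach the policy to a user or role.

```python
import json

# Hypothetical least-privilege policy: allows only the one API call this notebook makes.
# Assumption: DetectSentiment with the default model is not scoped to a specific resource,
# so Resource is "*".
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["comprehend:DetectSentiment"],
            "Resource": "*",
        }
    ],
}

# Print the JSON document you would paste into the IAM console or pass to an IAM API
print(json.dumps(policy, indent=2))
```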

### Install tqdm for a progress bar

In [None]:
!pip install tqdm

In [None]:
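import boto3
import gzip
import json
from pprint import pprint
from tqdm.notebook import tqdm as tqdm_nb  # notebook progress bar (tqdm_notebook is deprecated)

# Amazon Comprehend client; change region_name to a region where Comprehend is available to you
comprehend = boto3.client('comprehend', region_name='us-east-2')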
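
`detect_sentiment` returns a dictionary whose `Sentiment` key holds the overall label and whose `SentimentScore` key holds one confidence score per label (`Positive`, `Negative`, `Neutral`, `Mixed`). The sketch below shows how one response can be flattened into a single row; `flatten_sentiment` is a hypothetical helper, not part of boto3, and the sample response is made up (a real one also carries `ResponseMetadata`).

```python
def flatten_sentiment(text, response):
    """Flatten a DetectSentiment response into one row (hypothetical helper)."""
    scores = response['SentimentScore']
    return {
        'ReviewText': text,
        'Sentiment': response['Sentiment'],
        'P(positive)': scores['Positive'],
        'P(negative)': scores['Negative'],
        'P(neutral)': scores['Neutral'],
        'P(mixed)': scores['Mixed'],
    }

# Made-up response with the same shape as a real one (ResponseMetadata omitted)
sample_response = {
    'Sentiment': 'POSITIVE',
    'SentimentScore': {'Positive': 0.97, 'Negative': 0.01, 'Neutral': 0.015, 'Mixed': 0.005},
}
row = flatten_sentiment('Great show, loved it.', sample_response)
print(row['Sentiment'], row['P(positive)'])  # POSITIVE 0.97
```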

In [None]:
# Download the Amazon Instant Video review dataset (gzip-compressed, one JSON object per line)

!curl -O http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Amazon_Instant_Video_5.json.gz
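
Because the file stores one JSON object per line, it can be read lazily, one review at a time, without decompressing it to disk. A minimal sketch, assuming that layout; `read_reviews` is a hypothetical helper, and the demo writes its own tiny made-up file instead of using the downloaded dataset.

```python
import gzip
import json
import os
import tempfile
from itertools import islice

def read_reviews(path, limit):
    """Yield at most `limit` parsed reviews from a gzipped JSON-lines file (hypothetical helper)."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in islice(f, limit):
            yield json.loads(line)

# Build a small made-up sample file with the same one-object-per-line layout
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample.json.gz')
with gzip.open(sample_path, 'wt', encoding='utf-8') as f:
    for text in ['Loved it.', 'Not for me.', 'It was fine.']:
        f.write(json.dumps({'reviewText': text}) + '\n')

reviews = list(read_reviews(sample_path, limit=2))
print([r['reviewText'] for r in reviews])  # ['Loved it.', 'Not for me.']
```

In the notebook, `read_reviews('reviews_Amazon_Instant_Video_5.json.gz', 50)` would yield the same first 50 reviews the next cell processes.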

### Run sentiment analysis against each review

In [None]:
%%time

from itertools import islice

filename = 'reviews_Amazon_Instant_Video_5.json.gz'
max_reviews = 50  # only process the first 50 entries
max_bytes = 5000  # DetectSentiment accepts at most 5000 bytes of UTF-8 text per call
out = {}

with gzip.open(filename, 'rt', encoding='utf-8') as f:
    for i, line in enumerate(tqdm_nb(islice(f, max_reviews), total=max_reviews), start=1):
        review = json.loads(line)
        reviewText = review.get('reviewText', '')
        print('%d) %s ....' % (i, reviewText[:100]))
        # Skip empty reviews and reviews over the limit (which counts UTF-8 bytes, not characters)
        if not reviewText.strip() or len(reviewText.encode('utf-8')) > max_bytes:
            print('Skipping: %s ....' % reviewText[:100])
            continue
        # get sentiment for reviewText
        textSentiment = comprehend.detect_sentiment(Text=reviewText, LanguageCode='en')
        scores = textSentiment['SentimentScore']
        out[i] = {'ReviewText': reviewText,
                  'Sentiment': textSentiment['Sentiment'],
                  'P(positive)': scores['Positive'],
                  'P(negative)': scores['Negative'],
                  'P(neutral)': scores['Neutral'],
                  'P(mixed)': scores['Mixed']}
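
The loop above skips reviews over the 5000-byte limit. If you would rather analyze the beginning of a long review than drop it, one option (an assumption, not part of the original workflow) is to cut the UTF-8 encoding at the limit and discard any multi-byte character the cut splits. `truncate_utf8` is a hypothetical helper.

```python
def truncate_utf8(text, max_bytes=5000):
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes (hypothetical helper)."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # errors='ignore' drops the trailing partial multi-byte sequence left by the byte cut
    return encoded[:max_bytes].decode('utf-8', errors='ignore')

long_text = 'é' * 3000  # 'é' is 2 bytes in UTF-8, so this is 6000 bytes
short = truncate_utf8(long_text)
print(len(short), len(short.encode('utf-8')))  # 2500 5000
```

Truncation can change the measured sentiment of a review whose tone shifts near the end, so skipping, as the loop does, is the more conservative choice.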



### Print the analyzed sentiment for each review

In [None]:
pprint(out)
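
Beyond printing every entry, a count per sentiment label gives a quick overview of the sample. A minimal sketch, assuming each value in `out` is a dictionary with a `Sentiment` field as built above; `count_sentiments` is a hypothetical helper and the sample entries are made up so the snippet runs on its own.

```python
from collections import Counter

def count_sentiments(results):
    """Count how many reviews received each overall sentiment label (hypothetical helper)."""
    return Counter(entry['Sentiment'] for entry in results.values())

# Made-up entries with the same shape as `out`; in the notebook, call count_sentiments(out)
sample_out = {
    1: {'ReviewText': 'Loved it.', 'Sentiment': 'POSITIVE'},
    2: {'ReviewText': 'Not for me.', 'Sentiment': 'NEGATIVE'},
    3: {'ReviewText': 'Great cast.', 'Sentiment': 'POSITIVE'},
}
print(count_sentiments(sample_out).most_common())  # [('POSITIVE', 2), ('NEGATIVE', 1)]
```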

### Write the above output to a file (optional)

In [None]:
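import json

# Save the results as JSON; note that JSON turns the integer keys of `out` into strings
with open('sentiment-analysis.txt', 'w', encoding='utf-8') as file:
    json.dump(out, file, indent=2)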
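
If a spreadsheet-friendly file is more convenient than JSON, `csv.DictWriter` from the standard library can write one row per review. A sketch, assuming every entry has the keys built in the sentiment loop; the helper `write_sentiment_csv`, the `ReviewId` column, and the `sentiment-analysis.csv` filename are hypothetical choices, and the sample data is made up.

```python
import csv
import os
import tempfile

FIELDS = ['ReviewId', 'ReviewText', 'Sentiment',
          'P(positive)', 'P(negative)', 'P(neutral)', 'P(mixed)']

def write_sentiment_csv(results, path):
    """Write one CSV row per analyzed review (hypothetical helper)."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for review_id, entry in sorted(results.items()):
            writer.writerow({'ReviewId': review_id, **entry})

# Made-up entry with the same shape as `out`; in the notebook, pass `out` instead
sample_out = {1: {'ReviewText': 'Loved it, "5 stars"', 'Sentiment': 'POSITIVE',
                  'P(positive)': 0.97, 'P(negative)': 0.01,
                  'P(neutral)': 0.015, 'P(mixed)': 0.005}}
csv_path = os.path.join(tempfile.mkdtemp(), 'sentiment-analysis.csv')
write_sentiment_csv(sample_out, csv_path)

# Read the file back to check the round trip
with open(csv_path, newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))
print(rows[0]['Sentiment'], rows[0]['ReviewText'])  # POSITIVE Loved it, "5 stars"
```

`DictWriter` quotes fields that contain commas or quotes, so review text survives the round trip; the scores come back as strings and need `float()` if you reload them.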

In [None]:
!ls