### Amazon Comprehend Demo

***
Copyright [2019] Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
***

### Prerequisites:

#### Identity and Access Management

The user or role that executes the commands must have permissions in AWS Identity and Access Management (IAM) to perform those actions. AWS provides a set of managed policies that help you get started quickly. For our example, you should apply the following managed policy to your user or role:

    ComprehendReadOnly

We recommend that you follow AWS IAM best practices for production implementations; these are out of scope for this workshop.
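As an illustration of what a narrower, production-style permission set could look like, the sketch below builds a policy document that allows only the sentiment calls. The variable name `sentiment_only_policy` is hypothetical, and the policy is an assumption about what this demo needs rather than an official AWS recommendation; review it against your own requirements before using it.

```python
import json

# Hypothetical least-privilege policy for this demo: it allows only the
# sentiment actions instead of everything ComprehendReadOnly grants.
sentiment_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectSentiment",
                "comprehend:BatchDetectSentiment",
            ],
            # Assumption: these actions are not scoped to a specific resource.
            "Resource": "*",
        }
    ],
}

# Print the JSON document so it can be pasted into the IAM console.
print(json.dumps(sentiment_only_policy, indent=2))
```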

### Install tqdm for progress bar

In [1]:
!pip install tqdm


In [2]:
import boto3
import gzip
import json
import csv
from pprint import pprint
from tqdm import tqdm_notebook as tqdm_nb
from time import sleep

comprehend = boto3.client('comprehend', region_name='eu-west-1')

In [3]:
# download review dataset

!curl -O http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Amazon_Instant_Video_5.json.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9294k  100 9294k    0     0   470k      0  0:00:19  0:00:19 --:--:--  462k


### Run sentiment analysis against each review

In [4]:
%%time

filename = 'reviews_Amazon_Instant_Video_5.json.gz'
f = gzip.open(filename, 'r') 
out = {}
totalx = 50
x = totalx # only process the first 50 entries 
for line in tqdm_nb(f, total=x): 
    
    x -= 1
    if x == -1:
        break
    review = json.loads(line)
    print(str(totalx-x)+') '+str(review['reviewText'][:100]) + " ....")
    # get sentiment for reviewText
    reviewText = review['reviewText']
    if len(reviewText.encode('utf-8')) > 5000: # only supporting up to 5000 bytes of UTF-8 text, skipping entry
        print('Skipping: %s ....' % reviewText[:100])
    else:
        textSentiment = comprehend.detect_sentiment(
                            Text=reviewText,
                            LanguageCode='en'
                            )

        out[totalx-x] = {'ReviewText':review['reviewText'],'Sentiment':textSentiment['Sentiment'],
                                             'P(positive)':textSentiment['SentimentScore']['Positive'],
                                             'P(negative)':textSentiment['SentimentScore']['Negative'],
                                             'P(neutral)':textSentiment['SentimentScore']['Neutral'],
                                             'P(mixed)':textSentiment['SentimentScore']['Mixed'] }    




1) I had big expectations because I love English TV, in particular Investigative and detective stuff bu ....
2) I highly recommend this series. It is a must for anyone who is yearning to watch "grown up" televisi ....
3) This one is a real snoozer. Don't believe anything you read or hear, it's awful. I had no idea what  ....
4) Mysteries are interesting.  The tension between Robson and the tall blond is good but not always bel ....
5) This show always is excellent, as far as british crime or mystery showsgoes this is one of the best  ....
6) I discovered this series quite by accident. Having watched and appreciated Masterpiece Contemporary: ....
7) It beats watching a blank screen. However, I just don't seem to be in tune with to comedy of today. ....
8) There are many episodes in this series, so I pretty-much just skip through them to try to find a des ....
9) This is the best of the best comedy Stand-up. The fact that I was able to just watch continuously on ....
10) Not bad.  Didn't
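The loop above sends one request per review. Comprehend also offers `batch_detect_sentiment`, which accepts a list of documents per call. The sketch below groups reviews into batches and skips texts that are empty or exceed the 5000-byte limit mentioned in the loop. The helper names (`utf8_size`, `chunked`, `batch_sentiment`) are hypothetical, and the batch size of 25 is an assumption about the service limit that you should check against the current Comprehend quotas.

```python
def utf8_size(text):
    """Return the size of text in bytes once encoded as UTF-8."""
    return len(text.encode('utf-8'))


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(client, texts, max_bytes=5000, batch_size=25):
    """Return {position in texts: sentiment label} for texts within the limit."""
    # Keep the original position so results can be matched back to the reviews.
    eligible = [(i, t) for i, t in enumerate(texts)
                if 0 < utf8_size(t) <= max_bytes]
    results = {}
    for batch in chunked(eligible, batch_size):
        response = client.batch_detect_sentiment(
            TextList=[t for _, t in batch],
            LanguageCode='en',
        )
        # Each result carries the index of the document within this batch.
        for item in response['ResultList']:
            original_index = batch[item['Index']][0]
            results[original_index] = item['Sentiment']
    return results


# Usage with the client created earlier (requires AWS credentials):
# labels = batch_sentiment(comprehend, ['Great show', 'Really boring'])
```

Documents that fail inside a batch are reported in the response's `ErrorList`; this sketch ignores them, so a production version would want to log or retry those entries.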

### Print the analyzed sentiment for each review

In [5]:
pprint(out)

{1: {'P(mixed)': 0.1006036177277565,
     'P(negative)': 0.8911099433898926,
     'P(neutral)': 0.0028986716642975807,
     'P(positive)': 0.0053877560421824455,
     'ReviewText': 'I had big expectations because I love English TV, in '
                   'particular Investigative and detective stuff but this guy '
                   "is really boring. It didn't appeal to me at all.",
     'Sentiment': 'NEGATIVE'},
 2: {'P(mixed)': 4.820343747269362e-05,
     'P(negative)': 2.043618223979138e-06,
     'P(neutral)': 0.00017538119573146105,
     'P(positive)': 0.9997743964195251,
     'ReviewText': 'I highly recommend this series. It is a must for anyone '
                   'who is yearning to watch "grown up" television. Complex '
                   'characters and plots to keep one totally involved. Thank '
                   'you Amazin Prime.',
     'Sentiment': 'POSITIVE'},
 3: {'P(mixed)': 0.0440569669008255,
     'P(negative)': 0.8037886619567871,
     'P(neutral)': 0.09414444118
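Reading fifty dictionary entries by eye is tedious, so a quick tally of the labels can help. The sketch below counts how often each sentiment occurs. `sentiment_counts` is a hypothetical helper, and `sample_out` is a small stand-in with the same shape as `out`, using the labels of the first three reviews shown above with rounded scores.

```python
from collections import Counter

# Stand-in for `out`: labels and rounded scores of the first three reviews.
sample_out = {
    1: {'Sentiment': 'NEGATIVE', 'P(negative)': 0.89},
    2: {'Sentiment': 'POSITIVE', 'P(positive)': 0.99},
    3: {'Sentiment': 'NEGATIVE', 'P(negative)': 0.80},
}


def sentiment_counts(results):
    """Count how many entries carry each sentiment label."""
    return Counter(entry['Sentiment'] for entry in results.values())


counts = sentiment_counts(sample_out)
for label, n in counts.most_common():
    print(label, n)
```

In the notebook, calling `sentiment_counts(out)` would give the same kind of tally for all processed reviews.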

### Write the above output to a file (optional)

In [6]:
import json

# Note: json.dumps converts the integer keys of `out` to strings.
with open('sentiment-analysis.txt', 'w') as out_file:
    out_file.write(json.dumps(out))
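The notebook imports `csv`, and the directory listing includes a `sentiment.csv`, but the cell that produced that file is not shown. The following is a minimal sketch of how the results could be written as CSV, assuming each entry of `out` has exactly the keys built in the sentiment loop. The function name `write_sentiment_csv` and the `Id` column are hypothetical.

```python
import csv


def write_sentiment_csv(results, path):
    """Write one row per review, ordered by review number."""
    fieldnames = ['Id', 'ReviewText', 'Sentiment',
                  'P(positive)', 'P(negative)', 'P(neutral)', 'P(mixed)']
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for review_id, entry in sorted(results.items()):
            writer.writerow({'Id': review_id, **entry})


# Usage in the notebook:
# write_sentiment_csv(out, 'sentiment.csv')
```

A tabular file like this opens directly in a spreadsheet, whereas the JSON dump keeps the nested structure for later processing in Python.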

In [7]:
!ls

AmazonML_Demo.ipynb                    large.wav
Comprehend_Demo.ipynb                  mocha.wav
FashionMNIST_MXNet_Demo.ipynb          no.wav
Lex_CreateBot_Demo.ipynb               reviews_Amazon_Instant_Video_5.json.gz
Lex_Demo.ipynb                         sentiment-analysis.txt
PollyPSE.xml                           sentiment.csv
Polly_Demo.ipynb                       small.wav
Rekognition_Demo.ipynb                 special.wav
