### In this notebook, please complete the following tasks:

1. Load the data from the uploaded file
2. Make sure the SageMaker role has the right permission to access the Amazon Comprehend API
3. Use the Amazon Comprehend API to detect sentiment by writing code in the TODO lines
4. Analyze the sentiment to provide recommendations and answers by writing code in the TODO lines

### Load Data
The first step is to load the data that's uploaded to this SageMaker instance. Once the data is loaded, the reviews are split into a list, one review per line, to be consumed by the Comprehend API.

In [1]:
# Use a context manager so the file is closed after reading.
with open('reviews.txt') as f:
    reviews = f.read().split("\n")
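
A slightly more defensive version of the load step, as a sketch. It assumes one review per line, as above, and additionally drops blank lines, since the Comprehend API expects non-empty text. The helper name `load_reviews` is hypothetical and not part of the original exercise.

```python
from pathlib import Path


def load_reviews(path):
    """Return the non-blank lines of a text file, one review per line."""
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() handles both \n and \r\n line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With this helper, the cell above would become `reviews = load_reviews('reviews.txt')`.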

### Perform Sentiment Analysis
Once you have the right permission attached, it's time to analyze these reviews!

#### Amazon Comprehend
To analyze the reviews, we will be using [Amazon Comprehend](https://aws.amazon.com/comprehend/). Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. 

The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.

#### TODO: Get the right permission
To connect to Amazon Comprehend from SageMaker, the first thing to do is make sure that the right permission is attached to the SageMaker role. Read here for more details: https://docs.aws.amazon.com/comprehend/latest/dg/access-control-managing-permissions.html
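
As an illustration only, a minimal identity-based policy that allows the single call used in this notebook could look like the following. The action name `comprehend:DetectSentiment` matches the API operation used later; the wildcard resource and the idea of attaching this as an inline policy to the SageMaker role are assumptions, and an AWS managed policy such as ComprehendReadOnly may fit your account better.

```python
import json

# Sketch of a least-privilege policy for this notebook (assumption: only
# DetectSentiment is needed; widen the action list if you call other APIs).
comprehend_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["comprehend:DetectSentiment"],
            "Resource": "*",
        }
    ],
}

# Print the JSON document that could be pasted into the IAM console.
print(json.dumps(comprehend_policy, indent=2))
```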

#### Connect to Comprehend

After you have the right permission attached, run the following command to connect to Comprehend. The code uses the AWS SDK for Python (Boto3): it first imports boto3 and then creates a client for Amazon Comprehend in a specified AWS Region.
The AWS Region must be the same Region as the notebook.

In [2]:
import boto3
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
comprehend

<botocore.client.Comprehend at 0x7f88839f5320>

#### Run the detect_sentiment function
Using the [Amazon Comprehend API](https://docs.aws.amazon.com/comprehend/latest/dg/get-started-api-sentiment.html), you can now analyze the reviews. You will extract the sentiment of each review using the detect_sentiment method in the following command. There are 3 lines of Python code that you need to write to complete this section; please remove the comments and put in the required lines of code.

In [3]:
# Note: the IAM role used by this notebook needs Amazon Comprehend access,
# otherwise the detect_sentiment call below fails with a permission error.
all_result = []
for review in reviews:
    result = comprehend.detect_sentiment(Text=review, LanguageCode='en')
    sentiment = result['Sentiment']
    scores = result['SentimentScore']  # dict with Positive, Negative, Neutral, Mixed
    all_result.append([review, sentiment, scores['Positive'], scores['Negative'],
                       scores['Neutral'], scores['Mixed']])
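
The loop above makes one API call per review. As a possible alternative, Comprehend also offers `batch_detect_sentiment`, which per the AWS documentation accepts up to 25 documents per call. The sketch below is not part of the original exercise: the helper name `detect_sentiment_batched` and its `client` parameter are hypothetical, and the response fields (`ResultList`, `Index`, `Sentiment`, `SentimentScore`) are used as I understand the Boto3 response shape.

```python
def detect_sentiment_batched(client, texts, language_code="en", batch_size=25):
    """Return rows of [text, sentiment, positive, negative, neutral, mixed].

    `client` is expected to behave like boto3.client('comprehend').
    Documents that only appear in the response's ErrorList are skipped.
    """
    rows = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.batch_detect_sentiment(
            TextList=chunk, LanguageCode=language_code
        )
        # Each result carries the index of its document within the chunk.
        for item in sorted(response["ResultList"], key=lambda r: r["Index"]):
            scores = item["SentimentScore"]
            rows.append([
                chunk[item["Index"]],
                item["Sentiment"],
                scores["Positive"],
                scores["Negative"],
                scores["Neutral"],
                scores["Mixed"],
            ])
    return rows
```

Under these assumptions, `all_result = detect_sentiment_batched(comprehend, reviews)` would produce rows in the same layout as the loop above.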

### Analyze the sentiments using Pandas!
Once you have detected the sentiments for the reviews, it's time to analyze them. In order to do so, we will be using a popular package called pandas. Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Read more [here](https://pandas.pydata.org/getting_started.html).

Run the following command to import pandas and put the sentiments into a DataFrame called result_df. A [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.



In [4]:
import pandas as pd
result_df = pd.DataFrame(all_result,
                         columns=['review', 'sentiment', 'positive_score',
                                  'negative_score', 'neutral_score', 'mixed_score'])
result_df

| | review | sentiment | positive_score | negative_score | neutral_score | mixed_score |
|---|---|---|---|---|---|---|
| 0 | This toy is awesome! I have a 1yrd German Shep... | POSITIVE | 0.999706 | 3.5e-05 | 0.000242 | 1.8e-05 |
| 1 | My germany sheppard destoried this toy for chr... | NEGATIVE | 0.000429 | 0.999116 | 0.000449 | 6e-06 |
| 2 | I have Chihuahua and this toy was good, but I ... | MIXED | 0.001415 | 0.003245 | 0.000197 | 0.995142 |
| 3 | This dog toy is too small!! | NEGATIVE | 0.000947 | 0.99663 | 0.001079 | 0.001344 |
| 4 | Best dog toy!!! I got 3 of them and now my dog... | POSITIVE | 0.999217 | 0.000611 | 0.000111 | 6e-05 |
| 5 | My dog has been playing with this toy for 5 ye... | POSITIVE | 0.445268 | 0.184193 | 0.27096 | 0.09958 |
| 6 | This product is my dog's favourite toy!!Not on... | POSITIVE | 0.999856 | 2.7e-05 | 7.3e-05 | 4.4e-05 |
| 7 | Got this for christmas for my dog | NEUTRAL | 0.051873 | 0.000873 | 0.947236 | 1.9e-05 |
| 8 | Don't order it from here. I found a different ... | NEGATIVE | 0.003533 | 0.969972 | 0.012201 | 0.014294 |
| 9 | I have a 30lb dog that could chew through anyt... | POSITIVE | 0.998972 | 0.00023 | 0.000565 | 0.000233 |
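
A quick sanity check on this layout, as a sketch: the four scores of each review behave like probabilities, so they should add up to roughly 1. The example rebuilds a tiny frame from three rows of the output above (review text shortened); the names `sample_df` and `score_totals` are illustrative only.

```python
import pandas as pd

columns = ['review', 'sentiment', 'positive_score', 'negative_score',
           'neutral_score', 'mixed_score']
# Three rows copied from the output above (review text shortened).
sample_rows = [
    ['This toy is awesome!', 'POSITIVE', 0.999706, 3.5e-05, 0.000242, 1.8e-05],
    ['This dog toy is too small!!', 'NEGATIVE', 0.000947, 0.99663, 0.001079, 0.001344],
    ['Got this for christmas for my dog', 'NEUTRAL', 0.051873, 0.000873, 0.947236, 1.9e-05],
]
sample_df = pd.DataFrame(sample_rows, columns=columns)

# Sum the four score columns row by row.
score_cols = [c for c in sample_df.columns if c.endswith('_score')]
score_totals = sample_df[score_cols].sum(axis=1)
print(score_totals.round(3).tolist())
```

The same two lines (`score_cols` and `score_totals`) can be run against result_df to spot rows with unexpected scores.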


Now that you have the results in a DataFrame, it's time to find the sentiment that has the most reviews, along with its count. To do so, we will use the [value_counts](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html) method to count the reviews for each sentiment, and then print the top sentiment and its count.

Can you figure out how to use this method? Hint: a pandas Series is the data structure for a single column of a DataFrame. For example, result_df['review'] is a Series.

In [6]:
sentiments_count = result_df['sentiment'].value_counts()
print("The sentiment that has the most reviews is {}, the total count for that sentiment is {}"
      .format(sentiments_count.index[0], sentiments_count.iloc[0]))

The sentiment that has the most reviews is POSITIVE, the total count for that sentiment is 27


In [40]:
sentiments_count

POSITIVE    27
NEGATIVE    11
MIXED        5
NEUTRAL      1
Name: sentiment, dtype: int64
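
value_counts returns the counts sorted from largest to smallest, which is why index[0] and iloc[0] give the top sentiment and its count. As a small sketch on a stand-in Series with the same counts as above, `idxmax()` and `max()` express the same lookup without relying on the sort order, and `normalize=True` returns shares instead of counts. The variable names here are illustrative.

```python
import pandas as pd

# Rebuild a sentiment column with the same counts as shown above.
sentiments = pd.Series(
    ['POSITIVE'] * 27 + ['NEGATIVE'] * 11 + ['MIXED'] * 5 + ['NEUTRAL']
)

counts = sentiments.value_counts()                 # sorted by count, largest first
top_sentiment = counts.idxmax()                    # label with the largest count
top_count = int(counts.max())                      # that label's count
shares = sentiments.value_counts(normalize=True)   # fractions instead of counts

print(top_sentiment, top_count, round(shares[top_sentiment], 3))
```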

Once you have the most popular sentiment, it is time to find the review with the highest score for that sentiment! To do so, we will use the [sort_values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) method, which sorts the DataFrame by a specified column. Please follow the TODO instructions and modify the code.

In [33]:
# TODO: find the name of the column that contains the score of the most popular sentiment and assign it to variable sentiment_score_col_name
# The most popular sentiment is sentiments_count.index[0] (POSITIVE), so its score column is 'positive_score'.
sentiment_score_col_name = sentiments_count.index[0].lower() + '_score'

# Sort by that column, highest score first, to inspect the top-scoring reviews.
result_df.sort_values(by=sentiment_score_col_name, ascending=False)

| | review | sentiment | positive_score | negative_score | neutral_score | mixed_score |
|---|---|---|---|---|---|---|
| 6 | This product is my dog's favourite toy!!Not on... | POSITIVE | 0.999856 | 2.7e-05 | 7.3e-05 | 4.4e-05 |
| 40 | Very happy with this purchase | POSITIVE | 0.999831 | 4.5e-05 | 9.9e-05 | 2.5e-05 |
| 37 | My dog really loves these toys!! they have so ... | POSITIVE | 0.999831 | 2.5e-05 | 9.8e-05 | 4.6e-05 |
| 27 | My dog loved this toy!! he was especially prot... | POSITIVE | 0.99982 | 2.7e-05 | 0.00013 | 2.4e-05 |
| 23 | Great price!! and very cute | POSITIVE | 0.999788 | 2.6e-05 | 0.00013 | 5.6e-05 |
| 43 | My little lab dog loves this beaver!! very cut... | POSITIVE | 0.999782 | 2.9e-05 | 0.000166 | 2.3e-05 |
| 34 | Nice toy to carry around | POSITIVE | 0.99978 | 4.6e-05 | 0.000144 | 3e-05 |
| 25 | My pomerian had a few of these cute toys!! he ... | POSITIVE | 0.999707 | 2.4e-05 | 0.000233 | 3.7e-05 |
| 41 | Best dog toys I have ever had. It's a good toy... | POSITIVE | 0.999706 | 5.1e-05 | 0.000222 | 2.1e-05 |
| 0 | This toy is awesome! I have a 1yrd German Shep... | POSITIVE | 0.999706 | 3.5e-05 | 0.000242 | 1.8e-05 |


In [55]:
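# TODO: Modify the following line of code to display the review with the HIGHEST score instead of the LOWEST score (which is the default)
# Restrict to the most popular sentiment first, then sort by its score column with ascending=False (highest first).
top_sentiment = sentiments_count.index[0]
top_reviews = result_df[result_df['sentiment'] == top_sentiment]
single_review = top_reviews.sort_values(by=top_sentiment.lower() + '_score', ascending=False)['review'].iloc[0]
print('The review that has the highest score from the most popular sentiment: {}'.format(single_review))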

The review that has the highest score from the most popular sentiment: This product is my dog's favourite toy!!Not only is the price affordable, it's also a very durable product with very good design. Love it!
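
Sorting the whole frame works, but only the top row is needed. A possible alternative, sketched here on a small stand-in frame with values taken from the tables above, is to filter to the sentiment of interest and use `idxmax()` to get the index label of the highest score. The frame `df` and the other names are illustrative, not part of the exercise.

```python
import pandas as pd

# Small stand-in for result_df, using values from the tables above.
df = pd.DataFrame({
    'review': ['This toy is awesome!', 'This dog toy is too small!!',
               'Very happy with this purchase', 'Got this for christmas for my dog'],
    'sentiment': ['POSITIVE', 'NEGATIVE', 'POSITIVE', 'NEUTRAL'],
    'positive_score': [0.999706, 0.000947, 0.999831, 0.051873],
})

positive = df[df['sentiment'] == 'POSITIVE']
best_label = positive['positive_score'].idxmax()   # index label of the highest score
best_review = df.loc[best_label, 'review']

# The same review, found by sorting with the highest score first.
via_sort = positive.sort_values('positive_score', ascending=False)['review'].iloc[0]
print(best_review == via_sort, best_review)
```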


### Get Answer - Run the following cell and put the answer into the answer field

In [56]:
print('The answer to this challenge is: {}{}'.format(sentiments_count.iloc[0], single_review))

The answer to this challenge is: 27This product is my dog's favourite toy!!Not only is the price affordable, it's also a very durable product with very good design. Love it!


In [48]:
result_df['positive_score'].sort_values(ascending=False)

6     0.999856
40    0.999831
37    0.999831
27    0.999820
23    0.999788
43    0.999782
34    0.999780
25    0.999707
41    0.999706
0     0.999706
17    0.999680
20    0.999673
28    0.999576
18    0.999575
13    0.999298
4     0.999217
36    0.999111
14    0.999024
9     0.998972
29    0.998945
21    0.998811
15    0.991746
10    0.909561
39    0.719479
32    0.662992
42    0.530159
5     0.445268
16    0.248718
22    0.072572
12    0.065440
7     0.051873
26    0.046015
38    0.030923
11    0.021075
30    0.006398
19    0.005688
8     0.003533
24    0.003133
33    0.002831
35    0.001930
2     0.001415
3     0.000947
1     0.000429
31    0.000057
Name: positive_score, dtype: float64

In [49]:
result_df.iloc[6]

review            This product is my dog's favourite toy!!Not on...
sentiment                                                  POSITIVE
positive_score                                             0.999856
negative_score                                          2.71233e-05
neutral_score                                           7.29193e-05
mixed_score                                             4.41631e-05
Name: 6, dtype: object
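
Note that `result_df.iloc[6]` selects by position. It matches index label 6 here only because result_df still has its default 0, 1, 2, ... index. On a sorted frame, position and label differ, which the following sketch shows using three scores from the output above (the frame `scores` is a stand-in, not the full data).

```python
import pandas as pd

scores = pd.DataFrame(
    {'positive_score': [0.999706, 0.000429, 0.999856]},
    index=[0, 1, 6],   # index labels kept from the original frame
)
by_score = scores.sort_values('positive_score', ascending=False)

print(by_score.iloc[0])   # first row by position: label 6, the highest score
print(by_score.loc[0])    # row whose index label is 0, wherever it sits
```

When looking up a review by the label reported in a sorted output, `.loc` is therefore the safer choice.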