## Part 1: Existing Machine Learning Services

<a href="https://colab.research.google.com/github/peckjon/hosting-ml-as-microservice/blob/master/part1/score_reviews_via_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Obtain labelled reviews

In order to test any of the sentiment analysis APIs, we need a labelled dataset of reviews and their sentiment polarity. We'll use NLTK to download the movie_reviews corpus.

In [1]:
from nltk import download

download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\tvanderm\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


True

### Load the data

The files in movie_reviews have already been divided into two sets: positive ('pos') and negative ('neg'), so we can load the raw text of the reviews into two lists, one for each polarity.

In [2]:
from nltk.corpus import movie_reviews

# load the raw text of each review into one list per label

reviews_pos = []
for fileid in movie_reviews.fileids('pos'):
    review = movie_reviews.raw(fileid)
    reviews_pos.append(review)

reviews_neg = []
for fileid in movie_reviews.fileids('neg'):
    review = movie_reviews.raw(fileid)
    reviews_neg.append(review)
    
print(len(reviews_pos))
print(len(reviews_neg))

1000
1000


In [3]:
# see what one review looks like
print(len(reviews_pos[4]))
reviews_pos[4]

3898


'moviemaking is a lot like being the general manager of an nfl team in the post-salary cap era -- you\'ve got to know how to allocate your resources . \nevery dollar spent on a free-agent defensive tackle is one less dollar than you can spend on linebackers or safeties or centers . \nin the nfl , this leads to teams like the detroit lions , who boast a superstar running back with a huge contract , but can only field five guys named herb to block for him . \nin the movies , you end up with films like " spawn " , with a huge special-effects budget but not enough money to hire any recognizable actors . \njackie chan is the barry sanders of moviemaking . \nhe spins and darts across the screen like sanders cutting back through the defensive line . \nwatching jackie in operation condor as he drives his motorcycle through the crowded streets of madrid , fleeing an armada of pursuers in identical black compact cars , is reminiscent of sanders running for daylight with the chicago bears in hot 

### Connect to the scoring API

Fill in this function with code that connects to one of these APIs, and uses it to score a single review:

* [Amazon Comprehend: Detect Sentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_DetectSentiment.html)
* [Google Natural Language: Analyzing Sentiment](https://cloud.google.com/natural-language/docs/analyzing-sentiment)
* [Azure Cognitive Services: Sentiment Analysis](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis)
* [Algorithmia: Sentiment Analysis](https://algorithmia.com/algorithms/nlp/SentimentAnalysis)

Your function must return either 'pos' or 'neg', so you'll need to make some decisions about how to map the results of the API call to one of these values. For example, Amazon Comprehend can return "NEUTRAL" or "MIXED" for the Sentiment; if this happens, you may wish to inspect the numeric values under the SentimentScore to see whether it leans toward positive or negative.
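One way to do that mapping is sketched below for a Comprehend-style response. The helper name `label_from_response` and the sample scores are illustrative assumptions; only the `Sentiment` and `SentimentScore` keys are taken from the Comprehend response format.

```python
def label_from_response(response):
    """Map a Comprehend-style DetectSentiment response to 'pos' or 'neg'.

    Hypothetical helper for illustration; it is not part of boto3.
    """
    sentiment = response["Sentiment"].upper()
    if sentiment == "POSITIVE":
        return "pos"
    if sentiment == "NEGATIVE":
        return "neg"
    # NEUTRAL or MIXED: fall back to whichever of the two scores is larger.
    scores = response["SentimentScore"]
    return "pos" if scores["Positive"] >= scores["Negative"] else "neg"


# Made-up response, for illustration only.
sample = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {"Mixed": 0.01, "Negative": 0.20, "Neutral": 0.70, "Positive": 0.09},
}
print(label_from_response(sample))  # neg
```

Ties are resolved toward 'pos' here, which is an arbitrary choice; a different rule may suit your data better.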


In [5]:
import boto3
import json
# progress bar
from ipypb import track, irange

In [8]:
# test - from https://docs.aws.amazon.com/code-samples/latest/catalog/python-comprehend-DetectSentiment.py.html
comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
text = "It is raining today in Seattle"

print('Calling DetectSentiment')
print(json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4))
print('End of DetectSentiment\n')

Calling DetectSentiment
{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "content-length": "161",
            "content-type": "application/x-amz-json-1.1",
            "date": "Sun, 27 Sep 2020 04:19:54 GMT",
            "x-amzn-requestid": "920f7dc2-d8a3-4e7b-806b-65ae670012e0"
        },
        "HTTPStatusCode": 200,
        "RequestId": "920f7dc2-d8a3-4e7b-806b-65ae670012e0",
        "RetryAttempts": 0
    },
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Mixed": 0.00021913634554948658,
        "Negative": 0.162128284573555,
        "Neutral": 0.7376415133476257,
        "Positive": 0.10001111775636673
    }
}
End of DetectSentiment



In [6]:
# keep hearing about other students hitting a 5K limit on Amazon Comprehend. 
# rather than truncate the review I'm going to first try removing stopwords 

In [7]:
from nltk.corpus import stopwords
from nltk.tokenize import NLTKWordTokenizer

In [82]:
stops = stopwords.words("english")
# some of the stopwords might be useful, like the negative ones
neg_words = {'mightn', "mightn't", 'mustn', "mustn't",  'needn',  "needn't",  'shan',  "shan't",  'shouldn',  "shouldn't",  'wasn',  "wasn't",  'weren',  "weren't",  'won',  "won't",  'wouldn',  "wouldn't",  'ain',  'aren',  "aren't",  'couldn',  "couldn't",  'didn',  "didn't",  'doesn',  "doesn't",  'hadn',  "hadn't",  'hasn',  "hasn't",  'haven',  "haven't",  'isn',  "isn't"}
stops = set(stopwords.words("english")+ ['``']) - neg_words 
print(stops)

{'both', 'below', 'that', 'under', 'while', 'herself', 's', 'was', 'himself', 'off', 'whom', 'does', 'he', 'down', "you've", 'yourself', 'with', 'yours', 'above', 'm', 've', 'few', 'other', 'here', 'she', 'until', 'who', 'or', 'so', 'had', 'any', 'further', "should've", 'those', 'nor', 'are', 'is', 'between', 'own', 'our', 're', 'his', 'from', 'my', "you'd", 'their', 'has', 'most', 'y', 'same', 'where', 'ourselves', 'against', 'no', 'did', 'what', 'be', 'because', 'me', 'its', "you're", 'him', 'a', 'just', 'we', 'itself', "it's", 'each', 'you', 'then', 'there', 'd', 'too', 'have', 'as', 'such', 'do', 'should', 'it', 'during', "that'll", 'her', 'hers', 'they', 'up', 'myself', 'which', 'your', 'more', 'in', 'can', 'over', 'not', 'don', 'when', 'at', 'after', 'am', 'into', "you'll", 'been', 'but', 'of', 'theirs', 'some', 't', "don't", 'how', 'll', '``', 'if', 'themselves', 'once', 'out', 'on', 'all', 'having', 'again', 'an', 'were', 'ma', 'the', 'by', 'why', 'this', 'ours', 'will', 'and',
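To see why the negation words are worth keeping, here is a small sketch with a toy stopword list and plain whitespace splitting. Both are simplifying assumptions; the notebook itself uses NLTK's list and tokenizer.

```python
# Toy stopword list for illustration; NLTK's English list is much longer.
toy_stops = {"this", "is", "a", "the", "isn't"}
negations = {"isn't"}


def strip_stops(text, stops):
    """Drop every whitespace-separated token that appears in `stops`."""
    return " ".join(tok for tok in text.split() if tok not in stops)


sentence = "this isn't a good movie"
print(strip_stops(sentence, toy_stops))              # good movie
print(strip_stops(sentence, toy_stops - negations))  # isn't good movie
```

With the negation removed, the sentence reads as praise. Note that NLTK's Treebank-style word tokenizers typically split contractions (`isn't` becomes `is` and `n't`), so whether a whole-word entry such as `"isn't"` ever matches a token depends on the tokenizer in use.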

In [84]:
tokenizer = NLTKWordTokenizer()
# test it
#tokenizer.tokenize(review)

In [85]:
# try removing stop words to get past Comprehend's 5K character limit
# does tokenizing help?
print("Raw review string length:", len(review))
print("Review string length without stops:", len(remove_stopwords(review, stops)))
# works a little, maybe

Raw review string length: 3255
Review string length without stops: 2334
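The lengths above are counted in characters, while the fallback code later in this notebook checks the limit against `len(review.encode('utf-8'))`, that is, bytes. A minimal sketch of measuring and truncating by bytes, assuming a 5000-byte budget; `utf8_len` and `truncate_utf8` are hypothetical helper names:

```python
def utf8_len(text):
    """Length of `text` in bytes once UTF-8 encoded."""
    return len(text.encode("utf-8"))


def truncate_utf8(text, max_bytes=5000):
    """Cut `text` to at most `max_bytes` UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "é" * 3000  # 3000 characters, but 6000 bytes in UTF-8
print(len(sample), utf8_len(sample))    # 3000 6000
print(utf8_len(truncate_utf8(sample)))  # 5000
```

For the mostly ASCII movie reviews the two counts will be nearly identical, so this matters mainly for text with accented or non-Latin characters.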


In [86]:
def remove_stopwords(review, stopwords=()):
    # tokenize the review, then drop empty tokens and any token found in stopwords
    tokenizer = NLTKWordTokenizer()
    return " ".join([x.strip() for x in tokenizer.tokenize(review) if x.strip() != "" and x.strip() not in stopwords])

remove_stopwords(review, stops)

"two party guys bob heads haddaway 's dance hit love ? getting trouble nightclub nightclub . 's barely enough sustain three-minute _saturday_night_live_ skit , _snl_ producer lorne michaels , _clueless_ creator amy heckerling , paramount pictures saw something late night television institution 's recurring roxbury guys sketch would presumably make good feature . emphasis word presumably . _a_night_at_the_roxbury_ takes already-thin concept tediously stretches far beyond breaking point -- viewers ' patience levels . first five minutes _roxbury_ play much like one original roxbury guys skits . love ? blaring soundtrack , brotherly duo doug steve butabi ( chris kattan ferrell ) bob heads , scope hotties clubs , bump select violent pelvic thrusts . one crucial difference , however -- guys speak . little fact used justification film 's existence , butabis ' newfound capacity speech would open whole new set doors characters . doors opened director john fortenberry screenwriters steve koren ,

In [87]:
def remove_stopwords(review, stopwords=()):
    """Tokenize review and drop stopwords, empty tokens and '``' quote tokens."""
    tokenizer = NLTKWordTokenizer()
    return " ".join([x for x in tokenizer.tokenize(review) if x.strip() not in ("", "``") and x not in stopwords])


def score_review(review, aws_region='us-east-1', stopwords=(), force_remove_stops=False):
    """Calls the Comprehend service for a review. Returns either 'positive' or 'negative'.
    --review ==> (str) movie review text.
    --aws_region ==> (str) preferred host region for AWS, e.g., 'us-east-1'
    --stopwords ==> collection of words to remove from the string, if necessary
                    (removal is attempted automatically if Comprehend rejects the review as too long)
    --force_remove_stops ==> If True, will remove stopwords from review string even if below
                        Comprehend's 5K limit
    """
    if force_remove_stops:
        review = remove_stopwords(review, stopwords)

    comprehend = boto3.client(service_name='comprehend', region_name=aws_region)
    try:
        response = comprehend.detect_sentiment(Text=review, LanguageCode='en')
    except comprehend.exceptions.TextSizeLimitExceededException:
        # modeled service exceptions are exposed on the client object itself;
        # this one means the review exceeded Comprehend's 5K limit
        review = remove_stopwords(review, stopwords)
        # might still be too long; the check is on the UTF-8 encoded size
        encoded = review.encode('utf-8')
        if len(encoded) > 5000:
            # truncate by bytes, dropping any partial trailing character
            review = encoded[:4999].decode('utf-8', errors='ignore')
        response = comprehend.detect_sentiment(Text=review, LanguageCode='en')
    sentiment_final = response.get("Sentiment").lower()
    sentiment_scores = response.get("SentimentScore")
    # AWS Comprehend returns positive, negative, neutral and mixed
    # but we only want pos/neg, so if final is Neutral or Mixed,
    # instead use the larger of the Positive and Negative scores
    if sentiment_final not in ["positive", "negative"]:
        pos_neg = {key.lower(): value for key, value in sentiment_scores.items() if key.lower() in ["positive", "negative"]}
        sentiment_final = max(pos_neg, key=pos_neg.get)

    return sentiment_final

In [88]:
# testing
tests = [reviews_pos[110]]
for test in tests:
    # score the selected test review, not whatever `review` was left over from an earlier cell
    print("Using function without forced stopword removal:", score_review(test, stopwords=stops, force_remove_stops=False))
    print("Using function with forced stopword removal:", score_review(test, stopwords=stops, force_remove_stops=True))
    
    
# comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
# print("Without tokenization:")
# print(comprehend.detect_sentiment(Text=review, LanguageCode='en'))
# print("With tokenization:")
# print(comprehend.detect_sentiment(Text=remove_stopwords(review, stopwords=stops), LanguageCode='en'))
# print("****************")
# print("Using function without forced stopword removal:", score_review(review, stopwords=stops, force_remove_stops=False))
# print("Using function with forced stopword removal:", score_review(review, stopwords=stops, force_remove_stops=True))

Using function without forced stopword removal: positive
Using function with forced stopword removal: negative


### Score each review

Now we can use the function you defined to score each of the reviews.

In [89]:
results_pos = []
for review in track(reviews_pos):
    result = score_review(review, stopwords=stops, force_remove_stops=True)
    results_pos.append(result)

results_neg = []
for review in track(reviews_neg):
    result = score_review(review, stopwords=stops, force_remove_stops=True)
    results_neg.append(result)


### Calculate accuracy

For each of our known positive reviews, we can count the number which our function scored as 'positive', and use this to calculate the % accuracy. We repeat this for negative reviews, and also for overall accuracy.

In [90]:
correct_pos = results_pos.count('positive')
accuracy_pos = float(correct_pos) / len(results_pos)
correct_neg = results_neg.count('negative')
accuracy_neg = float(correct_neg) / len(results_neg)
correct_all = correct_pos + correct_neg
accuracy_all = float(correct_all) / (len(results_pos)+len(results_neg))

print('Positive reviews: {}% correct'.format(accuracy_pos*100))
print('Negative reviews: {}% correct'.format(accuracy_neg*100))
print('Overall accuracy: {}% correct'.format(accuracy_all*100))

Positive reviews: 74.6% correct
Negative reviews: 58.099999999999994% correct
Overall accuracy: 66.35% correct
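The long decimals above come from binary floating-point arithmetic. Below is a sketch of a small helper that computes the same kind of accuracy figure and formats it to two decimal places; `accuracy_pct` is a hypothetical name and the label list is made up.

```python
def accuracy_pct(predicted, expected_label):
    """Percentage of `predicted` labels equal to `expected_label`."""
    if not predicted:
        raise ValueError("no predictions to score")
    return 100.0 * predicted.count(expected_label) / len(predicted)


# Made-up predictions: 3 of 4 match the expected label.
demo = ["negative", "negative", "positive", "negative"]
print("Negative reviews: {:.2f}% correct".format(accuracy_pct(demo, "negative")))
# Negative reviews: 75.00% correct
```

Formatting only at print time keeps the stored value at full precision for any later arithmetic.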


### Without forced stop word removal:
 - Positive reviews: 75.5% correct
 - Negative reviews: 56.8% correct
 - Overall accuracy: 66.15% correct

### With forced stop word removal:
 - Positive reviews: 74.6% correct
 - Negative reviews: 58.1% correct
 - Overall accuracy: 66.35% correct

So forcing stop word removal gives only a very slight overall gain (66.15% to 66.35%): accuracy on negative reviews rises by 1.3 points, while accuracy on positive reviews falls by 0.9 points.