#Predicting sentiment from product reviews

#Fire up GraphLab Create

In [1]:
import graphlab
from functools import partial
from collections import OrderedDict

#Read some product review data

Loading reviews for a set of baby products. 

In [2]:
products = graphlab.SFrame('amazon_baby.gl/')

[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1494013301.log


#Let's explore this data together

Data includes the product name, the review text and the rating of the review. 

In [3]:
products.head()

name,review,rating
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0


#Build the word count vector for each review

In [4]:
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
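
As a rough illustration of what this step produces, the sketch below builds a similar dictionary in plain Python. It assumes lowercasing and whitespace splitting, which is consistent with tokens such as 'parents!!' and 'journal.' in the output, but it is not GraphLab's implementation. `count_words_sketch` is a hypothetical helper.

```python
from collections import Counter

def count_words_sketch(text):
    # Hypothetical stand-in for the word-count step:
    # lowercase the text, split on whitespace, count each token.
    return dict(Counter(text.lower().split()))

example = count_words_sketch("Love it and love the price.")
print(example)
```

Because punctuation stays attached to tokens under this assumption, 'price.' and 'price' would be counted as different words.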

In [5]:
products.head()

name,review,rating,word_count
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ..."
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ..."
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ..."
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ..."
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ..."
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ..."
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ..."
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ..."


In [6]:
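def positive_count(word, word_count):
    # Return how often `word` occurs in a review's word_count dict (0 if absent).
    # Despite the name, this is also used for the negative words in selected_words.
    if word in word_count:
        return word_count[word]
    else:
        return 0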

In [7]:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

In [8]:
# Use .apply() to build a new column holding the count of 'awesome' in each review
products['awesome'] = products['word_count'].apply(partial(positive_count, 'awesome'))
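
The role of `partial` here can be shown without GraphLab. In this minimal sketch, `count_of` is a hypothetical helper with the same logic as `positive_count`, and a plain list of dictionaries stands in for the `word_count` column.

```python
from functools import partial

def count_of(word, word_count):
    # Same logic as positive_count: 0 when the word is absent
    return word_count.get(word, 0)

# partial fixes the first argument, leaving a one-argument function
# that can be applied to each row's dictionary.
count_awesome = partial(count_of, 'awesome')

rows = [{'awesome': 2, 'toy': 1}, {'gate': 1}]  # made-up rows
awesome_counts = [count_awesome(wc) for wc in rows]
print(awesome_counts)  # [2, 0]
```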

In [9]:
awesome_products = products[products['awesome'] > 0]

In [10]:
awesome_products

name,review,rating,word_count,awesome
Pedal Farm Tractor,I bought this for my son when he was 3 years old. ...,5.0,"{'and': 3, 'this': 2, 'on': 1, 'old': 1, ...",1
Thomas &amp; Friends - 3 Piece Dinnerware Set- ...,This dining ware set is awesome for the Thomas ...,5.0,"{'and': 1, 'thomas': 1, 'set': 1, 'awesome': 1, ...",1
Munchkin Mozart Magic Cube ...,The Mozart magic cube is an AWESOME toy for my ...,5.0,"{'and': 3, 'magic': 1, 'old': 2, 'classic': 1, ...",1
Munchkin Mozart Magic Cube ...,Our daughter got this toy for her first birthday. ...,4.0,"{'and': 4, 'grandfather!': 1, 'a ...",1
Evenflo Top of Stair Gate,"Awesome gate. It is sturdy, so its very well ...",5.0,"{'and': 1, 'childproof': 1, 'isse.': 1, 'botto ...",1
Animal Planet's Big Tub of Dinosaurs ...,This is an awesome complete set of dinos ...,5.0,"{'this': 1, 'we': 1, 'set': 1, 'price.': 1, ...",1
"Graco TotBloc Pack 'N Play with Carry Bag, ...",I ordered this because my 23 lb 30 inch long 7 ...,1.0,"{'month': 1, 'sleep': 1, 'still': 1, 'its': 1, ...",1
Philips AVENT Isis On The Go Set ...,I based my decision to purchase this pump based ...,2.0,"{'all': 3, ""don't"": 1, 'catch': 1, 'ounces': 1, ...",1
Philips AVENT Isis On The Go Set ...,I loved this pump. I had my first child this past ...,5.0,"{'feed': 2, 'this': 4, 'inexpensive.': 1, 'on': ...",1
The First Years Nature Sensations Lullaby Pl ...,Our son had problems falling asleep and ...,5.0,"{'all': 1, 'just': 1, 'saver': 1, 'toy.and' ...",1


In [11]:
# selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
for word in selected_words:
    products[word] = products['word_count'].apply(partial(positive_count, word))

In [None]:
products.head()

name,review,rating,word_count,awesome,great,fantastic
Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3.0,"{'and': 5, '6': 1, 'stink': 1, 'because' ...",0,0,0
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...",0,0,0
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",0,0,0
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'ingenious': 1, 'and': 3, 'love': 2, ...",0,0,0
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...",0,1,0
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...",0,1,0
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'when': 1, ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'help': 1, 'give': 1, 'is': 1, ' ...",0,0,0
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'journal.': 1, 'nanny': 1, 'standarad': 1, ...",0,0,0

amazing,love,horrible,bad,terrible,awful,wow,hate
0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,2,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0


In [None]:
# selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
sum_data = dict()
for word in selected_words:
    sum_data[word] = products[word].sum()

In [None]:
# word_count sum sorted by value
sorted_sum_data = OrderedDict(sorted(sum_data.items(), key=lambda t: t[1]))
sorted_sum_data
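
The sort above is ascending, so the least frequent word comes first. The sketch below shows both orders on made-up totals (not results from this dataset); passing `reverse=True` puts the most frequent word first.

```python
from collections import OrderedDict

# Made-up totals for illustration only; not results from the dataset
toy_totals = {'great': 30, 'hate': 4, 'love': 25}

ascending = OrderedDict(sorted(toy_totals.items(), key=lambda t: t[1]))
descending = OrderedDict(sorted(toy_totals.items(), key=lambda t: t[1], reverse=True))

print(list(ascending.keys()))   # ['hate', 'love', 'great']
print(list(descending.keys()))  # ['great', 'love', 'hate']
```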

In [None]:
graphlab.canvas.set_target('ipynb')

In [None]:
products['rating'].show(view='Categorical')

##Define what's a positive and a negative sentiment

We will ignore all reviews with rating = 3, since they tend to have a neutral sentiment.  Reviews with a rating of 4 or higher will be considered positive, while the ones with rating of 2 or lower will have a negative sentiment.

In [None]:
#ignore all 3* reviews
products = products[products['rating'] != 3]

In [None]:
# positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >= 4
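
The labeling rule can be checked on a handful of made-up ratings. This is a plain-Python sketch of the same two steps (drop 3-star reviews, then mark 4 stars and above as positive), not GraphLab code.

```python
ratings = [5.0, 3.0, 1.0, 4.0, 2.0]  # made-up ratings

# Step 1: drop neutral (3-star) reviews
kept = [r for r in ratings if r != 3]

# Step 2: label 4 stars and above as positive (1), the rest as negative (0)
labels = [int(r >= 4) for r in kept]
print(labels)  # [1, 0, 1, 0]
```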

In [None]:
products.head()

## Split the data into test and training

In [None]:
train_data, test_data = products.random_split(.8, seed=0)
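
One common way to implement a seeded random split is to send each row to the first set with the given probability. The sketch below does that in plain Python. `random_split_sketch` is a hypothetical helper, and this is not necessarily how `random_split` works internally.

```python
import random

def random_split_sketch(rows, fraction, seed):
    # Each row goes to the first list with probability `fraction`;
    # a fixed seed makes the split reproducible.
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(100))
train, test = random_split_sketch(rows, 0.8, seed=0)
print(len(train) + len(test))  # 100
```

With this approach the first set holds roughly, not exactly, 80% of the rows.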

In [None]:
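# Baseline: accuracy of always predicting the majority class on test_data.
# This computes the fraction of positive reviews, which equals that accuracy
# when positive is the majority class.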
float(len(test_data[test_data['sentiment'] > 0])) / len(test_data)
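
The expression above gives the majority-class accuracy only if positive reviews are the majority. A small sketch that handles either case, using a hypothetical helper `majority_class_accuracy` and made-up labels:

```python
def majority_class_accuracy(labels):
    # Accuracy of always predicting the most common class:
    # the larger of the positive and negative fractions.
    positive_fraction = sum(1 for y in labels if y > 0) / float(len(labels))
    return max(positive_fraction, 1.0 - positive_fraction)

toy_labels = [1, 1, 1, 0]  # made-up labels
print(majority_class_accuracy(toy_labels))  # 0.75
```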

#Build a selected words sentiment classifier

In [None]:
selected_words_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=selected_words,
                                                     validation_set=test_data)

In [None]:
# Sort the learned coefficients by the 'value' column using .sort().
# Q. Out of the 11 words in selected_words, which one got the most positive weight? Which one got the most negative weight?
selected_words_model['coefficients'].sort('value').print_rows(num_rows=12, num_columns=4)

#Evaluate the selected words sentiment model

In [None]:
selected_words_model.evaluate(test_data, metric='roc_curve')

In [None]:
selected_words_model.show(view='Evaluation')

#Build a word_count sentiment classifier

In [None]:
sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=['word_count'],
                                                     validation_set=test_data)

#Evaluate the word_count sentiment model

In [None]:
sentiment_model.evaluate(test_data, metric='roc_curve')

In [None]:
sentiment_model.show(view='Evaluation')

Q. What is the accuracy of the selected_words_model on the test_data? What is the accuracy of the sentiment_model, which was learned from all the word counts as in the lecture notebook? What is the accuracy of the majority class classifier on this task? How do the learned models compare with the baseline of always predicting the majority class? Save these results to answer the quiz at the end.
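
The comparison against the baseline can be sketched in plain Python. The labels and predictions below are made up for illustration and are not results from either model; `accuracy` is a hypothetical helper.

```python
def accuracy(predicted, actual):
    # Fraction of predictions that match the true labels
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / float(len(actual))

actual = [1, 1, 1, 0, 0]             # made-up true labels
baseline = [1] * len(actual)         # always predict the majority class
model_predictions = [1, 1, 0, 0, 0]  # made-up model output

print(accuracy(baseline, actual))           # 0.6
print(accuracy(model_predictions, actual))  # 0.8
```

A model is only useful on this task if its accuracy clearly exceeds the baseline's.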

# Interpreting the difference in performance between the models
To understand why the model with all word counts performs better than the one with only the selected_words, we will now examine the reviews for a particular product, 'Baby Trend Diaper Champ'.

In [None]:
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']

## Applying the word_count sentiment model to understand sentiment for Diaper Champ

In [None]:
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type='probability')

##Sort the reviews based on the predicted sentiment and explore

In [None]:
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)

In [None]:
diaper_champ_reviews.head()

Q. What is the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model?

## Applying the selected words sentiment model to predict sentiment for most positive review for Diaper Champ

In [None]:
predicted_sentiment_for_most_positive_review = selected_words_model.predict(diaper_champ_reviews[0:1], output_type='probability')

In [None]:
predicted_sentiment_for_most_positive_review

#### Q. Why is the predicted_sentiment for the most positive review found using the model with all word counts (sentiment_model) much more positive than the one using only the selected_words (selected_words_model)?

In [None]:
diaper_champ_reviews[0]['review']

In [None]:
diaper_champ_reviews[0]['word_count']

In [None]:
for word in selected_words:
    print('%s: %s' % (word, diaper_champ_reviews[0][word]))

#### None of the selected_words appears in the text of the most positive review, so every selected-word count is zero and the selected_words_model predicts a much less positive sentiment than the word_count model.
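
This can be made concrete with a sketch of how a logistic classifier turns counts into a probability. The intercept and weights below are hypothetical values chosen for illustration, not the coefficients learned by selected_words_model, and `predict_probability` is a hypothetical helper. When every selected-word count is zero, the prediction depends on the intercept alone and is therefore the same for any such review.

```python
import math

def predict_probability(intercept, weights, counts):
    # Logistic model: sigmoid of the intercept plus weighted word counts
    score = intercept + sum(w * counts.get(word, 0) for word, w in weights.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical values for illustration; not the learned coefficients
intercept = 1.4
weights = {'love': 1.3, 'great': 0.9, 'hate': -1.4}

# A review containing none of the selected words: only the intercept contributes
p_none = predict_probability(intercept, weights, {'diaper': 3, 'champ': 2})

# A review that uses 'love' twice gets a higher probability
p_love = predict_probability(intercept, weights, {'love': 2})

print(p_none)
print(p_love)
```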