# Analysing Amazon Reviews Part II: Finding controversial reviews
(2017-01-20 (C) Wouter van Atteveldt CC-BY-SA)

With the Amazon reviews data prepared in Part I, we can start running substantive queries.

There are many interesting questions that can be answered with this data.
The one we'll answer here is: what do people love and hate about the most controversial product?

To start answering the question, we need one more preparation step.
Since this question is product-centered, it helps to have a dictionary with all reviews for each product: a dict with the asin product IDs as keys, and the list of reviews for that product as values.


In [1]:
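import json
# Load the data prepared in Part I; 'with' closes each file after reading
with open("gourmet_products.json") as f:
    products = json.load(f)
with open("gourmet_reviews.json") as f:
    reviews = json.load(f)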

In [2]:
p_reviews = {}
for review in reviews:
    asin = review['asin']
    if asin not in p_reviews:
        p_reviews[asin] = [review]
    else:
        p_reviews[asin].append(review)

# Note: check out the defaultdict class to make this easier! 
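
As a minimal sketch of the `defaultdict` approach mentioned in the note, the grouping could look like this. The data and the names `sample_reviews` and `sample_p_reviews` are made up for illustration and are not part of the real dataset:

```python
from collections import defaultdict

# Tiny made-up sample in the same shape as the review data (illustrative only)
sample_reviews = [
    {"asin": "A1", "overall": 5.0, "summary": "Great"},
    {"asin": "A1", "overall": 2.0, "summary": "Meh"},
    {"asin": "B2", "overall": 4.0, "summary": "Good"},
]

# defaultdict(list) creates an empty list the first time a key is accessed,
# so the explicit "if asin not in ..." check is no longer needed
sample_p_reviews = defaultdict(list)
for review in sample_reviews:
    sample_p_reviews[review["asin"]].append(review)

print({asin: len(revs) for asin, revs in sample_p_reviews.items()})
```

One thing to keep in mind: looking up a missing key in a `defaultdict` silently adds it, so use `in` or `.get` when you only want to check whether a product exists.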

Now we can, for example, get the ratings and summaries of all reviews for a product. Do people like Starbucks coffee? :)

In [3]:
asin = 'B001EQ5O5U'
print("Reviews for", products[asin]['title'])
for review in p_reviews[asin]:
    print("-", review['overall'], review['summary'])

Reviews for Starbucks Sumatra Coffee, Whole Bean, 12-Ounce Bags (Pack of 3)
- 5.0 Extremely aromatic coffee, yes!
- 5.0 YUM!
- 5.0 Best of Starbucks' whole bean coffees
- 2.0 Not so good
- 5.0 Double Yum!!


# Finding the most-reviewed product

Before finding the most controversial product, let's find the product with the most reviews.
To do this, we sort the product ids (the keys of p_reviews), using the number of reviews as sorting key.

For the sorting key, we create a function that returns the number of reviews given a product id:

In [4]:
def n_reviews(asin):
    reviews = p_reviews[asin]
    return len(reviews)

n_reviews("B001EQ5O5U")

5

Now, we can use that function to sort the products and select the one with the most reviews:

In [5]:
ids = sorted(p_reviews, key=n_reviews, reverse=True)
most = ids[0]
print(most, "(", products[most]['title'],"): ", n_reviews(most), "reviews")

B000FEH8ME ( Pure Bar Organic Chocolate Brownie, Gluten Free, Raw, Vegan,  1.7-Ounce Bars (Pack of 12) ):  742 reviews
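
If only the top product is needed, the built-in `max` accepts the same kind of key function and avoids sorting the whole list. A small sketch on made-up data; `sample_p_reviews`, `sample_n_reviews` and `sample_most` are illustrative names:

```python
# Made-up mapping from product id to its reviews (illustrative only)
sample_p_reviews = {
    "A1": [{"overall": 5.0}, {"overall": 2.0}],
    "B2": [{"overall": 4.0}],
    "C3": [{"overall": 1.0}, {"overall": 3.0}, {"overall": 5.0}],
}

def sample_n_reviews(asin):
    """Return the number of reviews for a product id."""
    return len(sample_p_reviews[asin])

# max() with a key function returns the id with the most reviews
# without building a fully sorted list first
sample_most = max(sample_p_reviews, key=sample_n_reviews)
print(sample_most, sample_n_reviews(sample_most))
```

Sorting remains the better choice when you also want the runners-up, for example the top 10.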


Now let's have a look at the scores this brownie got:

In [7]:
reviews_most = p_reviews[most]
reviews_most

[{'asin': 'B000FEH8ME',
  'helpful': [1, 1],
  'overall': 4.0,
  'reviewText': 'Most gluten-free products taste like cardboard, so when I came upon this sampler, I was happily surprised.  Measurably better tasting than sports\' bars and gluten-free breakfast bars, this organic snack recharges your energy and rejuvenates your spirits.  Now don\'t expect a health product of this caliber to be inexpensive.  I\'m happy that PURE is very competitively priced at less than two dollars per bar.Surprisingly, I liked the "Cherry Cashew" flavor over "Wild Blueberry," but the variety can keep you more than satisfied.Nutrition *****, high in omega-3 and fiber, gluten-free, and organic.Price  ***, You get what you pay for.Taste  ****, So fresh, so good.Value  ****, Seldom do you get everything together in one bar.',
  'reviewTime': '04 25, 2010',
  'reviewerID': 'A3EE0H0NWQ9QVL',
  'reviewerName': '&#34;Rocky Raccoon&#34; "Hey, Doc, It\'s Only ...',
  'summary': 'Gluten-Free Is Right for Me',
  'uni

In [8]:
scores = [review['overall'] for review in reviews_most]
scores

[4.0,
 3.0,
 3.0,
 5.0,
 2.0,
 5.0,
 4.0,
 3.0,
 3.0,
 5.0,
 3.0,
 5.0,
 3.0,
 5.0,
 3.0,
 4.0,
 4.0,
 5.0,
 2.0,
 5.0,
 5.0,
 1.0,
 2.0,
 5.0,
 5.0,
 4.0,
 2.0,
 4.0,
 5.0,
 3.0,
 5.0,
 2.0,
 5.0,
 4.0,
 3.0,
 4.0,
 2.0,
 5.0,
 3.0,
 3.0,
 5.0,
 3.0,
 2.0,
 5.0,
 2.0,
 5.0,
 5.0,
 3.0,
 4.0,
 5.0,
 3.0,
 5.0,
 5.0,
 4.0,
 5.0,
 4.0,
 4.0,
 4.0,
 5.0,
 2.0,
 4.0,
 3.0,
 2.0,
 5.0,
 5.0,
 4.0,
 5.0,
 4.0,
 4.0,
 5.0,
 5.0,
 4.0,
 5.0,
 5.0,
 4.0,
 5.0,
 4.0,
 5.0,
 4.0,
 5.0,
 5.0,
 3.0,
 4.0,
 3.0,
 4.0,
 5.0,
 5.0,
 3.0,
 4.0,
 4.0,
 4.0,
 5.0,
 4.0,
 3.0,
 4.0,
 4.0,
 4.0,
 5.0,
 5.0,
 5.0,
 4.0,
 4.0,
 3.0,
 4.0,
 2.0,
 4.0,
 5.0,
 4.0,
 5.0,
 4.0,
 4.0,
 3.0,
 4.0,
 4.0,
 3.0,
 2.0,
 3.0,
 2.0,
 5.0,
 4.0,
 3.0,
 3.0,
 5.0,
 5.0,
 3.0,
 5.0,
 4.0,
 5.0,
 3.0,
 4.0,
 4.0,
 4.0,
 4.0,
 2.0,
 1.0,
 3.0,
 3.0,
 1.0,
 4.0,
 4.0,
 2.0,
 1.0,
 2.0,
 2.0,
 5.0,
 4.0,
 3.0,
 3.0,
 5.0,
 4.0,
 5.0,
 5.0,
 4.0,
 2.0,
 5.0,
 3.0,
 4.0,
 3.0,
 2.0,
 4.0,
 4.0,
 5.0,
 2.0,
 4.0,
 5.0,
 5.0,
 5.0

In [9]:
print("Average:", sum(scores) / len(scores))

Average: 3.8544474393530996


Let's count how often each review score was given:

In [10]:
counts = {}
for review in reviews_most:
    score = review['overall']
    counts[score] = counts.get(score, 0) + 1

for score, n in sorted(counts.items()):
    print(score, ":", n)
# Note: Check out the collections.Counter class to make this easier!

1.0 : 26
2.0 : 61
3.0 : 151
4.0 : 261
5.0 : 243
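
For comparison, here is a minimal sketch of the same tally using `collections.Counter`, as suggested in the note. The scores are made up and `sample_scores` and `sample_counts` are illustrative names:

```python
from collections import Counter

# Made-up scores standing in for review['overall'] values (illustrative only)
sample_scores = [5.0, 4.0, 5.0, 1.0, 4.0, 5.0]

# Counter tallies how often each value occurs in the list
sample_counts = Counter(sample_scores)
for score, n in sorted(sample_counts.items()):
    print(score, ":", n)
```

A `Counter` returns 0 for values it has never seen, so `sample_counts[3.0]` does not raise a `KeyError`.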


# Finding the most controversial product

I don't think there is a standard definition of controversiality. 
Simply picking the product with the highest standard deviation will probably return a product with just one really good and one really bad review, which isn't what we mean by controversial.

What we can do instead is count the number of good (5 stars) and bad (1 or 2 stars) reviews, and take the lower of the two counts as the controversiality. So, a product with 25 negative and 70 positive reviews has a controversiality of 25, just like a product with 40 negative and 25 positive reviews.
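
To illustrate the difference between the two measures, here is a small sketch on two made-up products. It uses the population standard deviation from the `statistics` module; `two_reviews`, `many_reviews` and `sample_controversiality` are illustrative names, and the scores are invented:

```python
from statistics import pstdev

# Two made-up products (illustrative only)
two_reviews = [1.0, 5.0]                 # one bad and one good review
many_reviews = [1.0] * 25 + [5.0] * 70   # 25 bad and 70 good reviews

def sample_controversiality(scores):
    """Lower of the number of bad (1-2 stars) and good (5 stars) scores."""
    negative = sum(1 for s in scores if s <= 2)
    positive = sum(1 for s in scores if s == 5)
    return min(negative, positive)

for name, scores in [("two_reviews", two_reviews), ("many_reviews", many_reviews)]:
    print(name, round(pstdev(scores), 2), sample_controversiality(scores))
```

The product with only two reviews has the higher standard deviation, while the count-based measure ranks the heavily reviewed product as far more controversial.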

As above, the easiest way to do this is to create a 'controversiality' function that computes this score. 
First, we create a function that counts how often each score occurs in a list of reviews.
(Note that I initialise the counts dict with a zero for every possible score, so the function always returns all five scores, even those that did not occur.)

In [13]:
counts = {score: 0 for score in range(1,6)}
counts

{1: 0, 2: 0, 3: 0, 4: 0, 5: 0}

In [14]:
def count_scores(reviews):
    counts = {score: 0 for score in range(1,6)}
    for review in reviews:
        score = review['overall']
        counts[score] +=  1
    return counts
print(count_scores(reviews_most))

{1: 26, 2: 61, 3: 151, 4: 261, 5: 243}


In [15]:
def controversiality(asin):
    reviews = p_reviews[asin]
    counts = count_scores(reviews)
    negative = counts[1] + counts[2]
    positive = counts[5]
    return min(negative, positive)

print(controversiality(most))

87


Now, we can sort the products by controversiality and select the most controversial one:

In [16]:
ids = sorted(p_reviews, key=controversiality, reverse=True)
controv = ids[0]
print("Most controversial:", controv, "(", products[controv]['title'],"): ", controversiality(controv))

Most controversial: B002IEVJRY ( illy issimo Coffee Drink, Cappuccino, 8.45-Ounce Cans (Pack of 12) ):  90


In [11]:
print(count_scores(p_reviews[controv]))

{1: 27, 2: 63, 3: 179, 4: 285, 5: 187}


In [12]:
negatives = [review for review in p_reviews[controv] if review['overall'] == 1.0]
positives = [review for review in p_reviews[controv] if review['overall'] == 5.0]
print("Negative example:", negatives[0]['reviewText'])
print("\nPositive example:", positives[0]['reviewText'])

Negative example: I love drinking coffee, Cappuccinos, and even iced coffee. So when I seen the illy issimo coffee drink I just had to get it. The day it arrived in the mail, I was eager to drink it. This had to be the worst coffe drink I had ever had. It tasted like coffee grounds with water, I couldn't even finish the whole drink. I personally wouldn't recommened this drink, but if you're a coffee drinker and like try try new things it doesn't hurt. You may actually enjoy this drink more than I have.

Positive example: I really enjoyed this coffee drink.  It has a dark, rich espresso flavor perfect for a hot summer's day, but what I really appreciated the most is that it's not too sweet like many coffee drinks out there. It has just enough sugar to make it tasty and refreshing.  The servings are small in a 6.8 fl oz can so it disappears quickly, but makes for the perfect iced drink and caffeinated pick me up.


# Finding reviews that mention a word

The positive example above used the words "enjoy" and "perfect" to describe the coffee. However, the negative example also used the word "enjoy", but to make the opposite point ("You may actually enjoy this drink more than I have"). 

Let's see what the average rating is for reviews that contain the words (actually, substrings) "perfect" and "enjoy":

In [13]:
perfect_reviews = [review for review in reviews if "perfect" in review['reviewText'].lower()]
enjoy_reviews = [review for review in reviews if "enjoy" in review['reviewText'].lower()]

print("Found", len(perfect_reviews), "'perfect' reviews and ", len(enjoy_reviews)," 'enjoy' reviews")

Found 10958 'perfect' reviews and  15545  'enjoy' reviews
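
Because the test above looks for substrings, "perfect" also matches words such as "imperfect" and "perfectly". If that matters, a regular expression with word boundaries is one option. A minimal sketch on made-up texts; `sample_texts`, `substring_hits`, `whole_word` and `word_hits` are illustrative names, not part of the analysis above:

```python
import re

# Made-up review texts (illustrative only)
sample_texts = [
    "A perfect cup of coffee",
    "The can arrived with an imperfect seal",
    "Perfectly fine, nothing special",
]

# Substring test: matches all three texts
substring_hits = [t for t in sample_texts if "perfect" in t.lower()]

# \b marks a word boundary, so only the stand-alone word "perfect" matches
whole_word = re.compile(r"\bperfect\b", re.IGNORECASE)
word_hits = [t for t in sample_texts if whole_word.search(t)]

print(len(substring_hits), len(word_hits))
```

Whether substring or whole-word matching is preferable depends on the question: substrings conveniently catch "enjoyed" and "enjoying", but also unrelated or even opposite words.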


In [14]:
perfect_avg = sum(review['overall'] for review in perfect_reviews) / len(perfect_reviews)
print("Average score for reviews containing 'perfect':", perfect_avg)
enjoy_avg = sum(review['overall'] for review in enjoy_reviews) / len(enjoy_reviews)
print("Average score for reviews containing 'enjoy':", enjoy_avg)


Average score for reviews containing 'perfect': 4.6179959846687355
Average score for reviews containing 'enjoy': 4.273271148279189


So, it seems that 'perfect' is indeed a more positive word than 'enjoy'.

Note that this would have been a good place to use a small utility function:

In [15]:
def avg_score(reviews):
    return sum(review['overall'] for review in reviews) / len(reviews)

print("Average score for reviews containing 'enjoy':", avg_score(enjoy_reviews))
print("Average score for reviews containing 'perfect':", avg_score(perfect_reviews))

Average score for reviews containing 'enjoy': 4.273271148279189
Average score for reviews containing 'perfect': 4.6179959846687355


# Finding the most frequent words

As a final exercise, let's find the most frequent words in the most positive and negative reviews.
To do this, we first create a word count function that works on a list of texts:

In [16]:
import re
def count_words(texts):
    counts = {}
    for text in texts:
        text = text.lower()
        # Split on runs of non-word characters; the raw string avoids an invalid escape warning
        words = re.split(r"\W+", text)
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return counts
        
count_words(["this is a text", "and this is another text"])

{'a': 1, 'and': 1, 'another': 1, 'is': 2, 'text': 2, 'this': 2}

Now, we can use this function to get the most frequent words from the most positive and negative reviews:

In [17]:
positive_words = count_words([review['reviewText'] for review in positives])
negative_words = count_words([review['reviewText'] for review in negatives])
positive_words.get('perfect'), negative_words.get('perfect')

(22, None)

So, perfect occurs 22 times in the positive reviews, and not at all in the negative reviews.
To create a list of most frequent words in the positive reviews, we sort the (word, count) *items* in the dictionary and then print the top 10.

To sort the dictionary by value, we create a get_value function first:

In [18]:
def get_value(item):
    return item[1] # 0=key, 1=value

positive_words_list = sorted(positive_words.items(), key=get_value, reverse=True)
for word, n in positive_words_list[:10]:
    print(word, n)

the 797
a 730
i 710
it 634
and 553
coffee 535
is 450
of 434
to 419
this 379
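
The standard library offers two shortcuts for this kind of "sort by value" step: `operator.itemgetter(1)` can replace a hand-written function like `get_value`, and `Counter.most_common` returns the top entries directly. A sketch on made-up counts; `sample_word_counts`, `by_count` and `top_two` are illustrative names:

```python
from collections import Counter
from operator import itemgetter

# Made-up word counts (illustrative only)
sample_word_counts = {"coffee": 5, "the": 9, "drink": 3, "a": 7}

# itemgetter(1) picks element 1 (the count) of each (word, count) pair
by_count = sorted(sample_word_counts.items(), key=itemgetter(1), reverse=True)

# Counter.most_common(n) returns the n most frequent (word, count) pairs
top_two = Counter(sample_word_counts).most_common(2)

print(by_count[:2], top_two)
```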


Not surprisingly, the top words are mostly stopwords. Let's get rid of them:

In [19]:
stopwords = {"the", "a", "i", "it", "is", "of", "to", "this", "and", "in", "", "for", "s", "was", "t"}
positive_words_list = [(w, n) for (w, n) in positive_words_list if w not in stopwords]
for word, n in positive_words_list[:10]:
    print(repr(word), n)

'coffee' 535
'that' 245
'can' 239
'drink' 204
'but' 199
'you' 194
'not' 177
'with' 165
'like' 158
'taste' 136


In [20]:
negative_words_list = sorted(negative_words.items(), key=get_value, reverse=True)
negative_words_list = [(w, n) for (w, n) in negative_words_list if w not in stopwords]
for word, n in negative_words_list[:10]:
    print(repr(word), n)

'coffee' 76
'drink' 36
'not' 36
'but' 31
'that' 28
'my' 26
'like' 25
'with' 20
'illy' 19
'as' 18


So, the most frequent review words are actually not that different, and refer more to the topic of the review (coffee, drink) than to the tone or sentiment.  

This could be addressed by comparing relative frequencies (similar to the first lab in COM5507). 
It also shows how difficult sentiment analysis can be: people don't just fill a negative review with negative words. They still try to describe the product, and sometimes use nuanced language such as "You may actually enjoy this drink more than I have".
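
As a rough sketch of what comparing relative frequencies could look like, the example below divides each count by the total number of words in its group and then takes the difference. The counts are invented, and `relative_frequencies`, `sample_positive` and `sample_negative` are illustrative names; this is one simple way to do the comparison, not necessarily the method used in the lab:

```python
# Made-up word counts for positive and negative reviews (illustrative only)
sample_positive = {"coffee": 50, "perfect": 10, "drink": 20, "awful": 0}
sample_negative = {"coffee": 10, "perfect": 0, "drink": 4, "awful": 2}

def relative_frequencies(counts):
    """Convert raw counts into proportions of all words in the group."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

pos_rel = relative_frequencies(sample_positive)
neg_rel = relative_frequencies(sample_negative)

# Positive difference: relatively more frequent in the positive reviews
vocabulary = set(pos_rel) | set(neg_rel)
difference = {w: pos_rel.get(w, 0) - neg_rel.get(w, 0) for w in vocabulary}
for word, diff in sorted(difference.items(), key=lambda item: item[1], reverse=True):
    print(word, round(diff, 3))
```

In this toy data the topic words ("coffee", "drink") make up the same share of both groups and cancel out, leaving the words that actually distinguish positive from negative reviews.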