# Part 1: Sentiment Analysis of Product Reviews using TextBlob

__John Cartlidge, Nov 2021, Internet Economics and Financial Technology (IEFT)__

In the first part of this week's activity, we look at some simple approaches to perform sentiment analysis of product reviews.  

This notebook requires the __TextBlob__ library: a non-adaptive noncommercial sentiment analyser that uses a classification model that was trained on a movie review corpus.

For information TextBlob see: https://textblob.readthedocs.io/en/dev/

Quick start guide: https://textblob.readthedocs.io/en/dev/quickstart.html

## Required install

This example requires `TextBlob` package for performing natural language processing and sentiment analysis. The code will not work without `TextBlob`. For installation details, see: 
> https://textblob.readthedocs.io/en/dev/install.html

<br>

For quick install using `pip`, issue the following commands on the command line:

> `pip install -U textblob`

> `python -m textblob.download_corpora`

Now we can import TextBlob functions. Also, let's import pandas. 

In [1]:
from textblob import TextBlob
import pandas as pd  # import pandas as pd

# Document, sentence, and aspect-level sentiment analysis

__Problem__: We want to analyse the sentiment of reviews for the latest MacBook laptop.

__Data__:

(1) We have access to online reviews. Here is an example:

> __MacBook Review:__ I feel the latest laptop from Mac is really good overall. It has amazing resolution. The computer is really very sleek and can slide into bags easily. However, I feel the weight is a let down. The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.

(2) We have access to the technical specification for MacBook (https://www.apple.com/uk/macbook-pro-13/specs/). So, we can use the specification headings to identify aspects of the product:

> __MacBook Aspects:__ Price, Display, Processor, Storage, Memory, Graphics, Charging, Expansion, Keyboard, Trackpad, Wireless, Camera, Video, Audio, Battery, Power, Size, Weight 



Let's start by creating a list of the aspects...

In [2]:
aspects = ['price', 'display', 'resolution', 'processor', 'storage', 'memory', 'graphics', 'charging', 
           'expansion', 'keyboard', 'trackpad', 'wireless', 'camera', 'video', 
           'audio', 'battery', 'power', 'size', 'weight']

Then, we create a TextBlob containing the review text...

In [3]:
review = TextBlob("I feel the latest laptop from Mac is really good overall. It has amazing resolution. The computer is really very sleek and can slide into bags easily. However, I feel the weight is a let down. The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.")

### __(1) Document-level sentiment__: 

For document-level analysis, we can simply request the sentiment for the entire review...

In [4]:
review.sentiment

Sentiment(polarity=0.27530864197530863, subjectivity=0.5691358024691358)

Sentiment returns a tuple containing:

> polarity (+1 = extremely positive, 0 = neutral, -1 = extremely negative)

> subjectivity (0 = objective, 1 = extremely subjective)

We can also request these values individually...

In [5]:
print("Document polarity: %.2f" % review.polarity)
print("Document subjectivity: %.2f" % review.subjectivity)

Document polarity: 0.28
Document subjectivity: 0.57


We see this review is subjective (57%) and positive (0.28).

> The high __subjectivity__ score shows that the review contains lots of opinions, e.g., "I feel", "I would expect", "seems really good". In contast, a low subjectivity (i.e., a high __objectivity__) score would occur when the document is more factual (e.g., "Price is XXX, Size is XXX.)

> The __positive__ sentiment score indicates that the opinions expressed in the document are positive overall; with phrases such as "really good", "amazing", and "very sleek" outweighing the negative sentiment expressed in phrases such as "let down" and "bit expensive".

__Question:__ How accurate do you think this is? Do you agree?

So, we could stop here. We have an overall idea about the sentiment expressed in the review. But can we learn any more? 

### __(2) Sentence-level sentiment__: 

Let's look in more detail at each sentence in the review to see what we can discover. 

Define the following method to create a dataframe, with each row containing one sentence in the review. We then add columns to show the polarity score and subjectivity of the sentence. Finally, we identify aspects that appear in the sentence.

In [6]:
'''
Perform sentence-level sentiment analysis of text document

review: TextBlob document text
aspects: List of aspects 

return: dataframe with columns {Sentence, Sentence Text, Polarity, Subjectivity, Aspects Mentioned}
'''
def get_sentence_sentiment_df(review, aspects=None):
    
    # Create a dataframe containing one row for each sentence in the review
    # and include the sentiment polarity and subjectivity of each sentence 
    df = pd.DataFrame([s, str(s), s.polarity, s.subjectivity] for s in review.sentences)
    
    # Name the four columns in the dataframe 
    df.columns = ['Sentence','Text','Polarity','Subjectivity']
    
    # Let's also find if any aspects are mentioned in each sentence, and list those that are...
    
    # Iterate through each sentence and each aspect; add aspects that appear in sentence to aspects list
    if aspects!=None:
        index = 0
        aspects_list = []
        for s in review.sentences:
            aspect = []
            index += 1
            if any(a in aspects for a in s.words):
                #print("An aspect exists in sentence: ", s)       
                #we want this to be in order of appearance of aspect in sentence, not alphabetically
                for w in s.words:
                    if w in aspects:
                        aspect.append(w)

                        
            aspects_list.append(aspect)
            
        # Add the list of aspects mentioned in sentence as a new column in the dataframe    
        df['Aspects'] = aspects_list
    
    return df

Now, we can use the method above to create a dataframe of sentence-level sentiment analysis, with aspects labelled. Let's print the dataframe to the screen, ordered by polarity (so that sentences with the most positive sentiment appear at the top)...

In [7]:
# Create dataframe of sentence-level sentiment
df = get_sentence_sentiment_df(review,aspects)

# Order dataframe by polarity score
df.sort_values(by='Polarity',ascending=False)

Unnamed: 0,Sentence,Text,Polarity,Subjectivity,Aspects
1,"(I, t, , h, a, s, , a, m, a, z, i, n, g, , ...",It has amazing resolution.,0.6,0.9,[resolution]
0,"(I, , f, e, e, l, , t, h, e, , l, a, t, e, ...",I feel the latest laptop from Mac is really go...,0.4,0.5,[]
2,"(T, h, e, , c, o, m, p, u, t, e, r, , i, s, ...",The computer is really very sleek and can slid...,0.316667,0.566667,[]
4,"(T, h, e, , p, r, i, c, e, , i, s, , a, , ...",The price is a bit expensive given the configu...,0.1,0.65,"[price, storage, processor]"
3,"(H, o, w, e, v, e, r, ,, , I, , f, e, e, l, ...","However, I feel the weight is a let down.",-0.155556,0.288889,[weight]


We see that there are five sentences. The sentence with the highest polarity (row id=1) is "It has amazing resolution". We also see that this sentence contains the aspect "resolution". So, we can see that the review has positive sentiment towards the resolution of the macbook.

The sentence with the lowest polarity (row id=3) is "However, I feel the weight is a let down". This sentence contains the aspect "weight". Therefore, the review contains negative sentiment towards the weight of the macbook.

By drilling down to the sentence level, we have been able to discover more detail about exactly what the author of the review likes and does not like about the macbook.

However, there is still more that we could learn. We can see that the sentence with row id=4 contains three aspects, "price", "storage", and "processor". Overall, the sentence is slightly positive (polarity=0.1) about these three aspects. Let's take a closer look at this sentence... 

In [8]:
df.loc[4]['Text']

'The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.'

We see this sentence describes __price__, __storage__, and __processor__. The overall sentence sentiment is 0.1 (slightly positive), but it is clear that the sentiment for processor is very positive, the sentiment for price is slightly negative, and the sentiment for storage is negative. We cannot determine these details by looking at the sentence as a whole, so can we automatically mine the sentiment for each of these aspects? Let's try...

### __(3) Aspect-level__

We saw that the sentence with row id=4 has three aspects.

Here, we will try to split the sentence so that we can ascertain the sentiment of each aspect contained within the sentence. The approach we will take is very simple and is __not__ the best way to do this; it is purely an example.

We will find the location of the aspect word within the sentence and then split the sentence half-way between each aspect word. This approach does not account for grammar (",", ";", etc.) or sentiment shifters ("but", "not", etc.) or any other parts of speech that could help us; but as a rough-and-ready first approach, let's see how it performs...

In [9]:
sentence = review.sentences[4]

### __Example code for aspect-level sentiment of sentences__

__(1) Split the sentence into n-grams of size n.__ Convert to TextBlob strings to calculate the sentiment of each n-gram. Let's set: __n=3__ as a default.


In [10]:
'''
Convert a sentence to n-grams of size n. Return polarity for each n-gram as list

return: ngram_polarities 
'''
def get_ngram_polarities(sentence,n=3):
    
    ngram_polarities = []
    index = 0
    for i in sentence.ngrams(n):
        string = ""
        for word in i:
            string += word + " "
        ngram_polarities.append(TextBlob(string).polarity)
        index += 1
    
    return ngram_polarities

Let's use the above method to split the sentence into n-grams of size 3 and get the sentiment of each n-gram...

In [11]:
#split sentence into n-grams of size n=3 and then calculate sentiment of each n-gram
print(sentence)
get_ngram_polarities(sentence,3)

The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.


[0.0,
 0.0,
 0.0,
 -0.5,
 -0.5,
 -0.5,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.7]

We see there is negative sentiment around a quarter of the way into the sentence, then positive sentiment near the end. The rest of the sentence has neutral sentiment.

__(2) Find the index location of each aspect in the sentence__, then find the ranges for each aspect (i.e., split the sentence at the mid-way point between each aspect location) 

In [12]:
'''
Split a sentence using a aspects contained in a list. 

Find the location of aspects in the sentence, then split at the mid-point between aspect locations.

return: list containing start and end location of each split. 
'''
def get_range_ends(sentence,n=3,verbose=False):

    index = 0
    aspect_index = []
    count = 0
    last_index = 0

    # hold the index of the mid-point between the index of each aspect location 
    range_end = [] 

    for i in range(len(sentence.words)):
        for a in aspects:
            if(a==sentence.words[i]):
                if verbose: print("Aspect '", a, "' found at index: ", i)
                aspect_index.append([a,i])
                if count > 0:
                    range_end.append(last_index + (i - last_index)/2)
                    #print("Range end: ", range_end)
                last_index = i
                count += 1

    # range end holds mid-point of indices between aspects. Round to integer value and add start and end.
    for i in range(len(range_end)):
        range_end[i] = round(range_end[i])
    range_end.insert(0,0)
    range_end.append(len(sentence.ngrams(n)))
    # print(range_end)
    return range_end

We can now use the above to find the locations to split the sentence, using aspects found in the sentence, as follows:

In [13]:
#get list of start and end locations to split sentence, where aspects are in the middle of each range.
print("Sentence: ",sentence)
print("Aspects: ", aspects)
print("Split sentence at locations:", get_range_ends(sentence,3,True))

Sentence:  The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.
Aspects:  ['price', 'display', 'resolution', 'processor', 'storage', 'memory', 'graphics', 'charging', 'expansion', 'keyboard', 'trackpad', 'wireless', 'camera', 'video', 'audio', 'battery', 'power', 'size', 'weight']
Aspect ' price ' found at index:  1
Aspect ' storage ' found at index:  14
Aspect ' processor ' found at index:  17
Split sentence at locations: [0, 8, 16, 19]


So, we will split the sentence into 3; each associated with an aspect:
> Price: "The __price__ is a bit expensive given the"

> Storage: "configurations (I would expect an SSD __storage__), however"

> Processor: "the __processor__ seems really good"

It's not perfect, but it's not a bad start.

__(3) Calculate polarity for each range.__ Simple hack: for all n-grams in the range, sum the maximum and minimum polarity...

In [14]:
'''
Calculate polarity for an aspect from range of polarities (sum the min and max values).

    polarities: array of polarity scores
    ranges: break points of ranges
    
    return: polarities array
'''
def get_aspect_polarity(polarities, range_end):
    
    pol = []
    for a in range(len(range_end)-1):
        min_p = 0
        max_p = 0
        for i in range(range_end[a],range_end[a+1]):
            p = polarities[i]       
            if p < min_p:
                min_p=p
            if(p>max_p):
                max_p=p
        
        pol.append(max_p+min_p)
    
    return pol 



So, let's now put it all together... 

In [15]:
# Put it all together. Get polarities for each aspect in a sentence, using n-gram of size n=3
print("Aspect polarities: ", get_aspect_polarity(get_ngram_polarities(sentence,3),get_range_ends(sentence,3)))

Aspect polarities:  [-0.5, 0, 0.7]


This gives us polarity of -0.5 for aspect 1 (price), 0 polarity for aspect 2 (storage), and +0.7 polarity for aspect 3 (processor).

Good. That's pretty much what we would expect. Now let's make a function to do this automatically for our full review, outputting results as a dataframe...

The following method will take a dataframe of reviews, where each row is a sentence, and iterate over each row. If there are multiple aspects in the sentence, split the sentence to get aspect-level sentiment; drop rows with no aspects; and then return a dataframe of aspect-level sentiment. 

In [16]:
'''
Get aspect level sentiment. 

Input: dataframe containing sentence-level sentiment analysis

Return: dataframe of aspect-level sentiment
'''
def get_aspect_level_sentiment(df):
    
    # Create a new dataframe
    df_aspect = pd.DataFrame()

    # iterating over rows in sentence-level sentiment dataframe using iterrows() function  
    for i, j in df.iterrows(): 

        if len(j['Aspects'])==1: # one aspect found in sentence, copy row
            rowdict = {'Aspect':j['Aspects'][0],'Polarity':j['Polarity'],'Text':j['Text'],'Sentence':j['Sentence']}
            df_aspect = df_aspect.append(rowdict, ignore_index=True)

        if len(j['Aspects'])>1: # multiple aspects found in sentence, split row

            # get array of aspect polarities for sentence, using n-gram.
            pols = get_aspect_polarity(get_ngram_polarities(j['Sentence'],3),get_range_ends(j['Sentence'],3))
            #print("Polarities: ", pols)
            
            for p in range(len(pols)):
                rowdict = {'Aspect':j['Aspects'][p],'Polarity':pols[p],'Text':j['Text'],'Sentence':j['Sentence']}
                df_aspect = df_aspect.append(rowdict, ignore_index=True)
        
    return df_aspect

We can now use the above method to get aspect-level sentiment for the full review, ordered by polarity score...

In [17]:
# Get aspect-level sentiment and order with highest polarity shown at the top
get_aspect_level_sentiment(df).sort_values(by='Polarity',ascending=False)

Unnamed: 0,Aspect,Polarity,Sentence,Text
4,processor,0.7,"(T, h, e, , p, r, i, c, e, , i, s, , a, , ...",The price is a bit expensive given the configu...
0,resolution,0.6,"(I, t, , h, a, s, , a, m, a, z, i, n, g, , ...",It has amazing resolution.
3,storage,0.0,"(T, h, e, , p, r, i, c, e, , i, s, , a, , ...",The price is a bit expensive given the configu...
1,weight,-0.155556,"(H, o, w, e, v, e, r, ,, , I, , f, e, e, l, ...","However, I feel the weight is a let down."
2,price,-0.5,"(T, h, e, , p, r, i, c, e, , i, s, , a, , ...",The price is a bit expensive given the configu...


So, we now have a sentiment score for each aspect of the review. Our aspect-level sentiment analysis is complete. 

Let's remind ourselves of the original review text:

> __MacBook Review__: I feel the latest laptop from Mac is really good overall. It has amazing resolution. The computer is really very sleek and can slide into bags easily. However, I feel the weight is a let down. The price is a bit expensive given the configurations (I would expect an SSD storage), however the processor seems really good.


We can see that aspect-level sentiment analysis shows that the review is:

> Very positive for __processor__ and __resolution__

> Neutral on __storage__

> Somewhat negative for __weight__

> Negative for __price__


__Question:__ How accurate does that seem to you? Is this approach useful?

## Putting it all together.

We can now perform document, sentence, and aspect level sentiment analysis with just a few python commands.

`review` : TextBlob of review text

`aspects` : list of aspect strings

`get_sentence_sentiment_df(review, aspects=None)` : dataframe of sentence-level sentiment

`get_aspect_level_sentiment(df)` : dataframe of aspect-level sentiment

Let's try it on some other reviews...


> __Review2:__ I love my MacBook. Great memory and processor speed. Heavier than I'd like, but you can't win them all. I'd only ever buy MacBooks if only the price wasn't so high!

In [18]:
txt = "I love my MacBook. Great memory and processor speed. Heavier than I'd like, but you can't win them all. I'd only ever buy MacBooks if only the price wasn't so high!"
review2 = TextBlob(txt)

print("Document polarity: %.2f" % review2.polarity)

print("Sentence polarity:")
df2 = get_sentence_sentiment_df(review2, aspects)
df2.sort_values(by='Polarity',ascending=False)

Document polarity: 0.38
Sentence polarity:


Unnamed: 0,Sentence,Text,Polarity,Subjectivity,Aspects
1,"(G, r, e, a, t, , m, e, m, o, r, y, , a, n, ...",Great memory and processor speed.,0.8,0.75,"[memory, processor]"
2,"(H, e, a, v, i, e, r, , t, h, a, n, , I, ', ...","Heavier than I'd like, but you can't win them ...",0.8,0.4,[]
0,"(I, , l, o, v, e, , m, y, , M, a, c, B, o, ...",I love my MacBook.,0.5,0.6,[]
3,"(I, ', d, , o, n, l, y, , e, v, e, r, , b, ...",I'd only ever buy MacBooks if only the price w...,0.066667,0.846667,[price]


In [19]:
print("Aspect polarity:")
get_aspect_level_sentiment(df2).sort_values(by='Polarity',ascending=False)

Aspect polarity:


Unnamed: 0,Aspect,Polarity,Sentence,Text
0,memory,0.8,"(G, r, e, a, t, , m, e, m, o, r, y, , a, n, ...",Great memory and processor speed.
2,price,0.066667,"(I, ', d, , o, n, l, y, , e, v, e, r, , b, ...",I'd only ever buy MacBooks if only the price w...
1,processor,0.0,"(G, r, e, a, t, , m, e, m, o, r, y, , a, n, ...",Great memory and processor speed.


Very positive on memory, which is correct; but neutral on processor, which is incorrect. That's because of the way we have naively split the sentences. The "and" means that the "great" should also apply to the processor.

> __Review 3:__ Macbook price is much too high for what they are. I love my macbook, but I can't afford the price. Having said that, the processor is fantastic and the keyboard is smooth and comfortable, although walking into work is a pain because it weighs too much. Overall it's a hit for me!

In [20]:
txt = "Macbook price is much too high for what they are. I love my macbook, but I can't afford the price. Having said that, the processor is fantastic and the keyboard is smooth and comfortable, although walking into work is a pain because it weighs too much. Overall it's a hit for me!"
r = TextBlob(txt)
print("Document polarity: %.2f" % r.polarity)
df3 = get_sentence_sentiment_df(r, aspects)
print("Sentence polarity:")
df3.sort_values(by='Polarity',ascending=False)

Document polarity: 0.29
Sentence polarity:


Unnamed: 0,Sentence,Text,Polarity,Subjectivity,Aspects
1,"(I, , l, o, v, e, , m, y, , m, a, c, b, o, ...","I love my macbook, but I can't afford the price.",0.5,0.6,[price]
2,"(H, a, v, i, n, g, , s, a, i, d, , t, h, a, ...","Having said that, the processor is fantastic a...",0.35,0.6,"[processor, keyboard]"
0,"(M, a, c, b, o, o, k, , p, r, i, c, e, , i, ...",Macbook price is much too high for what they are.,0.18,0.37,[price]
3,"(O, v, e, r, a, l, l, , i, t, ', s, , a, , ...",Overall it's a hit for me!,0.0,0.0,[]


In [21]:
asp = get_aspect_level_sentiment(df3)
print("Aspect polarity:")
asp.sort_values(by='Polarity',ascending=False)

Aspect polarity:


Unnamed: 0,Aspect,Polarity,Sentence,Text
1,price,0.5,"(I, , l, o, v, e, , m, y, , m, a, c, b, o, ...","I love my macbook, but I can't afford the price."
2,processor,0.4,"(H, a, v, i, n, g, , s, a, i, d, , t, h, a, ...","Having said that, the processor is fantastic a..."
3,keyboard,0.4,"(H, a, v, i, n, g, , s, a, i, d, , t, h, a, ...","Having said that, the processor is fantastic a..."
0,price,0.18,"(M, a, c, b, o, o, k, , p, r, i, c, e, , i, ...",Macbook price is much too high for what they are.


Positive sentiment for processor and keyboard, which is correct. But this naive approach has problems identifying the sentiment for price.

## Real Review

Let's try a real review for the MacBook. It is much longer than the example reviews we looked at before. 

> https://www.t3.com/reviews/new-macbook-review
    

In [22]:
real_review = "MacBook review: The most portable MacBook ever is far from the most practical. The new MacBook crams more interesting tech than you can shake a USB (Type-C) stick at into its skinny 12-inch frame. It's a dazzling device for sure, one that would set tongues wagging down at the pub in a heartbeat — but whether it would slot into your life without causing a fuss is an entirely different matter. OS X 10.11 El Capitan features: what's new and what matters The thing is, the new MacBook isn't a MacBook Air with a Retina display, which is what many people on the street were expecting from Apple. While it borrows the MacBook Air's tapered design, along with the MacBook Pro with Retina's high-resolution display and skinny black bezel, its shallow keyboard, Intel Core M processor and single USB Type-C port position it as an entirely new category of laptop. I'm not convinced that power-hungry MacBook owners will rush out to trade their machines for one, but if you're looking for a laptop that's as portable as an iPad and runs OS X, the new MacBook is a luxurious, albeit flawed option. A fashion item with a designer price tag to match, it starts at £1,049 for the entry-level model with 256GB of storage, rising to £1,299 for the more powerful 512GB configuration. New MacBook: Size and build Like Dell's Windows-powered XPS 13, the new MacBook slims down the display's bezel to accommodate more screen. Despite being 12 inches, its footprint is almost identical to the 11-inch MacBook Air, and at 2.03 pounds (versus the Air's 2.38 pounds) barely registers when slung into a backpack for transportation. Measuring just 30 x 19.2 x 1.7cm (W x D x H), the new MacBook is a whisker longer and wider than the 11-inch Air (28 x 19.7 x 1.31cm) while shaving off around half a millimetre in height. It's crafted out of a durable aluminium that stands up well to knocks and scrapes. New MacBook: Features One of Apple's more contentious decisions was to give the new MacBook just one USB Type-C port. Smaller than a standard USB port and reversible, USB Type-C has been hailed as the future because it provides power, a connection to an external display (through DisplayPort, HDMI or VGA) and a USB port. The bad news is that to simultaneously use two or more of the above you'll need to pick up a USB-C Multi Port adapter, which Apple sells for £65. Apple had to make a trade-off between portability and convenience, and if you're set on using wired peripherals or charging your smartphone on the move, sorry: coughing up for and carrying around an adapter is a necessary evil. Along with a switch to USB Type-C, Apple had to engineer an entirely new keyboard and trackpad to make the new MacBook catwalk-thin. The innovative Force Touch Trackpad, which enables a third 'Force' click by pressing down on the trackpad with a certain degree of pressure, can be used for anything from activating a Quick Look preview to annotating attachments or seeing a file's information. Where traditional keyboards use a scissor mechanism, which tends to wobble around the edges, the new MacBook's keyboard uses Apple's new butterfly hinge under each individually-backlit key. Made from a single, strong piece of material, it makes every keypress responsive wherever you hit it. While precise, it provides minimal travel and feels closer to typing on an on-screen keyboard than a traditional tactile one. If you're looking to buy the new MacBook to hammer out long documents on the regular, see if you can try its keyboard out first - or you may find yourself taking advantage of Apple's 14-day returns policy. The new MacBook's speakers provide surprisingly loud and clear sound at high volumes with punchy mid-range tones and, for their size, impressive low-end. New MacBook: Display The centrepiece of the new MacBook, its 2,304 x 1,440 pixel-resolution display is a sight to behold. At 226 pixels-per-inch (PPI), it goes toe-to-toe with the 13-inch MacBook Pro with Retina (227 PPI) - I'd go as far to say that the new MacBook's thinner black bezel makes it the more attractive of the two. It's an IPS variant, which means bold colours, deep blacks and excellent viewing angles; crank up the brightness and you can even read websites comfortably outdoors - although you'll have to put up with screen glare and reflections. By default, OS X sets the resolution to 1,280 x 800, which renders crisp text, smooth lines and sharp images but leaves little room on the desktop for apps and windows. Upping it to 1,440 x 900 strikes a better balance between usable space and image quality. New MacBook: Performance The new MacBook houses Intel's Core M processor, which brings a few advantages and disadvantages. On the plus side, it uses so little power that Apple didn't need to put a fan inside, lending to its thin-ness. No fan means no noise, and very little heat dissipation - even when you're pushing the machine to its limits. So, here's the not so good part. Core M, a chip designed for mobile devices and 2-in-1 hybrids, is a good deal slower than the Core-series processors found in today's MacBook Air and MacBook Pro with Retina models, and as such isn't well-suited to heavy computing tasks such as editing large image files, videos and 3D rendering. On the other hand, the new MacBook's 8GB of RAM and fast 128GB or 256GB SSD mean it's easily nippy enough for everyday tasks - such as surfing the web, editing small-medium image files, streaming video, writing documents and even some (very) light gaming thanks to its integrated HD 5300 graphics. New MacBook: Battery Rather than a traditional rectangular battery, Apple came up with a terraced, contoured battery design to better fit the new MacBook's slim dimensions. Rated at 39.7Whr, it fell just short of Apple's 9-hour batter life claim during T3's rigorous battery life test, clocking in at just over 7 hours when looping a 1080p video over Wi-Fi. That's still impressive when you consider that it's driving all of those pixels, but if you value battery life above all else then the 13-inch MacBook Air, which has the legs to go for up to 12 hours, is still the king. New MacBook: Verdict A genuinely unique laptop for those seeking a Retina display in the most compact and light chassis around, the fact remains that the new MacBook's many compromises mean it won't suit everyone. It has the ability to stun and confuse in equal measure, packing enough power to be used as your main machine, depending on what you do. But even then you'll have to put up with switching adapters when hooking up monitors and wired peripherals and re-train your fingers to adapt to the keyboard's lack of tactile feedback. If money is no object, and you're prepared to see past its deficiencies, Apple's gorgeous new MacBook glistens like gold (literally, if you opt for it in Gold, rather than Silver or Space Grey). For everyone else, it may be worth at least holding out for a successor with one more USB port to appear down the line. New MacBook: is this the way all laptops should be? Including just one USB Type-C port and tampering with the MacBook Air's near-perfect keyboard were daring, if unsurprising moves considering Apple's penchant for minimalism. Holding the new MacBook in the hand is like seeing into the future of laptop design; it's the MacBook Air all over again. Tablets are expected to sell poorly for a second consecutive year in 2015 as people look toward thin and light computers that let them get real work done, an area where the new MacBook excels. But - and it's a biggy - it feels like the new MacBook is ahead of its time and will prove one too many compromises for most people. Using an adapter is awkward compared to full-size USB ports, adds cost and detracts from the machine's slick design. It's likely that future laptop makers - including Apple - will at least do what Google did with the Chromebook Pixel 2 and include two USB Type-C ports to lessen the pain."
r = TextBlob(real_review)
df4 = get_sentence_sentiment_df(r, aspects)
asp = get_aspect_level_sentiment(df4)

In [23]:
#print("Review text: ", r)
print("\n=== Review text contains ", len(r.ngrams(n=1)), " words ===")


=== Review text contains  1417  words ===


In [24]:
print("Document polarity: %.2f" % r.polarity)
print("Sentence polarity:")
df4.sort_values(by='Polarity',ascending=False)

Document polarity: 0.13
Sentence polarity:


Unnamed: 0,Sentence,Text,Polarity,Subjectivity,Aspects
30,"(S, o, ,, , h, e, r, e, ', s, , t, h, e, , ...","So, here's the not so good part.",0.7,0.6,[]
35,"(T, h, a, t, ', s, , s, t, i, l, l, , i, m, ...",That's still impressive when you consider that...,0.5,0.55,[battery]
26,"(U, p, p, i, n, g, , i, t, , t, o, , 1, ,, ...","Upping it to 1,440 x 900 strikes a better bala...",0.5,0.5,[]
24,"(I, t, ', s, , a, n, , I, P, S, , v, a, r, ...","It's an IPS variant, which means bold colours,...",0.433333,0.716667,[]
6,"(A, , f, a, s, h, i, o, n, , i, t, e, m, , ...",A fashion item with a designer price tag to ma...,0.4,0.75,"[price, storage]"
1,"(T, h, e, , n, e, w, , M, a, c, B, o, o, k, ...",The new MacBook crams more interesting tech th...,0.378788,0.484848,[]
45,"(B, u, t, , -, , a, n, d, , i, t, ', s, , ...",But - and it's a biggy - it feels like the new...,0.378788,0.484848,[]
0,"(M, a, c, B, o, o, k, , r, e, v, i, e, w, :, ...",MacBook review: The most portable MacBook ever...,0.366667,0.666667,[]
21,"(T, h, e, , n, e, w, , M, a, c, B, o, o, k, ...",The new MacBook's speakers provide surprisingl...,0.316061,0.596313,[size]
23,"(A, t, , 2, 2, 6, , p, i, x, e, l, s, -, p, ...","At 226 pixels-per-inch (PPI), it goes toe-to-t...",0.273939,0.677576,[]


In [25]:
asp.sort_values(by='Polarity',ascending=False)

#filter out aspects with neutral (zero) polarity
filtered_df = pd.DataFrame()

# iterating over rows using iterrows() function  
for i, j in asp.iterrows(): 
    if(j['Polarity']!=0.0):
        filtered_df = filtered_df.append(j)
    
sorted_df = filtered_df.sort_values(by='Polarity',ascending=False)
print("Aspect polarity:")
sorted_df

Aspect polarity:


Unnamed: 0,Aspect,Polarity,Sentence,Text
27,battery,0.5,"(T, h, a, t, ', s, , s, t, i, l, l, , i, m, ...",That's still impressive when you consider that...
5,storage,0.5,"(A, , f, a, s, h, i, o, n, , i, t, e, m, , ...",A fashion item with a designer price tag to ma...
24,battery,0.5,"(N, e, w, , M, a, c, B, o, o, k, :, , B, a, ...",New MacBook: Battery Rather than a traditional...
16,size,0.316061,"(T, h, e, , n, e, w, , M, a, c, B, o, o, k, ...",The new MacBook's speakers provide surprisingl...
0,display,0.257576,"(O, S, , X, , 1, 0, ., 1, 1, , E, l, , C, ...",OS X 10.11 El Capitan features: what's new and...
28,display,0.24789,"(N, e, w, , M, a, c, B, o, o, k, :, , V, e, ...",New MacBook: Verdict A genuinely unique laptop...
21,video,0.2075,"(O, n, , t, h, e, , o, t, h, e, r, , h, a, ...","On the other hand, the new MacBook's 8GB of RA..."
22,graphics,0.2,"(O, n, , t, h, e, , o, t, h, e, r, , h, a, ...","On the other hand, the new MacBook's 8GB of RA..."
12,trackpad,0.178413,"(T, h, e, , i, n, n, o, v, a, t, i, v, e, , ...","The innovative Force Touch Trackpad, which ena..."
6,display,0.154293,"(N, e, w, , M, a, c, B, o, o, k, :, , S, i, ...",New MacBook: Size and build Like Dell's Window...


What do you think, is this a good summary of the sentiment of the review?

Hmm. Okay. So, it's not perfect, but it certainly has __some__ accuracy. And since we can automate this process, we can run it across hundreds, thousands, or millions of reviews. 

Using the structure of sentences (parts of speech) to better detect aspect-level sentiment will certainly help. Time to get coding!

## Algo trading on sentiment

Before we end this notebook, let's look at a couple of tweets that have moved the stock markets....

"__Hack Crash__", Associated Press Tweet, 23 April 2013: (see https://doi.org/10.1177/0263276415583139)

> Breaking: Two Explosions in the White House and Barack Obama is injured




__Muddy Waters__, 6 Aug. 2019 Tweet: (see https://www.burfordcapital.com/media/1712/20200401-burford-v-lse-claimants-skeleton-argument-for-trial.pdf)

> Muddy Waters is now in a blackout period until tomorrow 8 am London time when we will announce a new short position on an accounting fiasco that’s potentially insolvent and possibly facing a liquidity crunch. Investors are bulled up about this company. We’re not 

In [26]:
hack_crash_tweet="Breaking: Two Explosions in the White House and Barack Obama is injured"
print("Hack Crash: polarity %.2f" % TextBlob(hack_crash_tweet).polarity)
print("Hack Crash: polarity %.2f" % TextBlob(hack_crash_tweet).subjectivity)

Hack Crash: polarity 0.00
Hack Crash: polarity 0.00


Notice also that the Hack Crash Tweet has subjectivity = 0.0; i.e., it is an __objective statement of fact__. 

__Question__: is this a problem for us?

In [27]:
muddy_waters_tweet="Muddy Waters is now in a blackout period until tomorrow 8 am London time when we will announce a new short position on an accounting fiasco that’s potentially insolvent and possibly facing a liquidity crunch. Investors are bulled up about this company. We’re not"
print("Muddy Waters: polarity %.2f" % TextBlob(muddy_waters_tweet).polarity)
print("Muddy Waters: subjectivity %.2f" % TextBlob(muddy_waters_tweet).subjectivity)

Muddy Waters: polarity 0.03
Muddy Waters: subjectivity 0.69


So, we get __neutral sentiment__ for the Muddy Waters tweet. __Question__: Does this seem correct to you?

Remember, we mentioned at the top of this notebook that TextBlob is trained on movie reviews? So, I'm guessing "insolvent", "liquidity crunch", and "bulled" are missing from the training corpus.

This suggests that we have much more work to do to handle financial datasets, if we want our trading algos to be able to react to news and social media. 

I never said it was going to be easy! :-)