# TextBlob: Sentiment Analysis & Classifiers

## What is Sentiment Analysis?

Sentiment analysis is a method in NLP used to classify the emotion (or tone) and subjectivity of human language. At the most basic and common level, the goal is to classify a text as positive, negative, or neutral in tone, and to determine how subjective it is. Subjectivity will be covered only very briefly in this workshop.

At a more complex level, sentiment analysis is used to classify the specific emotions in human language, such as angry, happy, sad, or excited. Instead of learning just three classes (positive, negative, neutral), the goal is to distinguish many specific classes.

## Why Use Sentiment Analysis?

The usefulness of sentiment analysis depends on the industry using it, but the most common use involves scraping lots of data (e.g. Twitter feeds or Reddit comments) to determine how customers/users feel about a particular brand, product, or service.

Sentiment analysis is also used when analyzing financial securities (the stock market): if a large proportion of people shift in sentiment about a particular market or stock, that can affect the price of the securities involved.

## Specific Uses

* Insight into opinions on specific political policies
* Brand monitoring (how is a brand perceived?)
* Identify good and bad aspects of products or ads
* Impact of changes in sentiment on securities markets
* Will likely be used one day with virtual assistants and other AI
* Hotels can use it to know how they can improve their property and service

## Getting Started

```python
# import what we need
import pandas as pd
from pandas import DataFrame as DF, Series

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from textblob import TextBlob
```

```python
# read data
cols = ['airline_sentiment','airline_sentiment_confidence',
        'airline','name','text']
data = pd.read_csv('data/tweets.csv', usecols=cols)
```

Below are the first 5 rows of our data. We will only be using the first two features and the last feature.

```python
data.head()
```

| | airline_sentiment | airline_sentiment_confidence | airline | name | text |
|---|---|---|---|---|---|
| 0 | neutral | 1.0 | Virgin America | cairdin | @VirginAmerica What @dhepburn said. |
| 1 | positive | 0.3486 | Virgin America | jnardino | @VirginAmerica plus you've added commercials t... |
| 2 | neutral | 0.6837 | Virgin America | yvonnalynn | @VirginAmerica I didn't today... Must mean I n... |
| 3 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica it's really aggressive to blast... |
| 4 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica and it's a really big bad thing... |


# Polarity & Subjectivity Using TextBlob `sentiment`

## Basic Sentiment Analysis

### Using the TextBlob `sentiment` method

TextBlob provides a `sentiment` property (accessed without parentheses, as in `blob.sentiment`) on any `TextBlob` object. It returns a named tuple with two values:
* polarity: value in the range [-1, 1], indicating how negative or positive the text is (close to 0.0 is neutral)
* subjectivity: value in the range [0, 1], indicating how subjective the text is (1 is very subjective)

This method is very basic and leaves a lot to be desired, but it can still be helpful if you don't have the opportunity to train a classifier and just need some rough results.

```python
lines = ["The food is on the table", "The food is green", "I don't like the food",
         "I do not like the food", "I like the food", "I don't love the food", "I do not love the food",
         "I hate the food", "I love the food", "The food is delicious"]

# analyze the sentences
sentiments = [b.sentiment for b in [TextBlob(l) for l in lines]]
for l,s in zip(lines, sentiments):
    print('{} \n(p={}, s={})'.format(l, s[0], s[1]), '\n')
```

```
The food is on the table
(p=0.0, s=0.0)

The food is green
(p=-0.2, s=0.3)

I don't like the food
(p=0.0, s=0.0)

I do not like the food
(p=0.0, s=0.0)

I like the food
(p=0.0, s=0.0)

I don't love the food
(p=0.5, s=0.6)

I do not love the food
(p=-0.25, s=0.6)

I hate the food
(p=-0.8, s=0.9)

I love the food
(p=0.5, s=0.6)

The food is delicious
(p=1.0, s=1.0)
```



As seen above, this method doesn't recognize negative contractions (e.g. "don't"), and it has trouble with ambiguous words that can take on multiple meanings (e.g. "like", which is also used for comparison).
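To see why negation handling and vocabulary coverage matter, here is a deliberately tiny lexicon-based scorer. It is a hypothetical sketch, not TextBlob's implementation: the `toy_polarity` name, the three lexicon values, and the negation rule are all assumptions. The rule (multiply a known word's polarity by -0.5 when it follows "not") is chosen because it reproduces the -0.25 above (0.5 × -0.5), and a negation list that lacks "don't" produces the same kind of miss as "I don't love the food".

```python
# A toy lexicon-based polarity scorer (hypothetical; NOT TextBlob's actual code).
# Made-up lexicon values for a handful of words.
TOY_LEXICON = {"love": 0.5, "hate": -0.8, "delicious": 1.0}
# Assumed negation list; "don't" is deliberately absent, as in a naive list.
NEGATIONS = {"not"}


def toy_polarity(sentence: str) -> float:
    tokens = sentence.lower().split()
    scores = []
    for i, tok in enumerate(tokens):
        if tok in TOY_LEXICON:
            score = TOY_LEXICON[tok]
            # assumed rule: a preceding negation flips and halves the polarity
            if i > 0 and tokens[i - 1] in NEGATIONS:
                score *= -0.5
            scores.append(score)
    # sentences with no known words are scored as neutral (0.0)
    return sum(scores) / len(scores) if scores else 0.0


for s in ["I love the food", "I do not love the food",
          "I don't love the food", "I like the food"]:
    print(s, toy_polarity(s))
```

Because "like" is missing from the toy lexicon it scores 0.0, the same symptom the real output shows for "I like the food".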

Let's see how it does with some airline tweets.

## Using The `sentiment` Method on Tweets

We will take a subset of our data containing only the first 10 rows with a confidence level greater than 0.6. We are not interested in entries with a high level of uncertainty, because keeping low-confidence observations would make the evaluations we do later less reliable.

```python
# get subset of tweets where confidence is > 0.6
subset = data[data.airline_sentiment_confidence > 0.6]\
    .head(10).copy().reset_index(drop=True)
tweets = subset.text
```

```python
subset
```

| | airline_sentiment | airline_sentiment_confidence | airline | name | text |
|---|---|---|---|---|---|
| 0 | neutral | 1.0 | Virgin America | cairdin | @VirginAmerica What @dhepburn said. |
| 1 | neutral | 0.6837 | Virgin America | yvonnalynn | @VirginAmerica I didn't today... Must mean I n... |
| 2 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica it's really aggressive to blast... |
| 3 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica and it's a really big bad thing... |
| 4 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica seriously would pay $30 a fligh... |
| 5 | positive | 0.6745 | Virgin America | cjmcginnis | @VirginAmerica yes, nearly every time I fly VX... |
| 6 | neutral | 0.634 | Virgin America | pilot | @VirginAmerica Really missed a prime opportuni... |
| 7 | positive | 0.6559 | Virgin America | dhepburn | @virginamerica Well, I didn't…but NOW I DO! :-D |
| 8 | positive | 1.0 | Virgin America | YupitsTate | @VirginAmerica it was amazing, and arrived an ... |
| 9 | neutral | 0.6769 | Virgin America | idk_but_youtube | @VirginAmerica did you know that suicide is th... |


### Compare the `sentiment` predictions with each line in `subset`

We want to get a sense of how each tweet is being classified.

```python
# print the tweets and predicted polarity line-by-line
for i,t in enumerate(tweets):
    s = TextBlob(t).sentiment
    target = subset.airline_sentiment[i]
    print(t, '\n', '{} (target: {}) \n'.format(s[0], target))
```

```
@VirginAmerica What @dhepburn said.
 0.0 (target: neutral)

@VirginAmerica I didn't today... Must mean I need to take another trip!
 -0.390625 (target: neutral)

@VirginAmerica it's really aggressive to blast obnoxious "entertainment" in your guests' faces &amp; they have little recourse
 0.0062500000000000056 (target: negative)

@VirginAmerica and it's a really big bad thing about it
 -0.3499999999999999 (target: negative)

@VirginAmerica seriously would pay $30 a flight for seats that didn't have this playing.
it's really the only bad thing about flying VA
 -0.2083333333333333 (target: negative)

@VirginAmerica yes, nearly every time I fly VX this “ear worm” won’t go away :)
 0.4666666666666666 (target: positive)

@VirginAmerica Really missed a prime opportunity for Men Without Hats parody, there. https://t.co/mWpG7grEZP
 0.2 (target: neutral)

@virginamerica Well, I didn't…but NOW I DO! :-D
 1.0 (target: positive)

@VirginAmerica it was amazing, and arrived an hour e
```

This basic sentiment analyzer missed the mark on 3/10 tweets (2 neutral and 1 negative). That's not too bad, but these results are nothing to celebrate. The performance declines quite a bit with longer texts.

Looking at two of the tweets the `sentiment` method estimated incorrectly:

**@VirginAmerica I didn't today... Must mean I need to take another trip!**
The method scores this one as negative (-0.39), and perhaps that is defensible. Without any context the tweet is full of ambiguity, which is probably why its target label is neutral.

**@VirginAmerica it's really aggressive to blast obnoxious "entertainment" in your guests' faces &amp; they have little recourse**
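This one is clearly negative, yet the `sentiment` method scores it as essentially neutral (polarity of about 0.006).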

### Analyze the polarity of each word in the last tweet above to see what's happening

```python
words = TextBlob(tweets[2]).words
for w in words:
    print(w, TextBlob(w).sentiment.polarity)
```

```
VirginAmerica 0.0
it 0.0
's 0.0
really 0.2
aggressive 0.0
to 0.0
blast 0.0
obnoxious 0.0
entertainment 0.0
in 0.0
your 0.0
guests 0.0
faces 0.0
amp 0.0
they 0.0
have 0.0
little -0.1875
recourse 0.0
```



We can see that the `sentiment` method does not consider the words "obnoxious" or "aggressive" to be negative, which is a glaring problem for our analysis. This method is clearly limited, and we need a better approach.

# Naive Bayes Classifier for Sentiment Analysis

Here we will use a Naive Bayes Classifier (included with TextBlob) to create a better sentiment analyzer. We will only train on a small portion of our data since it takes a while to train. However, even with a small amount of training data we can get better results than the `sentiment` method.

There are other classifiers included with TextBlob, but this one is easy to use and gives good performance.

We will start with three goals:
* learn to train and test/evaluate this classifier using a subset of our data
* compare the performance to the original sentiment method
* look at the features the classifier is extracting from the text

### Create train and test sets

* train the model on the first set
* test/evaluate it on the other

The set below named `reduced` is reduced in dimensionality (keeping only the features/columns we care about).

The `train` and `test` sets are created using a list comprehension. If you don't know what that is, it's okay; you can look it up later. What matters here is that the Naive Bayes classifier takes data as a list of 2-tuples, where each tuple is one observation `(text, label)` and `label` is the class label that belongs to the text.

```python
# get reduced set (keep only the label and text columns)
reduced = data.loc[:, ['airline_sentiment', 'text']].copy()
reduced.rename(columns={'airline_sentiment': 'target'}, inplace=True)

# now create train and test sets from the first 500 tweets
# for the TextBlob classifier we need a list of 2-tuples (text, target)
train = [(s, t) for s, t in zip(reduced.iloc[:350].text, reduced.iloc[:350].target)]
test = [(s, t) for s, t in zip(reduced.iloc[350:500].text, reduced.iloc[350:500].target)]
```

### Train and evaluate

```python
# import the classifier
from textblob.classifiers import NaiveBayesClassifier

# train
cl = NaiveBayesClassifier(train)
# evaluate (accuracy on the test set)
cl.accuracy(test)
```

```python
# a quick look at the distribution of class labels
reduced.target.value_counts()
```

```
negative    9178
neutral     3099
positive    2363
Name: target, dtype: int64
```

The classes in the test set are pretty much balanced, but the classes in the entire reduced set are not balanced.

Let's compare the 61% classifier accuracy to the performance of the `sentiment` method.

## When accuracy isn't good enough

**We need a better scoring method for multi-class predictions**

Regular accuracy is simply the ratio of the number of correct predictions to the total number of predictions made. It pays no attention to how many classes there are, or how well each one is predicted.

**When it’s not good enough**
* there are more than two classes (in our case there are 3)
* there is an imbalance (at least one class with far fewer instances than another)

Strong imbalances do happen. Suppose there are two classes and one occurs only 5% of the time: a model that always predicts the majority class automatically gets 95% accuracy, which is meaningless because it never identifies the minority class.
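A minimal sketch of that 95% example, assuming NumPy and made-up labels (95 `neg`, 5 `pos`): the majority-class "model" looks excellent by accuracy, while its recall for the minority class is zero.

```python
import numpy as np

# Hypothetical imbalanced labels: 95 "neg" and 5 "pos" (made-up data)
y_true = np.array(["neg"] * 95 + ["pos"] * 5)

# A "classifier" that always predicts the majority class
y_pred = np.full_like(y_true, "neg")

# plain accuracy: fraction of predictions that match the true labels
accuracy = (y_true == y_pred).mean()

# recall for the minority class: fraction of true "pos" that were predicted "pos"
pos_recall = ((y_true == "pos") & (y_pred == "pos")).sum() / (y_true == "pos").sum()

print(f"accuracy = {accuracy:.2f}")            # accuracy = 0.95
print(f"recall for 'pos' = {pos_recall:.2f}")  # recall for 'pos' = 0.00
```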

**Precision and Recall are two useful metrics in these cases**

Precision = $\frac{TP}{TP + FP}$ : how often predictions of a specific class are correct

Recall = $\frac{TP}{TP + FN}$ : how often instances of a specific class are identified (not missed)

where, for a given class:
* TP : True Positives
* FP : False Positives
* FN : False Negatives
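A small worked example of these formulas, using hand-made labels (hypothetical, not from the tweets dataset) and a hypothetical helper `precision_recall` that counts TP, FP, and FN for one class at a time:

```python
# Hand-made labels for illustration only
true = ["negative", "negative", "negative", "neutral", "positive", "positive"]
pred = ["negative", "negative", "negative", "negative", "positive", "neutral"]


def precision_recall(true, pred, cls):
    """Return (tp, fp, fn, precision, recall) for a single class `cls`."""
    tp = sum(t == cls and p == cls for t, p in zip(true, pred))
    fp = sum(t != cls and p == cls for t, p in zip(true, pred))
    fn = sum(t == cls and p != cls for t, p in zip(true, pred))
    # guard against 0/0 when a class is never predicted (or never present)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return tp, fp, fn, precision, recall


for cls in ["negative", "neutral", "positive"]:
    print(cls, precision_recall(true, pred, cls))
# negative (3, 1, 0, 0.75, 1.0)
# neutral (0, 1, 1, 0.0, 0.0)
# positive (1, 0, 1, 1.0, 0.5)
```

Here `negative` has perfect recall (every negative was found) but lower precision (the neutral item was also called negative), and `neutral` is never predicted correctly, even though overall accuracy is 4/6.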

```python
# create a score function that will give precision and recall values for each class
def score(true, predicted):
    t = np.array(true)
    p = np.array(predicted)
    classes = np.unique(t)  # score every label that appears in the true values

    # count true positives, false positives, and false negatives per class
    tp = np.array([((t == c) & (p == c)).sum() for c in classes])
    fp = np.array([((t != c) & (p == c)).sum() for c in classes])
    fn = np.array([((t == c) & (p != c)).sum() for c in classes])

    # note: a class that is never predicted gives 0/0 precision (nan)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    return (classes, precision, recall)
```

### Evaluate classifier on larger set
**Skip this; it takes too long.**

**With train/test split**

```python
# create new train and test sets
# for the TextBlob classifier we need a list of 2-tuples (text, target)

# train = [(s, t) for s, t in zip(reduced.iloc[:1500].text, reduced.iloc[:1500].target)]
# test = [(s, t) for s, t in zip(reduced.iloc[1500:2000].text, reduced.iloc[1500:2000].target)]
```

```python
# train
# cl = NaiveBayesClassifier(train)

# evaluate
# cl.accuracy(test)
# 0.786
```

# Practice Problems

1. Create a pandas series of polarity values predicted for all entries in the reduced set using the sentiment method
2. Create a column in the reduced set with class labels mapped from the polarity values in (1.) using the following rules:
    - polarity < -0.1 : 'negative'
    - polarity > 0.1 : 'positive'
    - else : ‘neutral’
3. Compute the accuracy of the predicted labels from (2.) for the same range as the test set [350:500]
4. Update the score function to print a clean table of scores with (hint: use pandas)
    - rows for precision and recall
    - columns for class labels


# Naive Bayes Classifier: Digging Deeper

## Making Predictions

`NaiveBayesClassifier` has a `classify` method that takes text (a single string) as an argument. This means that we can either classify some string that we choose to type by hand, or classify tweets from our test set individually.

```python
cl.classify('I love this airline')
```

```
'positive'
```

### Getting class probabilities

```python
probs = cl.prob_classify('I love this airline')
probs.max()
```

```
'positive'
```

```python
probs.prob('positive')
```

```
0.8788493380472053
```

```python
probs.prob('negative')
```

```
0.01575421132591375
```

The above can be useful if you want to make modifications to how something is classified by setting a threshold. For example, you may want to only classify something as positive if the probability exceeds 0.9, instead of it simply having the highest probability.
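A minimal sketch of that thresholding idea, written against a plain `{label: probability}` dict so it runs on its own. The helper name `classify_with_threshold`, the fall-back-to-runner-up rule, and the example probabilities are assumptions for illustration. Building the dict from `prob_classify` with `samples()` and `prob()` (shown in the docstring) assumes the NLTK-style probability distribution object seen above.

```python
def classify_with_threshold(class_probs, target="positive", threshold=0.9):
    """Hypothetical helper: only return `target` if its probability exceeds `threshold`.

    `class_probs` is a plain dict {label: probability}. With a trained TextBlob
    classifier, one way to build it (assumption) would be:
        probs = cl.prob_classify(text)
        class_probs = {label: probs.prob(label) for label in probs.samples()}
    """
    # rank labels from most to least probable
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    # if the winner is the target but not confident enough, fall back to the runner-up
    if ranked[0] == target and class_probs[target] <= threshold and len(ranked) > 1:
        return ranked[1]
    return ranked[0]


# made-up probabilities for illustration
print(classify_with_threshold({"positive": 0.88, "neutral": 0.10, "negative": 0.02}))  # neutral
print(classify_with_threshold({"positive": 0.95, "neutral": 0.04, "negative": 0.01}))  # positive
```

Falling back to the runner-up is only one design choice; another would be to return a fixed label such as `'neutral'` whenever the threshold is not met.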

## Informative Features

The method below gives us some insight into how the classifier is making decisions. For example, we can see that if a string contains the word "great", the odds are 9.7:1 that the string is positive rather than negative. All of the features in a string are taken into account together, so the presence of "great" alone does not guarantee that the string will be classified as positive.

```python
cl.show_informative_features(10)
```

```
Most Informative Features
            contains(no) = True           negati : neutra =      9.7 : 1.0
         contains(great) = True           positi : negati =      9.7 : 1.0
        contains(Thanks) = True           positi : negati =      8.7 : 1.0
          contains(love) = True           positi : negati =      8.7 : 1.0
        contains(thanks) = True           positi : negati =      6.9 : 1.0
          contains(site) = True           negati : positi =      6.5 : 1.0
           contains(not) = True           negati : positi =      6.0 : 1.0
       contains(amazing) = True           positi : negati =      6.0 : 1.0
         contains(Thank) = True           positi : negati =      6.0 : 1.0
       contains(website) = True           negati : neutra =      5.5 : 1.0
```

**How to interpret this:**
* Each row shows a feature value such as `contains(feature) = True`, two class labels, and a ratio indicating how much more likely that feature value is under the first label than under the second
* The rows are printed in descending order of importance
* Ex: `contains(no) = True` has a ratio of 9.7 : 1.0, so tweets containing "no" are far more likely to be negative than neutral
* The default features for the Naive Bayes classifier are the individual words found in the training data
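To make these ratios less mysterious, here is a simplified sketch of how a ratio like 9.7 : 1.0 can arise: estimate P(contains(word) = True | label) for two labels from counts, then divide. The toy data, the `feature_ratio` name, and the add-0.5 smoothing are assumptions; the classifier's internal estimator may differ in detail, so this will not reproduce its numbers exactly.

```python
from collections import Counter

# Hypothetical toy training data (made up), in the same (text, label) format as `train`
toy_train = [
    ("great flight thanks", "positive"),
    ("great crew", "positive"),
    ("love this airline", "positive"),
    ("flight delayed again", "negative"),
    ("no help at all", "negative"),
    ("great now my bag is lost", "negative"),
]


def feature_ratio(data, word, label_a, label_b):
    """Ratio P(contains(word)=True | label_a) / P(contains(word)=True | label_b).

    Uses add-0.5 smoothing over the two outcomes True/False (an assumption
    meant to mimic, not reproduce, the classifier's estimator).
    """
    n = Counter(label for _, label in data)  # documents per label
    k = Counter(label for text, label in data if word in text.split())  # docs containing word
    def p_true(label):
        return (k[label] + 0.5) / (n[label] + 1.0)
    return p_true(label_a) / p_true(label_b)


print(round(feature_ratio(toy_train, "great", "positive", "negative"), 2))  # 1.67
print(feature_ratio(toy_train, "no", "negative", "positive"))               # 3.0
```

Even though "great" appears in one negative tweet, it is still more strongly associated with the positive class, which is exactly the kind of evidence each row above summarizes.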

## Extracting Features

The `extract_features` method serves one purpose: it takes a string and returns a dictionary mapping every feature in our classifier (individual words by default) to whether or not that word appears in the string. It is essentially a binary feature vector.

```python
cl.extract_features('I have no idea where this flight is taking me')
```

```
{'contains(while)': False,
 'contains(schedule)': False,
 'contains(week)': False,
 'contains(hard)': False,
 'contains(sorry)': False,
 'contains(t.co/zSuZTNAIJq)': False,
 'contains(views)': False,
 'contains(add)': False,
 'contains(issue)': False,
 'contains(quick)': False,
 'contains(Andrews)': False,
 'contains(Follow)': False,
 'contains(enter)': False,
 'contains(Many)': False,
 'contains(t.co/UT5GrRwAaA)': False,
 'contains(Holla)': False,
 'contains(Same)': False,
 'contains(cake)': False,
 'contains(t.co/gLXFwP6nQH)': False,
 'contains(NewsVP)': False,
 'contains(24hrs)': False,
 'contains(reimburse)': False,
 'contains(makes)': False,
 'contains(back-end)': False,
 'contains(PrincessHalf)': False,
 'contains(pros)': False,
 'contains(if)': False,
 'contains(wish)': False,
 'contains(t.co/XZ6qeG3nef)': False,
 'contains(bked)': False,
 'contains(account)': False,
 'contains(Lister)': False,
 'contains(keeps)': False,
 'contains(brand)': False,
 'contains(jump)': False,
 'con
```
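A simplified sketch of the same idea: check each vocabulary word against the tokens of a string. The `contains_features` name and the five-word vocabulary are hypothetical. The real extractor uses TextBlob's own tokenizer and a vocabulary of every word in the training data, which is why its output is so long. Like the real features (both `contains(Thanks)` and `contains(thanks)` appear in the informative features), this check is case-sensitive.

```python
# Simplified "contains(word)" binary feature extractor (hypothetical sketch;
# uses a plain whitespace split instead of TextBlob's tokenizer).
def contains_features(document, vocabulary):
    tokens = set(document.split())
    return {f"contains({word})": (word in tokens) for word in vocabulary}


vocab = ["no", "idea", "flight", "great", "sorry"]  # hypothetical vocabulary
features = contains_features("I have no idea where this flight is taking me", vocab)
print(features)
# {'contains(no)': True, 'contains(idea)': True, 'contains(flight)': True, 'contains(great)': False, 'contains(sorry)': False}
```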

## Classifying From Within a TextBlob

We can perform classification on the contents of a TextBlob object using an existing classifier (like the one we created earlier, named `cl`). The usefulness of this might seem questionable, since you can just pass a normal string to the classifier. However, sometimes you will be doing other work with some text in the form of a blob, and when you need to perform classification, you don't have to go back and get the raw string.

Using a classifier in a `TextBlob` is as easy as passing the classifier as an argument when you create the blob.

**Note:** The classifier must be one that you have already trained.

Let's look at a couple examples:

```python
b = TextBlob('I loved the flight', classifier=cl)
b.classify()
```

```
'positive'
```

```python
b = TextBlob('I hated the flight', classifier=cl)
b.classify()
```

```
'neutral'
```

Our classifier probably didn't encounter the word "hate" or "hated". We can update our model to improve classification.

## Update Existing Classifiers With New Data

Our classifier obviously failed us when we tried to classify the string "I hated the flight".
We have the option of easily updating our classifier with new data, so let's do that now.

```python
# new data is also a list of tuples
# be sure the class labels are correct
updates = [('I hated flying', 'negative'), ('I hate flying', 'negative'),
           ('I hate this airline', 'negative'), ('I hated the seats', 'negative')]
cl.update(updates)  # this is unfortunately slow
```

```
True
```

You can ignore the output `True`.

**Note:** If you get the error `too many values to unpack (expected 2)`, try re-running the cell where we created the train/test sets and create/train the classifier from scratch.

Now that we have updated our classifier with new data, let's see how our original sentence is classified.

```python
# let's see how it does now using 'I hated the flight'
b = TextBlob('I hated the flight', classifier=cl)  # new blob using the updated classifier
b.classify()
```

```
'negative'
```

And now we have the correct classification of `'negative'`.

If you do not get the correct class, try running the update cell once more.

## Other Classifiers

TextBlob has a number of built-in classifiers, all of which are listed in the documentation at the link below.

http://textblob.readthedocs.io/en/dev/api_reference.html#api-classifiers

# Practice Problems

1. Train a decision tree classifier on the first 350 tweets in the reduced set (the training set from earlier), call it something other than `cl`, and print/examine the tree structure using its `pseudocode` method (hint: wrap it in `print`)
2. Compute the accuracy on the test set [350:500] and compare to the Naive Bayes accuracy
3. Compare the precision and recall scores for the two classifiers. Does the decision tree perform better on any of the classes? (hint: remember that these classify one item at a time)
4. Create a new “balanced” training set of 50 observations from each class and update the current Naive Bayes (cl)
5. Score the updated classifier. Have the scores improved? How about accuracy?
