# Classification accuracy, confusion matrix, precision and recall




In [1]:
from __future__ import division  # __future__ imports must come first in the cell
import graphlab
import numpy as np
graphlab.canvas.set_target('ipynb')

# Loading the Amazon review dataset

In [8]:
products = graphlab.SFrame('amazon_baby.gl/')

# Cleaning reviews and extracting sentiment

I am removing punctuation from the reviews, dropping reviews with neutral sentiment (rating = 3), and labeling the remaining reviews as positive (rating > 3) or negative (rating < 3).

In [9]:
def remove_punctuation(text):
    import string
    return text.translate(None, string.punctuation) 
review_clean = products['review'].apply(remove_punctuation) #removing punctuation

products['word_count'] = graphlab.text_analytics.count_words(review_clean)

products = products[products['rating'] != 3] # Dropping neutral reviews

# Positive sentiment as +1 and negative sentiment as -1
products['sentiment'] = products['rating'].apply(lambda rating : +1 if rating > 3 else -1)
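
`text.translate(None, string.punctuation)` only works on Python 2 byte strings. As a minimal sketch, assuming Python 3, the same cleaning step could be written with `str.maketrans`. The helper names `remove_punctuation_py3` and `count_words_simple` are illustrative and not part of GraphLab, and the word counter assumes lowercasing and whitespace splitting, which may differ in detail from `graphlab.text_analytics.count_words`.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation_py3(text):
    """Hypothetical Python 3 counterpart of remove_punctuation."""
    return text.translate(PUNCTUATION_TABLE)

def count_words_simple(text):
    """Illustrative word counter: lowercase, split on whitespace, count."""
    return dict(Counter(text.lower().split()))

cleaned = remove_punctuation_py3("I love it, love it!")
print(cleaned)                      # I love it love it
print(count_words_simple(cleaned))
```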

Now, let's look at the dataset.

In [10]:
products

name,review,rating,word_count,sentiment
Planetwise Wipe Pouch,it came early and was not disappointed. i love ...,5.0,"{'and': 3, 'love': 1, 'it': 3, 'highly': 1, ...",1
Annas Dream Full Quilt with 2 Shams ...,Very soft and comfortable and warmer than it ...,5.0,"{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...",1
Stop Pacifier Sucking without tears with ...,This is a product well worth the purchase. I ...,5.0,"{'and': 3, 'ingenious': 1, 'love': 2, 'what': 1, ...",1
Stop Pacifier Sucking without tears with ...,All of my kids have cried non-stop when I tried to ...,5.0,"{'and': 2, 'all': 2, 'help': 1, 'cried': 1, ...",1
Stop Pacifier Sucking without tears with ...,"When the Binky Fairy came to our house, we didn't ...",5.0,"{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...",1
A Tale of Baby's Days with Peter Rabbit ...,"Lovely book, it's bound tightly so you may no ...",4.0,"{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",Perfect for new parents. We were able to keep ...,5.0,"{'and': 2, 'all': 1, 'right': 1, 'had': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",A friend of mine pinned this product on Pinte ...,5.0,"{'and': 1, 'fantastic': 1, 'help': 1, 'give': 1, ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",This has been an easy way for my nanny to record ...,4.0,"{'all': 1, 'standarad': 1, 'another': 1, 'when': ...",1
"Baby Tracker&reg; - Daily Childcare Journal, ...",I love this journal and our nanny uses it ...,4.0,"{'all': 2, 'nannys': 1, 'just': 1, 'food': 1, ...",1


## Splitting data into training and test sets

I am splitting the data 80-20: 80% goes into the training set and 20% into the test set.

In [11]:
train_data, test_data = products.random_split(.8, seed=1)
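
To make the idea of a seeded random split concrete, here is a plain-Python sketch. It assumes each row is assigned independently to the first part with probability `fraction`, so the split is only approximately 80-20. The function `random_split_rows` is hypothetical, and it will not reproduce the exact rows that `products.random_split(.8, seed=1)` selects.

```python
import random

def random_split_rows(rows, fraction, seed):
    """Assign each row to the first list with probability `fraction` (illustrative)."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(10000))  # stand-in for the review rows
train_rows, test_rows = random_split_rows(rows, 0.8, seed=1)
print(len(train_rows), len(test_rows))
```

Running it again with the same seed returns the same two lists, which is why the notebook fixes `seed=1`.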

## Train a logistic regression classifier

I am training a logistic regression classifier with **sentiment** as the target and **word_count** as the only feature, using GraphLab's built-in logistic classifier.

In [12]:
model = graphlab.logistic_classifier.create(train_data, target='sentiment',
                                            features=['word_count'],
                                            validation_set=None)

# Model Evaluation


## Classification accuracy

$$
\mbox{classification accuracy} = \frac{\mbox{# correctly classified data points}}{\mbox{# total data points}}
$$

In [13]:
accuracy = model.evaluate(test_data, metric='accuracy')['accuracy']
print "Test Accuracy: %s" % accuracy

Test Accuracy: 0.914536837053
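
The formula above can be sketched directly in plain Python. The function `classification_accuracy` and the toy label lists are made up for illustration; they are not what `model.evaluate` runs internally.

```python
def classification_accuracy(true_labels, predicted_labels):
    """Fraction of data points whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    num_correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return num_correct / float(len(true_labels))

# Toy example: 3 of the 5 predictions match the true labels.
toy_true = [+1, +1, -1, -1, +1]
toy_pred = [+1, -1, -1, +1, +1]
print(classification_accuracy(toy_true, toy_pred))  # 0.6
```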


## Confusion Matrix

The confusion matrix is an important evaluation tool for any classification model. Let's build one for the model above. In binary classification (two labels), the confusion matrix is a 2-by-2 matrix:
```
              +---------------------------------------------+
              |                Predicted label              |
              +----------------------+----------------------+
              |          (+1)        |         (-1)         |
+-------+-----+----------------------+----------------------+
| True  |(+1) | # of true positives  | # of false negatives |
| label +-----+----------------------+----------------------+
|       |(-1) | # of false positives | # of true negatives  |
+-------+-----+----------------------+----------------------+
```
To get the confusion matrix for a classifier, pass `metric='confusion_matrix'` to `model.evaluate`.

In [15]:
confusion_matrix = model.evaluate(test_data, metric='confusion_matrix')['confusion_matrix']
confusion_matrix

target_label,predicted_label,count
1,-1,1406
-1,-1,3798
-1,1,1443
1,1,26689
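
As a quick sanity check, the four rows above can be mapped onto the cells of the 2-by-2 layout and used to recompute the test accuracy. This sketch simply copies the counts from the output into a dictionary keyed by `(target_label, predicted_label)`; the variable names are illustrative.

```python
# Counts copied from the confusion matrix output, keyed by (target_label, predicted_label).
counts = {
    (+1, -1): 1406,
    (-1, -1): 3798,
    (-1, +1): 1443,
    (+1, +1): 26689,
}

true_positives = counts[(+1, +1)]
false_negatives = counts[(+1, -1)]   # positive reviews predicted as negative
false_positives = counts[(-1, +1)]   # negative reviews predicted as positive
true_negatives = counts[(-1, -1)]

total = sum(counts.values())
accuracy_from_counts = (true_positives + true_negatives) / float(total)
print(total)                 # 33336
print(accuracy_from_counts)
```

The recomputed value agrees with the test accuracy reported earlier (about 0.9145).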


## Computing the cost of mistakes


The two kinds of mistake rarely cost the same. Put yourself in the shoes of a manufacturer that sells a baby product on Amazon.com and wants to monitor the product's reviews in order to respond to complaints. Even a few negative reviews may generate a lot of bad publicity, so you don't want to miss any review with negative sentiment; you'd rather put up with false alarms about potentially negative reviews than miss negative reviews entirely. Here a false positive is a negative review predicted as positive (a missed complaint), while a false negative is a positive review flagged as negative (a false alarm). In other words, **false positives cost more than false negatives**.

Suppose you know the costs involved in each kind of mistake: 
1. \$100 for each false positive.
2. \$1 for each false negative.
3. Correctly classified reviews incur no cost.


## Cost of this logistic regression classifier

The total cost of the model is the number of false positives times the cost of one false positive, plus the number of false negatives times the cost of one false negative.

In [17]:
false_positive_test = confusion_matrix[(confusion_matrix['target_label'] == -1) & (confusion_matrix['predicted_label'] == +1)]['count'][0]
false_negative_test = confusion_matrix[(confusion_matrix['target_label'] == +1) & (confusion_matrix['predicted_label'] == -1)]['count'][0]

In [18]:
cost_test = (100 * false_positive_test) + (1 * false_negative_test)
print cost_test

145706
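
Written out with the counts from the confusion matrix (1443 false positives at \$100 each and 1406 false negatives at \$1 each), the arithmetic behind this total is:

$$
\text{cost} = 100 \times 1443 + 1 \times 1406 = 144300 + 1406 = 145706
$$

Almost all of the cost comes from the false positives, even though the two kinds of mistake occur about equally often.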


## Precision and Recall

You may not always have exact dollar amounts for each kind of mistake. Instead, you may simply want to keep false positives below, say, 3.5% of all positive predictions. This is where **precision** comes in:

$$
[\text{precision}] = \frac{[\text{# positive data points with positive predictions}]}{[\text{# all data points with positive predictions}]} = \frac{[\text{# true positives}]}{[\text{# true positives}] + [\text{# false positives}]}
$$

So to keep the percentage of false positives below 3.5% of positive predictions, we must raise the precision to 96.5% or higher. 

**First**, let us compute the precision of the logistic regression classifier on the **test_data**.

In [19]:
precision = model.evaluate(test_data, metric='precision')['precision']
print "Precision on test data: %s" % precision

Precision on test data: 0.948706099815
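
The same number can be recovered by hand from the confusion matrix counts (26689 true positives, 1443 false positives). The sketch below is only a cross-check of the precision formula, with illustrative variable names; it also shows how far the model is from the 96.5% target.

```python
# Counts copied from the confusion matrix on the test data.
num_true_positives = 26689
num_false_positives = 1443

num_positive_predictions = num_true_positives + num_false_positives
precision_from_counts = num_true_positives / float(num_positive_predictions)

# Share of positive predictions that are actually negative reviews.
false_positive_share = 1.0 - precision_from_counts

print(round(precision_from_counts, 4))   # 0.9487
print(precision_from_counts >= 0.965)    # False
```

About 5.1% of the positive predictions are false positives, so the classifier as evaluated here does not yet meet the 3.5% requirement.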


A complementary metric is **recall**, the fraction of all positive reviews that the classifier predicts as positive:

$$
[\text{recall}] = \frac{[\text{# positive data points with positive predictions}]}{[\text{# all positive data points}]} = \frac{[\text{# true positives}]}{[\text{# true positives}] + [\text{# false negatives}]}
$$

Let us compute the recall on the **test_data**.

In [23]:
recall = model.evaluate(test_data, metric='recall')['recall']
print "Recall on test data: %s" % recall

Recall on test data: 0.949955508098
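
Recall can be cross-checked the same way from the confusion matrix counts (26689 true positives, 1406 false negatives). This is a minimal sketch with illustrative names, not the computation GraphLab performs internally.

```python
def recall_from_counts(num_true_positives, num_false_negatives):
    """Fraction of truly positive data points that were predicted positive."""
    num_positive_points = num_true_positives + num_false_negatives
    return num_true_positives / float(num_positive_points)

# Counts copied from the confusion matrix on the test data.
recall_check = recall_from_counts(26689, 1406)
print(recall_check)
```

The result matches the recall reported by `model.evaluate`: the classifier finds roughly 95% of the positive reviews in the test set.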
