In [1]:
# In this exercise, you will do a sentiment analysis of text comments.

#     1) Load the data file DailyComments.csv from the Week 4 Data Files into a data frame.
#     2) Identify a scheme to categorize each comment as positive or negative. You can devise your own scheme or find a commonly used scheme to perform this sentiment analysis. However you decide to do this, make sure to explain the scheme you decide to use.
#     3) Implement your sentiment analysis with code and display the results. Note: DailyComments.csv is a purposely small file, so you will be able to clearly see why the results are what they are.
#     4) For up to 5% extra credit, find another set of comments, e.g., some tweets, and perform the same sentiment analysis.

In [2]:
# Load the required libraries
from __future__ import division
from afinn import Afinn
import pandas as pd
import numpy as np

In [3]:
# Load the DailyComments.csv into the dataframe
Daily_Comments = pd.read_csv('DailyComments.csv')
Daily_Comments

Unnamed: 0,Day of Week,comments
0,Monday,"Hello, how are you?"
1,Tuesday,Today is a good day!
2,Wednesday,It's my birthday so it's a really special day!
3,Thursday,Today is neither a good day or a bad day!
4,Friday,I'm having a bad day.
5,Saturday,There' s nothing special happening today.
6,Sunday,Today is a SUPER good day!


In [4]:
# Identify a scheme to categorize each comment as positive or negative. 

For this exercise I will apply sentiment analysis to the DailyComments.csv file using a lexicon-based, unsupervised approach built on the **AFINN** lexicon. AFINN is a list of words, each rated with an integer score between -5 (most negative) and +5 (most positive). Before assigning scores, the afinn package preprocesses the text by converting it to lower case and splitting it into words, so punctuation is ignored.\
The afinn package ships word lists for English and Danish, so if our text is in a language it does not cover, we need to translate it to English first. The Afinn object has a method, score(), which takes a text as input and returns a numeric score: positive, negative, or zero (neutral). The score of a text is the sum of the scores of the lexicon words it contains, so the score of a longer document is simply the sum over all of its sentences.
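
To make the scoring rule concrete, here is a minimal sketch of lexicon-based scoring in plain Python. It is not the afinn package's implementation: the three-word `SAMPLE_LEXICON`, the simplified tokenizer, and the helper names `matched_words` and `lexicon_score` are assumptions made for illustration, with valences chosen to be consistent with the scores reported in this notebook.

```python
import re

# Tiny illustrative lexicon (assumption): three words with valences that are
# consistent with the scores shown in this notebook.
SAMPLE_LEXICON = {"good": 3, "bad": -3, "super": 3}

# Simplified tokenizer (assumption): runs of letters and apostrophes.
WORD_PATTERN = re.compile(r"[a-z']+")


def matched_words(text, lexicon=SAMPLE_LEXICON):
    """Return (word, valence) pairs for every token found in the lexicon."""
    tokens = WORD_PATTERN.findall(text.lower())
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]


def lexicon_score(text, lexicon=SAMPLE_LEXICON):
    """Sum the valences of all matched words; 0.0 means no net sentiment."""
    return float(sum(value for _, value in matched_words(text, lexicon)))


for comment in ["Today is a SUPER good day!",
                "Today is neither a good day or a bad day!",
                "Hello, how are you?"]:
    print(comment, matched_words(comment), lexicon_score(comment))
```

Listing the matched words next to the total shows why a comment that mentions both "good" and "bad" can sum to zero, and why a comment with no lexicon words is also scored as neutral.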

In [5]:
# Afinn word list url
afinn_url = ('https://raw.githubusercontent.com/fnielsen/afinn/master/afinn/data/AFINN-111.txt')

In [6]:
# Read the AFINN word list into a dataframe
afinn_df = pd.read_csv(afinn_url,
                       header=None,             # the file has no header row
                       sep='\t',                # tab separated
                       names=['term', 'value']) # column names to assign

In [7]:
afinn_df

Unnamed: 0,term,value
0,abandon,-2
1,abandoned,-2
2,abandons,-2
3,abducted,-2
4,abduction,-2
...,...,...
2472,yucky,-2
2473,yummy,3
2474,zealot,-2
2475,zealots,-2
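
The word list is easiest to use as a term-to-value lookup. The following is a rough sketch using only the standard library, assuming the same tab-separated `term<TAB>value` layout; `SAMPLE_ROWS` holds three rows copied from the table above, and `load_lexicon` is a hypothetical helper name, not part of the afinn package.

```python
import csv
import io

# Three rows copied from the word list shown above, in the file's
# tab-separated "term<TAB>value" layout.
SAMPLE_ROWS = "abandon\t-2\nyummy\t3\nzealot\t-2\n"


def load_lexicon(handle):
    """Read tab-separated term/value rows into a dict of term -> int."""
    reader = csv.reader(handle, delimiter="\t")
    return {term: int(value) for term, value in reader}


lexicon = load_lexicon(io.StringIO(SAMPLE_ROWS))
print(lexicon.get("yummy"))        # valence of a listed term
print(lexicon.get("birthday", 0))  # terms missing from this sample default to 0
```

Defaulting missing terms to 0 mirrors the idea that words outside the lexicon contribute nothing to a comment's score.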


In [8]:
# Perform sentiment analysis with AFINN on the comments column in Daily_Comments
afinn = Afinn(emoticons=True)
afinn_scores = [afinn.score(text) for text in Daily_Comments.comments]
Daily_Comments['afinn'] = afinn_scores
Daily_Comments[['Day of Week', 'afinn', 'comments']]

Unnamed: 0,Day of Week,afinn,comments
0,Monday,0.0,"Hello, how are you?"
1,Tuesday,3.0,Today is a good day!
2,Wednesday,0.0,It's my birthday so it's a really special day!
3,Thursday,0.0,Today is neither a good day or a bad day!
4,Friday,-3.0,I'm having a bad day.
5,Saturday,0.0,There' s nothing special happening today.
6,Sunday,6.0,Today is a SUPER good day!


In [9]:
def sentiment_score(x):
    """Label an AFINN score as Positive (> 0), Neutral (== 0), or Negative (< 0)."""
    if x > 0.0:
        return "Positive"
    elif x == 0.0:
        return "Neutral"
    else:
        return "Negative"

In [10]:
sentiment = [sentiment_score(score) for score in Daily_Comments.afinn]
Daily_Comments['sentiment'] = sentiment

In [11]:
Daily_Comments[['Day of Week', 'afinn', 'comments', 'sentiment']]

Unnamed: 0,Day of Week,afinn,comments,sentiment
0,Monday,0.0,"Hello, how are you?",Neutral
1,Tuesday,3.0,Today is a good day!,Positive
2,Wednesday,0.0,It's my birthday so it's a really special day!,Neutral
3,Thursday,0.0,Today is neither a good day or a bad day!,Neutral
4,Friday,-3.0,I'm having a bad day.,Negative
5,Saturday,0.0,There' s nothing special happening today.,Neutral
6,Sunday,6.0,Today is a SUPER good day!,Positive


In [12]:
pd.crosstab(Daily_Comments.sentiment, Daily_Comments.afinn)

sentiment \ afinn,-3.0,0.0,3.0,6.0
Negative,1,0,0,0
Neutral,0,4,0,0
Positive,0,0,1,1
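
The row totals of the cross-tabulation can be checked by hand. This is a small sketch, assuming the seven AFINN scores listed in the Daily_Comments table above; the `label` helper is a hypothetical stand-in that applies the same sign rule as `sentiment_score`.

```python
from collections import Counter

# AFINN scores copied from the Daily_Comments table above (Monday to Sunday).
scores = [0.0, 3.0, 0.0, 0.0, -3.0, 0.0, 6.0]


def label(score):
    """Map a score to a label by its sign (same rule as sentiment_score)."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Count how many comments fall under each label.
counts = Counter(label(s) for s in scores)
print(counts)
```

Each count should match the sum of the corresponding row in the cross-tabulation.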


In [13]:
# For up to 5% extra credit, find another set of comments, e.g., some tweets, and perform the same sentiment analysis.

I will perform sentiment analysis using AFINN on tweets about the airline industry. These tweets were scraped from Twitter in February 2015.\
Kaggle link to the dataset: https://www.kaggle.com/crowdflower/twitter-airline-sentiment

In [14]:
df = pd.read_csv("tweets.csv")

In [15]:
# Draw a random sample of 100 tweets (no random_state is set, so the sample differs on each run)
tweets_df = df.sample(n=100)

In [16]:
afinn_scores_tweet = [afinn.score(x) for x in tweets_df.text]
tweets_df['afinn'] = afinn_scores_tweet
tweets_df[['text', 'afinn']]

Unnamed: 0,text,afinn
12960,"@AmericanAir thank you, truly appreciate the h...",6.0
1064,@united yes at 2am...but now back on a plane a...,1.0
12208,@AmericanAir When Flight Booking Problems an i...,-4.0
3728,@united thanks for not letting me switch fligh...,2.0
6633,@SouthwestAir flt 648 from Buf to MCO. Conf#FG...,0.0
...,...,...
4845,@SouthwestAir how long does it take for my Rap...,2.0
6121,@SouthwestAir why can't you take me to Knoxvil...,2.0
3332,@United my home for the next 8.5 hrs. 777 GFC...,3.0
6237,@SouthwestAir just added #passbook support to ...,2.0


In [17]:
sentiment_tweet = [sentiment_score(score) for score in tweets_df.afinn]

In [18]:
tweets_df['sentiment'] = sentiment_tweet

In [19]:
tweets_df[['text', 'afinn', 'sentiment']]

Unnamed: 0,text,afinn,sentiment
12960,"@AmericanAir thank you, truly appreciate the h...",6.0,Positive
1064,@united yes at 2am...but now back on a plane a...,1.0,Positive
12208,@AmericanAir When Flight Booking Problems an i...,-4.0,Negative
3728,@united thanks for not letting me switch fligh...,2.0,Positive
6633,@SouthwestAir flt 648 from Buf to MCO. Conf#FG...,0.0,Neutral
...,...,...,...
4845,@SouthwestAir how long does it take for my Rap...,2.0,Positive
6121,@SouthwestAir why can't you take me to Knoxvil...,2.0,Positive
3332,@United my home for the next 8.5 hrs. 777 GFC...,3.0,Positive
6237,@SouthwestAir just added #passbook support to ...,2.0,Positive
