# Twitter Sentiment Analysis

This notebook is an attempt to build a model that can determine the sentiment behind a Tweet posted on a specific topic.  
I will be using a dataset of Tweets collected through the Twitter API.  
The dataset is available [here](https://www.kaggle.com/kazanova/sentiment140).

Citation for the data: Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.

The dataset is Sentiment140. It contains 1,600,000 tweets extracted using the Twitter API. Each tweet has been annotated with a polarity (0 = negative, 4 = positive), so the data can be used to train a sentiment classifier.

Note: the training data is not perfectly categorised, because the labels were assigned automatically according to the emoticons present in each tweet. Any model built on this dataset may therefore have lower than expected accuracy.
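
To make this caveat concrete, here is a minimal sketch of emoticon-based labelling. The emoticon lists and the `emoticon_label` helper are assumptions made for illustration; they are not the exact rules used to build the dataset.

```python
from typing import Optional

# Assumed emoticon lists, for illustration only.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def emoticon_label(text: str) -> Optional[int]:
    """Return 4 (positive), 0 (negative), or None if the signal is missing or mixed."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all, or both kinds: no usable label.
        return None
    return 4 if has_pos else 0


examples = [
    "Lyx is cool :)",
    "Need a hug :(",
    "great, my phone died :)",  # sarcasm still receives the positive label
]
labels = [emoticon_label(t) for t in examples]
print(labels)  # [4, 0, 4]
```

The last example is labelled positive purely because of the `:)`, which is the kind of labelling noise described above.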

It contains the following six fields:

- sentiment: the polarity of the tweet (0 = negative, 4 = positive)
- ids: the id of the tweet (e.g. 2087)
- date: the date of the tweet (e.g. Sat May 16 23:58:44 UTC 2009)
- flag: the query (e.g. lyx). If there is no query, then this value is NO_QUERY.
- user: the user that tweeted (e.g. robotickilldozr)
- text: the text of the tweet (e.g. Lyx is cool)
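
Because the polarity is coded as 0 and 4, it is often convenient to remap it to 0 and 1 before training a binary classifier. The sketch below assumes a hypothetical three-row sample and a new `label` column; neither is part of the dataset itself.

```python
import pandas as pd

# Hypothetical sample that follows the label scheme described above.
sample = pd.DataFrame({
    "Sentiment": [0, 4, 4],
    "text": ["Need a hug", "Lyx is cool", "what a day"],
})

# Map the original polarity codes to 0 (negative) and 1 (positive).
label_map = {0: 0, 4: 1}
sample["label"] = sample["Sentiment"].map(label_map)

print(sample["label"].tolist())  # [0, 1, 1]
```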


## Importing Necessary Packages

In [2]:
# Utilities
import numpy as np
import pandas as pd



## Loading the Dataset

In [5]:
colNames = ['Sentiment', 'id', 'date', 'flag', 'user', 'text']

df = pd.read_csv('data/training.1600000.processed.noemoticon.csv', encoding = "ISO-8859-1", names = colNames)
df.head()

| | Sentiment | id | date | flag | user | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |
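
A quick sanity check after loading might look like the following sketch. It reads a two-row in-memory stand-in for the CSV so that it runs without the data file; the polarity of 4 on the second row is an assumption for illustration. As in the cell above, the file is read as having no header row, which is why `names=` is passed.

```python
import io

import pandas as pd

# Two-row stand-in for the real file (no header row, six comma-separated fields).
raw = io.StringIO(
    "0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,"
    "my whole body feels itchy and like its on fire\n"
    "4,2087,Sat May 16 23:58:44 UTC 2009,lyx,robotickilldozr,Lyx is cool\n"
)
col_names = ["Sentiment", "id", "date", "flag", "user", "text"]

sample = pd.read_csv(raw, names=col_names)

# Shape and class balance: with the full dataset, shape[0] should be 1,600,000.
print(sample.shape)
counts = sample["Sentiment"].value_counts()
print(counts)
```

On the full DataFrame, the same `value_counts()` call shows how many tweets fall into each polarity class.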


Since only the *Sentiment* and *text* columns are required for sentiment analysis, the other columns will be dropped.

In [11]:
df = df[['Sentiment', 'text']]
df.head()

| | Sentiment | text |
|---|---|---|
| 0 | 0 | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | is upset that he can't update his Facebook by ... |
| 2 | 0 | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | my whole body feels itchy and like its on fire |
| 4 | 0 | @nationwideclass no, it's not behaving at all.... |
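
As an alternative sketch, the unused columns can be skipped at load time with the `usecols` parameter of `pd.read_csv`, so they are never held in memory. The in-memory sample below is a stand-in for the real file, and the polarity of 4 on its second row is an assumption for illustration.

```python
import io

import pandas as pd

# Stand-in for the real file (no header row, six comma-separated fields).
raw = io.StringIO(
    "0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,"
    "my whole body feels itchy and like its on fire\n"
    "4,2087,Sat May 16 23:58:44 UTC 2009,lyx,robotickilldozr,Lyx is cool\n"
)
col_names = ["Sentiment", "id", "date", "flag", "user", "text"]

# Name all six fields, but keep only the two needed for sentiment analysis.
slim = pd.read_csv(raw, names=col_names, usecols=["Sentiment", "text"])

print(list(slim.columns))  # ['Sentiment', 'text']
```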


In [15]:
df.loc[df['Sentiment'] == 0]

| | Sentiment | text |
|---|---|---|
| 0 | 0 | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | is upset that he can't update his Facebook by ... |
| 2 | 0 | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | my whole body feels itchy and like its on fire |
| 4 | 0 | @nationwideclass no, it's not behaving at all.... |
| 5 | 0 | @Kwesidei not the whole crew |
| 6 | 0 | Need a hug |
| 7 | 0 | @LOLTrish hey long time no see! Yes.. Rains a... |
| 8 | 0 | @Tatiana_K nope they didn't have it |
| 9 | 0 | @twittera que me muera ? |
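
The selection above works by building a boolean mask (`df['Sentiment'] == 0`) and passing it to `.loc`, which keeps only the rows where the mask is `True`. A minimal sketch of the same idea on a hypothetical three-row sample, with the positive label on the last row assumed for illustration:

```python
import pandas as pd

# Hypothetical sample; the real DataFrame has the same two columns.
sample = pd.DataFrame({
    "Sentiment": [0, 0, 4],
    "text": ["Need a hug", "@Kwesidei not the whole crew", "Lyx is cool"],
})

# Boolean Series: True where the tweet is labelled negative.
is_negative = sample["Sentiment"] == 0

negatives = sample.loc[is_negative]
# Everything that is not 0; with the 0/4 scheme that means the positive rows.
positives = sample.loc[~is_negative]

print(len(negatives), len(positives))  # 2 1
```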
