# Sentiment analysis using recurrent neural networks
One of the things I find fascinating about applying machine learning algorithms to natural language processing tasks is how you can take a bunch of words that actually mean something, convert them to [a bunch of numbers](), and the computer is then able to work out what those numbers represent in the vocabulary of the NLP task at hand.

In this notebook, I will use a [recurrent neural network]() to perform [sentiment analysis]() on a given dataset. The task at hand is as straightforward as NLP classification tasks can be. However, I will spend some time explaining the reasoning behind each data pre-processing step taken to turn the input data (words) into "a bunch of numbers" that ML/DL algorithms are familiar with.
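
To make the "bunch of numbers" idea concrete before going further, here is a minimal sketch of one common way to do it: build a vocabulary and replace each word with its integer id. The `<pad>` and `<unk>` tokens, the `encode` helper and the second toy sentence are assumptions made for illustration, not necessarily the exact pre-processing used later in this notebook.

```python
# Toy sentences for illustration; the second one is invented.
sentences = [
    "i really enjoyed playing fifa today",
    "i really disliked the delay today",
]

# Reserve 0 for padding and 1 for words that are not in the vocabulary.
PAD, UNK = "<pad>", "<unk>"
vocab = {PAD: 0, UNK: 1}
for sentence in sentences:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)


def encode(sentence):
    """Map each whitespace-separated word to its integer id (hypothetical helper)."""
    return [vocab.get(word, vocab[UNK]) for word in sentence.lower().split()]


encoded = encode("I really enjoyed the delay")
print(encoded)  # [2, 3, 4, 9, 10]
```

Words that were never seen while building the vocabulary all map to the same `<unk>` id, so the model can still accept them as input.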

### Recurrent neural networks
Recurrent neural networks (RNNs) are suitable for deep learning tasks that take in a sequence of data as input. The individual elements of the sequence usually tell a complete story as a whole but may not offer meaningful information on their own. Also, the number of elements in the sequence does not need to be fixed for a given ML task, as RNNs are flexible in dealing with inputs of variable length.
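
The flexibility with input length comes from reusing the same weights at every step of the sequence. The sketch below is a bare-bones, pure-Python recurrent cell with scalar inputs and random, untrained weights. The hidden size, the weight ranges and the `rnn_forward` name are arbitrary choices for illustration, not the model built in this notebook.

```python
import math
import random

random.seed(0)
HIDDEN = 3  # size of the hidden state (arbitrary for this sketch)

# One set of weights, shared across every time step.
W_x = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]  # input -> hidden
W_h = [
    [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)
]  # hidden -> hidden
b = [0.0] * HIDDEN


def rnn_forward(sequence):
    """Feed the sequence one element at a time and return the final hidden state."""
    h = [0.0] * HIDDEN
    for x in sequence:
        h = [
            math.tanh(
                W_x[i] * x + sum(W_h[i][j] * h[j] for j in range(HIDDEN)) + b[i]
            )
            for i in range(HIDDEN)
        ]
    return h


shorter = rnn_forward([0.2, -0.1])
longer = rnn_forward([0.2, -0.1, 0.4, 0.9, -0.3])
print(len(shorter), len(longer))  # 3 3
```

Both sequences end up summarised in a hidden state of the same size, even though one has two elements and the other has five.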

Good examples of tasks suitable for RNNs are:
1. Predicting the price of a stock based on its prices in the last 30 days.
2. Predicting the next word in a sentence given the preceding words.
3. Speech-to-text transcription.
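
As a rough illustration of the first example, a price history can be cut into (input sequence, target) pairs, where each input is a window of consecutive prices and the target is the price that follows. The `make_windows` helper and the synthetic prices are made up for this sketch.

```python
def make_windows(prices, window=30):
    """Pair each run of `window` consecutive prices with the price that follows it."""
    pairs = []
    for start in range(len(prices) - window):
        inputs = prices[start:start + window]
        target = prices[start + window]
        pairs.append((inputs, target))
    return pairs


# Synthetic prices, invented for illustration only.
prices = [100.0 + 0.5 * day for day in range(35)]
pairs = make_windows(prices, window=30)
print(len(pairs))  # 5
```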

### The task at hand
Human beings routinely express their pleasure (or displeasure) about particular experiences. It could be whether a movie you watched was good (or not), if your commute to work was pleasant or stressful, if you enjoyed your lunch or if you had a tedious experience getting an insurance company to honour your claim. The question to answer here is "based on what this person said, do they feel good or feel bad about their experience?"

The goal of sentiment analysis in NLP is to enable a machine to answer this question. For example, you want a machine to be able to look at the following sentence

> I really enjoyed playing FIFA today

and indicate that the user "felt good" (or "felt bad" or did not feel anything at all) about this.
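
Framed as a classification problem, the three possible answers become three classes. The sketch below assumes the label names `negative`, `neutral` and `positive` and an arbitrary ordering. The `one_hot` helper and the label attached to the example sentence are illustrative assumptions.

```python
# Three sentiment classes; the ordering is an arbitrary choice for this sketch.
LABELS = ["negative", "neutral", "positive"]
label_to_id = {label: index for index, label in enumerate(LABELS)}


def one_hot(label):
    """Return a one-hot vector for a sentiment label (hypothetical helper)."""
    vector = [0] * len(LABELS)
    vector[label_to_id[label]] = 1
    return vector


# Assumed label for the example sentence above.
text, label = ("I really enjoyed playing FIFA today", "positive")
print(label_to_id[label], one_hot(label))  # 2 [0, 0, 1]
```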

<img src="https://vitalflux.com/wp-content/uploads/2021/10/sentiment-analysis-machine-learning-techniques.png" alt="Sentiment analysis" style="width: 500px;"/>


### Dataset
The [Twitter US airline sentiment](https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?resource=download) dataset will be used for this task.

In [1]:
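import csv

# Read the CSV and keep only the tweet text and its sentiment label.
# newline="" lets the csv module handle line breaks inside quoted tweets;
# the utf-8 encoding is an assumption about how the file was saved.
with open("twitter-us-airline-sentiment.csv", newline="", encoding="utf-8") as csv_file:
    csvreader = csv.reader(csv_file)

    # The first row holds the column names.
    headers = next(csvreader)
    sentiment_index = headers.index("airline_sentiment")
    tweet_index = headers.index("text")

    dataset = []
    for row in csvreader:
        sentiment = row[sentiment_index]
        text = row[tweet_index]
        dataset.append((text, sentiment))

print("{} items in the dataset".format(len(dataset)))
dataset[0:20]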

14640 items in the dataset


[('@VirginAmerica What @dhepburn said.', 'neutral'),
 ("@VirginAmerica plus you've added commercials to the experience... tacky.",
  'positive'),
 ("@VirginAmerica I didn't today... Must mean I need to take another trip!",
  'neutral'),
 ('@VirginAmerica it\'s really aggressive to blast obnoxious "entertainment" in your guests\' faces &amp; they have little recourse',
  'negative'),
 ("@VirginAmerica and it's a really big bad thing about it", 'negative'),
 ("@VirginAmerica seriously would pay $30 a flight for seats that didn't have this playing.\nit's really the only bad thing about flying VA",
  'negative'),
 ('@VirginAmerica yes, nearly every time I fly VX this “ear worm” won’t go away :)',
  'positive'),
 ('@VirginAmerica Really missed a prime opportunity for Men Without Hats parody, there. https://t.co/mWpG7grEZP',
  'neutral'),
 ("@virginamerica Well, I didn't…but NOW I DO! :-D", 'positive'),
 ("@VirginAmerica it was amazing, and arrived an hour early. You're too good to me.",
  '