# Sequence Models - RNNs

We have developed some fairly sophisticated NLP models, but none has made use of the *order* of words, with the partial exception of word embeddings.

What might our models therefore be missing out on?

The idea of an RNN is to _remember_ recent inputs as it works through a sequence. To do so, the network uses not just inputs and weights but also _internal states_, which carry information forward from previous time-steps.
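
To make the role of the internal state concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The weight names (`W_xh`, `W_hh`, `b_h`), the sizes, and the random inputs are all illustrative assumptions, not anything taken from this notebook's models.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence; return the hidden state at every step."""
    hidden = np.zeros(W_hh.shape[0])  # initial internal state: all zeros
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
n_features, n_hidden, n_steps = 3, 4, 5  # illustrative sizes
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

sequence = rng.normal(size=(n_steps, n_features))
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4): one hidden state per time-step
```

Because each state feeds into the next, feeding the same inputs in a different order generally produces a different final state, which is exactly the sensitivity to order that a bag-of-words model lacks.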

Recurrent models process data *in sequence*, so they are useful for analyzing:

- time-series data
- text
- drawings

Here's a demo of a drawing-to-'photo' network: [Image-to-Image](https://affinelayer.com/pixsrv/)

In [1]:
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import to_categorical
from sklearn.feature_extraction.text import TfidfVectorizer

Using TensorFlow backend.


## Reading in the Twitter data

In [28]:
import pandas as pd
tweets = pd.read_csv('Tweets.csv')
tweets.head()

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,570306133677760513,neutral,1.0,,,Virgin America,,cairdin,,0,@VirginAmerica What @dhepburn said.,,2015-02-24 11:35:52 -0800,,Eastern Time (US & Canada)
1,570301130888122368,positive,0.3486,,0.0,Virgin America,,jnardino,,0,@VirginAmerica plus you've added commercials t...,,2015-02-24 11:15:59 -0800,,Pacific Time (US & Canada)
2,570301083672813571,neutral,0.6837,,,Virgin America,,yvonnalynn,,0,@VirginAmerica I didn't today... Must mean I n...,,2015-02-24 11:15:48 -0800,Lets Play,Central Time (US & Canada)
3,570301031407624196,negative,1.0,Bad Flight,0.7033,Virgin America,,jnardino,,0,@VirginAmerica it's really aggressive to blast...,,2015-02-24 11:15:36 -0800,,Pacific Time (US & Canada)
4,570300817074462722,negative,1.0,Can't Tell,1.0,Virgin America,,jnardino,,0,@VirginAmerica and it's a really big bad thing...,,2015-02-24 11:14:45 -0800,,Pacific Time (US & Canada)


In [29]:
tweets.shape

(14640, 15)

In [193]:
# Let's look at the text of the first tweet.



In [194]:
# We'll use the translate method to eliminate the punctuation
# from our strings.
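
One way to do this, sketched on the first tweet's text as it appears in the `head()` output above. The variable names are illustrative, and the author's own cell may have differed.

```python
import string

# Text of the first tweet, copied from the head() output.
sample = "@VirginAmerica What @dhepburn said."

# With a third argument, str.maketrans maps each listed character to None,
# so translate() deletes it from the string.
remove_punct = str.maketrans('', '', string.punctuation)
cleaned = sample.translate(remove_punct)
print(cleaned)  # VirginAmerica What dhepburn said
```

Note that `string.punctuation` includes `@`, so the mentions lose their marker too. On a whole column, pandas offers `Series.str.translate`, which accepts the same translation table.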



In [195]:
# Let's make sure that worked.



In [196]:
# The TF-IDF Vectorizer will look for words uncommon in the corpus
# as a whole.



In [197]:
# We'll transform the data, turn it into a dense matrix, and
# then into a DataFrame!
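
A minimal sketch of the fit, transform, densify, and wrap steps on a toy corpus. The three sentences are made up for illustration and stand in for the cleaned tweet text.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the cleaned tweets (an assumption).
corpus = [
    "the flight was late",
    "the flight was great",
    "the crew was friendly",
]

vectorizer = TfidfVectorizer()
sparse_matrix = vectorizer.fit_transform(corpus)  # SciPy sparse matrix

# Order the column names by the index each term received during fitting.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
tfidf_df = pd.DataFrame(sparse_matrix.toarray(), columns=terms)
print(tfidf_df.shape)  # (3, 7): three documents, seven distinct terms
```

In the first row, "late" gets a larger weight than "the": both occur once in that document, but "the" appears in every document and so carries less information.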



In [198]:
# How big is our DataFrame now?



In [201]:
# Examining our target



In [202]:
# Coding up our target
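
A sketch of one possible encoding, using the five `airline_sentiment` values visible in the `head()` output. Which integer goes with which class is an arbitrary choice made here for illustration.

```python
import pandas as pd

# The first five airline_sentiment values, copied from the head() output.
sentiment = pd.Series(['neutral', 'positive', 'neutral', 'negative', 'negative'])

# Assumed mapping; any consistent assignment of integers to classes would do.
label_map = {'negative': 0, 'neutral': 1, 'positive': 2}
y = sentiment.map(label_map)
print(y.tolist())  # [1, 2, 1, 0, 0]
```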



In [203]:
# Defining X and y



In [204]:
# Train-Test Splitting
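
A sketch of the split on toy arrays. The data, the 25% test fraction, and the `random_state` are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix (20 rows, 2 columns) and integer class labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1, 2, 0, 1] * 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)
```

With imbalanced classes, passing `stratify=y` keeps the class proportions roughly equal across the two sets.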



In [205]:
# Categoricalizing our target
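
"Categoricalizing" here means one-hot encoding the integer labels. The sketch below does it with plain NumPy so that it runs without Keras. `to_categorical(y, num_classes=3)` from `keras.utils` yields the same layout. The label values are the illustrative ones used for the first five tweets.

```python
import numpy as np

y = np.array([1, 2, 1, 0, 0])  # integer class labels (illustrative)
n_classes = 3

# Row k of the identity matrix is the one-hot vector for class k.
y_onehot = np.eye(n_classes)[y]
print(y_onehot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [1. 0. 0.]]
```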



In [206]:
# Building a dense NN
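
A sketch of a small dense classifier, carried through compiling and fitting so that the recorded history can be inspected. The layer sizes, epoch count, and random stand-in data are all assumptions; the real inputs would be the TF-IDF features and one-hot targets.

```python
import numpy as np
from keras import Input
from keras.models import Sequential
from keras.layers import Dense

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 60, 20, 3  # illustrative sizes

# Random stand-ins for the TF-IDF features and one-hot targets.
X_toy = rng.random((n_samples, n_features)).astype('float32')
y_toy = np.eye(n_classes, dtype='float32')[rng.integers(0, n_classes, size=n_samples)]

dense_model = Sequential([
    Input(shape=(n_features,)),
    Dense(16, activation='relu'),
    Dense(n_classes, activation='softmax'),  # one probability per class
])
dense_model.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])

# fit() returns a History object; .history maps metric names to per-epoch lists.
history = dense_model.fit(X_toy, y_toy, validation_split=0.2,
                          epochs=3, batch_size=16, verbose=0)
print(sorted(history.history))
```

Plotting `history.history['loss']` against `history.history['val_loss']` with matplotlib is a quick way to spot overfitting.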



In [207]:
# Compiling



In [208]:
# Recording the history of fit



## This time with an LSTM

A long short-term memory (LSTM) network is a kind of RNN. Architecturally speaking, each LSTM cell has a *forget* gate in addition to input and output gates. The forget gate controls the extent to which previous states affect future states.
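
To illustrate what the forget gate does, here is a NumPy sketch of a single LSTM time step using the standard gate equations. The stacked parameter layout (forget, input, candidate, output) and all names are assumptions made for this example; Keras stores its LSTM weights in its own internal layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack four blocks: forget, input, candidate, output."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    forget = sigmoid(z[:n])               # how much of the old cell state to keep
    write = sigmoid(z[n:2 * n])           # input gate: how much new content to add
    candidate = np.tanh(z[2 * n:3 * n])   # proposed new content
    output = sigmoid(z[3 * n:])           # how much of the cell state to expose
    c_new = forget * c_prev + write * candidate
    h_new = output * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4  # illustrative sizes
W = rng.normal(size=(4 * n_hidden, n_features))
U = rng.normal(size=(4 * n_hidden, n_hidden))
x_t = rng.normal(size=n_features)
h_prev = np.zeros(n_hidden)

# A very negative forget bias closes the gate; a very positive one opens it.
b_closed = np.zeros(4 * n_hidden)
b_closed[:n_hidden] = -50.0
b_open = np.zeros(4 * n_hidden)
b_open[:n_hidden] = 50.0

ones, zeros = np.ones(n_hidden), np.zeros(n_hidden)
_, c_closed_from_ones = lstm_step(x_t, h_prev, ones, W, U, b_closed)
_, c_closed_from_zeros = lstm_step(x_t, h_prev, zeros, W, U, b_closed)
_, c_open_from_ones = lstm_step(x_t, h_prev, ones, W, U, b_open)
_, c_open_from_zeros = lstm_step(x_t, h_prev, zeros, W, U, b_open)

# Gate closed: the old cell state makes no difference to the new one.
print(np.allclose(c_closed_from_ones, c_closed_from_zeros))
# Gate open: the old cell state passes through in full.
print(np.allclose(c_open_from_ones - c_open_from_zeros, ones))
```

In a trained network the gate values are learned and usually fall somewhere between these two extremes.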

Because of this capacity for memory, LSTMs (and RNNs in general) have been compared to [Turing machines](https://medium.com/@Lordunlocked/turing-machines-what-are-they-and-why-you-should-care-aaf030c37d40), and there is some debate about whether it makes sense to think of RNNs as Turing-complete.

In [176]:
# LSTMs can take a long time to set up and to train, so
# I'm only going to use the first 1000 tweets.



In [209]:
# TF-IDF



In [210]:
# DataFrame



In [211]:
# X and y



In [212]:
# Split



In [213]:
# Categoricalize



In [214]:
# 3-d-ing for the LSTM
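
Keras recurrent layers expect input of shape `(samples, timesteps, features)`. A sketch of one common reshape, which treats each row as a sequence of length 1. Whether the original cell used this layout is an assumption, and the toy array stands in for the TF-IDF matrix.

```python
import numpy as np

# Toy 2-D matrix: 8 samples, 5 features.
X_train = np.random.default_rng(0).random((8, 5))

# Insert a timesteps axis of length 1 between samples and features.
X_train_3d = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(X_train_3d.shape)  # (8, 1, 5)
```

With a single time step the LSTM has no earlier state to draw on within a sample, so this layout satisfies the layer's input requirements without giving it any word order to exploit.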



In [215]:
# Model build
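
A sketch of a minimal LSTM classifier. The unit count, input sizes, and random data are assumptions, and the model is left untrained because the point is only to show the shapes going in and coming out.

```python
import numpy as np
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, LSTM

n_steps, n_features, n_classes = 1, 20, 3  # illustrative sizes

lstm_model = Sequential([
    Input(shape=(n_steps, n_features)),      # (timesteps, features) per sample
    LSTM(8),                                 # returns only the final hidden state
    Dense(n_classes, activation='softmax'),  # one probability per class
])

# Random stand-in for a batch of four 3-D inputs.
X_toy = np.random.default_rng(0).random((4, n_steps, n_features)).astype('float32')
probs = lstm_model.predict(X_toy, verbose=0)
print(probs.shape)  # (4, 3)
```

Compiling and fitting then work the same way as for a dense model, with `categorical_crossentropy` as a natural loss for one-hot targets.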



In [216]:
# Compile




In [217]:
# Fitting our model

