<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Neural Nets for Sequential Data

-----
**OBJECTIVES**

- Explore Recurrent Architectures for sequential data
- Explore Convolutional Architectures for sequential data
- Use RNNs to model numeric time series data
- Use RNNs and CNNs to model text data
------

## The Recurrent Node

Compared to a conventional neuron, a recurrent neuron also takes its own output from the previous time step as an input, which lets it carry information forward along the sequence.

<center>
   <img src = https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Recurrent_neural_network_unfold.svg/440px-Recurrent_neural_network_unfold.svg.png />
</center>



### The Network Architecture

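```python
# pseudocode: W, U and b are the layer's learned weights
state_t = 0                       # initial state (in practice, a vector of zeros)
for input_t in input_sequence:    # iterate over the timesteps of one sequence
    output_t = tanh(W @ input_t + U @ state_t + b)
    state_t = output_t            # the output becomes the state for the next step
```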
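To make the loop above concrete, here is a minimal NumPy sketch of one simple-RNN layer's forward pass. The sizes and the randomly initialized `W`, `U`, and `b` are illustrative assumptions; in Keras these weights are learned during training.

```python
import numpy as np

def simple_rnn_forward(inputs, W, U, b):
    """Run a simple RNN over `inputs` (shape: timesteps x input_features).

    Returns the state at every timestep (timesteps x units);
    the last row is the layer's final output.
    """
    state_t = np.zeros(U.shape[0])          # initial state: a vector of zeros
    outputs = []
    for input_t in inputs:                  # one timestep at a time
        output_t = np.tanh(W @ input_t + U @ state_t + b)
        outputs.append(output_t)
        state_t = output_t                  # feed the output back in as the state
    return np.stack(outputs)

# Illustrative sizes (assumptions): 5 timesteps, 4 features, 3 units
rng = np.random.default_rng(0)
timesteps, input_features, units = 5, 4, 3
inputs = rng.normal(size=(timesteps, input_features))
W = rng.normal(size=(units, input_features))   # input-to-hidden weights
U = rng.normal(size=(units, units))            # hidden-to-hidden (recurrent) weights
b = np.zeros(units)

all_states = simple_rnn_forward(inputs, W, U, b)
print(all_states.shape)  # (5, 3)
```

The result holds one state vector per timestep; a Keras `SimpleRNN` layer returns only the last one unless `return_sequences=True`.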



### A Basic Sequence of Stock Prices

To begin, let's bring in stock data from Yahoo Finance using `pandas_datareader`.

In [None]:
import pandas_datareader as pdr
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
#!pip install -U pandas_datareader

In [None]:
#get apple stock
apple = pdr.get_data_yahoo("AAPL")
#apple2 = pd.read_csv('appl.csv', index_col = 0)

In [None]:
#take a peek
apple.head()

In [None]:
# apple.to_csv('appl.csv')

In [None]:
#plot the adjusted close
apple['Adj Close'].plot()

In [None]:
#look at the percent change
apple['Adj Close'].pct_change().plot()

In [None]:
#determine X and y
apple_pchange = apple.pct_change().dropna()
X = apple_pchange[['High', 'Low', 'Open', 'Volume']]
y2 = apple_pchange['Adj Close']

In [None]:
#train test split: no shuffle!


#make a classification problem
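One possible way to fill in this cell, assuming the classification target is whether the adjusted close rose (1) or not (0), which matches the sigmoid outputs and `bce` loss used by the models below. The small synthetic DataFrame stands in for `apple_pchange` so the sketch runs offline, and `test_size=0.2` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for apple_pchange (assumption: 10 daily rows of pct changes)
rng = np.random.default_rng(42)
dates = pd.date_range('2020-01-01', periods=10, freq='D')
apple_pchange = pd.DataFrame(rng.normal(scale=0.02, size=(10, 5)),
                             index=dates,
                             columns=['High', 'Low', 'Open', 'Volume', 'Adj Close'])

X = apple_pchange[['High', 'Low', 'Open', 'Volume']]
y2 = apple_pchange['Adj Close']

# Classification target: 1 if the adjusted close went up, else 0
y = (y2 > 0).astype(int)

# shuffle=False keeps rows in time order, so every test row comes after
# every training row (no look-ahead leakage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print(X_train.index.max() < X_test.index.min())  # True
```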

In [None]:
from sklearn.model_selection import train_test_split

### Scaling
------

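As usual, we need to scale our data before feeding it to the network. Fit the scaler on the training data only, then use it to transform both the training and test sets, so no information from the test period leaks into training.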

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
#instantiate


In [None]:
#fit and transform
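A sketch of the instantiate / fit-and-transform steps, using random arrays as hypothetical stand-ins for the chronologically split `X_train` and `X_test` (the shapes are assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical chronologically split feature arrays (assumption: 4 features)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(80, 4))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 4))

# Instantiate the scaler
ss = StandardScaler()

# Learn mean and std from the training data only, then apply the same
# transformation to both sets
X_train_sc = ss.fit_transform(X_train)
X_test_sc = ss.transform(X_test)
```

The test set is transformed with the training mean and standard deviation, so its columns will be close to, but not exactly, mean 0 and standard deviation 1.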


### `TimeseriesGenerator`
-------
This utility cuts the data into overlapping windows (sequences) of a length we specify, pairs each window with the target that follows it, and serves the windows in batches.

In [None]:
from keras.preprocessing.sequence import TimeseriesGenerator

In [None]:
#list(train_sequences)

In [None]:
#test sequences
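With the default `sampling_rate=1` and `stride=1`, `TimeseriesGenerator(data, targets, length=...)` pairs each window `data[i : i + length]` with the target `targets[i + length]`. The NumPy function below is a sketch of that windowing for inspection, not the Keras implementation; `make_windows` is a hypothetical helper, and unlike the generator it does not batch.

```python
import numpy as np

def make_windows(data, targets, length):
    """Hypothetical helper: build (window, target) pairs the way
    TimeseriesGenerator does with sampling_rate=1 and stride=1.

    Window i covers data[i : i + length]; its target is targets[i + length].
    """
    data, targets = np.asarray(data), np.asarray(targets)
    n_samples = len(data) - length
    X = np.stack([data[i:i + length] for i in range(n_samples)])
    y = targets[length:]
    return X, y

# Toy example: 6 timesteps, 2 features, windows of length 3
data = np.arange(12).reshape(6, 2)
targets = np.array([0, 1, 0, 1, 1, 0])
X_win, y_win = make_windows(data, targets, length=3)
print(X_win.shape)  # (3, 3, 2)
print(y_win)        # [1 1 0]
```

Each window is labeled with the *next* row's target, so the first `length` targets never appear as labels.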


### Model with `SimpleRNN`
--------

- 1 `SimpleRNN` layer
- 1 hidden `Dense` layer

In [None]:
from keras.models import Sequential
from keras.layers import SimpleRNN, LSTM, GRU, Dense

In [None]:
#build the network


In [None]:
#compilation


In [None]:
#fit it


In [None]:
#loss?


### The `LSTM` and `GRU` layers

In [None]:
#network with LSTM
model2 = Sequential()
model2.add(LSTM(16))
model2.add(Dense(50, activation = 'relu'))
model2.add(Dense(1, activation = 'sigmoid'))

In [None]:
#compile
model2.compile(loss = 'bce', metrics = ['accuracy'])

In [None]:
#fit
history = model2.fit(train_sequences, validation_data = test_sequences, epochs = 10)

In [None]:
#examine the loss
plt.plot(history.history['loss'], label = 'Train')
plt.plot(history.history['val_loss'], label = 'Val')
plt.legend();

In [None]:
plt.plot(history.history['accuracy'], label = 'Train')
plt.plot(history.history['val_accuracy'], label = 'Val')
plt.legend();

In [None]:
#GRU layer
model3 = Sequential()
model3.add(GRU(16))
model3.add(Dense(50, activation = 'relu'))
model3.add(Dense(1, activation = 'sigmoid'))

In [None]:
#compile
model3.compile(loss = 'bce', metrics = ['accuracy'])

In [None]:
#train_sequences.targets

In [None]:
#fit
history = model3.fit(train_sequences, validation_data = test_sequences, epochs = 10)

In [None]:
#examine the loss


In [None]:
#stacking layers
model4 = Sequential()
model4.add(GRU(32, return_sequences = True))
# model4.add(GRU(16, return_sequences = True))
# model4.add(GRU(16, return_sequences = True))
model4.add(GRU(16))
model4.add(Dense(100, activation = 'relu'))
model4.add(Dense(1, activation = 'sigmoid'))
model4.compile(loss = 'bce', metrics = ['accuracy'])
history = model4.fit(train_sequences, validation_data = test_sequences, epochs = 10)
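Stacking works because `return_sequences = True` makes the first recurrent layer emit its state at every timestep instead of only the last one, so the next recurrent layer still receives a sequence. The NumPy sketch below illustrates the shapes (ignoring the batch dimension) with two stacked simple-RNN layers, a simpler cell swapped in for `GRU`; all sizes and the random, untrained weights are assumptions.

```python
import numpy as np

def rnn_layer(inputs, units, rng, return_sequences=False):
    """Simple-RNN layer with random (untrained) weights, for shape illustration only.

    inputs: (timesteps, features), batch dimension omitted
    """
    W = rng.normal(scale=0.5, size=(units, inputs.shape[1]))  # input weights
    U = rng.normal(scale=0.5, size=(units, units))            # recurrent weights
    b = np.zeros(units)
    state = np.zeros(units)
    states = []
    for x_t in inputs:
        state = np.tanh(W @ x_t + U @ state + b)
        states.append(state)
    # Every timestep's state (needed by a following recurrent layer) or only the last
    return np.stack(states) if return_sequences else state

rng = np.random.default_rng(1)
sequence = rng.normal(size=(10, 4))      # 10 timesteps, 4 features

first = rnn_layer(sequence, 32, rng, return_sequences=True)   # like GRU(32, return_sequences = True)
second = rnn_layer(first, 16, rng)                            # like GRU(16)
print(first.shape, second.shape)  # (10, 32) (16,)
```

Without `return_sequences = True` on the first layer, the second would receive a single vector per sample instead of a sequence, and Keras rejects that because recurrent layers expect 3D input.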

### Practice

Use `pandas_datareader` to access stock data for a ticker of your choice. Build and compare several sequential models using `GRU` layers.

### Sequential Models for Text
-------

Next, we use the Keras `Tokenizer` to preprocess the SMS spam data and feed it through several sequential network architectures.

In [None]:
import pandas as pd
import numpy as np

In [None]:
from keras.preprocessing.text import Tokenizer

In [None]:
spam = pd.read_csv('data/sms_spam.csv')

In [None]:
spam.head()

### `Tokenizer`
------
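Here, we cap the vocabulary with `num_words = 500` (word indices start at 1, so only the 499 most frequent words are kept), fit the tokenizer on the texts, and transform each text into a sequence of integers with `.texts_to_sequences`. Because messages differ in length, we use the `pad_sequences` function to give every sequence the same length.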
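The pipeline described above can be approximated in plain Python to see what the integers mean. This is a simplified sketch, not the Keras code: the regex-based word splitting is an assumption (Keras uses its own punctuation filter), and `fit_word_index`, `simple_texts_to_sequences`, and `simple_pad` are hypothetical helpers. Words are ranked by frequency starting at index 1, words with index `>= num_words` are dropped, and padding and truncation both happen at the front, as with `pad_sequences`' defaults.

```python
import re
from collections import Counter

def words(text):
    # Simplified tokenization (assumption): lowercase, keep letters/digits/apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

def fit_word_index(texts):
    """Most frequent word gets index 1; ties keep first-seen order."""
    counts = Counter()
    for text in texts:
        counts.update(words(text))
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i + 1 for i, word in enumerate(ranked)}

def simple_texts_to_sequences(texts, word_index, num_words):
    """Map words to integers, dropping words whose index is >= num_words."""
    return [[word_index[w] for w in words(t) if word_index.get(w, num_words) < num_words]
            for t in texts]

def simple_pad(sequences, maxlen):
    """Keep the last `maxlen` tokens and left-pad with 0s."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

texts = ["free prize call now", "call me now", "see you now"]
word_index = fit_word_index(texts)
seqs = simple_texts_to_sequences(texts, word_index, num_words=5)
print(seqs)                        # [[3, 4, 2, 1], [2, 1], [1]]
print(simple_pad(seqs, maxlen=3))  # [[4, 2, 1], [0, 2, 1], [0, 0, 1]]
```

In the toy example, "me" gets index 5 and is dropped because `num_words=5` keeps only indices 1 to 4.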

In [None]:
#create a tokenizer and specify the vocabulary
tokenizer = Tokenizer(500)

In [None]:
#fit it on text
tokenizer.fit_on_texts(spam['text'])

In [None]:
#generate sequences
sequences = tokenizer.texts_to_sequences(spam['text'])

In [None]:
sequences[:3]

In [None]:
from keras.preprocessing.sequence import pad_sequences

In [None]:
#pad sequences to 100
X = pad_sequences(sequences, maxlen = 100)

In [None]:
#take a peek
X[0]

### Model
-------

In [None]:
from keras.layers import Embedding

In [None]:
#sequential model
text_model1 = Sequential()
#embedding layer
text_model1.add(Embedding(input_dim = tokenizer.num_words, output_dim = 64))
#simple RNN
text_model1.add(SimpleRNN(16))
#dense layer
text_model1.add(Dense(20, activation = 'relu'))
#output
text_model1.add(Dense(1, activation = 'sigmoid'))
#compilation
text_model1.compile(loss = 'bce', metrics = ['accuracy'])
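An `Embedding` layer is essentially a trainable lookup table with `input_dim` rows and `output_dim` columns: each integer token selects one row. A NumPy sketch of that lookup, with a random matrix standing in for the learned weights and a toy padded sequence (both assumptions):

```python
import numpy as np

input_dim, output_dim = 500, 64           # vocabulary size, embedding size
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(input_dim, output_dim))  # stands in for learned weights

padded_sequence = np.array([0, 0, 12, 7, 499])   # toy padded token ids
embedded = embedding_matrix[padded_sequence]      # row lookup, one row per token
print(embedded.shape)  # (5, 64)
```

Applied to the padded matrix `X`, this turns a batch of shape `(samples, 100)` into `(samples, 100, 64)`, which is the 3D input the `SimpleRNN` layer expects.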

In [None]:
#make y binary
y = np.where(spam['type'] == 'ham', 0, 1)

In [None]:
#baseline?
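A natural baseline is the accuracy of always predicting the majority class; the network is only useful if it beats this. A sketch with a tiny hypothetical label array in place of `y`:

```python
import numpy as np

# Hypothetical labels (assumption): 0 = ham, 1 = spam
y = np.array([0, 0, 0, 1, 0, 1, 0, 0])

# Share of each class; the larger one is the majority-class baseline accuracy
class_shares = np.bincount(y) / len(y)
baseline_accuracy = class_shares.max()
print(class_shares)       # [0.75 0.25]
print(baseline_accuracy)  # 0.75
```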


In [None]:
#fit it


### Improving the Model
-----

- `LSTM` layers
- `GRU` layers
- `recurrent_dropout`
- `dropout`
- `Bidirectional` layers
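Of these, `Bidirectional` is the least obvious: it runs one copy of the wrapped layer over the sequence forward and a second copy, with its own weights, over the reversed sequence, then (with the default `merge_mode='concat'`) concatenates the two outputs, so `Bidirectional(GRU(16))` produces 32 values. The NumPy sketch below shows the idea with a simple RNN cell swapped in for `GRU`; weights and sizes are illustrative assumptions.

```python
import numpy as np

def rnn_last_state(inputs, W, U, b):
    """Final state of a simple RNN run over `inputs` (timesteps x features)."""
    state = np.zeros(U.shape[0])
    for x_t in inputs:
        state = np.tanh(W @ x_t + U @ state + b)
    return state

rng = np.random.default_rng(0)
timesteps, features, units = 6, 8, 16
sequence = rng.normal(size=(timesteps, features))

def random_weights():
    return (rng.normal(scale=0.3, size=(units, features)),
            rng.normal(scale=0.3, size=(units, units)),
            np.zeros(units))

forward_weights = random_weights()    # separate weights for each direction
backward_weights = random_weights()

forward_out = rnn_last_state(sequence, *forward_weights)
backward_out = rnn_last_state(sequence[::-1], *backward_weights)  # reversed in time
bidirectional_out = np.concatenate([forward_out, backward_out])
print(bidirectional_out.shape)  # (32,)
```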

In [None]:
# model = Sequential()
# model.add(Embedding(input_dim = tokenizer.num_words, output_dim = 64))
# model.add(GRU(16))
# model.add(Dense(20, activation = 'relu'))
# model.add(Dense(1, activation = 'sigmoid'))
# model.compile(loss = 'bce', optimizer = 'adam', metrics = ['acc'])
# model.fit(X, y, epochs = 10)

In [None]:
# model = Sequential()
# model.add(Embedding(input_dim = tokenizer.num_words, output_dim = 64))
# model.add(GRU(16, recurrent_dropout = 0.2))
# model.add(Dense(20, activation = 'relu'))
# model.add(Dense(1, activation = 'sigmoid'))
# model.compile(loss = 'bce', optimizer = 'adam', metrics = ['acc'])
# model.fit(X, y)

In [None]:
# model = Sequential()
# model.add(Embedding(input_dim = tokenizer.num_words, output_dim = 64))
# model.add(GRU(16, dropout = 0.2, recurrent_dropout = 0.2))
# model.add(Dense(20, activation = 'relu'))
# model.add(Dense(1, activation = 'sigmoid'))
# model.compile(loss = 'bce', optimizer = 'adam', metrics = ['acc'])
# model.fit(X, y)

In [None]:
from keras.layers import Bidirectional

In [None]:
# model = Sequential()
# model.add(Embedding(input_dim = tokenizer.num_words, output_dim = 64))
# model.add(Bidirectional(GRU(16)))
# model.add(Dense(20, activation = 'relu'))
# model.add(Dense(1, activation = 'sigmoid'))
# model.compile(loss = 'bce', optimizer = 'adam', metrics = ['acc'])
# model.fit(X, y)

### Convolutional Networks in 1D
--------

In [None]:
from keras.layers import Conv1D, MaxPooling1D
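`Conv1D` slides a window of `kernel_size` timesteps along the sequence and, for each filter, takes a dot product with that window (a cross-correlation). With `padding='valid'` and stride 1 the output has `timesteps - kernel_size + 1` steps. `MaxPooling1D` then keeps the maximum of each non-overlapping block of `pool_size` steps. A NumPy sketch of both, where the random filters, the ReLU, and the sizes are illustrative assumptions:

```python
import numpy as np

def conv1d(inputs, kernels, bias):
    """Valid 1D convolution (cross-correlation), stride 1.

    inputs:  (timesteps, in_channels)
    kernels: (kernel_size, in_channels, filters)
    returns: (timesteps - kernel_size + 1, filters)
    """
    kernel_size = kernels.shape[0]
    steps = inputs.shape[0] - kernel_size + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        window = inputs[t:t + kernel_size]              # (kernel_size, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

def max_pool1d(inputs, pool_size=2):
    """Non-overlapping max pooling over time; leftover steps are dropped."""
    steps = inputs.shape[0] // pool_size
    trimmed = inputs[:steps * pool_size]
    return trimmed.reshape(steps, pool_size, -1).max(axis=1)

rng = np.random.default_rng(0)
embedded = rng.normal(size=(100, 64))        # e.g. one embedded, padded message
kernels = rng.normal(size=(5, 64, 32))       # kernel_size=5, 32 filters
features = np.maximum(conv1d(embedded, kernels, np.zeros(32)), 0)  # ReLU
pooled = max_pool1d(features, pool_size=2)
print(features.shape, pooled.shape)  # (96, 32) (48, 32)
```

Unlike a recurrent layer, each output step here depends only on a local window of 5 tokens, which is why 1D convolutions are often cheaper but less order-aware over long ranges.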