# Recurrent Neural Network

Recurrent neural networks (RNNs) contain recurrent layers, which process a sequence one step at a time and feed the hidden state from each step into the next. They are suitable for natural language processing (NLP) and time series modelling, though for the former they have been supplanted by Transformer-based models in recent years.

Two things to note when using an RNN:
1. **Samples must have the same number of features**, i.e. the same sequence length.
   Truncate or pad each sample as needed.
2. **RNNs are slow to train**.
   Start with a small subsample of the data and
   move to the full data only after you have verified that your model is working.


## Text Data

We will use the IMDB movie review dataset to illustrate how to use an RNN for natural language processing.

### A. Load Data

We first load the IMDB data, then process it. The two most important processing choices are:
1. *How many unique words to keep?* Words that are too infrequent should be ignored because there will not be enough data to figure out their meaning. All such words are converted to a special out-of-vocabulary token.
2. *How many features?* In the context of NLP, this translates to how many words each sample is allowed to have. Longer sequences are truncated while shorter ones are padded with a special value, usually `0`.
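
The two choices above can be illustrated with a small NumPy sketch. The helper names `cap_vocabulary` and `pad_or_truncate` are hypothetical, and the out-of-vocabulary id `2` is an assumption chosen to match the Keras IMDB default. The padding helper pads and truncates at the front of each sequence, which mirrors the default behaviour of `pad_sequences`.

```python
import numpy as np

OOV_ID = 2  # assumed id for out-of-vocabulary words
PAD_ID = 0  # value used for padding

def cap_vocabulary(seq, num_words, oov_id=OOV_ID):
    """Replace every word id >= num_words with the out-of-vocabulary id."""
    return [w if w < num_words else oov_id for w in seq]

def pad_or_truncate(seqs, maxlen, pad_id=PAD_ID):
    """Left-pad short sequences and keep only the last `maxlen` ids of long ones."""
    out = np.full((len(seqs), maxlen), pad_id, dtype="int32")
    for i, seq in enumerate(seqs):
        kept = list(seq)[-maxlen:]
        if kept:  # an empty sequence stays all padding
            out[i, -len(kept):] = kept
    return out

# Two toy "reviews" encoded as word ids
reviews = [[5, 120, 7], [9, 3000, 14, 8, 41, 6]]
capped = [cap_vocabulary(r, num_words=1000) for r in reviews]
x = pad_or_truncate(capped, maxlen=4)
print(x)
# [[  0   5 120   7]
#  [ 14   8  41   6]]
```

In the second review, the id `3000` is first replaced by the out-of-vocabulary id, and then only the last four ids survive truncation.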

We will also take a random subsample of data to speed up training in class.

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from sklearn.utils import resample

# Load data
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)

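# Resample (without replacement, so that no review is duplicated)
x_train, y_train = resample(x_train, y_train, n_samples=1000, replace=False)
x_test, y_test = resample(x_test, y_test, n_samples=1000, replace=False)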

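# Data processing: pad or truncate every review to 80 words
x_train = sequence.pad_sequences(x_train, maxlen=80)
x_test = sequence.pad_sequences(x_test, maxlen=80)
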
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

### B. Model

Now we build our model. The model has the following structure:

1. Input
2. Embedding layer
3. Recurrent layers
4. Fully-connected layers
5. Output

An embedding layer translates each word into a vector, allowing much richer representation of the meaning of each word than just a single number. The initial translation is random, but the layer will learn through back propagation just like any other layer in the model.
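
To make the idea concrete, here is a minimal NumPy sketch of what an embedding lookup does. It is an illustration only, not how Keras implements `Embedding` internally; the sizes and the random initialisation scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 20000, 8  # assumed sizes for illustration

# Random initial embedding matrix: one row (vector) per word id
embedding_matrix = rng.normal(scale=0.05, size=(vocab_size, embedding_dim))

# A batch of 2 padded reviews with 4 word ids each
word_ids = np.array([[0, 5, 120, 7],
                     [14, 8, 41, 6]])

# The lookup is plain row indexing: each id is replaced by its vector
vectors = embedding_matrix[word_ids]
print(vectors.shape)  # (2, 4, 8)
```

During training, the rows of this matrix are updated by backpropagation like any other weights.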

#### Standard RNN

First, let us try a standard RNN using Keras' `SimpleRNN` class:

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input,Dense,Embedding
from tensorflow.keras.layers import SimpleRNN
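
One possible way to fill in this model is sketched below. The layer sizes (32 units throughout) are assumptions picked to keep training fast, not tuned values, and random word ids stand in for the IMDB data.

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, SimpleRNN

max_features = 20000  # vocabulary size
maxlen = 80           # words per review after padding

# Input -> Embedding -> SimpleRNN -> Dense -> Output
inputs = Input(shape=(maxlen,))
x = Embedding(max_features, 32)(inputs)
x = SimpleRNN(32)(x)  # returns only the final hidden state
x = Dense(32, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

rnn_model = Model(inputs=inputs, outputs=outputs)
rnn_model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

# Quick shape check on random word ids (a stand-in for real reviews)
dummy = np.random.randint(1, max_features, size=(4, maxlen))
preds = rnn_model.predict(dummy, verbose=0)
print(preds.shape)  # (4, 1)
```

With the real data, training would follow with `rnn_model.fit(x_train, y_train, ...)`.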



#### Long Short-Term Memory (LSTM)

Next, we will try out the LSTM layer. We simply need to change `SimpleRNN` to `LSTM`:

In [None]:
from tensorflow.keras.layers import LSTM
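
A minimal sketch of the LSTM version follows, again with assumed layer sizes. It also prints the parameter count of the recurrent layer: an LSTM has four sets of weights (three gates plus the candidate state), so it carries four times the parameters of a `SimpleRNN` with the same number of units.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM

max_features = 20000  # vocabulary size
maxlen = 80           # words per review after padding

inputs = Input(shape=(maxlen,))
x = Embedding(max_features, 32)(inputs)
lstm_layer = LSTM(32)  # LSTM takes the place of SimpleRNN
x = lstm_layer(x)
x = Dense(32, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

lstm_model = Model(inputs=inputs, outputs=outputs)
lstm_model.compile(loss='binary_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

# 4 * (32 * (32 + 32) + 32) weights and biases
print(lstm_layer.count_params())  # 8320
```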


#### Bidirectional RNN

Finally, let us try out a bidirectional LSTM. This can be done by wrapping the LSTM layer in `Bidirectional()`.

Note that this will *double* the number of neurons in the wrapped layer, because one copy reads the sequence forwards and another reads it backwards. Not only does this make training slower, it is also not a fair comparison with the models above because the number of parameters has increased substantially. You should cut the number of neurons if you want a fair comparison.

In [None]:
from tensorflow.keras.layers import Bidirectional
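
The doubling can be checked directly. The sketch below uses a hypothetical helper `describe` and an assumed input of 16 features per time step; it reports the output width and parameter count of a single recurrent layer.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Bidirectional

def describe(recurrent_layer, timesteps=80, input_dim=16):
    """Return (output width, parameter count) for one recurrent layer."""
    inputs = Input(shape=(timesteps, input_dim))
    outputs = recurrent_layer(inputs)
    model = Model(inputs=inputs, outputs=outputs)
    return model.output_shape[-1], model.count_params()

uni = describe(LSTM(64))
bi = describe(Bidirectional(LSTM(64)))
bi_half = describe(Bidirectional(LSTM(32)))

print('LSTM(64):               ', uni)      # (64, 20736)
print('Bidirectional(LSTM(64)):', bi)       # (128, 41472)
print('Bidirectional(LSTM(32)):', bi_half)  # (64, 12544)
```

Halving the units restores the original output width, though the parameter count then falls below that of the unidirectional layer, so an exact match needs a unit count somewhere in between.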


### C. Hyperparameter Tuning

Hyperparameter tuning is necessary in order to get good performance.
The major hyperparameters you need to consider are:
- Size of the embedding (i.e. how long a vector do you need to represent each word?)
- Number of recurrent neurons
- Number of recurrent layers
- Number of fully-connected neurons
- Number of fully-connected layers
- Dropout rate
- Optimizer
- Number of epochs
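
One convenient way to experiment with these settings is to wrap model construction in a function. The sketch below is a hypothetical helper (`build_model` and its default values are assumptions, not tuned recommendations). The number of epochs is not part of the model and would be passed to `fit()` instead.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM

def build_model(max_features=20000, maxlen=80, embedding_dim=64,
                rnn_units=64, n_rnn_layers=1,
                dense_units=64, n_dense_layers=1,
                dropout_rate=0.2, optimizer='adam'):
    """Build and compile a binary text classifier from hyperparameters."""
    inputs = Input(shape=(maxlen,))
    x = Embedding(max_features, embedding_dim)(inputs)
    for i in range(n_rnn_layers):
        # Every recurrent layer except the last must return the full
        # sequence so that the next recurrent layer has a sequence to read
        is_last = (i == n_rnn_layers - 1)
        x = LSTM(rnn_units, return_sequences=not is_last)(x)
    for _ in range(n_dense_layers):
        x = Dense(dense_units, activation='relu')(x)
        x = Dropout(dropout_rate)(x)
    outputs = Dense(1, activation='sigmoid')(x)

    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

# Example: a small model with two stacked recurrent layers
small = build_model(embedding_dim=16, rnn_units=16,
                    n_rnn_layers=2, dense_units=16)
small.summary()
```

Looping over a handful of settings and comparing validation accuracy is then a matter of calling `build_model` with different arguments.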

Putting everything together:

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from sklearn.utils import resample
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input,Dense,Embedding
from tensorflow.keras.layers import LSTM, Bidirectional

# Settings
max_features = 20000 # How many words to keep?
maxlen = 80  # cut texts after this number of words
n_samples = 1000 # Running with full data takes a lot of time
batch_size = 32

# Load data
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
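if n_samples is not None:
    # Subsample without replacement so that no review is duplicated
    x_train, y_train = resample(x_train, y_train,
                                n_samples=n_samples, replace=False)
    x_test, y_test = resample(x_test, y_test,
                              n_samples=n_samples, replace=False)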
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

# Data processing
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

# Model
print('Build model...')
inputs = Input(shape=(maxlen,))
x = Embedding(max_features, 128)(inputs)
x = Bidirectional(LSTM(128, dropout=0.2))(x)
x = Dense(128, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=output)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()

# Training
print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

## Time Series Data

In this part, we will use an RNN to predict a stock index.

First, load our data:

In [None]:
import numpy as np
import pandas as pd

# Import data
hsi = pd.read_csv("../Data/hsi.csv")
hsi = hsi.dropna()
hsi

Next, we need to process our data. For each target $y_t$, the corresponding features are the previous $n$ observations,
$x_t = \left[ y_{t-1}, y_{t-2}, \dots, y_{t-n} \right]$.
We can generate this by using pandas' `df.shift()` or Keras' `timeseries_dataset_from_array()`.
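
As an illustration of the `shift()` route, the sketch below builds the lagged features with a hypothetical helper `make_lagged`. A short synthetic series stands in for the closing prices here; in the notebook, `hsi["Close"]` would be passed instead.

```python
import numpy as np
import pandas as pd

def make_lagged(series, n_lags):
    """Return (x, y) where x[i] = [y_{t-1}, ..., y_{t-n}] for target y[i] = y_t."""
    s = pd.Series(series).reset_index(drop=True)
    # Column k holds the series shifted down by k steps, i.e. y_{t-k}
    lags = pd.concat({f"lag_{k}": s.shift(k) for k in range(1, n_lags + 1)},
                     axis=1)
    lags = lags.dropna()  # the first n_lags rows have no complete history
    x = lags.to_numpy()[..., np.newaxis]  # shape: (samples, n_lags, 1)
    y = s.loc[lags.index].to_numpy()
    return x, y

toy = [0, 1, 2, 3, 4, 5]  # synthetic stand-in for a price series
x, y = make_lagged(toy, n_lags=2)
print(x.shape)     # (4, 2, 1)
print(x[:, :, 0])  # rows: [1, 0], [2, 1], [3, 2], [4, 3]
print(y)           # [2 3 4 5]
```

The columns follow the order in the formula, most recent value first. A recurrent layer reads its input from first to last time step, so `x[:, ::-1]` can be used to put each window in chronological order.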

In [None]:
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

data = hsi["Close"]


Finally, our model. There are two main differences from the NLP model:
1. No embedding layer. An embedding layer maps discrete tokens to vectors, so it usually does not apply to time series data, which is often continuous-valued.
2. No bidirectional layer. A bidirectional layer has access to both the past and the future, but future values are not available at inference time.
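
A minimal sketch of such a model is given below, assuming windows of 10 lagged values with one feature each. The layer sizes are assumptions, and random numbers stand in for the processed index data. The output layer has no activation and the loss is mean squared error, since this is a regression problem.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, LSTM
from tensorflow.keras.models import Model

n_lags = 10  # assumed window length

# Input -> LSTM -> Dense -> Output, with no embedding or bidirectional wrapper
inputs = Input(shape=(n_lags, 1))
x = LSTM(32)(inputs)
x = Dense(16, activation='relu')(x)
outputs = Dense(1)(x)  # linear output for regression

ts_model = Model(inputs=inputs, outputs=outputs)
ts_model.compile(loss='mse', optimizer='adam')

# Synthetic stand-in data: 64 windows and their targets
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(64, n_lags, 1)).astype('float32')
y_demo = rng.normal(size=(64,)).astype('float32')

ts_model.fit(x_demo, y_demo, epochs=1, batch_size=16, verbose=0)
preds = ts_model.predict(x_demo[:3], verbose=0)
print(preds.shape)  # (3, 1)
```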

In [None]:
from tensorflow.keras.layers import Input, Dense, LSTM
from tensorflow.keras.models import Model

