## Recurrent Neural Networks
- Feedforward neural networks (e.g. MLPs and CNNs) are powerful, but they are not designed to handle "sequential" data
- In other words, they do not possess "memory" of previous inputs
- For instance, consider translating a sentence: you need the **"context"** of the preceding words to guess the next word

<br>
- RNNs are suitable for sequential data because they have a **"recurrent"** structure
- To put it differently, they keep a **"memory"** of earlier inputs in the sequence
</br>
<img src="http://www.wildml.com/wp-content/uploads/2015/09/rnn.jpg" style="width: 600px"/>

<br>
- To reduce the number of parameters, the same parameters are shared across all time steps
</br>

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" style="width: 600px"/>
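The two ideas above (a state carried from step to step, and one set of parameters reused at every step) can be sketched in plain NumPy. This is only an illustration: the sizes and the names `W_xh`, `W_hh` and `b` are assumptions made for the example, not values taken from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, units = 4, 3, 5   # illustrative sizes (assumption)

# One set of parameters, created once and reused at every time step.
W_xh = rng.normal(size=(input_dim, units))   # input  -> hidden
W_hh = rng.normal(size=(units, units))       # hidden -> hidden
b = np.zeros(units)

x_seq = rng.normal(size=(timesteps, input_dim))
h = np.zeros(units)                          # initial hidden state ("memory")
states = []
for x_t in x_seq:
    # The new state depends on the current input and on the previous state.
    h = np.tanh(x_t @ W_xh + h @ W_hh + b)
    states.append(h)

states = np.stack(states)
n_params = W_xh.size + W_hh.size + b.size
print(states.shape)   # (4, 5)
print(n_params)       # 45
```

The parameter count stays at 45 however long the sequence is, because the loop never creates new weights.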

# Understanding RNN structure
- Unlike feedforward nets, RNNs can handle "sequential" data well because they preserve the previous "state"
- Thus, grasping the concepts of **"sequences"** and (hidden) **"states"** in RNNs is crucial

In [1]:
import numpy as np
from keras.models import Model, Sequential
from keras.layers import *

Using TensorFlow backend.


## 1. SimpleRNN 

Input to SimpleRNN should be a 3D tensor => (batch_size, timesteps, input_dim)
- **batch_size**: omitted when creating the RNN instance (== None). Usually set when fitting the model.
- **timesteps**: number of time steps in each input sequence
- **input_dim**: dimensionality of the input at each time step

In [4]:
# for instance, consider below array
x = np.array([[
             [1,    # => input_dim 1
              2,    # => input_dim 2 
              3],   # => input_dim 3     # => timestep 1                            
             [4, 5, 6]                   # => timestep 2
             ],                                  # => sample 1 in the batch
             [[7, 8, 9], [10, 11, 12]],          # => sample 2
             [[13, 14, 15], [16, 17, 18]]        # => sample 3
             ])

In [5]:
print('(Batch size, timesteps, input_dim) = ',x.shape)

(Batch size, timesteps, input_dim) =  (3, 2, 3)


In [41]:
x = np.random.normal(0,1,(100,5))
y = 3*x

In [315]:
def const_init(value):
    def kk(shape):
        print(shape)
        return value*np.ones(shape)
    return kk

def random_init(shape):
    print(shape)
    return np.random.normal(0,1,shape)

x = Input(shape = (2, 1))
# x1 = SimpleRNN(3,activation='linear',
#                kernel_initializer=const_init(3),
#                recurrent_initializer=const_init(2),
#                bias_initializer=const_init(0),
#                return_sequences=True)(x)
x1 = SimpleRNN(3,activation='tanh',
               kernel_initializer=random_init,
               recurrent_initializer=random_init,
               bias_initializer=random_init,
               return_sequences=True)(x)
# rnn = GRU(4)(x)
# rnn = LSTM(4)(x)

model = Model(inputs=x,outputs=x1)
model.summary()
model.compile(optimizer='adam',loss='mse')

(1, 3)
(3, 3)
(3,)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_97 (InputLayer)        (None, 2, 1)              0         
_________________________________________________________________
simple_rnn_101 (SimpleRNN)   (None, 2, 3)              15        
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________


In [316]:
from keras.backend import int_shape
sh = int_shape(x)[1:]
bsh = int_shape(x1)[-1]

In [317]:
X = np.random.normal(0,1,sh)
x_in = np.expand_dims(X,0)
model.predict(x_in)

array([[[-0.75832   ,  0.08595122, -0.5028327 ],
        [-0.99885464,  0.6005186 , -0.7647044 ]]], dtype=float32)

# $$h_t=\tanh(x_t W_{xh}+h_{t-1}W_{hh}+b)$$

In [318]:
# kernel (input -> hidden), recurrent kernel (hidden -> hidden), bias
wxh,whh,b = model.layers[1].get_weights()
h = np.zeros(bsh)                       # initial hidden state h_0 = 0
for i in range(sh[0]):                  # loop over time steps
    # h_t = tanh(x_t . W_xh + h_{t-1} . W_hh + b)
    h = np.tanh(np.dot(X[i],wxh)+np.dot(h,whh)+b)
    print(h)

[-0.75831992  0.08595121 -0.50283269]
[-0.99885467  0.6005185  -0.76470438]


![](https://miro.medium.com/max/332/1*28XR1ajfW1WuTOkjpOc9xA.png)

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 500px"/>

<center> Standard RNN </center>

### Trainable parameters: h(h+i) + h, where h = number of units and i = input_dim (e.g. 3(3+1) + 3 = 15 above)
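As a quick check of the formula in the heading above, the count can be written as a small function. The helper name `simple_rnn_param_count` is made up for this example; it just adds up the kernel, recurrent kernel and bias sizes.

```python
def simple_rnn_param_count(units, input_dim):
    """Trainable parameters of a simple RNN layer: h*(h+i) + h."""
    kernel = input_dim * units      # input  -> hidden weights
    recurrent = units * units       # hidden -> hidden weights
    bias = units
    return kernel + recurrent + bias

print(simple_rnn_param_count(3, 1))     # 15, matches model.summary() above
print(simple_rnn_param_count(50, 30))   # 4050
```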

In [83]:
# input shape (batch_size, timesteps, input_dim)
rnn = SimpleRNN(50)(Input(shape = (10, 30)))

**return_sequences = False** ====> output_shape = **(batch_size = None, num_units)**

In [84]:
rnn = SimpleRNN(50)(Input(shape = (10, 30)))
print(rnn.shape)

(?, 50)


**return_sequences = True** ====> output_shape = **(batch_size, timesteps, num_units)**

In [85]:
rnn = SimpleRNN(50, return_sequences = True)(Input(shape = (10, 30)))
print(rnn.shape)

(?, ?, 50)
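The difference between the two settings can be sketched without Keras. The function below is a hypothetical NumPy stand-in for a simple RNN layer (not the Keras implementation): it either returns the hidden state of every time step or only the last one.

```python
import numpy as np

def run_simple_rnn(x, W_xh, W_hh, b, return_sequences=False):
    """x has shape (batch_size, timesteps, input_dim)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, W_hh.shape[0]))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W_xh + h @ W_hh + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (batch, timesteps, units)
    return h                               # (batch, units): last step only

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 10, 30))           # batch of 2, as an example
W_xh = rng.normal(size=(30, 50)) * 0.1
W_hh = rng.normal(size=(50, 50)) * 0.1
b = np.zeros(50)

last = run_simple_rnn(x, W_xh, W_hh, b)
full = run_simple_rnn(x, W_xh, W_hh, b, return_sequences=True)
print(last.shape)   # (2, 50)
print(full.shape)   # (2, 10, 50)
```

The last time step of the full sequence is the same tensor that is returned when `return_sequences` is `False`.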


## 2. GRU
- GRU, a popular variant of LSTM, does not have a separate cell state
- Hence, it has only a hidden state, like SimpleRNN

![](https://miro.medium.com/max/862/1*GSZ0ZQZPvcWmTVatAeOiIw.png)
![](https://miro.medium.com/max/602/1*1HJUlwKMWmAkHhUkwy9g3g.png)
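A single GRU step can be sketched in NumPy as follows. This is a minimal sketch under assumptions: the parameter names (`W_z`, `U_z`, ...) are invented for the example, and the update rule uses one common convention (`z * h_prev + (1 - z) * h_cand`); some references swap the roles of `z` and `1 - z`, and library implementations may arrange the biases differently.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; there is only a hidden state, no cell state."""
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])   # update gate
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])   # reset gate
    h_cand = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"] + p["b_h"])
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
input_dim, units = 30, 50
params = {}
for gate in "zrh":
    params["W_" + gate] = rng.normal(scale=0.1, size=(input_dim, units))
    params["U_" + gate] = rng.normal(scale=0.1, size=(units, units))
    params["b_" + gate] = np.zeros(units)

h = np.zeros(units)
for x_t in rng.normal(size=(10, input_dim)):   # 10 time steps
    h = gru_step(x_t, h, params)
print(h.shape)   # (50,)
```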

In [327]:
gru = GRU(50)(Input(shape = (10, 30)))
print(gru.shape)

(?, 50)


In [329]:
output = GRU(50, return_sequences = True)(Input(shape = (10, 30)))
print(output.shape)

(?, ?, 50)


## 3. LSTM
- Outputs of LSTM are quite similar to those of simple RNNs, but there are subtle differences
- If you compare the diagrams below with the standard RNN diagram, there is one more type of "state" that is passed on to the next module

![](https://miro.medium.com/max/1003/1*ZX2mVCwMIOhftEaf4FTOYQ.png)

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 500px"/>

<center> LSTM </center>

![](https://miro.medium.com/max/770/1*6vw1g-HNuOgRYPj-IGhddQ.png)

In addition to the "hidden state (ht)" of a simple RNN, the LSTM structure has a "cell state (Ct)"

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png" style="width: 500px"/>

<center> Hidden State </center>

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png" style="width: 500px"/>

<center> Cell State </center>
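How the two states are updated together can be sketched in NumPy. This is a sketch under assumptions, not the Keras code: the four gate blocks are packed into single matrices `W`, `U` and `b`, and the ordering of the blocks (input, forget, candidate, output) is chosen only for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; returns the new hidden state and the new cell state."""
    units = h_prev.shape[-1]
    gates = x_t @ W + h_prev @ U + b            # shape (4 * units,)
    i = sigmoid(gates[0 * units:1 * units])     # input gate
    f = sigmoid(gates[1 * units:2 * units])     # forget gate
    g = np.tanh(gates[2 * units:3 * units])     # candidate cell values
    o = sigmoid(gates[3 * units:4 * units])     # output gate
    c = f * c_prev + i * g                      # cell state
    h = o * np.tanh(c)                          # hidden state
    return h, c

rng = np.random.default_rng(0)
input_dim, units = 30, 50
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(10, input_dim)):    # 10 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)   # (50,) (50,)
```

Only `h` is returned as the layer output in the cells that follow; `c` is internal memory passed from one step to the next.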

In [319]:
lstm = LSTM(50)(Input(shape = (10, 30)))

In [320]:
print(lstm.shape)

(?, 50)


In [326]:
lstm = LSTM(50, return_sequences = True)(Input(shape = (10, 30)))
print(lstm.shape)         # shape of output

(?, ?, 50)


In [90]:
## Load Dataset

In [91]:
import numpy as np

from sklearn.metrics import accuracy_score
from keras.datasets import reuters
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

In [92]:
# parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3

In [93]:
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words = num_words, maxlen = maxlen, test_split = test_split)

Downloading data from https://s3.amazonaws.com/text-datasets/reuters.npz


In [94]:
# pad the sequences with zeros 
# padding parameter is set to 'post' => 0's are appended to end of sequences
X_train = pad_sequences(X_train, padding = 'post')
X_test = pad_sequences(X_test, padding = 'post')
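To make the effect of `padding = 'post'` concrete, here is a NumPy-only sketch on a toy input. `pad_post` is a hypothetical helper written for this illustration (it is not part of Keras), and the toy sequences are made up.

```python
import numpy as np

def pad_post(sequences, value=0):
    """Right-pad variable-length integer sequences to the longest length."""
    maxlen = max(len(s) for s in sequences)
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        out[row, :len(seq)] = seq       # original tokens first, zeros after
    return out

toy = [[5, 8, 2], [7], [3, 9]]
padded = pad_post(toy)
print(padded)
# [[5 8 2]
#  [7 0 0]
#  [3 9 0]]

# Adding a trailing axis gives the (batch_size, timesteps, input_dim) layout.
print(padded.reshape(padded.shape[0], padded.shape[1], 1).shape)   # (3, 3, 1)
```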

In [95]:
X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))

In [96]:
# one-hot encode train and test labels together so both get the same number of classes
y_data = np.concatenate((y_train, y_test))
y_data = to_categorical(y_data)

In [97]:
# split back into train and test parts
n_train = X_train.shape[0]   # 1395 training samples
y_train = y_data[:n_train]
y_test = y_data[n_train:]
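One-hot encoding and its inverse can be illustrated on a toy label vector. This sketch uses `np.eye` as a stand-in for what `to_categorical` produces; the labels and the class count are made up for the example.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])     # toy integer class labels (assumption)
num_classes = 3

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# argmax along the class axis recovers the integer labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)    # [0 2 1 2]
```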

In [98]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(1395, 49, 1)
(599, 49, 1)
(1395, 46)
(599, 46)


## 1. Vanilla RNN
- Vanilla RNNs have a simple structure
- However, they struggle to learn "long-term dependencies", because gradients tend to vanish (or explode) as they are propagated back through many time steps
- Hence, they are not able to keep the **"sequential memory"** for long

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 600px"/>
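The loss of long-range memory can be made visible numerically. The sketch below tracks how strongly the hidden state at step t still depends on the initial state, by accumulating the Jacobian of a tanh RNN over 49 steps. It is an illustration under an explicit assumption: the recurrent matrix is rescaled to spectral norm 0.9, which guarantees shrinking; with larger weights the same product can grow instead.

```python
import numpy as np

rng = np.random.default_rng(0)
units, timesteps = 50, 49

W_hh = rng.normal(size=(units, units))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)   # force spectral norm 0.9 (assumption)
W_xh = rng.normal(size=(1, units))

h = np.zeros(units)
jac = np.eye(units)     # d h_t / d h_0, accumulated with the chain rule
norms = []
for x_t in rng.normal(size=(timesteps, 1)):
    h = np.tanh(x_t @ W_xh + h @ W_hh)
    # Jacobian of one step: entry [i, j] is d h_t[j] / d h_{t-1}[i].
    step = W_hh * (1.0 - h ** 2)
    jac = jac @ step
    norms.append(np.linalg.norm(jac, 2))

print(norms[0], norms[-1])   # the influence of h_0 shrinks towards zero
```

Each step multiplies the norm by at most 0.9 here, so after 49 steps the first input has almost no influence on the final state or on its gradient.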

In [100]:
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Activation
from keras import optimizers
from keras.wrappers.scikit_learn import KerasClassifier

In [101]:
def vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [102]:
model = KerasClassifier(build_fn = vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)

In [104]:
model.fit(X_train, y_train,verbose=0)

<keras.callbacks.History at 0x7facaf998748>

In [105]:
y_pred = model.predict(X_test)



In [106]:
y_test_ = np.argmax(y_test, axis = 1)

In [107]:
print(accuracy_score(y_pred, y_test_))

0.7562604340567612


## 2. Stacked Vanilla RNN
- RNN layers can be stacked to form a deeper network

<img src="https://lh6.googleusercontent.com/rC1DSgjlmobtRxMPFi14hkMdDqSkEkuOX7EW_QrLFSymjasIM95Za2Wf-VwSC1Tq1sjJlOPLJ92q7PTKJh2hjBoXQawM6MQC27east67GFDklTalljlt0cFLZnPMdhp8erzO" style="width: 500px"/>
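Stacking can be sketched in NumPy to show why the lower layer has to hand over its full sequence of states. `rnn_layer` is a hypothetical stand-in with randomly drawn weights, not the Keras layer.

```python
import numpy as np

def rnn_layer(x, units, rng, return_sequences):
    """Toy tanh RNN layer over x of shape (batch, timesteps, input_dim)."""
    batch, timesteps, input_dim = x.shape
    W_xh = rng.normal(scale=0.1, size=(input_dim, units))
    W_hh = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    states = []
    for t in range(timesteps):
        h = np.tanh(x[:, t] @ W_xh + h @ W_hh)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 49, 1))     # 4 samples, 49 time steps, 1 feature

# The first layer must return every time step: the second layer is itself
# recurrent and expects a 3D (batch, timesteps, features) input.
first = rnn_layer(x, 50, rng, return_sequences=True)
second = rnn_layer(first, 50, rng, return_sequences=False)
print(first.shape)    # (4, 49, 50)
print(second.shape)   # (4, 50)
```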

In [108]:
def stacked_vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = True))   # must be True so the next RNN layer receives the full sequence
    model.add(SimpleRNN(50, return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [109]:
model = KerasClassifier(build_fn = stacked_vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)

In [110]:
model.fit(X_train, y_train,verbose=0)

<keras.callbacks.History at 0x7facaf2e7588>

In [111]:
y_pred = model.predict(X_test)



In [112]:
print(accuracy_score(y_pred, y_test_))

0.7378964941569283


## 3. LSTM
- LSTM (long short-term memory) is an improved structure designed to mitigate the problem of long-term dependencies

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 600px"/>
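The extra gates come at a cost in parameters. An LSTM layer has four weight blocks of the same shape as a simple RNN's, so the h(h+i) + h count used earlier for SimpleRNN is multiplied by four. The helper `recurrent_param_count` is made up for this comparison.

```python
def recurrent_param_count(units, input_dim, n_blocks=1):
    """n_blocks * (h*(h+i) + h); n_blocks is 1 for a simple RNN, 4 for LSTM."""
    return n_blocks * (units * (units + input_dim) + units)

# Layers used in this notebook: 50 units on inputs with input_dim = 1.
simple_rnn_params = recurrent_param_count(50, 1)
lstm_params = recurrent_param_count(50, 1, n_blocks=4)
print(simple_rnn_params, lstm_params)   # 2600 10400
```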

In [113]:
from keras.layers import LSTM

In [114]:
def lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [115]:
model = KerasClassifier(build_fn = lstm, epochs = 200, batch_size = 50, verbose = 1)

In [116]:
model.fit(X_train, y_train,verbose=0)

<keras.callbacks.History at 0x7facad3b45c0>

In [117]:
y_pred = model.predict(X_test)



In [118]:
# accuracy improves by adopting LSTM structure
print(accuracy_score(y_pred, y_test_))

0.8464106844741235


## 4. Stacked LSTM
- LSTM layers can be stacked as well

In [330]:
def stacked_lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = True))
    model.add(LSTM(50, return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [331]:
model = KerasClassifier(build_fn = stacked_lstm, epochs = 200, batch_size = 50, verbose = 1)

In [332]:
model.fit(X_train, y_train)

Epoch 1/200
...
Epoch 200/200


<keras.callbacks.History at 0x7faca53e35f8>

In [333]:
y_pred = model.predict(X_test)



In [334]:
print(accuracy_score(y_pred, y_test_))

0.8497495826377296
