
In [0]:
import keras
keras.__version__

Using TensorFlow backend.


'2.0.8'

# Understanding recurrent neural networks

This notebook contains the code samples found in Chapter 6, Section 2 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). 

## SimpleRNN

The process implemented naively in NumPy below corresponds to an actual Keras layer, the `SimpleRNN` layer.


In [0]:
import numpy as np

In [0]:
timesteps = 100         # Number of timesteps in the input sequence.
input_features = 32     # Dimensionality of the input feature space
output_features = 64    # Dimensionality of the output feature space

Generate a random input sequence of shape `(timesteps, input_features)`:

In [0]:
inputs = np.random.random((timesteps, input_features))


In [0]:
inputs

array([[0.45079367, 0.67935867, 0.40936707, ..., 0.20748137, 0.37800912,
        0.80487785],
       [0.52687406, 0.21872629, 0.33978462, ..., 0.35346522, 0.34128635,
        0.78492911],
       [0.32006564, 0.15994408, 0.46670835, ..., 0.91766486, 0.92197527,
        0.19737359],
       ...,
       [0.34860521, 0.68529053, 0.02586581, ..., 0.63584958, 0.08719603,
        0.42478642],
       [0.02468279, 0.38909779, 0.86342791, ..., 0.28777785, 0.44621177,
        0.6704985 ],
       [0.79125874, 0.66503392, 0.76177463, ..., 0.36051282, 0.37230447,
        0.99266594]])

The initial state is an all-zero vector:

In [0]:
state_t = np.zeros((output_features,))

In [0]:
state_t

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Initialize the weight matrices `W` and `U` and the bias vector `b` with random values:

In [0]:
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features, ))

Let's implement the forward pass of the RNN:

In [0]:
successive_outputs = []
for input_t in inputs:          # input_t is a vector of shape (input_features,)
  # Combines the input with the current state (the previous output) to obtain the current output
  output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
  successive_outputs.append(output_t)   # Stores this output in a list
  state_t = output_t          # Updates the state of the network for the next timestep

# The final output is a 2D tensor of shape (timesteps, output_features).
# np.stack adds a new leading axis; np.concatenate would flatten the outputs into a 1D array.
final_output_sequence = np.stack(successive_outputs, axis=0)
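
The loop above can be wrapped in a small function to make the expected shapes explicit. This is a minimal sketch, not the Keras implementation: the helper name `simple_rnn_forward`, the `demo_*` variables, and the seeded random generator are assumptions introduced here for illustration.

```python
import numpy as np

def simple_rnn_forward(inputs, W, U, b):
    """Naive RNN forward pass (hypothetical helper, not a Keras API).

    inputs: (timesteps, input_features)
    W: (output_features, input_features)
    U: (output_features, output_features)
    b: (output_features,)
    Returns an array of shape (timesteps, output_features).
    """
    state_t = np.zeros(U.shape[0])   # all-zero initial state
    outputs = []
    for input_t in inputs:
        output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
        outputs.append(output_t)
        state_t = output_t           # the output becomes the next state
    # Stack along a new leading axis so each row is one timestep
    return np.stack(outputs, axis=0)

rng = np.random.default_rng(0)       # seeded only for reproducibility
demo_inputs = rng.random((100, 32))
demo_W = rng.random((64, 32))
demo_U = rng.random((64, 64))
demo_b = rng.random(64)

demo_sequence = simple_rnn_forward(demo_inputs, demo_W, demo_U, demo_b)
print(demo_sequence.shape)           # (100, 64)
```

Each row of the result is the output at one timestep, so `demo_sequence[-1]` is the output after the full sequence has been read.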

### Issues:
* Too simplistic for real-life use cases.
* In practice, `SimpleRNN` cannot learn long-term dependencies. This is due to the *vanishing gradient problem*: as the error signal is propagated back through many timesteps (much like through the layers of a very deep feedforward network), it shrinks until the network becomes effectively untrainable.

The LSTM layer was designed to address this problem.
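
The vanishing gradient effect can be illustrated numerically with a rough sketch. It backpropagates a unit-norm gradient through a tanh RNN by hand and records its norm at each step. The setup is an assumption chosen for illustration: the recurrent matrix `U_small` is rescaled so that its largest singular value is 0.9, which guarantees that the gradient shrinks by at least 10% per timestep. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
features, steps = 64, 50

# Assumption: recurrent weights rescaled to spectral norm 0.9
U_small = rng.standard_normal((features, features))
U_small *= 0.9 / np.linalg.norm(U_small, 2)
W_demo = rng.standard_normal((features, features)) * 0.1
xs = rng.random((steps, features))

# Forward pass: keep every state, since backpropagation needs them
states = []
state = np.zeros(features)
for x_t in xs:
    state = np.tanh(W_demo @ x_t + U_small @ state)
    states.append(state)

# Backward pass: start from a unit-norm gradient at the last state and
# apply the transposed Jacobian of each step, U^T diag(1 - state^2)
grad = np.ones(features) / np.sqrt(features)
grad_norms = [np.linalg.norm(grad)]
for state in reversed(states):
    grad = U_small.T @ (grad * (1.0 - state ** 2))
    grad_norms.append(np.linalg.norm(grad))

print(grad_norms[0], grad_norms[-1])
```

After 50 steps the gradient norm is bounded by `0.9 ** 50`, roughly 0.005, so the earliest timesteps receive almost no learning signal. With larger recurrent weights the opposite failure (exploding gradients) can occur instead.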

## LSTM

An LSTM cell carries an additional data flow, the carry `c_t`, across timesteps. In pseudocode:

```
   output_t = activation(dot(state_t, U_o) + dot(input_t, W_o) + dot(c_t, V_o) + b_o)
   
   i_t = activation(dot(state_t, U_i) + dot(input_t, W_i) + b_i)
   f_t = activation(dot(state_t, U_f) + dot(input_t, W_f) + b_f)
   k_t = activation(dot(state_t, U_k) + dot(input_t, W_k) + b_k)
   
   c_t+1 = i_t * k_t + c_t * f_t
```

The LSTM allows past information to be reinjected at a later time, thus fighting the vanishing gradient problem.
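
The LSTM pseudocode above can be turned into a runnable NumPy sketch. Several choices are assumptions rather than part of the original text: the recurrent term of `output_t` is read as `dot(state_t, U_o)`, the gates `i_t` and `f_t` use a sigmoid, `k_t` and `output_t` use tanh, and the weights are small random values. The helper names `sigmoid` and `lstm_step` and the `params` dictionary are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(input_t, state_t, c_t, p):
    """One timestep of the simplified LSTM from the pseudocode (hypothetical helper)."""
    output_t = np.tanh(state_t @ p["U_o"] + input_t @ p["W_o"]
                       + c_t @ p["V_o"] + p["b_o"])
    i_t = sigmoid(state_t @ p["U_i"] + input_t @ p["W_i"] + p["b_i"])  # input gate
    f_t = sigmoid(state_t @ p["U_f"] + input_t @ p["W_f"] + p["b_f"])  # forget gate
    k_t = np.tanh(state_t @ p["U_k"] + input_t @ p["W_k"] + p["b_k"])  # candidate values
    c_next = i_t * k_t + c_t * f_t   # new carry: gated new info plus gated old carry
    return output_t, c_next

rng = np.random.default_rng(0)
n_inputs, units, n_steps = 32, 64, 100

params = {}
for gate in "oifk":
    params["U_" + gate] = rng.standard_normal((units, units)) * 0.1
    params["W_" + gate] = rng.standard_normal((n_inputs, units)) * 0.1
    params["b_" + gate] = np.zeros(units)
params["V_o"] = rng.standard_normal((units, units)) * 0.1

lstm_state = np.zeros(units)   # all-zero initial state
lstm_carry = np.zeros(units)   # all-zero initial carry
lstm_outputs = []
for x_t in rng.random((n_steps, n_inputs)):
    lstm_state, lstm_carry = lstm_step(x_t, lstm_state, lstm_carry, params)
    lstm_outputs.append(lstm_state)   # the output becomes the next state
lstm_outputs = np.stack(lstm_outputs, axis=0)
print(lstm_outputs.shape)             # (100, 64)
```

The carry update is additive: when `f_t` is close to 1, `lstm_carry` passes from one timestep to the next nearly unchanged, which is what lets information from early timesteps survive.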