# <span style="color:#0b486b">  FIT3181: Deep Learning (2022)</span>
***
*CE/Lecturer (Clayton):*  **Dr Trung Le** | trunglm@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Lim Chern Hong** | lim.chernhong@monash.edu <br/>  <br/>
*Tutor:*  **Mr Thanh Nguyen** \[Thanh.Nguyen4@monash.edu \] |**Mr Tuan Nguyen**  \[tuan.ng@monash.edu \] |**Mr Anh Bui** \[tuananh.bui@monash.edu\] | **Dr Binh Nguyen** \[binh.nguyen1@monash.edu \] | **Mr Md Mohaimenuzzaman** \[md.mohaimen@monash.edu \] |**Mr James Tong** \[james.tong1@monash.edu \]
<br/> <br/>
Faculty of Information Technology, Monash University, Australia
***

# <span style="color:#0b486b">Tutorial 08a: Fundamental of RNN in TF 2.x</span> <span style="color:red">*****</span> #

This tutorial is designed to help you understand the fundamental building blocks of a Recurrent Neural Network (RNN), including:
- The computational process of a standard, simple RNN cell.
- How to declare and use standard RNN, LSTM, and GRU cells.

We first import the necessary modules.

In [1]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import tensorflow as tf
import numpy as np
import os

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

## <span style="color:#0b486b">I. Fundamental of RNN</span> ##

### <span style="color:#0b486b">I.1. Manual RNN</span> ###

We now implement a basic RNN by hand. It is unrolled over two time steps (i.e., the sequence length is $2$), so it takes two inputs and produces two hidden states.

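The computation process is as follows, where $U$ is the input-to-hidden weight matrix, $W$ is the hidden-to-hidden weight matrix, and $b$ is the bias:
- $h_0 = \tanh(X_0 \times U + b)$,
- $h_1 = \tanh(X_1 \times U + h_0 \times W + b)$.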
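
Before turning to TensorFlow, the two equations can be sketched in plain NumPy to make the shapes explicit. This is a minimal sketch that assumes random data and parameters and the same sizes as the TensorFlow code below (batch size $4$, input size $3$, hidden size $5$); the names `X_0`, `X_1`, and `rng` are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, input_size, hidden_size = 4, 3, 5  # sizes assumed to match the TF example

# Inputs for time steps 0 and 1, each of shape [batch_size, input_size]
X_0 = rng.normal(size=(batch_size, input_size)).astype(np.float32)
X_1 = rng.normal(size=(batch_size, input_size)).astype(np.float32)

# Parameters shared across both time steps
U = rng.normal(size=(input_size, hidden_size)).astype(np.float32)   # input-to-hidden
W = rng.normal(size=(hidden_size, hidden_size)).astype(np.float32)  # hidden-to-hidden
b = np.zeros((1, hidden_size), dtype=np.float32)                    # bias, broadcast over the batch

h_0 = np.tanh(X_0 @ U + b)            # h_0 = tanh(X_0 U + b)
h_1 = np.tanh(X_1 @ U + h_0 @ W + b)  # h_1 = tanh(X_1 U + h_0 W + b)

print(h_0.shape, h_1.shape)  # (4, 5) (4, 5)
```

Note that the same $U$ and $b$ are reused at both time steps; only $h_1$ depends on $W$, because there is no hidden state before $h_0$.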

In the following code, $X0$ is a mini-batch of batch size $4$ containing the data of time step $0$.
- $X0$'s shape is $[batch\_size \times input\_size] = [4 \times 3]$.

$X1$ is a mini-batch of batch size $4$ containing the data of time step $1$.
- $X1$'s shape is $[batch\_size \times input\_size] = [4 \times 3]$.

In [2]:
import numpy as np

X0 = np.array([[0.0, 1.0, -2.0], 
               [-3.0, 4.0, 5.0], 
               [6.0, 7.0, -8.0],
               [6.0, -1.0, 2.0]], dtype= np.float32) # t = 0
X1 = np.array([[9.0, 8.0, 7.0], 
               [0.0, 0.0, 0.0], 
               [6.0, 5.0, 4.0],
               [1.0, 2.0, 3.0]], dtype= np.float32) # t = 1

We now demonstrate the computational process for a standard RNN with sequence length $2$.

In [4]:
hidden_size = 5
input_size = 3

# Parameters shared by both time steps:
# U (input-to-hidden), W (hidden-to-hidden) and the bias b
U = tf.Variable(tf.random.normal(shape=[input_size, hidden_size], dtype=tf.float32))
W = tf.Variable(tf.random.normal(shape=[hidden_size, hidden_size], dtype=tf.float32))
b = tf.Variable(tf.zeros([1, hidden_size], dtype=tf.float32))

h0 = tf.tanh(tf.matmul(X0, U) + b)                     # time step 0: no previous hidden state
h1 = tf.tanh(tf.matmul(X1, U) + tf.matmul(h0, W) + b)  # time step 1: also uses h0

In [7]:
U.shape

TensorShape([3, 5])

In [5]:
print("h0= {}".format(h0.numpy()))

h0= [[ 0.78652716 -0.97147447 -0.4345065  -0.9987761  -0.6222663 ]
 [ 1.         -1.         -1.          0.9997632   0.9999971 ]
 [ 0.9999817  -1.         -1.         -1.         -1.        ]
 [-0.999861    1.         -0.40754828  0.9999876  -1.        ]]


In [6]:
print("h1= {}".format(h1.numpy()))

h1= [[ 1.         -1.         -1.          0.99999607 -1.        ]
 [ 0.44846708  0.55633074  0.650563    0.9698714  -0.53894323]
 [ 0.9998064  -1.         -1.          0.8546817  -1.        ]
 [ 0.9774557  -0.9986503  -0.9999881   0.99967086  0.99856204]]


**<span style="color:red">Exercise 1</span>:** Explain why $h_0$ and $h_1$ have the above shapes.

**<span style="color:red">Exercise 2</span>:** Extend the computation above to an arbitrary number of time steps $L$.

### <span style="color:#0b486b">I.2. Recurrent cells in Tensorflow Keras</span> ###

TensorFlow Keras provides most of the recurrent cells (layers) you are likely to need in real projects. The following figure lists the recurrent layers available in TF Keras.

<img src="./images/RNN_layers.png" align="left" width=180/>

#### <span style="color:#0b486b">Simple RNN cell</span> ####

We start with the introduction of the *standard and simple RNN cell*. The following figure shows the signature of the class *tf.keras.layers.SimpleRNN* and its parameters.

<img src="./images/SimpleRNN_cell.png" align="left" width=1200/>

It looks a bit complicated, but only a few points matter for now.

First, you can picture your RNN as a stack of recurrent layers, each of which consists of many cells (e.g., simple RNN cells, LSTM cells, or GRU cells) unrolled over time. A recurrent layer takes as input a 3D tensor of shape $batch\_size \times timesteps \times input\_size$. With *return_sequences=True* its output is a 3D tensor of shape $batch\_size \times timesteps \times hidden\_size$; otherwise it is a 2D tensor of shape $batch\_size \times hidden\_size$.
- $batch\_size$ is the number of sequences (sentences) in a mini-batch, $timesteps$ is the sequence length (the number of tokens/words in each sequence), and $input\_size$ is the size of each token's feature vector. Later you will see that symbolic tokens such as words must first be embedded into feature vectors using an embedding matrix.

Moreover, by default a recurrent layer returns as its *output* only the last hidden value (i.e., the value of the last cell, $h_L$).
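
To see that a Keras SimpleRNN layer performs the same recurrence as the manual RNN in Section I.1, one can read back its weights: `kernel`, `recurrent_kernel`, and `bias` play the roles of $U$, $W$, and $b$. The following is a minimal sketch assuming random inputs, the sizes of Section I.1, two time steps, and Keras' default zero initial state; the variable names are our own.

```python
import numpy as np
import tensorflow as tf

batch_size, input_size, hidden_size = 4, 3, 5  # same sizes as Section I.1
x = np.random.random([batch_size, 2, input_size]).astype(np.float32)  # 2 time steps

simple_rnn = tf.keras.layers.SimpleRNN(hidden_size)
output = simple_rnn(x).numpy()  # first call builds the weights; output is the last hidden value

# kernel ~ U [input_size x hidden_size], recurrent_kernel ~ W [hidden_size x hidden_size], bias ~ b
U, W, b = simple_rnn.get_weights()

h_0 = np.tanh(x[:, 0, :] @ U + b)            # initial state is zero, so there is no W term
h_1 = np.tanh(x[:, 1, :] @ U + h_0 @ W + b)  # same update as h1 in Section I.1

print(np.allclose(h_1, output, atol=1e-5))  # True
```

The layer's default output therefore coincides with the last hidden value $h_L$ (here $h_1$), computed with exactly the equations used earlier.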

Second, understand the meaning of the parameters *return_state* and *return_sequences*:
- *return_state = True* means that the final state (i.e., $h_L$, plus the cell state $c_L$ for an LSTM) is returned in addition to the output.
- *return_sequences = True* means that the output contains the hidden values of all cells, $[h_1, h_2, ..., h_L]$, stacked into a 3D tensor of shape $batch\_size \times timesteps \times hidden\_size$. Otherwise (the default), the output is only the last hidden state $h_L$ with shape $batch\_size \times hidden\_size$.

Note the following equivalent terms:
- `timesteps = seq_length`, the sequence length, which specifies the number of cells in a recurrent layer.
- `state_size = hidden_size`, the common size of the cells in a given recurrent layer, i.e., the common size of $h_1, h_2, ..., h_L$ where $L$ is the sequence length.
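
In TF Keras this "layer made of cells" view maps onto the cell classes (e.g., `tf.keras.layers.SimpleRNNCell`) and the generic `tf.keras.layers.RNN` wrapper, which applies one cell at every time step; the unrolled cells of a layer therefore share a single set of weights. A minimal sketch, assuming the sizes used in the examples below; `tf.keras.layers.SimpleRNN(4)` should behave like wrapping `SimpleRNNCell(4)` in `RNN`.

```python
import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(4)  # a single cell with state_size (hidden_size) 4
layer = tf.keras.layers.RNN(cell)        # applies the same cell at each of the timesteps

inputs = tf.random.normal([32, 10, 8])   # batch_size=32, timesteps=10, input_size=8
output = layer(inputs)                   # last hidden value h_L

print(cell.state_size)  # 4
print(output.shape)     # (32, 4)
```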

<img src="./images/all_in_once.png" align="left" width=1200/>


The figure above illustrates an RNN architecture with three stacked recurrent layers (*Hidden 1*, *Hidden 2*, and *Hidden 3*), each consisting of many GRU cells.
- The input to the Hidden 1 layer is a 3D tensor of shape $batch\_size \times seq\_len \times embed\_size$ and its output is a 3D tensor of shape $batch\_size \times seq\_len \times state\_size_1$, where $state\_size_1$ is the common hidden state size of all GRU cells in the Hidden 1 layer.
- The input to the Hidden 2 layer is a 3D tensor of shape $batch\_size \times seq\_len \times state\_size_1$ and its output is a 3D tensor of shape $batch\_size \times seq\_len \times state\_size_2$, where $state\_size_2$ is the common hidden state size of all GRU cells in the Hidden 2 layer.
- The input to the Hidden 3 layer is a 3D tensor of shape $batch\_size \times seq\_len \times state\_size_2$ and its output is a 3D tensor of shape $batch\_size \times seq\_len \times state\_size_3$, where $state\_size_3$ is the common hidden state size of all GRU cells in the Hidden 3 layer.
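
The shape flow through the three stacked layers can be sketched with `tf.keras.layers.GRU`. Each layer must pass all $L$ hidden values to the next one, so every layer here uses *return_sequences=True*. This is a minimal sketch; the batch size, sequence length, embedding size, and the three state sizes are hypothetical values chosen for illustration.

```python
import tensorflow as tf

batch_size, seq_len, embed_size = 32, 10, 8            # hypothetical sizes
state_size_1, state_size_2, state_size_3 = 16, 32, 64  # hypothetical state sizes

x = tf.random.normal([batch_size, seq_len, embed_size])  # stands in for embedded sentences

# return_sequences=True so that each layer outputs [h_1, ..., h_L] for the next layer
hidden_1 = tf.keras.layers.GRU(state_size_1, return_sequences=True)
hidden_2 = tf.keras.layers.GRU(state_size_2, return_sequences=True)
hidden_3 = tf.keras.layers.GRU(state_size_3, return_sequences=True)

out_1 = hidden_1(x)      # (32, 10, 16)
out_2 = hidden_2(out_1)  # (32, 10, 32)
out_3 = hidden_3(out_2)  # (32, 10, 64)
print(out_1.shape, out_2.shape, out_3.shape)
```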

For how to transform a batch of sentences into sequences of indices (a 2D tensor of shape $batch\_size \times seq\_len$) and then use an embedding layer to turn them into a 3D tensor of shape $batch\_size \times seq\_len \times embed\_size$, please refer to Tute 8b.
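
Tute 8b covers this properly; as a rough preview, `tf.keras.layers.Embedding` maps each index to a learned vector, turning a 2D index tensor into the 3D input a recurrent layer expects. A minimal sketch, assuming a hypothetical vocabulary of 1000 words, an embedding size of 8, and hand-written indices in place of a real tokenizer.

```python
import tensorflow as tf

vocab_size, embed_size = 1000, 8  # hypothetical vocabulary and embedding sizes

# A batch of 2 sequences of word indices: shape [batch_size, seq_len] = [2, 5]
token_ids = tf.constant([[4, 25, 7, 0, 0],
                         [13, 2, 999, 41, 8]])

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_size)
embedded = embedding(token_ids)  # looks up one embed_size vector per index

print(embedded.shape)  # (2, 5, 8) = batch_size x seq_len x embed_size
```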

The following code feeds in a batch of $32$ sequences, each with $10$ time steps of $8$ features, and returns only $output$, the last hidden value, with shape $32 \times 4$.

In [9]:
inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = tf.keras.layers.SimpleRNN(4)

output = simple_rnn(inputs)  # The output has shape `[32, 4]`.

In [7]:
print(output.shape)

(32, 4)


We set *return_sequences=True* to obtain $whole\_sequence\_output$, which contains the hidden values of all cells, and *return_state=True* to also obtain $final\_state$.

In [8]:
simple_rnn = tf.keras.layers.SimpleRNN(4, return_sequences=True, return_state=True)

# whole_sequence_output has shape `[32, 10, 4]`.
# final_state has shape `[32, 4]`.
whole_sequence_output, final_state = simple_rnn(inputs)

In [9]:
print(whole_sequence_output.shape)
print(final_state.shape)

(32, 10, 4)
(32, 4)
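
Since $whole\_sequence\_output$ stacks $[h_1, ..., h_L]$ along the time axis, its last time step should be exactly $final\_state = h_L$. A minimal sketch that re-creates the layer with the same sizes as above (random inputs, so the values differ from the run shown).

```python
import numpy as np
import tensorflow as tf

inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = tf.keras.layers.SimpleRNN(4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = simple_rnn(inputs)

# The last time step of [h_1, ..., h_L] is h_L itself
last_step = whole_sequence_output[:, -1, :]
print(np.allclose(last_step.numpy(), final_state.numpy()))  # True
```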


**<span style="color:red">Exercise 3</span>:** Please run the following code and explain why we obtain the corresponding result (pay attention to $return\_sequences=False, return\_state=True$).

In [15]:
simple_rnn = tf.keras.layers.SimpleRNN(4, return_sequences=False, return_state=True)

# output has shape `[32, 4]`.
# final_state has shape `[32, 4]`.
output, final_state = simple_rnn(inputs)
print(output.shape)
print(final_state.shape)

(32, 4)
(32, 4)


In [11]:
# Element-wise comparison of the returned output and the final state
same = np.array_equal(final_state.numpy(), output.numpy())
print("the same" if same else "different")

the same


#### <span style="color:#0b486b">LSTM cell</span> ####

The LSTM cell/layer is encapsulated in the class *tf.keras.layers.LSTM*. Its signature and parameters are similar to those of the standard RNN class.

In [12]:
inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)

(32, 4)


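The LSTM cell differs from the other cells in that it carries two states: the hidden state and the long-term memory (cell state). Hence, when setting $return\_state=True$, it returns both the final hidden state $h_L$ and the final long-term memory $c_L$, as shown in the following code.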

In [13]:
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
# Keras returns the output followed by the final states in the order h_L, c_L
whole_seq_output, final_hidden_state, final_memory_state = lstm(inputs)
print(whole_seq_output.shape)    # h = [h1, h2,..., hL]
print(final_hidden_state.shape)  # hL
print(final_memory_state.shape)  # cL

(32, 10, 4)
(32, 4)
(32, 4)
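
To check which returned state is which, one can compare each against the last time step of $whole\_seq\_output$, which is $h_L$ by definition. A minimal sketch, assuming the same sizes as above; the variable names are our own.

```python
import numpy as np
import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_hidden_state, final_carry_state = lstm(inputs)

# The second returned tensor is h_L: it matches the last step of [h_1, ..., h_L]
print(np.allclose(whole_seq_output[:, -1, :].numpy(), final_hidden_state.numpy()))  # True
```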


#### <span style="color:#0b486b">GRU cell</span> ####

The GRU cell/layer is packaged in the class *tf.keras.layers.GRU*. Its signature and parameters are similar to those of the standard RNN class.

In [14]:
inputs = tf.random.normal([32, 10, 8])
gru = tf.keras.layers.GRU(4)
output = gru(inputs)
print(output.shape)

(32, 4)


In [15]:
gru = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence_output, final_hidden_state = gru(inputs)
print(whole_sequence_output.shape)
print(final_hidden_state.shape)

(32, 10, 4)
(32, 4)
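
Unlike the LSTM, the GRU returns only one final state because it keeps a single hidden state. This is visible in the `state_size` of the corresponding cell classes. A minimal sketch, assuming a hidden size of $4$; we convert the LSTM's state size to a plain list because Keras may store it in a tracking wrapper.

```python
import tensorflow as tf

lstm_cell = tf.keras.layers.LSTMCell(4)
gru_cell = tf.keras.layers.GRUCell(4)

print(list(lstm_cell.state_size))  # [4, 4]: hidden state h and cell (memory) state c
print(gru_cell.state_size)         # 4: a single hidden state h
```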


---
### <span style="color:#0b486b"> <div  style="text-align:center">**THE END**</div> </span>