# COMP3222/COMP6246 Machine Learning Technologies (2018/19)

## Week 11 - Recurrent Neural Networks (Chapter 14)

Work through each code block at your own pace. If you find something confusing, have a look at the book or ask a demonstrator.

# 1. Basic Theory

So far we have looked at basic perceptrons and convolutional neural networks (CNNs), and at how to implement them in TensorFlow. In practice, these techniques are used for tasks such as image search, self-driving cars, automatic video classification and many more. Deep Learning uses several different neural architectures. In the previous lab we saw that a CNN is essentially designed for `"processing a grid of values"`. The Deep Learning community has also developed an architecture specifically for `"processing a sequence of values"`: the **Recurrent Neural Network (RNN)** [Goodfellow, 2016]. In practice, RNNs are used for analyzing sequential data such as time series (stock prices, car trajectories), text (sentiment analysis) and more.

_Extra_: Have a look at [this interactive example](https://distill.pub/2016/handwriting/), which uses an RNN to generate new strokes in your handwriting style. The model is explained in [this paper](https://arxiv.org/abs/1308.0850).

## 1.2 Bare-bones RNN

Let's implement an RNN with five recurrent neurons without using TensorFlow's RNN implementation/utilities. 
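
For reference, the two time steps computed in the code below can be written as equations. This is a sketch that reuses the names from the code ($W_x$, $W_y$, $b$) and assumes $\tanh$ as the activation function; the subscript notation $Y_{(t)}$ is introduced here for illustration only.

$$
Y_{(0)} = \tanh\left(X_{(0)} W_x + b\right), \qquad
Y_{(1)} = \tanh\left(Y_{(0)} W_y + X_{(1)} W_x + b\right)
$$

Both steps are instances of the same recurrence, $Y_{(t)} = \tanh\left(X_{(t)} W_x + Y_{(t-1)} W_y + b\right)$, if the output before the first step is taken to be zero. For a mini-batch of $m$ instances, $X_{(t)}$ has shape $m \times n_{\text{inputs}}$, $W_x$ has shape $n_{\text{inputs}} \times n_{\text{neurons}}$, $W_y$ has shape $n_{\text{neurons}} \times n_{\text{neurons}}$, and $b$ (shape $1 \times n_{\text{neurons}}$) is broadcast across the rows.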

In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Let's assume some artificial data with three input features per instance at each time step
n_inputs = 3  # (if our objective is to predict words in a sentence, each word could be encoded as three numbers)
n_neurons = 5 # number of recurrent neurons

X0 = tf.placeholder(tf.float32, [None, n_inputs]) # t=0 batch
X1 = tf.placeholder(tf.float32, [None, n_inputs]) # t=1 batch

# Weights on the inputs (shared by all time steps), initially set to random values
Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons],dtype=tf.float32))

# Connection weights for the outputs of the previous time step (shared by all time steps), initially set to random values
Wy = tf.Variable(tf.random_normal(shape=[n_neurons,n_neurons],dtype=tf.float32))

# bias vector, all zeros for now
b = tf.Variable(tf.zeros([1, n_neurons], dtype=tf.float32))

# outputs of timestep 0
Y0 = tf.tanh(tf.matmul(X0, Wx) + b)

# outputs of timestep 1
Y1 = tf.tanh(tf.matmul(Y0, Wy) + tf.matmul(X1, Wx) + b)
# Y1 = activation_function(dot_product(Y0, Wy) + dot_product(X1, Wx) + bias_vector)

init = tf.global_variables_initializer()

# Mini-batch:        instance1  instance2   instance3 instance4
X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]) # t = 0 (for instance, first word of a sentence)
X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]]) # t = 1 (for instance, second word of a sentence)

# within the session
with tf.Session() as sess:
    init.run()
    # get the outputs of each step
    Y0_val, Y1_val = sess.run([Y0, Y1], feed_dict={X0: X0_batch, X1: X1_batch})

In [3]:
print(Y0_val) # layer's output at t=0

[[ 0.9391149   0.99631876 -0.9985466  -0.8736268  -0.79987127]
 [ 1.          0.9999993  -0.99999785  0.3628914   0.60871404]
 [ 1.          1.         -1.          0.97094965  0.98692995]
 [ 0.9999467   0.99282587  0.99547905  1.          0.9999999 ]]


In [4]:
print(Y1_val) # layer's output at t=1

[[ 1.          0.9999995  -0.99999994  0.99999535  0.9999999 ]
 [-0.6194218  -0.99774283 -0.938938    0.51976573  0.99024975]
 [ 1.          0.93615544 -0.99941397  0.9999731   0.9999991 ]
 [ 0.99998814 -0.21627878  0.8582206   0.9909318   0.999858  ]]
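
The same two-step computation can also be sketched in plain NumPy, which makes the shapes easy to inspect without a TensorFlow session. This is a minimal sketch under assumptions: the helper `run_rnn`, the stacked array `X_steps` and the seed are illustrative names and choices that do not appear in the lab, and because the weights come from a different random generator the values will not match the outputs printed above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for repeatability

n_inputs = 3   # input features per instance at each time step
n_neurons = 5  # recurrent neurons in the layer

# Randomly initialised parameters, mirroring Wx, Wy and b in the TensorFlow graph
Wx = rng.standard_normal((n_inputs, n_neurons))
Wy = rng.standard_normal((n_neurons, n_neurons))
b = np.zeros((1, n_neurons))

# Same mini-batch as in the lab, stacked as (time step, instance, feature)
X_steps = np.array([
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]],  # t = 0
    [[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]],  # t = 1
], dtype=np.float64)


def run_rnn(X_steps, Wx, Wy, b):
    """Hypothetical helper: unroll a tanh recurrent layer over the time axis."""
    batch_size = X_steps.shape[1]
    Y_prev = np.zeros((batch_size, Wy.shape[0]))  # no previous output before t = 0
    outputs = []
    for X_t in X_steps:
        # Same update at every step: current inputs plus previous outputs
        Y_prev = np.tanh(X_t @ Wx + Y_prev @ Wy + b)
        outputs.append(Y_prev)
    return outputs


Y0_np, Y1_np = run_rnn(X_steps, Wx, Wy, b)
print(Y0_np.shape, Y1_np.shape)  # (4, 5) (4, 5)
```

Starting from an all-zero previous output makes the first iteration reduce to `tanh(X0 @ Wx + b)`, which is the expression used for `Y0` in the TensorFlow graph. Note also that `instance2` has an all-zero input at $t=1$, so its row of `Y1_np` depends only on its row of `Y0_np` passed through `Wy`.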


**Questions:** For the example described in the code comments:
 * How would you define the outputs?
     * Why are there five columns?
     * Why are there four rows?
 * What would be the difference between `instance1` at $t=0$ and `instance1` at $t=1$?


### References
[Goodfellow, 2016] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/