<a href="https://colab.research.google.com/github/tugcegurbuz/Deep-Learning-with-TensorFlow/blob/master/6_IntroductionToLSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description:** Introduction to LSTM

----

**How LSTM Works:**

A typical flow of operations for an LSTM unit is as follows. First, the Keep Gate decides whether to keep or forget the data currently stored in memory. It receives both the input and the state of the recurrent network and passes them through a sigmoid activation. A value of $K_t = 1$ means that the LSTM unit should keep the stored data in full, and a value of $K_t = 0$ means that it should forget it entirely. Consider $S_{t-1}$ as the incoming (previous) state, $x_t$ as the incoming input, and $W_k$, $B_k$ as the weight and bias for the Keep Gate. Additionally, consider $Old_{t-1}$ as the data previously in memory. What happens can be summarized by these equations:

$$K_t = \sigma(W_k \times [S_{t-1}, x_t] + B_k)$$

$$Old_t = K_t \times Old_{t-1}$$
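The two Keep Gate equations can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights are random rather than trained, the memory contents `Old_prev` are made up, and the code uses the row-vector convention `[S_{t-1}, x_t] @ W_k` instead of `W_k × [S_{t-1}, x_t]`.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes every entry into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_num, input_dim = 4, 6

S_prev = np.zeros((1, hidden_num))              # S_{t-1}: previous state
x_t = np.array([[3., 2., 2., 2., 2., 2.]])      # x_t: current input
Old_prev = rng.normal(size=(1, hidden_num))     # Old_{t-1}: made-up memory contents

# Assumption: random, untrained Keep Gate parameters
W_k = rng.normal(scale=0.1, size=(hidden_num + input_dim, hidden_num))
B_k = np.zeros(hidden_num)

concat = np.concatenate([S_prev, x_t], axis=1)  # [S_{t-1}, x_t]
K_t = sigmoid(concat @ W_k + B_k)               # keep factor per memory slot
Old_t = K_t * Old_prev                          # element-wise scaling of the old memory

print(K_t)
print(Old_t)
```

Because every entry of `K_t` lies strictly between 0 and 1, each entry of `Old_t` is a shrunken copy of the matching entry of `Old_prev`.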

Then, the input and state are passed to the Input Gate, where another sigmoid activation is applied. Concurrently, the input is processed as normal by whatever processing unit is implemented in the network, and the result is multiplied by the Input Gate's sigmoid output $I_t$, much like in the Keep Gate. Consider $W_i$ and $B_i$ as the weight and bias for the Input Gate, and $C_t$ as the result of the network's processing of the inputs. $New_t$ is the new data to be written into the memory cell. It is then **added** to whatever value is still stored in memory.

$$I_t = \sigma(W_i\times[S_{t-1},x_t]+B_i)$$

$$New_t = I_t \times C_t$$
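A similar NumPy sketch for the Input Gate follows. The text leaves the "processing unit" that produces $C_t$ open, so this example assumes a tanh layer with hypothetical parameters `W_c` and `B_c`. That is a common choice, but it is an assumption here, and all weights are random.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden_num, input_dim = 4, 6

S_prev = np.zeros((1, hidden_num))              # S_{t-1}
x_t = np.array([[3., 2., 2., 2., 2., 2.]])      # x_t
concat = np.concatenate([S_prev, x_t], axis=1)  # [S_{t-1}, x_t]

# Assumption: random, untrained Input Gate parameters
W_i = rng.normal(scale=0.1, size=(hidden_num + input_dim, hidden_num))
B_i = np.zeros(hidden_num)

# Assumption: the processing unit is a tanh layer (W_c, B_c are hypothetical names)
W_c = rng.normal(scale=0.1, size=(hidden_num + input_dim, hidden_num))
B_c = np.zeros(hidden_num)

I_t = sigmoid(concat @ W_i + B_i)   # how much of the new data to let in
C_t = np.tanh(concat @ W_c + B_c)   # candidate data produced from the inputs
New_t = I_t * C_t                   # gated new data for the memory cell

print(New_t)
```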

Together, the Keep and Input Gates work in an analog manner, so that we can keep part of the old data and add only part of the new data. The memory cell then holds $Cell_t = Old_t + New_t$. Let's see what happens if we set the Keep Gate to 0 and the Input Gate to 1 (the old data is forgotten entirely and the new data overwrites it):

$$Old_t = 0 \times Old_{t-1}$$

$$New_t = 1 \times C_t$$

$$Cell_t = C_t$$
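The special case above can be checked numerically. The helper `cell_update` is a hypothetical name introduced for this sketch. It takes the cell update to be the sum of the gated old and new data, as implied by the new data being added to what is still stored, and the sample values are arbitrary.

```python
import numpy as np

def cell_update(K_t, I_t, Old_prev, C_t):
    # Hypothetical helper: gated old memory plus gated new data
    Old_t = K_t * Old_prev
    New_t = I_t * C_t
    return Old_t + New_t

Old_prev = np.array([[0.5, -1.0, 2.0, 0.0]])   # arbitrary old memory
C_t = np.array([[0.1, 0.2, -0.3, 0.4]])        # arbitrary candidate data

# Keep Gate = 0, Input Gate = 1: the new data overwrites the memory
overwrite = cell_update(np.zeros((1, 4)), np.ones((1, 4)), Old_prev, C_t)

# Both gates at 0.5: an equal blend of old and new data
blend = cell_update(np.full((1, 4), 0.5), np.full((1, 4), 0.5), Old_prev, C_t)

print(overwrite)
print(blend)
```

Intermediate gate values such as 0.5 show the "analog" behaviour: the memory ends up holding a weighted mix rather than an all-or-nothing choice.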

The Output Gate functions in a similar manner. To decide what we should output, we take the input data and state and pass them through a sigmoid function as usual. The contents of our memory cell, however, are passed through a *tanh* function to bound them between -1 and 1. Consider $W_o$ and $B_o$ as the weight and bias for the Output Gate.

$$O_t = \sigma(W_o \times [S_{t-1},x_t] + B_o)$$

$$Output_t = O_t \times \tanh(Cell_t)$$
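The Output Gate equations can be sketched the same way. The weights are random and the memory contents `Cell_t` are arbitrary values chosen for illustration. Nothing here reproduces the TensorFlow cell used later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
hidden_num, input_dim = 4, 6

S_prev = np.zeros((1, hidden_num))              # S_{t-1}
x_t = np.array([[3., 2., 2., 2., 2., 2.]])      # x_t
concat = np.concatenate([S_prev, x_t], axis=1)  # [S_{t-1}, x_t]

# Assumption: arbitrary memory contents, including values outside [-1, 1]
Cell_t = np.array([[2.5, -0.3, 0.0, -4.0]])

# Assumption: random, untrained Output Gate parameters
W_o = rng.normal(scale=0.1, size=(hidden_num + input_dim, hidden_num))
B_o = np.zeros(hidden_num)

O_t = sigmoid(concat @ W_o + B_o)     # how much of the memory to expose
Output_t = O_t * np.tanh(Cell_t)      # tanh bounds the memory to (-1, 1) first

print(Output_t)
```

Even though `Cell_t` contains values such as 2.5 and -4.0, every entry of `Output_t` stays strictly inside (-1, 1).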

Finally, the output is fed back into the RNN.

### One LSTM Unit Model

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

print(tf.__version__)

1.14.0


In [0]:
#Start the session
sess = tf.Session()

In [3]:
#Define the number of hidden units, which is also the size of the output
hidden_num = 4

#Define LSTM unit
lstm_cell = rnn.BasicLSTMCell(hidden_num)

W0722 22:10:31.577492 140688586958720 deprecation.py:323] From <ipython-input-3-25a689f5a3fe>:4: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.


In [4]:
#Create a sample input
sample_input = tf.constant([[3,2,2,2,2,2]],dtype=tf.float32)
print (sess.run(sample_input))

[[3. 2. 2. 2. 2. 2.]]


In [5]:
#We have to pass 2 elements to the LSTM: prv_output and prv_state (h and c).
#So, we initialize a state, which is a tuple with 2 elements,
#each of size [1 x 4]: one for passing prv_output to the next time step,
#and another for passing prv_state to the next time step.

state = (tf.zeros([1,hidden_num]),)*2

#Run the LSTM unit
with tf.variable_scope("LSTM_sample1"):
    output, state_next = lstm_cell(sample_input, state)

W0722 22:10:31.627731 140688586958720 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0722 22:10:31.643100 140688586958720 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:738: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


In [6]:
#Initialize the variables
sess.run(tf.global_variables_initializer())

#Let's see the next state
print (sess.run(state_next))

LSTMStateTuple(c=array([[0.8184926 , 0.44456646, 0.5677054 , 0.00412068]], dtype=float32), h=array([[0.34325895, 0.2783329 , 0.25642425, 0.00280078]], dtype=float32))


In [7]:
#Let's see the output
print(sess.run(output))

[[0.34325895 0.2783329  0.25642425 0.00280078]]
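Note that the printed output is identical to the `h` entry of the state tuple. As a rough sanity check against $Output_t = O_t \times \tanh(Cell_t)$, the sketch below recovers the implied Output Gate activation from the `c` and `h` values printed above. It assumes that `c` plays the role of $Cell_t$ and `h` the role of $Output_t$.

```python
import numpy as np

# Values copied from the printed LSTMStateTuple above
c = np.array([[0.8184926, 0.44456646, 0.5677054, 0.00412068]])
h = np.array([[0.34325895, 0.2783329, 0.25642425, 0.00280078]])

# Output_t = O_t * tanh(Cell_t)  =>  O_t = Output_t / tanh(Cell_t)
implied_O = h / np.tanh(c)

print(np.round(implied_O, 3))
```

Every recovered value lies between 0 and 1, which is what a sigmoid gate must produce.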


### Stack LSTM Model

In [8]:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

print(tf.__version__)

1.14.0


In [0]:
#Let's start a new session
sess.close()
sess = tf.Session()

In [0]:
#Define input dimension
input_dim = 6

In [0]:
#Initialize a list which will hold all cells
cells = []

In [12]:
#First Cell
hidden_num_1 = 4 #4 hidden nodes
cell1 = rnn.LSTMCell(hidden_num_1)
cells.append(cell1)

W0722 22:10:37.786900 140688586958720 deprecation.py:323] From <ipython-input-12-d401a692e49c>:2: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.


In [0]:
#Second Cell
hidden_num_2 = 5 #5 hidden nodes
cell2 = rnn.LSTMCell(hidden_num_2)
cells.append(cell2)

In [14]:
#Create the stacked LSTM
stacked_lstm = rnn.MultiRNNCell(cells)

W0722 22:10:39.630931 140688586958720 deprecation.py:323] From <ipython-input-14-9329350013ab>:1: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.


In [15]:
#Let's create RNN from stacked LSTM

#Data: Batch size x time steps x features.
data = tf.placeholder(tf.float32, [None, None, input_dim])

#RNN
output, state = tf.nn.dynamic_rnn(stacked_lstm, data, dtype=tf.float32)

W0722 22:10:40.794391 140688586958720 deprecation.py:323] From <ipython-input-15-003b165e6aac>:4: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API


In [16]:
#Define a sample input

#Let's say the batch size is 2, the input sequence length is 3,
#and the dimensionality of the inputs is 6.
#The input then has shape (2, 3, 6).

sample_input = [[[1,2,3,4,3,2], [1,2,1,1,1,2],[1,2,2,2,2,2]],
                [[1,2,3,4,3,2],[3,2,2,1,1,2],[0,0,0,0,3,2]]]
sample_input

[[[1, 2, 3, 4, 3, 2], [1, 2, 1, 1, 1, 2], [1, 2, 2, 2, 2, 2]],
 [[1, 2, 3, 4, 3, 2], [3, 2, 2, 1, 1, 2], [0, 0, 0, 0, 3, 2]]]

In [17]:
output

<tf.Tensor 'rnn/transpose_1:0' shape=(?, ?, 5) dtype=float32>

In [18]:
sess.run(tf.global_variables_initializer())
sess.run(output, feed_dict={data: sample_input})

array([[[-0.01378634,  0.00034683,  0.03838394,  0.02868149,
          0.05398444],
        [-0.04609454,  0.01659227,  0.06698639,  0.03768126,
          0.11669935],
        [-0.07558274,  0.0281612 ,  0.09719814,  0.04549597,
          0.18208914]],

       [[-0.01378634,  0.00034683,  0.03838394,  0.02868149,
          0.05398444],
        [-0.03996516,  0.03149408,  0.04251051,  0.02974791,
          0.08969524],
        [-0.05997536,  0.03263183,  0.04433766,  0.02388369,
          0.10596982]]], dtype=float32)
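The result has shape (2, 3, 5): one 5-dimensional vector (the size of the second cell) per time step per sequence. The first time step is identical for both sequences because both start with the same input `[1, 2, 3, 4, 3, 2]`. The NumPy sketch below mimics the stacking to make the shapes explicit. It is a simplified stand-in, not the TensorFlow implementation: `init_layer`, `lstm_step` and `run_layer` are hypothetical helpers, the weights are random, the candidate data uses an assumed tanh layer, and the numbers will not match the output above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(rng, input_dim, hidden_num):
    # One (weight, bias) pair per gate plus the candidate, acting on [S_{t-1}, x_t]
    shape = (hidden_num + input_dim, hidden_num)
    return {name: (rng.normal(scale=0.1, size=shape), np.zeros(hidden_num))
            for name in ("k", "i", "c", "o")}

def lstm_step(params, x_t, S_prev, Cell_prev):
    concat = np.concatenate([S_prev, x_t], axis=1)
    K_t = sigmoid(concat @ params["k"][0] + params["k"][1])  # Keep Gate
    I_t = sigmoid(concat @ params["i"][0] + params["i"][1])  # Input Gate
    C_t = np.tanh(concat @ params["c"][0] + params["c"][1])  # assumed tanh candidate
    O_t = sigmoid(concat @ params["o"][0] + params["o"][1])  # Output Gate
    Cell_t = K_t * Cell_prev + I_t * C_t
    return O_t * np.tanh(Cell_t), Cell_t

def run_layer(params, inputs, hidden_num):
    # inputs: batch size x time steps x features
    batch, steps, _ = inputs.shape
    S = np.zeros((batch, hidden_num))
    Cell = np.zeros((batch, hidden_num))
    outputs = []
    for t in range(steps):
        S, Cell = lstm_step(params, inputs[:, t, :], S, Cell)
        outputs.append(S)
    return np.stack(outputs, axis=1)

rng = np.random.default_rng(0)
sample_input = np.array(
    [[[1, 2, 3, 4, 3, 2], [1, 2, 1, 1, 1, 2], [1, 2, 2, 2, 2, 2]],
     [[1, 2, 3, 4, 3, 2], [3, 2, 2, 1, 1, 2], [0, 0, 0, 0, 3, 2]]],
    dtype=np.float32)

layer1 = init_layer(rng, input_dim=6, hidden_num=4)  # first cell: 4 hidden nodes
layer2 = init_layer(rng, input_dim=4, hidden_num=5)  # second cell: 5 hidden nodes

hidden1 = run_layer(layer1, sample_input, hidden_num=4)   # shape (2, 3, 4)
stacked_out = run_layer(layer2, hidden1, hidden_num=5)    # shape (2, 3, 5)
print(stacked_out.shape)
```

The second layer consumes the per-step outputs of the first layer, so its input dimension equals the first cell's hidden size (4), and the final feature dimension equals the second cell's hidden size (5).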