# Recurrent Neural Networks and Long Short-Term Memory Networks

In this module you will become familiar with Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), a type of RNN considered a breakthrough for speech-to-text recognition. RNNs are widely used in AI applications that work with sequential data, and they can be trained with supervised learning.

Learning Objectives
- Explain how a Recurrent Neural Network works
- Become familiar with the most common architectures for Recurrent Neural Networks
- Gain practice using RNNs and LSTMs for classification and image applications

## Recurrent Neural Networks (RNNs)

In this section, we will cover:
- Recurrent Neural Networks (RNNs)
- Practical and Mathematical details
- Limitations of RNNs in practice

### Variable Length Sequences of Words

Processing of images often forces them into a specific input dimension.
- Not obvious how to do this with text.
- For example: classify tweets as positive, negative, or neutral.
  - Tweets can have a variable number of words.
  - What to do?

### Ordering of Words is Important

Want to do better than "bag of words" implementations.
- Ideally, each word is processed or understood in the appropriate context (but we need some notion of "context").
- Words should be handled differently depending on "context".
- Also, each word should update the context.
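A quick check makes the limitation of "bag of words" concrete: a bag of words only counts words, so two sentences with opposite meanings can get the same representation. This minimal sketch uses Python's `collections.Counter` as the bag of words; the example sentences are made up.

```python
from collections import Counter

# A bag of words keeps word counts and throws away word order.
bag_a = Counter("dog bites man".split())
bag_b = Counter("man bites dog".split())
print(bag_a == bag_b)  # True: the two sentences are indistinguishable
```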

### Idea: Use the Notion of "Recurrence"

Input words one by one.
- This way, we can handle variable lengths of text.
- The response to a word depends on the words that preceded it.

Network outputs two things:
- Prediction: What would be the prediction if the sequence ended with that word.
- State: Summary of everything that happened in the past.
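The prediction/state interface can be sketched in plain Python. Here the step function is hand-written with a hypothetical toy lexicon rather than learned, and the state is just a running score plus a flag; in an RNN the step function is learned and the state is a vector. The sketch still shows the key ideas: the loop accepts any number of words, and the word `not` changes the context so that the next word is handled differently.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy lexicon, for illustration only.
WORD_SCORES = {"good": 1, "great": 2, "bad": -1, "awful": -2}

State = Tuple[int, bool]  # (running score, "negate the next word" flag)

def label(score: int) -> str:
    """Map a running score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def toy_step(state: State, word: str) -> Tuple[str, State]:
    """One recurrence step: (old state, current word) -> (prediction, new state)."""
    score, negate = state
    if word == "not":
        # "not" updates the context for the next word instead of the score.
        return label(score), (score, True)
    value = WORD_SCORES.get(word, 0)
    if negate:
        value = -value
    score += value
    return label(score), (score, False)  # any other word clears the flag

def run_recurrence(step: Callable[[State, str], Tuple[str, State]],
                   initial_state: State,
                   words: Sequence[str]) -> Tuple[List[str], State]:
    """Feed words one at a time; works for any sequence length."""
    state = initial_state
    predictions: List[str] = []
    for word in words:
        prediction, state = step(state, word)
        predictions.append(prediction)
    return predictions, state

preds, final_state = run_recurrence(toy_step, (0, False), "the movie was not bad".split())
print(preds[-1], final_state)  # positive (1, False)
```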

## State and Recurrent Neural Networks

![](./images/46_RecurrenceNet.png)

![](./images/47_UnrollingTheRNN.png)

- $w_i$: the $i$-th word
- $U$: matrix that transforms the input word
- $W$: matrix that transforms the previous state
- $s_i$: state at position $i$; the initial state $s_0$ (before any word is read) is the zero vector
- $V$: matrix that transforms the state into the output
- $o_i$: output at position $i$

From bottom to top: kernel part ($U$), recurrent part ($W$), dense part ($V$).

![](./images/48_UnrollingTheRNN.png)



## Details of Recurrent Neural Networks

### Mathematical Details

$w_i$ is the word at position $i$

$s_i$ is the state at position $i$

$o_i$ is the output at position $i$

$s_i = f(Uw_i + Ws_{i-1})$
(Core RNN; $f$ is a nonlinear activation function, commonly $\tanh$)

$o_i = \mathrm{softmax}(Vs_i)$
(subsequent dense layer)

In other words:
- current state = function1(old state, current input).
- current output = function2(current state).
- We learn function1 and function2 by training our network!
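The two functions can be sketched in NumPy. This is a minimal sketch, not a library implementation: it assumes $f = \tanh$, no bias terms, word vectors given as NumPy arrays, and random matrices standing in for trained weights. The names `rnn_forward` and `softmax` are our own. Shapes follow the convention below: $U$ is $s \times r$, $W$ is $s \times s$, $V$ is $t \times s$.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(words, U, W, V, f=np.tanh):
    """Run the recurrence over a list of word vectors.

    Returns the states s_1..s_n and the outputs o_1..o_n.
    """
    s = np.zeros(W.shape[0])            # s_0 is the zero vector
    states, outputs = [], []
    for w in words:
        s = f(U @ w + W @ s)            # function1: s_i = f(U w_i + W s_{i-1})
        states.append(s)
        outputs.append(softmax(V @ s))  # function2: o_i = softmax(V s_i)
    return states, outputs

# Random weights stand in for trained ones (assumption, for illustration).
rng = np.random.default_rng(0)
r, s_dim, t = 4, 3, 2
U = rng.normal(size=(s_dim, r))
W = rng.normal(size=(s_dim, s_dim))
V = rng.normal(size=(t, s_dim))
words = [rng.normal(size=r) for _ in range(5)]  # a 5-word "sentence"

states, outputs = rnn_forward(words, U, W, V)
print(outputs[-1])  # prediction if the sequence ended at the last word
```

The same `U`, `W` and `V` are reused at every iteration of the loop, so one set of weights handles sequences of any length.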

$r$ = dimension of input vector

$s$ = dimension of hidden state

$t$ = dimension of output vector (after dense layer)

$U$ is an $s \times r$ matrix

$W$ is an $s \times s$ matrix

$V$ is a $t \times s$ matrix

Note: The weight matrices U, V, W are the same across all positions.
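Because the matrices are shared, the number of weights depends only on $r$, $s$ and $t$, not on the sequence length. The small helpers below (hypothetical names) make the arithmetic explicit. They count only the entries of $U$, $W$ and $V$, since the equations above have no bias terms; real RNN layers usually add biases. For comparison, the second helper counts a hypothetical model that used separate $U$ and $W$ matrices at each of $n$ positions.

```python
def shared_rnn_params(r: int, s: int, t: int) -> int:
    """Entries of U (s x r), W (s x s) and V (t x s), shared across all positions."""
    return s * r + s * s + t * s

def unshared_params(r: int, s: int, t: int, n: int) -> int:
    """Hypothetical model with its own U and W at each of n positions (one V)."""
    return n * (s * r + s * s) + t * s

# Example sizes: 100-dim word vectors, 64-dim state, 3 output classes.
print(shared_rnn_params(100, 64, 3))      # 6400 + 4096 + 192 = 10688
print(unshared_params(100, 64, 3, n=50))  # 524992, and it grows with n
```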

### Practical Details

Often, we train on just the "final" output and ignore intermediate outputs.
- A slight variation of backpropagation, called Backpropagation Through Time (BPTT), is used to train RNNs.
- Training is sensitive to the length of the sequence (due to the "vanishing/exploding gradient" problem).
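To show what BPTT does, here is a minimal NumPy sketch for the vanilla RNN with the loss taken on the final output only. Assumptions: $f = \tanh$, no biases, a cross-entropy loss against a single target class index, and hypothetical function names. The backward loop walks from the last position to the first; because $U$ and $W$ are reused at every position, their gradients are summed over positions. The repeated multiplication by $W^\top$ in that loop is where vanishing or exploding gradients come from.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_and_grads(words, target, U, W, V):
    """Cross-entropy on the final output only, with gradients from BPTT."""
    # Forward pass: keep every state, since the backward pass needs them.
    states = [np.zeros(W.shape[0])]           # states[0] is s_0
    for w in words:
        states.append(np.tanh(U @ w + W @ states[-1]))
    p = softmax(V @ states[-1])
    loss = -np.log(p[target])

    # Backward pass through time.
    dz_out = p.copy()
    dz_out[target] -= 1.0                     # d loss / d (V s_n)
    dV = np.outer(dz_out, states[-1])
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    ds = V.T @ dz_out                         # d loss / d s_n
    for i in range(len(words), 0, -1):
        dz = ds * (1.0 - states[i] ** 2)      # back through tanh
        dU += np.outer(dz, words[i - 1])      # U is shared, so contributions add up
        dW += np.outer(dz, states[i - 1])     # W is shared, so contributions add up
        ds = W.T @ dz                         # pass gradient to the previous state
    return loss, dU, dW, dV

rng = np.random.default_rng(1)
r, s, t = 3, 4, 2
U = rng.normal(scale=0.5, size=(s, r))
W = rng.normal(scale=0.5, size=(s, s))
V = rng.normal(scale=0.5, size=(t, s))
words = [rng.normal(size=r) for _ in range(6)]
loss, dU, dW, dV = loss_and_grads(words, 1, U, W, V)
print(loss)
```

A finite-difference check (nudge one entry of `W`, recompute the loss, compare with `dW`) is a quick way to confirm gradients like these.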

In practice, we still set a maximum length to our sequences.
- If input is shorter than maximum, we "pad" it.
- If input is longer than maximum, we truncate.
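A minimal padding/truncation helper, assuming sentences are already converted to lists of integer token ids and that id `0` is reserved for padding (both assumptions). It pads and truncates at the end; libraries differ on this, and some pad or cut at the start by default, so check the one you use.

```python
from typing import List, Sequence

def pad_or_truncate(tokens: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Return exactly max_len ids: cut long inputs, pad short ones at the end."""
    kept = list(tokens[:max_len])                   # truncate if too long
    return kept + [pad_id] * (max_len - len(kept))  # pad if too short

batch = [[12, 7, 3], [5, 9, 21, 4, 8, 30], [2]]
print([pad_or_truncate(seq, 4) for seq in batch])
# [[12, 7, 3, 0], [5, 9, 21, 4], [2, 0, 0, 0]]
```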

### Other Uses of RNNs

Most RNN examples focus on text applications.

But, RNNs can be used for other sequential data:
- Forecasting: Customer Sales, Loss Rates, Network Traffic.
- Speech Recognition: Call Center Automation, Voice Applications.
- Manufacturing Sensor Data
- Genome Sequences
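For forecasting, one common way (an assumption here, not the only one) to turn a single series into RNN training data is a sliding window: each example uses the last few values as the input sequence and the following value as the target. The function name is hypothetical and the sales figures are made-up numbers.

```python
from typing import List, Sequence, Tuple

def make_windows(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Return (input sequence of length `window`, next value) pairs."""
    return [(list(series[i:i + window]), series[i + window])
            for i in range(len(series) - window)]

daily_sales = [10, 12, 13, 15, 18]  # hypothetical data
for inputs, target in make_windows(daily_sales, window=3):
    print(inputs, "->", target)
# [10, 12, 13] -> 15
# [12, 13, 15] -> 18
```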

### Weaknesses of RNNs

The nature of the state transition means it is hard to keep information from the distant past in the current memory without reinforcement.
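A stripped-down experiment shows why. As a simplification for illustration, drop the inputs and the nonlinearity, so the recurrence becomes $s_i = Ws_{i-1}$ and the information stored in $s_0$ is multiplied by $W$ once per step. With $W = 0.9I$ that information shrinks geometrically; with $W = 1.1I$ it blows up. The same repeated product of $W$ appears in the gradients during BPTT; in a $\tanh$ RNN each factor is additionally scaled by the derivative of $\tanh$, which is at most 1. The helper name is hypothetical.

```python
import numpy as np

def remaining_norm(W: np.ndarray, s0: np.ndarray, steps: int) -> float:
    """Norm of s_steps for the simplified recurrence s_i = W s_{i-1}."""
    s = s0.copy()
    for _ in range(steps):
        s = W @ s
    return float(np.linalg.norm(s))

s0 = np.array([1.0, 0.0, 0.0])   # information stored in the initial state
shrink = 0.9 * np.eye(3)
grow = 1.1 * np.eye(3)
for steps in (1, 10, 50):
    print(steps, remaining_norm(shrink, s0, steps), remaining_norm(grow, s0, steps))
```

After 50 steps the first case keeps roughly 0.5% of the original norm ($0.9^{50} \approx 0.005$), while the second has grown more than a hundredfold ($1.1^{50} \approx 117$).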

In the next lecture, we introduce LSTMs, which have a more complex mechanism for updating the state.
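As a preview, here is a minimal NumPy sketch of one step of the standard LSTM cell. Unlike the plain update $s_i = f(Uw_i + Ws_{i-1})$, the LSTM keeps a separate cell state `c` and uses sigmoid gates to decide how much of the old memory to keep (forget gate), how much new information to write (input gate), and how much to expose (output gate). The stacked weight layout, the gate order `i, f, g, o` and the function names are choices made for this sketch; frameworks store these weights in their own layouts.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(w, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step.

    w: input word vector (r,); h_prev, c_prev: previous hidden and cell state (s,)
    Wx: (4s, r), Wh: (4s, s), b: (4s,) hold the stacked weights of the four gates.
    """
    s = h_prev.shape[0]
    z = Wx @ w + Wh @ h_prev + b
    i = sigmoid(z[0:s])            # input gate: how much new content to write
    f = sigmoid(z[s:2 * s])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * s:3 * s])    # candidate content
    o = sigmoid(z[3 * s:4 * s])    # output gate: how much memory to expose
    c = f * c_prev + i * g         # additive memory update
    h = o * np.tanh(c)             # new hidden state
    return h, c

# With the forget gate pushed to 1 and the input gate to 0, memory survives many steps.
r, s = 3, 2
Wx, Wh = np.zeros((4 * s, r)), np.zeros((4 * s, s))
b = np.zeros(4 * s)
b[0:s] = -50.0       # input gate ~ 0
b[s:2 * s] = 50.0    # forget gate ~ 1
h, c = np.zeros(s), np.array([0.7, -0.3])
for _ in range(100):
    h, c = lstm_step(np.ones(r), h, c, Wx, Wh, b)
print(c)  # unchanged: 0.7 and -0.3
```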

## Long Short-Term Memory (LSTM) Networks


## LSTM Explanation



## Gated Recurrent Unit



## Gated Recurrent Unit Details