# [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

## Recurrent Neural Networks
Recurrent neural networks (RNNs) are networks with loops in them, allowing information to persist. In the diagram below, a chunk of neural network, $A$, looks at some input $x_t$ and outputs a value $h_t$. A loop allows information to be passed from one step of the network to the next.


![unrolled](img/RNN-unrolled.png)

## Notations
![notations](img/notation.png)

## Vanilla RNN
![simplernn](img/SimpleRNN.png)

$$\large h_t = \tanh(W^{(hh)}h_{t-1} + W^{(hx)}x_t)$$
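
The update above can be sketched in a few lines of NumPy. This is only an illustration: the names `rnn_step` and `rnn_forward`, the sizes, and the random weights are assumptions made for the example, and the equation's lack of a bias term is kept as-is.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_hx):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_hx x_t)."""
    return np.tanh(W_hh @ h_prev + W_hx @ x_t)

def rnn_forward(xs, h0, W_hh, W_hx):
    """Unroll the loop over a sequence, reusing the same weights at every step."""
    h = h0
    hs = []
    for x_t in xs:
        h = rnn_step(h, x_t, W_hh, W_hx)
        hs.append(h)
    return np.stack(hs)

# Illustrative sizes and random weights (assumptions, not from the notes).
rng = np.random.default_rng(0)
hidden_size, input_size, seq_len = 4, 3, 5
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hx = rng.normal(scale=0.1, size=(hidden_size, input_size))
xs = rng.normal(size=(seq_len, input_size))

hs = rnn_forward(xs, np.zeros(hidden_size), W_hh, W_hx)
print(hs.shape)  # (5, 4): one hidden state per time step
```

The same two weight matrices are applied at every time step, which is what the unrolled diagram depicts.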

# Long Short Term Memory networks
Long Short-Term Memory networks, usually just called "**LSTM**s", are a special kind of RNN, capable of learning long-term dependencies. They work tremendously well on a large variety of problems and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating neural network modules. LSTMs also have this chain-like structure, but the repeating module is built differently. Instead of a single neural network layer, it contains four, interacting in a very special way:

![LSTM](img/LSTM3-chain.png)

# Step-by-Step LSTM Walk Through

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.
![memory](img/LSTM3-C-line.png)

#### forget gate layer
![focus_f](img/LSTM3-focus-f.png)
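
For reference, the forget gate is usually written as follows. The notation is assumed to follow the linked post rather than taken from these notes: $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $W_f$, $b_f$ are the layer's weights and bias.

$$\large f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Each entry of $f_t$ lies between 0 and 1, where 0 means "drop this component of $C_{t-1}$ completely" and 1 means "keep it completely".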

#### input gate layer
![focus_i](img/LSTM3-focus-i.png)
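
Written out in the linked post's notation (an assumption here, with $\sigma$ the logistic sigmoid and $[h_{t-1}, x_t]$ the concatenated previous hidden state and current input), this step computes two things: an input gate $i_t$ that decides which entries to update, and a vector of candidate values $\tilde{C}_t$ that could be written into the cell state.

$$\large i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\large \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$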

#### update state layer
![focus_C](img/LSTM3-focus-C.png)
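
In the linked post's notation (assumed here), the new cell state combines the old state, scaled by the forget gate activation $f_t$, with the candidate values $\tilde{C}_t$, scaled by the input gate activation $i_t$. The symbol $\odot$ denotes elementwise multiplication.

$$\large C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

When $f_t \approx 1$ and $i_t \approx 0$, this reduces to $C_t \approx C_{t-1}$, which is how information flows along the cell state unchanged.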

#### filter state layer
![focus_o](img/LSTM3-focus-o.png)
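
In the linked post's notation (assumed here), the output gate $o_t$ filters a squashed copy of the cell state to produce the hidden state, with $\odot$ denoting elementwise multiplication:

$$\large o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)$$

The four layers can be put together in a minimal NumPy sketch of one LSTM step. The function names, the `params` dictionary layout, the sizes, and the random initialization are all assumptions made for illustration; this is not a reference implementation of any library.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, squashing each entry into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size, hidden_size, rng):
    """Hypothetical helper: small random weights and zero biases per layer."""
    shape = (hidden_size, hidden_size + input_size)
    params = {}
    for name in ("f", "i", "C", "o"):
        params["W_" + name] = rng.normal(scale=0.1, size=shape)
        params["b_" + name] = np.zeros(hidden_size)
    return params

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step following the four layers in the walk-through."""
    concat = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ concat + p["b_f"])       # forget gate layer
    i_t = sigmoid(p["W_i"] @ concat + p["b_i"])       # input gate layer
    C_tilde = np.tanh(p["W_C"] @ concat + p["b_C"])   # candidate values
    C_t = f_t * C_prev + i_t * C_tilde                # update state
    o_t = sigmoid(p["W_o"] @ concat + p["b_o"])       # output gate
    h_t = o_t * np.tanh(C_t)                          # filter state
    return h_t, C_t

# Run a short random sequence through the cell (illustrative sizes).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = init_params(input_size, hidden_size, rng)
h = np.zeros(hidden_size)
C = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, C = lstm_step(x_t, h, C, params)
print(h.shape, C.shape)  # (4,) (4,)
```

Setting `b_f` to a large positive value and `b_i` to a large negative value makes `lstm_step` return a cell state almost identical to `C_prev`, which is a quick way to see the "flows along unchanged" behavior of the cell state.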

# Variants on Long Short Term Memory

One popular LSTM variant adds “peephole connections”: the gate layers are allowed to look at the cell state.

![var-peepholes](img/LSTM3-var-peepholes.png)
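
In the linked post's notation (assumed here), the peephole version feeds the cell state into each gate alongside $h_{t-1}$ and $x_t$. The forget and input gates see the previous cell state $C_{t-1}$, while the output gate sees the freshly updated $C_t$:

$$\large f_t = \sigma(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f)$$

$$\large i_t = \sigma(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i)$$

$$\large o_t = \sigma(W_o \cdot [C_t, h_{t-1}, x_t] + b_o)$$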

## Gated Recurrent Unit (GRU)
The GRU combines the forget and input gates into a single “update gate.” It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than standard LSTM models and has become increasingly popular.



![gru](img/LSTM3-var-GRU.png)
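
A minimal NumPy sketch of one GRU step, assuming the formulation in the linked post: an update gate $z_t$, a reset gate $r_t$, a candidate state $\tilde{h}_t$, and no bias terms. The function name, the `p` dictionary layout, and the sizes are hypothetical choices for the example. Some libraries swap the roles of $z_t$ and $1 - z_t$, so check the convention before comparing outputs.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, squashing each entry into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step; there is no separate cell state, only h."""
    concat = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    z_t = sigmoid(p["W_z"] @ concat)                        # update gate
    r_t = sigmoid(p["W_r"] @ concat)                        # reset gate
    gated = np.concatenate([r_t * h_prev, x_t])             # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(p["W_h"] @ gated)                     # candidate state
    # A single gate both forgets old content and admits new content.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Illustrative sizes and random weights (assumptions, not from the notes).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
shape = (hidden_size, hidden_size + input_size)
p = {name: rng.normal(scale=0.1, size=shape) for name in ("W_z", "W_r", "W_h")}

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, p)
print(h.shape)  # (4,)
```

The last line of `gru_step` is where the forget and input gates are merged: whatever fraction `z_t` admits from the candidate, the complementary fraction `1 - z_t` is kept from the previous state.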