---
title: "LSTM (Long Short-Term Memory)"
date: "5/30/2023"
categories:
  - LSTM
---

## LSTM (Long Short-Term Memory)
- LSTM is a type of RNN architecture that is widely used for sequence modeling tasks.
- LSTMs overcome a key RNN limitation (the vanishing gradient problem) by introducing a memory cell and three gating mechanisms.
- The memory cell allows an LSTM to store and access information over long sequences.
- LSTMs use a series of gates that control how information in a sequence enters, is stored in, and leaves the network. They are:
    - forget gate
    - input gate
    - output gate

### Application
- **NLP tasks** - named entity recognition, sentiment analysis, machine translation etc.
- **Speech Recognition** - automatic speech recognition, speech-to-text conversion etc.
- **Time Series Analysis and Forecasting** - stock market prediction, weather forecasting etc.

### Architecture

#### Forget gate layer
The first step in the process is the forget gate. This gate tells the LSTM how much information to keep from the previous cell state. Its output is between 0 and 1, and it is multiplied element-wise with the previous cell state.
<img height = 300 width=600 src='http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png'><br>
output of forget gate is 0 implies `->` Forget all previous memory<br>
output of forget gate is 1 implies `->` Keep all previous memory<br>
output of forget gate is 0.5 implies `->` Keep some of the previous memory<br>
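
As a rough illustration of the three cases above, the sketch below applies a forget gate output to a previous cell state element by element. The numbers and the helper name `apply_forget_gate` are made up for this example; in a real LSTM the gate values come from a learned sigmoid layer.

```python
def apply_forget_gate(f_t, c_prev):
    """Scale each entry of the previous cell state by its forget-gate value."""
    return [f * c for f, c in zip(f_t, c_prev)]

# Hypothetical previous cell state with three units
c_prev = [2.0, -4.0, 6.0]

forget_all = apply_forget_gate([0.0, 0.0, 0.0], c_prev)  # gate = 0: memory is erased
keep_all = apply_forget_gate([1.0, 1.0, 1.0], c_prev)    # gate = 1: memory is unchanged
keep_half = apply_forget_gate([0.5, 0.5, 0.5], c_prev)   # gate = 0.5: memory is halved

print(forget_all)
print(keep_all)
print(keep_half)
```

In practice each unit gets its own gate value, so one part of the memory can be kept while another is dropped.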

#### Input gate layer
The goal of this step is to decide how much information to take from the new memory. A sigmoid layer called the input gate decides which values we’ll update.

output of input gate is 0 implies `->` Take nothing from the generated memory<br>
output of input gate is 1 implies `->` Take everything from the generated memory<br>
output of input gate is 0.5 implies `->` Take part of the generated memory<br>
<img height = 300 width=600 src='http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png'>

The new **memory network** is a tanh-activated neural network that learns how to combine the previous hidden state and the new input data to generate a ‘new memory update vector’. This vector essentially contains information from the new input data given the context from the previous hidden state.

<img height = 300 width=600 src='http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png'>
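
The cell state update that combines the forget gate, the input gate and the new memory can be sketched for a single hidden unit as below. The pre-activation values are hypothetical stand-ins for what the learned layers would produce, and `sigmoid` is a small helper defined only for this example.

```python
import math

def sigmoid(z):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for one hidden unit
i_pre, c_pre = 2.0, 0.5      # pre-activations of the input gate and the memory network
f_t, c_prev = 0.5, 1.0       # forget-gate output and previous cell state

i_t = sigmoid(i_pre)         # input gate, always in (0, 1)
c_tilde = math.tanh(c_pre)   # new memory update, always in (-1, 1)

# Keep part of the old memory and add part of the new memory
c_t = f_t * c_prev + i_t * c_tilde
print(i_t, c_tilde, c_t)
```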


#### Output Gate
The output gate decides the new hidden state.
 - First, we run a sigmoid layer which decides what parts of the cell state we’re going to output.
 - Then, we put the cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.


<img height = 300 width=600 src='http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png'>
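
Putting the three gates together, one LSTM time step can be summarized with the equations below. The notation follows the referenced post: $[h_{t-1}, x_t]$ is the previous hidden state concatenated with the current input, $\sigma$ is the sigmoid function, and $\odot$ is element-wise multiplication (the post writes it as $*$).

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$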


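### Drawbacks of LSTM
- **Computational complexity** - each cell has four weighted layers (three gates plus the new memory network), so it has roughly four times the parameters of a vanilla RNN with the same hidden size.
- **Training time** - time steps have to be processed one after another, which limits parallelism on long sequences.
- **Difficulty in hyperparameter tuning** - results are sensitive to choices such as hidden size, number of layers and learning rate.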
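
To make the computational cost a little more concrete, here is a back-of-the-envelope parameter count. The helper names `rnn_params` and `lstm_params` are hypothetical, and the count assumes one bias vector per layer (some libraries, including PyTorch, keep two).

```python
def rnn_params(input_size, hidden_size):
    # Input weights + hidden-to-hidden weights + one bias vector
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size

def lstm_params(input_size, hidden_size):
    # The same three pieces, repeated for the forget gate, input gate,
    # new memory network and output gate
    return 4 * rnn_params(input_size, hidden_size)

# Example sizes: a flattened 28x28 image as input and 128 hidden units
input_size, hidden_size = 28 * 28, 128
print("vanilla RNN:", rnn_params(input_size, hidden_size))
print("LSTM:       ", lstm_params(input_size, hidden_size))
```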

### Reference
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/


# Implementation of LSTM from scratch

In [1]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
from torch.utils.data import DataLoader
from torchvision import datasets
import torchvision.transforms as transform
import torch.nn.functional as F




In [2]:
train_data = datasets.MNIST('/',train=True,download=True,transform = transform.ToTensor())
test_data = datasets.MNIST('/',train=False,download=True,transform = transform.ToTensor())

In [3]:
class MNISTDataset():
    def __init__(self,data) -> None:
        self.data = data
    def __len__(self):
        return len(self.data)
    def __getitem__(self,ind):
        x,target = self.data[ind]
        # print(x.size()) # torch.Size([1, 28, 28])
        x = x.view((1,28*28))
        # print(x.size()) # torch.Size([1, 784])
        return {'input':x,
                'label':target}

In [4]:
BATCH_SIZE = 64
train_dataloader = DataLoader(MNISTDataset(train_data),batch_size=BATCH_SIZE,shuffle=True)
test_dataloader = DataLoader(MNISTDataset(test_data),batch_size=BATCH_SIZE,shuffle=False)

In [5]:
class LSTMCell(nn.Module):
    def __init__(self,input_size,hidden_size,bias=False) -> None:
        super().__init__()
        self.input_size = input_size # input = (batch_size,input_size)
        self.hidden_size = hidden_size # number of hidden units
        # One linear layer for the input and one for the hidden state;
        # each produces the pre-activations of all four gates at once
        self.linear1 = nn.Linear(self.input_size,4*self.hidden_size,bias=bias)
        self.linear2 = nn.Linear(self.hidden_size,4*self.hidden_size,bias=bias)
        self.reset_parameters()

    def reset_parameters(self):
        std = 1.0 / np.sqrt(self.hidden_size)
        for w in self.parameters():
            w.data.uniform_(-std, std)

    def forward(self,input,h_x=None):
        """Run one LSTM time step.

        Args:
            input: tensor of shape (batch_size,input_size)
            h_x: tuple (h_prev,c_prev) with the previous hidden state and cell state,
                each of shape (batch_size,hidden_size); defaults to zeros
        Returns:
            tuple (h_t,c_t) with the new hidden state and cell state
        """

        if h_x is None:
            # Hidden state output shape = (batch_size, hidden_units)
            # batch_size refers to the number of samples or sequences processed in each batch.
            # hidden_units represents the number of hidden units or neurons in the LSTM layer.
            hx = input.new_zeros(input.shape[0],self.hidden_size)
            h_x = (hx,hx)
        h_t1,c_t1 = h_x # previous hidden state, previous cell state

        # gate = (batch_size, 4*hidden_size)
        gate = self.linear1(input) + self.linear2(h_t1)
        f_t_,i_t_,c_t_,o_t_ = torch.chunk(gate,4,dim=1)

        f_t = torch.sigmoid(f_t_) # forget gate
        i_t = torch.sigmoid(i_t_) # input gate
        c_tilde = torch.tanh(c_t_) # new memory update vector
        o_t = torch.sigmoid(o_t_) # output gate

        c_t = f_t*c_t1 + i_t*c_tilde # new cell state
        h_t = o_t * torch.tanh(c_t) # new hidden state

        return (h_t,c_t)


In [6]:
lstm = LSTMCell(28*28,4)
MNISTDataset(train_data)[0]['input'].shape
h_t,c_t = lstm(MNISTDataset(train_data)[0]['input'])



In [7]:
h_t.shape,c_t.shape

(torch.Size([1, 4]), torch.Size([1, 4]))
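
One way to sanity-check the gate arithmetic is to repeat it with the weights of PyTorch's built-in `nn.LSTMCell` and compare the results. This is only a sketch with made-up sizes and random inputs. Note that `nn.LSTMCell` packs its gates in the order input, forget, candidate, output, while the `LSTMCell` above chunks them as forget, input, candidate, output, so the two are not weight-compatible without reordering.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, input_size, hidden_size = 3, 8, 4  # arbitrary sizes for the check

ref_cell = nn.LSTMCell(input_size, hidden_size)
x = torch.randn(batch_size, input_size)
h_prev = torch.randn(batch_size, hidden_size)
c_prev = torch.randn(batch_size, hidden_size)

with torch.no_grad():
    # Pre-activations of all four gates: shape (batch_size, 4 * hidden_size)
    gates = (x @ ref_cell.weight_ih.T + ref_cell.bias_ih
             + h_prev @ ref_cell.weight_hh.T + ref_cell.bias_hh)
    # Built-in gate order: input, forget, candidate, output
    i_t_, f_t_, g_t_, o_t_ = torch.chunk(gates, 4, dim=1)
    c_manual = torch.sigmoid(f_t_) * c_prev + torch.sigmoid(i_t_) * torch.tanh(g_t_)
    h_manual = torch.sigmoid(o_t_) * torch.tanh(c_manual)
    # Reference result from the built-in cell
    h_ref, c_ref = ref_cell(x, (h_prev, c_prev))

print(torch.allclose(h_manual, h_ref, atol=1e-5))
print(torch.allclose(c_manual, c_ref, atol=1e-5))
```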

In [8]:
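class LSTMModel(nn.Module):
    def __init__(self,input_size,num_layers,hidden_size,bias,output_size) -> None:
        super().__init__()
        self.input_size = input_size
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.bias = bias
        self.output_size = output_size
        self.lstm_cell_list = nn.ModuleList()

        # The first layer reads the raw input; deeper layers read the hidden state of the layer below
        self.lstm_cell_list.append(LSTMCell(self.input_size,self.hidden_size,self.bias))
        for _ in range(1,self.num_layers):
            self.lstm_cell_list.append(LSTMCell(self.hidden_size,self.hidden_size,self.bias))
        # Final linear layer that maps the last hidden state to the output classes
        self.sc = nn.Linear(self.hidden_size,self.output_size)

    def forward(self,x,hx=None):
        # x = (batch_size, seq_len, input_size)
        # NOTE: the original cell stopped after the h0 line below;
        # the rest of this method is an assumed completion, not the author's code.
        if hx is None:
            h0 = x.new_zeros(self.num_layers,x.shape[0],self.hidden_size)
            hx = [(h0[l],h0[l]) for l in range(self.num_layers)]
        else:
            hx = list(hx) # one (h,c) tuple per layer
        for t in range(x.shape[1]):
            layer_input = x[:,t,:]
            for l,cell in enumerate(self.lstm_cell_list):
                hx[l] = cell(layer_input,hx[l])
                layer_input = hx[l][0] # hidden state feeds the next layer
        # Predict from the top layer's hidden state at the last time step
        return self.sc(layer_input)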



In [None]:
https://github.com/hadi-gharibi/pytorch-lstm/blob/master/lstm.ipynb