# 5.5 Long Short-Term Memory (LSTM) Networks
 
 References: [https://colah.github.io/posts/2015-08-Understanding-LSTMs/](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
 
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed for sequential data such as time series, speech, and text. Because LSTM networks can learn long-term dependencies in a sequence, they are well suited to tasks such as language translation, speech recognition, and time series forecasting.

![Traditional RNN](Images/RNN_unrolled.png)

![](Images/RNN_layer.png)

A traditional RNN has a single hidden state that is passed through time, which can make it difficult for the network to learn long-term dependencies. LSTMs address this problem by introducing a memory cell, which is a container that can hold information for an extended period of time. 
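
For reference, the single-hidden-state update of a traditional RNN is commonly written as below. The weight names $W_{hh}$, $W_{xh}$ and the bias $b_h$ are notation assumed here for illustration; they do not appear in the figures.

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$

Since $h_t$ is recomputed from $h_{t-1}$ at every step, anything the network remembers about early inputs has to survive this transformation once per time step.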



The memory cell is controlled by three gates: 
*   the input gate, 
*   the forget gate, and 
*   the output gate. 
  
___

![LSTM](Images/LSTM_with_4_layers.png)

___

![Operations](Images/LSTM_operations.png)

___

These gates decide what information to add to, remove from, and output from the memory cell.

## The cell state of LSTM

*   The key to LSTMs is the *cell state*, the horizontal line running through the top of the diagram.
*   The LSTM can remove information from or add information to the cell state, carefully regulated by structures called gates.

![The cell state](Images/cell_state.png)

## The gates of LSTM

* Gates are a way to optionally let information through. 
* They are composed of a *sigmoid neural net layer* and a *pointwise multiplication operation*.
* The sigmoid layer outputs numbers between *zero* and *one*. 
    * a `value=0` means `“let nothing through,”` while 
    * a `value=1` means `“let everything through!”`.
*  An LSTM has three of these gates, to protect and control the cell state.

![A gate of LSTM](Images/gate.png)
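
The gating idea described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `sigmoid` and `apply_gate` and the numbers are made up for the example, and a real LSTM computes the gate inputs from learned weights.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(gate_logits, values):
    """Scale each value by its own sigmoid gate (pointwise multiplication)."""
    return [sigmoid(g) * v for g, v in zip(gate_logits, values)]

values = [2.0, -3.0, 5.0]
gate_logits = [-20.0, 0.0, 20.0]  # nearly closed, half open, nearly open

gated = apply_gate(gate_logits, values)
print([round(v, 3) for v in gated])  # [0.0, -1.5, 5.0]
```

The first value is almost entirely blocked, the second is halved, and the third passes through nearly unchanged.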

## Step-by-Step LSTM Walk-Through

### Step 1. Decide what information to throw away from the cell state.

*Using a sigmoid forget gate layer*:

![Forget Gate](Images/forget_gate_layer.png)
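
In the notation of the referenced post, the forget gate looks at the previous hidden state $h_{t-1}$ and the current input $x_t$ and outputs a number between 0 and 1 for each entry of the previous cell state $C_{t-1}$:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Here $[h_{t-1}, x_t]$ denotes concatenation, and $W_f$, $b_f$ are the learned weights and bias of the forget gate layer.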


### Step 2. Decide what new information to store in the cell state.

*Using a sigmoid input gate layer* to decide which values to update, and a *tanh layer* to create a vector of new candidate values, $\tilde{C}_t$, that could be added to the cell state.

![](Images/sigmoid_input_gate_with_tanh_gate.png)
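
Written out in the notation of the referenced post, with $i_t$ the input gate output and $\tilde{C}_t$ the candidate vector:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

The sigmoid keeps $i_t$ in $(0, 1)$, while the tanh keeps each candidate value in $(-1, 1)$.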

### Step 3. Update to the new cell state

![](Images/creating_new_cell_state.png)
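
With $f_t$ the forget gate output from Step 1, and $i_t$ and $\tilde{C}_t$ the input gate output and candidate vector from Step 2, the update in the referenced post's notation is:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

Here $*$ is pointwise multiplication: the first term scales down what the old state should forget, and the second adds the new candidates, scaled by how much each entry should be updated.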

### Step 4. Decide what to output.

![](Images/output.png)
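
In the notation of the referenced post, the output gate $o_t$ filters a tanh-squashed copy of the new cell state $C_t$ to give the hidden state $h_t$:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t * \tanh(C_t)$$

The four steps can be put together in a minimal NumPy sketch. It is an illustration under stated assumptions, not how `torch.nn.LSTM` is implemented: the function name `lstm_step`, the single stacked weight matrix `W`, the gate ordering inside it, and the toy sizes are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenation [h_prev, x_t] to four stacked pre-activations
    (forget, input, candidate, output); b is the matching bias vector.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b    # shape: (4 * hidden,)

    f_t = sigmoid(z[0 * hidden:1 * hidden])      # Step 1: forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])      # Step 2: input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])  # Step 2: candidate values
    c_t = f_t * c_prev + i_t * c_tilde           # Step 3: new cell state
    o_t = sigmoid(z[3 * hidden:4 * hidden])      # Step 4: output gate
    h_t = o_t * np.tanh(c_t)                     # Step 4: new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4                   # toy sizes
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):     # a toy sequence of 5 steps
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)  # (4,) (4,)
```

Every entry of `h` stays strictly between -1 and 1, because it is a sigmoid output multiplied by a tanh output.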


In [1]:
import nltk
import numpy as np
import pandas as pd
import re 
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('stopwords')

class Preprocessing:
    
    def __init__(self):
    
        self.data = 'datasets/tweets.csv'

        self.X_raw = None
        self.y = None
        self.X_cleaned = None
        self.X_tokenized = None
        self.X_stopwords_removed = None
        self.X_lemmatized = None
        #self.vocabular = None
        #self.word2idx = None
        #self.vector_size = None
        #self.X_encoded = None
        #self.X_padded = None

    def load_data(self):
        # Read the raw CSV file and split it into
        # features (X) and target (y)
        
        df = pd.read_csv(self.data)
        #df.drop(['id','keyword','location'], axis=1, inplace=True)
        
        self.X_raw = df['text'].values
        self.y = df['target'].values

    def clean_text(self):
        # Lower-case the text, then remove every character that is
        # not a word character (letter, digit, underscore) or whitespace
        
        self.X_cleaned = [x.lower() for x in self.X_raw]
        self.X_cleaned = [re.sub(r'[^\w\s]', '', x) for x in self.X_cleaned]
        
    def text_tokenized(self):
        # Tokenizes each sentence by implementing the nltk tool
        
        self.X_tokenized = [word_tokenize(x) for x in self.X_cleaned]

    def text_stopwords_removed(self):
        # Drop English stopwords from every tokenized text
        
        stop_words = set(stopwords.words("english"))
        no_stopwords = []
        
        for tokens in self.X_tokenized:
            tokens = [token for token in tokens if token not in stop_words]
            no_stopwords.append(tokens)
            
        self.X_stopwords_removed = no_stopwords

    def text_lemmatized(self):
        # Lemmatize each token successively as verb, noun, adjective,
        # adverb and satellite adjective
        lemmatizer = WordNetLemmatizer()

        text_lemmas = []
        for tokens in self.X_stopwords_removed:
            lemmas = [lemmatizer.lemmatize(word, pos="v") for word in tokens]
            lemmas = [lemmatizer.lemmatize(word, pos="n") for word in lemmas]
            lemmas = [lemmatizer.lemmatize(word, pos="a") for word in lemmas]
            lemmas = [lemmatizer.lemmatize(word, pos="r") for word in lemmas]
            lemmas = [lemmatizer.lemmatize(word, pos="s") for word in lemmas]
            text_lemmas.append(lemmas)
        
        self.X_lemmatized = text_lemmas
        
    #preprocessing.load_data()
    #preprocessing.clean_text()
    #preprocessing.text_tokenized()
    #preprocessing.text_stopwords_removed()
    #preprocessing.text_lemmatized()

[nltk_data] Downloading package punkt to /home/repl/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /home/repl/nltk_data...
[nltk_data] Downloading package omw-1.4 to /home/repl/nltk_data...
[nltk_data] Downloading package stopwords to /home/repl/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
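
To make the effect of `clean_text` concrete, here is a small standalone sketch of the same two operations (lower-casing, then the `[^\w\s]` substitution). The sample tweet is hypothetical, and a plain whitespace split stands in for `word_tokenize` so that the example runs without NLTK data.

```python
import re

def clean(text):
    """Lower-case the text and drop every character that is not a
    word character (letter, digit, underscore) or whitespace."""
    return re.sub(r'[^\w\s]', '', text.lower())

sample = "Fire near http://t.co/abc #Canada!!"  # hypothetical tweet
cleaned = clean(sample)
tokens = cleaned.split()  # whitespace split as a stand-in for word_tokenize

print(cleaned)  # fire near httptcoabc canada
print(tokens)   # ['fire', 'near', 'httptcoabc', 'canada']
```

Note that the substitution also fuses URLs into a single pseudo-word (`httptcoabc`), because `:`, `/` and `.` are removed without inserting spaces.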


In [2]:
import copy

class Encoding:
    
    def __init__(self, lemmatized_texts, num_words):
        
        self.X_lemmatized = lemmatized_texts
        self.num_words = num_words
        self.vector_size = None
        self.fdist = None
        self.vocabulary = None
        self.X_encoded_texts = None
        self.texts4encoding = None
        self.X_padded_codes = None
    
    def text_encoding(self):

        vocabulary = dict()
        fdist = nltk.FreqDist()  
        
        for tokens in self.X_lemmatized:  
            for word in tokens:
                fdist[word] += 1
        
        self.fdist = fdist
        common_words = fdist.most_common(self.num_words)

        # Index 0 is reserved for padding, so word indices start at 1
        for idx, (word, _) in enumerate(common_words):
            vocabulary[word] = idx + 1
        
        self.vocabulary = vocabulary
      
        encoded_texts = list()
        texts4encoding = list()
        
        for tokens in self.X_lemmatized:
            temp_codes = list()
            temp_words = list()
            
            for word in tokens:
                # Words outside the vocabulary are dropped, not mapped to an "unknown" index
                if word in self.vocabulary:
                    temp_codes.append(self.vocabulary[word])
                    temp_words.append(word)
                             
            encoded_texts.append(temp_codes)
            texts4encoding.append(temp_words)

        self.vector_size = np.max([len(x) for x in encoded_texts])
        self.X_encoded_texts = encoded_texts
        self.texts4encoding = texts4encoding
  
    def codes_padding(self):
        pad_idx = 0
        padded_codes = list()
        
        codes_from_texts = copy.deepcopy(self.X_encoded_texts)
        for encoded_text in codes_from_texts:
            while len(encoded_text) < self.vector_size:
                encoded_text.append(pad_idx)
            padded_codes.append(encoded_text)

        self.X_padded_codes = np.array(padded_codes)
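
The encoding and padding logic of the `Encoding` class can be traced on a toy corpus. In this sketch `collections.Counter` stands in for `nltk.FreqDist` (which is a `Counter` subclass), and the three token lists and `num_words = 2` are made up for illustration.

```python
from collections import Counter

# Toy lemmatized texts (hypothetical)
texts = [["fire", "forest", "fire"], ["forest", "fire", "flood"], ["storm"]]
num_words = 2  # keep only the 2 most frequent words

counts = Counter(word for tokens in texts for word in tokens)

# Index 0 is reserved for padding, so real words start at 1.
vocabulary = {word: idx + 1
              for idx, (word, _) in enumerate(counts.most_common(num_words))}

# Out-of-vocabulary words are dropped.
encoded = [[vocabulary[w] for w in tokens if w in vocabulary] for tokens in texts]

# Right-pad every row with 0 up to the longest encoded row.
max_len = max(len(codes) for codes in encoded)
padded = [codes + [0] * (max_len - len(codes)) for codes in encoded]

print(vocabulary)  # {'fire': 1, 'forest': 2}
print(padded)      # [[1, 2, 1], [2, 1, 0], [0, 0, 0]]
```

The last row shows a side effect worth keeping in mind: a text made up only of rare words becomes a row of pure padding.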

In [3]:
from torch.utils.data import Dataset, DataLoader

class DatasetMapping(Dataset):

    def __init__(self, X, y):
        self.X = X
        self.y = y
        
    def __len__(self):
        return len(self.X)
      
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]
    
from sklearn.model_selection import train_test_split

class DatasetLoading:
    
    def __init__(self, padded_codes, targets):
        
        self.X = padded_codes
        self.y = targets
        self.X_train = None
        self.y_train = None
        self.X_test = None
        self.y_test = None
        
    def data_split(self):
        
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.20, random_state=20231116)    

    def data_mapping(self):
        
        self.train = DatasetMapping(self.X_train, self.y_train)
        self.test = DatasetMapping(self.X_test, self.y_test)

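    def data_loading(self):
        # Relies on the global `params` object, which must exist before this method is called.
        # The loaders do not shuffle, so batches follow the row order of X_train / X_test.
        self.loader_train = DataLoader(self.train, batch_size=params.batch_size)
        self.loader_test = DataLoader(self.test, batch_size=params.batch_size)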

In [83]:
from dataclasses import dataclass

@dataclass
class Parameters:
    # Preprocessing parameters
    vector_size: int = 25   # standard length of each row vector in the input
    num_words: int = 9300  # number of words in the vocabulary
    # Note: DatasetLoading.data_split hard-codes its own test_size and random_state
    test_size: float = 0.20
    random_state: int = 42
   
    # Model parameters
    embedding_dim: int = 256
    num_layers: int = 2 # number of lstm layers
    #out_size: int = 32
    #stride: int = 2
    #dilation: int = 2
   
    
    # Training parameters
    epochs: int = 20
    batch_size: int = 128
    learning_rate: float = 0.001
    dropout: float = 0.5
    
params=Parameters()

In [5]:
### Step 1. Preprocessing
data = Preprocessing()
data.load_data()
data.clean_text()
data.text_tokenized()
data.text_stopwords_removed()
data.text_lemmatized()

### Step 2. Encoding
code = Encoding(data.X_lemmatized, params.num_words)
code.text_encoding()
code.codes_padding()

### Step 3. Dataset and DataLoader
dsl = DatasetLoading(code.X_padded_codes, data.y)
dsl.data_split()
dsl.data_mapping()
dsl.data_loading()

In [52]:
print(f"input length: {len(data.X_lemmatized)}")
print(f"               {len(code.X_padded_codes)}")
print(f"total number of tokens:  {np.sum([len(x) for x in data.X_lemmatized])}")
print(f"                         {sum(code.fdist.values())}")
print(f"total number of unique tokens: {len(code.fdist)}")
print(f"length of vocabulary: {len(code.vocabulary)}")
print(f"max number of tokens per row: {np.max([len(x) for x in data.X_lemmatized])}")
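print(f"                              {set([len(x) for x in code.X_padded_codes])}")
print(f"X_train shape: {dsl.X_train.shape}")
print(f"X_test shape: {dsl.X_test.shape}")
print(f"y_train shape: {dsl.y_train.shape}")
print(f"y_test shape: {dsl.y_test.shape}")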

input length: 7613
               7613
total number of tokens:  76462
                         76462
total number of unique tokens: 19999
length of vocabulary: 9300
max number of tokens per row: 25
                              {25}
X_train shape: (6090, 25)
X_test shape: (1523, 25)
y_train shape: (6090,)
y_test shape: (1523,)


In [54]:
for x_batch, y_batch in dsl.loader_train:
    print(x_batch.size(), y_batch.size())

torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])


In [81]:
print(f"\nbatch_size:  {params.batch_size}")
print(f"X_train shape: {dsl.X_train.shape}")
print(f"y_train: (quotient, remainder) = {divmod(len(dsl.X_train), 128)}")
print(f"         128 x 47 + 74 = {128*47 + 74}")
print(f"\nX_test shape: {dsl.X_test.shape}")
print(f"\ny_test: (quotient, remainder) = {divmod(len(dsl.X_test), 128)}")
print(f"         128 x 11 + 115 = {128*11 + 115}")


batch_size:  128
X_train shape: (6090, 25)
y_train: (quotient, remainder) = (47, 74)
         128 x 47 + 74 = 6090

X_test shape: (1523, 25)

y_test: (quotient, remainder) = (11, 115)
         128 x 11 + 115 = 1523


In [56]:
for x_batch, y_batch in dsl.loader_test:
    print(x_batch.size(), y_batch.size())

torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([128, 25]) torch.Size([128])
torch.Size([115, 25]) torch.Size([115])
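
The batch counts follow directly from the quotient and remainder computed above: a `DataLoader` with its default `drop_last=False` yields one batch per full multiple of `batch_size`, plus one smaller batch for the remainder. A minimal sketch of that arithmetic, where the helper name `batch_sizes` is made up for this example:

```python
def batch_sizes(num_rows, batch_size):
    """Sizes of the batches produced when the last, smaller batch is kept."""
    full, remainder = divmod(num_rows, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

train = batch_sizes(6090, 128)
test = batch_sizes(1523, 128)

print(len(train), train[-1])  # 48 74
print(len(test), test[-1])    # 12 115
```

The test split therefore yields 12 batches, the last one holding 115 rows, which matches the sizes printed for `dsl.loader_test`.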


In [108]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMNet(nn.Module):

    def __init__(self, params):
        super().__init__()
        
        self.batch_size = params.batch_size
        self.hidden_dim = params.embedding_dim
        self.num_layers = params.num_layers
        self.input_size = params.num_words+1
        
        self.dropout = nn.Dropout(params.dropout)
        self.embedding = nn.Embedding(self.input_size, self.hidden_dim, padding_idx=0)
        self.lstm = nn.LSTM(input_size=self.hidden_dim, hidden_size=self.hidden_dim, num_layers=self.num_layers, batch_first=True)
        
        self.fc1 = nn.Linear(in_features=self.hidden_dim, out_features=256)
        self.fc2 = nn.Linear(256, 1)
        
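    def forward(self, x):
        # x: (batch, seq_len) tensor of integer token ids
        
        # Initial hidden and cell states: (num_layers, batch, hidden_dim).
        # They are re-drawn from a Xavier-normal distribution on every call,
        # so outputs are not deterministic, even in eval mode.
        h = torch.zeros((self.num_layers, x.size(0), self.hidden_dim))
        c = torch.zeros((self.num_layers, x.size(0), self.hidden_dim))
        
        torch.nn.init.xavier_normal_(h)
        torch.nn.init.xavier_normal_(c)
        
        out = self.embedding(x)                       # (batch, seq_len, hidden_dim)
        out, (hidden, cell) = self.lstm(out, (h, c))  # out: (batch, seq_len, hidden_dim)
        out = self.dropout(out)
        # Keep only the last time step (a padding position for texts shorter than seq_len)
        out = torch.relu(self.fc1(out[:, -1, :]))
        out = self.dropout(out)
        out = torch.sigmoid(self.fc2(out))            # (batch, 1) probabilities
        return out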
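
The model ends in a sigmoid, so each output is a probability of the positive class. The training loop that follows scores these probabilities with binary cross-entropy and thresholds them at 0.5 to compute accuracy. A plain-Python sketch of both calculations, using made-up probabilities and targets and the mean reduction that `F.binary_cross_entropy` applies by default:

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)

probs = [0.9, 0.2, 0.6]   # hypothetical sigmoid outputs
targets = [1, 0, 0]       # hypothetical labels

loss = binary_cross_entropy(probs, targets)

# Threshold at 0.5 to turn probabilities into class predictions.
predicted = [1 if p >= 0.5 else 0 for p in probs]
accuracy = sum(int(p == y) for p, y in zip(predicted, targets)) / len(targets)

print(round(loss, 4), round(accuracy, 4))  # 0.4149 0.6667
```

The third example is misclassified (0.6 is thresholded to class 1), and it also contributes the largest share of the loss.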

In [109]:
import math
import torch.optim as optim

lstm_model = LSTMNet(params)
optimizer = optim.RMSprop(lstm_model.parameters(), lr=params.learning_rate)

loader_train = dsl.loader_train
loader_test = dsl.loader_test
y_train = dsl.y_train
y_test = dsl.y_test


# Starts training phase
for epoch in range(params.epochs):
    
    # Set the model in training mode
    lstm_model.train()
    train_predictions = []
    
    for x_batch, y_batch in loader_train:

        # binary_cross_entropy expects float targets
        y_batch = y_batch.float()
            
        # Feed the model
        y_pred = lstm_model(x_batch)
            
        # Reshape y_pred to a vector
        y_pred = y_pred.view(-1)
         
        # Loss calculation
        loss = F.binary_cross_entropy(y_pred, y_batch)
         
        # Reset gradients accumulated from the previous batch
        optimizer.zero_grad()
         
        # Backpropagate to compute gradients
        loss.backward()
         
        # Update the parameters
        optimizer.step()
         
        # Save predictions
        train_predictions += list(y_pred.detach().numpy())
        
    #Metrics calculation for train accuracy
     
    true_positives = 0
    true_negatives = 0
    
    # Predictions line up with y_train by position because the loader does not shuffle
    for true, pred in zip(y_train, train_predictions):
        if (pred >= 0.5) and (true == 1):
            true_positives += 1
        elif (pred < 0.5) and (true == 0):
            true_negatives += 1
    train_accuracy = (true_positives + true_negatives) / len(y_train)
    
    # Metrics calculation for test accuracy
    
    # Set the model in evaluation mode
    lstm_model.eval()
    test_predictions = []
    
    # Start evaluation phase
    with torch.no_grad():
        for x_batch, y_batch in loader_test:
            # Flatten (batch, 1) -> (batch,) so each prediction is a scalar
            y_pred = lstm_model(x_batch).view(-1)
            test_predictions += list(y_pred.numpy())
    
    true_positives = 0
    true_negatives = 0
    
    for true, pred in zip(y_test, test_predictions):
        if (pred >= 0.5) and (true == 1):
            true_positives += 1
        elif (pred < 0.5) and (true == 0):
            true_negatives += 1
    test_accuracy = (true_positives + true_negatives) / len(y_test)
     
    
    
    # Note: `loss` here is the loss of the last training batch only, not an epoch average
    print("Epoch: %d, loss: %.5f, Train accuracy: %.5f, Test accuracy: %.5f" % (epoch+1, loss.item(), train_accuracy, test_accuracy))
        

Epoch: 1, loss: 0.67631, Train accuracy: 0.56470, Test accuracy: 0.58569
Epoch: 2, loss: 0.65785, Train accuracy: 0.58046, Test accuracy: 0.62180
Epoch: 3, loss: 0.52507, Train accuracy: 0.73120, Test accuracy: 0.73999
Epoch: 4, loss: 0.42199, Train accuracy: 0.81149, Test accuracy: 0.75378
Epoch: 5, loss: 0.30641, Train accuracy: 0.86289, Test accuracy: 0.74064
Epoch: 6, loss: 0.23199, Train accuracy: 0.88982, Test accuracy: 0.75246
Epoch: 7, loss: 0.32090, Train accuracy: 0.90591, Test accuracy: 0.75115
Epoch: 8, loss: 0.29578, Train accuracy: 0.92217, Test accuracy: 0.74524
Epoch: 9, loss: 0.22986, Train accuracy: 0.93596, Test accuracy: 0.73933
Epoch: 10, loss: 0.17212, Train accuracy: 0.94138, Test accuracy: 0.74261
Epoch: 11, loss: 0.18812, Train accuracy: 0.94516, Test accuracy: 0.74984
Epoch: 12, loss: 0.09530, Train accuracy: 0.95107, Test accuracy: 0.76494
Epoch: 13, loss: 0.11883, Train accuracy: 0.94910, Test accuracy: 0.75312
Epoch: 14, loss: 0.14110, Train accuracy: 0.958