# Recurrent Neural Network: LSTM vs. GRU

Yiren Zhou (yz2365)

This notebook implements a GRU for sentiment analysis, classifying IMDB reviews as positive or negative. The results are compared with those of the LSTM model already implemented at https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/2%20-%20Upgraded%20Sentiment%20Analysis.ipynb [1]. We compare the two models' results on sentiment analysis and describe their behavior. The simpler RNN version was also consulted to understand the code: https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/1%20-%20Simple%20Sentiment%20Analysis.ipynb [2].

## Preparing Data

In [1]:
# import the required packages
import torch
from torchtext import data
from torchtext import datasets
import random

# set seed as 1234 to make the result reproducible
SEED = 1234

torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

# the parameter of field defines how the data should be processed
# TEXT field to handle the review, use spaCy as tokenizer
TEXT = data.Field(tokenize='spacy')
# LABEL field to handle the sentiment
LABEL = data.LabelField(tensor_type=torch.FloatTensor)

# download the IMDB dataset and load its predefined train/test split (25,000 reviews each)
train, test = datasets.IMDB.splits(TEXT, LABEL)

# further split the train set into train and validation sets; the default split is 70/30
# random.seed() returns None, so seed the global RNG first and pass its state explicitly
random.seed(SEED)
train, valid = train.split(random_state=random.getstate())

In [2]:
# build the vocabulary from the 25,000 most frequent words in the training set
# and initialise them with pre-trained GloVe vectors
# "glove.6B.100d": trained on 6 billion tokens, 100 dimensions per word
TEXT.build_vocab(train, max_size=25000, vectors="glove.6B.100d")
LABEL.build_vocab(train)
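
The vocabulary ends up slightly larger than `max_size` because torchtext places two special tokens, `<unk>` and `<pad>`, in front of the counted words. The following is a minimal pure-Python sketch of that idea, not torchtext's actual implementation; `build_vocab_sketch` and the toy reviews are hypothetical and used only for illustration.

```python
from collections import Counter

def build_vocab_sketch(token_lists, max_size, specials=("<unk>", "<pad>")):
    """Keep the max_size most frequent tokens, placed after the special tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = list(specials) + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

# Toy data (assumption): three tokenized "reviews"
reviews = [["good", "film"], ["bad", "film"], ["good", "good"]]
itos, stoi = build_vocab_sketch(reviews, max_size=2)

# "bad" is not among the 2 most frequent words, so it falls back to <unk>
bad_index = stoi.get("bad", stoi["<unk>"])
print(len(itos), bad_index)
```

With `max_size=25000`, the same logic gives 25,002 entries, and any word outside the vocabulary maps to the index of `<unk>`.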

In [3]:
# create the iterators; BucketIterator uses sort_key (here, the review length)
# to group examples of similar length into the same batch, which minimises padding

BATCH_SIZE = 64

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train, valid, test), 
    batch_size=BATCH_SIZE, 
    sort_key=lambda x: len(x.text), 
    repeat=False)
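
To see why grouping reviews of similar length helps, the sketch below counts how many padding tokens are needed when every batch is padded to its longest sequence. It is a simplified illustration in plain Python, not the actual `BucketIterator` logic (which also shuffles); the function name `padding_needed` and the toy lengths are assumptions.

```python
def padding_needed(lengths, batch_size):
    """Total pad tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total

# Toy review lengths (assumption): a mix of very short and very long reviews
lengths = [5, 120, 7, 118, 6, 119, 8, 121]

unsorted_padding = padding_needed(lengths, batch_size=4)
bucketed_padding = padding_needed(sorted(lengths), batch_size=4)
print(unsorted_padding, bucketed_padding)  # 460 12
```

Less padding means fewer wasted time steps in the recurrent layer for each batch.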

## Build the Model
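GRU, like LSTM, belongs to the RNN family. The model takes a sequence of token indices (equivalent to one-hot vectors over the vocabulary) as input and outputs a single score per review. Passing that score through a sigmoid gives a value between 0 and 1, where values near 1 indicate a positive review.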
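
As a rough illustration of what a GRU computes at each time step, here is a scalar sketch that follows the gate equations documented for `torch.nn.GRU` (reset gate, update gate, candidate state). It is not the code used in this notebook; the function `gru_step` and the weight names in the dictionary are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, w):
    """One GRU time step for a scalar input x and scalar hidden state h_prev."""
    r = sigmoid(w["w_ir"] * x + w["w_hr"] * h_prev + w["b_r"])  # reset gate
    z = sigmoid(w["w_iz"] * x + w["w_hz"] * h_prev + w["b_z"])  # update gate
    # candidate state: the reset gate scales the contribution of the old state
    n = math.tanh(w["w_in"] * x + w["b_in"] + r * (w["w_hn"] * h_prev + w["b_hn"]))
    # new hidden state: interpolate between the candidate and the old state
    return (1.0 - z) * n + z * h_prev

# With all-zero weights both gates are 0.5 and the candidate is 0,
# so the new state is simply half of the previous one.
names = ("w_ir", "w_hr", "b_r", "w_iz", "w_hz", "b_z", "w_in", "b_in", "w_hn", "b_hn")
zero_weights = {name: 0.0 for name in names}
h = gru_step(x=1.0, h_prev=0.8, w=zero_weights)
print(h)
```

Unlike an LSTM, there is only one state (`h`) carried from step to step; there is no separate cell state.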

In [4]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        """
        Define the layers of the module: an embedding layer, a GRU, a linear layer and dropout.
        vocab_size: length of TEXT.vocab
        embedding_dim: dimension of the embedded word vector 
        hidden_dim: size of hidden states
        output_dim: dimension of the output; a single 0/1 score gives dimension 1
        n_layers: number of layers in the recurrent neural network
        bidirectional: whether to add an extra layer that processes values from last to first, hence bidirectional
        dropout: a regularization method to avoid overfitting, drops nodes to decrease model complexity
        """
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # here use GRU instead of LSTM 
        self.rnn = nn.GRU(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional, dropout=dropout)
        self.fc = nn.Linear(hidden_dim*2, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        """
        The forward method is called when we feed examples into our model. 
        Unlike an LSTM, a GRU has no cell state, so the recurrent layer returns only the output and the hidden state.
        """
        
        #x = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(x))
        
        #embedded = [sent len, batch size, emb dim]
        
        output, hidden = self.rnn(embedded)
        
        #output = [sent len, batch size, hid dim * num directions]
        #hidden = [num layers * num directions, batch size, hid. dim]
        
        # concatenate the final forward (hidden[-2]) and backward (hidden[-1]) hidden states of the top layer
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))
                
        #hidden = [batch size, hid. dim * num directions]
            
        return self.fc(hidden.squeeze(0))
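
The indexing `hidden[-2]` and `hidden[-1]` in `forward` relies on how PyTorch orders the first dimension of the final hidden state: layer by layer, with the forward direction before the backward direction within each layer. A small sketch of that ordering follows; the helper `hidden_index` is hypothetical and exists only for illustration.

```python
def hidden_index(layer, direction, num_directions=2):
    """Row of the final hidden state for a layer and direction (0 = forward, 1 = backward)."""
    return layer * num_directions + direction

n_layers = 2                 # same value as N_LAYERS in this notebook
total_rows = n_layers * 2    # num layers * num directions

top_forward = hidden_index(layer=n_layers - 1, direction=0)
top_backward = hidden_index(layer=n_layers - 1, direction=1)

# These are the rows addressed by the negative indices -2 and -1
print(top_forward - total_rows, top_backward - total_rows)  # -2 -1
```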

In [5]:
# input dimension equal to the vocabulary size
INPUT_DIM = len(TEXT.vocab)
# embedding dimension is the size of the dense word vectors
# hidden dimension is the size of the hidden states
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
# output dimension is 1, since 1/0 is a scalar
OUTPUT_DIM = 1
# number of layers equals 2
N_LAYERS = 2
# use the bidirectional RNN
BIDIRECTIONAL = True
# Define the dropout rate for regularization
DROPOUT = 0.5

# setup the model with predefined parameters
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS, BIDIRECTIONAL, DROPOUT)
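
One structural difference between the two models is size: a GRU layer has three gate blocks where an LSTM layer has four. The sketch below counts the parameters of the recurrent layers only (embedding and linear layers excluded), assuming the weight layout of `torch.nn.GRU` and `torch.nn.LSTM` with biases enabled. The function `recurrent_param_count` is a hypothetical helper, and applying it to an LSTM assumes the LSTM uses the same hyperparameters as this notebook.

```python
def recurrent_param_count(input_dim, hidden_dim, n_layers, bidirectional, n_gates):
    """Parameters in a stacked recurrent layer with n_gates gate blocks per cell."""
    num_directions = 2 if bidirectional else 1
    total = 0
    for layer in range(n_layers):
        # layers above the first receive the concatenated outputs of both directions
        layer_input = input_dim if layer == 0 else hidden_dim * num_directions
        per_direction = n_gates * hidden_dim * (layer_input + hidden_dim)  # weight matrices
        per_direction += 2 * n_gates * hidden_dim                          # two bias vectors
        total += per_direction * num_directions
    return total

# EMBEDDING_DIM=100, HIDDEN_DIM=256, N_LAYERS=2, BIDIRECTIONAL=True
gru_params = recurrent_param_count(100, 256, 2, True, n_gates=3)
lstm_params = recurrent_param_count(100, 256, 2, True, n_gates=4)
print(gru_params, lstm_params, gru_params / lstm_params)  # 1732608 2310144 0.75
```

Under these assumptions the GRU's recurrent layers carry three quarters of the parameters of the equivalent LSTM layers.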

In [6]:
# check the size of the pretrained embeddings (25,000 words plus the <unk> and <pad> tokens)
pretrained_embeddings = TEXT.vocab.vectors

print(pretrained_embeddings.shape)

torch.Size([25002, 100])


In [7]:
# assign pretrained embeddings to GRU's embedding layer 
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-0.4096, -0.5753,  0.1126,  ...,  0.4092,  0.1856,  0.1066],
        [ 0.2110, -0.2472,  0.6508,  ..., -0.1627,  0.4507, -1.1627],
        [-0.2379, -0.1095,  0.4314,  ...,  0.6665,  0.3200,  0.8872]])

## Train the Model

We use adaptive moment estimation (Adam) to train the GRU.

In [8]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())
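
For intuition, a single Adam update for one scalar parameter can be sketched as follows. The defaults below match those of `optim.Adam` (`lr=1e-3`, `betas=(0.9, 0.999)`, `eps=1e-8`); the function `adam_step` itself is a hypothetical illustration, not PyTorch's implementation.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# On the first step the update size is about lr, whatever the gradient's magnitude
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param)
```

Because the step is normalised by the running gradient magnitude, each parameter effectively gets its own step size.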

In [9]:
# The loss function is binary cross entropy with logits 
criterion = nn.BCEWithLogitsLoss()

# use GPU if detected, otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Set model and criterion with the device 
model = model.to(device)
criterion = criterion.to(device)
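
`BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy in one step, which is why the model returns raw scores (logits) rather than probabilities. A scalar sketch of the idea, using a standard numerically stable rearrangement, is shown below; both function names are hypothetical, and the real loss additionally averages over the batch.

```python
import math

def bce_with_logits(logit, label):
    """Stable form of binary cross-entropy: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_bce(logit, label):
    """Same quantity via an explicit sigmoid; breaks down for very large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

stable = bce_with_logits(2.0, 1.0)
naive = naive_bce(2.0, 1.0)
extreme = bce_with_logits(1000.0, 0.0)  # the naive version would hit log(0) here
print(stable, naive, extreme)
```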

In [10]:
# calculate how many rounded predictions equal the actual labels and average it across the batch
# evaluate the accuracy of the algorithm
import torch.nn.functional as F

def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #apply the sigmoid to the logits, then round to the closest integer (0 or 1)
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum()/len(correct)
    return acc
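
A small worked example of the accuracy computation in plain Python is given below. It is a sketch with made-up logits and labels; it uses a `> 0.5` threshold in place of `torch.round`, which differs only when the sigmoid output is exactly 0.5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def batch_accuracy(logits, labels):
    """Fraction of examples whose thresholded prediction equals the label."""
    predictions = [1.0 if sigmoid(x) > 0.5 else 0.0 for x in logits]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Toy batch (assumption): the third example is misclassified
logits = [2.3, -1.1, 0.4, -3.0, 1.5]
labels = [1.0, 0.0, 0.0, 0.0, 1.0]
accuracy = batch_accuracy(logits, labels)
print(accuracy)  # 0.8
```

Since the sigmoid of a positive number is above 0.5, a logit above 0 is read as a positive review.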

In [11]:
def train(model, iterator, optimizer, criterion):
    """
    Training function for the GRU.
    For each batch, the gradients are first reset to 0,
    then the batch of sentences is fed into the model.
    Finally, we compute the loss and accuracy, backpropagate to obtain the gradient
    of each parameter, and update the parameters with the Adam optimizer.
    """
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.text).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [12]:
def evaluate(model, iterator, criterion):
    """
    Evaluation function for the GRU.
    Gradients are not computed and the model parameters are not updated.
    """
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [13]:
# number of epochs equals 5
N_EPOCHS = 5

# repeat 5 epochs and print the train loss, train accuracy, validation loss and validation accuracy
for epoch in range(N_EPOCHS):

    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    print(f'Epoch: {epoch+1:02}, Train Loss: {train_loss:.3f}, Train Acc: {train_acc*100:.2f}%, Val. Loss: {valid_loss:.3f}, Val. Acc: {valid_acc*100:.2f}%')


Epoch: 01, Train Loss: 0.621, Train Acc: 63.55%, Val. Loss: 0.406, Val. Acc: 82.49%
Epoch: 02, Train Loss: 0.343, Train Acc: 85.62%, Val. Loss: 0.274, Val. Acc: 89.16%
Epoch: 03, Train Loss: 0.225, Train Acc: 91.45%, Val. Loss: 0.257, Val. Acc: 89.68%
Epoch: 04, Train Loss: 0.161, Train Acc: 94.02%, Val. Loss: 0.247, Val. Acc: 89.95%
Epoch: 05, Train Loss: 0.120, Train Acc: 95.64%, Val. Loss: 0.268, Val. Acc: 89.83%


In [14]:
# calculate the test loss/accuracy of the GRU model for the test set

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f}, Test Acc: {test_acc*100:.2f}%')


Test Loss: 0.333, Test Acc: 87.42%


## User Input

In [15]:
import spacy
nlp = spacy.load('en')

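def predict_sentiment(sentence):
    """Return the model's score between 0 (negative) and 1 (positive) for a sentence."""
    model.eval()  # make sure dropout is disabled for inference
    # tokenize the sentence and map each token to its vocabulary index
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    # add a batch dimension: [sent len] -> [sent len, 1]
    tensor = tensor.unsqueeze(1)
    with torch.no_grad():
        prediction = torch.sigmoid(model(tensor))
    return prediction.item()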

In [16]:
predict_sentiment("This film is terrible")



0.03237535059452057

In [17]:
predict_sentiment("This film is great")



0.9510335922241211

## Conclusion

For the LSTM implementation in [1], the result is Test Loss: 0.384, Test Acc: 85.34%.

For our GRU implementation, the result is Test Loss: 0.333, Test Acc: 87.42%.

Comparing the loss and accuracy of the two models on the test set, the GRU reaches an accuracy about 2 percentage points higher than the LSTM, with a lower loss. In this experiment, therefore, the GRU performs better than the LSTM on sentiment analysis for the IMDB data set.