<a href="https://colab.research.google.com/github/ravimashru/END-assignments/blob/main/Assignment_4_Sentiment_Analysis_IMDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# END - Assignment 4

In this assignment, [this notebook](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/2%20-%20Upgraded%20Sentiment%20Analysis.ipynb) was used as a starting point.



---
### Assignment
---

Change this code in such a way that:

1. it has 3 LSTM layers
2. it has used a for loop to do so in the forward function
3. the dropout value used is 0.2
4. trained on the text that is reversed (for example "my name is Rohan" becomes "Rohan is name my"
5. achieves 87% or more accuracy
6. once done, share the Github link as well (after training on Google Colab, move the file to GitHub).

---

In [26]:
print(torch. __version__)

1.7.0+cu101


In [None]:
# Colab System Specs
# https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2

# CPU info
# !cat /proc/cpuinfo

# Memory Info
# !cat /proc/meminfo

In [1]:
!nvidia-smi

Thu Nov 19 12:02:20 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
import torch
from torchtext import data
from torchtext import datasets

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

TEXT = data.Field(tokenize = 'spacy', include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)

We then load the IMDb dataset.

In [3]:
from torchtext import datasets

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:07<00:00, 10.8MB/s]


Then create the validation set from our training set.

In [4]:
import random

train_data, valid_data = train_data.split(random_state = random.seed(SEED))

In [5]:
MAX_VOCAB_SIZE = 25_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.200d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)

.vector_cache/glove.6B.zip: 862MB [06:30, 2.21MB/s]                          
100%|█████████▉| 399067/400000 [00:40<00:00, 10002.23it/s]

As before, we create the iterators, placing the tensors on the GPU if one is available.

Another thing for packed padded sequences all of the tensors within a batch need to be sorted by their lengths. This is handled in the iterator by setting `sort_within_batch = True`.

In [6]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE,
    sort_within_batch = False,
    device = device, )

## Multiple LSTM Layers
The original notebook had 2 stacked LSTMs using the `num_layers` property of `torch.nn.LSTM`.

In this assignment, an instance of `torch.nn.ModuleList` was used to hold 3 individual instances of `torch.nn.LSTM` and a `for` loop was used to pass data between the LSTM cells.

In [7]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)

        self.rnns = nn.ModuleList([
                      nn.LSTM(embedding_dim, 
                           hidden_dim,
                           bidirectional=bidirectional),

                      nn.LSTM(hidden_dim, 
                           hidden_dim,
                           bidirectional=bidirectional),

                      nn.LSTM(hidden_dim, 
                           hidden_dim,
                           bidirectional=bidirectional)
        ])
        
        self.fc = nn.Linear(hidden_dim, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        self.inplace_dropout = nn.Dropout(dropout)
        
    def forward(self, text, text_lengths):
        
        embedded = self.dropout(self.embedding(text))

        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, enforce_sorted=False)
        
        # packed_output, (hidden, cell) = self.rnn(packed_embedded)
        for i, rnn in enumerate(self.rnns):
          packed_embedded, (hidden, cell) = rnn(packed_embedded)
          self.inplace_dropout(packed_embedded.data)
        
        # hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
        hidden = self.dropout(hidden)
            
        return self.fc(hidden)

## Reduced dropout
The value of the dropout had to be reduced from a value of 0.5 in the original notebook to 0.2.

## Training on reverse text
To understand how bi-directional LSTMs work, the `bidirectional` property was set to `False` and the input to the model was manually reversed using the `torch.flip` method.

This resulted in an accuracy as high as that obtained when using `bidirectional=True` which makes the behavior of that property clear.

In [8]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 200
HIDDEN_DIM = 256
OUTPUT_DIM = 1
BIDIRECTIONAL = False
DROPOUT = 0.2
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = RNN(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM,
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX)

We'll print out the number of parameters in our model. 

Notice how we have almost twice as many parameters as before!

In [9]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 6,522,321 trainable parameters


The final addition is copying the pre-trained word embeddings we loaded earlier into the `embedding` layer of our model.

We retrieve the embeddings from the field's vocab, and check they're the correct size, _**[vocab size, embedding dim]**_ 

In [10]:
pretrained_embeddings = TEXT.vocab.vectors

print(pretrained_embeddings.shape)

torch.Size([25002, 200])


We then replace the initial weights of the `embedding` layer with the pre-trained embeddings.

**Note**: this should always be done on the `weight.data` and not the `weight`!

In [11]:
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[-0.1117, -0.4966,  0.1631,  ..., -1.8542,  0.4022,  0.4238],
        [ 0.2078,  1.1879, -0.7320,  ...,  1.3663, -0.4598,  0.6668],
        [-0.0715,  0.0935,  0.0237,  ...,  0.3362,  0.0306,  0.2558],
        ...,
        [ 0.3127,  0.4397,  0.0985,  ..., -0.0854, -0.0217, -0.1378],
        [-0.4698, -0.2754, -0.1802,  ...,  0.1676, -0.1875,  0.3729],
        [ 0.9528, -0.1623, -0.2516,  ..., -0.4873,  0.1463,  0.2355]])

**Note**: like initializing the embeddings, this should be done on the `weight.data` and not the `weight`!

In [12]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

print(model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0715,  0.0935,  0.0237,  ...,  0.3362,  0.0306,  0.2558],
        ...,
        [ 0.3127,  0.4397,  0.0985,  ..., -0.0854, -0.0217, -0.1378],
        [-0.4698, -0.2754, -0.1802,  ...,  0.1676, -0.1875,  0.3729],
        [ 0.9528, -0.1623, -0.2516,  ..., -0.4873,  0.1463,  0.2355]])


We can now see the first two rows of the embedding weights matrix have been set to zeros. As we passed the index of the pad token to the `padding_idx` of the embedding layer it will remain zeros throughout training, however the `<unk>` token embedding will be learned.

## Train the Model

In [13]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), weight_decay=1e-4)

The rest of the steps for training the model are unchanged.

We define the criterion and place the model and criterion on the GPU (if available)...

In [14]:
criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

100%|█████████▉| 399067/400000 [01:00<00:00, 10002.23it/s]

We implement the function to calculate accuracy...

In [15]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc

We define a function for training our model. 

As we have set `include_lengths = True`, our `batch.text` is now a tuple with the first element being the numericalized tensor and the second element being the actual lengths of each sequence. We separate these into their own variables, `text` and `text_lengths`, before passing them to the model.

**Note**: as we are now using dropout, we must remember to use `model.train()` to ensure the dropout is "turned on" while training.

In [16]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        text, text_lengths = batch.text

        # Reverse the sentences
        # dim 0 => sentence
        # dim 1 => batch
        reverse_text = torch.flip(text, [0])

        combined_text = torch.cat([text, reverse_text], axis=1)
        combined_labels = torch.cat([batch.label, batch.label])
        combined_lengths = torch.cat([text_lengths, text_lengths])
        
        predictions = model(combined_text, combined_lengths.cpu()).squeeze()

        loss = criterion(predictions, combined_labels)
        
        acc = binary_accuracy(predictions, combined_labels)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Then we define a function for testing our model, again remembering to separate `batch.text`.

**Note**: as we are now using dropout, we must remember to use `model.eval()` to ensure the dropout is "turned off" while evaluating.

In [17]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_lengths = batch.text
            
            predictions = model(text, text_lengths.cpu()).squeeze()
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

And also create a nice function to tell us how long our epochs are taking.

In [18]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

Finally, we train our model...

In [19]:
N_EPOCHS = 20

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        print('Saving best model weights...')
        torch.save(model.state_dict(), 'tut2-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Saving best model weights...
Epoch: 01 | Epoch Time: 3m 50s
	Train Loss: 0.679 | Train Acc: 55.14%
	 Val. Loss: 0.584 |  Val. Acc: 69.85%
Epoch: 02 | Epoch Time: 3m 47s
	Train Loss: 0.620 | Train Acc: 62.51%
	 Val. Loss: 0.631 |  Val. Acc: 63.81%
Saving best model weights...
Epoch: 03 | Epoch Time: 3m 47s
	Train Loss: 0.599 | Train Acc: 64.12%
	 Val. Loss: 0.479 |  Val. Acc: 78.67%
Saving best model weights...
Epoch: 04 | Epoch Time: 3m 47s
	Train Loss: 0.543 | Train Acc: 68.04%
	 Val. Loss: 0.463 |  Val. Acc: 79.64%
Saving best model weights...
Epoch: 05 | Epoch Time: 3m 47s
	Train Loss: 0.595 | Train Acc: 63.90%
	 Val. Loss: 0.402 |  Val. Acc: 83.14%
Saving best model weights...
Epoch: 06 | Epoch Time: 3m 48s
	Train Loss: 0.498 | Train Acc: 70.21%
	 Val. Loss: 0.401 |  Val. Acc: 84.56%
Saving best model weights...
Epoch: 07 | Epoch Time: 3m 46s
	Train Loss: 0.460 | Train Acc: 72.20%
	 Val. Loss: 0.322 |  Val. Acc: 86.53%
Epoch: 08 | Epoch Time: 3m 48s
	Train Loss: 0.432 | Train Acc: 

...and get our new and vastly improved test accuracy!

In [20]:
model.load_state_dict(torch.load('tut2-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.314 | Test Acc: 87.50%


## User Input

We can now use our model to predict the sentiment of any sentence we give it. As it has been trained on movie reviews, the sentences provided should also be movie reviews.


In [21]:
import spacy
nlp = spacy.load('en')

def predict_sentiment(model, sentence):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)
    length_tensor = torch.LongTensor(length)
    prediction = torch.sigmoid(model(tensor, length_tensor))
    return prediction.item()

An example negative review...

In [22]:
predict_sentiment(model, "This film is terrible")

0.32540008425712585

In [23]:
predict_sentiment(model, "terrible is film This")

0.3562086522579193

An example positive review...

In [24]:
predict_sentiment(model, "This film is great")

0.8533573150634766

In [25]:
predict_sentiment(model, "great is film This")

0.8867491483688354

In [27]:
def rev_sentence(sentence): 
  
    # first split the string into words 
    words = sentence.split(' ')  
  
    # then reverse the split string list and join using space 
    reverse_sentence = ' '.join(reversed(words))  
  
    # finally return the joined string 
    return reverse_sentence   

In [28]:
sentence = "Totally complete sci-fi comic book action movie with an excellent performance from Downey supported by a simple but solid script, superb effects and brilliant score."

rev_txt = rev_sentence(sentence)

rev_txt

'score. brilliant and effects superb script, solid but simple a by supported Downey from performance excellent an with movie action book comic sci-fi complete Totally'

In [29]:
predict_sentiment(model, rev_txt)

0.9568549990653992