# Text Classification with Neural Networks
In this notebook, neural network models are implemented to predict the sentiment of movie reviews from the [IMDb movie reviews dataset](https://ai.stanford.edu/~amaas/data/sentiment/), which is loaded through the `torchtext` package. Specifically, classifiers based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are implemented.

## Load Packages

The dataset is downloaded using [torchtext](https://torchtext.readthedocs.io/en/latest/index.html), which is a package that supports NLP for PyTorch. The cell below imports it together with the other required packages.

The `torchdata` package must be installed before running this notebook.

In [2]:
import random
from collections import defaultdict
from tqdm.notebook import tqdm

import torch
import torchtext
from torch.utils import data

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

random.seed(22)

In [3]:
# set device to cuda for faster model training
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', DEVICE)

Using device: cuda


## Preprocess Data

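Only the official training split is loaded. Each review is tokenized with `preprocess`, and the 25,000 reviews are then divided into 20,000 training examples and 5,000 held-out test examples.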

In [4]:
def preprocess(review):
  # split on single spaces, then detach one leading and one trailing punctuation mark from each token
  res = []
  for x in review.split(' '):
    if not x: continue # consecutive spaces yield empty strings, which would break x[0]
    remove_beg = x[0] in {'(', '"', "'"}
    remove_end = x[-1] in {'.', ',', ';', ':', '?', '!', '"', "'", ')'}
    if remove_beg and remove_end: res += [x[0], x[1:-1], x[-1]]
    elif remove_beg: res += [x[0], x[1:]]
    elif remove_end: res += [x[:-1], x[-1]]
    else: res += [x]
  return res

train_data = torchtext.datasets.IMDB(root='.data', split='train')
train_data = list(train_data)
train_data = [(x[0], preprocess(x[1])) for x in train_data]
# hold out 2,500 reviews from each half of the 25,000-review training split for testing
all_data = train_data
train_data = all_data[:10000] + all_data[12500:22500]
test_data = all_data[10000:12500] + all_data[22500:]

print('Num. Train Examples:', len(train_data))
print('Num. Test Examples:', len(test_data))

Num. Train Examples: 20000
Num. Test Examples: 5000


Here are some examples of reviews, each with a label and text, from the pre-processed training data.

In [5]:
print("\nSAMPLE DATA:")
for x in random.sample(train_data, 5):
  print('Sample label:', x[0])
  print('Sample text:', ' '.join(x[1]), '\n')


SAMPLE DATA:
Sample label: neg
Sample text: I felt asleep , watching it!! ! ( and I had tickets for the midnight- premiere ) Any questions ? The most disturbing scene , as far as I can remember , was the techno-dance-i-dont-know-what-that-was-scene . By the way what an ending! ? 

Sample label: neg

Sample label: neg
Sample text: I saw this regurgitated pile of vignettes tonight at a preview screening and I was straight up blown away by how bad it was . <br /><br />First off , the film practically flaunted its gaping blind spots . There are no black or gay New Yorkers in love ? Or who , say , know the self-involved white people in love ? I know it's not the love Crash of anvil-tastic inclusiveness but you can't pretend to have a cinematic New York with out these fairly prevalent members of society . Plus , you know the people who produced this ish thought Crash deserved that ham-handed Oscar , so where is everyone ? <br /><br />Possibly worse than the bizarre and willful socioeconomic

## Dataloader: The `TextDataset` Class

The dataset holds the tokenized data for the model. The following methods are implemented: 

* **Create Dictionaries** (i.e., `build_dictionary(self)`): Creates the dictionaries `idx2word` and `word2idx`, which map each vocabulary word to a unique index and back. Words are lowercased when counted, and only words that occur at least `threshold` times in the training data are added.

* **Text Conversion** (i.e., `convert_text(self)`): Converts each review in the dataset to a list of indices, given by the `word2idx` dictionary, and appends the `<END>` token. The result is stored in the `textual_ids` variable. Words not present in the `word2idx` dictionary are replaced with the `<UNK>` token. Because the dictionary stores lowercased words while the lookup uses the original casing, tokens that contain uppercase letters also map to `<UNK>`.

* **Get Text** (i.e., `get_text(self, idx)`): Returns the review at `idx` in the dataset as a tensor of indices corresponding to the words in the review. The review is truncated or padded with the `<PAD>` index to exactly `max_len` entries.

* **Get Label** (i.e., `get_label(self, idx)`): Returns the value `1` if the label for `idx` in the dataset is `pos`, and `0` otherwise (i.e., `neg`).

* **Get Number of Reviews** (i.e., `__len__(self)`): Returns the total number of reviews in the dataset.

* **Get Item** (i.e., `__getitem__(self, idx)`): Returns the converted text, and the label.
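
The dictionary-building, conversion, and padding steps listed above can be sketched without PyTorch. This is a minimal illustration on a toy corpus, not the `TextDataset` class itself: the helper names `build_vocab`, `encode`, and `pad_or_truncate` are hypothetical, and the sketch assumes tokens are lowercased at lookup time as well as at counting time.

```python
from collections import Counter

PAD, END, UNK = '<PAD>', '<END>', '<UNK>'

def build_vocab(tokenized_reviews, threshold):
    """Map every lowercased word seen at least `threshold` times to an index."""
    counts = Counter(w.lower() for review in tokenized_reviews for w in review)
    word2idx = {PAD: 0, END: 1, UNK: 2}
    for word, n in counts.items():
        if n >= threshold:
            word2idx[word] = len(word2idx)
    return word2idx

def encode(review, word2idx):
    """Replace words by indices (UNK if unseen) and append the END index."""
    ids = [word2idx.get(w.lower(), word2idx[UNK]) for w in review]
    return ids + [word2idx[END]]

def pad_or_truncate(ids, max_len, pad_id=0):
    """Return exactly `max_len` indices."""
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

# toy corpus (assumption): 'good', 'movie' and '.' each occur twice
reviews = [['good', 'movie', '.'], ['bad', 'movie', '.'], ['Good', 'plot']]
vocab = build_vocab(reviews, threshold=2)
encoded = pad_or_truncate(encode(['good', 'acting', '.'], vocab), max_len=6)
print(vocab)
print(encoded)  # [3, 2, 5, 1, 0, 0]
```

Here `acting` falls below the threshold and becomes `<UNK>` (index 2), the `<END>` index 1 is appended, and two `<PAD>` zeros fill the review up to `max_len`.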

In [6]:
PAD = '<PAD>'
END = '<END>'
UNK = '<UNK>'

In [7]:
class TextDataset(data.Dataset):
  def __init__(self, examples, split, threshold, max_len, idx2word=None, word2idx=None):
    self.examples = examples
    assert split in {'train', 'val', 'test'}
    self.split = split
    self.threshold = threshold
    self.max_len = max_len

    # dictionaries
    self.idx2word = idx2word
    self.word2idx = word2idx
    if split == 'train':
      self.build_dictionary()
    self.vocab_size = len(self.word2idx)
    
    # convert text to indices
    self.textual_ids = []
    self.convert_text()
  
  def build_dictionary(self): 
    assert self.split == 'train'
    self.idx2word = {0:PAD, 1:END, 2: UNK}
    self.word2idx = {PAD:0, END:1, UNK: 2}

    # build dictionaries
    word_freq = defaultdict(int)
    for review in self.examples:
      text = review[1] # text of review
      for word in text:
        word_freq[word.lower()] += 1 # increment word frequency (lower cased)
    for word in word_freq.keys():
      if word_freq[word] >= self.threshold: # add word and id to dictionaries once its count reaches the threshold
        new_id = len(self.idx2word)
        self.idx2word[new_id] = word
        self.word2idx[word] = new_id
  
  def convert_text(self):
    unk_id = self.word2idx[UNK]
    for review in self.examples:
      label = review[0]
      text = review[1]

      # text conversion: tokens are looked up with their original casing, while the
      # dictionary keys are lowercased, so tokens containing uppercase letters map to UNK
      converted_text = [self.word2idx.get(word, unk_id) for word in text]
      converted_text.append(self.word2idx[END])
      self.textual_ids.append((label, converted_text))

  def get_text(self, idx):
    text = self.textual_ids[idx][1]

    # pad or truncate the review as needed
    updated_text = text[:self.max_len] if len(text) >= self.max_len else text + [self.word2idx[PAD]]*(self.max_len-len(text))
    updated_text = torch.tensor(updated_text)

    return updated_text
  
  def get_label(self, idx):
    label = self.textual_ids[idx][0]
    score = 1 if label == 'pos' else 0
    score = torch.tensor(score) # sentiment score of the review as a tensor

    return score

  def __len__(self):
    num = len(self.examples) # number of reviews

    return num
  
  def __getitem__(self, idx):
    # get the processed text and score of the review
    review, label = self.get_text(idx), self.get_label(idx)

    return review, label

The IMDb movie reviews are processed into a `TextDataset` object. Here is an example of a review item in the dataset.

In [8]:
train_dataset = TextDataset(train_data, 'train', threshold=10, max_len=150)
print('Vocab size:', train_dataset.vocab_size, '\n')

randidx = random.randint(0, len(train_dataset)-1)
text, label = train_dataset[randidx]
print('Example text:')
print(train_data[randidx][1])
print(text)
print('\nExample label:')
print(train_data[randidx][0])
print(label)

Vocab size: 19002 

Example text:
['"', 'A', 'Guy', 'Thing', '"', 'tries', 'to', 'capture', 'the', 'feeling', 'of', '"', "There's", 'Something', 'About', 'Mary', '"', 'or', '"', 'Meet', 'the', 'Parents', '"', 'but', 'comes', 'off', 'more', 'like', 'it', 'was', 'edited', 'up', 'out', 'of', 'cutting-room', 'rejects', 'of', 'those', 'two', 'films', '.', 'Thankfully', 'I', 'rented', 'it', 'on', 'a', '5-day', 'rental', 'because', 'I', "couldn't", 'sit', 'and', 'watch', 'more', 'than', '20', 'minutes', 'at', 'a', 'time.<br', '/><br', '/>The', 'premise', 'is', 'decent', 'and', 'I', 'liked', 'the', 'scenes', 'where', 'other', 'guys', 'automatically', 'cover', 'up', 'for', "Paul's", 'missteps', '(', 'the', 'checker', 'at', 'the', 'Save-mart', 'was', 'great', ')', 'but', 'the', 'script-writing', 'is', 'absolutely', 'horrible', '.', 'The', 'dialog', 'falls', 'flat', 'most', 'of', 'the', 'time', 'and', 'just', 'when', 'you', 'think', 'that', 'things', 'are', 'finally', 'going', 'to', 'get', 'on', 

## Convolutional Neural Network: The `CNN` Class
The convolutional neural network for text classification is defined here. The following methods are implemented:

* **Initialization** (i.e., `__init__(self, ...)`): Initializes the following layers and other features for the neural network.
    * An embedding layer to represent the words in the vocabulary.
    * One convolution layer per filter height. Each layer has one input channel, `out_channels` output channels, and a kernel of size `[filter_height, embed_size]`, so it spans `filter_height` consecutive words at a time.
    * A dropout layer with specified `dropout`.
    * A linear layer that takes the concatenated output of all convolution layers as input (i.e., size is `out_channels` times the number of filter heights) and outputs one score per class (i.e., size is the number of classes).

* **Feed Forward** (i.e., `forward(self, texts)`): Passes texts (with shape `[batch_size, max_len]`) through the layers and computes the output as described below. Since each convolution layer uses a different filter height (i.e., n-gram size), the model looks at different numbers of words at a time, and each layer learns `out_channels` features from the words it sees. A non-linearity and max-pooling over positions are then applied to every channel, keeping the strongest response to any n-gram in the text. Everything is performed on a batch simultaneously, hence the `batch_size` dimension in the steps below.
    * Pass texts through the embedding layer. The output shape is `[batch_size, max_len, embed_size]`.
    * Modify dimensions of embedded output to fit into multiple convolution layers (i.e., output shape is `[batch_size, 1, max_len, embed_size]`).
    * Pass the text embeddings into each convolution layer. Output will have shape `[batch_size, out_channels, *, 1]` where `*` is `(max_len - filter_height) // stride + 1`.
    * Convert the output shape to `[batch_size, out_channels, *]`.
    * Apply non-linearity on the output with the ReLU function.
    * Take the max value across the last dimension to have shape `[batch_size, out_channels]`.
    * Concatenate outputs from all convolution layers with shape `[batch_size, (out_channels*num_cnn_layers)]`.
    * Pass the output through the linear layer. The output shape is `[batch_size, num_classes]`.
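
The shapes in the steps above can be traced with plain arithmetic. The following is a small sketch, assuming an unpadded convolution with dilation 1 (the `nn.Conv2d` defaults); `conv_output_len` and `cnn_shape_trace` are hypothetical helper names used only for illustration.

```python
def conv_output_len(max_len, filter_height, stride=1):
    """Number of window positions for an unpadded convolution over max_len words."""
    return (max_len - filter_height) // stride + 1

def cnn_shape_trace(batch_size, max_len, embed_size, out_channels, filter_heights, stride=1):
    """List the tensor shape after each step of the forward pass."""
    trace = [('embedding', (batch_size, max_len, embed_size)),
             ('unsqueeze', (batch_size, 1, max_len, embed_size))]
    for h in filter_heights:
        n = conv_output_len(max_len, h, stride)
        trace.append((f'conv h={h}', (batch_size, out_channels, n, 1)))
        trace.append((f'max-pool h={h}', (batch_size, out_channels)))
    trace.append(('concatenate', (batch_size, out_channels * len(filter_heights))))
    return trace

# values matching the recommended hyperparameters used later in the notebook
trace = cnn_shape_trace(32, 200, 128, 64, [2, 3, 4])
for name, shape in trace:
    print(f'{name:15s} {shape}')
```

With `max_len=200` and stride 1, the filter heights 2, 3, and 4 give 199, 198, and 197 window positions. After max-pooling, each layer contributes 64 features, so the linear layer receives 192 features per review.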

In [9]:
class CNN(nn.Module):
  def __init__(self, vocab_size, embed_size, out_channels, filter_heights, stride, dropout, num_classes, pad_idx):
    super(CNN, self).__init__()

    # embedding layer
    self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_size, padding_idx=pad_idx)
    # one convolution layer per filter height; each kernel spans the full embedding width
    self.conv_layers = nn.ModuleList( [nn.Conv2d(in_channels=1, out_channels=out_channels, kernel_size=[filter_height, embed_size], stride=stride) for filter_height in filter_heights] )
    # dropout layer
    self.dropout = nn.Dropout(dropout)
    # linear layer
    self.linear = nn.Linear(out_channels * len(filter_heights), num_classes)

  def forward(self, texts):
    # embed texts -- [batch_size, max_len, embed_size]
    x = self.embedding(texts)
    # modify embedded output dimensions -- [batch_size, 1, MAX_LEN, embed_size] (i.e., unsqueeze)
    x = x.unsqueeze(1)
    # pass embedded texts into cnn layers -- [batch_size, out_channels, *, 1]
    # modify dimension of outputs -- [batch_size, out_channels, *] (i.e., squeeze)
    # apply non-linearity
    x = [ F.relu( conv_layer(x) ).squeeze(3) for conv_layer in self.conv_layers ]
    # get max value across last dimension via pooling -- [batch_size, out_channels]
    x = [ F.max_pool1d( value, value.size(2) ).squeeze(2) for value in x ]
    # concatenate outputs from cnn layers -- [batch_size, (out_channels*num_of_cnn_layers)]
    x = torch.cat(x, 1)
    # apply dropout
    x = self.dropout(x)
    # pass output through linear layer -- [batch_size, num_classes]
    x = self.linear(x)

    # return raw class scores (logits); nn.CrossEntropyLoss applies log-softmax internally during training
    return x

In [10]:
# helper function to count number of parameters in a model
def count_parameters(model):
  return sum(p.numel() for p in model.parameters() if p.requires_grad)

### Model Training

First, the train and test dataloaders are initialized. A dataloader is responsible for providing batches of data to the model. The train dataset is instantiated first so that its vocabulary can be built. The test dataset then reuses that vocabulary, so both datasets map words to the same indices.

In [11]:
# adjustable parameters (defaults: THRESHOLD=5, MAX_LEN=200, BATCH_SIZE=32)
THRESHOLD = 5
MAX_LEN = 200
BATCH_SIZE = 32

train_dataset = TextDataset(train_data, 'train', THRESHOLD, MAX_LEN)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True)
print('train vocab size:', train_dataset.vocab_size)

test_dataset = TextDataset(test_data, 'test', THRESHOLD, MAX_LEN, train_dataset.idx2word, train_dataset.word2idx)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=1, drop_last=False)
print('test vocab size:', test_dataset.vocab_size)

train vocab size: 29730
test vocab size: 29730
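
As a quick check on the loader settings above, the number of batches per epoch follows from the dataset size, `batch_size`, and `drop_last`. The helper `num_batches` below is a hypothetical name; this is a sketch of the arithmetic, not of the `DataLoader` implementation.

```python
def num_batches(num_examples, batch_size, drop_last):
    """Batches per epoch; an incomplete final batch is kept unless drop_last is set."""
    full, remainder = divmod(num_examples, batch_size)
    return full if drop_last or remainder == 0 else full + 1

train_batches = num_batches(20000, 32, drop_last=True)
test_batches = num_batches(5000, 1, drop_last=False)
print(train_batches, test_batches)  # 625 5000
```

Since 20,000 is a multiple of 32, `drop_last=True` discards no training examples with these settings.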


Here is a function to train a model on the data. It is used for the CNN here and for the RNN later.

To save the model periodically during training, see the PyTorch tutorial on saving and loading models: https://pytorch.org/tutorials/beginner/saving_loading_models.html.

In [12]:
def train_model(model, num_epochs, data_loader, optimizer, criterion):
  print('Training Model...')
  model.train()
  for epoch in tqdm(range(num_epochs)):
    epoch_loss = 0
    epoch_acc = 0
    for texts, labels in data_loader:
      texts = texts.to(DEVICE) # shape: [batch_size, MAX_LEN]
      labels = labels.to(DEVICE) # shape: [batch_size]

      optimizer.zero_grad()

      output = model(texts)
      acc = accuracy(output, labels)
      
      loss = criterion(output, labels)
      loss.backward()
      optimizer.step()

      epoch_loss += loss.item()
      epoch_acc += acc.item()
    print('[TRAIN]\t Epoch: {:2d}\t Loss: {:.4f}\t Train Accuracy: {:.2f}%'.format(epoch+1, epoch_loss/len(data_loader), 100*epoch_acc/len(data_loader)))
  print('Model Trained!\n')

Here is a helper function to compute model accuracy per batch.

In [13]:
def accuracy(output, labels):
  preds = output.argmax(dim=1) # find predicted class
  correct = (preds == labels).sum().float() # convert into float for division 
  acc = correct / len(labels)
  return acc

Model instantiation with hyperparameters.

Recommended hyperparameters: 
* Vocabulary size: `train_dataset.vocab_size` (Don't change)
* Embedding size: `128`
* Number of out channels: `64`
* Filter heights (i.e., n-gram sizes): `[2, 3, 4]`
* Stride: `1`
* Dropout: `0.5`
* Number of classes: `2` (Don't change)
* Pad token index: `train_dataset.word2idx[PAD]` (Don't change)

In [14]:
cnn_model = CNN(vocab_size = train_dataset.vocab_size,
            embed_size = 128, 
            out_channels = 64, 
            filter_heights = [2, 3, 4], 
            stride = 1, 
            dropout = 0.5, 
            num_classes = 2,
            pad_idx = train_dataset.word2idx[PAD])

# load the model on the device (cuda or cpu)
cnn_model = cnn_model.to(DEVICE)

print('The model has {:,d} trainable parameters'.format(count_parameters(cnn_model)))

The model has 3,879,746 trainable parameters
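
The reported count can be reproduced by hand from the layer sizes. The sketch below assumes the vocabulary size of 29,730 printed earlier and the hyperparameters passed to `CNN` above; `cnn_param_count` is a hypothetical helper, not part of the notebook.

```python
def cnn_param_count(vocab_size, embed_size, out_channels, filter_heights, num_classes):
    # one embed_size-dimensional vector per vocabulary entry (the PAD row is counted too)
    embedding = vocab_size * embed_size
    # each filter has filter_height * embed_size weights, plus one bias per output channel
    convs = sum(out_channels * h * embed_size + out_channels for h in filter_heights)
    # weights from the concatenated features to each class, plus one bias per class
    linear = out_channels * len(filter_heights) * num_classes + num_classes
    return embedding + convs + linear

total = cnn_param_count(29730, 128, 64, [2, 3, 4], 2)
print(f'{total:,d}')  # 3,879,746
```

The embedding layer accounts for 3,805,440 of these parameters, the three convolution layers for 73,920, and the linear layer for 386.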


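The criterion, or loss function, is cross-entropy loss (https://en.wikipedia.org/wiki/Cross_entropy). `nn.CrossEntropyLoss` takes the raw class scores returned by the model and applies log-softmax internally.

The optimizer is Adam (https://arxiv.org/pdf/1412.6980.pdf), an adaptive variant of stochastic gradient descent.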
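
To make the loss concrete, here is a minimal pure-Python sketch of cross-entropy computed from raw class scores for a single example. It illustrates the formula (log-sum-exp of the scores minus the score of the correct class) and is not the PyTorch implementation; the function name `cross_entropy` is chosen here for illustration.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the correct class, computed from raw scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

undecided = cross_entropy([0.0, 0.0], 1)     # equal scores for both classes
confident_right = cross_entropy([2.0, -1.0], 0)
confident_wrong = cross_entropy([2.0, -1.0], 1)
print(undecided, confident_right, confident_wrong)
```

A two-class model that scores both classes equally has a loss of ln 2 (about 0.693), which is a useful reference point when reading the training loss.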

In [15]:
LEARNING_RATE = 5e-4 # adjustable parameter (default=5e-4)

# loss function
criterion = nn.CrossEntropyLoss().to(DEVICE)

# optimizer
optimizer = optim.Adam(cnn_model.parameters(), lr=LEARNING_RATE)

Here the model is trained. A GPU is recommended for faster training (about 4 minutes).

In [16]:
N_EPOCHS = 10 # adjustable parameter (default=10)

# train model for N_EPOCHS epochs
train_model(cnn_model, N_EPOCHS, train_loader, optimizer, criterion)

Training Model...


[TRAIN]	 Epoch:  1	 Loss: 0.6779	 Train Accuracy: 61.05%
[TRAIN]	 Epoch:  2	 Loss: 0.5426	 Train Accuracy: 72.02%
[TRAIN]	 Epoch:  3	 Loss: 0.4892	 Train Accuracy: 76.11%
[TRAIN]	 Epoch:  4	 Loss: 0.4473	 Train Accuracy: 79.19%
[TRAIN]	 Epoch:  5	 Loss: 0.4126	 Train Accuracy: 81.21%
[TRAIN]	 Epoch:  6	 Loss: 0.3774	 Train Accuracy: 83.26%
[TRAIN]	 Epoch:  7	 Loss: 0.3465	 Train Accuracy: 84.80%
[TRAIN]	 Epoch:  8	 Loss: 0.3096	 Train Accuracy: 86.72%
[TRAIN]	 Epoch:  9	 Loss: 0.2774	 Train Accuracy: 88.42%
[TRAIN]	 Epoch: 10	 Loss: 0.2426	 Train Accuracy: 90.01%
Model Trained!



### Model Evaluation

Here is a function to perform evaluation of the trained model.

In [17]:
@torch.no_grad() # gradients are not needed for evaluation
def evaluate(model, data_loader, criterion, use_tqdm=False):
  print('Evaluating performance on the test dataset...')
  model.eval()
  epoch_loss = 0
  epoch_acc = 0
  all_predictions = []
  print("\nSOME PREDICTIONS FROM THE MODEL:")
  iterator = tqdm(data_loader) if use_tqdm else data_loader
  total = 0
  for texts, labels in iterator:
    bs = texts.shape[0]
    total += bs
    texts = texts.to(DEVICE)
    labels = labels.to(DEVICE)
    
    output = model(texts)
    acc = accuracy(output, labels) * len(labels)
    pred = output.argmax(dim=1)
    all_predictions.append(pred)
    
    loss = criterion(output, labels) * len(labels)
    
    epoch_loss += loss.item()
    epoch_acc += acc.item()

    if random.random() < 0.0015 and bs == 1:
      print("Prediction:", pred.item(), '\tCorrect Output:', labels.item())
      print("Input: "+' '.join([data_loader.dataset.idx2word[idx] for idx in texts[0].tolist() if idx not in {data_loader.dataset.word2idx[PAD], data_loader.dataset.word2idx[END]}]), '\n')

  full_acc = 100*epoch_acc/total
  full_loss = epoch_loss/total
  print('[TEST]\t Loss: {:.4f}\t Accuracy: {:.2f}%'.format(full_loss, full_acc))
  predictions = torch.cat(all_predictions)
  return predictions, full_acc, full_loss

In [18]:
evaluation = evaluate(cnn_model, test_loader, criterion, use_tqdm=True) # compute test data accuracy

Evaluating performance on the test dataset...

SOME PREDICTIONS FROM THE MODEL:


Prediction: 1 	Correct Output: 0
Input: <UNK> thinking of the revelation that the main character in " <UNK> " comes to at films end , <UNK> am reminded of last years " <UNK> " with <UNK> <UNK> . <UNK> only difference between the two films is the literal physical weight of the characters.<br /><br <UNK> understated , yet entirely realistic portrayal of small town life . <UNK> title is cause for contemplation . <UNK> , we , the audience are the ones in the " <UNK> " as we are given no <UNK> in the films slim 90 minute running time . <UNK> reactions were often smug and judgmental , clearly indicating how detached people can be from seeing any thread of humanity in characters so foreign to themselves . <UNK> characters are the ones people refer to as those that put <UNK> <UNK> . back in office for a second <UNK> /><br <UNK> <UNK> to consider how reality television has spoiled our sense of reality when watching an audience jump to their feet for the exit as soon as the credits role . <UNK> 

## Recurrent Neural Network: The `RNN` Class

The recurrent text classification model is defined here. The following methods are implemented:

* **Initialization** (i.e., `__init__(self, ...)`): Initializes the following layers and other features for the neural network.
    * An embedding layer to represent the words in the vocabulary.
    * A recurrent network (i.e., a GRU) with `batch_first=True` and the specified `bidirectional` setting.
    * A dropout layer with specified `dropout`.
    * A linear layer that takes the final hidden state of the last recurrent layer as input and outputs one score per class (i.e., size is the number of classes). In the bidirectional case, the final hidden states of the forward and backward directions are concatenated.

* **Feed Forward** (i.e., `forward(self, texts)`): Passes texts (with shape `[batch_size, max_len]`) through the layers and computes the output as described below.
    * Pass texts through the embedding layer. The output shape is `[batch_size, max_len, embed_size]`.
    * Pass the result through the recurrent network.
    * Concatenate the final hidden states of the last layer for each direction. The shape is `[batch_size, num_dirs*hidden_size]`.
    * Apply dropout.
    * Pass the output through the linear layer. The output shape is `[batch_size, num_classes]`.
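
The final hidden state returned by a GRU has one row per layer and direction, so selecting the right rows matters. The sketch below assumes the row ordering documented for `nn.GRU` (rows grouped by layer, with forward before backward inside each layer); `final_state_rows` is a hypothetical helper for illustration only.

```python
def final_state_rows(num_layers, bidirectional):
    """Rows of h_n that hold the last layer's final hidden state(s)."""
    num_dirs = 2 if bidirectional else 1
    # row index = layer * num_dirs + direction (0 = forward, 1 = backward)
    layout = [(layer, direction)
              for layer in range(num_layers)
              for direction in range(num_dirs)]
    last = num_layers - 1
    return [i for i, (layer, _) in enumerate(layout) if layer == last]

bi_rows = final_state_rows(2, True)
uni_rows = final_state_rows(2, False)
print(bi_rows)   # [2, 3], i.e. rows -2 and -1 of a 4-row h_n
print(uni_rows)  # [1], i.e. row -1 of a 2-row h_n
```

Under that layout, the last two rows are the forward and backward states of the top layer in the bidirectional case, and the last row alone is the top layer's state otherwise. Because reviews are padded at the end and the sequences are not packed, the forward state is computed after reading any `<PAD>` positions.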

In [19]:
class RNN(nn.Module):
  def __init__(self, vocab_size, embed_size, hidden_size, num_layers, bidirectional, dropout, num_classes, pad_idx):
    super(RNN, self).__init__()

    self.bidirectional = bidirectional

    # embedding layer
    self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_size, padding_idx=pad_idx)
    # recurrent network (GRU)
    self.rnn = nn.GRU(input_size=embed_size, hidden_size=hidden_size, num_layers=num_layers, dropout=dropout, batch_first=True, bidirectional=bidirectional)
    # dropout layer
    self.dropout = nn.Dropout(dropout)
    # linear layer
    B = 2 if bidirectional else 1
    self.linear = nn.Linear(B*hidden_size, num_classes)

  def forward(self, texts):
    # embed texts -- [batch_size, max_len, embed_size]
    x = self.embedding(texts)
    # rnn layer
    x, hn = self.rnn(x)
    # concatenate the final hidden states of the last layer for each direction -- [batch_size, num_dirs*hidden_size]
    x = torch.cat( (hn[-2, :, :], hn[-1, :, :]), dim=1 ) if self.bidirectional else hn[-1, :, :]
    # apply dropout
    x = self.dropout(x)
    # pass output through the linear layer -- [batch_size, num_classes]
    x = self.linear(x)

    # return raw class scores (logits); nn.CrossEntropyLoss applies log-softmax internally during training
    return x

### Model Training

First, the train and test dataloaders are initialized.

In [20]:
# adjustable parameters (defaults: THRESHOLD=5, MAX_LEN=200, BATCH_SIZE=32)
THRESHOLD = 5
MAX_LEN = 200
BATCH_SIZE = 32

train_dataset = TextDataset(train_data, 'train', THRESHOLD, MAX_LEN)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True)
print('train vocab size:', train_dataset.vocab_size)

test_dataset = TextDataset(test_data, 'test', THRESHOLD, MAX_LEN, train_dataset.idx2word, train_dataset.word2idx)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=1, drop_last=False)
print('test vocab size:', test_dataset.vocab_size)

train vocab size: 29730
test vocab size: 29730


Model instantiation with hyperparameters.

Recommended hyperparameters: 
* Vocabulary size: `train_dataset.vocab_size` (Don't change)
* Embedding size: `128`
* Hidden size: `128`
* Number of layers: `2`
* Bidirectional: `True`
* Dropout: `0.5`
* Number of classes: `2` (Don't change)
* Pad token index: `train_dataset.word2idx[PAD]` (Don't change)

In [21]:
rnn_model = RNN(vocab_size = train_dataset.vocab_size,
            embed_size = 128, 
            hidden_size = 128, 
            num_layers = 2,
            bidirectional = True,
            dropout = 0.5,
            num_classes = 2,
            pad_idx = train_dataset.word2idx[PAD])

# load your model on device
rnn_model = rnn_model.to(DEVICE)

print('The model has {:,d} trainable parameters'.format(count_parameters(rnn_model)))

The model has 4,300,546 trainable parameters
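
As with the CNN, the reported count can be checked by hand. The sketch below assumes the standard GRU parameterization (three gates, each with input weights, hidden weights, and two bias vectors) and the vocabulary size of 29,730 printed earlier; `gru_param_count` is a hypothetical helper.

```python
def gru_param_count(input_size, hidden_size, num_layers, bidirectional):
    num_dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # layers after the first read the concatenated outputs of both directions
        layer_input = input_size if layer == 0 else num_dirs * hidden_size
        # three gates, each with input weights, hidden weights, and two bias vectors
        per_direction = 3 * hidden_size * (layer_input + hidden_size) + 2 * 3 * hidden_size
        total += num_dirs * per_direction
    return total

embedding = 29730 * 128                   # one 128-dimensional vector per vocabulary entry
gru = gru_param_count(128, 128, 2, True)  # 494,592
linear = 2 * 128 * 2 + 2                  # both directions concatenated, two classes
total = embedding + gru + linear
print(f'{total:,d}')  # 4,300,546
```

Most parameters again sit in the embedding layer (3,805,440), with 494,592 in the two-layer bidirectional GRU and 514 in the linear layer.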


Here, the same criterion and optimizer are used as with the CNN.

In [22]:
LEARNING_RATE = 5e-4 # adjustable parameter (default=5e-4)

# loss function
criterion = nn.CrossEntropyLoss().to(DEVICE)

# optimizer
optimizer = optim.Adam(rnn_model.parameters(), lr=LEARNING_RATE)

Here the model is trained. A GPU is recommended for faster training (about 2 minutes).

In [23]:
N_EPOCHS = 10 # adjustable parameter (default=6; 10 epochs are used for the run below)

# train model for N_EPOCHS epochs
train_model(rnn_model, N_EPOCHS, train_loader, optimizer, criterion)

Training Model...


[TRAIN]	 Epoch:  1	 Loss: 0.6702	 Train Accuracy: 58.11%
[TRAIN]	 Epoch:  2	 Loss: 0.5663	 Train Accuracy: 71.33%
[TRAIN]	 Epoch:  3	 Loss: 0.4362	 Train Accuracy: 80.58%
[TRAIN]	 Epoch:  4	 Loss: 0.3320	 Train Accuracy: 86.39%
[TRAIN]	 Epoch:  5	 Loss: 0.2618	 Train Accuracy: 89.97%
[TRAIN]	 Epoch:  6	 Loss: 0.2011	 Train Accuracy: 92.58%
[TRAIN]	 Epoch:  7	 Loss: 0.1458	 Train Accuracy: 94.72%
[TRAIN]	 Epoch:  8	 Loss: 0.0974	 Train Accuracy: 96.78%
[TRAIN]	 Epoch:  9	 Loss: 0.0678	 Train Accuracy: 97.82%
[TRAIN]	 Epoch: 10	 Loss: 0.0448	 Train Accuracy: 98.55%
Model Trained!



### Model Evaluation

Now the RNN is evaluated.

In [24]:
evaluation = evaluate(rnn_model, test_loader, criterion, use_tqdm=True) # compute test data accuracy

Evaluating performance on the test dataset...

SOME PREDICTIONS FROM THE MODEL:


Prediction: 0 	Correct Output: 0
Input: <UNK> saw the <UNK> . showing and <UNK> must say that this movie was nothing special . <UNK> <UNK> did not leave the theater wanting my time back ( as <UNK> don't actually pay for movies anymore ) <UNK> didn't really find any redeeming <UNK> /><br <UNK> were a few lines and such that made me chuckle , but mostly the film seemed to consist of rampant fan service to the younger ( in mind more than age as this film is rated <UNK> ) male audience . <UNK> fan service seemed out of place and rather distracting as well . <UNK> know you all want to hear <UNK> <UNK> . say his infamous line , but let's be honest , it's a whole lot of hype for very little pay off . <UNK> only truly horrible part of the film was the <UNK> , which looked very <UNK> and did not mesh well with the live action on the screen.<br /><br <UNK> <UNK> am a reasonable man , <UNK> knew going into the theater that <UNK> wasn't going to be seeing " <UNK> " and <UNK> am at least thankful t