# Lab 5: Neural Networks for Text Classification

This lab introduces (deep) neural networks for text classification using PyTorch, and applies them to the datasets we previously used with naïve Bayes and logistic regression. PyTorch is a framework for machine learning with neural networks that is widely used in fields such as computer vision and NLP.

You may also find [PyTorch's tutorials](https://pytorch.org/tutorials/) useful for more depth on different parts of the framework.

### Outcomes

- Be able to construct and train a neural network classifier in PyTorch.
- Understand how to use word embeddings as input to a neural network.
- Know how to compare classifier performance on a test set.

### Overview

We first format the data so it can be input to the neural network. Then we see how to construct a neural network with PyTorch, and how to train and test it. Finally, we introduce pretrained embeddings to the model.


# 1. Loading the Data

This section contains the same loader code as earlier labs, which loads the TweetEval data (the output below shows the `emotion` subset being loaded).


In [1]:
import os
import sys

path = os.path.abspath(os.path.join(".."))

if path not in sys.path:
    sys.path.append(path)

In [2]:
from Modules.neural_network_classifiers.feedforward_tweet_eval import (
    prepare_data,
)

sequence_length = 40
batch_size = 64

(
    tokenizer,
    train_loader,
    test_loader,
    dev_loader,
    num_embeddings,
    output_dim,
) = prepare_data(sequence_length=sequence_length, batch_size=batch_size)

Found cached dataset tweet_eval (/Users/qr23940/git/dialogue_and_narrative/notebooks/data_cache/tweet_eval/emotion/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


# 2. Preparing the Data

Now we put the dataset into a suitable format for a PyTorch NN classifier.


In [3]:
%load_ext autoreload
%autoreload 2

As inputs to the scikit-learn classifiers in week 3, we used `CountVectorizer` to extract a single vector representation for a _whole document_.
However, one motivation for using a neural network is that it can process the individual words in the sentence in order, and learn how to combine information from different tokens automatically. This means we don't need to convert the document to a fixed-length vector during the preprocessing phase.
Instead, as input to our neural network, we will pass in a sequence of tokens, where each token is represented by its _input_id_, which is its index into the vocabulary.

The first step is to compute the vocabulary. This can be done in various ways, but here we will stick with the familiar CountVectorizer method:
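A minimal sketch of this step, assuming scikit-learn is installed. The three toy texts are hypothetical stand-ins for the training tweets, and the default `CountVectorizer` settings (lowercasing, dropping single-character tokens) are used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the training tweets (hypothetical examples).
train_texts = [
    "I love this song",
    "this is so sad",
    "love love love",
]

vectorizer = CountVectorizer()
vectorizer.fit(train_texts)

# vocabulary_ maps each token to its index in the vocabulary.
vocabulary = vectorizer.vocabulary_
print(sorted(vocabulary.items(), key=lambda item: item[1]))
```

Note that the default tokenizer drops the single-character token "I", so the vocabulary here has six entries.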


Now, we have to map the tokens to their IDs -- their indexes in the vocabulary.
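The mapping could look something like the sketch below. The vocabulary, the `encode` helper, whitespace tokenization, and the choice to shift every index by one (so that ID 0 stays free for padding) are all assumptions made for illustration, not necessarily what the lab's own code does.

```python
# Hypothetical vocabulary of the kind CountVectorizer produces (token -> index).
vocabulary = {"is": 0, "love": 1, "sad": 2, "so": 3, "song": 4, "this": 5}


def encode(text, vocabulary):
    """Map a text to a list of input IDs, skipping out-of-vocabulary tokens."""
    tokens = text.lower().split()  # simplification: whitespace tokenization
    # Shift each index by one so that ID 0 is reserved for padding (an assumption).
    return [vocabulary[token] + 1 for token in tokens if token in vocabulary]


input_ids = encode("This song is so sad", vocabulary)
print(input_ids)  # [6, 5, 1, 4, 3]
```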


Our neural network's input layer has a fixed size, so we need to somehow make all of our documents have the same number of tokens. We can do this by setting a fixed sequence length, then _padding_ the short documents with a special token. Any documents that exceed the length will be truncated. Let's plot a histogram to understand the length distribution of the texts.


The code cell below is intended to pad any documents that are too short and truncate any that are too long, so that we obtain a set of sequences of equal length.

**TODO 2.1:** Complete the padding code below to insert 0s at the start of any sequences that are too short, and to truncate any sequences that are too long.
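One possible shape for this logic is sketched below in plain Python. The helper name `pad_or_truncate` is hypothetical, and keeping the *first* tokens when truncating is an assumption; the lab's own cell may work on arrays instead of lists.

```python
def pad_or_truncate(input_ids, sequence_length, pad_id=0):
    """Left-pad short sequences with pad_id and cut long ones to sequence_length."""
    if len(input_ids) >= sequence_length:
        # Too long (or exactly right): keep only the first sequence_length IDs.
        return input_ids[:sequence_length]
    # Too short: insert pad_id at the start until the length matches.
    padding = [pad_id] * (sequence_length - len(input_ids))
    return padding + input_ids


print(pad_or_truncate([6, 5, 1], sequence_length=5))              # [0, 0, 6, 5, 1]
print(pad_or_truncate([6, 5, 1, 4, 3, 2, 7], sequence_length=5))  # [6, 5, 1, 4, 3]
```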


We now have our data in the right format. When training, the neural network will process the data in randomly-chosen mini-batches, rather than all at once.
To enable this, we wrap our dataset in a DataLoader, which allows the network to select batches of data:

DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
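As a rough illustration, the sketch below wraps randomly generated toy tensors (hypothetical data, not the tweets) in a `TensorDataset` and iterates over shuffled mini-batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 documents of 5 input IDs each, with 4 possible labels.
input_ids = torch.randint(low=0, high=20, size=(10, 5))
labels = torch.randint(low=0, high=4, size=(10,))

dataset = TensorDataset(input_ids, labels)
# shuffle=True draws the examples in a new random order every epoch.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for batch_inputs, batch_labels in loader:
    print(batch_inputs.shape, batch_labels.shape)
```

With 10 examples and a batch size of 4, the loader yields two full batches and one final batch of 2.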


# 3. Constructing the Network


We will build a NN with three different layers for sentiment classification.

### Embedding layer

In the embedding layer, the network learns its own embedding vector for each index in the vocabulary, with a given embedding dimension.
The module `nn.Embedding()` creates a simple lookup table that stores one embedding for each word in a dictionary of fixed size.
This module is often used to store word embeddings and retrieve them using indices.
The module's input is a tensor of indices, and the output is the corresponding word embeddings.

[Documentation for Embedding Class](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)
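A small sketch of the lookup, using made-up sizes. Passing `padding_idx=0` is an optional choice assumed here: it keeps the vector for the padding ID fixed at zero.

```python
import torch
from torch import nn

# 20 vocabulary entries, each mapped to a 25-dimensional vector; ID 0 is padding.
embedding = nn.Embedding(num_embeddings=20, embedding_dim=25, padding_idx=0)

# Two padded documents of five input IDs each: shape (batch, sequence_length).
batch = torch.tensor([[0, 0, 6, 5, 1], [3, 4, 2, 7, 9]])
embedded = embedding(batch)

print(embedded.shape)  # torch.Size([2, 5, 25])
```

Each input ID is replaced by its embedding vector, so the output gains one extra dimension of size `embedding_dim`.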

### Fully-connected layer

Fully-connected layers in a neural network are layers in which every unit receives all of the outputs of the previous layer as input.
Here we will use fully-connected layers for the hidden layer and output layer. In PyTorch this kind of layer is implemented by the `Linear` class. The name 'linear' is used because the layer itself only computes a linear (affine) transformation; the nonlinear part is provided by the activation functions, which act like another layer in the network.

https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

### Activation functions

In PyTorch, the activation function is not included in the `Linear` class (or in other kinds of neural network layer). An example of an activation function is ReLU, which is commonly used in the hidden layers of a neural network:

https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

In PyTorch, we construct a neural network by connecting the output of each component to the input of the next, thereby creating a computation graph.
To complete a fully-connected hidden layer, we connect the output of a `Linear` layer to the input of a ReLU activation function, thereby creating a nonlinear function.
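For instance, a single hidden layer could be sketched as follows, with arbitrary sizes chosen for illustration (8 inputs, 4 hidden units). `nn.Sequential` is just one convenient way to chain the two components.

```python
import torch
from torch import nn

# A Linear layer followed by ReLU forms one nonlinear hidden layer.
hidden_layer = nn.Sequential(
    nn.Linear(in_features=8, out_features=4),
    nn.ReLU(),
)

x = torch.randn(3, 8)  # a batch of 3 examples with 8 features each
h = hidden_layer(x)

print(h.shape)         # torch.Size([3, 4])
print((h >= 0).all())  # ReLU clips negative values to zero
```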

**TODO 3.1** Complete the constructor for a NN with three layers by adding the missing dimensions.

**TODO 3.2** Complete the forward function that maps the input data to an output by adding the missing line.


**TODO 3.3** Create a NN with the FFTextClassifier class we wrote.

**Hint:** `model = FFTextClassifier(...)`
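To show how the three layers fit together, here is one possible sketch. The class name `SketchTextClassifier` is hypothetical, and flattening all token embeddings into one long vector before the hidden layer is an assumption; the class you complete in the lab may differ in its details.

```python
import torch
from torch import nn


class SketchTextClassifier(nn.Module):
    """Embedding layer -> fully-connected hidden layer with ReLU -> output layer."""

    def __init__(self, num_embeddings, embedding_dim, hidden_dim, output_dim, sequence_length):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        # The embeddings of all tokens are concatenated, hence the input size below.
        self.hidden = nn.Linear(sequence_length * embedding_dim, hidden_dim)
        self.activation = nn.ReLU()
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids)   # (batch, sequence_length, embedding_dim)
        flat = embedded.flatten(start_dim=1)   # (batch, sequence_length * embedding_dim)
        hidden = self.activation(self.hidden(flat))
        return self.output(hidden)             # unnormalised class scores (logits)


model = SketchTextClassifier(
    num_embeddings=100, embedding_dim=25, hidden_dim=32, output_dim=4, sequence_length=40
)
logits = model(torch.randint(low=0, high=100, size=(8, 40)))
print(logits.shape)  # torch.Size([8, 4])
```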


After designing our network, we need to create a training function to calculate the loss for each input and perform backpropagation to optimise the network.
During training, the weights of all the layers will be updated.

We build a training function to train the NN over a fixed number of epochs (an epoch is one pass over the whole training dataset).
The function also prints the performance on both the training set and the development/validation set after each epoch. There are high-level wrapper libraries that do this for you, but when learning about neural networks, it's useful to see what's going on inside.
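The core of such a function might look like the sketch below. It covers only the training side (the validation part is left to the TODO), and the function name, toy data and tiny model are hypothetical choices made so that the example runs on its own.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_loader, loss_fn, optimizer, num_epochs):
    """Train for num_epochs and return the mean training loss of each epoch."""
    epoch_losses = []
    for epoch in range(num_epochs):
        model.train()
        total_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()           # clear gradients from the previous step
            logits = model(inputs)          # forward pass
            loss = loss_fn(logits, labels)
            loss.backward()                 # backpropagation
            optimizer.step()                # update the weights
            total_loss += loss.item() * inputs.shape[0]
        epoch_losses.append(total_loss / len(train_loader.dataset))
    return epoch_losses


# Hypothetical toy problem: the label is 1 when the first feature is positive.
torch.manual_seed(0)
features = torch.randn(64, 4)
labels = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = optim.Adam(model.parameters(), lr=0.02)
losses = train(model, loader, nn.CrossEntropyLoss(), optimizer, num_epochs=20)
print(f"first epoch loss = {losses[0]:.3f}, last epoch loss = {losses[-1]:.3f}")
```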

**TODO 3.4** Complete the code below to compute the validation accuracy and loss after each training epoch.


The last step before we start training is defining the loss function and optimizer.

Here we use cross-entropy loss and the Adam optimizer, which often reaches a good solution in fewer iterations than plain SGD.
The module `nn.CrossEntropyLoss()` combines `LogSoftmax` and `NLLLoss` in one single class so that we don't have to implement the softmax layer within the forward() method.

Cross Entropy Loss: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

Optimization: https://pytorch.org/docs/stable/optim.html
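The equivalence between `nn.CrossEntropyLoss()` and `LogSoftmax` followed by `NLLLoss` can be checked on a few random logits. The values below are arbitrary toy data.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(3, 4)        # raw model outputs for 3 examples and 4 classes
labels = torch.tensor([0, 2, 3])  # the correct class of each example

# One step: cross-entropy applied directly to the logits.
combined = nn.CrossEntropyLoss()(logits, labels)

# Two steps: log-softmax over the class dimension, then negative log-likelihood.
two_step = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

print(torch.allclose(combined, two_step))  # True
```

This is why the network's forward pass can return raw logits without applying a softmax itself.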

**TODO 3.4** Finally, train the network for 10 epochs!


In [4]:
from torch import optim
from torch import nn
from Modules.neural_network_classifiers.feedforward import (
    FeedforwardTextClassifier,
)


loss_fn = nn.CrossEntropyLoss()
embedding_dim = 25
hidden_dim = 32

model = FeedforwardTextClassifier(
    loss_fn=loss_fn,
    num_embeddings=num_embeddings,
    embedding_dim=embedding_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    sequence_length=sequence_length,
)

num_epochs = 10
learning_rate = 0.0005
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train_(
    num_epochs=num_epochs,
    train_loader=train_loader,
    dev_loader=dev_loader,
    optimizer=optimizer,
)

Epoch = 0
Training loss = 1.281
Training accuracy = 42.2 %
Validation loss = 1.272
Validation accuracy = 43.3 %
Epoch = 1
Training loss = 1.201
Training accuracy = 47.1 %
Validation loss = 1.251
Validation accuracy = 45.2 %
Epoch = 2
Training loss = 1.153
Training accuracy = 48.8 %
Validation loss = 1.284
Validation accuracy = 39.3 %
Epoch = 3
Training loss = 1.105
Training accuracy = 52.3 %
Validation loss = 1.329
Validation accuracy = 35.8 %
Epoch = 4
Training loss = 1.069
Training accuracy = 53.5 %
Validation loss = 1.310
Validation accuracy = 39.3 %
Epoch = 5
Training loss = 1.028
Training accuracy = 55.8 %
Validation loss = 1.318
Validation accuracy = 39.3 %
Epoch = 6
Training loss = 0.982
Training accuracy = 59.2 %
Validation loss = 1.369
Validation accuracy = 37.4 %
Epoch = 7
Training loss = 0.951
Training accuracy = 60.3 %
Validation loss = 1.397
Validation accuracy = 40.6 %
Epoch = 8
Training loss = 0.915
Training accuracy = 62.2 %
Validation loss = 1.397
Validation accuracy =

**TODO 3.5:** Evaluate the model on test set using the function below. Complete the code to count the correct classifications.
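Counting correct classifications usually comes down to comparing the `argmax` of the logits with the labels. The sketch below uses a hypothetical `evaluate` helper and a hand-set toy model so that the result is easy to check by hand; it is not the lab's own `test_` method.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, loss_fn):
    """Return (mean loss, accuracy in %) of the model over all examples in loader."""
    model.eval()
    total_loss, num_correct, num_examples = 0.0, 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            logits = model(inputs)
            total_loss += loss_fn(logits, labels).item() * inputs.shape[0]
            predictions = logits.argmax(dim=1)  # most likely class per example
            num_correct += (predictions == labels).sum().item()
            num_examples += inputs.shape[0]
    return total_loss / num_examples, 100.0 * num_correct / num_examples


# Toy model whose logits equal its inputs (identity weights, zero bias).
model = nn.Linear(2, 2)
with torch.no_grad():
    model.weight.copy_(torch.eye(2))
    model.bias.zero_()

inputs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
labels = torch.tensor([0, 1, 1, 1])  # the third example will be misclassified
loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)

loss, accuracy = evaluate(model, loader, nn.CrossEntropyLoss())
print(f"accuracy = {accuracy:.1f} %")  # accuracy = 75.0 %
```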


In [5]:
test_loss, test_accuracy = model.test_(loader=test_loader)

print(f"Test loss = {test_loss:.3f}")
print(f"Test accuracy = {test_accuracy:.1f} %")

Test loss = 1.382
Test accuracy = 40.8 %


# 4. Pretrained Embeddings

Now let's use pretrained word embeddings as inputs instead of learning them from scratch during training.
Here, we will use a pretrained embedding matrix to initialise the embedding layer, which will then be updated during training.
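One way to initialise an embedding layer from an existing matrix is `nn.Embedding.from_pretrained`. In this sketch the "pretrained" matrix is random toy data (an assumption for illustration; in the lab it would come from real pretrained vectors), and `freeze=False` keeps the weights trainable so they are updated during training.

```python
import torch
from torch import nn

# Hypothetical pretrained matrix: one 25-dimensional row per vocabulary entry.
pretrained = torch.randn(20, 25)

# freeze=False keeps the embeddings trainable, so they are fine-tuned with the network.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

print(embedding.weight.requires_grad)                               # True
print(torch.equal(embedding(torch.tensor([3])), pretrained[3:4]))   # True
```

With the default `freeze=True`, the embeddings would stay fixed and only the later layers would be trained.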

The class below extends the FFTextClassifier class. This means that it inherits all of its functionality, but we now override the constructor (the `__init__` method).
This way, we don't need to define the forward function again, as it will be the same as before.

**TODO 4.1** As before, complete the arguments below to set the dimensions of the neural network layers.


**TODO 4.2** Using the above class, construct, train and test the classifier with pretrained embeddings. You will need to create a new optimizer object.


In [6]:
from torch import optim
from Modules.neural_network_classifiers.feedforward_embeddings import (
    FeedforwardTextClassifierEmbeddings,
)

model = FeedforwardTextClassifierEmbeddings(
    loss_fn=loss_fn,
    num_embeddings=num_embeddings,
    embedding_dim=embedding_dim,
    hidden_dim=hidden_dim,
    output_dim=output_dim,
    sequence_length=sequence_length,
    tokenizer=tokenizer,
)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train_(
    num_epochs=num_epochs,
    train_loader=train_loader,
    dev_loader=dev_loader,
    optimizer=optimizer,
)



Epoch = 0
Training loss = 1.285
Training accuracy = 41.7 %
Validation loss = 1.244
Validation accuracy = 42.8 %
Epoch = 1
Training loss = 1.210
Training accuracy = 45.4 %
Validation loss = 1.208
Validation accuracy = 42.8 %
Epoch = 2
Training loss = 1.163
Training accuracy = 49.6 %
Validation loss = 1.215
Validation accuracy = 43.3 %
Epoch = 3
Training loss = 1.133
Training accuracy = 50.1 %
Validation loss = 1.215
Validation accuracy = 45.7 %
Epoch = 4
Training loss = 1.084
Training accuracy = 53.5 %
Validation loss = 1.214
Validation accuracy = 42.0 %
Epoch = 5
Training loss = 1.029
Training accuracy = 57.4 %
Validation loss = 1.197
Validation accuracy = 44.1 %
Epoch = 6
Training loss = 0.985
Training accuracy = 60.1 %
Validation loss = 1.205
Validation accuracy = 45.5 %
Epoch = 7
Training loss = 0.935
Training accuracy = 62.1 %
Validation loss = 1.228
Validation accuracy = 45.7 %
Epoch = 8
Training loss = 0.887
Training accuracy = 64.7 %
Validation loss = 1.223
Validation accuracy =

In [7]:
test_loss, test_accuracy = model.test_(loader=test_loader)

print(f"Test loss = {test_loss:.3f}")
print(f"Test accuracy = {test_accuracy:.1f} %")

Test loss = 1.325
Test accuracy = 47.6 %
