Welcome to the exercise sheet about Recurrent Neural Networks. In this exercise sheet, we will take a closer look at RNNs, LSTMs and other variations.


The main task is to implement the same models as in the lecture and run the classification on the MNIST dataset.

## Imports


Let's first import all the dependencies we will need for this exercise.

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
import torch.optim as optim

## Loading the Dataset and making it iterable


In [None]:

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())


batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
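
The epoch count above follows from simple arithmetic: with a batch size of 100, one pass over the training split takes 600 iterations, so 3000 iterations correspond to 5 epochs. A minimal sketch of that calculation, with the dataset size hard-coded as an assumption instead of calling `len(train_dataset)`:

```python
# Assumption: the MNIST training split contains 60,000 images.
train_size = 60_000
batch_size = 100
n_iters = 3000

iters_per_epoch = train_size / batch_size    # batches needed for one full pass
num_epochs = int(n_iters / iters_per_epoch)  # whole epochs that fit into n_iters

print(iters_per_epoch, num_epochs)
```

Because of the `int(...)` truncation, choosing values where `n_iters` is not a multiple of the iterations per epoch results in fewer than `n_iters` training steps.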

### Exercise 1.1: Creating the model classes

Implement the RNN and the LSTM models from the lecture, starting with one hidden layer and a tanh activation function for the RNN. Hint: PyTorch provides built-in RNN and LSTM modules (`nn.RNN` and `nn.LSTM`).

In [None]:
# The RNN
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Building your RNN (Input size, Hidden size, Number of layers)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

        # Readout layer
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden state with zeros
        # h0: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Define the forward steps
        out, _ = self.rnn(x, h0)  # Pass input through the RNN
        out = out[:, -1, :]  # Take the output of the last time step

        # Pass through the readout layer
        out = self.fc(out)

        return out

# The LSTM
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Building your LSTM (Input size, Hidden size, Number of layers)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

        # Readout layer
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden state and cell state with zeros
        # h0: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        # c0: (num_layers, batch_size, hidden_size) — cell state initialization
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Define the forward steps
        out, _ = self.lstm(x, (h0, c0))  # Pass input through the LSTM
        out = out[:, -1, :]  # Take the output of the last time step

        # Pass through the readout layer
        out = self.fc(out)

        return out
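
Both models read out the last time step via `out[:, -1, :]`. The following sketch illustrates which shapes `nn.LSTM` returns with `batch_first=True` and checks that, for a unidirectional LSTM, this last step matches the final hidden state of the top layer. The sizes are illustrative assumptions and random data stands in for MNIST.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: 4 sequences, 28 time steps, 28 features per step.
batch, seq_len, n_features, hidden = 4, 28, 28, 16

lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
x = torch.randn(batch, seq_len, n_features)

# When the initial states are omitted, PyTorch uses zeros for h0 and c0.
out, (h_n, c_n) = lstm(x)

print(out.shape)  # (batch, seq_len, hidden): top-layer output at every step
print(h_n.shape)  # (num_layers, batch, hidden): final hidden state per layer

# For a unidirectional LSTM, the last time step of `out` equals the
# final hidden state of the top layer.
last_step = out[:, -1, :]
same = torch.allclose(last_step, h_n[-1])
print(same)
```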

### Exercise 1.2: Instantiations

In [None]:
#Instantiate the model classes
model_rnn = RNNModel(input_size=28, hidden_size=128, num_layers=2, output_size=10)
model_lstm = LSTMModel(input_size=28, hidden_size=128, num_layers=2, output_size=10)

# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_rnn.to(device)
model_lstm.to(device)

# Instantiate the Loss function (Cross Entropy Loss for classification)
loss_fn = nn.CrossEntropyLoss()

# Instantiate the Optimizer (Adam optimizer for efficiency)
optimizer_rnn = optim.Adam(model_rnn.parameters(), lr=0.001)
optimizer_lstm = optim.Adam(model_lstm.parameters(), lr=0.001)

# Note: learning_rate is not passed to the Adam optimizers above, which use lr=0.001
learning_rate = 0.1


### Exercise 1.3: Training the models

Below, you find the training steps for the RNN model. Implement the training for the LSTM model accordingly.
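
The training loops treat each 28x28 image as a sequence of 28 time steps (rows) with 28 features (pixel values) each. A small sketch of that reshaping, using a random tensor as a stand-in for one batch from the `DataLoader`:

```python
import torch

# Stand-in for one MNIST batch: (batch, channels, height, width).
images = torch.rand(100, 1, 28, 28)

seq_dim, input_dim = 28, 28
sequences = images.view(-1, seq_dim, input_dim)

print(sequences.shape)  # (batch, seq_dim, input_dim)

# Time step t of sample n is row t of image n.
rows_match = torch.equal(sequences[0, 5], images[0, 0, 5])
print(rows_match)
```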

In [None]:
# RNN Training
# Number of steps to unroll
seq_dim = 28 #time steps
input_dim = 28 #features at each time step
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape each image to (seq_dim, input_dim) and move the batch to the model's device
        images = images.view(-1, seq_dim, input_dim).to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer_rnn.zero_grad() #clear the gradients

        # Forward pass to get output/logits
        outputs = model_rnn(images)

        # Calculate Loss
        loss = loss_fn(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer_rnn.step()

        iter += 1

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Reshape and move the batch to the model's device
                images = images.view(-1, seq_dim, input_dim).to(device)

                # Forward pass only to get logits/output
                outputs = model_rnn(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()

            accuracy = 100 * correct / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}. Epochs: {}'.format(iter, loss.item(), accuracy, num_epochs))

Iteration: 500. Loss: 0.08315066993236542. Accuracy: 96.8499984741211. Epochs: 5
Iteration: 1000. Loss: 0.11296291649341583. Accuracy: 97.55000305175781. Epochs: 5
Iteration: 1500. Loss: 0.13575473427772522. Accuracy: 97.06999969482422. Epochs: 5
Iteration: 2000. Loss: 0.14055731892585754. Accuracy: 96.95999908447266. Epochs: 5
Iteration: 2500. Loss: 0.16083259880542755. Accuracy: 96.13999938964844. Epochs: 5
Iteration: 3000. Loss: 0.053878821432590485. Accuracy: 96.91000366210938. Epochs: 5


In [None]:
# LSTM Training
# Number of steps to unroll
seq_dim = 28  # Sequence dimension (number of timesteps, corresponding to 28 rows in MNIST)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape each image to (seq_dim, input_dim) and move the batch to the model's device
        images = images.view(-1, seq_dim, input_dim).to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer_lstm.zero_grad()

        # Forward pass to get output/logits
        outputs = model_lstm(images)

        # Calculate Loss
        loss = loss_fn(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer_lstm.step()

        iter += 1  # Not reset after the RNN cell, so counting continues from 3000

        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0

            # Iterate through test dataset
            for images, labels in test_loader:
                # Reshape and move the batch to the model's device
                images = images.view(-1, seq_dim, input_dim).to(device)

                # Forward pass only to get logits/output
                outputs = model_lstm(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()

            # Accuracy as a percentage
            accuracy = 100 * correct / total

            # Print Loss and Accuracy
            print(f'Iteration: {iter}. Loss: {loss.item()}. Accuracy: {accuracy.item()}%. Epochs: {epoch + 1}/{num_epochs}')


Iteration: 3500. Loss: 0.12816676497459412. Accuracy: 94.7300033569336%. Epochs: 1/5
Iteration: 4000. Loss: 0.19394533336162567. Accuracy: 96.94000244140625%. Epochs: 2/5
Iteration: 4500. Loss: 0.14641742408275604. Accuracy: 97.72000122070312%. Epochs: 3/5
Iteration: 5000. Loss: 0.047777943313121796. Accuracy: 97.69000244140625%. Epochs: 4/5
Iteration: 5500. Loss: 0.009313635528087616. Accuracy: 98.05999755859375%. Epochs: 5/5
Iteration: 6000. Loss: 0.04128799960017204. Accuracy: 98.33000183105469%. Epochs: 5/5


## Exercise 2: Classification
We want to compare different model configurations with each other.

For the RNN:
* 1, 2, 3 or 4 hidden layers
* tanh and ReLU activation function
* Additional fully connected layer

For the LSTM:
* 1, 2 or 3 hidden layers
* Additional fully connected layer
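
Most of the listed variations map directly onto constructor arguments of the built-in modules: `num_layers` sets the number of stacked hidden layers, and `nn.RNN` accepts `nonlinearity='tanh'` or `nonlinearity='relu'`. The sketch below enumerates the RNN variants; the configuration list and the hidden size of 32 are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical configuration list covering the RNN variants above.
rnn_configs = [
    {"num_layers": n, "nonlinearity": act}
    for n in (1, 2, 3, 4)
    for act in ("tanh", "relu")
]

x = torch.randn(8, 28, 28)  # (batch, time steps, features)

shapes = []
for cfg in rnn_configs:
    rnn = nn.RNN(input_size=28, hidden_size=32, batch_first=True, **cfg)
    out, h_n = rnn(x)
    # h_n holds one final hidden state per layer.
    shapes.append(tuple(h_n.shape))

print(len(rnn_configs))
print(shapes[0], shapes[-1])
```

`nn.LSTM` takes `num_layers` in the same way but has no `nonlinearity` argument. An additional fully connected layer would be added in the model class itself, between the recurrent module and the readout layer.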


### Exercise 2.1:
Change the above implementation to allow for an efficient way to compare the final classification accuracies in one cell (i.e. define training methods and add model parameters).
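
One possible starting point is to move the accuracy computation into a function that works for any model. The sketch below is an assumption about how this could look, not the reference solution: `evaluate` and `TinyRNN` are hypothetical names, and random data replaces MNIST so the snippet runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device, seq_dim=28, input_dim=28):
    """Return the classification accuracy in percent over `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            images = images.view(-1, seq_dim, input_dim).to(device)
            labels = labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


class TinyRNN(nn.Module):
    """Hypothetical stand-in model used only to exercise `evaluate`."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(28, 16, batch_first=True)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.fc(out[:, -1, :])


# Random data in place of MNIST, so the accuracy value itself is meaningless.
torch.manual_seed(0)
fake = TensorDataset(torch.rand(50, 1, 28, 28), torch.randint(0, 10, (50,)))
loader = DataLoader(fake, batch_size=10)

acc = evaluate(TinyRNN(), loader, torch.device("cpu"))
print(f"accuracy: {acc:.1f}%")
```

A matching training function could take the model, optimizer, and number of iterations as parameters and call `evaluate` once at the end, so that all configurations can be compared in a single loop.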


In [None]:
## your code goes here
## type your answer as a comment

### Exercise 2.2:
Do your results differ from the results presented in the lecture? If so, why?

In [None]:
## Your answer goes here

## Exercise 3:

So far, we always trained for 3000 iterations with a batch size of 100 and a learning rate of 0.1. The classification accuracies might improve if we change these values. Vary them systematically and find a better combination (if possible).
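
One way to vary the values systematically is a grid over all combinations. The candidate values below are illustrative assumptions, not recommendations from the lecture; the sketch only builds the grid and derives the epoch count for each entry, without training anything.

```python
from itertools import product

# Illustrative candidate values (assumptions).
n_iters_options = [3000, 6000]
batch_sizes = [50, 100, 200]
learning_rates = [0.001, 0.01, 0.1]

train_size = 60_000  # size of the MNIST training split

grid = []
for n_iters, batch_size, lr in product(n_iters_options, batch_sizes, learning_rates):
    # Same epoch formula as in the data-loading cell.
    num_epochs = int(n_iters / (train_size / batch_size))
    grid.append({"n_iters": n_iters, "batch_size": batch_size,
                 "lr": lr, "num_epochs": num_epochs})

print(len(grid))
print(grid[0])
```

Each entry can then be passed to a training routine, recording the final test accuracy per combination.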

In [None]:
## your code goes here

## Exercise 4:
1. Why might the LSTM result in better classification accuracies? What are the advantages and disadvantages of using an LSTM in this task, compared to an RNN?
2. We addressed other variants of RNNs in the lecture. Which of them might be suitable for this classification task and why? (GRU, bidirectional RNN, Recursive Neural Network, Encoder-Decoder RNN)
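
For experimenting with two of these variants, PyTorch offers `nn.GRU` and the `bidirectional=True` flag on its recurrent modules. The sketch below only illustrates the resulting tensor shapes under assumed sizes (hidden size 32, random input); it does not answer which variant is preferable.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 28, 28)  # (batch, time steps, features)

# GRU: same interface as nn.RNN, returns outputs and the final hidden state.
gru = nn.GRU(input_size=28, hidden_size=32, batch_first=True)
out_gru, h_gru = gru(x)
print(out_gru.shape)  # (8, 28, 32)

# Bidirectional RNN: forward and backward outputs are concatenated.
bi_rnn = nn.RNN(input_size=28, hidden_size=32, batch_first=True, bidirectional=True)
out_bi, h_bi = bi_rnn(x)
print(out_bi.shape)  # (8, 28, 64)
print(h_bi.shape)    # (2, 8, 32): one final state per direction

# A readout layer after a bidirectional RNN needs 2 * hidden_size inputs.
readout = nn.Linear(2 * 32, 10)
# Combine the final state of each direction; out_bi[:, -1] alone would contain
# a backward state that has processed only the last row.
features = torch.cat([h_bi[0], h_bi[1]], dim=1)
logits = readout(features)
print(logits.shape)  # (8, 10)
```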