<a href="https://colab.research.google.com/github/zxcej/6771_final/blob/master/2023_Lab6_Ex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6: Word Embeddings and RNNs

This lab covers the following topics:
- Word encodings and embeddings.
- Recurrent neural networks (RNNs).
- Long short-term memory (LSTM).





## Exercise 1: Word Embeddings

### Exercise 1.1 
Consider the limited vocabulary list below:

In [None]:
vocab = ["the", "quick", "brown", "sly", "fox", "jumped", "over", "a", "lazy", "dog", "and","found","lion"]
print(len(vocab))

13


Write a function to create **one-hot encodings** of the words. The function maps each word to a vector in which the entry at the word's position in the vocab list is 1 and all other entries are 0.

For example, "quick" should map to a one-dimensional torch tensor with entries [0, 1, 0, ..., 0].

Create an extra category for words that are not in the vocabulary.
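As an illustration of the idea (not the intended solution), here is a minimal sketch on a toy vocabulary. The names `toy_vocab` and `toy_one_hot` are hypothetical, and the choice to place the unknown-word category at the last index is an assumption.

```python
import torch
import torch.nn.functional as F

toy_vocab = ["red", "green", "blue"]  # hypothetical vocabulary, not the lab's

def toy_one_hot(words, vocab):
    # Accept either a single word or a list of words.
    if isinstance(words, str):
        words = [words]
    # Words missing from the vocabulary map to the extra index len(vocab).
    index = {w: i for i, w in enumerate(vocab)}
    ids = torch.tensor([index.get(w, len(vocab)) for w in words])
    # Result has shape W x (V+1), one row per word.
    return F.one_hot(ids, num_classes=len(vocab) + 1).float()

encoded = toy_one_hot(["green", "purple"], toy_vocab)
print(encoded)
```

Here "green" gets a 1 at index 1, while the out-of-vocabulary word "purple" gets a 1 in the extra last position.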

In [None]:
def one_hot_embedding(token, vocab):
  """
  `token` is either a single word or a list of W words.
  The output should be a torch tensor of size W x (V+1) that gives the one-hot encoding of all W tokens.
  """
  
  return vector



  

### Exercise 1.2

Create an `nn.Module` that:

1. Takes in a single sentence (a Python list).
2. Finds the one-hot encoding of each word using the function created in Exercise 1.1.
3. Finds the $D$-dimensional "word embedding" of each word using the `EmbeddingTable`.
4. Returns the average of the word embeddings as a torch vector of size $D$.
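The link between steps 2 and 3 is that multiplying a one-hot row by the table selects a single row of the table. A small sketch with made-up sizes (`V`, `D`, and `table` are stand-ins, not the lab's model) may help to see this:

```python
import torch

torch.manual_seed(0)
V, D = 4, 3                      # hypothetical sizes: 4 categories, 3-dim embeddings
table = torch.randn(V, D)        # stand-in for an embedding table
ids = torch.tensor([2, 0, 2])    # indices of three words in a sentence
one_hot = torch.eye(V)[ids]      # one-hot rows, shape 3 x V

via_matmul = one_hot @ table     # shape 3 x D
via_lookup = table[ids]          # the same rows, picked directly by index
sentence_vector = via_matmul.mean(dim=0)  # average over the words, shape D

print(torch.allclose(via_matmul, via_lookup), sentence_vector.shape)
```

The matrix product is the form the exercise asks for; direct indexing gives the same result and is what lookup-based embedding layers rely on.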
 

In [None]:
import torch
import torch.nn as nn

class MyWordEmbeddingBag(nn.Module):
    def __init__(self, dim):
        super(MyWordEmbeddingBag, self).__init__()

        self.EmbeddingTable = nn.Parameter(torch.randn(len(vocab)+1,dim))

    def forward(self, inputList):
        # Your answer here 
        return vector

### Exercise 1.3

Instantiate the model with vectors of size $D = 100$ and forward pass the following sentences through your module:

In [None]:
sent1 = ["the", "quick", "brown"]
sent2 = ["the", "sly", "fox", "jumped"]
sent3 = ["the", "dog", "found","a","lion"]

#Instantiate model
my_model = 

#forward pass sentences
assert(len(my_model(sent1))==100)
assert(len(my_model(sent2))==100)
assert(len(my_model(sent3))==100)

### Exercise 1.4

Compute the Euclidean distance between "fox" and "dog" using the randomly initialized embedding table in your model above.

**Note**: Because the table is randomly initialized, the distances are also random in this case. A trained model, however, will often place related words closer together, depending on the training objective.
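For reference, the Euclidean distance between two vectors is the L2 norm of their difference. The sketch below uses a hypothetical random table and arbitrary row indices rather than the model's `EmbeddingTable`:

```python
import torch

torch.manual_seed(0)
table = torch.randn(5, 100)   # hypothetical 5-word table with D = 100
a, b = table[1], table[3]     # embeddings of two arbitrary words

dist_norm = torch.linalg.norm(a - b)             # L2 norm of the difference
dist_manual = torch.sqrt(((a - b) ** 2).sum())   # the same quantity written out

print(dist_norm.item(), dist_manual.item())
```

For independent standard normal entries the squared distance has expected value $2D$, so distances near $\sqrt{200} \approx 14.1$ are typical for an untrained table of this size.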

## Exercise 2: Recurrent Neural Networks

We will experiment with recurrent networks using the MNIST dataset.

In [None]:
import torchvision
import torch
import torchvision.transforms as transforms

from torch.utils.data import Subset

### Hotfix for very recent MNIST download issue https://github.com/pytorch/vision/issues/1938 
from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
###

dataset = torchvision.datasets.MNIST('./', download=True, transform=transforms.Compose([transforms.ToTensor()]), train=True)
train_indices = torch.arange(0, 10000)
train_dataset = Subset(dataset, train_indices)

dataset=torchvision.datasets.MNIST('./', download=True, transform=transforms.Compose([transforms.ToTensor()]), train=False)
test_indices = torch.arange(0, 10000)
test_dataset = Subset(dataset, test_indices)



In [None]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64,
                                          shuffle=True, num_workers=0)

test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=16,
                                          shuffle=False, num_workers=0)

### Exercise 2.1

Consider the following script (modified from https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/recurrent_neural_network/main.py), which trains an RNN on the MNIST data.

Here each row of the image serves as the input for one step of the RNN (the script reshapes each batch to `(batch, 28, 28)` with `batch_first=True`). After 28 steps, the model applies a linear layer followed by a cross-entropy loss. We will use this to familiarize ourselves with the `nn.RNN` and `nn.LSTM` modules.
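To make the shapes concrete, here is a small sketch on a fake batch (random values, not real MNIST data). It uses the same reshape and `nn.RNN` settings as the script, and shows that step `t` of the sequence is row `t` of the image.

```python
import torch
import torch.nn as nn

batch = torch.rand(4, 1, 28, 28)   # fake batch: (batch, channel, height, width)
seq = batch.reshape(-1, 28, 28)    # (batch, 28 steps, 28 features per step)

rnn = nn.RNN(input_size=28, hidden_size=128, num_layers=2, batch_first=True)
out, h_n = rnn(seq)                # the initial hidden state defaults to zeros

print(out.shape)   # top-layer hidden state at every step
print(h_n.shape)   # final hidden state of each layer

# Step 5 of the first sequence is row 5 of the first image.
same_row = torch.equal(seq[0, 5], batch[0, 0, 5])
print(same_row)
```

For a unidirectional RNN, `out[:, -1, :]` matches `h_n[-1]`, which is why the script can classify from the last time step of `out`.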

First, run the cell below.



In [None]:
import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 2
num_classes = 10
batch_size = 100
num_epochs = 2
learning_rate = 0.01


# Recurrent neural network (many-to-one)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        # Set the initial hidden state (a plain RNN has no cell state)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device) 
        
        # Forward propagate RNN
        out , _ = self.rnn(x, h0)  # out: tensor of shape (batch_size, seq_length, hidden_size)
        
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)


# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        #Gradient clipping
        #torch.nn.utils.clip_grad_norm_(model.parameters(), 0.2)
        
        optimizer.step()
        
        if (i+1) % 10 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

Epoch [1/2], Step [10/157], Loss: 2.1837
Epoch [1/2], Step [20/157], Loss: 2.0478
Epoch [1/2], Step [30/157], Loss: 1.4118
Epoch [1/2], Step [40/157], Loss: 1.5065
Epoch [1/2], Step [50/157], Loss: 1.8594
Epoch [1/2], Step [60/157], Loss: 1.2403
Epoch [1/2], Step [70/157], Loss: 1.4268
Epoch [1/2], Step [80/157], Loss: 1.2231
Epoch [1/2], Step [90/157], Loss: 1.0086
Epoch [1/2], Step [100/157], Loss: 1.2341
Epoch [1/2], Step [110/157], Loss: 1.0509
Epoch [1/2], Step [120/157], Loss: 1.1626
Epoch [1/2], Step [130/157], Loss: 1.0802
Epoch [1/2], Step [140/157], Loss: 1.2391
Epoch [1/2], Step [150/157], Loss: 1.2083
Epoch [2/2], Step [10/157], Loss: 1.1772
Epoch [2/2], Step [20/157], Loss: 1.1948
Epoch [2/2], Step [30/157], Loss: 1.1187
Epoch [2/2], Step [40/157], Loss: 1.3277
Epoch [2/2], Step [50/157], Loss: 1.5725
Epoch [2/2], Step [60/157], Loss: 1.3480
Epoch [2/2], Step [70/157], Loss: 1.0973
Epoch [2/2], Step [80/157], Loss: 1.0534
Epoch [2/2], Step [90/157], Loss: 1.1088
Epoch [2/2

### Exercise 2.2

Modify the above code (no need to create a new cell) to print the gradient norm of some of the parameters after the backward pass on the first minibatch.

Do this for the following weight parameter: `model.rnn.weight_ih_l0`.
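After `backward()`, each parameter's gradient is available in its `.grad` attribute, and its norm can be read with `.norm()`. The sketch below shows the pattern on a small stand-in model with a placeholder loss; the sizes and the loss are assumptions chosen only to produce gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=28, hidden_size=16, batch_first=True)  # small stand-in model
x = torch.rand(8, 28, 28)             # fake batch of 8 sequences
out, _ = rnn(x)
loss = out[:, -1, :].pow(2).mean()    # placeholder loss, just to get gradients

loss.backward()
grad_norm = rnn.weight_ih_l0.grad.norm().item()  # L2 norm of the gradient
print('grad norm of weight_ih_l0: {:.6f}'.format(grad_norm))
```

In the training script, the same expression applied to `model.rnn.weight_ih_l0` would go right after `loss.backward()`, guarded so that it only prints on the first minibatch.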

### Exercise 2.3

Modify the code (in a new cell below) to use an LSTM, remove the gradient clipping, and rerun the code.

**Note**: This is essentially what is done in the original script linked above which you may check for reference or if you get stuck. 

Run with the LSTM and compare its accuracy and its gradient norm for `weight_ih_l0` with those of the RNN.
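The main interface difference is that `nn.LSTM` carries a cell state alongside the hidden state, so the initial state is a tuple `(h0, c0)` and the second return value is a tuple as well. A minimal sketch on a fake batch, reusing the hyper-parameter values from the script above:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=28, hidden_size=128, num_layers=2, batch_first=True)
x = torch.rand(4, 28, 28)            # fake batch: (batch, steps, features)

h0 = torch.zeros(2, 4, 128)          # initial hidden state: (num_layers, batch, hidden)
c0 = torch.zeros(2, 4, 128)          # initial cell state, same shape

out, (h_n, c_n) = lstm(x, (h0, c0))
print(out.shape, h_n.shape, c_n.shape)
```

The first-layer input weights are still exposed as `weight_ih_l0`, but for an LSTM they stack the four gates, giving shape `(4 * hidden_size, input_size)`.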