In [22]:
%load_ext autoreload
%autoreload 2

import submission

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Part 0: Signature

Please implement the `signature()` function in the `submission.py` file (it is on line 19). It should return your first name and last name as a single string. These are free points, but failing to implement it will result in a 0 on the assignment.

Run the following cell to check that `signature()` returns your name as a string.

In [23]:
print(submission.signature())
assert isinstance(submission.signature(), str), 'Signature should return a string'
# the print above should show your name as a string, not None


YifeiChen


# Part 1: Implementing the attention mechanism with the `Attention` class

You have learned that the attention mechanism is a way to compute the importance of each "word" in a sequence. In this part of the assignment, you will implement the attention mechanism in the `Attention` class. 

This is a toy attention mechanism. The outline of our attention mechanism is as follows:

1. You give it a sentence.
2. You will compute the word embeddings and positional embeddings for the sentence. This function has already been implemented for you, but when you run your attention mechanism you must first add the word embeddings and positional embeddings together.
3. You will project the summed embeddings to the dimensionality of the hidden layers.
4. Using the hidden embeddings and the weight matrices, you will compute the attention scores and the attention weighted embeddings.
5. You will report the attention scores.
6. You will report the value embeddings that have been weighted by the attention scores. 
7. The weighted value embeddings you report should be projected back to the dimensionality of the original word embeddings.
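
The steps above can be sketched in NumPy as follows. This is a minimal illustration and not the reference solution: the weight names (`W_in`, `W_out`, `W_q`, `W_k`, `W_v`), the hidden size of 8, the random initialization, and the scaling by the square root of the hidden size are all assumptions, and the hand-written `softmax` stands in for `scipy.special.softmax` only to keep the snippet dependency-light.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def toy_attention(word_emb, pos_emb, W_in, W_q, W_k, W_v, W_out):
    """Single-head attention over one sentence (hypothetical helper).

    word_emb, pos_emb: (n_words, embed_dim)
    W_in:              (embed_dim, hidden_dim), maps into the hidden space
    W_q, W_k, W_v:     (hidden_dim, hidden_dim)
    W_out:             (hidden_dim, embed_dim), maps back out
    """
    x = word_emb + pos_emb                   # step 2: combine the embeddings
    h = x @ W_in                             # step 3: (n_words, hidden_dim)
    q, k, v = h @ W_q, h @ W_k, h @ W_v      # step 4: queries, keys, values
    # Assumed scaled dot-product attention; each row of scores sums to 1.
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    weighted = scores @ v                    # (n_words, hidden_dim)
    return scores, weighted @ W_out          # steps 5-7


rng = np.random.default_rng(0)
n_words, embed_dim, hidden_dim = 5, 16, 8    # hidden_dim is an assumed value
word_emb = rng.normal(size=(n_words, embed_dim))
pos_emb = rng.normal(size=(n_words, embed_dim))
W_in = rng.normal(size=(embed_dim, hidden_dim))
W_q, W_k, W_v = (rng.normal(size=(hidden_dim, hidden_dim)) for _ in range(3))
W_out = rng.normal(size=(hidden_dim, embed_dim))

scores, out = toy_attention(word_emb, pos_emb, W_in, W_q, W_k, W_v, W_out)
print(scores.shape, out.shape)
```

The attention scores form a square matrix with one row per word, while the returned embeddings have the same shape as the input embeddings.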

The `Attention` class will have the following methods:

- `__init__`: This method will initialize the class. It should define the Q, K, V weight matrices, and the maximum number of words in a sequence. Implement this method. 
- `get_word_embeddings`: This method will return the embeddings for words that you want to compute the attention for. There is no need to implement this method. It has been implemented for you. 
- `get_positional_embeddings`: This method will return the positional embeddings for words that you want to compute the attention for. Implement this method.
- `get_attention`: This method will compute the attention mechanism. Implement this method. 

Please see the instructions within each function for more details. 
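
As a point of reference for `get_positional_embeddings`, here is a sketch of one widely used scheme, sinusoidal positional encodings. The assignment text does not say which scheme is expected, so treat the choice of scheme, the function name `sinusoidal_positions`, and the base of 10000 as assumptions, and follow the instructions in `submission.py` where they differ.

```python
import numpy as np


def sinusoidal_positions(n_positions, dim):
    """Return an (n_positions, dim) array of sinusoidal position encodings.

    Hypothetical helper: the assignment may expect a different scheme.
    """
    positions = np.arange(n_positions)[:, None]        # (n_positions, 1)
    pair_index = np.arange(dim)[None, :] // 2          # 0, 0, 1, 1, 2, 2, ...
    angles = positions / np.power(10000.0, 2 * pair_index / dim)
    encodings = np.empty((n_positions, dim))
    encodings[:, 0::2] = np.sin(angles[:, 0::2])       # even columns use sine
    encodings[:, 1::2] = np.cos(angles[:, 1::2])       # odd columns use cosine
    return encodings


pos = sinusoidal_positions(n_positions=5, dim=16)
print(pos.shape)
```

Because the result has the same shape as the word embeddings, the two can be added element-wise before attention is computed.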

## Hints 

- The `softmax` function is imported from `scipy.special`. You may use this in the attention mechanism. 
- Use the `@` operator to perform matrix multiplication. 
- Think carefully about the dimensions of the matrices you are multiplying. 

## Grading

You will be graded on the following criteria:

- The `get_attention` method is implemented correctly. We will test your implementation on unit tests.

Go to the `get_attention` function in the `submission.py` file to implement your code. It is at line 110, but you may want to read the entire class to understand the context. 

Run the following cell to test your implementation. It should pass if your implementation is bug-free, but it only checks output shapes, so please do your due diligence to ensure your implementation is correct. One way is to use a small matrix with simple numbers and do the math by hand.
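
One way to set up such a hand check is sketched below. It assumes two tokens, a hidden size of 2, identity weight matrices, and no scaling of the raw scores; none of these values come from `submission.py`. With identity weights, Q, K, and V all equal the input, so every number can be verified on paper: each diagonal score is softmax([1, 0])[0] = e / (e + 1), roughly 0.731.

```python
import numpy as np

h = np.array([[1.0, 0.0],
              [0.0, 1.0]])      # two tokens, hidden_dim = 2 (assumed values)
W_q = W_k = W_v = np.eye(2)     # identity weights keep Q = K = V = h

q, k, v = h @ W_q, h @ W_k, h @ W_v
raw = q @ k.T                   # [[1, 0], [0, 1]]

# Row-wise softmax, written out so each step can be checked by hand.
exp = np.exp(raw - raw.max(axis=-1, keepdims=True))
scores = exp / exp.sum(axis=-1, keepdims=True)
weighted = scores @ v           # equals scores here, because v is the identity

expected = np.e / (np.e + 1.0)  # softmax([1, 0])[0]
print(scores)
print(np.isclose(scores[0, 0], expected))
```

If your implementation scales the raw scores before the softmax, apply the same scaling to `raw` before comparing.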


In [24]:
def test_attention_p1(sentence: str):
    attn = submission.Attention()
    embeddings, pos_embeddings = attn.get_embeddings(sentence)
    print('embeddings', embeddings.shape)
    print('pos_embeddings', pos_embeddings.shape)
    attn_score, attn_weighted_embeddings = attn.get_attention(embeddings, pos_embeddings)
    print('attn_score', attn_score.shape)
    print('attn_weighted_embeddings', attn_weighted_embeddings.shape)
    
    assert attn_weighted_embeddings.shape == embeddings.shape
    print('Test passed')

test_attention_p1('this is a test sentence')

embeddings (5, 16)
pos_embeddings (5, 16)
attn_score (5, 5)
attn_weighted_embeddings (5, 16)
Test passed


# Part 2: Using PyTorch to implement the attention mechanism

We want to use PyTorch so that we can rely on its autograd capabilities and actually train this model to do something.

The `AttentionTransformer` class has parts that you must implement. It represents a transformer with a single-head attention mechanism, but without the AddNorm step.

This transformer will be used as a classifier downstream. Specifically, we will be using the transformer to detect the number of times "cat" appears in a sentence. 

Only implement the methods and sections that are marked with TODO. If you are not familiar with PyTorch, you should refer to the PyTorch documentation to understand the methods that are being used. 
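
For orientation, single-head attention in PyTorch can look roughly like the sketch below. `SingleHeadAttention` is a hypothetical stand-in and not the assignment's `AttentionTransformer`: the layer names, the use of bias-free `nn.Linear` projections, and the scaling by the square root of the hidden size are assumptions. The shapes are chosen to match the test cell in this section.

```python
import math

import torch
from torch import nn


class SingleHeadAttention(nn.Module):
    """Hypothetical stand-in, not the assignment's AttentionTransformer."""

    def __init__(self, embedding_dim: int, hidden_dim: int):
        super().__init__()
        # One learned projection each for queries, keys, and values.
        self.query = nn.Linear(embedding_dim, hidden_dim, bias=False)
        self.key = nn.Linear(embedding_dim, hidden_dim, bias=False)
        self.value = nn.Linear(embedding_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embedding_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        weights = torch.softmax(scores, dim=-1)   # (batch, seq_len, seq_len)
        return weights @ v                        # (batch, seq_len, hidden_dim)


layer = SingleHeadAttention(embedding_dim=10, hidden_dim=5)
out = layer(torch.randn(1, 10, 10))
print(out.shape)
```

Because every operation is a differentiable tensor operation, autograd can propagate gradients back to the three projection layers during training.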

## Grading

You will be graded on the following criteria:

- The `AttentionTransformer` class is implemented correctly. We will test your implementation on unit tests.

Go to the `AttentionTransformer` class in the `submission.py` file to implement your code. 

It is at line 218, but you may want to read the entire class to understand the context. 

After you have implemented the `attention` method, run the following cell to test your implementation. It should pass if your implementation is bug-free, but it only checks the output shape, so please do your due diligence to ensure your implementation is correct. One way is to use a small matrix with simple numbers and do the math by hand.


In [33]:
import torch


def test_attention_p2():
    attn = submission.AttentionTransformer(max_input_length=10, embedding_dim=10, hidden_dim=5)
    # a random input with last dimension equal to embedding_dim
    attn_weighted_embeddings = attn.attention(torch.randn(1, 10, 10))
    print(attn_weighted_embeddings.shape)
    # the attention output should have hidden_dim as its last dimension
    assert attn_weighted_embeddings.shape == torch.Size([1, 10, 5]), 'You have a dimension mismatch'
    print('Test passed')
    
test_attention_p2()

torch.Size([1, 10, 5])
Test passed


# Part 3: Training the Attention Transformer

A common way to keep a model's training hyperparameters in one place is to use a dataclass. On line 256 of the `submission.py` file, you will see a dataclass called `TrainingParameters`.
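
As a rough illustration of the pattern, the sketch below defines a small dataclass of hyperparameters. `ExampleTrainingConfig` and all of its fields and default values are hypothetical; the real `TrainingParameters` in `submission.py` may define different fields.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExampleTrainingConfig:
    """Hypothetical fields; the real TrainingParameters may differ."""
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 1e-3
    device: str = "cpu"


config = ExampleTrainingConfig()
config.epochs = 100            # override a single field after construction
print(asdict(config))
```

Grouping the values this way gives every hyperparameter a name, a type, and a default, and lets a training function accept one object instead of a long argument list.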

Fill out the parameters in the dataclass in the `submission.py` file, then run the cell below to train your model. You can observe the training loss and evaluation score.

You should be able to reach an evaluation accuracy of 1.0 if your implementation is correct and the hyperparameters are set appropriately.

## Grading

- The accuracy of the model on the test set = 1, full score 
- The accuracy of the model on the test set = 0.7, 50% score
- The accuracy of the model on the test set = 0.4, 25% score
- The accuracy of the model on the test set < 0.4, no points

Please ensure your implementation is bug free in `submission.py`. A quick sanity check is to run the cell below and see if it runs without errors. 



In [52]:
params = submission.TrainingParameters()
params.epochs = 100

loss_history, eval_score_history, model = submission.train(params, data_path='sentences.csv')

Map: 100%|██████████| 400/400 [00:00<00:00, 4073.73 examples/s]
Map: 100%|██████████| 1600/1600 [00:00<00:00, 4328.35 examples/s]


Epoch 0 train loss: 2.385319764797504
Epoch 0 eval accuracy: 0.2575
Epoch 10 train loss: 1.5432332020539503
Epoch 10 eval accuracy: 1.0
Epoch 20 train loss: 1.5431088759348943
Epoch 20 eval accuracy: 1.0
Epoch 30 train loss: 1.5430772304534912
Epoch 30 eval accuracy: 1.0
Epoch 40 train loss: 1.5430635488950288
Epoch 40 eval accuracy: 1.0
Epoch 50 train loss: 1.543056240448585
Epoch 50 eval accuracy: 1.0
Epoch 60 train loss: 1.5430519672540517
Epoch 60 eval accuracy: 1.0
Epoch 70 train loss: 1.5430492346103375
Epoch 70 eval accuracy: 1.0
Epoch 80 train loss: 1.5430473272617047
Epoch 80 eval accuracy: 1.0
Epoch 90 train loss: 1.5430458784103394
Epoch 90 eval accuracy: 1.0
1.543044979755695
1.0
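
The training loss above levels off near 1.5430 even though the evaluation accuracy is already 1.0. One possible explanation, which the log alone cannot confirm, is that the model's outputs are already softmax probabilities when they reach a cross-entropy loss that applies its own softmax. Under that assumption, and assuming 11 output classes (a guess, not something stated in this notebook), the lowest reachable loss can be computed directly. The helper name `double_softmax_loss_floor` is hypothetical.

```python
import math


def double_softmax_loss_floor(num_classes: int) -> float:
    """Lowest cross-entropy reachable when the loss receives probabilities.

    If the model output is already a probability vector p and the loss applies
    another softmax, the best case is a one-hot p, which gives
    -log(e^1 / (e^1 + (num_classes - 1) * e^0)).
    """
    return math.log(math.e + (num_classes - 1)) - 1.0


# num_classes=11 is an assumption, not a value taken from submission.py.
floor = double_softmax_loss_floor(num_classes=11)
print(round(floor, 5))
```

Under these assumptions the floor is about 1.5430, which is close to the plateau in the log. If that is the cause, a loss that does not shrink further is not by itself a sign of a training problem.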


## Test your performance on the following sentence provided in `test`

You are free to change the sentence to test your model. The model here is the one you trained in the previous cell. 

In [50]:
test = 'catcat is a great pet. tac cat act cactus cat cat is a great pet. tac act cat us cat' 

tokenizer = params.tokenizer

tokens = tokenizer(test, 
                   return_tensors='pt', 
                   padding="max_length", 
                   truncation=True, 
                   max_length=params.token_max_length).to(params.device)['input_ids']

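with torch.no_grad():
    # call the module directly instead of .forward(), as is idiomatic in PyTorch
    output = model(tokens).argmax(dim=-1).item()
    print(f'Model prediction of number of cats: {output}')
    # str.count counts substring matches, so 'catcat' contributes 2
    print('True number of cats:', test.count('cat'))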


Model prediction of number of cats: 7
True number of cats: 7
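
The "true" count in the cell above comes from `str.count`, which counts non-overlapping substring matches rather than whole words. The short sketch below contrasts the two definitions on the same sentence. The whole-word rule used here (split on whitespace, then strip trailing punctuation) is an illustrative assumption and not the assignment's labeling rule.

```python
test = ('catcat is a great pet. tac cat act cactus cat cat is a great pet. '
        'tac act cat us cat')

# Non-overlapping substring matches: 'catcat' contributes two.
substring_count = test.count('cat')

# Whole-word matches after stripping common punctuation (assumed rule).
whole_word_count = sum(word.strip('.,!?') == 'cat' for word in test.split())

print(substring_count, whole_word_count)
```

When you write your own test sentences, keep in mind which definition the expected answer uses: the two counts diverge whenever "cat" appears inside a longer token.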
