# Self-Attention for Vision

We're going to implement self-attention blocks in a convolutional neural network for CIFAR-10 classification.

# Part I. Preparation

First, we load the CIFAR-10 dataset. This might take a couple of minutes the first time you do it, but the files should stay cached after that.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

In [2]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./data/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./data/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./data/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
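`T.ToTensor()` converts each image to a float tensor of shape (C, H, W) with values in [0, 1], and `T.Normalize(mean, std)` then computes `(x - mean[c]) / std[c]` for every channel c. A minimal NumPy sketch of that arithmetic (not torchvision's own code), reusing the hardcoded statistics from the cell above:

```python
import numpy as np

# Per-channel statistics hardcoded in the transform above, shaped to broadcast over (C, H, W).
MEAN = np.array([0.4914, 0.4822, 0.4465]).reshape(3, 1, 1)
STD = np.array([0.2023, 0.1994, 0.2010]).reshape(3, 1, 1)

def normalize(img_chw):
    """Mimic T.Normalize on a (C, H, W) array already scaled to [0, 1] by ToTensor."""
    return (img_chw - MEAN) / STD

# A 2x2 "image" whose every pixel equals the dataset mean maps to all zeros.
img = np.broadcast_to(MEAN, (3, 2, 2)).copy()
out = normalize(img)
print(out.shape)              # (3, 2, 2)
print(np.allclose(out, 0.0))  # True
```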


You have the option to **use a GPU by setting the flag below to True**. A GPU is not required for this assignment. Note that if your computer does not have CUDA enabled, `torch.cuda.is_available()` will return False and this notebook will fall back to CPU mode.

The global variables `dtype` and `device` control the tensor data type and the device used throughout this assignment.

In [3]:
USE_GPU = True

dtype = torch.float32 # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

using device: cuda


### Flatten Function

In [4]:
def flatten(x):
    N = x.shape[0] # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

def test_flatten():
    x = torch.arange(12).view(2, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

test_flatten()

Before flattening:  tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]]],


        [[[ 6,  7],
          [ 8,  9],
          [10, 11]]]])
After flattening:  tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])
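`flatten` keeps the batch dimension and merges the remaining axes in row-major (C, H, W) order. A NumPy sketch of the same reshape, useful for checking the output above without PyTorch:

```python
import numpy as np

def flatten_np(x):
    # Keep the batch dimension N and merge C, H, W into one axis,
    # analogous to x.view(N, -1) in the PyTorch flatten above.
    return x.reshape(x.shape[0], -1)

x = np.arange(12).reshape(2, 1, 3, 2)
flat = flatten_np(x)
print(flat.shape)      # (2, 6)
print(flat.tolist())   # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```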


### Check Accuracy Function


In [5]:
import torch.nn.functional as F  # useful stateless functions
def check_accuracy(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
        return 100 * acc
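`check_accuracy` takes the arg-max of each score row as the prediction and counts how many match the labels. The same bookkeeping in NumPy, as a small sketch with made-up scores:

```python
import numpy as np

def accuracy_from_scores(scores, labels):
    """Percentage of rows whose highest-scoring class matches the label,
    mirroring scores.max(1) followed by (preds == y).sum() in check_accuracy."""
    preds = scores.argmax(axis=1)
    num_correct = int((preds == labels).sum())
    return 100.0 * num_correct / labels.shape[0]

scores = np.array([[2.0, 0.1, -1.0],   # predicts class 0
                   [0.3, 0.2, 0.9],    # predicts class 2
                   [1.0, 3.0, 0.0],    # predicts class 1
                   [0.5, 0.4, 0.1]])   # predicts class 0
labels = np.array([0, 2, 2, 1])
print(accuracy_from_scores(scores, labels))  # 50.0
```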

### Training Loop

In [6]:
def train(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    acc_max = 0
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Epoch %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))
                acc = check_accuracy(loader_val, model)
                if acc >= acc_max:
                    acc_max = acc
                print()
    print("Maximum accuracy attained: ", acc_max)
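`F.cross_entropy` applies a log-softmax to the scores and averages the negative log-probability of the correct class. A NumPy sketch of that computation (not PyTorch's implementation) also gives a handy sanity check: with uninformative, all-equal scores over K classes the loss equals ln K, about 2.303 for 10 classes, which is close to the first logged loss of the CNN runs below (e.g. 2.3140).

```python
import numpy as np

def cross_entropy(scores, labels):
    """Mean negative log-likelihood of the true class under softmax(scores),
    the quantity F.cross_entropy computes with its default mean reduction."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# With all-equal scores over K classes the loss is ln(K).
labels = np.array([3, 7, 1, 0])
for k in (10, 100):
    print(k, round(float(cross_entropy(np.zeros((4, k)), labels)), 4))
```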

In [7]:
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

## Vanilla CNN; No Attention

In [8]:
channel_1 = 64
channel_2 = 32
learning_rate = 1e-3
num_classes = 10

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2*32*32, num_classes),
)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)


train(model, optimizer, epochs=1)

Epoch 0, Iteration 0, loss = 2.3140
Checking accuracy on validation set
Got 141 / 1000 correct (14.10)

Epoch 0, Iteration 100, loss = 1.5275
Checking accuracy on validation set
Got 419 / 1000 correct (41.90)

Epoch 0, Iteration 200, loss = 1.4770
Checking accuracy on validation set
Got 445 / 1000 correct (44.50)

Epoch 0, Iteration 300, loss = 1.4464
Checking accuracy on validation set
Got 475 / 1000 correct (47.50)

Epoch 0, Iteration 400, loss = 1.4642
Checking accuracy on validation set
Got 532 / 1000 correct (53.20)

Epoch 0, Iteration 500, loss = 1.4925
Checking accuracy on validation set
Got 508 / 1000 correct (50.80)

Epoch 0, Iteration 600, loss = 1.4339
Checking accuracy on validation set
Got 534 / 1000 correct (53.40)

Epoch 0, Iteration 700, loss = 1.3148
Checking accuracy on validation set
Got 560 / 1000 correct (56.00)

Maximum accuracy attained:  56.00000000000001
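The `nn.Linear(channel_2*32*32, num_classes)` layer only fits because both 3×3 convolutions use `padding=1`, which keeps the 32×32 spatial size. A short sketch using the standard `Conv2d` output-size formula (dilation 1) confirms the flattened feature count:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for Conv2d (dilation 1, floor rounding).
    return (size + 2 * padding - kernel) // stride + 1

h = conv_out(32, kernel=3, padding=1)   # first conv: 32 -> 32
h = conv_out(h, kernel=3, padding=1)    # second conv: 32 -> 32
channel_2 = 32
print(h, channel_2 * h * h)  # 32 32768
```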


## Test set 

In [9]:
vanillaModel = model
check_accuracy(loader_test, vanillaModel)

Checking accuracy on test set
Got 5534 / 10000 correct (55.34)


55.34

# Part II. Self-Attention

In the next section, you will implement an Attention layer, which you will then use within the convnet architecture defined above for the CIFAR-10 classification task.

A self-attention layer is formulated as follows:

Input: $X$ of shape $(H\times W, C)$

Query, key, value linear transforms are $W_Q$, $W_K$, $W_V$, of shape $(C, C)$. We implement these linear transforms as 1x1 convolutional layers of the same dimensions.
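The claim that a 1×1 convolution implements the linear transform $XW_Q$ can be checked numerically. In this NumPy sketch (random data, bias omitted; `nn.Conv2d` adds a bias by default, which makes the PyTorch version affine), a convolution weight of shape (C_out, C_in) equal to $W_Q^\top$ gives the same result as flattening the map to $X$ of shape $(H\times W, C)$ and multiplying by $W_Q$:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
x = rng.standard_normal((C, H, W))   # one feature map, channels first
W_q = rng.standard_normal((C, C))    # plays the role of W_Q (bias omitted)

# A 1x1 convolution mixes channels independently at every pixel:
# y[o, h, w] = sum_c weight[o, c] * x[c, h, w], here with weight = W_q.T
y_conv = np.einsum('oc,chw->ohw', W_q.T, x)

# Flatten to X of shape (H*W, C) and multiply on the right by W_Q.
X = x.reshape(C, H * W).T
XW_q = X @ W_q

print(np.allclose(y_conv.reshape(C, H * W).T, XW_q))  # True
```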

$XW_Q$, $XW_K$, and $XW_V$ denote the outputs obtained by passing $X$ through these transforms; each has shape $(H\times W, C)$.


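Self-attention is then given by

$$\mathrm{Attention}(X) = X + \mathrm{softmax}\left(\frac{XW_Q(XW_K)^\top}{\sqrt{C}}\right)XW_V$$

where the softmax is taken along each row of the $(H\times W, H\times W)$ score matrix, so every spatial position receives a convex combination of the value vectors at all positions.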
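A direct NumPy transcription of the formula for a single image, with random $X$ and weights, to make the shapes concrete:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Attention(X) = X + softmax(X W_Q (X W_K)^T / sqrt(C)) X W_V for X of shape (H*W, C)."""
    C = X.shape[1]
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(C)   # (H*W, H*W): position-to-position
    A = softmax(scores, axis=1)                      # each row sums to 1
    return X + A @ (X @ W_v), A

rng = np.random.default_rng(0)
HW, C = 6, 4
X = rng.standard_normal((HW, C))
W_q, W_k, W_v = (rng.standard_normal((C, C)) for _ in range(3))
out, A = self_attention(X, W_q, W_k, W_v)
print(out.shape, A.shape)  # (6, 4) (6, 6)
```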

### Here you implement the Attention module, which is used in the next section

In [10]:
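# Define the attention module as an nn.Module subclass
class Attention(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolutions act as the query/key/value linear maps (each with a bias).
        self.conv_query = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.conv_key = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.conv_value = nn.Conv2d(in_channels, in_channels, kernel_size=1)

    def forward(self, x):
        N, C, H, W = x.shape

        # Each projection is reshaped to (N, C, H*W), i.e. channels first.
        q = self.conv_query(x).reshape(N, C, H*W)
        k = self.conv_key(x).reshape(N, C, H*W)
        v = self.conv_value(x).reshape(N, C, H*W)

        # NOTE: with channels-first q and k, bmm(q, k^T) has shape (N, C, C), so this
        # softmax weights channels rather than the H*W positions in the formula above.
        # int(C**0.5) also truncates sqrt(C) (e.g. sqrt(32) ~ 5.66 becomes 5).
        attention = torch.bmm(torch.softmax(torch.bmm(q, k.transpose(1, 2)) / int(C**0.5), dim=2), v)

        attention = attention.reshape(N, C, H, W)
        # Residual connection: input plus the attended values.
        return x + attention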
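For comparison with the module above, here is a batched NumPy sketch that follows the formula literally: it moves channels last so the scores form an (H·W, H·W) matrix per image, and it scales by the exact √C. Plain (C, C) weight matrices stand in for the 1×1 convolutions (biases omitted); this is an illustration, not the notebook's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention_nchw(x, w_q, w_k, w_v):
    """Batched sketch of the formula for x of shape (N, C, H, W).
    w_* are (C, C) matrices standing in for the 1x1 convs (biases omitted)."""
    N, C, H, W = x.shape
    X = x.reshape(N, C, H * W).transpose(0, 2, 1)         # (N, H*W, C)
    q, k, v = X @ w_q, X @ w_k, X @ w_v                   # each (N, H*W, C)
    A = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))    # (N, H*W, H*W)
    out = (A @ v).transpose(0, 2, 1).reshape(N, C, H, W)  # back to channels first
    return x + out, A

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
y, A = spatial_attention_nchw(x, w_q, w_k, w_v)
print(y.shape, A.shape)  # (2, 8, 4, 4) (2, 16, 16)

# The module above instead forms bmm(q, k^T) from (N, C, H*W) tensors,
# giving an (N, C, C) channel-affinity matrix, and scales by int(C**0.5):
print(int(32 ** 0.5), round(32 ** 0.5, 3))  # 5 5.657
```

In PyTorch the analogous change would be permuting `q`, `k`, and `v` to (N, H*W, C), e.g. with `.reshape(N, C, H*W).permute(0, 2, 1)`, before the `torch.bmm` calls. Note the cost: on the 32×32 maps after the first convolution, each image's score matrix has 1024×1024 entries, versus 64×64 for the channel-affinity matrix the module above builds when `channel_1 = 64`.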

## Single Attention Block: Early attention; After the first conv layer. 

In [11]:
channel_1 = 64
channel_2 = 32
learning_rate = 1e-3


model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
    nn.ReLU(),
    Attention(channel_1),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2*32*32, num_classes),
)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

train(model, optimizer, epochs=10)

Epoch 0, Iteration 0, loss = 2.2951
Checking accuracy on validation set
Got 135 / 1000 correct (13.50)

Epoch 0, Iteration 100, loss = 1.5557
Checking accuracy on validation set
Got 451 / 1000 correct (45.10)

Epoch 0, Iteration 200, loss = 1.4654
Checking accuracy on validation set
Got 499 / 1000 correct (49.90)

Epoch 0, Iteration 300, loss = 1.1809
Checking accuracy on validation set
Got 506 / 1000 correct (50.60)

Epoch 0, Iteration 400, loss = 1.3901
Checking accuracy on validation set
Got 566 / 1000 correct (56.60)

Epoch 0, Iteration 500, loss = 1.2920
Checking accuracy on validation set
Got 551 / 1000 correct (55.10)

Epoch 0, Iteration 600, loss = 0.9551
Checking accuracy on validation set
Got 580 / 1000 correct (58.00)

Epoch 0, Iteration 700, loss = 1.2441
Checking accuracy on validation set
Got 585 / 1000 correct (58.50)

Epoch 1, Iteration 0, loss = 0.9908
Checking accuracy on validation set
Got 608 / 1000 correct (60.80)

Epoch 1, Iteration 100, loss = 0.8331
Checking accuracy on validation set
...

## Test set

In [12]:
earlyAttention = model
check_accuracy(loader_test, earlyAttention)

Checking accuracy on test set
Got 6033 / 10000 correct (60.33)


60.33

## Single Attention Block: Late attention; After the second conv layer.

In [13]:
channel_1 = 64
channel_2 = 32
learning_rate = 1e-3

# Use the Attention module after the second convolutional layer.
# Architecture: [Conv->ReLU->Conv->ReLU->Attention->ReLU->Flatten->Linear]

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Attention(channel_2),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2*32*32, num_classes),
)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

train(model, optimizer, epochs=10)

Epoch 0, Iteration 0, loss = 2.3106
Checking accuracy on validation set
Got 107 / 1000 correct (10.70)

Epoch 0, Iteration 100, loss = 1.4990
Checking accuracy on validation set
Got 451 / 1000 correct (45.10)

Epoch 0, Iteration 200, loss = 1.5117
Checking accuracy on validation set
Got 488 / 1000 correct (48.80)

Epoch 0, Iteration 300, loss = 1.1954
Checking accuracy on validation set
Got 527 / 1000 correct (52.70)

Epoch 0, Iteration 400, loss = 1.3444
Checking accuracy on validation set
Got 527 / 1000 correct (52.70)

Epoch 0, Iteration 500, loss = 1.3058
Checking accuracy on validation set
Got 539 / 1000 correct (53.90)

Epoch 0, Iteration 600, loss = 1.3221
Checking accuracy on validation set
Got 554 / 1000 correct (55.40)

Epoch 0, Iteration 700, loss = 1.0496
Checking accuracy on validation set
Got 564 / 1000 correct (56.40)

Epoch 1, Iteration 0, loss = 0.9104
Checking accuracy on validation set
Got 598 / 1000 correct (59.80)

Epoch 1, Iteration 100, loss = 1.2924
Checking accuracy on validation set
...

## Test set 

In [14]:
lateAttention = model
check_accuracy(loader_test, lateAttention)

Checking accuracy on test set
Got 6054 / 10000 correct (60.54)


60.540000000000006

## Double Attention Blocks: After conv layers 1 and 2 

In [15]:
channel_1 = 64
channel_2 = 32
learning_rate = 1e-3

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
    nn.ReLU(),
    Attention(channel_1),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Attention(channel_2),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2*32*32, num_classes),
)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

train(model, optimizer, epochs=10)

Epoch 0, Iteration 0, loss = 2.3165
Checking accuracy on validation set
Got 166 / 1000 correct (16.60)

Epoch 0, Iteration 100, loss = 1.4627
Checking accuracy on validation set
Got 437 / 1000 correct (43.70)

Epoch 0, Iteration 200, loss = 1.7036
Checking accuracy on validation set
Got 459 / 1000 correct (45.90)

Epoch 0, Iteration 300, loss = 1.5321
Checking accuracy on validation set
Got 492 / 1000 correct (49.20)

Epoch 0, Iteration 400, loss = 1.4116
Checking accuracy on validation set
Got 526 / 1000 correct (52.60)

Epoch 0, Iteration 500, loss = 1.1967
Checking accuracy on validation set
Got 541 / 1000 correct (54.10)

Epoch 0, Iteration 600, loss = 1.3342
Checking accuracy on validation set
Got 562 / 1000 correct (56.20)

Epoch 0, Iteration 700, loss = 1.3691
Checking accuracy on validation set
Got 596 / 1000 correct (59.60)

Epoch 1, Iteration 0, loss = 0.8812
Checking accuracy on validation set
Got 571 / 1000 correct (57.10)

Epoch 1, Iteration 100, loss = 1.1494
Checking accuracy on validation set
...

## Test set 

In [16]:
doubleAttentionModel = model
check_accuracy(loader_test, doubleAttentionModel)

Checking accuracy on test set
Got 6184 / 10000 correct (61.84)


61.839999999999996

## Resnet with Attention 

Now we will experiment with applying attention within the ResNet10 architecture.

## Vanilla Resnet, No Attention

The ResNet architecture is given below; we will train it and evaluate it on the test set.

In [17]:
import torch
import torch.nn as nn

class ResNet(nn.Module):

    def __init__(self, block, layers, img_channels=3, num_classes=100, batchnorm=False):
        super(ResNet, self).__init__() #layers = [1, 1, 1, 1] 
        self.in_channels = 64
        self.conv1 = nn.Conv2d(img_channels, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.batchnorm = batchnorm
        self.layer1 = self.make_layer(block, layers[0], out_channels=64, stride=1, batchnorm=batchnorm)
        self.layer2 = self.make_layer(block, layers[1], out_channels=128, stride=1, batchnorm=batchnorm)
        self.layer3 = self.make_layer(block, layers[2], out_channels=256, stride=1, batchnorm=batchnorm)
        self.layer4 = self.make_layer(block, layers[3], out_channels=512, stride=2, batchnorm=batchnorm)

        self.averagepool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    
    def forward(self, x):

        x = self.conv1(x)
        if self.batchnorm:
            x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x) 
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.averagepool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)

        return x

    def make_layer(self, block, num_blocks, out_channels, stride, batchnorm=False):
        downsampler = None
        layers = []
        if stride != 1 or self.in_channels != out_channels:
            downsampler = nn.Sequential(nn.Conv2d(self.in_channels, out_channels, kernel_size = 1, stride = stride), nn.BatchNorm2d(out_channels))

        layers.append(block(self.in_channels, out_channels, downsampler, stride, batchnorm=batchnorm))

        self.in_channels = out_channels

        for _ in range(num_blocks - 1):
            # Pass batchnorm through so every block in the stage uses the same setting.
            layers.append(block(self.in_channels, out_channels, batchnorm=batchnorm))

        
        return nn.Sequential(*layers)
        
class block(nn.Module):

    def __init__(self, in_channels, out_channels, downsampler = None, stride = 1, batchnorm=False):
        
        super(block, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size = 3, padding = 2)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size = 3, stride = stride)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsampler = downsampler
        self.relu = nn.ReLU()
        self.batchnorm = batchnorm

    
    def forward(self, x):

        residual = x
        x = self.conv1(x)
        if self.batchnorm:
            x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        if self.batchnorm:
            x = self.bn2(x)
        x = self.relu(x)
        
        if self.downsampler:
            residual = self.downsampler(residual)

        return self.relu(residual + x)
    
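def ResNet10(num_classes=100, batchnorm=False):
    # NOTE: num_classes defaults to 100 output logits, although CIFAR-10 only uses
    # labels 0-9; the ResNet runs below call ResNet10() and so use this default.
    return ResNet(block, [1, 1, 1, 1], num_classes=num_classes, batchnorm=batchnorm)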
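Tracing the spatial size through `ResNet10` shows where the attention block added later operates. A pure-Python sketch, assuming 32×32 CIFAR-10 inputs and the standard Conv2d/MaxPool2d output-size formula (dilation 1, floor rounding):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output-size formula shared by Conv2d and MaxPool2d (dilation 1, floor rounding).
    return (size + 2 * padding - kernel) // stride + 1

def block_out(size, stride):
    # `block`: 3x3 conv with padding=2, then 3x3 conv with padding=0 and the given stride.
    return conv_out(conv_out(size, 3, 1, 2), 3, stride, 0)

h = conv_out(32, kernel=7, stride=2, padding=3)   # conv1: 32 -> 16
h = conv_out(h, kernel=3, stride=2, padding=1)    # maxpool: 16 -> 8
sizes = {}
for name, stride in [('layer1', 1), ('layer2', 1), ('layer3', 1), ('layer4', 2)]:
    h = block_out(h, stride)
    sizes[name] = h
print(sizes)  # {'layer1': 8, 'layer2': 8, 'layer3': 8, 'layer4': 4}
```

Each stride-1 `block` keeps the size because its padding-2 and padding-0 3×3 convolutions cancel out, so `layer2` produces an 8×8 map with 128 channels; an `Attention(128)` appended to it therefore works on 64 positions.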

## Training

In [19]:
learning_rate = 1e-3

model = ResNet10()

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

train(model, optimizer, epochs=10)


Epoch 0, Iteration 0, loss = 4.5606
Checking accuracy on validation set
Got 105 / 1000 correct (10.50)

Epoch 0, Iteration 100, loss = 1.7618
Checking accuracy on validation set
Got 390 / 1000 correct (39.00)

Epoch 0, Iteration 200, loss = 1.3404
Checking accuracy on validation set
Got 403 / 1000 correct (40.30)

Epoch 0, Iteration 300, loss = 1.3388
Checking accuracy on validation set
Got 511 / 1000 correct (51.10)

Epoch 0, Iteration 400, loss = 1.2707
Checking accuracy on validation set
Got 489 / 1000 correct (48.90)

Epoch 0, Iteration 500, loss = 1.3293
Checking accuracy on validation set
Got 527 / 1000 correct (52.70)

Epoch 0, Iteration 600, loss = 1.1351
Checking accuracy on validation set
Got 525 / 1000 correct (52.50)

Epoch 0, Iteration 700, loss = 1.1695
Checking accuracy on validation set
Got 541 / 1000 correct (54.10)

Epoch 1, Iteration 0, loss = 1.3914
Checking accuracy on validation set
Got 578 / 1000 correct (57.80)

Epoch 1, Iteration 100, loss = 1.1982
Checking accuracy on validation set
...

In [20]:
vanillaResnet = model
check_accuracy(loader_test, vanillaResnet)

Checking accuracy on test set
Got 7494 / 10000 correct (74.94)


74.94

## Resnet with Attention 

In [None]:
learning_rate = 1e-3

model = ResNet10()
model.layer2.add_module('attention', Attention(128))
model.layer2.add_module('relu', nn.ReLU())

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

train(model, optimizer, epochs=10)

Epoch 0, Iteration 0, loss = 4.6522
Checking accuracy on validation set
Got 113 / 1000 correct (11.30)

Epoch 0, Iteration 100, loss = 1.6097
Checking accuracy on validation set
Got 367 / 1000 correct (36.70)

Epoch 0, Iteration 200, loss = 1.5243
Checking accuracy on validation set
Got 464 / 1000 correct (46.40)

Epoch 0, Iteration 300, loss = 1.2668
Checking accuracy on validation set
Got 505 / 1000 correct (50.50)

Epoch 0, Iteration 400, loss = 1.3435
Checking accuracy on validation set
Got 459 / 1000 correct (45.90)

Epoch 0, Iteration 500, loss = 1.1794
Checking accuracy on validation set
Got 541 / 1000 correct (54.10)

Epoch 0, Iteration 600, loss = 1.1000
Checking accuracy on validation set
Got 569 / 1000 correct (56.90)

Epoch 0, Iteration 700, loss = 1.1838
Checking accuracy on validation set
Got 559 / 1000 correct (55.90)

Epoch 1, Iteration 0, loss = 1.1745
Checking accuracy on validation set
Got 580 / 1000 correct (58.00)

Epoch 1, Iteration 100, loss = 1.2302
Checking accuracy on validation set
...

## Test set 

In [22]:
AttentionResnet = model
check_accuracy(loader_test, AttentionResnet)

Checking accuracy on test set
Got 7636 / 10000 correct (76.36)


76.36

## Ranking the above models by their performance on the test set

In [23]:
def get_model_def(model_name):
    
    if model_name == 'vanilla_cnn':
        
        return nn.Sequential(
                        nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
                        nn.ReLU(),
                        nn.Conv2d(channel_1, channel_2, 3, padding=1),
                        nn.ReLU(),
                        Flatten(),
                        nn.Linear(channel_2*32*32, num_classes),
                    )
    
    elif model_name == 'early_attention': 
        
        return nn.Sequential(
                            nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
                            nn.ReLU(),
                            Attention(channel_1),
                            nn.ReLU(),
                            nn.Conv2d(channel_1, channel_2, 3, padding=1),
                            nn.ReLU(),
                            Flatten(),
                            nn.Linear(channel_2*32*32, num_classes),
                        )
    
    elif model_name == 'late_attention': 
        
        return nn.Sequential(
                            nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
                            nn.ReLU(),
                            nn.Conv2d(channel_1, channel_2, 3, padding=1),
                            nn.ReLU(),
                            Attention(channel_2),
                            nn.ReLU(),
                            Flatten(),
                            nn.Linear(channel_2*32*32, num_classes),
                        )
    
    elif model_name == 'double_attention': 
        
        return nn.Sequential(
                            nn.Conv2d(3, channel_1, 3, padding=1, stride=1),
                            nn.ReLU(),
                            Attention(channel_1),
                            nn.ReLU(),
                            nn.Conv2d(channel_1, channel_2, 3, padding=1),
                            nn.ReLU(),
                            Attention(channel_2),
                            nn.ReLU(),
                            Flatten(),
                            nn.Linear(channel_2*32*32, num_classes),
                        )
    
    elif model_name == 'vanilla_resnet': 
        
        return ResNet10()
    
    elif model_name == 'resnet_w_attention': 
        
        model = ResNet10()
        model.layer2.add_module('attention', Attention(128))
        model.layer2.add_module('relu', nn.ReLU())
        
        return model
    
    else:
        raise ValueError("Invalid model name: " + model_name)
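The `if`/`elif` chain in `get_model_def` can also be written as a lookup table from names to builder functions. A hypothetical sketch of that pattern (the builders here return placeholder strings instead of the real `nn.Sequential` and `ResNet10` models): each lookup calls the builder, so every training run starts from a freshly constructed model, which the three-run loop below relies on.

```python
# Hypothetical builders standing in for the nn.Sequential / ResNet10 constructions above.
def build_vanilla_cnn():
    return 'vanilla_cnn model'

def build_vanilla_resnet():
    return 'vanilla_resnet model'

MODEL_BUILDERS = {
    'vanilla_cnn': build_vanilla_cnn,
    'vanilla_resnet': build_vanilla_resnet,
}

def get_model_def_from_registry(model_name):
    try:
        builder = MODEL_BUILDERS[model_name]
    except KeyError:
        raise ValueError(f'Invalid model name: {model_name!r}') from None
    return builder()  # a new object on every call, never a shared instance

print(get_model_def_from_registry('vanilla_cnn'))  # vanilla_cnn model
```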

In [24]:
# Train every model three times and record its test accuracy for averaging below

models = ['vanilla_cnn',
          'early_attention',
          'late_attention',
          'double_attention',
          'vanilla_resnet',
          'resnet_w_attention']

model_acc = {}

learning_rate = 1e-3

epochs = 10

for i in models:
    test_acc = []
    for j in range(3):
        model = get_model_def(i)

        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        
        train(model, optimizer, epochs=epochs)
        
        acc = check_accuracy(loader_test, model)
        
        test_acc.append(acc)
        
    model_acc[i] = test_acc

Epoch 0, Iteration 0, loss = 2.3181
Checking accuracy on validation set
Got 148 / 1000 correct (14.80)

Epoch 0, Iteration 100, loss = 1.7987
Checking accuracy on validation set
Got 407 / 1000 correct (40.70)

Epoch 0, Iteration 200, loss = 1.6420
Checking accuracy on validation set
Got 460 / 1000 correct (46.00)

Epoch 0, Iteration 300, loss = 1.5291
Checking accuracy on validation set
Got 472 / 1000 correct (47.20)

Epoch 0, Iteration 400, loss = 1.3316
Checking accuracy on validation set
Got 524 / 1000 correct (52.40)

Epoch 0, Iteration 500, loss = 1.0969
Checking accuracy on validation set
Got 544 / 1000 correct (54.40)

Epoch 0, Iteration 600, loss = 1.3414
Checking accuracy on validation set
Got 550 / 1000 correct (55.00)

Epoch 0, Iteration 700, loss = 1.2150
Checking accuracy on validation set
Got 559 / 1000 correct (55.90)

Epoch 1, Iteration 0, loss = 1.1066
Checking accuracy on validation set
Got 562 / 1000 correct (56.20)

Epoch 1, Iteration 100, loss = 1.1401
Checking accuracy on validation set
...

In [25]:
#Test accuracies of each model
model_acc  

{'vanilla_cnn': [58.79, 56.489999999999995, 58.8],
 'early_attention': [61.09, 60.019999999999996, 60.550000000000004],
 'late_attention': [60.480000000000004, 62.19, 59.599999999999994],
 'double_attention': [63.27, 62.78, 63.39],
 'vanilla_resnet': [76.47, 74.99, 75.76],
 'resnet_w_attention': [75.89, 76.19, 76.01]}

In [26]:
for i in model_acc:
    print("Model: "+i+" Average Accuracy: "+str(round(sum(model_acc[i])/3,2))+"%")

Model: vanilla_cnn Average Accuracy: 58.03%
Model: early_attention Average Accuracy: 60.55%
Model: late_attention Average Accuracy: 60.76%
Model: double_attention Average Accuracy: 63.15%
Model: vanilla_resnet Average Accuracy: 75.74%
Model: resnet_w_attention Average Accuracy: 76.03%
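Since each model was trained three times, the spread across runs is worth reporting next to the mean. A small sketch with Python's `statistics` module, using the `model_acc` values printed above rounded to two decimals:

```python
from statistics import mean, stdev

# Test accuracies (%) from model_acc above, three runs per model, rounded to 2 decimals.
model_acc = {
    'vanilla_cnn': [58.79, 56.49, 58.8],
    'early_attention': [61.09, 60.02, 60.55],
    'late_attention': [60.48, 62.19, 59.6],
    'double_attention': [63.27, 62.78, 63.39],
    'vanilla_resnet': [76.47, 74.99, 75.76],
    'resnet_w_attention': [75.89, 76.19, 76.01],
}

# Rank by mean accuracy; stdev is the sample standard deviation (n - 1 denominator).
for name, accs in sorted(model_acc.items(), key=lambda kv: -mean(kv[1])):
    print(f'{name:20s} mean {mean(accs):6.2f}%  std {stdev(accs):.2f}')
```

On this small sample, the 0.29-point gap between the two ResNet variants is smaller than the vanilla ResNet's run-to-run standard deviation (about 0.74), and the early- and late-attention CNNs (60.55% vs 60.76%) likewise overlap, whereas double attention beats the vanilla CNN by roughly 5 points, well beyond either model's spread.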


Ranking the Models:

|Rank|Model|Average Accuracy|
|----|-------- |-------|
|1.|ResNet with Attention|76.03%|
|2.|Vanilla ResNet|75.74%|
|3.|Double Attention|63.15%|
|4.|Late Attention|60.76%|
|5.|Early Attention|60.55%|
|6.|Vanilla CNN|58.03%|