
# Fine-tuning Pretrained AlexNet in PyTorch

AlexNet is a deep convolutional neural network developed by Alex Krizhevsky and his colleagues in 2012. It was designed to classify images for the ImageNet LSVRC-2010 competition, where it achieved state-of-the-art results. You can read about the model in detail in the original research paper.

We want to **fine-tune an AlexNet model, starting from the AlexNet weights pre-trained on ImageNet**. ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.

Let's start by loading and then pre-processing the data. For our purposes, we will be using the CIFAR10 dataset. The dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

![CIFAR10](https://drive.google.com/uc?export=view&id=13c1WiUPbOP_RbYcABeAJzBCBC0pifrnZ)

## Importing Libraries

The notebook trains the model on a GPU if one is available and falls back to the CPU otherwise.

In [None]:
import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler
from datetime import datetime

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## Loading the CIFAR10 Dataset

Using torchvision (a helper library for computer vision tasks), we will load our dataset. The library provides helper functions that make pre-processing easy and straightforward. Let's define the functions `get_train_valid_loader` and `get_test_loader`, and then call them to load and process our CIFAR-10 data:

In [None]:
def get_train_valid_loader(
    data_dir, batch_size, augment, random_seed, valid_size=0.1, shuffle=True
):
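    # Use the ImageNet statistics that the pretrained weights expect,
    # so that training data is normalized the same way as in get_test_loader.
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )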

    # define transforms
    transform = transforms.Compose(
        [
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize,
        ]
    )

    # load the dataset
    train_dataset = datasets.CIFAR10(
        root=data_dir,
        train=True,
        download=True,
        transform=transform,
    )

    valid_dataset = datasets.CIFAR10(
        root=data_dir,
        train=True,
        download=True,
        transform=transform,
    )

    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler
    )

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler
    )

    return (train_loader, valid_loader)


def get_test_loader(data_dir, batch_size, shuffle=True):
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )

    # define transform
    transform = transforms.Compose(
        [
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize,
        ]
    )

    dataset = datasets.CIFAR10(
        root=data_dir,
        train=False,
        download=True,
        transform=transform,
    )

    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=shuffle
    )

    return data_loader


# CIFAR10 dataset
train_loader, valid_loader = get_train_valid_loader(
    data_dir="./data", batch_size=64, augment=False, random_seed=1
)

test_loader = get_test_loader(data_dir="./data", batch_size=64)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:04<00:00, 35551477.41it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Files already downloaded and verified
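
As a quick sanity check on the split performed by `get_train_valid_loader`, the sketch below reproduces the arithmetic in plain Python. The helper name `split_sizes` is made up for illustration, and it assumes the `DataLoader` default of keeping the last partial batch.

```python
import math


def split_sizes(num_images, valid_size, batch_size):
    """Return (train_count, valid_count, train_steps, valid_steps)."""
    # Same rule as in get_train_valid_loader: floor(valid_size * num_train)
    valid_count = int(math.floor(valid_size * num_images))
    train_count = num_images - valid_count
    # The last partial batch is kept (assumed default), hence the ceiling.
    train_steps = math.ceil(train_count / batch_size)
    valid_steps = math.ceil(valid_count / batch_size)
    return train_count, valid_count, train_steps, valid_steps


# 50000 CIFAR-10 training images, 10% held out, batches of 64
print(split_sizes(50000, 0.1, 64))
```

With these settings the training loader should yield 704 batches per epoch and the validation set should hold 5000 images.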


## Building AlexNet

- The first step in defining any neural network (whether a CNN or not) in PyTorch is to define a class that inherits from `nn.Module`, which provides many of the methods we will need
- Two main steps follow. The first is to initialize the layers we are going to use inside `__init__`; the second is to define the order in which those layers process the image, which goes inside the `forward` function
- For the architecture itself, we first define the convolutional layers using `nn.Conv2d` with the appropriate kernel size and input/output channels. We also apply max pooling using `nn.MaxPool2d`. The nice thing about PyTorch is that we can group the convolutional layers, activation functions, and max pooling into a single block with `nn.Sequential` (they are still applied one after another, but it helps with organization)
- Then we define the fully connected layers using `nn.Linear` and dropout (`nn.Dropout`) along with the ReLU activation function (`nn.ReLU`), and combine these with `nn.Sequential` as well
- Finally, the last layer outputs `num_classes` neurons: 1000 to match the ImageNet-pretrained weights, which we later replace with 10 outputs, one for each of the 10 CIFAR-10 classes
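
To see why the first fully connected layer takes `256 * 6 * 6` inputs, the following sketch traces the spatial size of a 227x227 image through the convolution and pooling settings used in this notebook's `AlexNet` class. It is plain-Python arithmetic rather than a PyTorch call; the helper names `out_size` and `alexnet_feature_size` are made up for illustration, and the formula assumes floor rounding and dilation 1.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Side length after a conv or pooling layer (floor mode, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def alexnet_feature_size(n):
    """Trace the side length of a square input through the `features` stack."""
    n = out_size(n, kernel=11, stride=4, padding=2)  # conv 3 -> 64
    n = out_size(n, kernel=3, stride=2)              # max pool
    n = out_size(n, kernel=5, padding=2)             # conv 64 -> 192
    n = out_size(n, kernel=3, stride=2)              # max pool
    n = out_size(n, kernel=3, padding=1)             # conv 192 -> 384
    n = out_size(n, kernel=3, padding=1)             # conv 384 -> 256
    n = out_size(n, kernel=3, padding=1)             # conv 256 -> 256
    n = out_size(n, kernel=3, stride=2)              # max pool
    return n


side = alexnet_feature_size(227)
print(side, 256 * side * side)  # 6 9216
```

A 224x224 input gives the same 6x6 map under this formula, so the adaptive average pooling layer leaves the size unchanged in both cases.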


In [None]:
class AlexNet(nn.Module):
    def __init__(self, num_classes = 1000, dropout = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

## Setting Hyperparameters

Before training, we need to set the hyperparameters (batch size, learning rate, and number of epochs) and choose the loss function and the optimizer.
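
For intuition about the loss function, here is a minimal plain-Python sketch of cross-entropy for a single sample: the negative log-softmax score of the true class. The function name `cross_entropy` is illustrative only; `nn.CrossEntropyLoss` additionally averages over the mini-batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that scores every class equally
print(round(cross_entropy([0.0] * 10, target=3), 4))  # 2.3026, i.e. ln(10)
```

A loss near `ln(10)` therefore means the 10-class model is doing no better than guessing, which gives a reference point for reading the training log.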

In [None]:
# +++++++++++++++++++++++++++++++++++++
imagenet_num_classes = 1000
cifar_num_classes = 10
# +++++++++++++++++++++++++++++++++++++

num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = AlexNet(imagenet_num_classes).to(device)


# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay = 0.005, momentum = 0.9)
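
The optimizer settings above can be unpacked with a scalar sketch of one update step. This assumes the usual formulation of SGD with L2 weight decay and non-Nesterov momentum with zero dampening; `sgd_step` is a made-up helper, not a PyTorch function.

```python
def sgd_step(param, grad, buf, lr=0.005, momentum=0.9, weight_decay=0.005):
    """One scalar SGD step with L2 weight decay and plain momentum."""
    g = grad + weight_decay * param                   # weight decay adds wd * param
    buf = g if buf is None else momentum * buf + g    # momentum buffer
    return param - lr * buf, buf


p, buf = 1.0, None
for _ in range(3):
    # A constant gradient is used here purely for illustration.
    p, buf = sgd_step(p, grad=2.0, buf=buf)
print(p, buf)
```

With a constant gradient the buffer grows from step to step, so the effective step size increases until the momentum term saturates.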

## Loading Pretrained Weights on ImageNet

The pre-trained model expects input images normalized in the same way as during its ImageNet training, i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least 224. The images have to be loaded into a range of `[0, 1]` and then normalized using mean = `[0.485, 0.456, 0.406]` and std = `[0.229, 0.224, 0.225]`.
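
As an illustration of this normalization, the sketch below applies it to a single RGB pixel in plain Python. The helper names are made up; `to_unit_range` mirrors the scaling of 8-bit values to `[0, 1]`, and `normalize_pixel` mirrors the per-channel `(value - mean) / std` step.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def to_unit_range(rgb_uint8):
    """Scale 8-bit channel values from [0, 255] to [0, 1]."""
    return [v / 255.0 for v in rgb_uint8]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one [0, 1] RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# A pure white pixel ends up a bit more than two standard deviations above the mean.
print(normalize_pixel(to_unit_range([255, 255, 255])))
```

A pixel equal to the channel means maps to zero in every channel, which is the input distribution the pretrained convolutional filters were fitted to.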

In [None]:
!wget https://download.pytorch.org/models/alexnet-owt-7be5be79.pth

--2023-06-07 01:05:42--  https://download.pytorch.org/models/alexnet-owt-7be5be79.pth
Resolving download.pytorch.org (download.pytorch.org)... 13.249.85.22, 13.249.85.10, 13.249.85.72, ...
Connecting to download.pytorch.org (download.pytorch.org)|13.249.85.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244408911 (233M) [application/x-www-form-urlencoded]
Saving to: ‘alexnet-owt-7be5be79.pth.2’


2023-06-07 01:05:49 (34.4 MB/s) - ‘alexnet-owt-7be5be79.pth.2’ saved [244408911/244408911]



In [None]:
state_dict = torch.load("/content/alexnet-owt-7be5be79.pth")
model.load_state_dict(state_dict)

<All keys matched successfully>

### The Pretrained AlexNet Architecture on ImageNet

The pretrained AlexNet outputs 1000 classes by default, matching ImageNet, so we have to adjust the network structure to output the 10 CIFAR-10 classes instead.

In [None]:
# before modification
model

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

In [None]:
# We can choose to replace all layers in the classifier, or just replace the last layer.
modify_whole_classifier = False

if modify_whole_classifier:
    model.classifier = nn.Sequential()
    model.classifier.add_module("0", nn.Dropout(p=0.5))
    model.classifier.add_module("1", nn.Linear(256 * 6 * 6, 4096))
    model.classifier.add_module("2", nn.ReLU(inplace=True))
    model.classifier.add_module("3", nn.Dropout(p=0.5))
    model.classifier.add_module("4", nn.Linear(4096, 4096))
    model.classifier.add_module("5", nn.ReLU(inplace=True))
    model.classifier.add_module("6", nn.Linear(4096, cifar_num_classes))
    model.to(device)
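else:
    model.classifier[6] = nn.Linear(4096, cifar_num_classes)
    model.to(device)

# Rebuild the optimizer so that it tracks the newly created layer(s);
# an optimizer created earlier only holds the parameters of the replaced modules.
optimizer = torch.optim.SGD(
    model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9
)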
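
To get a feel for how much of the network each option re-initializes, the following sketch counts the parameters of the layers involved. It is plain arithmetic on the layer shapes used above; `linear_params` is a made-up helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


old_head = linear_params(4096, 1000)   # original ImageNet output layer
new_head = linear_params(4096, 10)     # replacement CIFAR-10 output layer
whole_classifier = (
    linear_params(256 * 6 * 6, 4096)   # classifier[1]
    + linear_params(4096, 4096)        # classifier[4]
    + new_head                         # classifier[6]
)
print(old_head, new_head, whole_classifier)  # 4097000 40970 54575114
```

Replacing only the last layer starts about 41 thousand parameters from scratch and keeps the pretrained fully connected weights, while replacing the whole classifier discards roughly 54.6 million of them.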

In [None]:
# after modification
model

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

## Training

We are ready to train our model at this point:

In [None]:
total_step = len(train_loader)

for epoch in range(num_epochs):
    model.train()  # enable dropout while training
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Report the loss of the last mini-batch of the epoch
    print('{} - Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
          .format(str(datetime.now()), epoch+1, num_epochs, i+1, total_step, loss.item()))

    # Validation
    model.eval()  # disable dropout while evaluating
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            # The predicted class is the index of the largest logit
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            del images, labels, outputs

        print('Accuracy of the network on the {} validation images: {} %'.format(total, 100 * correct / total))

Epoch [1/20], Step [704/704], Loss: 0.8373
Accuracy of the network on the 5000 validation images: 80.28 %
Epoch [2/20], Step [704/704], Loss: 0.7658
Accuracy of the network on the 5000 validation images: 83.44 %
Epoch [3/20], Step [704/704], Loss: 0.0689
Accuracy of the network on the 5000 validation images: 84.78 %
Epoch [4/20], Step [704/704], Loss: 0.3098
Accuracy of the network on the 5000 validation images: 83.78 %
Epoch [5/20], Step [704/704], Loss: 0.2478
Accuracy of the network on the 5000 validation images: 84.08 %
Epoch [6/20], Step [704/704], Loss: 0.0839
Accuracy of the network on the 5000 validation images: 81.9 %
Epoch [7/20], Step [704/704], Loss: 0.1804
Accuracy of the network on the 5000 validation images: 82.72 %
Epoch [8/20], Step [704/704], Loss: 0.5683
Accuracy of the network on the 5000 validation images: 82.76 %
Epoch [9/20], Step [704/704], Loss: 0.4275
Accuracy of the network on the 5000 validation images: 80.3 %
Epoch [10/20], Step [704/704], Loss: 0.1408
Accu

## Test

Now, we see how our model performs on unseen data:

In [None]:
model.eval()  # disable dropout while evaluating
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # The predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        del images, labels, outputs

    print('Accuracy of the network on the {} test images: {} %'.format(total, 100 * correct / total))

Accuracy of the network on the 10000 test images: 82.83 %
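
The accuracy figure is the share of images whose highest-scoring class matches the label. A minimal plain-Python sketch of that computation on made-up logits (the `accuracy` helper and the toy values are illustrative, not taken from the model):

```python
def accuracy(logits_batch, labels):
    """Percentage of rows whose highest-scoring class matches the label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        # Index of the largest logit, like torch.max(outputs, 1) for one row
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    return 100 * correct / len(labels)


toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 4.0, 0.5], [1.0, 0.9, 0.8]]
toy_labels = [0, 2, 1, 2]
print(accuracy(toy_logits, toy_labels))  # 75.0
```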


## Resources

- [Writing AlexNet from Scratch in PyTorch](https://blog.paperspace.com/alexnet-pytorch/#training)
- [AlexNet | PyTorch](https://pytorch.org/hub/pytorch_vision_alexnet/)
- [Classify CIFAR10 images using pretrained AlexNet with PyTorch](https://www.youtube.com/watch?v=BrwJp-JuIOw)
