# Convolutional Neural Network with PyTorch

In the first part of the MP, you have successfully implemented a convolutional neural network from scratch using NumPy. Now that you have a good understanding of the fundamentals and underlying concepts, it's time to dive into the world of deep learning frameworks.

In this part, you will learn how to implement a convolutional neural network using **[PyTorch](https://pytorch.org/)**. You will learn how to define a network architecture, instantiate a network object, train the network, and evaluate the network on the test data. We will also cover essential topics such as automatic differentiation, loss functions, and optimizers. By the end of this part, you will be able to implement, train, and evaluate neural networks using PyTorch with ease.

Get ready to explore the exciting world of PyTorch and enhance your deep learning skills!

### Submission Instructions

- You can convert this notebook into a Python file and submit it to Gradescope for manual grading. For example, you can use the menu `File -> Download as -> Python (.py)` (Jupyter Notebook) or `File -> Export Notebook As -> Export Notebook to Executable Script` (JupyterLab) to convert this notebook to a Python file. Make sure to clean up the Python script before submitting it to Gradescope. We should be able to run it simply via `python <script_name.py>`.
- When submitting the prediction and the Python files, the files **must** be named exactly as instructed.
- You can make multiple submissions, but only the **latest** score will be used. Keep a copy of your previous predictions in case you want to revert to them. We are not policing the number of submissions you make to the test set, but we request that you limit yourself to one submission per day. You shouldn't be tuning your model on the test set.


## 0. Prerequisites

### Installing PyTorch

Before we dive into the main part of the MP, let's first make sure that you have all the necessary packages installed. If you are using Google Colab, you can skip this section as the packages are already installed. If you are using your local machine, follow the official instructions [here](https://pytorch.org/get-started/locally/). Install the GPU-compatible version if you have a compatible NVIDIA GPU. You will also need to install these additional packages: `torch torchvision Pillow tqdm matplotlib`. Run the following cell to test your installation (and import other necessary packages):


In [None]:
import os
import json
import random

from PIL import Image
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision import transforms as T
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.utils import make_grid
from matplotlib import pyplot as plt

print(f"PyTorch version: {torch.__version__}")
print(f"torchvision version: {torchvision.__version__}")

### Hardware Acceleration

Graphics Processing Units (GPUs) are specialized hardware devices that can significantly speed up the training and inference of deep learning models. We strongly recommend using a GPU for this MP to save training time, though it is not required.

If you are using Google Colab, you can enable free GPU acceleration by navigating to `Runtime -> Change runtime type -> Hardware accelerator -> GPU`. If you are using your local machine, you must have a compatible GPU. Run the following cell to check if you have a compatible GPU (`cuda` for GPU):


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

### Downloading the Dataset

In this MP, we will be working with the `AnimalDataset`. The dataset contains 10 different kinds of animals, and each data point is a 256 by 256 RGB image. The images are cropped from the [MS-COCO dataset](https://cocodataset.org/#home). The train, valid, and test splits contain 5347, 1330, and 1332 images, respectively.

If you are on a Unix-based system (macOS or Linux or Google Colab), you can run the following cell to download the dataset. If you are using Windows, you should manually download the dataset from [here](https://saurabhg.web.illinois.edu/teaching/cs444/fa2023/mp2/dataset-v1.tgz) and extract the compressed file to the current directory. You should see a `dataset-v1` folder containing the dataset.


In [None]:
!wget https://saurabhg.web.illinois.edu/teaching/cs444/fa2023/mp2/dataset-v1.tgz -O dataset-v1.tgz
!tar -xf dataset-v1.tgz

## 1. Dataset

As you embark on your deep learning journey with PyTorch, one of the essential components you will encounter is the Dataset class. Datasets play a crucial role in the development of machine learning models, as they provide the necessary information for training, validation, and testing. In PyTorch, the Dataset class simplifies the process of handling data by offering a unified and efficient way to manage and preprocess your data.

The PyTorch Dataset class is an abstract class, and to utilize it, you will need to create a custom dataset by subclassing and implementing two key methods: `__len__()` and `__getitem__()`. The `__len__()` method returns the size of the dataset, while the `__getitem__()` method retrieves a sample from the dataset given an index. By providing a standardized interface for accessing your data, the Dataset class enables seamless integration with other PyTorch components, such as DataLoader, which streamlines the process of loading and batching your data during training.


In [None]:
class AnimalDataset(Dataset):
    """
    Dataset containing 10 different kinds of animals. Each data point is a 256 by 256 RGB image.
    The images are cropped from the MS-COCO dataset. (https://cocodataset.org/#home)
    The train, valid, and test splits contain 5347, 1330, and 1332 images respectively.
    """

    classes = (
        "bird",
        "cat",
        "dog",
        "horse",
        "sheep",
        "cow",
        "elephant",
        "bear",
        "zebra",
        "giraffe",
    )

    def __init__(self, root, split, transform=None):
        """
        Args:
            root (str): The root directory of the dataset.
            split (str): The split to use. Can be 'train', 'val', or 'test'.
            transform (callable, optional): A function/transform that takes in a PIL image
                and returns a transformed version. See `torchvision.transforms` for examples.
        """
        self.root = root
        self.split = split
        self.transform = transform

        with open(os.path.join(root, "labels", f"{split}.json")) as f:
            self.image_list = list(json.load(f).items())

    def __len__(self):
        """
        This method is called when you do len(dataset) to get the size of the dataset.
        Usually you should implement this method.
        Returns:
            The number of data points in the split.
        """
        return len(self.image_list)

    def __getitem__(self, idx):
        """
        This method returns a data point specified by the index.
        You MUST implement this method when inheriting `Dataset`.
        Args:
            idx (int): The index of the data point to get.
        Returns:
            image (Tensor): The image of the data point.
            label (int): The label of the data point.
        """
        image_path, label = self.image_list[idx]
        image = Image.open(os.path.join(self.root, "images", f"{image_path}.jpg"))
        if self.transform is not None:
            image = self.transform(image)
        return image, label

In [None]:
root = "dataset-v1"
# This transforms the PIL images to PyTorch tensors
transform = T.Compose(
    [
        T.ToTensor(),
    ]
)
test_transform = T.Compose(
    [
        T.ToTensor(),
    ]
)
train_dataset = AnimalDataset(root=root, split="train", transform=transform)
valid_dataset = AnimalDataset(root=root, split="val", transform=test_transform)
print(f"Train dataset size: {len(train_dataset)}")
print(f"Valid dataset size: {len(valid_dataset)}")

In [None]:
# Visualize some examples from the dataset.
# You can run this cell multiple times to see different examples!
n_samples = 64
indices = random.choices(range(len(train_dataset)), k=n_samples)
sample_img = [train_dataset[i][0] for i in indices]
grid_img = make_grid(sample_img, nrow=8)
plt.figure(figsize=(10, 10))
plt.imshow(grid_img.permute(1, 2, 0))
plt.axis("off")
plt.title("Training Samples")
plt.show()

## 2. DataLoader

The PyTorch `DataLoader` works hand-in-hand with the `Dataset` class. By wrapping a `Dataset` object with a `DataLoader`, you can easily automate essential tasks such as shuffling, batching, and parallel processing. This not only saves you time and effort but also ensures that your data is loaded and prepared optimally for the training process.

One of the key features of the `DataLoader` is its support for multiprocessing. By utilizing multiple workers, `DataLoader` can efficiently parallelize the loading and preprocessing of your data, significantly reducing the overall time it takes to prepare your data for training. This is especially useful when dealing with large datasets or complex data preprocessing pipelines.
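
The batching behaviour described above can be tried without any image files. The following is a minimal sketch, assuming a hypothetical in-memory `ToyDataset` (not part of this MP) that holds ten blank 3x8x8 "images"; only the `Dataset` and `DataLoader` classes themselves come from PyTorch.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 10 blank 3x8x8 images with labels 0..9."""

    def __init__(self):
        self.images = torch.zeros(10, 3, 8, 8)
        self.labels = list(range(10))

    def __len__(self):
        # Called by len(dataset) and used by the DataLoader to plan batches.
        return len(self.labels)

    def __getitem__(self, idx):
        # Returns one (image, label) pair, like AnimalDataset does.
        return self.images[idx], self.labels[idx]


# num_workers=0 loads data in the main process; a positive value would
# start that many worker processes to load batches in parallel.
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False, num_workers=0)

# Collect the shape of every (images, labels) batch the loader yields.
batch_shapes = [(tuple(images.shape), tuple(labels.shape)) for images, labels in loader]
print(batch_shapes)
```

Ten samples with `batch_size=4` give three batches, and the last one is smaller than the others. Passing `drop_last=True` to the `DataLoader` would discard that incomplete batch instead.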

Run the following cell to create a `DataLoader` for the training and validation sets.


In [None]:
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
sample_img, sample_lbl = next(iter(train_loader))
print(f"Image batch shape: {sample_img.size()}")
print(f"Label batch shape: {sample_lbl.size()}")

## 3. Model

As you delve deeper into deep learning with PyTorch, you will come across one of its most powerful and versatile components: the `nn.Module` class. This class serves as the foundation for creating and managing neural networks in PyTorch, providing a flexible and modular approach to building a wide variety of neural network architectures.

The `nn.Module` class is an abstract base class, and to leverage its power, you will need to subclass it and define your custom neural network layers and components. By implementing the `__init__()` method, you can initialize the layers and parameters of your network, while the `forward()` method defines the forward pass of your model, specifying how the input data flows through the layers to produce the output.

One of the key advantages of the `nn.Module` class is its support for automatic differentiation and backpropagation. By encapsulating your model within an `nn.Module`, you can seamlessly integrate with PyTorch's autograd system, allowing you to perform gradient-based optimization with minimal effort. Furthermore, the `nn.Module` class provides built-in methods for parameter management, serialization, and device handling, making it easy to work with complex models in a distributed or GPU-accelerated environment.
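
To make the autograd integration concrete, here is a small sketch on toy tensors (the values are chosen only for illustration). It first differentiates a plain tensor expression, then shows that the parameters of an `nn.Module` receive gradients in the same way.

```python
import torch
import torch.nn as nn

# 1. Plain tensors: the gradient of sum(x^2) with respect to x is 2x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)

# 2. Module parameters: for out = w . x + b, d(out)/dw = x and d(out)/db = 1.
layer = nn.Linear(3, 1)
out = layer(x.detach()).sum()
out.backward()
print(layer.weight.grad)
print(layer.bias.grad)
```

Gradients accumulate in `.grad` across calls to `backward()`, which is why a training loop clears them before each step.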

By mastering `nn.Module`, you will be well-equipped to tackle diverse deep learning challenges and create state-of-the-art models with ease. In this introduction to PyTorch's `nn.Module`, we will guide you through the process of creating and customizing your neural networks using this powerful class. You will define a Convolutional Neural Network (CNN) by **implementing the `__init__()` and `forward()` methods**. Check PyTorch's official documentation at https://pytorch.org/docs/stable/nn.html for more information on the `nn.Module` class and the many built-in layers.

In this cell, we define a simple multi-layer linear classifier (multi-layer perceptron, or MLP) for you to get started. You will implement a convolutional neural network in a later section.


In [None]:
class MLP(nn.Module):
    """
    A simple multi-layer perceptron for classifying images in the AnimalDataset.
    All models in PyTorch should inherit from `nn.Module`, which provides functionality
    for automatic differentiation and weight management.
    """

    def __init__(self):
        """
        Define the layers used in the model.
        """
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(3 * 256 * 256, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        """
        Forward pass of the model.
        Apply the layers defined in `__init__`.
        Args:
            x (Tensor): The input tensor of shape (N, 3, 256, 256).
        Returns:
            output (Tensor): The output tensor of shape (N, 10).
        """
        x = x.view(-1, 3 * 256 * 256)
        x = self.fc(x)
        return x

In [None]:
# Create a model instance and move it to the GPU if available
model = MLP().to(device)
print(model)
print(f"Model has {sum(p.numel() for p in model.parameters())} parameters.")

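The parameter count printed above is dominated by the first fully connected layer. As a rough illustration, the sketch below counts the parameters of that layer and compares it with a single 5x5 convolution; the helper `count_parameters` is a hypothetical name introduced here, not part of the MP.

```python
import torch.nn as nn


def count_parameters(module):
    """Total number of elements across all parameters of a module."""
    return sum(p.numel() for p in module.parameters())


# First MLP layer: one weight per (input pixel value, hidden unit) pair, plus biases.
fc = nn.Linear(3 * 256 * 256, 128)
fc_params = count_parameters(fc)

# A 5x5 convolution from 3 to 8 channels shares its weights across all positions.
conv = nn.Conv2d(3, 8, kernel_size=5, padding=2)
conv_params = count_parameters(conv)

print(f"Linear(196608, 128): {fc_params} parameters")
print(f"Conv2d(3, 8, 5):     {conv_params} parameters")
```

The linear layer needs 196608 * 128 + 128 parameters, while the convolution needs only 3 * 8 * 5 * 5 + 8, which is one reason convolutional layers suit image inputs.
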
## 4. Loss Function & Optimizer & Scheduler

Just as you have done in the previous part of this MP, we will need to define a loss function, an optimizer, and an (optional) scheduler to train our model. Luckily, PyTorch provides a wide variety of built-in [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions), [optimizers](https://pytorch.org/docs/stable/optim.html), and [schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate), making it easy to implement and experiment with different combinations. Run the following cell to create these instances.


In [None]:
lr = 1e-3
gamma = 0.9

# Define the loss function
criterion = nn.CrossEntropyLoss()
# Define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# Define the learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)

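Two details of these objects are easy to check on toy data. The sketch below, which uses random logits and a dummy parameter purely for illustration, shows that `nn.CrossEntropyLoss` expects raw logits (it applies log-softmax internally) and that `StepLR` with `step_size=1` multiplies the learning rate by `gamma` after every epoch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# CrossEntropyLoss on raw logits equals log-softmax followed by NLL loss,
# so the model's last layer should not apply softmax itself.
logits = torch.randn(4, 10)
labels = torch.tensor([0, 3, 9, 1])
ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))

# StepLR schedule on a dummy parameter: lr at epoch k is lr0 * gamma ** k.
param = nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

lrs = []
for epoch in range(5):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.zero_grad()
    (param ** 2).sum().backward()
    optimizer.step()   # update parameters first ...
    scheduler.step()   # ... then decay the learning rate once per epoch
print(lrs)
```
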
## 5. Training & Validation

Now that we have defined our model, loss function, optimizer, and scheduler, we are ready to train our model! The training and validation scheme is essentially the same as you have implemented in the first part of this MP.


In [None]:
def evaluate(model, data_loader):
    """Evaluate the model on the given dataset."""
    # Set the model to evaluation mode.
    model.eval()
    correct = 0
    # The `torch.no_grad()` context will turn off gradients for efficiency.
    with torch.no_grad():
        for images, labels in tqdm(data_loader):
            images, labels = images.to(device), labels.to(device)
            output = model(images)
            pred = output.argmax(dim=1)
            correct += (pred == labels).sum().item()
    return correct / len(data_loader.dataset)


def train(model, n_epoch, optimizer, scheduler):
    """Train the model on the given dataset."""
    for epoch in range(n_epoch):
        # Set the model to training mode.
        model.train()
        for step, (images, labels) in enumerate(train_loader):
            # 0. Prepare the data. Move the data to the device (CPU/GPU).
            images, labels = images.to(device), labels.to(device)
            # 1. Clear previous gradients.
            optimizer.zero_grad()
            # 2. Forward pass. Calculate the output of the model.
            output = model(images)
            # 3. Calculate the loss.
            loss = criterion(output, labels)
            # 4. Calculate the gradients. PyTorch does this for us!
            loss.backward()
            # 5. Update the model parameters.
            optimizer.step()
            if step % 10 == 0:
                print(f"Epoch {epoch}, Step {step}, Loss {loss.item():.4f}")
        # 6. (Optional) Update the learning rate.
        scheduler.step()
        acc = evaluate(model, valid_loader)
        print(f"Epoch {epoch}, Valid Accuracy {acc * 100:.2f}%")

In [None]:
# Train the model for 10 epochs.
# You should get an accuracy of about 30% on the validation set after training.
n_epoch = 10
train(model, n_epoch, optimizer, scheduler)

## 6. Convolutional Neural Network

In the previous section, you successfully trained an MLP model on the AnimalDataset. However, the performance of the model is still far from satisfactory. In this section, you will implement a convolutional neural network using the pre-defined layers in PyTorch. **Complete the `__init__` and `forward` methods**. You may refer to the [official documentation](https://pytorch.org/docs/stable/nn.html#conv2d) for more information on the `nn.Conv2d` layer.
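
When working out layer shapes, it can help to do the arithmetic before writing any layers. The sketch below uses the standard output-size formula for convolution and pooling without dilation, `floor((size + 2 * padding - kernel) / stride) + 1`, and applies it to the layer settings listed in the `CNN` docstring. The helper `out_size` is a hypothetical name used only for this illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256
# (kernel, padding) of the three stride-1 convolutions in the specification.
# Each convolution is followed by a max pool with kernel 4 and stride 4.
for conv_kernel, conv_padding in [(5, 2), (5, 2), (3, 1)]:
    size = out_size(size, conv_kernel, stride=1, padding=conv_padding)
    size = out_size(size, kernel=4, stride=4)
    print(size)  # prints 64, then 16, then 4

# The last convolution has 32 output channels.
flattened = 32 * size * size
print(flattened)  # prints 512
```

The padded convolutions keep the spatial size unchanged, so only the pooling layers shrink the feature map: 256 to 64 to 16 to 4, which matches the flattened shape `(N, 32 * 4 * 4)`.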


In [None]:
class CNN(nn.Module):
    """
    A simple CNN for classifying images in the AnimalDataset.
    """

    def __init__(self):
        """
        Define the layers used in the model.
        Hints:
        - Check out `nn.Conv2d`, `nn.MaxPool2d`, `nn.Linear`, `nn.ReLU`
            in the PyTorch documentation.
        - You may use `nn.Sequential` to chain multiple layers together.
        - Be careful about the input and output shapes of the layers! Print `x.size()` if unsure.

        1. 1st CNN layer:
            - 2D Convolutional with input channels 3, output channels 8, kernel size 5, stride 1, and padding 2.
            - ReLU activation.
            - 2D Max pooling with kernel size 4 and stride 4.
        2. 2nd CNN layer:
            - 2D Convolutional with input channels 8, output channels 16, kernel size 5, stride 1, and padding 2.
            - ReLU activation.
            - 2D Max pooling with kernel size 4 and stride 4.
        3. 3rd CNN layer:
            - 2D Convolutional with input channels 16, output channels 32, kernel size 3, stride 1, and padding 1.
            - ReLU activation.
            - 2D Max pooling with kernel size 4 and stride 4.
        4. A flatten layer. The flattened feature should have shape (N, 32 * 4 * 4).
        5. A fully connected layer with 256 output units and ReLU activation.
        6. A fully connected layer with 10 output units.
        """
        super().__init__()

    def forward(self, x):
        """
        Forward pass of the model.
        Apply the layers defined in `__init__` in order.
        Args:
            x (Tensor): The input tensor of shape (N, 3, 256, 256).
        Returns:
            output (Tensor): The output tensor of shape (N, 10).
        """
        pass

Run the following cell to test your implementation.


In [None]:
# Create a model instance and move it to the GPU if available
model = CNN().to(device)
print(model)
print(f"Model has {sum(p.numel() for p in model.parameters())} parameters.")

dummy_input = torch.randn(1, 3, 256, 256, device=device, dtype=torch.float)
output = model(dummy_input)
assert output.size() == (1, 10), f"Expected output size (1, 10), got {output.size()}!"
print("Test passed!")

In [None]:
lr = 1e-3
gamma = 0.9
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)

# Train the model for 10 epochs.
# You should get an accuracy of about 50%.
n_epoch = 10
train(model, n_epoch, optimizer, scheduler)

## 7. Hyperparameter Tuning

Although the CNN significantly improves on the MLP, its performance is still far from satisfactory. In this section, we will explore how to tune the hyperparameters of the model to achieve better performance. You may experiment with the following settings:

- **Model architecture**: You may add more convolutional layers to the model or increase the number of hidden channels.
- **Training hyperparameters**: You may try different hyperparameters including learning rate and batch size.
- **Data augmentation**: You may apply data augmentation techniques such as random cropping, random flipping, and random color jittering to the training set. See [here](https://pytorch.org/vision/stable/transforms.html) for more information.
- **Batch normalization**: You may add batch normalization layers to the model to improve the training process. See `nn.BatchNorm2d` for more information.

  7.1 **[2 pts Autograded]**
  You should be able to achieve an accuracy **above 60%** after tuning the hyperparameters. After you are satisfied with the performance of your model, run the following cells to generate predicted labels for the test set. Note that all labels we provided for the test set were set to -1. **Submit the output prediction with the name `pred_custom_cnn.txt` on Gradescope** to obtain its performance on the test set. Also, **upload a `script_custom_cnn.py` file of your CNN model architecture on Gradescope**. Feel free to experiment with different settings and see how the performance changes. **You are not allowed to load any pre-trained model in this section**.

  7.2 **[2 pts Manually Graded]**
  Document the hyperparameters and/or improvement techniques you applied in your report and discuss your findings. Include _control experiments_ that measure the effectiveness of each aspect that leads to large improvements. For example, if you are trying to improve the performance of your model by adding more convolutional layers, you should include a control experiment that measures the performance of the model with and without the additional convolutional layers. It is insightful to do backward ablations: starting with your final model, remove each modification you made one at a time to measure its contribution to the final performance. Consider presenting your results in tabular form along with a discussion of the results.


In [None]:
# Modify the model architecture, training hyperparameters, and use data augmentation techniques etc. to improve the performance.
# Dataset
root = "dataset-v1"
transform = T.Compose(
    [
        T.ToTensor(),
    ]
)
test_transform = T.Compose(
    [
        T.ToTensor(),
    ]
)
train_dataset = AnimalDataset(root=root, split="train", transform=transform)
valid_dataset = AnimalDataset(root=root, split="val", transform=test_transform)
print(f"Train dataset size: {len(train_dataset)}")
print(f"Valid dataset size: {len(valid_dataset)}")

# Model
model = CNN().to(device)
print(model)
print(f"Model has {sum(p.numel() for p in model.parameters())} parameters.")
lr = 1e-3
gamma = 0.9
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)

# Training
n_epoch = 10
train(model, n_epoch, optimizer, scheduler)

In [None]:
def inference(model, data_loader, output_fn="predictions.txt"):
    """Generate predicted labels for the test set."""
    model.eval()
    predictions = []
    with torch.no_grad():
        for images, _ in tqdm(data_loader):
            images = images.to(device)
            output = model(images)
            pred = output.argmax(dim=1)
            predictions.extend(pred.cpu().numpy())
    with open(output_fn, "w") as f:
        for pred in predictions:
            f.write(f"{pred}\n")
    print(f"Predictions saved to {output_fn}")
    return predictions


test_dataset = AnimalDataset(root=root, split="test", transform=test_transform)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
inference(model, test_loader, "pred_custom_cnn.txt")

## 8. Finetuning Pre-Trained ResNet-18 Model

In the previous section, you have successfully trained a CNN model from scratch. However, training a model from scratch is often time-consuming and computationally expensive. Luckily, we can leverage the power of transfer learning to speed up the training process and achieve better performance. In this section, we will explore how to finetune a pre-trained model using PyTorch.

In this section, you can use the [ResNet-18](https://arxiv.org/abs/1512.03385) model pre-trained on the [ImageNet](https://www.image-net.org/) dataset. Finetune the model on the AnimalDataset and evaluate the model on the test set. See [here](https://pytorch.org/vision/stable/models.html) for more information on the pre-trained models provided by torchvision.

8.1 **[2 pts Autograded]**
You should be able to get an accuracy **above 90%** in order to receive the full points. **Submit the output prediction with the name `pred_resnet_ft.txt` on Gradescope** to obtain its performance on the test set. Also, **upload a `script_resnet_ft.py` file of your finetuning code on Gradescope**.

8.2 **[1 pt Extra credit, manually graded]**
We have set up another leaderboard for this section. Feel free to experiment with different settings and see how the performance changes. You can only use the pre-trained ResNet-18 model in this section. The five students with the highest accuracy will each receive 1 point of extra credit. Late submissions (i.e., after Oct 5 11:59:59 PM) will not be eligible for this extra credit.


In [None]:
# Enter your code here