# Welcome to CS 5242 **Assignment 2**

ASSIGNMENT DEADLINE ⏰: **3 March 2024**

This assignment has three parts:
1. Implement some operations in CNNs from scratch *(2 Points)*
2. Implement a simple CNN and train it on MNIST using PyTorch *(4 Points)*
3. Implement a VGG network with PyTorch *(4 Points)*

Colab is a hosted Jupyter notebook service that requires no setup and provides free access to computing resources, including GPUs. This semester, we will use Colab to run our experiments.
1. Log in to Google Colab: https://colab.research.google.com/
2. This assignment **needs a GPU** to train the CNN models. You may need to **select a GPU under Runtime -> Change runtime type -> Hardware accelerator**.
![Alt text](image.png)


### **Grading Policy**

This homework is worth 10 points. Late submissions lose 15% per day, and submissions made 7 days or more after the deadline receive 0 points.

### **Cautions**

**DO NOT** copy the code from the internet, e.g. GitHub.
---

**DO NOT** use any LLMs to write the code, e.g. ChatGPT.
---

### **Contact**

Please feel free to contact us if you have any questions about this homework or need any further information.

Slack: Wangbo Zhao


> If you have not joined the Slack group yet, you can join [here](https://join.slack.com/t/cs5242-2024spring/shared_invite/zt-2cw3jgqab-wFhoaIVa4RIX4fCZ_k~vjQ).

## Setup

Start by running the cell below to set up all required software.

In [None]:
!pip install numpy matplotlib 
!pip install torch torchvision

Import the necessary libraries and fix the random seeds for Python, NumPy and PyTorch.

In [None]:
import math
import random

import numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)

Now let's set up the GPU environment. Colab provides a free GPU. To use it:

- Runtime -> Change Runtime Type -> select `GPU` in Hardware accelerator
- Click `connect` on the top-right

After connecting to a GPU, you can check its status with the `nvidia-smi` command.

In [None]:
!nvidia-smi

torch.cuda.is_available()
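As an optional alternative to the repeated `torch.cuda.is_available()` checks used later in this notebook, a common PyTorch idiom is to choose a `torch.device` once and move modules and tensors with `.to(device)`. The sketch below uses only standard PyTorch calls; the names `device`, `demo_layer` and `inputs` are our own illustrative choices.

```python
import torch
import torch.nn as nn

# Pick the GPU if Colab assigned one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Modules and tensors are moved with .to(device) / device=...; this works on CPU and GPU alike
demo_layer = nn.Linear(4, 2).to(device)
inputs = torch.randn(3, 4, device=device)
outputs = demo_layer(inputs)
print(outputs.shape)  # torch.Size([3, 2])
```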

Everything is ready; you can move on. ***Good Luck!*** 😃

## Implement some operations in CNNs from scratch

In this section, you need to implement some operations commonly used in CNNs: max pooling, average pooling, convolution and batch normalization.

Compare the results of your implementations with those of PyTorch; a correct implementation should differ from PyTorch only by small numerical errors.


### Step 1
Create a random input batch of 2 images, each with 3 channels and 32x32 pixels, using `torch.randn()`.

In [None]:
batch_size = 2
c = 3
h = 32
w = 32
x = torch.randn(batch_size, c, h, w)
print(x)
print(x.shape)

### Step 2
We first build these operations with PyTorch so that we can compare the results of our own implementations with those of the original PyTorch layers.
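To know which shapes to expect, recall that for convolution and pooling with PyTorch's defaults (dilation 1, `ceil_mode=False`), each spatial dimension of the output has size $\lfloor (H_{in} + 2 \cdot \text{padding} - \text{kernel\_size}) / \text{stride} \rfloor + 1$. The helper `out_size` below is a hypothetical name used only for illustration; it predicts the spatial sizes the next cell should print.

```python
def out_size(size_in, kernel_size, stride, padding):
    """Output length along one spatial dim for conv/pool with dilation 1 (floor mode)."""
    return (size_in + 2 * padding - kernel_size) // stride + 1

h = w = 32
# Max pooling: kernel 2, stride 2, no padding -> halves the resolution
print(out_size(h, 2, 2, 0))  # 16
# Average pooling: kernel 2, stride 1, padding 1 -> grows by one pixel
print(out_size(h, 2, 1, 1))  # 33
# Convolution: kernel 3, stride 1, padding 1 -> spatial size is preserved
print(out_size(h, 3, 1, 1))  # 32
```

Batch normalization does not change the shape, so the four outputs should be `(2, 3, 16, 16)`, `(2, 3, 33, 33)`, `(2, 64, 32, 32)` and `(2, 3, 32, 32)`.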


In [None]:

# 1. Build a max pooling layer torch_max_pool with PyTorch. The kernel size of the pooling is 2, the stride is 2, and there is no padding.
torch_max_pool = nn.MaxPool2d(kernel_size=2,
                              stride=2,
                              padding=0)

# 2. Build an average pooling layer torch_avg_pool with PyTorch. The kernel size of the pooling is 2 and the stride is 1. The padding should be set to 1.
torch_avg_pool = nn.AvgPool2d(kernel_size=2,
                              stride=1,
                              padding=1)

# 3. Build a 2D convolutional layer torch_conv with PyTorch. The kernel size of the convolution is 3 and the stride is 1. The input and output channels should be set to 3 and 64, respectively. We use zero padding of 1 so that the output keeps the spatial size of the input.
torch_conv = nn.Conv2d(in_channels=3,
                       out_channels=64,
                       kernel_size=3,
                       stride=1,
                       padding=1)

# 4. Build a 2D batch normalization layer torch_norm with PyTorch over 3 channels.
torch_norm = nn.BatchNorm2d(3)

In [None]:
torch_max_pool_out = torch_max_pool(x)
print(torch_max_pool_out.shape)

torch_avg_pool_out = torch_avg_pool(x)
print(torch_avg_pool_out.shape)

torch_conv_out = torch_conv(x)
print(torch_conv_out.shape)

torch_norm_out = torch_norm(x)
print(torch_norm_out.shape)

### Step 3

Implement these operations from scratch. Name your output tensors `my_xxx_out`.

In [None]:
def my_max_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        kernel_size: size of the window to take a max over,
        stride: stride of the window,
        padding: implicit padding to be added on both sides (nn.MaxPool2d pads with -inf, not zeros),

    Return:
        y: torch tensor of size (N, C_in, H_out, W_out); pooling keeps the number of channels.
    """

    y = None
    # === Complete the code (0.5')

    # === Complete the code
    return y
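One detail matters whenever the padding is non-zero: according to the PyTorch documentation, `nn.MaxPool2d` pads with negative infinity rather than zeros, so padded positions can never win the max. The small check below (a sketch on a hand-made, all-negative input) shows this: with zero padding the border outputs would be 0, but PyTorch returns -1 everywhere.

```python
import torch
import torch.nn as nn

# All-negative 2x2 input; with zero padding, windows touching the border would output 0
x_neg = -torch.ones(1, 1, 2, 2)
pool = nn.MaxPool2d(kernel_size=2, stride=1, padding=1)
out = pool(x_neg)
print(out)  # 3x3 output, every entry is -1
```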

In [None]:
def my_avg_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
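        kernel_size: size of the window to average over,
        stride: stride of the window,
        padding: implicit zero padding to be added on both sides (padded zeros count towards the average, as in nn.AvgPool2d's default),

    Return:
        y: torch tensor of size (N, C_in, H_out, W_out); pooling keeps the number of channels.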
    """

    y = None
    # === Complete the code (0.5')

    # === Complete the code
    return y
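Because `torch_avg_pool` uses padding, it matters how the padded zeros enter the average. By default `nn.AvgPool2d` has `count_include_pad=True`, so every window is divided by the full kernel area even when part of it lies in the padding. The sketch below compares both settings on an all-ones input; to match `torch_avg_pool`, an implementation should reproduce the default behaviour.

```python
import torch
import torch.nn as nn

ones = torch.ones(1, 1, 2, 2)
incl = nn.AvgPool2d(kernel_size=2, stride=1, padding=1)  # count_include_pad=True (default)
excl = nn.AvgPool2d(kernel_size=2, stride=1, padding=1, count_include_pad=False)

# Corner windows see 1 real value and 3 padded zeros -> 0.25 when the zeros are counted
print(incl(ones))
# Dividing only by the number of real values gives 1.0 everywhere
print(excl(ones))
```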

In [None]:
def my_conv(x, in_channels, out_channels, kernel_size, stride, padding, weight, bias):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        in_channels: number of channels in the input image, it is C_in;
        out_channels: number of channels produced by the convolution;
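        kernel_size: size of the convolving kernel,
        stride: stride of the convolution,
        padding: implicit zero padding to be added on both sides of each dimension,
        weight: torch tensor with size (C_out, C_in, kernel_size, kernel_size),
        bias: torch tensor with size (C_out,),

    Return:
        y: torch tensor of size (N, C_out, H_out, W_out)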
    """

    y = None
    # === Complete the code (0.5')

    # === Complete the code
    return y
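`nn.Conv2d` computes a cross-correlation (the kernel is not flipped); its `weight` has shape `(C_out, C_in, kernel_size, kernel_size)` and its `bias` has shape `(C_out,)`. As a sanity check of that definition, not a full implementation, the sketch below recomputes one output value of a freshly created convolution by hand from a zero-padded input. The names `inp`, `conv_demo` and the chosen position are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

inp = torch.randn(2, 3, 32, 32)
conv_demo = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(conv_demo.weight.shape, conv_demo.bias.shape)  # torch.Size([64, 3, 3, 3]) torch.Size([64])

with torch.no_grad():
    ref = conv_demo(inp)
    # F.pad pads the last two dims by (left, right, top, bottom) zeros
    inp_pad = F.pad(inp, (1, 1, 1, 1))
    n, o, i, j = 1, 5, 10, 20  # arbitrary sample, output channel, row, column
    # With stride 1, output (i, j) sees the 3x3 window starting at (i, j) of the padded input
    window = inp_pad[n, :, i:i + 3, j:j + 3]  # shape (C_in, 3, 3)
    manual = (window * conv_demo.weight[o]).sum() + conv_demo.bias[o]

print(manual.item(), ref[n, o, i, j].item())  # equal up to float rounding
```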

In [None]:
def my_batchnorm(x, num_features, eps=1e-5):
    """
    Args:
        x: torch tensor with size (N, C, H, W),
        num_features: number of features in the input tensor, it is C;
        eps: a value added to the denominator for numerical stability. Default: 1e-5
        
    Return:
        y: torch tensor of size (N, C, H, W)
    """

    y = torch.empty_like(x)
    # === Complete the code (0.5')

    # === Complete the code
    return y
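For reference, a freshly created `nn.BatchNorm2d` is in training mode, so it normalizes with statistics computed per channel $c$ over the batch and spatial dimensions, using the biased variance. Its learnable scale $\gamma_c$ and shift $\beta_c$ are initialized to 1 and 0, which is why `my_batchnorm` can omit them and still match `torch_norm`:

$$
\mu_c = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w},
\qquad
\sigma_c^2 = \frac{1}{NHW}\sum_{n=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(x_{n,c,h,w}-\mu_c\right)^2
$$

$$
y_{n,c,h,w} = \gamma_c\,\frac{x_{n,c,h,w}-\mu_c}{\sqrt{\sigma_c^2+\epsilon}} + \beta_c
$$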

In [None]:
my_max_pool_out = my_max_pool(x, kernel_size=, stride=, padding=)
my_avg_pool_out = my_avg_pool(x, kernel_size=, stride=, padding=)
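# Reuse torch_conv's parameters so that both convolutions compute the same function
my_conv_out = my_conv(x,
                      in_channels=,
                      out_channels=,
                      kernel_size=,
                      stride=,
                      padding=,
                      weight=torch_conv.weight,
                      bias=torch_conv.bias)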
my_norm_out = my_batchnorm(x, num_features=, eps=1e-5)

### Step 4

Compare and show that "torch_xxx_out" and "my_xxx_out" are equal up to small numerical errors.

In [None]:
print(F.mse_loss(my_max_pool_out, torch_max_pool_out))
print(F.mse_loss(my_avg_pool_out, torch_avg_pool_out))
print(F.mse_loss(my_conv_out, torch_conv_out))
print(F.mse_loss(my_norm_out, torch_norm_out))
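The MSE alone can hide problems: a tiny average error says nothing about the worst element, and `F.mse_loss` broadcasts inputs of different but broadcastable shapes (with a warning) instead of failing. A slightly stricter check, sketched below with a hypothetical helper `compare` and an assumed tolerance of `1e-5`, verifies the shapes first and then reports the maximum absolute error and `torch.allclose`.

```python
import torch

def compare(name, mine, ref, atol=1e-5):
    """Report whether `mine` matches `ref` in shape and values (within atol)."""
    if mine.shape != ref.shape:
        print(f"{name}: shape mismatch {tuple(mine.shape)} vs {tuple(ref.shape)}")
        return False
    max_err = (mine - ref).abs().max().item()
    ok = torch.allclose(mine, ref, atol=atol)
    print(f"{name}: max abs error = {max_err:.2e}, allclose = {ok}")
    return ok

# Usage once the cells above have run, e.g.:
# compare("max_pool", my_max_pool_out, torch_max_pool_out)
# compare("conv", my_conv_out, torch_conv_out)
demo = torch.randn(2, 3, 4, 4)
compare("demo", demo + 1e-7, demo)
```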

## Implement a simple CNN and train it on MNIST using PyTorch

### Step 1
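Create the datasets. The MNIST dataset consists of handwritten digit images with labels from 0 to 9: 60,000 training samples and 10,000 test samples, each a 28x28 pixel grayscale image. Note that the cell below loads Fashion-MNIST (`torchvision.datasets.FashionMNIST`), which has the same format (60,000 training and 10,000 test 28x28 grayscale images in 10 classes) but shows clothing items instead of digits.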

In [None]:
import torch 
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

train_set = torchvision.datasets.FashionMNIST(
    root = 'FashionMNIST/',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)

test_set = torchvision.datasets.FashionMNIST(
    root = 'FashionMNIST/',
    train = False,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)

# Shuffle the training data every epoch; the test order does not matter
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=100)

### Step 2
Create the model.
You can build a simple convolutional neural network to conduct the classification. You may refine the architecture based on the accuracy. You can also try different learning rates.
**The test accuracy should achieve 85%.**


In [None]:
class Network(nn.Module):
    def __init__(self):
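        super(Network,self).__init__()
        # === Complete the code: define the layers of your CNN here

    def forward(self, input):
        # === Complete the code: compute t, the class scores of shape (N, 10)
        return t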

network = Network()
if torch.cuda.is_available():
    network = network.cuda()
    

optimizer = optim.Adam(network.parameters(), lr=0.01)
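When designing `Network`, the input size of the first fully connected layer depends on every convolution and pooling layer before it. One common trick, sketched below, is to push a dummy batch of shape `(1, 1, 28, 28)` through the convolutional part and read off the flattened size. The layer stack `features` here is a hypothetical example for illustration only, not a recommended architecture.

```python
import torch
import torch.nn as nn

# Hypothetical convolutional stack, used only to illustrate the size computation
features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # (1, 28, 28) -> (8, 28, 28)
    nn.ReLU(),
    nn.MaxPool2d(2),                            # (8, 28, 28) -> (8, 14, 14)
)

with torch.no_grad():
    dummy = torch.zeros(1, 1, 28, 28)  # one fake single-channel 28x28 image
    flat_size = features(dummy).flatten(1).shape[1]

print(flat_size)  # 1568 = 8 * 14 * 14
classifier_in = nn.Linear(flat_size, 10)  # 10 output classes
```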

### Step 3

Build the train and test loops

In [None]:
network.train()  # training mode (matters for layers such as dropout and batch norm)
for epoch in range(10):
    total_loss = 0
    total_correct = 0
    for batch in train_loader:  
        images, labels = batch
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()

        optimizer.zero_grad()  
        preds = network(images)
        loss = F.cross_entropy(preds, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _,prelabels=torch.max(preds,dim=1)
        total_correct += (prelabels==labels).sum().item()
    accuracy = total_correct/len(train_set)
    print("Epoch:%d  ,  Loss:%f  , Train Accuracy:%f "%(epoch, total_loss, accuracy * 100))


correct=0
total=0
network.eval()
with torch.no_grad():
    for batch in test_loader:
        imgs,labels=batch
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        preds=network(imgs)
        _,prelabels=torch.max(preds,dim=1)
        #print(prelabels.size())
        total=total+labels.size(0)
        correct=correct+int((prelabels==labels).sum())
    #print(total)
    accuracy=correct / total
    print("Test Accuracy: ", accuracy * 100)
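In both loops, `torch.max(preds, dim=1)` returns a `(values, indices)` pair, and the indices are the predicted classes; `argmax(dim=1)` gives the same indices directly. The toy logits below are made-up numbers that show the accuracy computation in isolation, using separate names so the notebook's own variables are not overwritten.

```python
import torch

# Made-up logits for 3 samples and 4 classes
toy_logits = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                           [1.5, 0.2, 0.1, -0.4],
                           [0.0, 0.1, 0.2, 3.0]])
toy_labels = torch.tensor([1, 2, 3])

# torch.max along dim=1 returns (max values, their indices); the indices are the predicted classes
_, toy_pred = torch.max(toy_logits, dim=1)
assert torch.equal(toy_pred, toy_logits.argmax(dim=1))  # same result as argmax

toy_correct = (toy_pred == toy_labels).sum().item()
print(toy_pred.tolist(), toy_correct / toy_labels.size(0))  # [1, 0, 3] 0.6666666666666666
```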

## Implement a VGG network with PyTorch
VGG is a family of very deep CNNs (Convolutional Neural Networks) proposed by Simonyan and Zisserman and published at ICLR 2015; it was among the strongest image classification models of its time.
https://arxiv.org/abs/1409.1556

Here is the configuration of the network from the paper. Now, implement **Config C** with PyTorch.

![Alt text](image-1.png)

In [None]:
import torch
from torch import nn

class VGG(nn.Module):
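    def __init__(self) -> None:
        super().__init__()
        # === Complete the code: define the layers of Config C here

    def forward(self, x):
        # === Complete the code: compute the output x from the input image
        return x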


Then calculate the number of parameters and the FLOPs (floating-point operations) of **Config C**.
Only count the FLOPs of the convolutional and fully connected (FC) layers in **Config C**.
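The per-layer formulas behind this calculation are standard: a k x k convolution from C_in to C_out channels has C_in·k²·C_out weights plus C_out biases, and producing an H_out x W_out output takes H_out·W_out·C_out·C_in·k² multiply-accumulates (MACs); an FC layer from n_in to n_out has n_in·n_out + n_out parameters and n_in·n_out MACs. Sources differ on whether one MAC counts as one FLOP or two, so state the convention you use. The helpers below are a hypothetical single-layer sketch: the parameter formulas are cross-checked against PyTorch's own `numel()` totals, and the last line assumes the paper's 224x224 input for the first conv3-64 layer. Summing over all layers of Config C is left to you.

```python
import torch.nn as nn

def conv_params(c_in, c_out, k):
    """Weights + biases of a k x k convolution."""
    return c_in * k * k * c_out + c_out

def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates: one k*k*c_in dot product per output element."""
    return h_out * w_out * c_out * c_in * k * k

def fc_params(n_in, n_out):
    """Weights + biases of a fully connected layer."""
    return n_in * n_out + n_out

def fc_macs(n_in, n_out):
    """Multiply-accumulates of a fully connected layer (one per weight)."""
    return n_in * n_out

def count_params(module):
    """Total number of parameters PyTorch stores for a module."""
    return sum(p.numel() for p in module.parameters())

# Cross-check the formulas against PyTorch on single layers
print(conv_params(3, 64, 3), count_params(nn.Conv2d(3, 64, kernel_size=3)))  # 1792 1792
print(fc_params(4096, 1000), count_params(nn.Linear(4096, 1000)))           # 4097000 4097000
print(conv_macs(3, 64, 3, 224, 224))  # 86704128 MACs for a conv3-64 layer on a 224x224 RGB input
```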
