# Welcome to CS 5242 **Assignment 2**

ASSIGNMENT DEADLINE ⏰ : ** 3 March 2024**

In this assignment, we have three parts:
1. Implement some operations in CNNs from scratch *(2 Points)*
2. Implement a simple CNN and train on MNIST using PyTorch  *(4 Points)*
3. Implement a VGG network with PyTorch *(4 Points)*

Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs. In this semester, we will use Colab to run our experiments.
1. Login Google Colab https://colab.research.google.com/
2. In this assignment, We **need GPU** to training the CNN model. You may need to **choose GPU in Runtime -> Change runtime type -> Hardware accerator**
![Alt text](image.png)


### **Grades Policy**

We have 10 points for this homework. 15% off per day late, 0 scores if you submit it 7 days after the deadline.

### **Cautions**

**DO NOT** copy the code from the internet, e.g. GitHub.
---

**DO NOT** use any LLMs to write the code, e.g. ChatGPT.
---

### **Contact**

Please feel free to contact us if you have any question about this homework or need any further information.

Slack: Wangbo Zhao


> If you have not join the slack group, you can click [here](https://join.slack.com/t/cs5242-2024spring/shared_invite/zt-2cw3jgqab-wFhoaIVa4RIX4fCZ_k~vjQ)

## Setup

Start by running the cell below to set up all required software.

In [1]:
!pip install numpy matplotlib
!pip install torch torchvision



Import the neccesary library and fix seed for Python, NumPy and PyTorch.

In [2]:
import math
import random

import numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)

<torch._C.Generator at 0x7e84383626f0>

Now let's setup the GPU environment. The colab provides a free GPU to use. Do as follows:

- Runtime -> Change Runtime Type -> select `GPU` in Hardware accelerator
- Click `connect` on the top-right

After connecting to one GPU, you can check its status using `nvidia-smi` command.

In [3]:
!nvidia-smi

torch.cuda.is_available()

/bin/bash: line 1: nvidia-smi: command not found


False

Everything is ready, you can move on and ***Good Luck !*** 😃

## Implement some operations in CNNs from scratch

In this section, you need to implement some operations commonly used in CNNs, including convolution and pooling.

You need to compare the computational results of your implemented version with those of Pytorch, expecting that the error between the correct implementation and pytorch will be very small.


### Step 1
Given a 32x32 pixels, 3 channels input, get a torch tensor with torch.randn().

In [4]:
batch_size = 2
c = 3
h = 32
w = 32
x = torch.randn(batch_size, c, h, w)
print(x)
print(x.shape)

tensor([[[[-1.1258, -1.1524, -0.2506,  ...,  1.5863,  0.9463, -0.8437],
          [-0.6136,  0.0316, -0.4927,  ..., -1.2341,  1.8197, -0.5515],
          [-0.5692,  0.9200,  1.1108,  ..., -0.9565,  0.0335,  0.7101],
          ...,
          [ 1.0166,  1.2868,  2.0820,  ...,  0.8161, -0.5711, -0.1195],
          [-0.4274,  0.8143, -1.4121,  ..., -0.1394, -0.3677, -0.4574],
          [-1.2945,  0.7012, -1.9098,  ...,  0.5374,  1.0826, -1.7105]],

         [[-1.0841, -0.1287, -0.6811,  ..., -0.9825,  0.7184,  0.4402],
          [-0.5619,  0.6640, -2.1033,  ..., -0.7821, -2.1407,  0.3337],
          [-1.1230,  0.6210, -0.8764,  ...,  0.9159,  0.2990,  0.1771],
          ...,
          [ 2.2746, -0.9119,  0.5105,  ...,  0.4876, -0.9265, -0.5748],
          [ 0.7300, -0.9287,  0.1743,  ..., -0.7073, -0.8813, -0.5895],
          [-0.8363, -1.8354,  0.4765,  ..., -0.3812, -1.6687,  1.0869]],

         [[ 0.6657,  0.8847,  0.4671,  ...,  0.7709, -0.8416,  1.7962],
          [ 0.1924, -0.1777,  

### Step 2
We first implement these operations with Pytorch so that we can compare the computational results of our implemented version with those of original pytorch.


In [5]:

# 1. Build a max pooling layer torch_max_pool with Pytorch. The kernel size of the pooling is 2, the stride is 2, and there is not any padding.
torch_max_pool = nn.MaxPool2d(kernel_size=2,
                              stride=2,
                              padding=0)

# 2. Build a average pooling layer torch_avg_pool with Pytorch. The kernel size of the pooling is 2, the stride is 1. The padding shoulbd be set to 1.
torch_avg_pool = nn.AvgPool2d(kernel_size=2,
                              stride=1,
                              padding=1)

# 3.Build a 2D convolutional layer torch_conv with Pytorch. The kernel size of the convolution is 3. Stride is 1. The input channel and output channel should be set to 3 and 64, respectively. We use zero padding to keep the spatial size of the output feature.
torch_conv = nn.Conv2d(in_channels=3,
                       out_channels=64,
                       kernel_size=3,
                       stride=1,
                       padding=1)

# 2D batchnorm with channel=3
torch_norm = nn.BatchNorm2d(3)

In [6]:
torch_max_pool_out = torch_max_pool(x)
print(torch_max_pool_out.shape)

torch_avg_pool_out = torch_avg_pool(x)
print(torch_avg_pool_out.shape)

torch_conv_out = torch_conv(x)
print(torch_conv_out.shape)

torch_norm_out = torch_norm(x)
print(torch_norm_out.shape)

torch.Size([2, 3, 16, 16])
torch.Size([2, 3, 33, 33])
torch.Size([2, 64, 32, 32])
torch.Size([2, 3, 32, 32])


### Step 3

Implement these operations from scratch. Output your tensors as "my_xxx_out".

In [7]:
def my_max_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        kernel_size: size of the window to take a max over,
        stride: stride of the window,
        padding: implicit zero padding to be added on both sides,

    Return:
        y: torch tensor of size (N, C_out, H_out, W_out).
    """

    y = None
    # === Complete the code (0.5')

    N, C_in, H_in, W_in = x.shape
    H_out = (H_in - kernel_size + 2*padding)//stride + 1
    W_out = (W_in - kernel_size + 2*padding)//stride + 1

    pad = (padding,padding,padding,padding)
    padded_x = F.pad(x, pad, mode='constant', value=0)

    y = torch.zeros((N, C_in, H_out, W_out))

    for i in range(N):
        for j in range(C_in):
            for k in range(0, H_in - kernel_size + 1, stride):
            # for k in range(0, 1):
                for l in range(0, W_in - kernel_size + 1, stride):
                # for l in range(0, 1):
                    temp = padded_x[i, j, k:k+kernel_size, l:l+kernel_size]
                    # print(temp)
                    # print(temp.shape)
                    y[i, j, k//stride, l//stride] = torch.max(temp)
    # print(y)
    # print(y.shape)

    # === Complete the code
    return y

In [8]:
def my_avg_pool(x, kernel_size, stride, padding):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        kernel_size: size of the window,
        stride: stride of the window,
        padding: implicit zero padding to be added on both sides,

    Return:
        y: torch tensor of size (N, C_out, H_out, W_out).
    """

    y = None
    # === Complete the code (0.5')

    N, C_in, H_in, W_in = x.shape
    H_out = (H_in - kernel_size + 2*padding)//stride + 1
    W_out = (W_in - kernel_size + 2*padding)//stride + 1

    pad = (padding,padding,padding,padding)
    padded_x = F.pad(x, pad, mode='constant', value=0)
    # print(padded_x)
    # print(padded_x.shape)

    y = torch.zeros((N, C_in, H_out, W_out), dtype=torch.float64)

    for i in range(N):
        for j in range(C_in):
            for k in range(0, H_in - kernel_size + 1 + 2*padding, stride):
              # for k in range(0, 1):
                for l in range(0, W_in - kernel_size + 1 + 2*padding, stride):
                  # for l in range(0, 1):
                    temp = padded_x[i, j, k:k+kernel_size, l:l+kernel_size].to(dtype=torch.float64)
                    # print(temp)
                    # print(temp.shape)
                    average = torch.mean(temp, dtype=torch.float64)
                    y[i, j, k//stride, l//stride] = average.to(dtype=torch.float64)
    # print(y)
    # print(f"{y[0,0,0,0]:.10f}")
    # print(y.shape)

    # === Complete the code
    return y

In [9]:
def my_conv(x, in_channels, out_channels, kernel_size, stride, padding, weight, bias):
    """
    Args:
        x: torch tensor with size (N, C_in, H_in, W_in),
        in_channels: number of channels in the input image, it is C_in;
        out_channels: number of channels produced by the convolution;
        kernel_size: size of onvolving kernel,
        stride: stride of the convolution,
        padding: implicit zero padding to be added on both sides of each dimension,

    Return:
        y: torch tensor of size (N, C_out, H_out, W_out)
    """

    y = None
    # === Complete the code (0.5')

    N, C_in, H_in, W_in = x.shape
    # outchannels, in_channels, kernel_size, kernel_size = weight.shape
    H_out = (H_in - kernel_size + 2*padding)//stride + 1
    W_out = (W_in - kernel_size + 2*padding)//stride + 1

    pad = (padding,padding,padding,padding)
    padded_x = F.pad(x, pad, mode='constant', value=0)
    # print(padded_x)
    # print(padded_x.shape)

    y = torch.zeros((N, out_channels, H_out, W_out), dtype=torch.float64)

    for i in range(N):
        for j in range(out_channels):
            for k in range(0, H_in - kernel_size + 1 + 2*padding, stride):
            # for k in range(0,1):
                for l in range(0, W_in - kernel_size + 1 + 2*padding, stride):
                # for l in range(0,1):
                    temp = padded_x[i, :, k:k+kernel_size, l:l+kernel_size].to(dtype=torch.float64)
                    # print(temp)
                    # print(temp.shape)
                    # print(weight[j])
                    # print(weight[j].shape)
                    result = (torch.sum(temp * weight[j]) + bias[j]).to(dtype=torch.float64)
                    # 将结果写入输出张量
                    y[i, j, k//stride, l//stride] = result.to(dtype=torch.float64)
    # print(y)
    # print(f"{y[0,0,0,0]:.10f}")
    # print(y.shape)

    # === Complete the code
    return y

In [10]:
def my_batchnorm(x, num_features, eps=1e-5):
    """
    Args:
        x: torch tensor with size (N, C, H, W),
        num_features: number of features in the input tensor, it is C;
        eps: a value added to the denominator for numerical stability. Default: 1e-5

    Return:
        y: torch tensor of size (N, C, H, W)
    """

    y = torch.empty_like(x)
    # === Complete the code (0.5')

    mean = torch.mean(x, dim=(0, 2, 3), keepdim=True)
    # print(mean)
    # print(mean.shape)
    var = torch.var(x, dim=(0, 2, 3), keepdim=True, unbiased=True)
    # print(var)
    # print(var.shape)

    batch_norm = ((x - mean) / torch.sqrt(var + eps))
    # print(batch_norm)
    # print(batch_norm.shape)

    y = batch_norm
    # print(y)
    # print(f"{y[0,0,0,0]:.10f}")
    # print(y.shape)

    # === Complete the code
    return y

In [11]:
my_max_pool_out = my_max_pool(x, kernel_size=2, stride=2, padding=0)
my_avg_pool_out = my_avg_pool(x, kernel_size=2, stride=1, padding=1)
my_conv_out = my_conv(x,
                      in_channels=3,
                      out_channels=64,
                      kernel_size=3,
                      stride=1,
                      padding=1,
                      weight=torch_conv.weight.data,
                      bias=torch_conv.bias.data)
my_norm_out = my_batchnorm(x, num_features=3, eps=1e-5)

### Step 4

Compare and show that "torch_xxx_out" and "my_xxx_out" are equal up to small numerical errors.

In [12]:
print(F.mse_loss(my_max_pool_out, torch_max_pool_out))
print(F.mse_loss(my_avg_pool_out, torch_avg_pool_out))
print(F.mse_loss(my_conv_out, torch_conv_out))
print(F.mse_loss(my_norm_out, torch_norm_out))

tensor(0.)
tensor(3.8468e-16, dtype=torch.float64)
tensor(3.0318e-15, dtype=torch.float64, grad_fn=<MseLossBackward0>)
tensor(5.9602e-08, grad_fn=<MseLossBackward0>)


## Implement a simple CNN and train it on MNIST using PyTorch

### Step 1
Create datasets. The MNIST data set is composed of handwritten digit images and digit labels from 0 to 9. It consists of 60,000 training samples and 10,000 test samples. Each sample is a 28 * 28 pixel grayscale handwritten digit image.

In [13]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

train_set = torchvision.datasets.FashionMNIST(
    root = 'FashionMNIST/',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)

test_set = torchvision.datasets.FashionMNIST(
    root = 'FashionMNIST/',
    train = False,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=100)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to FashionMNIST/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:01<00:00, 18876325.51it/s]


Extracting FashionMNIST/FashionMNIST/raw/train-images-idx3-ubyte.gz to FashionMNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to FashionMNIST/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 344045.74it/s]


Extracting FashionMNIST/FashionMNIST/raw/train-labels-idx1-ubyte.gz to FashionMNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to FashionMNIST/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:00<00:00, 6284542.74it/s]


Extracting FashionMNIST/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to FashionMNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to FashionMNIST/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 4781283.66it/s]

Extracting FashionMNIST/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to FashionMNIST/FashionMNIST/raw






### Step 2
Create the model.
You can build a simple convolutional neural network to conduct the classification. You may refine the architecture based on the accuracy. You can also try different learning rates.
**The test accuracy should achieve 85%.**


In [14]:
class Network(nn.Module):
    def __init__(self):
        super(Network,self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.fc1 = nn.Linear(20 * 7 * 7, 50)
        self.relu3 = nn.ReLU()

        self.fc2 = nn.Linear(50, 10)


    def forward(self, input):
        t = self.conv1(input)
        t = self.relu1(t)
        t = self.max_pool1(t)
        t = self.conv2(t)
        t = self.relu2(t)
        t = self.max_pool2(t)
        # print(t)
        # print(t.shape)
        t = t.view(-1, 20 * 7 * 7)
        t = self.fc1(t)
        t = self.relu3(t)
        t = self.fc2(t)

        return t

network = Network()
if torch.cuda.is_available():
    network = network.cuda()


optimizer = optim.Adam(network.parameters(), lr=0.001)

### Step 3

Build the train and test loops

In [15]:
for epoch in range(10):
    total_loss = 0
    total_correct = 0
    for batch in train_loader:
        images, labels = batch
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()

        optimizer.zero_grad()
        preds = network(images)
        loss = F.cross_entropy(preds, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _,prelabels=torch.max(preds,dim=1)
        total_correct += (prelabels==labels).sum().item()
    accuracy = total_correct/len(train_set)
    print("Epoch:%d  ,  Loss:%f  , Train Accuracy:%f "%(epoch, total_loss, accuracy * 100))


correct=0
total=0
network.eval()
with torch.no_grad():
    for batch in test_loader:
        imgs,labels=batch
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        preds=network(imgs)
        _,prelabels=torch.max(preds,dim=1)
        #print(prelabels.size())
        total=total+labels.size(0)
        correct=correct+int((prelabels==labels).sum())
    #print(total)
    accuracy=correct / total
    print("Test Accuracy: ", accuracy * 100)

Epoch:0  ,  Loss:376.303199  , Train Accuracy:77.586667 
Epoch:1  ,  Loss:245.289699  , Train Accuracy:85.438333 
Epoch:2  ,  Loss:217.041146  , Train Accuracy:87.180000 
Epoch:3  ,  Loss:199.453494  , Train Accuracy:88.190000 
Epoch:4  ,  Loss:187.056528  , Train Accuracy:88.885000 
Epoch:5  ,  Loss:176.970339  , Train Accuracy:89.371667 
Epoch:6  ,  Loss:167.991592  , Train Accuracy:89.856667 
Epoch:7  ,  Loss:160.161007  , Train Accuracy:90.320000 
Epoch:8  ,  Loss:153.383451  , Train Accuracy:90.668333 
Epoch:9  ,  Loss:146.680733  , Train Accuracy:91.008333 
Test Accuracy:  89.58


# Implement a VGG network with PyTorch
VGG is a type of CNN (Convolutional Neural Network) that was considered to be one of the best computer vision models in 2015.
https://arxiv.org/abs/1409.1556

Here is the configuration of the network from its paper. Now, you need to implement **Config C** it with Pytorch.

![Alt text](image-1.png)

In [16]:
import torch
from torch import nn

class VGG(nn.Module):
    def __init__(self) -> None:
        super(VGG, self).__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.bn2 = nn.BatchNorm2d(64)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.relu3 = nn.ReLU()
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.relu4 = nn.ReLU()
        self.bn4 = nn.BatchNorm2d(128)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv5 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.relu5 = nn.ReLU()
        self.bn5 = nn.BatchNorm2d(256)
        self.conv6 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.relu6 = nn.ReLU()
        self.bn6 = nn.BatchNorm2d(256)
        self.conv7 = nn.Conv2d(256, 256, kernel_size=1, padding=1)
        self.relu7 = nn.ReLU()
        self.bn7 = nn.BatchNorm2d(256)
        self.max_pool3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv8 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
        self.relu8 = nn.ReLU()
        self.bn8 = nn.BatchNorm2d(512)
        self.conv9 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
        self.relu9 = nn.ReLU()
        self.bn9 = nn.BatchNorm2d(512)
        self.conv10 = nn.Conv2d(512, 512, kernel_size=1, padding=1)
        self.relu10 = nn.ReLU()
        self.bn10 = nn.BatchNorm2d(512)
        self.max_pool4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv11 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
        self.relu11 = nn.ReLU()
        self.bn11 = nn.BatchNorm2d(512)
        self.conv12 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
        self.relu12 = nn.ReLU()
        self.bn12 = nn.BatchNorm2d(512)
        self.conv13 = nn.Conv2d(512, 512, kernel_size=1, padding=1)
        self.relu13 = nn.ReLU()
        self.bn13 = nn.BatchNorm2d(512)
        self.max_pool5 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.fc1 = nn.Linear(512 * 7 * 7, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 1000)

        self.softmax = nn.Softmax(dim=1)

    def forward(self, image):

        x = self.bn1(self.relu1(self.conv1(image)))
        x = self.bn2(self.relu2(self.conv2(x)))
        x = self.max_pool1(x)

        x = self.bn3(self.relu3(self.conv3(x)))
        x = self.bn4(self.relu4(self.conv4(x)))
        x = self.max_pool2(x)

        x = self.bn5(self.relu5(self.conv5(x)))
        x = self.bn6(self.relu6(self.conv6(x)))
        x = self.bn7(self.relu7(self.conv7(x)))
        x = self.max_pool3(x)

        x = self.bn8(self.relu8(self.conv8(x)))
        x = self.bn9(self.relu9(self.conv9(x)))
        x = self.bn10(self.relu10(self.conv10(x)))
        x = self.max_pool4(x)

        x = self.bn11(self.relu11(self.conv11(x)))
        x = self.bn12(self.relu12(self.conv12(x)))
        x = self.bn13(self.relu13(self.conv13(x)))
        x = self.max_pool5(x)

        x = x.view(-1, 512 * 7 * 7)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)

        x = self.softmax(x)

        return x


Then, please calculate the number of parameters and FLOPs (Floating point operations) of **Config C**.
You can only consider the FLOPs of the convolution and FC in **Config C**.


Convolution layer:

conv1: 3x3x3x64=1728 parameters, 3x3x3x64x224x224=86704128 FLOPs

conv2: 3x3x64x64=36864 parameters, 3x3x64x64x224x224=1849688064 FLOPs

conv3: 3x3x64x128=73728 parameters, 3x3x64x128x112x112=924844032 FLOPs

conv4: 3x3x128x128=147456 parameters, 3x3x128x128x112x112=1849688064 FLOPs

conv5: 3x3x128x256=294912 parameters, 3x3x128x256x56x56=924844032 FLOPs

conv6: 3x3x256x256=589824 parameters, 3x3x256x256x56x56=1849688064 FLOPs

conv7: 1x1x256x256=65536 parameters, 1x1x256x256x56x56=205520896 FLOPs

conv8: 3x3x256x512=1179648 parameters, 3x3x256x512x28x28=924844032 FLOPs

conv9: 3x3x512x512=2359296 parameters, 3x3x512x512x28x28=1849688064 FLOPs

conv10: 1x1x512x512=262144 parameters, 1x1x512x512x28x28=205520896 FLOPs

conv11: 3x3x512x512=2359296 parameters, 3x3x512x512x14x14=462422016 FLOPs

conv12: 3x3x512x512=2359296 parameters, 3x3x512x512x14x14=462422016 FLOPs

conv13: 1x1x512x512=262144 parameters, 1x1x512x512x14x14=51380224 FLOPs

FC layer:

fc1: (512x7x7 + 1) x 4096=102764544 parameters, 512x7x7x4096=102760448 FLOPs

fc2: (4096 + 1) x 4096=16781312 parameters, 4096x4096=16777216 FLOPs

fc3: (4096 + 1) x 1000=4097000 parameters, 4096x1000=4096000 FLOPs

Total parameters: 133634728, if we need to take bias into account, then the number of parameters is 133638952.

Total FLOPs: 11770888192





