### ResNet

[Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385)

*2024/11/12*

The main idea of ResNet is the introduction of residual connections (also known as skip connections). In very deep networks, gradients can become extremely small as they are backpropagated through many layers (the vanishing gradient problem), so the weights in the early layers receive almost no update and the network is hard to train. A residual connection adds a block's input directly to its output, bypassing one or more layers. Because this shortcut is an identity, part of the gradient flows straight from later layers to earlier ones without being scaled down by the intermediate weights. This keeps gradients from shrinking as depth grows and makes it possible to train very deep networks; the paper trains models of up to 152 layers on ImageNet. The paper itself frames the motivation as the *degradation* problem: beyond a certain depth, adding layers to a plain (non-residual) network increases even its training error, whereas deeper residual networks keep improving.
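To see why the identity path helps, note that for a block $y = x + F(x)$ the Jacobian is $\partial y / \partial x = I + \partial F / \partial x$, so the gradient reaching $x$ always contains an unscaled copy of the gradient at $y$. The sketch below is a toy illustration, not code from the paper: it measures how much gradient reaches the input of a 50-layer stack of `Linear` + `tanh` layers, with and without shortcuts. The depth, width, and activation are arbitrary assumptions, and BatchNorm is deliberately omitted, so the effect is exaggerated compared with a real ResNet.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
DEPTH, WIDTH = 50, 64  # arbitrary toy sizes, chosen only for illustration


class PlainStack(nn.Module):
    """DEPTH fully connected layers with tanh activations and no shortcuts."""

    def __init__(self, depth: int, width: int):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.tanh(layer(x))
        return x


class ResidualStack(PlainStack):
    """Same layers, but each one is wrapped as x + F(x) with an identity shortcut."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = x + torch.tanh(layer(x))
        return x


def input_grad_norm(model: nn.Module) -> float:
    """Norm of the gradient of sum(outputs) w.r.t. the input, i.e. what reaches the first layer."""
    x = torch.randn(8, WIDTH, requires_grad=True)
    model(x).sum().backward()
    return x.grad.norm().item()


plain_grad = input_grad_norm(PlainStack(DEPTH, WIDTH))
residual_grad = input_grad_norm(ResidualStack(DEPTH, WIDTH))
print(f"plain stack:    {plain_grad:.3e}")
print(f"residual stack: {residual_grad:.3e}")
```

With the shortcuts, the input gradient stays on the order of the output gradient or larger, while in the plain stack it typically shrinks by many orders of magnitude.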

Instead of learning the desired mapping directly, the stacked layers in a residual block learn a residual with respect to the block's input. Mathematically, if $H(x)$ is the desired underlying mapping and $x$ is the input to the block, the layers are trained to fit the residual function $F(x) = H(x) - x$, and the block outputs $F(x) + x$. This reformulation simplifies the optimization problem, especially when the optimal mapping is close to the identity: driving $F(x)$ toward zero is easier than making a stack of nonlinear layers reproduce the identity mapping.
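A minimal PyTorch sketch of this formulation, modeled loosely on the basic block used in ResNet-18. It is a simplified version, not torchvision's implementation: only the identity shortcut is included, so the projection shortcut used when the stride or channel count changes is omitted. The example also checks the identity-mapping argument: if the scale of the last BatchNorm is zero, then $F(x) = 0$ and, for non-negative inputs, the block returns its input unchanged. torchvision's ResNet constructors expose a similar initialization through their `zero_init_residual` argument.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Simplified residual block: out = ReLU(F(x) + x).

    F is two 3x3 conv + BatchNorm layers. Only the identity shortcut is shown,
    so input and output must have the same shape (stride 1, same channel count).
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        residual = self.bn2(self.conv2(residual))      # F(x)
        return self.relu(residual + x)                 # F(x) + x, then ReLU


block = BasicBlock(64).eval()  # eval(): BatchNorm uses its running statistics

# Zeroing the last BatchNorm's scale makes F(x) = 0 (its bias is already 0 at init),
# so the block reduces to ReLU(x), which equals x for non-negative inputs.
nn.init.zeros_(block.bn2.weight)

x = torch.rand(1, 64, 56, 56)  # non-negative, like the output of a preceding ReLU
with torch.no_grad():
    y = block(x)
print(torch.allclose(y, x))  # True
```

Each 3x3 convolution here has 64 × 64 × 3 × 3 = 36,864 weights and each BatchNorm has 128 parameters, matching the corresponding rows in the `torchsummary` output later in this section.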

ResNet also differs from AlexNet and VGG in how it ends. Like VGG, it builds its body mostly from small 3x3 convolutions (only the first layer uses a 7x7 kernel), but instead of finishing with large fully connected layers it applies global average pooling followed by a single fully connected layer. This greatly reduces the number of parameters: ResNet-18 has about 11.7M parameters, compared with about 138M for VGG-16. As a result the model needs less memory and is less prone to overfitting, especially on small datasets.
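Most of the parameter savings come from the classifier head. The sketch below compares a ResNet-18-style head (global average pooling plus one `Linear(512, 1000)`) with a VGG-16-style head (three fully connected layers on the flattened 7x7x512 feature map). The layer sizes assume the standard 1000-class ImageNet configurations; dropout is left out of the VGG head because it has no parameters. The VGG head is built on PyTorch's `meta` device, so its parameters have real shapes but no memory is allocated for them.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


# ResNet-18-style head: global average pooling, then one small linear layer.
resnet_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (N, 512, H, W) -> (N, 512, 1, 1) for any H, W
    nn.Flatten(),             # -> (N, 512)
    nn.Linear(512, 1000),
)

# VGG-16-style head: flatten the 7x7 map and feed it to large fully connected layers.
# Built on the "meta" device: shapes and counts are real, but no weights are allocated.
vgg_head = nn.Sequential(
    nn.Flatten(),                                  # (N, 512, 7, 7) -> (N, 25088)
    nn.Linear(512 * 7 * 7, 4096, device="meta"),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 4096, device="meta"),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000, device="meta"),
)

features = torch.randn(2, 512, 7, 7)  # final feature map for a 224x224 input
print(resnet_head(features).shape, count_params(resnet_head))  # torch.Size([2, 1000]) 513000
print(count_params(vgg_head))                                   # 123642856

# Global average pooling does not depend on the spatial size of the feature map.
print(resnet_head(torch.randn(2, 512, 10, 10)).shape)           # torch.Size([2, 1000])
```

The VGG-style head alone holds about 124M parameters, while the pooled ResNet head needs only 513K. Because the pooling collapses any spatial size to 1x1, the ResNet head also works for inputs other than 224x224, whereas the VGG head's first layer is tied to a 7x7 feature map.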

*Code*

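```python
import torch
from torchvision.models import resnet18
from torchsummary import summary

# torchsummary runs a forward pass on a dummy batch; use the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18().to(device)

# input_size excludes the batch dimension: (channels, height, width)
summary(model, (3, 224, 224), device=device)
```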

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,