<a href="https://colab.research.google.com/github/yeb2Binfang/ECE-GY9143HPML/blob/main/Lab/Lab2/Resnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ResNet

ResNet is a network that everyone working with deep learning needs to understand.

We always want to make our neural networks deeper, but is a deeper network always better?

<img src="https://user-images.githubusercontent.com/68700549/155563707-8d67f8f3-8cbd-4fc7-8736-69978e6cc23e.png"  style="width:500px;height:227px;">

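Let's look at the core idea of ResNet. Each time we deepen a network, the model becomes more complex, that is, it can represent a larger class of functions. But if the new function class does not contain (is not nested around) the old one, as on the left of the figure above, the larger model may actually drift further away from the optimum even though it is more complex. This is why a network can get worse as it gets deeper, a phenomenon known as degradation. ResNet's solution corresponds to the right of the figure: by adding an identity mapping, each deeper model contains the previous model as a special case, so deepening the network this way tends to give better results.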

## Residual block

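Concretely, this is done with the residual block: instead of fitting the desired mapping directly, the layers fit the residual, which gives the structure $f(x) = x + g(x)$.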
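A minimal sketch of the $f(x) = x + g(x)$ structure in PyTorch, assuming `g` is any module whose output has the same shape as its input. The wrapper name `AddShortcut` is hypothetical and not part of the notebook's code:

```python
import torch
from torch import nn


class AddShortcut(nn.Module):
    """Hypothetical wrapper: computes f(x) = x + g(x) for any shape-preserving g."""

    def __init__(self, g: nn.Module):
        super().__init__()
        self.g = g  # the residual branch g

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The branch only has to model the residual g(x) = f(x) - x
        return x + self.g(x)


f = AddShortcut(nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)))
x = torch.randn(4, 8)
y = f(x)
print(y.shape)  # torch.Size([4, 8])
```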

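In the figure below, the left side is a regular block and the right side is a residual block. The residual block has one extra path that carries $x$ straight to the output: this is the identity mapping, and because of it the layers only need to fit the residual $g(x) = f(x) - x$. Thanks to this path, even if the block's layers do not learn anything useful, the block can easily fall back to the identity by driving $g(x)$ to zero, so it is at least as good as the network without it.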
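The fallback to the identity can be checked directly: if the residual branch outputs zero, a plain block loses its input while a residual block returns it unchanged. A small sketch, assuming a single linear layer as the branch whose weight and bias are zeroed by hand (a deliberately trivial stand-in for a block that has learned nothing useful):

```python
import torch
from torch import nn

g = nn.Linear(8, 8)           # residual branch g
with torch.no_grad():
    g.weight.zero_()          # force g(x) = 0 for every x
    g.bias.zero_()

x = torch.randn(4, 8)
plain_out = g(x)              # plain block: output is all zeros, the input is lost
residual_out = x + g(x)       # residual block: f(x) = x + g(x) = x

print(torch.equal(residual_out, x))            # True
print(torch.count_nonzero(plain_out).item())   # 0
```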

<img src="https://user-images.githubusercontent.com/68700549/155577558-70522e1f-c32d-497f-830c-985e24269145.png"  style="width:500px;height:397px;">

## Residual block details

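Now the implementation details. A residual block is not very different from a regular block; when the number of channels (or the resolution) has to change, a 1x1 convolution on the shortcut path transforms $x$ into the matching shape.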
<img src="https://user-images.githubusercontent.com/68700549/155578686-34b47d6d-a56c-46e9-b52d-6bc31e8377f8.png"  style="width:500px;height:397px;">
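To see why the 1x1 convolution is needed, here is a small sketch with arbitrarily chosen shapes: when the main path changes the channel count from 3 to 6 and halves the resolution with stride 2, `x` can no longer be added directly, while a 1x1 convolution with the same stride on the shortcut restores matching shapes.

```python
import torch
from torch import nn

x = torch.rand(1, 3, 8, 8)

# Main path: 3x3 conv that changes channels (3 -> 6) and halves height/width
main = nn.Conv2d(3, 6, kernel_size=3, padding=1, stride=2)
# Shortcut: 1x1 conv with the same stride so its output shape matches
shortcut = nn.Conv2d(3, 6, kernel_size=1, stride=2)

y_main = main(x)          # shape (1, 6, 4, 4)
y_short = shortcut(x)     # shape (1, 6, 4, 4)
out = y_main + y_short    # shapes agree, so the residual sum is valid

mismatch = False
try:
    y_main + x            # (1, 6, 4, 4) + (1, 3, 8, 8): not broadcastable
except RuntimeError as err:
    mismatch = True
    print("shape mismatch:", err)
```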

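The architecture is simple, yet residual blocks make very deep networks much easier to train; it is even possible to train networks with a thousand layers.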
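One way to get a feel for why deep residual networks are easier to train is to compare the gradient that reaches the input of a deep plain stack and of a deep residual stack built from the same layers. The sketch below is a toy experiment under our own assumptions (50 sigmoid layers of width 16, default initialization, fixed seed); it is not the setup used in the ResNet paper:

```python
import torch
from torch import nn

torch.manual_seed(0)
depth, width = 50, 16
layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])


def input_grad_norm(residual: bool) -> float:
    """Norm of d(sum of outputs)/d(input) through the same 50 layers."""
    x = torch.randn(1, width, requires_grad=True)
    h = x
    for layer in layers:
        g = torch.sigmoid(layer(h))
        # Plain stack: h = g(h); residual stack: h = h + g(h)
        h = h + g if residual else g
    h.sum().backward()
    return x.grad.norm().item()


plain = input_grad_norm(residual=False)
res = input_grad_norm(residual=True)
print(f"plain: {plain:.3e}, residual: {res:.3e}")
```

In the plain stack the gradient is a product of 50 small Jacobians and shrinks toward zero; in the residual stack each Jacobian is the identity plus a small term, so the gradient keeps a usable magnitude.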




In [3]:
import torch
from torch import nn
from torch.nn import functional as F

## ResNet block

In [30]:
class Residual(nn.Module):
  def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
    super().__init__()
    # Main path: two 3x3 convs; padding=1 keeps height/width unless strides > 1
    self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
    self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)

    # Shortcut path: a 1x1 conv matches channels and resolution when they change
    if use_1x1conv:
      self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
    else:
      self.conv3 = None

    self.bn1 = nn.BatchNorm2d(num_channels)
    self.bn2 = nn.BatchNorm2d(num_channels)
    # inplace=True overwrites the input tensor instead of allocating a new one, saving memory
    self.relu = nn.ReLU(inplace=True)

  def forward(self, X):
    Y = self.relu(self.bn1(self.conv1(X)))
    Y = self.bn2(self.conv2(Y))
    if self.conv3 is not None:
      X = self.conv3(X)
    # Add the shortcut before the final activation: relu(g(X) + X)
    Y += X
    return self.relu(Y)


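Let's test it. With the same number of input and output channels and stride 1, the output shape matches the input shape: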

In [31]:
blk = Residual(3, 3)
X = torch.rand(4,3,6,6)
Y = blk(X)
Y.shape

torch.Size([4, 3, 6, 6])

In [32]:
blk = Residual(3, 6, use_1x1conv=True, strides=2)
blk(X).shape

torch.Size([4, 6, 3, 3])

## ResNet Model

In [33]:
b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   # BatchNorm2d(64): the argument is num_features, i.e. the output channels of the conv above
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
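The spatial sizes printed further down follow from the standard output-size formula for convolution and pooling layers, $\lfloor (H + 2p - k)/s \rfloor + 1$. A small helper (the name `out_size` is ours, not PyTorch's) traces a 224x224 input through `b1` and through the stride-2 first residual block of each later stage:

```python
def out_size(h: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a Conv2d or MaxPool2d (dilation 1, ceil_mode=False)."""
    return (h + 2 * padding - kernel) // stride + 1


h = 224
h = out_size(h, kernel=7, stride=2, padding=3)  # b1 conv:    224 -> 112
h = out_size(h, kernel=3, stride=2, padding=1)  # b1 maxpool: 112 -> 56
sizes = [h]
for _ in range(3):  # first Residual of b3, b4, b5: 3x3 conv, stride 2, padding 1
    h = out_size(h, kernel=3, stride=2, padding=1)
    sizes.append(h)
print(sizes)  # [56, 28, 14, 7]

# The 1x1 stride-2 shortcut conv gives the same size as the main path
print(out_size(56, kernel=1, stride=2))  # 28
```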

In [34]:
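def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
  '''
  :param input_channels: number of input channels
  :param num_channels: number of output channels
  :param num_residuals: number of residual blocks in this stage
  :param first_block: whether this is the first stage, which is special (no downsampling)
  '''
  blk = []
  for i in range(num_residuals):
    # b1 already downsamples (stride-2 conv and stride-2 max pool), so the first stage keeps
    # its resolution; every other stage halves height/width and changes the channel count
    # in its first residual block, which therefore needs the 1x1 conv on the shortcut
    if i == 0 and not first_block:
      blk.append(Residual(input_channels, num_channels, use_1x1conv=True, strides=2))
    else:
      blk.append(Residual(num_channels, num_channels))
  return blk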

In [35]:
# * unpacks the list returned by resnet_block into separate positional arguments of nn.Sequential
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
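The `*` matters because `nn.Sequential` expects modules as separate positional arguments; passing the list itself fails, since a list is not an `nn.Module`. A small sketch with arbitrary layers:

```python
from torch import nn

blocks = [nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)]

seq = nn.Sequential(*blocks)   # same as nn.Sequential(blocks[0], blocks[1], blocks[2])
print(len(seq))                # 3

raised = False
try:
    nn.Sequential(blocks)      # the list itself is not an nn.Module
except TypeError as err:
    raised = True
    print("TypeError:", err)
```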

In [36]:
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))
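This configuration is the one commonly called ResNet-18. The name counts only the weighted layers on the main path (BatchNorm layers and the 1x1 shortcut convolutions are conventionally not counted), and the arithmetic works out as follows:

```python
num_residuals = [2, 2, 2, 2]   # residual blocks in b2, b3, b4, b5
convs_per_residual = 2         # conv1 and conv2 in each Residual

layers = 1                     # the 7x7 conv in b1
layers += convs_per_residual * sum(num_residuals)
layers += 1                    # the final nn.Linear(512, 10)
print(layers)  # 18
```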

In [37]:
X = torch.rand(size=(1,1,224,224))
for layer in net:
  X = layer(X)
  print(layer.__class__.__name__, 'output shape:\t', X.shape)

Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 128, 28, 28])
Sequential output shape:	 torch.Size([1, 256, 14, 14])
Sequential output shape:	 torch.Size([1, 512, 7, 7])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 512, 1, 1])
Flatten output shape:	 torch.Size([1, 512])
Linear output shape:	 torch.Size([1, 10])


In [38]:
print(net)

Sequential(
  (0): Sequential(
    (0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (1): Sequential(
    (0): Residual(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (1): Residual(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   

## How does the residual connection deal with vanishing gradients?
