# **Residual Networks (ResNet)**

## **Residual Block**

<div align=center>
<img width="400" src="../image/5.11_residual-block.svg"/>
</div>
<div align=center>Figure 5.9 A plain network structure (left) and a network structure with a residual connection (right)</div>

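We call $f(x) = x$ the identity mapping. Rather than fitting the desired mapping directly, a residual block fits the residual relative to the identity, which in practice is easier to optimize and makes small deviations from the identity easier to capture. Building the identity mapping into the network through residual blocks also lets the input propagate forward across layers more directly.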
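
To make this concrete, here is a short supporting derivation. The symbol $g$, denoting the stacked layers inside the block, is notation introduced here for illustration, and the final ReLU is left out for simplicity. A residual block computes

$$
y = g(x) + x, \qquad \frac{\partial y}{\partial x} = \frac{\partial g(x)}{\partial x} + I .
$$

If the desired mapping is $f(x)$, the layers only need to learn $g(x) = f(x) - x$; when the identity is already close to optimal, driving $g$ toward zero is enough. The $I$ term shows that the shortcut also gives the gradient a path that bypasses the stacked layers.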

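ResNet uses an all-$3 \times 3$ convolutional design. The residual block is structured as follows:
- Two $3 \times 3$ convolutional layers with the same number of output channels
- Each convolutional layer is followed by a batch normalization (BN) layer and a ReLU activation
- The input of the residual block is added just before the final ReLU
- The input and the output of the two convolutional layers must have the same shape; if the number of channels differs, a $1 \times 1$ convolution is used to transform the input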
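
The last rule can be stated as a small predicate. This is a minimal sketch: `needs_projection` is a hypothetical helper that does not appear in the notebook, and it assumes the main branch changes the channel count only when `in_channels != out_channels` and the spatial size only when `stride != 1`.

```python
def needs_projection(in_channels: int, out_channels: int, stride: int = 1) -> bool:
    """Return True when the shortcut cannot be a plain identity.

    Hypothetical helper for illustration: the element-wise addition needs
    matching shapes, so any change in channels or spatial size on the main
    branch must be mirrored on the shortcut (here by a 1x1 convolution).
    """
    return in_channels != out_channels or stride != 1


print(needs_projection(64, 64))             # False
print(needs_projection(64, 128, stride=2))  # True
print(needs_projection(3, 3, stride=2))     # True
```

The third call corresponds to a block that keeps the channel count but halves the feature map, which still needs the $1 \times 1$ convolution on the shortcut.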

In [1]:
import torch
from torch import nn, optim
import torch.nn.functional as F

In [2]:
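import os
import sys
sys.path.append(os.path.join('..', 'utils'))  # OS-independent path to the local helper package
import d2lzh as d2l
# Fall back to the CPU when no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')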

In [3]:
class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super(Residual, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
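    def forward(self, x):
        # Main branch: conv -> BN -> ReLU -> conv -> BN
        Y = F.relu(self.bn1(self.conv1(x)))
        Y = self.bn2(self.conv2(Y))
        # Shortcut branch: transform x with the 1x1 conv when one is configured
        if self.conv3 is not None:
            x = self.conv3(x)
        # Add the shortcut before the final ReLU
        return F.relu(Y + x)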

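In a residual block, `X` and the transformed `X` must have feature maps of the same height and width so that they can be added. With the default stride of 1, the output shape equals the input shape: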

In [4]:
blk = Residual(3, 3)
X = torch.rand((4, 3, 224, 224))
blk(X).shape

torch.Size([4, 3, 224, 224])

In [5]:
blk = Residual(3, 3, use_1x1conv=True, stride=2)
blk(X).shape

torch.Size([4, 3, 112, 112])
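
The two shapes above can be checked by hand with the standard output-size formula for a convolution without dilation, $\lfloor (n + 2p - k) / s \rfloor + 1$. The helper name `conv_out_size` is an assumption made for this sketch; the kernel sizes, strides and padding values are the ones used in `Residual`.

```python
def conv_out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution without dilation."""
    return (n + 2 * padding - kernel) // stride + 1


# Main branch with stride 1: both 3x3 convolutions (padding 1) keep the size.
same = conv_out_size(conv_out_size(224, 3, 1, 1), 3, 1, 1)
# Main branch with stride 2: only the first 3x3 convolution is strided.
halved = conv_out_size(conv_out_size(224, 3, 2, 1), 3, 1, 1)
# Shortcut branch: 1x1 convolution with stride 2 and no padding.
shortcut = conv_out_size(224, 1, 2, 0)

print(same, halved, shortcut)  # 224 112 112
```

Both branches of the strided block end at 112, which is why the addition `Y + x` is well defined.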

## **ResNet Model**

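ResNet consists of three main parts:
- The input is followed by a $7 \times 7$ convolutional layer with 64 output channels and a stride of 2; after BN and the activation, a $3 \times 3$ max-pooling layer with a stride of 2 is applied. Both use "same"-style padding, so each halves the height and width
- Next come four modules made up of residual blocks; every module except the first doubles the number of output channels and halves the height and width
- Finally, global average pooling reduces each feature map to $1 \times 1$, and a fully connected layer maps the channels to the number of classes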
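
The doubling and halving rule can be tabulated with a few lines of plain Python. This is a sketch under stated assumptions: a $224 \times 224$ input (so the feature map entering the first module is $56 \times 56$ after the two stride-2 layers) and two residual blocks per module, as in the code that follows. `stage_schedule` is a hypothetical helper name.

```python
def stage_schedule(channels: int = 64, size: int = 56, num_stages: int = 4):
    """Yield (module index, channels, height/width) for each residual module."""
    for stage in range(1, num_stages + 1):
        if stage > 1:
            # Every module after the first doubles channels and halves the size.
            channels, size = channels * 2, size // 2
        yield stage, channels, size


schedule = list(stage_schedule())
for stage, channels, size in schedule:
    print(f"module {stage}: {channels} channels, {size}x{size}")

# Weighted layers: first conv + 2 convs per residual block + final linear layer
# (the 1x1 shortcut convolutions are not counted).
blocks_per_module = 2
depth = 1 + len(schedule) * blocks_per_module * 2 + 1
print("depth:", depth)  # depth: 18
```

With this configuration the count comes to 18 weighted layers, which is why this variant is commonly called ResNet-18.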

In [6]:
# A module made up of residual blocks
def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    if first_block:
        assert in_channels == out_channels
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # First residual block of every module except the first: halve the height and width
            blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blk)

In [7]:
net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                    nn.BatchNorm2d(64),
                    nn.ReLU(),
                    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

In [8]:
net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))
net.add_module("resnet_block2", resnet_block(64, 128, 2))
net.add_module("resnet_block3", resnet_block(128, 256, 2))
net.add_module("resnet_block4", resnet_block(256, 512, 2))

In [9]:
net.add_module("global_avg_pool", d2l.GlobalAvgPool2d())
net.add_module("fc", nn.Sequential(d2l.FlattenLayer(), nn.Linear(512, 10)))

In [10]:
X = torch.rand((1, 1, 224, 224))
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)

0  output shape:	 torch.Size([1, 64, 112, 112])
1  output shape:	 torch.Size([1, 64, 112, 112])
2  output shape:	 torch.Size([1, 64, 112, 112])
3  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block1  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block2  output shape:	 torch.Size([1, 128, 28, 28])
resnet_block3  output shape:	 torch.Size([1, 256, 14, 14])
resnet_block4  output shape:	 torch.Size([1, 512, 7, 7])
global_avg_pool  output shape:	 torch.Size([1, 512, 1, 1])
fc  output shape:	 torch.Size([1, 10])


In [11]:
import torchsummary
# Run the summary on the selected device instead of assuming a GPU
torchsummary.summary(net.to(device), (1, 224, 224), device=device.type)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           3,200
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,928
       BatchNorm2d-6           [-1, 64, 56, 56]             128
            Conv2d-7           [-1, 64, 56, 56]          36,928
       BatchNorm2d-8           [-1, 64, 56, 56]             128
          Residual-9           [-1, 64, 56, 56]               0
           Conv2d-10           [-1, 64, 56, 56]          36,928
      BatchNorm2d-11           [-1, 64, 56, 56]             128
           Conv2d-12           [-1, 64, 56, 56]          36,928
      BatchNorm2d-13           [-1, 64, 56, 56]             128
         Residual-14           [-1, 64,
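
The parameter counts in the visible rows above can be reproduced by hand. A minimal sketch, assuming convolutions with a bias term (the `nn.Conv2d` default) and counting only the learnable scale and shift of batch normalization; both function names are hypothetical.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel: int, bias: bool = True) -> int:
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + (out_channels if bias else 0)


def batchnorm2d_params(channels: int) -> int:
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


print(conv2d_params(1, 64, 7))   # 3200  (Conv2d-1)
print(conv2d_params(64, 64, 3))  # 36928 (Conv2d-5)
print(batchnorm2d_params(64))    # 128   (BatchNorm2d-2)
```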

## **Training**

In [12]:
net = net.to(device)
batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)

lr, num_epochs = 0.001, 5
optimizer = optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss()

In [14]:
for epoch in range(num_epochs):
    train_l_sum, train_acc_sum, n, batch_count = 0.0, 0.0, 0, 0
    for X, y in train_iter:
        X = X.to(device)
        y = y.to(device)
        y_hat = net(X)
        l = loss(y_hat, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        # .item() yields a Python number, so the running sums do not hold on to tensors
        train_l_sum += l.item()
        train_acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
        batch_count += 1
    test_acc = d2l.evaluate_accuracy(test_iter, net)
    print(f'epoch{epoch+1}: loss {train_l_sum/batch_count:.4f} train_acc {train_acc_sum / n:.4f} test_acc {test_acc:.4f}')

epoch1: loss 0.3867 train_acc 0.8572 test_acc 0.8968
epoch2: loss 0.2491 train_acc 0.9080 test_acc 0.8980
epoch3: loss 0.2105 train_acc 0.9232 test_acc 0.9032
epoch4: loss 0.1828 train_acc 0.9323 test_acc 0.8743
epoch5: loss 0.1616 train_acc 0.9395 test_acc 0.9147
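
The loop reports the loss as an average over batches and the accuracy as an average over samples. The sketch below mirrors that bookkeeping in plain Python. `epoch_metrics` is a hypothetical helper, the per-batch numbers are made up for illustration, and the batch arithmetic assumes the loader keeps the final partial batch of the 60,000 Fashion-MNIST training images.

```python
import math


def epoch_metrics(batches):
    """batches: iterable of (mean_loss, num_correct, batch_size) tuples."""
    loss_sum, correct, n, batch_count = 0.0, 0, 0, 0
    for mean_loss, num_correct, batch_size in batches:
        loss_sum += mean_loss   # mean loss of this batch
        correct += num_correct  # correctly classified samples in this batch
        n += batch_size
        batch_count += 1
    # Loss: average over batches. Accuracy: average over samples.
    return loss_sum / batch_count, correct / n


# With batch_size=128 the last batch is smaller than the others.
num_batches = math.ceil(60000 / 128)
last_batch = 60000 - (num_batches - 1) * 128
print(num_batches, last_batch)  # 469 96

# Toy numbers, made up for illustration only.
avg_loss, acc = epoch_metrics([(0.50, 100, 128), (0.30, 90, 96)])
print(round(avg_loss, 4), round(acc, 4))  # 0.4 0.8482
```

Because the loss is averaged per batch, the smaller final batch carries the same weight as a full one; across 469 batches the effect on the reported value is small.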
