### 5.7 Networks Using Repeated Elements (VGG)

### 5.7.1 VGG Block

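> 1. **For a given receptive field (the size of the input region that influences one output value), stacking small convolution kernels is better than using one large kernel: the network becomes deeper, so it can learn a more complex model, and it needs fewer parameters.**
> 2. **For example, three 3x3 kernels replace one 7x7 kernel, and two 3x3 kernels replace one 5x5 kernel.**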
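The two points above can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `stacked_receptive_field` and `conv_weight_count` and the channel count `C = 64` are assumptions made for this example. It also assumes stride-1 convolutions with the same number of input and output channels, and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel_size):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weight_count(num_layers, kernel_size, channels):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 64  # hypothetical channel count, chosen only for illustration
for num_small, big_kernel in ((2, 5), (3, 7)):
    rf = stacked_receptive_field(num_small, 3)
    small = conv_weight_count(num_small, 3, C)
    large = conv_weight_count(1, big_kernel, C)
    print(f"{num_small} x (3x3): receptive field {rf}, {small} weights; "
          f"1 x ({big_kernel}x{big_kernel}): {large} weights")
```

Under these assumptions the stacked 3x3 layers cover the same receptive field with 18C² versus 25C² weights (two layers) and 27C² versus 49C² weights (three layers).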

In [1]:
import time
import torch
from torch import nn, optim

import sys
sys.path.append("..") 
import d2lzh_pytorch.utils as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [2]:
def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves the height and width
    return nn.Sequential(*blk)  # unpack the list of layers into nn.Sequential

### 5.7.2 VGG Network

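> 1. **Build a VGG network with 5 convolutional blocks: the first two blocks use a single convolutional layer each, and the last three use two convolutional layers each.**
> 2. **The first block has 1 input channel (Fashion-MNIST images are single-channel) and 64 output channels.**
> 3. **Each subsequent block doubles the number of output channels until it reaches 512.**
> 4. **With 8 convolutional layers and 3 fully connected layers, the network is called VGG-11.**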
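The layer count behind the name VGG-11 can be verified directly from the architecture tuple. This is a small sanity-check sketch; `num_fc_layers` and the other variable names are chosen only for this example.

```python
# (num_convs, in_channels, out_channels) for each of the 5 blocks
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))

num_conv_layers = sum(num_convs for num_convs, _, _ in conv_arch)
num_fc_layers = 3  # the three fully connected layers described above
total_layers = num_conv_layers + num_fc_layers

out_channels = [out_c for _, _, out_c in conv_arch]

# Each block's input channels must equal the previous block's output channels.
channels_chain = all(
    conv_arch[i][1] == conv_arch[i - 1][2] for i in range(1, len(conv_arch))
)

print("conv layers:", num_conv_layers)
print("total weighted layers:", total_layers)
print("output channels per block:", out_channels)
print("channels chain correctly:", channels_chain)
```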

In [3]:
# (num_convs, in_channels, out_channels)
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
# After 5 vgg_blocks the height and width are halved 5 times: 224 / 32 = 7
fc_features = 512 * 7 * 7  # c * w * h
fc_hidden_units = 4096  # arbitrary choice

In [4]:
def vgg(conv_arch, fc_features, fc_hidden_units=4096):
    net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        net.add_module("vgg_block_" + str(i+1), vgg_block(num_convs, in_channels, out_channels))
    net.add_module('fc', nn.Sequential(
        d2l.FlattenLayer(),
        nn.Linear(fc_features, fc_hidden_units),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(fc_hidden_units, fc_hidden_units),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(fc_hidden_units, 10)
    ))
    return net

In [5]:
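# Build a single-channel sample and inspect the output shape of each block.
# Note: the sample here is 244x244, not the 224x224 used for training; max pooling
# floors odd sizes (61 -> 30, 15 -> 7), so the final feature map is still 7x7.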
net = vgg(conv_arch, fc_features, fc_hidden_units)

X = torch.rand(1, 1, 244, 244)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape: ', X.shape)

vgg_block_1 output shape:  torch.Size([1, 64, 122, 122])
vgg_block_2 output shape:  torch.Size([1, 128, 61, 61])
vgg_block_3 output shape:  torch.Size([1, 256, 30, 30])
vgg_block_4 output shape:  torch.Size([1, 512, 15, 15])
vgg_block_5 output shape:  torch.Size([1, 512, 7, 7])
fc output shape:  torch.Size([1, 10])
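The spatial sizes printed above follow from repeated halving with flooring. The 3x3 convolutions with `padding=1` keep the size unchanged, and each `MaxPool2d(kernel_size=2, stride=2)` maps a size `n` to `n // 2`. The helper `pooled_sizes` below is a hypothetical name used only to trace this arithmetic.

```python
def pooled_sizes(size, num_blocks=5):
    """Spatial size after each block, assuming one 2x2 stride-2 max pool per block."""
    sizes = []
    for _ in range(num_blocks):
        size = size // 2  # the pooling layer floors odd sizes
        sizes.append(size)
    return sizes


print("244 input:", pooled_sizes(244))
print("224 input:", pooled_sizes(224))
```

Both input sizes end at 7, which is why `fc_features = 512 * 7 * 7` matches the 244x244 sample as well as 224x224 images.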


### 5.7.3 Getting the Data and Training the Model

In [6]:
# Reduce the number of channels to make testing easier
ratio = 8
small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio), 
                   (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]
net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)
print(net)

Sequential(
  (vgg_block_1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_3): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_4): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1
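Dividing `fc_features` by `ratio` is valid because only the channel count shrinks, while the final 7x7 spatial size does not change. The following sketch checks that arithmetic; the names `small_out_channels`, `small_fc_features`, and `small_fc_hidden_units` are introduced only for this example.

```python
ratio = 8
fc_features = 512 * 7 * 7   # channels * width * height of the full model
fc_hidden_units = 4096

small_out_channels = 512 // ratio            # channels leaving the last block
small_fc_features = fc_features // ratio     # value passed to vgg(...)
small_fc_hidden_units = fc_hidden_units // ratio

# The scaled flatten size must equal scaled channels times the 7x7 feature map.
consistent = small_fc_features == small_out_channels * 7 * 7

print("last block channels:", small_out_channels)
print("flattened features:", small_fc_features)
print("hidden units:", small_fc_hidden_units)
print("consistent:", consistent)
```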

In [7]:
batch_size = 32
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

In [9]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
# d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
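The training call above is commented out. As a rough illustration of a single optimisation step with Adam and a cross-entropy loss, here is a minimal sketch. It does not reproduce `d2l.train_ch5`: `train_one_step` and `tiny_net` are hypothetical names, and a tiny network with a random 28x28 batch stands in for the VGG model and Fashion-MNIST so that the example runs quickly on a CPU.

```python
import torch
from torch import nn


def train_one_step(net, X, y, optimizer, loss_fn):
    """Run one forward/backward pass and one optimizer update; return the loss."""
    net.train()
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
# Stand-in model: one conv + pool stage followed by a linear classifier.
tiny_net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(4 * 14 * 14, 10),
)
X = torch.rand(8, 1, 28, 28)        # random batch standing in for images
y = torch.randint(0, 10, (8,))      # random class labels in [0, 10)
optimizer = torch.optim.Adam(tiny_net.parameters(), lr=0.001)

loss_value = train_one_step(tiny_net, X, y, optimizer, nn.CrossEntropyLoss())
print(f"loss after one step: {loss_value:.4f}")
```

A full training run would repeat this step over every batch from `train_iter` for `num_epochs` epochs, moving the model and each batch to `device` first.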