# **DenseNet: Densely Connected Networks**

<div align=center>
<img width="400" src="../image/5.12_densenet.svg"/>
</div>
<div align=center>Figure 5.10 The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: addition versus concatenation</div>

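As the figure shows, ResNet and DenseNet differ mainly in how an earlier module's output is passed on to later modules. ResNet adds a module's input to its output, whereas DenseNet concatenates them along the channel dimension.

DenseNet is built from two main components:
- Dense block: defines how the inputs and outputs are connected
- Transition layer: controls the number of channels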
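To make the difference concrete, here is a minimal sketch with random tensors. A ResNet-style connection adds a block's output to its input, so the shapes must match and the channel count stays the same. A DenseNet-style connection concatenates them along the channel dimension (`dim=1` in PyTorch's NCHW layout), so the channel counts add up. The two convolutions are hypothetical stand-ins for a real block.

```python
import torch
from torch import nn

x = torch.rand(4, 3, 8, 8)  # (batch, channels, height, width)

# Hypothetical stand-ins for a block f(x) that keeps the spatial size
f_res = nn.Conv2d(3, 3, kernel_size=3, padding=1)     # must keep 3 channels so addition works
f_dense = nn.Conv2d(3, 10, kernel_size=3, padding=1)  # any number of output channels works

y_res = x + f_res(x)                         # ResNet: element-wise addition
y_dense = torch.cat((x, f_dense(x)), dim=1)  # DenseNet: channel-wise concatenation

print(y_res.shape)    # torch.Size([4, 3, 8, 8])
print(y_dense.shape)  # torch.Size([4, 13, 8, 8])
```

Because concatenation keeps `x` unchanged in the first 3 channels, every later layer can read the earlier features directly instead of receiving them mixed into a sum.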

## **Dense Block**

```python
import time
import torch
from torch import nn, optim
import torch.nn.functional as F

import sys
sys.path.append('../utils')  # folder that contains the course helper module d2lzh
import d2lzh as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def conv_block(in_channels, out_channels):
    # Pre-activation ordering: batch normalization -> ReLU -> 3x3 convolution (padding keeps H and W)
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
    return blk
```

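A dense block consists of several `conv_block`s, each with the same number of output channels. In the forward pass, we concatenate the input and output of each block along the channel dimension.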

```python
class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, out_channels):
        super(DenseBlock, self).__init__()
        net = []
        for i in range(num_convs):
            # Input channels of the i-th conv_block: the block's input plus the outputs
            # of the i earlier conv_blocks, since every layer is densely connected
            in_c = in_channels + i * out_channels
            net.append(conv_block(in_c, out_channels))
        self.net = nn.ModuleList(net)
        # Total output channels after all concatenations
        self.out_channels = in_channels + num_convs * out_channels

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate the input and output of each block along the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X
```

```python
blk = DenseBlock(2, 3, 10)
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
Y.shape  # torch.Size([4, 23, 8, 8])
```

```
torch.Size([4, 23, 8, 8])
```
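The 23 output channels above are the 3 input channels plus 2 × 10 new ones. More generally, the $i$-th `conv_block` (counting from 0) receives `in_channels + i * out_channels` channels, and the dense block outputs `in_channels + num_convs * out_channels`. The small helper below (a hypothetical name, not used elsewhere in the notebook) spells out that arithmetic without building any layers.

```python
def dense_block_channels(num_convs, in_channels, growth_rate):
    """Return (input channels of each conv_block, output channels of the dense block)."""
    per_layer_in = [in_channels + i * growth_rate for i in range(num_convs)]
    out = in_channels + num_convs * growth_rate
    return per_layer_in, out

print(dense_block_channels(2, 3, 10))   # ([3, 13], 23)
print(dense_block_channels(4, 64, 32))  # ([64, 96, 128, 160], 192)
```

The second call corresponds to the first dense block of the full model: its four `BatchNorm2d` layers see 64, 96, 128 and 160 channels, and the block outputs 192 channels.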

## **Transition Layer**

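Every dense connection increases the number of channels, so we use a $1 \times 1$ convolutional layer to reduce the channel count and an average pooling layer with stride 2 to halve the height and width.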
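A $1 \times 1$ convolution mixes channels independently at each pixel, so it amounts to multiplying every pixel's channel vector by the same weight matrix. Average pooling with kernel size 2 and stride 2 then averages each non-overlapping 2×2 window. The sketch below checks both facts numerically on random tensors; the shapes (23 channels reduced to 10) are chosen to match the example that follows.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(4, 23, 8, 8)

conv1x1 = nn.Conv2d(23, 10, kernel_size=1)
# The weight has shape (10, 23, 1, 1); dropping the trailing 1x1 gives a (10, 23) matrix
W = conv1x1.weight.view(10, 23)
b = conv1x1.bias

# Apply the same matrix to every pixel's 23-dimensional channel vector, then add the bias
manual = torch.einsum('oc,nchw->nohw', W, x) + b.view(1, 10, 1, 1)
print(torch.allclose(conv1x1(x), manual, atol=1e-5))  # True

pool = nn.AvgPool2d(kernel_size=2, stride=2)
y = pool(x)
# Split H and W into (blocks, 2) and average inside each 2x2 window
manual_pool = x.view(4, 23, 4, 2, 4, 2).mean(dim=(3, 5))
print(y.shape, torch.allclose(y, manual_pool, atol=1e-6))  # torch.Size([4, 23, 4, 4]) True
```

This is why the transition layer is cheap: the $1 \times 1$ convolution has only `in_channels * out_channels` weights (plus biases), and the pooling has no parameters at all.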

```python
def transition_block(in_channels, out_channels):
    # BN -> ReLU -> 1x1 conv (reduce channels) -> 2x2 average pooling with stride 2 (halve H and W)
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=1),
                        nn.AvgPool2d(kernel_size=2, stride=2))
    return blk
```

```python
blk = transition_block(23, 10)
blk(Y).shape
```

```
torch.Size([4, 10, 4, 4])
```

## **DenseNet Model**

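DenseNet starts with the same module as ResNet: a $7 \times 7$ convolutional layer with stride 2, followed by batch normalization, ReLU, and a $3 \times 3$ max-pooling layer with stride 2.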
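For a convolution or pooling layer with kernel size $k$, padding $p$ and stride $s$, the output side length is $\lfloor (n + 2p - k)/s \rfloor + 1$ (PyTorch's default floor mode, dilation 1). Applying it to the two strided layers of this first module shows why a 96×96 input reaches the first dense block at 24×24; batch normalization and ReLU do not change the shape. The helper name `out_size` is ours.

```python
def out_size(n, k, s=1, p=0):
    """Output side length of a conv/pool layer (floor mode, dilation 1)."""
    return (n + 2 * p - k) // s + 1

n = 96
n = out_size(n, k=7, s=2, p=3)  # Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
print(n)  # 48
n = out_size(n, k=3, s=2, p=1)  # MaxPool2d(kernel_size=3, stride=2, padding=1)
print(n)  # 24

# The same arithmetic for the 224x224 input used with torchsummary
print(out_size(out_size(224, k=7, s=2, p=3), k=3, s=2, p=1))  # 56
```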

```python
net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                    nn.BatchNorm2d(64),
                    nn.ReLU(),
                    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```

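DenseNet uses 4 dense blocks. For now, we set the number of convolutional layers in each dense block to 4 and the number of output channels of each convolutional layer (the growth rate) to 32, so each dense block adds 4 × 32 = 128 channels.

Between dense blocks we insert transition layers, which halve the height and width and, in this configuration, also halve the number of channels.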
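Combining the two rules (each dense block adds `num_convs * growth_rate` channels; each transition layer halves both the channel count and the side length), the channel and size trace for a 96×96 input can be worked out by hand, starting from 64 channels at 24×24 after the first module. The pure-Python sketch below mirrors the loop in the next cell without building any layers; the function name is hypothetical.

```python
def trace_densenet(num_channels=64, size=24, growth_rate=32, blocks=(4, 4, 4, 4)):
    """Return a list of (stage name, channels, side length) after each stage."""
    trace = []
    for i, num_convs in enumerate(blocks):
        num_channels += num_convs * growth_rate  # dense block: each conv concatenates growth_rate channels
        trace.append((f"Dense_block{i}", num_channels, size))
        if i != len(blocks) - 1:
            num_channels //= 2                   # 1x1 conv halves the channels
            size //= 2                           # 2x2 average pooling with stride 2
            trace.append((f"transition_block{i}", num_channels, size))
    return trace

for name, c, s in trace_densenet():
    print(name, c, s)
# Dense_block0 192 24
# transition_block0 96 12
# Dense_block1 224 12
# transition_block1 112 6
# Dense_block2 240 6
# transition_block2 120 3
# Dense_block3 248 3
```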

```python
num_channels, growth_rate = 64, 32
num_convs_in_dense_blocks = [4, 4, 4, 4]

for i, num_convs in enumerate(num_convs_in_dense_blocks):
    DB = DenseBlock(num_convs, num_channels, growth_rate)
    net.add_module(f"Dense_block{i}", DB)
    # Output channels of the dense block just added
    num_channels = DB.out_channels
    # Add a transition layer between dense blocks to halve the channels (and H, W)
    if i != len(num_convs_in_dense_blocks) - 1:
        net.add_module(f"transition_block{i}", transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
```

```python
net.add_module("BN", nn.BatchNorm2d(num_channels))
net.add_module("relu", nn.ReLU())
net.add_module("global_avg_pool", d2l.GlobalAvgPool2d())  # output of GlobalAvgPool2d: (batch, num_channels, 1, 1)
net.add_module("fc", nn.Sequential(d2l.FlattenLayer(), nn.Linear(num_channels, 10)))
```
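If the `d2lzh` helpers are not at hand, the same classifier head can be sketched with PyTorch built-ins. This assumes `d2l.GlobalAvgPool2d` averages over the full height and width and `d2l.FlattenLayer` reshapes to `(batch, -1)`: `nn.AdaptiveAvgPool2d(1)` then produces a `(batch, C, 1, 1)` tensor and `nn.Flatten()` turns it into `(batch, C)`. The value 248 is the channel count after the last dense block in this configuration.

```python
import torch
from torch import nn

num_channels = 248  # channels after the last dense block in this configuration

head = nn.Sequential(
    nn.BatchNorm2d(num_channels),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),      # global average pooling -> (batch, C, 1, 1)
    nn.Flatten(),                 # -> (batch, C)
    nn.Linear(num_channels, 10),  # 10 Fashion-MNIST classes
)

x = torch.rand(2, num_channels, 3, 3)
print(head(x).shape)  # torch.Size([2, 10])

# Global average pooling is just the mean over the spatial dimensions
pooled = nn.AdaptiveAvgPool2d(1)(x)
print(torch.allclose(pooled, x.mean(dim=(2, 3), keepdim=True)))  # True
```

Global average pooling makes the head independent of the input resolution, which is why the same network accepts both the 96×96 and the 224×224 inputs used in this notebook.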

```python
net = net.to(device)
```

```python
X = torch.rand((1, 1, 96, 96)).to(device)
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)
```

```
0  output shape:	 torch.Size([1, 64, 48, 48])
1  output shape:	 torch.Size([1, 64, 48, 48])
2  output shape:	 torch.Size([1, 64, 48, 48])
3  output shape:	 torch.Size([1, 64, 24, 24])
Dense_block0  output shape:	 torch.Size([1, 192, 24, 24])
transition_block0  output shape:	 torch.Size([1, 96, 12, 12])
Dense_block1  output shape:	 torch.Size([1, 224, 12, 12])
transition_block1  output shape:	 torch.Size([1, 112, 6, 6])
Dense_block2  output shape:	 torch.Size([1, 240, 6, 6])
transition_block2  output shape:	 torch.Size([1, 120, 3, 3])
Dense_block3  output shape:	 torch.Size([1, 248, 3, 3])
BN  output shape:	 torch.Size([1, 248, 3, 3])
relu  output shape:	 torch.Size([1, 248, 3, 3])
global_avg_pool  output shape:	 torch.Size([1, 248, 1, 1])
fc  output shape:	 torch.Size([1, 10])
```


```python
import torchsummary
# torchsummary defaults to device="cuda"; pass the device type so the call also works on CPU
torchsummary.summary(net, (1, 224, 224), device=device.type)
```

```
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           3,200
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
       BatchNorm2d-5           [-1, 64, 56, 56]             128
              ReLU-6           [-1, 64, 56, 56]               0
            Conv2d-7           [-1, 32, 56, 56]          18,464
       BatchNorm2d-8           [-1, 96, 56, 56]             192
              ReLU-9           [-1, 96, 56, 56]               0
           Conv2d-10           [-1, 32, 56, 56]          27,680
      BatchNorm2d-11          [-1, 128, 56, 56]             256
             ReLU-12          [-1, 128, 56, 56]               0
           Conv2d-13           [-1, 32, 56, 56]          36,896
      BatchNorm2d-14          [-1, 160,
```
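The parameter counts in the summary can be checked by hand: a `Conv2d` has `out_channels * in_channels * k * k` weights plus `out_channels` biases, and a `BatchNorm2d` has 2 learnable parameters per channel (scale and shift). The sketch below compares this formula with the `numel()` totals of freshly built modules for a few rows of the table; `conv_params` and `count` are our own helper names.

```python
from torch import nn

def conv_params(c_in, c_out, k):
    # k x k weights for every (output, input) channel pair, plus one bias per output channel
    return c_out * c_in * k * k + c_out

def count(module):
    return sum(p.numel() for p in module.parameters())

print(conv_params(1, 64, 7), count(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)))  # 3200 3200
print(conv_params(64, 32, 3), count(nn.Conv2d(64, 32, kernel_size=3, padding=1)))          # 18464 18464
print(conv_params(96, 32, 3), count(nn.Conv2d(96, 32, kernel_size=3, padding=1)))          # 27680 27680
print(count(nn.BatchNorm2d(96)))  # 192 (running mean/var are buffers, not parameters)
```

Note how the 3×3 convolutions inside a dense block grow in size (18,464, then 27,680, then 36,896) because each one sees 32 more input channels than the previous one.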

```python
batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)

lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
```

```
training on cuda
199.78587877750397
epoch1: loss 0.4260 train_acc 0.8469 test_acc 0.8559
125.52722355723381
epoch2: loss 0.1338 train_acc 0.9006 test_acc 0.8488
108.35170888900757
epoch3: loss 0.0770 train_acc 0.9150 test_acc 0.9073
96.57719483971596
epoch4: loss 0.0515 train_acc 0.9240 test_acc 0.8965
87.70108084380627
epoch5: loss 0.0374 train_acc 0.9310 test_acc 0.8729
```
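`d2l.train_ch5` comes from the course helper module, so its code is not shown here. The following is a minimal sketch of one training epoch of the general kind such a function runs: cross-entropy loss, Adam updates, and loss/accuracy bookkeeping. It is not the helper's actual source. To stay self-contained it uses a tiny hypothetical stand-in model and random "images" instead of DenseNet and Fashion-MNIST, and the name `train_one_epoch` is ours.

```python
import torch
from torch import nn

def train_one_epoch(net, data_iter, optimizer, device):
    """One pass over data_iter; returns (mean loss per batch, training accuracy)."""
    loss_fn = nn.CrossEntropyLoss()
    net.train()  # BatchNorm uses batch statistics in training mode
    loss_sum, correct, n, batches = 0.0, 0, 0, 0
    for X, y in data_iter:
        X, y = X.to(device), y.to(device)
        y_hat = net(X)
        loss = loss_fn(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss_sum += loss.item()
        correct += (y_hat.argmax(dim=1) == y).sum().item()
        n += y.shape[0]
        batches += 1
    return loss_sum / batches, correct / n

# Hypothetical stand-in model and random data (1 channel, 96x96, 10 classes), for illustration only
torch.manual_seed(0)
device = torch.device('cpu')
net = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
data = [(torch.rand(16, 1, 96, 96), torch.randint(0, 10, (16,))) for _ in range(4)]
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss, acc = train_one_epoch(net, data, optimizer, device)
print(f"loss {loss:.4f} train_acc {acc:.4f}")
```

The counters are reset inside the function, so the reported loss is an average over the current epoch's batches only.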
