## 5.5 Convolutional Neural Networks (LeNet)

![LeNet architecture](img/5.5_lenet.png)

### 5.5.1 The LeNet Model

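1. **The network consists of a convolutional block and a fully connected block**
2. **Basic unit of the convolutional block: a convolutional layer (5x5 kernel, sigmoid activation) followed by a max pooling layer (2x2)**
    - Convolutional layer: detects spatial patterns in the image, such as lines and parts of objects
    - Max pooling: reduces the convolutional layer's sensitivity to position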
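3. **C1 (convolutional layer): 6@28x28**
    - Feature map size: 6 kernels applied to the 32x32 input, each yielding a feature map of (32-5+1)x(32-5+1) = 28x28
    - Parameters: weights are shared, so the count is (5x5+1)x6 = 156 (5x5 kernel weights plus 1 bias per kernel)
    - Connections: each feature map has 28x28 neurons and each neuron has 5x5+1 connections, so the layer has (5x5+1)x6x28x28 = 122304 connections
4. **S2 (pooling layer): 6@14x14**
    - Feature map size: 6x(28/2)x(28/2) = 6x14x14
    - Parameters: with a 2x2 window, each output is $\mathrm{sigmoid}\left(w\sum_{i=1}^{4} e_i + b\right)$: the 4 elements in the window are summed, multiplied by the weight $w$, offset by the bias $b$, and passed through a sigmoid. Each feature map has one $w$ and one $b$, so the count is 6x(1+1) = 12
    - Connections: (2x2+1)x6x14x14 = 5880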
5. **C3 (convolutional layer): 16@10x10**
    - Feature map size: 16x(14-5+1)x(14-5+1) = 16x10x10
    - Parameters: (5x5x3+1)x6 + (5x5x4+1)x9 + (5x5x6+1) = 1516 (see Section 5.5.1.2)
    - Connections: 1516x10x10 = 151600
6. **S4 (pooling layer): 16@5x5**
    - Feature map size: 16x(10/2)x(10/2) = 16x5x5
    - Parameters: 16x(1+1) = 32
    - Connections: 16x(2x2+1)x5x5 = 2000
7. **C5 (convolutional layer): 120@1x1**
    - Feature map size: 120x(5-5+1)x(5-5+1) = 120x1x1
    - Parameters: 120x(5x5x16+1) = 48120
    - Connections: 48120x(1x1) = 48120
8. **F6 (fully connected layer): 84**
    - Feature map size: 84
    - Parameters: 84x(120+1) = 10164
    - Connections: 10164x(1x1) = 10164
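9. **OUTPUT (output layer): 10**
    - Feature map size: 10; each output uses a radial basis function $y_i = \sum_{j=0}^{83}(x_j - w_{ij})^2$, where $x_j$ is the $j$-th output of F6 and $i$ ranges over 0-9
    - Parameters: 84x10 = 840
    - Connections: 84x10 = 840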
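
The arithmetic in the list above is easy to get wrong by hand, so here is a minimal sketch that recomputes it in plain Python. It assumes the 32x32 input implied by the C1 formula; the helper names `conv_out` and `conv_params` are illustrative and not part of any library.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_maps, out_maps):
    """Shared kernel weights plus one bias per output feature map."""
    return (kernel * kernel * in_maps + 1) * out_maps


# C1: 6 kernels of 5x5 on a single 32x32 input map
c1_size = conv_out(32, 5)
c1_params = conv_params(5, 1, 6)
c1_conns = c1_params * c1_size ** 2

# S2: 2x2 subsampling, one weight and one bias per feature map
s2_size = c1_size // 2
s2_params = 6 * (1 + 1)
s2_conns = (2 * 2 + 1) * 6 * s2_size ** 2

# C3: 6 maps see 3 inputs, 9 maps see 4 inputs, 1 map sees all 6
c3_size = conv_out(s2_size, 5)
c3_params = conv_params(5, 3, 6) + conv_params(5, 4, 9) + conv_params(5, 6, 1)
c3_conns = c3_params * c3_size ** 2

# S4: same scheme as S2, on 16 feature maps
s4_size = c3_size // 2
s4_params = 16 * (1 + 1)
s4_conns = 16 * (2 * 2 + 1) * s4_size ** 2

# C5, F6, OUTPUT
c5_size = conv_out(s4_size, 5)
c5_params = conv_params(5, 16, 120)
f6_params = 84 * (120 + 1)
out_params = 84 * 10

for name, size, params in [
    ("C1", c1_size, c1_params), ("S2", s2_size, s2_params),
    ("C3", c3_size, c3_params), ("S4", s4_size, s4_params),
    ("C5", c5_size, c5_params),
]:
    print(f"{name}: {size}x{size} feature maps, {params} parameters")
print(f"F6: {f6_params} parameters, OUTPUT: {out_params} parameters")
```

The connection counts (`c1_conns`, `s2_conns`, `c3_conns`, `s4_conns`) follow the same convention as the list: parameters per output position multiplied by the number of output positions.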

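#### 5.5.1.1 Parameters of the Pooling Layer

> **Each pooling feature map has only two parameters, w and b: the 4 values in the pooling window are summed, multiplied by w, offset by b, and passed through a sigmoid**

![Pooling layer](img/5.5.1_lenet.png)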
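
The pooling rule described above can be sketched in a few lines of plain Python. This is an illustration of the formula $\mathrm{sigmoid}(w\sum e_i + b)$ on a single feature map, not the `nn.MaxPool2d` layer used in the PyTorch code later in this section (which has no trainable parameters). The function name `subsample` and the sample values are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def subsample(feature_map, w, b):
    """2x2, stride-2 subsampling of one feature map given as a list of rows.

    Each output is sigmoid(w * (sum of the 4 window values) + b),
    so the whole feature map shares a single weight w and bias b.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            window_sum = (feature_map[r][c] + feature_map[r][c + 1]
                          + feature_map[r + 1][c] + feature_map[r + 1][c + 1])
            out_row.append(sigmoid(w * window_sum + b))
        out.append(out_row)
    return out


# Hypothetical 4x4 feature map; the window sums are 10, 0, 2 and -4
fmap = [[1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 0.0],
        [0.5, 0.5, -1.0, -1.0],
        [0.5, 0.5, -1.0, -1.0]]
pooled = subsample(fmap, w=0.25, b=0.0)
print(pooled)
```

A 4x4 input yields a 2x2 output, matching the halving of width and height from C1 to S2 and from C3 to S4.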

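#### 5.5.1.2 How LeNet Computes the C3 Layer

> 1. **This step maps S2 (6 input feature maps) to C3 (16 output feature maps)**
> 2. **The first C3 feature map is obtained by convolving S2 feature maps [0, 1, 2], and so on for the others**
> 3. **So the parameter count is (5x5x3+1)x6 + (5x5x4+1)x9 + (5x5x6+1) = 1516**

![C3 layer](img/5.5.2_lenet.png)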
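
The sparse S2-to-C3 wiring can be written out as a table and the parameter count derived from it. The sketch below assumes the commonly cited layout: six maps fed by 3 adjacent S2 maps (wrapping around), six fed by 4 adjacent maps, three fed by 4 non-adjacent maps, and one fed by all 6. Only the first row `[0, 1, 2]` and the 6/9/1 grouping come from the text above; the remaining rows are an assumption and should be checked against the figure.

```python
def cyclic_run(start, length, n=6):
    """Indices of `length` adjacent S2 maps starting at `start`, wrapping mod n."""
    return [(start + k) % n for k in range(length)]


# Assumed connection table: table[i] lists the S2 maps feeding C3 map i
table = (
    [cyclic_run(s, 3) for s in range(6)]            # C3 maps 0-5: 3 adjacent inputs
    + [cyclic_run(s, 4) for s in range(6)]          # C3 maps 6-11: 4 adjacent inputs
    + [[0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5]]    # C3 maps 12-14: 4 non-adjacent inputs
    + [list(range(6))]                              # C3 map 15: all 6 inputs
)

# One 5x5 kernel per connected input map, plus one bias per C3 map
sparse_params = sum(5 * 5 * len(inputs) + 1 for inputs in table)

# For comparison: every C3 map connected to every S2 map
dense_params = (5 * 5 * 6 + 1) * 16

print(len(table), sparse_params, dense_params)
```

The dense count corresponds to a standard `nn.Conv2d(6, 16, 5)` layer, which connects every output channel to every input channel.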

#### 5.5.1.3 LeNet Output Example

![LeNet output example](img/5.5.3_lenet.png)

In [22]:
import time
import torch
from torch import nn, optim
import sys
sys.path.append("..")
import d2lzh_pytorch.utils as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

In [32]:
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, 5),  # in_channels, out_channels, kernel_size
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2),  # kernel_size, stride
            nn.Conv2d(6, 16, 5),  # in_channels, out_channels, kernel_size
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2)  # kernel_size, stride
        )
        
        self.fc = nn.Sequential(
            nn.Linear(16*4*4, 120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )
    
    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output
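
The first `nn.Linear` layer above takes `16*4*4` inputs rather than the `16*5*5` of the classic layout. A minimal sketch of why, assuming the 28x28 single-channel images of Fashion-MNIST that this section trains on: tracing the spatial size through the unpadded convolutions and 2x2 poolings gives 4x4. The helper names are illustrative only.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


size = 28                      # assumed input: 1 x 28 x 28
size = conv_out(size, 5)       # Conv2d(1, 6, 5)
after_conv1 = size
size = conv_out(size, 2, 2)    # MaxPool2d(2, 2)
after_pool1 = size
size = conv_out(size, 5)       # Conv2d(6, 16, 5)
after_conv2 = size
size = conv_out(size, 2, 2)    # MaxPool2d(2, 2)
after_pool2 = size

flat_features = 16 * after_pool2 * after_pool2
print(after_conv1, after_pool1, after_conv2, after_pool2, flat_features)


def conv_params(kernel, in_ch, out_ch):
    return (kernel * kernel * in_ch + 1) * out_ch


def linear_params(in_features, out_features):
    return (in_features + 1) * out_features


# Trainable parameters of the model as defined above (max pooling has none)
total_params = (
    conv_params(5, 1, 6)
    + conv_params(5, 6, 16)
    + linear_params(flat_features, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)
print(total_params)
```

With a 32x32 input the same trace would end at 5x5, which is where the `16x5x5` figure for S4 in Section 5.5.1 comes from.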

In [33]:
net = LeNet()
net

LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)

In [34]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

In [35]:
def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # If no device is given, use the device of net's parameters
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(net, torch.nn.Module):
                net.eval()  # evaluation mode; disables dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                net.train()  # switch back to training mode
            else:  # custom model defined as a plain function
                if 'is_training' in net.__code__.co_varnames:
                    acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
                else:
                    acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

In [36]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('training on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

In [37]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

training on cpu
epoch 1, loss 1.8508, train acc 0.317, test acc 0.587, time 31.0 sec
epoch 2, loss 0.9607, train acc 0.622, test acc 0.679, time 36.6 sec
epoch 3, loss 0.7953, train acc 0.707, test acc 0.726, time 37.4 sec
epoch 4, loss 0.7009, train acc 0.737, test acc 0.744, time 37.1 sec
epoch 5, loss 0.6413, train acc 0.754, test acc 0.756, time 36.6 sec
