# LeNet

Over the next few days we will focus on revisiting the following classics.
![](./img/history.png)

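LeNet, proposed by Yann LeCun in 1998, is the ancestor of convolutional neural networks, and its best-known variant is LeNet-5. The network is not as simple as it looks. Implementing it gives a much deeper understanding of convolutional structure.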

![](./img/lenet.jpg)

In [34]:
import torch
from torch.autograd import Variable
import torch.nn as nn
from torchvision import transforms, datasets

# Build the MNIST training and test loaders
def minst_loader():
    # Resize the 28x28 digits to the 32x32 input LeNet-5 expects,
    # then normalize with the commonly used MNIST mean and std
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])

    train_set = datasets.MNIST('./data', train = True, download = True, transform = transform)
    test_set = datasets.MNIST('./data', train = False, download = True, transform = transform)

    train_loader = torch.utils.data.DataLoader(train_set, batch_size = 32, shuffle = True)
    test_loader = torch.utils.data.DataLoader(test_set, shuffle = False)
    return (train_loader, test_loader)


In [75]:
import torch.optim as optim
import numpy as np
def train(data_loader, n_epoch, net, criterion, lr = 0.01):
    optimizer = optim.SGD(net.parameters(), lr = lr, momentum = 0.5)

    for epoch in range(n_epoch):
        loss_arr = []
        accuracy_arr = []
        print("------ epoch: %d Start--------" % (epoch))
        for i, data in enumerate(data_loader, 0): # the 0 is only the starting value of the counter i
            inputs, labels = data

            optimizer.zero_grad() # same effect as net.zero_grad() here: the optimizer holds all of net's parameters
            outputs = net(inputs)

            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            _, predicted = torch.max(outputs, 1) # dim must be 1 (the classes); dim 0 is the batch
            correct = (predicted == labels).sum().item()
            total = labels.size(0)

            # Convert to plain Python numbers for logging
            loss = loss.item()
            acc = correct * 100 / total

            loss_arr.append(loss)
            accuracy_arr.append(acc)
            if i % 100 == 0:
                print('iter: %d, Loss: %.2f, Accuracy: %.2f %%' % (i, loss, acc))

        print("------ epoch: %d, Loss: %.2f, Accuracy: %.2f %%--------" % (epoch, np.mean(loss_arr), np.mean(accuracy_arr)))

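### Network model

- Resize the 28x28x1 input image to 32x32x1.

- Convolution layer 1, C1: output 28x28x6. [ kernel_size = 5, feature_size = 6, stride = 1 ]
- Activation 1, activation1: free choice.
- Pooling layer 1, S2: output 14x14x6. [ 2 x 2 ] Following the original paper, this is called a subsampling (downsampling) layer.


- Convolution layer 2, C3: output 10x10x16. [ kernel_size = 5, feature_size = 16, stride = 1 ]
- Activation 2, activation2: free choice.
- Pooling layer 2, S4: output 5x5x16. [ 2 x 2 ] Like S2, a subsampling (downsampling) layer in the original paper's terms.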
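
The sizes listed above follow from the usual arithmetic for an unpadded convolution or pooling window, out = (in - kernel) / stride + 1. A minimal sketch that walks the spatial size from the input through S4. The helper name `out_size` is made up for illustration:

```python
def out_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 32                                  # input resized to 32x32
c1 = out_size(size, kernel=5)              # C1: 5x5 convolution, stride 1
s2 = out_size(c1, kernel=2, stride=2)      # S2: 2x2 pooling
c3 = out_size(s2, kernel=5)                # C3: 5x5 convolution, stride 1
s4 = out_size(c3, kernel=2, stride=2)      # S4: 2x2 pooling
print(c1, s2, c3, s4)                      # 28 14 10 5
```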

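How do 6 feature maps become 16? This often confuses beginners. See the figure below:
![](./img/lenets2c3.png)

- 6 of the C3 feature maps each connect to 3 adjacent S2 feature maps.
- 6 of them each connect to 4 adjacent S2 feature maps.
- 3 of them each connect to 4 S2 feature maps that are not all adjacent.
- 1 connects to all of the S2 feature maps.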

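The core question here is: how does one feature map connect to several feature maps?
![](./img/convconv.png)
The relationship between feature maps resembles full connection. Each point of an output feature map is computed by convolving every connected input feature map with its own kernel and then combining the results in a weighted sum.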
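
To make the idea concrete, here is a small check with PyTorch's `F.conv2d`: a multi-channel convolution gives the same result as convolving each input map separately and adding the results. The tensor sizes (3 input maps of 14x14, one output map) are chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 14, 14)   # one sample, 3 input feature maps
w = torch.randn(1, 3, 5, 5)     # one output map: a separate 5x5 kernel per input map
b = torch.randn(1)              # one bias for the output map

# Multi-channel convolution in a single call
full = F.conv2d(x, w, b)

# Same computation: convolve each input map alone, sum, then add the bias
parts = [F.conv2d(x[:, c:c + 1], w[:, c:c + 1]) for c in range(3)]
summed = parts[0] + parts[1] + parts[2] + b

print(full.shape)                                # torch.Size([1, 1, 10, 10])
print(torch.allclose(full, summed, atol=1e-4))
```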

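For LeNet-5, the S2-to-C3 parameter count works out as follows. Each kernel is 5x5, i.e. 25 weights, and each output map has one bias. The 6 output maps fed by 3 input maps need 6 x (5 x 5 x 3 + 1) = 456 parameters.
Likewise, the 6 maps fed by 4 inputs need 6 x (5 x 5 x 4 + 1) = 606, the 3 maps fed by 4 inputs need 3 x (5 x 5 x 4 + 1) = 303, and the 1 map fed by all 6 inputs needs 1 x (5 x 5 x 6 + 1) = 151. That is 1516 parameters in total, producing 16 feature maps of size 10 x 10.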
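
The same count can be reproduced from an explicit connection table. The grouping (6 + 6 + 3 + 1) comes from the text above. The exact index sets below, with wrap-around adjacency and the three split sets, are an assumption based on the connection table in the LeNet-5 paper and should be checked against the figure.

```python
# S2 -> C3 connection table: each entry lists the S2 maps (0..5) feeding one C3 map
triples = [[(i + k) % 6 for k in range(3)] for i in range(6)]   # 6 maps, 3 adjacent inputs
quads = [[(i + k) % 6 for k in range(4)] for i in range(6)]     # 6 maps, 4 adjacent inputs
split_quads = [[0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5]]        # 3 maps, 4 non-adjacent inputs (assumed sets)
all_six = [[0, 1, 2, 3, 4, 5]]                                  # 1 map, all inputs
table = triples + quads + split_quads + all_six

# Each C3 map has one 5x5 kernel per connected input, plus one bias
params = sum(5 * 5 * len(inputs) + 1 for inputs in table)
print(len(table), params)   # 16 1516
```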

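- Reshape: three dimensions to one. Output:
- Convolution layer 3, C5: output 120

#### Why is it called C5? In the figure it clearly looks fully connected

Because S4 outputs 5x5x16, applying a 5x5 kernel in this layer has a curious effect: the kernel fits in exactly one position, so each output map is 1x1. With 120 kernels this is a 16->120 convolutional mapping with 120 x (5x5x16 + 1) = 48120 parameters. The output is 120 x 1, so once flattened it behaves just like a fully connected layer.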
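
The arithmetic behind this can be checked in a few lines. The variable names are illustrative only. The sketch compares C5's parameter count with that of a fully connected layer from the flattened 5 x 5 x 16 input to 120 outputs:

```python
kernel, in_maps, out_maps = 5, 16, 120
s4_size = 5

# Number of positions a 5x5 kernel fits along each axis of a 5x5 map
positions = s4_size - kernel + 1

# C5 as a convolution: one 5x5 kernel per (output, input) pair plus one bias per output
c5_params = out_maps * (kernel * kernel * in_maps + 1)

# A fully connected layer from the flattened 5*5*16 = 400 inputs to 120 outputs
fc_params = out_maps * (s4_size * s4_size * in_maps + 1)

print(positions, c5_params, fc_params)   # 1 48120 48120
```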

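- Activation of fully connected layer 1 (C5): activation3

- Fully connected layer 2, F6: output 84
- Activation of fully connected layer 2: activation4

- Fully connected layer 3: output 10 (RBF, Euclidean radial basis function)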


References:
- http://blog.csdn.net/zhangjunhit/article/details/53536915
- https://github.com/feiyuhug/lenet-5/blob/master/covnet.py

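# Special note
Compute was limited in 1998, and we consider the elaborate S2-to-C3 connection scheme a product of that constraint that is now outdated. It is also awkward to implement in PyTorch, and we found that most LeNet-5 implementations replace the S2-C3 connections with full connections.
This implementation takes the lazy route. When time permits we will try the more elaborate scheme, which would also be a way to explore how flexible PyTorch is.
TODO: an implementation faithful to the original paper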
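
As a starting point for that TODO, one possible approach (not the implementation used in this notebook, and not necessarily the most efficient one) is to keep an ordinary `nn.Conv2d(6, 16, 5)` and multiply its weights by a fixed 0/1 mask. The class name `SparseC3` and the exact connection table are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseC3(nn.Module):
    """C3 with a fixed S2->C3 connection mask (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 16, 5)
        # Assumed connection table: which S2 maps feed each C3 map
        table = [[(i + k) % 6 for k in range(3)] for i in range(6)]
        table += [[(i + k) % 6 for k in range(4)] for i in range(6)]
        table += [[0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5], [0, 1, 2, 3, 4, 5]]
        mask = torch.zeros(16, 6, 1, 1)
        for out_map, inputs in enumerate(table):
            mask[out_map, inputs] = 1.0
        self.register_buffer('mask', mask)   # fixed, not trained

    def forward(self, x):
        # Zero the kernels of unconnected (output, input) pairs on every call,
        # so they neither contribute to the output nor receive gradient
        return F.conv2d(x, self.conv.weight * self.mask, self.conv.bias)

c3 = SparseC3()
out = c3(torch.randn(2, 6, 14, 14))
active = int(c3.mask.sum().item()) * 25 + 16   # connected 5x5 kernels plus biases
print(out.shape, active)                       # torch.Size([2, 16, 10, 10]) 1516
```

The layer still stores all 16 x 6 kernels, so this trades memory for simplicity. Only 1516 of its parameters are effectively in use.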

In [86]:
import torch.nn.functional as F
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.c1 = nn.Conv2d(1, 6, 5)
        self.s2 = nn.MaxPool2d(2)
        self.c3 = nn.Conv2d(6, 16, 5)
        self.s4 = nn.MaxPool2d(2)
        self.c5 = nn.Conv2d(16, 120, 5)
        self.f6 = nn.Linear(120, 84)
        self.output = nn.Linear(84, 10)
    
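    def forward(self, x):
        x = self.s2(torch.sigmoid(self.c1(x)))   # 32x32x1 -> 28x28x6 -> 14x14x6
        x = self.s4(torch.sigmoid(self.c3(x)))   # -> 10x10x16 -> 5x5x16
        x = torch.sigmoid(self.c5(x))            # -> 1x1x120
        x = x.view(x.size(0), -1)                # flatten to (batch, 120)
        x = torch.sigmoid(self.f6(x))            # -> 84
        x = self.output(x)                       # -> 10 class scores (logits)
        return x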
    

In [87]:
net = LeNet5()
train_loader, test_loader = minst_loader()
criterion = nn.CrossEntropyLoss()

# lr = 0.01, the same as the ReLU run below, so that only the activation differs
train(train_loader, 1, net, criterion, lr = 0.01)

------ epoch: 0 Start--------
iter: 0, Loss: 2.23, Accuracy: 28.12 %
iter: 100, Loss: 2.30, Accuracy: 9.38 %
iter: 200, Loss: 2.28, Accuracy: 12.50 %
iter: 300, Loss: 2.31, Accuracy: 15.62 %
iter: 400, Loss: 2.31, Accuracy: 6.25 %
iter: 500, Loss: 2.32, Accuracy: 3.12 %
iter: 600, Loss: 2.32, Accuracy: 9.38 %
iter: 700, Loss: 2.34, Accuracy: 9.38 %
iter: 800, Loss: 2.31, Accuracy: 12.50 %
iter: 900, Loss: 2.31, Accuracy: 9.38 %
iter: 1000, Loss: 2.26, Accuracy: 18.75 %
iter: 1100, Loss: 2.29, Accuracy: 3.12 %
iter: 1200, Loss: 2.32, Accuracy: 6.25 %
iter: 1300, Loss: 2.31, Accuracy: 6.25 %
iter: 1400, Loss: 2.29, Accuracy: 12.50 %
iter: 1500, Loss: 2.28, Accuracy: 12.50 %
iter: 1600, Loss: 2.32, Accuracy: 6.25 %
iter: 1700, Loss: 2.32, Accuracy: 3.12 %
iter: 1800, Loss: 2.33, Accuracy: 6.25 %
------ epoch: 0, Loss: 2.30, Accuracy: 10.63 %--------


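As you can see, with sigmoid the network shows no sign of converging within this epoch: the loss stays near 2.30 and the accuracy near chance. It is very hard to train, so let's try ReLU instead.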
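
One commonly cited reason, not verified for this particular run, is that the sigmoid's derivative never exceeds 0.25. The gradient therefore shrinks each time it passes back through a sigmoid layer, whereas ReLU passes it through unchanged for positive inputs. A small numeric illustration in plain Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

peak = sigmoid_grad(0.0)          # the derivative is largest at z = 0
print(peak)                       # 0.25
print(peak ** 4)                  # 0.00390625, upper bound on the factor from 4 stacked sigmoids
print(sigmoid_grad(4.0) < 0.02)   # True: saturated units pass even less gradient
print(relu_grad(4.0))             # 1.0
```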

In [88]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.c1 = nn.Conv2d(1, 6, 5)
        self.s2 = nn.MaxPool2d(2)
        self.c3 = nn.Conv2d(6, 16, 5)
        self.s4 = nn.MaxPool2d(2)
        self.c5 = nn.Conv2d(16, 120, 5)
        self.f6 = nn.Linear(120, 84)
        self.output = nn.Linear(84, 10)
    
    def forward(self, x):
        x = self.s2(F.relu(self.c1(x)))
        x = self.s4(F.relu(self.c3(x)))
        x = F.relu(self.c5(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.f6(x))
        x = self.output(x)
        return x
net = LeNet5()
train_loader, test_loader = minst_loader()
criterion = nn.CrossEntropyLoss()

train(train_loader, 1, net, criterion, lr = 0.01)

------ epoch: 0 Start--------
iter: 0, Loss: 2.32, Accuracy: 0.00 %
iter: 100, Loss: 2.19, Accuracy: 34.38 %
iter: 200, Loss: 0.62, Accuracy: 75.00 %
iter: 300, Loss: 0.26, Accuracy: 90.62 %
iter: 400, Loss: 0.48, Accuracy: 87.50 %
iter: 500, Loss: 0.25, Accuracy: 90.62 %
iter: 600, Loss: 0.64, Accuracy: 81.25 %
iter: 700, Loss: 0.17, Accuracy: 96.88 %
iter: 800, Loss: 0.09, Accuracy: 100.00 %
iter: 900, Loss: 0.05, Accuracy: 100.00 %
iter: 1000, Loss: 0.29, Accuracy: 90.62 %
iter: 1100, Loss: 0.10, Accuracy: 96.88 %
iter: 1200, Loss: 0.52, Accuracy: 90.62 %
iter: 1300, Loss: 0.16, Accuracy: 93.75 %
iter: 1400, Loss: 0.01, Accuracy: 100.00 %
iter: 1500, Loss: 0.05, Accuracy: 100.00 %
iter: 1600, Loss: 0.20, Accuracy: 90.62 %
iter: 1700, Loss: 0.13, Accuracy: 90.62 %
iter: 1800, Loss: 0.22, Accuracy: 93.75 %
------ epoch: 0, Loss: 0.38, Accuracy: 88.27 %--------


Accuracy shoots up almost immediately: by iteration 300 the batch accuracy is already around 90%. ReLU really is impressive.
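
The figures above are accuracies on training batches. A minimal sketch of measuring accuracy over an entire loader follows. `evaluate` is a hypothetical helper, not part of the original notebook.

```python
import torch

def evaluate(data_loader, net):
    """Return the classification accuracy (in percent) of net over data_loader."""
    net.eval()                       # put layers such as dropout into inference mode
    correct, total = 0, 0
    with torch.no_grad():            # gradients are not needed for evaluation
        for inputs, labels in data_loader:
            outputs = net(inputs)
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct * 100 / total

# Hypothetical usage with the objects defined above:
# print('Accuracy: %.2f %%' % evaluate(test_loader, net))
```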