# The Hello World of Deep Learning

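This section puts the earlier material together. We first introduce the convolutional (Conv) layer and the pooling layer, then the basic LeNet-5 model, and finally load MNIST to complete the Hello World of deep learning.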

## 1. Convolutional Layer (Convolution)

### 1.1 Why do we need convolutional layers?

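Suppose we skip convolutional layers and represent each image as a flat vector. First, pixels that are neighbors in the image can end up far apart in the vector, so the spatial relationship between them is hard to capture. Second, large images make the model large: a 256 $\times$ 256 color image needs an input layer of $256 \times 256 \times 3 = 196608$ units, and 1000 such images stored as 32-bit floats take about 0.8 GB of memory, which is too much.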
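
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it assumes three color channels and 4 bytes (one 32-bit float) per value, neither of which is stated explicitly in the text.

```python
# Rough memory estimate for flattened 256 x 256 color images.
# Assumptions (not from the original text): 3 color channels, float32 values.
height, width, channels = 256, 256, 3
bytes_per_value = 4  # one float32

values_per_image = height * width * channels
bytes_for_1000_images = values_per_image * bytes_per_value * 1000

print(values_per_image)              # 196608
print(bytes_for_1000_images / 1e9)   # 0.786432 (GB)
```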

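### 1.2 What is a convolutional layer?

A convolutional layer is similar to a fully connected layer, except that the input and the weights are not combined by a plain matrix multiplication. Instead, the weights are applied to one window of the input at a time. The figure below shows a $4\times4$ input matrix and a $3\times3$ weight producing a $2\times2$ result.

At each step we take a window of the input that is the same size as the weight, multiply the two element by element, and sum the products. Following convolution terminology, this weight is usually called a kernel or a filter.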


![Convolutional layer](../images/no_padding_no_strides.gif)
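
The window-by-window computation described above can be written out in plain Python. This is a minimal sketch with no padding and a stride of 1; the helper name `conv2d_naive` is made up for illustration, and the input, kernel, and bias are small arbitrary values. Like the convolution in deep learning libraries, it does not flip the kernel (strictly speaking it is a cross-correlation).

```python
def conv2d_naive(image, kernel, bias=0):
    """Slide `kernel` over `image` (no padding, stride 1) and return the result."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the current window and the kernel, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(total + bias)
        output.append(row)
    return output


image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
kernel = [[0, 1],
          [2, 3]]

print(conv2d_naive(image, kernel, bias=1))  # [[20, 26], [38, 44]]
```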



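In PyTorch, this kind of operation can be carried out with the code below.

PyTorch provides three convolutional layers, Conv1d, Conv2d, and Conv3d, for one-, two-, and three-dimensional convolution. Their expected input formats are:

Conv1d takes input of shape (minibatch, in_channels, iW)

Conv2d takes input of shape (minibatch, in_channels, iH, iW)

Conv3d takes input of shape (minibatch, in_channels, iT, iH, iW)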
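
As a quick illustration of these input formats, the sketch below passes random tensors through each layer type and prints the output shapes. The batch size, channel counts, and spatial sizes are arbitrary values chosen for the example, not values from this tutorial.

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration: batch of 2, 3 input channels, 8 output channels.
conv1d = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=3)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)

out1 = conv1d(torch.randn(2, 3, 10))         # (minibatch, in_channels, iW)
out2 = conv2d(torch.randn(2, 3, 10, 10))     # (minibatch, in_channels, iH, iW)
out3 = conv3d(torch.randn(2, 3, 4, 10, 10))  # (minibatch, in_channels, iT, iH, iW)

print(tuple(out1.shape))  # (2, 8, 8)
print(tuple(out2.shape))  # (2, 8, 8, 8)
print(tuple(out3.shape))  # (2, 8, 2, 8, 8)
```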

In [1]:
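# Perform a convolution with PyTorch's functional API

import torch
from torch.autograd import Variable
import torch.nn.functional as F

# Input of shape (minibatch, in_channels, iH, iW) = (1, 1, 3, 3)
x = Variable(torch.Tensor(range(9)))
x = x.view(1, 1, 3, 3)
print("input", x)

# Kernel of shape (out_channels, in_channels, kH, kW) = (1, 1, 2, 2)
weights = Variable(torch.Tensor([0, 1, 2, 3]))
weights = weights.view(1, 1, 2, 2)
print("weights:", weights)

# One bias value per output channel
bias = Variable(torch.Tensor([1]))
print("bias", bias)

y = F.conv2d(x, weights, bias, padding=0)
print("y:", y)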

input Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  3  4  5
  6  7  8
[torch.FloatTensor of size 1x1x3x3]

weights: Variable containing:
(0 ,0 ,.,.) = 
  0  1
  2  3
[torch.FloatTensor of size 1x1x2x2]

bias Variable containing:
 1
[torch.FloatTensor of size 1]

y: Variable containing:
(0 ,0 ,.,.) = 
  20  26
  38  44
[torch.FloatTensor of size 1x1x2x2]



In [2]:
# Build a convolutional layer with torch.nn and apply it

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor(range(9)))
x = x.view(1, 1, 3, 3)
print("input", x)

weights = torch.Tensor([0, 1, 2, 3]).view(1, 1, 2, 2)
weights = torch.nn.Parameter(weights)
print("weights:", weights)

bias = torch.nn.Parameter(torch.Tensor([1]))
print("bias", bias)

# Model: one input channel, one output channel, 2x2 kernel
m = torch.nn.Conv2d(in_channels=x.size(1), out_channels=x.size(1), kernel_size=(2, 2), stride=1)

# Set the filter weights; their shape is (out_channels, in_channels, kernel_size[0], kernel_size[1])
m.weight = weights

# Set the filter bias
m.bias = bias

# Compute y
y = m(x)
print("y:", y)


input Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  3  4  5
  6  7  8
[torch.FloatTensor of size 1x1x3x3]

weights: Parameter containing:
(0 ,0 ,.,.) = 
  0  1
  2  3
[torch.FloatTensor of size 1x1x2x2]

bias Parameter containing:
 1
[torch.FloatTensor of size 1]

y: Variable containing:
(0 ,0 ,.,.) = 
  20  26
  38  44
[torch.FloatTensor of size 1x1x2x2]



We can also control how the window moves (stride) and how the input is padded at its edges (padding). The figure below shows stride=(2, 2) and padding=1.


![stride and padding](../images/padding_strides.gif)


In [3]:
y=F.conv2d(x, weights, bias, stride=(2,2), padding=1)

print("input", x, "\n \n weights:" ,weights, "\n \n bias:", bias, "y:", y)

input Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  3  4  5
  6  7  8
[torch.FloatTensor of size 1x1x3x3]
 
 
 weights: Parameter containing:
(0 ,0 ,.,.) = 
  0  1
  2  3
[torch.FloatTensor of size 1x1x2x2]
 
 
 bias: Parameter containing:
 1
[torch.FloatTensor of size 1]
 y: Variable containing:
(0 ,0 ,.,.) = 
   1   9
  22  44
[torch.FloatTensor of size 1x1x2x2]
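
The output size of the strided, padded convolution above follows from a simple formula along each dimension: $\lfloor (n + 2p - k) / s \rfloor + 1$ for input size $n$, kernel size $k$, stride $s$, and padding $p$. The sketch below assumes a dilation of 1, and the helper name `conv_output_size` is made up for illustration.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(3, 2))                       # 2: 3x3 input, 2x2 kernel
print(conv_output_size(3, 2, stride=2, padding=1))  # 2: same input with stride 2, padding 1
print(conv_output_size(4, 3))                       # 2: 4x4 input, 3x3 kernel
```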



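`conv2d` also handles inputs with several channels: each input channel gets its own slice of the kernel, each channel is convolved separately, and the per-channel results are summed before the bias is added.

$$ conv(data, w, b) = \sum_i conv(data[:,i,:,:], w[0,i,:,:]) + b $$

The example below computes the same thing with `conv3d`. It stacks the two channels along the depth axis of a single-channel 3-D input and uses a kernel whose depth equals the input depth, so the kernel fits at exactly one depth position and the output matches a two-channel `conv2d`.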



In [4]:
# input data
x = Variable(torch.Tensor(range(18))).view(1,1,2,3,3)

# weight
w = Variable(torch.Tensor(range(8))).view(1,1,2,2,2)

# bias 
b = Variable(torch.Tensor([1]))

# Compute y

y = F.conv3d(x, w, b)

print("input", x, "\n \n weights:", w, "\n \n bias:", b, "y:", y)


input Variable containing:
(0 ,0 ,0 ,.,.) = 
   0   1   2
   3   4   5
   6   7   8

(0 ,0 ,1 ,.,.) = 
   9  10  11
  12  13  14
  15  16  17
[torch.FloatTensor of size 1x1x2x3x3]
 
 
 weights: Variable containing:
(0 ,0 ,0 ,.,.) = 
  0  1
  2  3

(0 ,0 ,1 ,.,.) = 
  4  5
  6  7
[torch.FloatTensor of size 1x1x2x2x2]
 
 
 bias: Variable containing:
 1
[torch.FloatTensor of size 1]
 y: Variable containing:
(0 ,0 ,0 ,.,.) = 
  269  297
  353  381
[torch.FloatTensor of size 1x1x1x2x2]
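
The sum over channels in the formula above can be spelled out in plain Python. This is a minimal sketch with made-up helper names (`conv2d_single`, `conv2d_multichannel`); it treats the two depth slices of the input above as two channels, pairs each with one 2x2 slice of the kernel, and adds the bias once after summing.

```python
def conv2d_single(image, kernel):
    """Sliding-window sum of products for one channel (no padding, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(len(image[0]) - k_w + 1)
        ]
        for i in range(len(image) - k_h + 1)
    ]


def conv2d_multichannel(channels, kernels, bias):
    """Convolve each channel with its own kernel, sum the results, add the bias once."""
    per_channel = [conv2d_single(c, k) for c, k in zip(channels, kernels)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(p[i][j] for p in per_channel) + bias for j in range(cols)]
        for i in range(rows)
    ]


# Two 3x3 input channels holding 0..17 and two 2x2 kernel slices holding 0..7.
channels = [
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
]
kernels = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
]

print(conv2d_multichannel(channels, kernels, bias=1))  # [[269, 297], [353, 381]]
```

The values agree with the `conv3d` output shown above.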



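If needed, we can also produce several output channels. The number of output channels is set by the first dimension of the weight, and each output channel has its own bias entry.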

$$ conv(data, w, b)[:,i,:,:] = conv(data, w[i,:,:,:], b[i]) $$

In [5]:
# input data
x = Variable(torch.Tensor(range(18))).view(1,1,2,3,3)

# weight
w = Variable(torch.Tensor(range(16))).view(2,1,2,2,2)

# bias 
b = Variable(torch.Tensor([1, 1]))

# Compute y

y = F.conv3d(x, w, b)

print("input", x, "\n \n weights:" ,w, "\n \n bias:", b, "y:", y)


input Variable containing:
(0 ,0 ,0 ,.,.) = 
   0   1   2
   3   4   5
   6   7   8

(0 ,0 ,1 ,.,.) = 
   9  10  11
  12  13  14
  15  16  17
[torch.FloatTensor of size 1x1x2x3x3]
 
 
 weights: Variable containing:
(0 ,0 ,0 ,.,.) = 
   0   1
   2   3

(0 ,0 ,1 ,.,.) = 
   4   5
   6   7

(1 ,0 ,0 ,.,.) = 
   8   9
  10  11

(1 ,0 ,1 ,.,.) = 
  12  13
  14  15
[torch.FloatTensor of size 2x1x2x2x2]
 
 
 bias: Variable containing:
 1
 1
[torch.FloatTensor of size 2]
 y: Variable containing:
(0 ,0 ,0 ,.,.) = 
   269   297
   353   381

(0 ,1 ,0 ,.,.) = 
   685   777
   961  1053
[torch.FloatTensor of size 1x2x1x2x2]



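## 2. Pooling Layer (Pooling)


Because a convolutional layer acts on one window at a time, its output is sensitive to the exact position of features. A pooling layer mitigates this. Like convolution, it looks at one small window at a time, then outputs either the largest element in the window (max pooling) or the mean of its elements (average pooling).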


![Pooling layer](../images/Max_pooling.png)
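
To make the windowing concrete, here is a minimal plain-Python sketch of max and average pooling over non-overlapping windows. The helper names `pool2d` and `mean` are made up for illustration, and the 4x4 input is an arbitrary example.

```python
def pool2d(image, size, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping size x size window of `image`."""
    output = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        output.append(row)
    return output


def mean(values):
    return sum(values) / len(values)


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

print(pool2d(image, 2, max))   # [[5, 7], [13, 15]]
print(pool2d(image, 2, mean))  # [[2.5, 4.5], [10.5, 12.5]]
```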


As with the convolutional layer above, we first perform pooling with torch.nn.functional and then build pooling layers with torch.nn.

In [6]:
import torch
import torch.nn.functional as F
from torch.autograd import Variable
# input data
x = Variable(torch.Tensor(range(16))).view(1, 1, 4, 4)

# kernel_size

kernel_size = (2, 2)

# Max-pooling
y_Max = F.max_pool2d(x, kernel_size)

# average-pooling
y_ave = F.avg_pool2d(x, kernel_size)

print("input", x, "\n  kernel_size:", kernel_size, " \n ", "y_Max:", y_Max, "\n y_ave:", y_ave)



input Variable containing:
(0 ,0 ,.,.) = 
   0   1   2   3
   4   5   6   7
   8   9  10  11
  12  13  14  15
[torch.FloatTensor of size 1x1x4x4]
 
  kernel_size: (2, 2)  
  y_Max: Variable containing:
(0 ,0 ,.,.) = 
   5   7
  13  15
[torch.FloatTensor of size 1x1x2x2]
 
 y_ave: Variable containing:
(0 ,0 ,.,.) = 
   2.5000   4.5000
  10.5000  12.5000
[torch.FloatTensor of size 1x1x2x2]



Next, we build the pooling layers with torch.nn.

In [7]:
import torch.nn as nn

x = Variable(torch.Tensor(range(16))).view(1, 1, 4, 4)

# kernel_size

kernel_size = (2, 2)


m_max = nn.MaxPool2d(kernel_size)

m_ave = nn.AvgPool2d(kernel_size)


# Apply the pooling layers

y_Max = m_max(x)

y_ave = m_ave(x)

print("input", x, "\n  kernel_size:", kernel_size, " \n ", "y_Max:", y_Max, "\n y_ave:", y_ave)



input Variable containing:
(0 ,0 ,.,.) = 
   0   1   2   3
   4   5   6   7
   8   9  10  11
  12  13  14  15
[torch.FloatTensor of size 1x1x4x4]
 
  kernel_size: (2, 2)  
  y_Max: Variable containing:
(0 ,0 ,.,.) = 
   5   7
  13  15
[torch.FloatTensor of size 1x1x2x2]
 
 y_ave: Variable containing:
(0 ,0 ,.,.) = 
   2.5000   4.5000
  10.5000  12.5000
[torch.FloatTensor of size 1x1x2x2]



## 3. The LeNet-5 Model

Next we use the LeNet model to classify the MNIST dataset.

The structure of LeNet-5 is shown in the figure below:

![LeNet-5](../images/LeNet-5.png)



In [8]:
# Build the model

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F


class LeNet5(nn.Module):
    def __init__(self, in_dim, n_class):
        super(LeNet5, self).__init__()
        # Layer 1: convolution, 1 input channel -> 6 output channels, kernel_size = (5, 5).
        # padding=2 keeps the 28x28 MNIST image at 28x28.
        self.conv1 = nn.Conv2d(in_dim, 6, 5, padding=2)
        # Layer 2: convolution, 6 input channels -> 16 output channels, kernel_size = (5, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Layer 3: fully connected (linear) layer
        self.fc1 = nn.Linear(16*5*5, 120)
        # Layer 4: fully connected layer
        self.fc2 = nn.Linear(120, 84)
        # Layer 5: output layer
        self.fc3 = nn.Linear(84, n_class)

    # Forward pass
    def forward(self, x):
        # Subsampling 1: ReLU followed by 2x2 max pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)

        # Subsampling 2: ReLU followed by 2x2 max pooling
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)

        # Flatten to (batch_size, num_features); -1 lets view infer the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        # Fully connected layer 1
        x = F.relu(self.fc1(x))
        # Fully connected layer 2
        x = F.relu(self.fc2(x))
        # Fully connected layer 3 (raw class scores)
        x = self.fc3(x)
        return x

    # Helper for the step from the 16-channel convolution output to the fully connected layers
    def num_flat_features(self, x):
        # Product of channel * iH * iW, i.e. every dimension except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

In [9]:
leNet = LeNet5(1, 10)
print(leNet)

LeNet5 (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
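
The `400` in `fc1` comes from tracing the spatial size of a 28x28 MNIST image through the layers printed above. The sketch below does that trace in plain Python and also counts the learnable parameters. It assumes a dilation of 1 and that the 2x2 pooling uses a stride of 2; the helper name `conv_out` is made up for illustration.

```python
def conv_out(n, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (dilation assumed to be 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1


size = 28                             # MNIST images are 28 x 28
size = conv_out(size, 5, padding=2)   # conv1 -> 28
size = conv_out(size, 2, stride=2)    # 2x2 max pooling -> 14
size = conv_out(size, 5)              # conv2 -> 10
size = conv_out(size, 2, stride=2)    # 2x2 max pooling -> 5
flat_features = 16 * size * size      # input width of fc1

# Learnable parameters per layer: weights plus biases.
params = {
    "conv1": 6 * 1 * 5 * 5 + 6,
    "conv2": 16 * 6 * 5 * 5 + 16,
    "fc1": flat_features * 120 + 120,
    "fc2": 120 * 84 + 84,
    "fc3": 84 * 10 + 10,
}

print(flat_features)         # 400
print(sum(params.values()))  # 61706
```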


The dataset is MNIST.

We use torchvision to load the data.

In [10]:
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets

# mini-batch
batch_size = 128

# Set this to True to download the data if it has not been downloaded yet
DOWNLOAD = False 

train_dataset = datasets.MNIST(
    root='./data', train=True, transform=transforms.ToTensor(), download=DOWNLOAD)

test_dataset = datasets.MNIST(
    root='./data', train=False, transform=transforms.ToTensor())

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

Now that we have both the model and the dataset, let's start training.

In [11]:
import torch.optim as optim

# hyper-parameters
learning_rate = 0.0001
num_epoches = 2
use_gpu = torch.cuda.is_available()
# Run on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda" if use_gpu else "cpu")
leNet.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(leNet.parameters(), lr=learning_rate)

# Start training
for epoch in range(num_epoches):
    print('epoch {}'.format(epoch + 1))
    print('*' * 10)
    running_loss = 0.0
    running_acc = 0.0
    for i, data in enumerate(train_loader, 1):
        img, label = data
        img = img.to(device)
        label = label.to(device)
        # Forward pass
        out = leNet(img)
        loss = criterion(out, label)
        # loss is the batch mean, so weight it by the batch size
        running_loss += loss.item() * label.size(0)
        _, pred = torch.max(out, 1)
        num_correct = (pred == label).sum()
        running_acc += num_correct.item()
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 300 == 0:
            print('[{}/{}] Loss: {:.6f}, Acc: {:.6f}'.format(
                epoch + 1, num_epoches, running_loss / (batch_size * i),
                running_acc / (batch_size * i)))
print("Done!")

epoch 1
**********
[1/2] Loss: 1.630987, Acc: 0.528724
epoch 2
**********
[2/2] Loss: 0.439152, Acc: 0.871667
Done!
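
The `test_loader` built earlier is not used by the training loop. One way to use it is an accuracy check like the sketch below. The `evaluate` function is a hypothetical helper, not part of the original tutorial, and the tiny linear model and hand-made batch exist only so the example runs on its own.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device="cpu"):
    """Return the classification accuracy of `model` over all batches in `loader`."""
    model.to(device)
    model.eval()                    # put layers such as dropout into inference mode
    correct, total = 0, 0
    with torch.no_grad():           # gradients are not needed for evaluation
        for img, label in loader:
            img, label = img.to(device), label.to(device)
            pred = model(img).argmax(dim=1)
            correct += (pred == label).sum().item()
            total += label.size(0)
    return correct / total


# Stand-in model and data (assumptions for this sketch only).
dummy = nn.Linear(4, 2)
with torch.no_grad():
    dummy.weight.zero_()
    dummy.bias.copy_(torch.tensor([1.0, 0.0]))  # always predicts class 0
batches = [(torch.zeros(3, 4), torch.tensor([0, 0, 1]))]

print(evaluate(dummy, batches))  # 2 of the 3 labels are class 0
```

With the objects from this tutorial, the call would be `evaluate(leNet, test_loader)`, passing `device="cuda"` if the model has been moved to a GPU.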


Baseline