## Deep Convolutional Neural Networks (AlexNet)

#### 5.6.2 AlexNet

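> 1. **`AlexNet` has 8 layers: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer. Its channel counts are also tens of times larger than those of `LeNet`**
> 2. **Uses `ReLU()` as the activation function**
> 3. **Uses `dropout()` to reduce overfitting in the fully connected layers**
> 4. **Introduces image augmentation, such as flipping, cropping, and color changes**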
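
The `dropout()` point can be made concrete with a small pure-Python sketch of inverted dropout. This is the scaling scheme that PyTorch's `nn.Dropout` documents. During training, each activation is zeroed with probability `p`, and the surviving activations are scaled by `1/(1-p)`. At evaluation time the layer is the identity. The helper name `inverted_dropout` and its list-based interface are hypothetical and exist only for illustration.

```python
import random


def inverted_dropout(xs, p, training=True, rng=None):
    """Hypothetical helper: apply inverted dropout to a list of floats.

    Training mode: each element is zeroed with probability p, and the kept
    elements are scaled by 1 / (1 - p) so the expected value is unchanged.
    Evaluation mode: the input is returned unchanged (identity).
    """
    if not 0.0 <= p < 1.0:
        # p = 1 is excluded here to avoid dividing by zero in the scale factor
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in xs]


if __name__ == "__main__":
    acts = [1.0, 2.0, 3.0, 4.0]
    # Training: some entries become 0.0, and the survivors are doubled (p = 0.5)
    print(inverted_dropout(acts, 0.5, rng=random.Random(0)))
    # Evaluation: the same list comes back
    print(inverted_dropout(acts, 0.5, training=False))
```

Because the scaling happens at training time, the evaluation path does not need to know `p`.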

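#### Convolutional layers

> `class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)`
> - in_channels (int) - number of channels in the input
> - out_channels (int) - number of channels produced by the convolution
> - kernel_size (int or tuple) - size of the convolution kernel
> - stride (int or tuple, optional) - stride of the convolution
> - padding (int or tuple, optional) - number of rows/columns of zeros added to each side of the input
> - dilation (int or tuple, optional) - spacing between kernel elements
> - groups (int, optional) - number of blocked connections from input channels to output channels
> - bias (bool, optional) - if `bias=True`, adds a learnable bias to the output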
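
Together, `kernel_size`, `stride`, `padding`, and `dilation` determine the spatial size of the output. The sketch below implements the per-axis output-size formula from the PyTorch `Conv2d` documentation, `floor((H + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. The helper name `conv2d_out` is hypothetical.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output height/width of nn.Conv2d along one axis.

    Implements floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1.
    """
    effective = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if effective < 0:
        raise ValueError("kernel (after dilation) is larger than the padded input")
    return effective // stride + 1


# The first two AlexNet convolutions used in this section:
print(conv2d_out(224, 11, stride=4))           # 54
print(conv2d_out(26, 5, stride=1, padding=2))  # 26: padding 2 keeps a 5x5 kernel size-preserving
```

For an odd kernel size `k`, stride 1 and padding `(k - 1) // 2` preserve the height and width. That is why AlexNet pairs the 5×5 layer with padding 2 and the 3×3 layers with padding 1.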

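#### Pooling layers
> `class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)`
> - kernel_size (int or tuple) - size of the max-pooling window
> - stride (int or tuple, optional) - step by which the max-pooling window moves. Defaults to `kernel_size`
> - padding (int or tuple, optional) - implicit padding added to each side of the input. Padded positions are treated as negative infinity, so they never become the maximum
> - dilation (int or tuple, optional) - a parameter that controls the stride of elements within the window
> - return_indices - if `True`, also returns the indices of the maxima, which is useful for upsampling (e.g. with `nn.MaxUnpool2d`)
> - ceil_mode - if `True`, rounds up (ceiling) instead of the default rounding down (floor) when computing the output size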
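
To show what `ceil_mode` changes, here is a per-axis sketch of the `MaxPool2d` output size. It mirrors the shape rule in PyTorch's pooling code. The numerator is the same as for convolution and is divided by `stride` with either floor or ceiling. In ceiling mode, a final window that would start entirely inside the right-hand padding is dropped. The helper name `maxpool2d_out` is hypothetical.

```python
def maxpool2d_out(size, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Hypothetical helper: output height/width of nn.MaxPool2d along one axis."""
    stride = kernel_size if stride is None else stride  # default stride = kernel_size
    effective = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if effective < 0:
        raise ValueError("pooling window is larger than the padded input")
    if ceil_mode:
        out = -(-effective // stride) + 1  # ceiling division
        # Drop a last window that would start in the right-hand padding
        if (out - 1) * stride >= size + padding:
            out -= 1
    else:
        out = effective // stride + 1  # floor division (the default)
    return out


print(maxpool2d_out(6, 3, 2), maxpool2d_out(6, 3, 2, ceil_mode=True))  # 2 3
```

AlexNet uses the default `ceil_mode=False`. With a 3×3 window and stride 2, the pooling layers therefore map 54 → 26, 26 → 12, and 12 → 5.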

```python
import time
import torch
from torch import nn, optim
import torchvision
import sys
sys.path.append("..")  # make the local d2lzh_pytorch package importable
import d2lzh_pytorch.utils as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

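```python
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4),  # in_channels, out_channels, kernel_size, stride
            nn.ReLU(),
            nn.MaxPool2d(3, 2),  # kernel_size, stride
            # Smaller convolution window; padding 2 keeps input and output height/width equal,
            # and the number of output channels increases
            nn.Conv2d(96, 256, 5, 1, 2),  # in_channels, out_channels, kernel_size, stride, padding
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive conv layers with an even smaller window. Except for the last one,
            # they further increase the number of output channels.
            # No pooling follows the first two, so they do not reduce height and width.
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )

        # The fully connected layers have many more outputs than LeNet's; dropout reduces overfitting
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),  # 256 channels x 5x5 feature map for 224x224 inputs
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # 10 outputs because Fashion-MNIST has 10 classes
            nn.Linear(4096, 10)
        )

    # forward must be a method of the class, not a function nested inside __init__
    def forward(self, img):
        features = self.conv(img)
        output = self.fc(features.view(img.shape[0], -1))  # flatten to (batch, 256*5*5)
        return output
```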
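
The first fully connected layer is hard-coded as `nn.Linear(256*5*5, 4096)`, so the convolutional stack must end with a 5×5 feature map. The sketch below traces the side length of a square input through each spatial layer. It uses the floor-mode formula for dilation 1, `(size + 2*padding - kernel_size) // stride + 1`. The layer table is transcribed by hand from `AlexNet.conv` above, and `trace_sizes` is a hypothetical helper. The trace also shows why the 28×28 Fashion-MNIST images need to be resized: at 28×28 the feature map disappears before the last pooling layer.

```python
# (name, kernel_size, stride, padding) for each spatial layer in AlexNet.conv
ALEXNET_CONV_STACK = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_sizes(size, layers=ALEXNET_CONV_STACK):
    """Hypothetical helper: return [(layer_name, output_side), ...] for a square input."""
    sizes = []
    for name, k, s, p in layers:
        size = (size + 2 * p - k) // s + 1  # floor mode, dilation 1
        if size < 1:
            raise ValueError(f"{name}: feature map vanished (input too small)")
        sizes.append((name, size))
    return sizes


trace = trace_sizes(224)
for name, side in trace:
    print(name, side)
final_side = trace[-1][1]
print("flattened features:", 256 * final_side * final_side)  # conv5 has 256 output channels
```

For a 224×224 input, the sides go 54, 26, 26, 12, 12, 12, 12, 5, giving 256 × 5 × 5 = 6400 features. This matches `in_features=6400` in the printed model.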

```python
net = AlexNet()
print(net)
```

```
AlexNet(
  (conv): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU()
    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=6400, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)
```
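
The printed structure makes it easy to count parameters by hand. A `Conv2d` layer has `out * in * k * k` weights plus `out` biases, and a `Linear` layer has `out * in` weights plus `out` biases. The sketch below copies the layer shapes from the printout; the helper names are hypothetical. It shows that the fully connected layers hold most of the parameters, even though the convolutions do most of the spatial work.

```python
# (out_channels, in_channels, kernel_size) for each Conv2d in the printout
CONV_LAYERS = [(96, 1, 11), (256, 96, 5), (384, 256, 3), (384, 384, 3), (256, 384, 3)]
# (out_features, in_features) for each Linear in the printout
FC_LAYERS = [(4096, 6400), (4096, 4096), (10, 4096)]


def conv_params(out_c, in_c, k):
    # One k x k kernel per (output, input) channel pair, plus one bias per output channel
    return out_c * in_c * k * k + out_c


def linear_params(out_f, in_f):
    # Weight matrix plus one bias per output feature
    return out_f * in_f + out_f


conv_total = sum(conv_params(*layer) for layer in CONV_LAYERS)
fc_total = sum(linear_params(*layer) for layer in FC_LAYERS)
total = conv_total + fc_total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
print(f"share in fully connected layers: {fc_total / total:.1%}")
```

The first `Linear(6400, 4096)` alone holds about 26 million parameters, which is one reason the fully connected layers are regularized with dropout.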

#### 5.6.3 Reading the Data

```python
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    trans = []
    if resize:
        # Upsample the 28x28 images to the size the network expects
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)

    return train_iter, test_iter

batch_size = 128
# AlexNet.fc expects 256*5*5 input features, which requires 224x224 images
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
```
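
Some rough arithmetic shows what this loader produces. With the `DataLoader` default `drop_last=False`, an epoch yields `ceil(N / batch_size)` mini-batches, and the last batch may be smaller. The sketch assumes the standard Fashion-MNIST split of 60,000 training and 10,000 test images. It also estimates the size of one float32 input batch after `Resize(224)`. The helper `num_batches` is hypothetical.

```python
def num_batches(num_examples, batch_size, drop_last=False):
    """Hypothetical helper: mini-batches per epoch for a DataLoader-style iterator."""
    if drop_last:
        return num_examples // batch_size
    return -(-num_examples // batch_size)  # ceiling division


batch_size = 128
train_size, test_size = 60000, 10000  # standard Fashion-MNIST split (assumed)
print(num_batches(train_size, batch_size), num_batches(test_size, batch_size))

# Resize(224) upsamples each 28x28 grayscale image to 224x224
pixels_ratio = (224 * 224) / (28 * 28)
# ToTensor() yields float32 (4 bytes per value); 1 channel per image
batch_bytes = batch_size * 1 * 224 * 224 * 4
print(pixels_ratio, batch_bytes / 2**20)  # pixel ratio, MiB per input batch
```

Each image carries 64 times as many pixels as the original, which partly explains why training AlexNet is much slower than training LeNet on the same data.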

#### 5.6.4 Training

```python
lr, num_epochs = 0.001, 5

optimizer = torch.optim.Adam(net.parameters(), lr=lr)
# Training on 224x224 inputs can be slow without a GPU; uncomment to train
# d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
```