# Deep Convolutional Neural Networks

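The AlexNet architecture used here is as follows:
1. Input layer: a 224×224 image (RGB in the original AlexNet; the code below feeds single-channel Fashion-MNIST images resized to 224×224)
2. Convolutional layer 1: 96 kernels of size 11×11, stride 4, padding 2; output size floor((224 + 2×2 − 11)/4) + 1 = 55, with 96 output channels
3. Max-pooling layer 1: 3×3 window, stride 2; output size (55 − 3)/2 + 1 = 27
4. Convolutional layer 2: 256 kernels of size 5×5, stride 1, padding 2; output size (27 + 2×2 − 5)/1 + 1 = 27, with 256 output channels
5. Max-pooling layer 2: 3×3 window, stride 2; output size (27 − 3)/2 + 1 = 13
6. Convolutional layer 3: 384 kernels of size 3×3, stride 1, padding 1; output size (13 + 2×1 − 3)/1 + 1 = 13, with 384 output channels
7. Convolutional layer 4: 384 kernels of size 3×3, stride 1, padding 1; output size (13 + 2×1 − 3)/1 + 1 = 13, with 384 output channels
8. Convolutional layer 5: 256 kernels of size 3×3, stride 1, padding 1; output size (13 + 2×1 − 3)/1 + 1 = 13, with 256 output channels
9. Max-pooling layer 3: 3×3 window, stride 2; output size (13 − 3)/2 + 1 = 6
10. Fully connected layer 1: 4096 neurons with ReLU activation (its input is the flattened 256×6×6 = 9216 features)
11. Dropout layer 1: drops each activation with probability 0.5
12. Fully connected layer 2: 4096 neurons with ReLU activation
13. Dropout layer 2: drops each activation with probability 0.5
14. Fully connected layer 3: 10 neurons, one per output class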
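
Each convolution and pooling output size follows the rule floor((n + 2p − k)/s) + 1, where n is the input side length, k the window size, s the stride, and p the padding. As a quick sanity check, the minimal sketch below traces a 224-pixel side through the layers using the kernel, stride, and padding values listed above. The names `conv_out_size`, `layers`, and `trace_sizes` are illustrative and not part of d2lzh or MXNet.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each convolution and pooling layer
layers = [
    ('conv1', 11, 4, 2),
    ('pool1', 3, 2, 0),
    ('conv2', 5, 1, 2),
    ('pool2', 3, 2, 0),
    ('conv3', 3, 1, 1),
    ('conv4', 3, 1, 1),
    ('conv5', 3, 1, 1),
    ('pool3', 3, 2, 0),
]


def trace_sizes(n, layers):
    """Return [(layer name, output side length)] for an input of side n."""
    sizes = []
    for name, kernel, stride, padding in layers:
        n = conv_out_size(n, kernel, stride, padding)
        sizes.append((name, n))
    return sizes


for name, size in trace_sizes(224, layers):
    print(name, size)
```

The traced side lengths (55, 27, 27, 13, 13, 13, 13, 6) agree with the shapes the network prints in the shape-check cell.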

In [1]:
import d2lzh as d2l
from mxnet import gluon, init, nd
from mxnet.gluon import data as gdata, nn
import os
import sys

In [2]:
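net = nn.Sequential()
# Use a large 11*11 window to capture objects, with stride 4 to shrink the
# output height and width sharply: floor((224 + 2*2 - 11) / 4) + 1 = 55.
# The 96 output channels are far more than the 6 in LeNet's first convolutional layer.
net.add(nn.Conv2D(96, kernel_size=11, strides=4, padding=2, activation='relu'),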
        nn.MaxPool2D(pool_size=3, strides=2),
        nn.Conv2D(256, kernel_size=5, padding=2, activation='relu'),
        nn.MaxPool2D(pool_size=3, strides=2),  # each of the first two convolutional layers is followed by max pooling to reduce the output size
        nn.Conv2D(384, kernel_size=3, padding=1, activation='relu'),
        nn.Conv2D(384, kernel_size=3, padding=1, activation='relu'),
        nn.Conv2D(256, kernel_size=3, padding=1, activation='relu'),
        nn.MaxPool2D(pool_size=3, strides=2),
        nn.Dense(4096, activation='relu'),
        nn.Dropout(0.5),  # dropout after the first fully connected layer to reduce overfitting
        nn.Dense(4096, activation='relu'),
        nn.Dropout(0.5),
        nn.Dense(10))
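
The two `nn.Dropout(0.5)` layers are only active during training. As a rough illustration of what such a layer does, here is a minimal pure-Python sketch of inverted dropout on a list of floats. The function `dropout` and its `rng` argument are hypothetical helpers written for this example, not MXNet's implementation.

```python
import random


def dropout(xs, drop_prob, rng=random):
    """Inverted dropout on a list of floats (training-mode behaviour)."""
    assert 0 <= drop_prob <= 1
    if drop_prob == 1:
        return [0.0 for _ in xs]
    keep_prob = 1 - drop_prob
    # Zero each element with probability drop_prob and rescale the survivors
    # by 1 / keep_prob so that the expected value of each element is unchanged.
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5, rng))
```

With a drop probability of 0.5, every surviving activation is doubled, which is why no rescaling is needed at prediction time.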

In [5]:
X = nd.random.uniform(shape=(1, 1, 224, 224))  # a batch of one single-channel 224*224 image
net.initialize()
for layer in net:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

conv0 output shape:	 (1, 96, 55, 55)
pool0 output shape:	 (1, 96, 27, 27)
conv1 output shape:	 (1, 256, 27, 27)
pool1 output shape:	 (1, 256, 13, 13)
conv2 output shape:	 (1, 384, 13, 13)
conv3 output shape:	 (1, 384, 13, 13)
conv4 output shape:	 (1, 256, 13, 13)
pool2 output shape:	 (1, 256, 6, 6)
dense0 output shape:	 (1, 4096)
dropout0 output shape:	 (1, 4096)
dense1 output shape:	 (1, 4096)
dropout1 output shape:	 (1, 4096)
dense2 output shape:	 (1, 10)
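
The printed shapes also let us estimate how many parameters each layer holds. The sketch below is a back-of-the-envelope count, assuming a single input channel (as in the shape check above) and one bias per output channel or unit. `conv_params` and `dense_params` are illustrative helpers, not MXNet functions.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel convolutional layer."""
    return out_channels * in_channels * kernel * kernel + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


in_channels = 1  # single-channel input, as in the shape check
counts = {
    'conv0': conv_params(in_channels, 96, 11),
    'conv1': conv_params(96, 256, 5),
    'conv2': conv_params(256, 384, 3),
    'conv3': conv_params(384, 384, 3),
    'conv4': conv_params(384, 256, 3),
    'dense0': dense_params(256 * 6 * 6, 4096),  # flattened pool2 output
    'dense1': dense_params(4096, 4096),
    'dense2': dense_params(4096, 10),
}

for name, n in counts.items():
    print(name, n)
print('total', sum(counts.values()))
```

Under these assumptions the first fully connected layer (`dense0`) accounts for well over half of all parameters, because it consumes the flattened 256×6×6 feature map.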


## Reading the data

In [7]:
def load_data_face(batch_size, resize=None,
                   root=os.path.join('~', '.mxnet', 'datasets', 'fashion-mnist')):
    """Load Fashion-MNIST and return (train_iter, test_iter)."""
    root = os.path.expanduser(root)  # expand '~' to the user's home directory
    # Build the transform pipeline: optional resize, then conversion to a tensor
    transformer = []
    if resize:
        transformer += [gdata.vision.transforms.Resize(resize)]
    transformer += [gdata.vision.transforms.ToTensor()]
    transformer = gdata.vision.transforms.Compose(transformer)
    mnist_train = gdata.vision.FashionMNIST(root=root, train=True)
    mnist_test = gdata.vision.FashionMNIST(root=root, train=False)
    # Use no extra worker processes on Windows, 4 elsewhere
    num_workers = 0 if sys.platform.startswith('win32') else 4
    train_iter = gdata.DataLoader(mnist_train.transform_first(transformer),
                                  batch_size=batch_size, shuffle=True,
                                  num_workers=num_workers)
    test_iter = gdata.DataLoader(mnist_test.transform_first(transformer),
                                 batch_size=batch_size, shuffle=False,
                                 num_workers=num_workers)
    return train_iter, test_iter

batch_size = 128
# Resize the 28*28 Fashion-MNIST images to the 224*224 input the network expects
train_iter, test_iter = load_data_face(batch_size, resize=224)
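
Resizing to 224×224 makes each batch much heavier than with the native 28×28 images. The rough estimate below assumes the standard Fashion-MNIST split (60,000 training and 10,000 test images), 32-bit floats after `ToTensor`, and that the final partial batch is kept. The helper `batches_per_epoch` is illustrative only.

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Number of minibatches when the last, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)


batch_size = 128
train_batches = batches_per_epoch(60000, batch_size)
test_batches = batches_per_epoch(10000, batch_size)

# Pixels per image grow by this factor when resizing 28*28 to 224*224
pixel_ratio = (224 * 224) // (28 * 28)

# Bytes needed for one batch of single-channel float32 images
batch_bytes = batch_size * 1 * 224 * 224 * 4

print(train_batches, test_batches, pixel_ratio, batch_bytes / 2**20)
```

This helps explain why training this network takes far longer per epoch than a small network on the original 28×28 images, especially on a CPU.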

## Training

In [8]:
lr, num_epochs, ctx = 0.01, 5, d2l.try_gpu()  # use a GPU if one is available, otherwise the CPU
# Re-initialize the parameters on ctx with Xavier initialization
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
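
To give a feel for what Xavier initialization does, the sketch below draws weights uniformly from [-b, b] with b = sqrt(6 / (fan_in + fan_out)). This assumes the uniform variant that averages fan-in and fan-out, which is believed to match the default settings of `init.Xavier()`. The functions `xavier_uniform_bound` and `xavier_uniform` are pure-Python illustrations, not MXNet code.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out):
    """Half-width of the uniform range for Xavier (Glorot) initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_out x fan_in weight matrix as nested lists."""
    bound = xavier_uniform_bound(fan_in, fan_out)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


# Bound for the last fully connected layer of the network (4096 -> 10)
print(xavier_uniform_bound(4096, 10))

# A small example matrix, seeded so that the run is repeatable
weights = xavier_uniform(4, 3, random.Random(0))
print(weights)
```

Wider layers get a smaller range, which keeps the variance of activations and gradients roughly stable from layer to layer.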

training on cpu(0)


KeyboardInterrupt: (the run was stopped manually before any epoch results were printed)