# AlexNet
Let's start by studying the neural network that ushered in a new era.
Original paper:
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
![](./img/alexnet.png)

### Key points

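We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 classes. On the test data we achieved top-1 and top-5 error rates of 37.5% and 17.0%, the best results at the time. The network has 60 million parameters and 650,000 neurons. It consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers ending in a 1000-way softmax. To make training faster, we used non-saturating neurons (that is, ReLU) and a GPU-optimized implementation of the convolution operation. To reduce overfitting in the fully connected layers, we employed a regularization method called dropout, which proved very effective.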
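To make the two metrics concrete, here is a minimal sketch of how a top-k error rate can be computed from per-class scores. The function name `topk_error` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def topk_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy example with 3 classes and 3 images (made-up numbers).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]
print("top-1 error:", topk_error(toy_scores, toy_labels, k=1))
print("top-2 error:", topk_error(toy_scores, toy_labels, k=2))
```

Top-5 error is the same computation with `k=5` over the 1000 class scores.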

We trained our models using stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. We found that this small amount of weight decay was important for the model to learn.
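The update rule behind these hyperparameters can be sketched for a single scalar weight as follows. This follows the momentum update with weight decay as given in the paper; the function name `sgd_step` and the default learning rate are placeholders chosen for illustration.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and weight decay for a scalar weight.

    v_next = momentum * v - weight_decay * lr * w - lr * grad
    w_next = w + v_next
    """
    v_next = momentum * v - weight_decay * lr * w - lr * grad
    w_next = w + v_next
    return w_next, v_next


# Two steps on a toy weight with a constant gradient (made-up numbers).
w, v = 1.0, 0.0
for step in range(2):
    w, v = sgd_step(w, v, grad=2.0)
    print(f"step {step + 1}: w = {w:.6f}, v = {v:.6f}")
```

In PyTorch, `torch.optim.SGD(params, lr, momentum=0.9, weight_decay=0.0005)` exposes the same three knobs, although its internal formulation of the momentum buffer differs slightly from the rule above.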

Below is a visualization of the 96 features learned by the first-layer convolution kernels:

![](./img/alexnetlayer1.png)

### AlexNet's historical contributions
* Highlighted the value of ReLU
* Introduced Dropout
* A deeper network
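The first two points can be illustrated with a small pure-Python sketch. It compares the gradient of ReLU with that of a saturating nonlinearity (tanh) and shows a dropout mask. Note one assumption: this sketch uses the "inverted" scaling that `nn.Dropout` applies during training, rather than the test-time scaling described in the paper. All function names are illustrative.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # The gradient stays at 1 for any positive input (non-saturating).
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    # The gradient shrinks toward 0 as |x| grows (saturating).
    return 1.0 - math.tanh(x) ** 2


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


for x in (0.5, 2.0, 5.0):
    print(f"x = {x}: relu'(x) = {relu_grad(x)}, tanh'(x) = {tanh_grad(x):.6f}")

print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(0)))
```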

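### Notes on this implementation
The dual-GPU setup is not implemented, so the network is not split in two. A single modern GPU can handle the whole model easily.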

In [19]:
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        
        self.cnn = nn.Sequential(
            # The paper's first layer has 96 kernels; this narrower variant uses 64.
            nn.Conv2d(3, 64, 11, stride = 4, padding = 2),
            nn.ReLU(inplace = True),
            nn.MaxPool2d(3, stride = 2),
            
            nn.Conv2d(64, 192, 5, padding = 2),
            nn.ReLU(inplace = True),
            nn.MaxPool2d(3, stride = 2),
            
            nn.Conv2d(192, 384, 3, padding = 1),
            nn.ReLU(inplace = True),
            
            nn.Conv2d(384, 256, 3, padding = 1),
            nn.ReLU(inplace = True),
            
            nn.Conv2d(256, 256, 3, padding = 1),
            nn.ReLU(inplace = True),
            nn.MaxPool2d(3, stride = 2)
            
        )
        
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace = True),
            # inplace=True means that it will modify the input directly, 
            # without allocating any additional output. It can sometimes slightly decrease the memory usage, 
            # but may not always be a valid operation (because the original input is destroyed). 
            # However, if you don't see an error, it means that your use case is valid.
            
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace = True),
            
            # 200 outputs for the 200 classes of Tiny ImageNet (the paper's network has 1000).
            nn.Linear(4096, 200)
        )
        
    def forward(self, x):
        x = self.cnn(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
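The `256 * 6 * 6` input size of the first fully connected layer follows from the kernel sizes, strides and paddings above. The sketch below traces the spatial size through the network, assuming a 224x224 input (the crop size used later in the data-loading code). The helper names `out_size` and `trace_sizes` are illustrative.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv or max-pool layer (floor rounding, no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding), in the order used by self.cnn above.
LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_sizes(input_size=224):
    """Return [(layer name, output height/width), ...] for a square input."""
    sizes = []
    n = input_size
    for name, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes.append((name, n))
    return sizes


for name, n in trace_sizes(224):
    print(f"{name}: {n} x {n}")
```

The last entry is 6, so the 256 output channels flatten to 256 * 6 * 6 = 9216 features. Feeding Tiny ImageNet's native 64x64 images through the same trace ends at 1x1, which is why the images need to be resized to 224 before this network can use them.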

This is my first time using ImageNet. Training will be slow, so a GPU is a must, folks!

In [21]:
import argparse
import os
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
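# Note: torch.utils.trainer is a legacy module from early PyTorch releases
# and is not available in current versions.
import torch.utils.trainer as trainer
import torch.utils.trainer.plugins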
import torch.utils.data
import torchvision.transforms as transforms
import torchvision.datasets as datasets

model = AlexNet()

# Data loading code
transform = transforms.Compose([
    # RandomSizedCrop was renamed RandomResizedCrop in later torchvision releases.
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                         std = [ 0.229, 0.224, 0.225 ]),
])

data_root = './tiny-imagenet-200'

traindir = os.path.join(data_root, 'train')
valdir = os.path.join(data_root, 'val')
train = datasets.ImageFolder(traindir, transform)
val = datasets.ImageFolder(valdir, transform)
train_loader = torch.utils.data.DataLoader(
    train, batch_size=16, shuffle=True, num_workers=8)

# define Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.001, 0.9)


# pass model, loss, optimizer and dataset to the trainer
t = trainer.Trainer(model, criterion, optimizer, train_loader)

# register some monitoring plugins
t.register_plugin(trainer.plugins.ProgressMonitor())
t.register_plugin(trainer.plugins.AccuracyMonitor())
t.register_plugin(trainer.plugins.LossMonitor())
t.register_plugin(trainer.plugins.TimeMonitor())
t.register_plugin(trainer.plugins.Logger(['progress', 'accuracy', 'loss', 'time']))

# train!
t.run(2)

progress: 1/6250 (0.02%)	accuracy: 0.00%  (0.00%)	loss: 5.2973  (1.5892)	time: 0ms  (0ms)
progress: 2/6250 (0.03%)	accuracy: 0.00%  (0.00%)	loss: 5.3049  (2.7039)	time: 1513ms  (454ms)
progress: 3/6250 (0.05%)	accuracy: 0.00%  (0.00%)	loss: 5.3021  (3.4833)	time: 1351ms  (723ms)
progress: 4/6250 (0.06%)	accuracy: 0.00%  (0.00%)	loss: 5.2988  (4.0280)	time: 1490ms  (953ms)
progress: 5/6250 (0.08%)	accuracy: 0.00%  (0.00%)	loss: 5.2949  (4.4081)	time: 1503ms  (1118ms)
progress: 6/6250 (0.10%)	accuracy: 0.00%  (0.00%)	loss: 5.3015  (4.6761)	time: 1436ms  (1213ms)
progress: 7/6250 (0.11%)	accuracy: 0.00%  (0.00%)	loss: 5.2974  (4.8625)	time: 1508ms  (1302ms)
progress: 8/6250 (0.13%)	accuracy: 0.00%  (0.00%)	loss: 5.3015  (4.9942)	time: 1403ms  (1332ms)
progress: 9/6250 (0.14%)	accuracy: 0.00%  (0.00%)	loss: 5.3031  (5.0869)	time: 1231ms  (1302ms)
progress: 10/6250 (0.16%)	accuracy: 0.00%  (0.00%)	loss: 5.2959  (5.1496)	time: 1236ms  (1282ms)
progress: 11/6250 (0.18%)	accuracy: 0.00%  (0.00
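
Two things in this log can be checked with a few lines of arithmetic. First, a loss near 5.30 is what cross-entropy gives for a uniform guess over 200 classes, so the network has not yet learned anything in these first iterations. Second, the values in parentheses appear consistent with an exponential moving average that starts at zero with a smoothing factor of 0.7. That second point is inferred from the numbers themselves, not from the trainer's documentation.

```python
import math

NUM_CLASSES = 200  # Tiny ImageNet, matching the 200 outputs of the model

# Cross-entropy of a uniform prediction: -ln(1 / NUM_CLASSES).
chance_loss = math.log(NUM_CLASSES)


def running_average(values, smoothing=0.7):
    """Exponential moving average starting from zero (assumed scheme)."""
    avg, out = 0.0, []
    for v in values:
        avg = smoothing * avg + (1.0 - smoothing) * v
        out.append(avg)
    return out


# First four per-batch losses copied from the log.
logged_losses = [5.2973, 5.3049, 5.3021, 5.2988]

print(f"chance-level loss: {chance_loss:.4f}")
print("running averages:", [round(a, 4) for a in running_average(logged_losses)])
```

Because the logged losses are rounded to four decimals, the reproduced averages can differ from the log in the last digit.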

KeyboardInterrupt: 
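
The run above relies on `torch.utils.trainer`, which is not shipped with current PyTorch releases. A rough equivalent with an explicit loop might look like the sketch below. The function name `train_one_epoch` and the tiny synthetic dataset are assumptions made so that the example runs without Tiny ImageNet on disk. In practice the `AlexNet` model and `train_loader` from the cells above would take their place.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over `loader`; return (mean loss, top-1 accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        # Accumulate statistics weighted by batch size.
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


if __name__ == "__main__":
    torch.manual_seed(0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Hypothetical stand-ins for AlexNet and the Tiny ImageNet loader.
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 200)).to(device)
    toy_data = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 200, (32,)))
    toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(2):
        loss, acc = train_one_epoch(toy_model, toy_loader, criterion, optimizer, device)
        print(f"epoch {epoch + 1}: loss {loss:.4f}, accuracy {acc:.2%}")
```

Moving both the model and each batch to `device` is what actually puts the work on the GPU when one is available.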

In [None]:
This part was copied from the web. It is a useful reference, so I am keeping it here for now.

import argparse
import os
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.trainer as trainer
import torch.utils.trainer.plugins
import torch.utils.data
import torchvision.transforms as transforms
import torchvision.datasets as datasets

import resnet

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('--data', metavar='PATH', required=True,
                    help='path to dataset')
parser.add_argument('--arch', '-a', metavar='ARCH', default='resnet18',
                    help='model architecture: resnet18 | resnet34 | ...'
                         '(default: resnet18)')
parser.add_argument('--gen', default='gen', metavar='PATH',
                    help='path to save generated files (default: gen)')
parser.add_argument('--nThreads', '-j', default=2, type=int, metavar='N',
                    help='number of data loading threads (default: 2)')
parser.add_argument('--nEpochs', default=90, type=int, metavar='N',
                    help='number of total epochs to run')
parser.add_argument('--epochNumber', default=1, type=int, metavar='N',
                    help='manual epoch number (useful on restarts)')
parser.add_argument('--batchSize', '-b', default=256, type=int, metavar='N',
                    help='mini-batch size (1 = pure stochastic) Default: 256')
parser.add_argument('--lr', default=0.1, type=float, metavar='LR',
                    help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                    help='momentum')
parser.add_argument('--weightDecay', default=1e-4, type=float, metavar='W',
                    help='weight decay')
args = parser.parse_args()

if args.arch.startswith('resnet'):
    model = resnet.__dict__[args.arch]()
    model.cuda()
else:
    parser.error('invalid architecture: {}'.format(args.arch))

cudnn.benchmark = True

# Data loading code
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                         std = [ 0.229, 0.224, 0.225 ]),
])

traindir = os.path.join(args.data, 'train')
valdir = os.path.join(args.data, 'val')
train = datasets.ImageFolder(traindir, transform)
val = datasets.ImageFolder(valdir, transform)
train_loader = torch.utils.data.DataLoader(
    train, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)


# create a small container to apply DataParallel to the ResNet
torch.nn.parallel.data_parallel

# Already deprecated (this wrapper builds on nn.Container)
class DataParallel(nn.Container):
    def __init__(self):
        super(DataParallel, self).__init__(
            model=model,
        )

    def forward(self, input):
        if torch.cuda.device_count() > 1:
            gpu_ids = range(torch.cuda.device_count())
            return nn.parallel.data_parallel(self.model, input, gpu_ids)
        else:
            return self.model(input.cuda()).cpu()

model = DataParallel()

# define Loss Function and Optimizer
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), args.lr, args.momentum)


# pass model, loss, optimizer and dataset to the trainer
t = trainer.Trainer(model, criterion, optimizer, train_loader)

# register some monitoring plugins
t.register_plugin(trainer.plugins.ProgressMonitor())
t.register_plugin(trainer.plugins.AccuracyMonitor())
t.register_plugin(trainer.plugins.LossMonitor())
t.register_plugin(trainer.plugins.TimeMonitor())
t.register_plugin(trainer.plugins.Logger(['progress', 'accuracy', 'loss', 'time']))

# train!
t.run(args.nEpochs)
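
The hand-written `DataParallel` container in the script above is marked as deprecated. In later PyTorch releases the built-in `nn.DataParallel` wrapper plays that role. The following is a minimal sketch under that assumption. The helper name `wrap_for_available_gpus` is hypothetical, and a small linear layer stands in for the real model.

```python
import torch
import torch.nn as nn


def wrap_for_available_gpus(model):
    """Wrap `model` in nn.DataParallel when more than one GPU is visible."""
    if torch.cuda.device_count() > 1:
        # Splits each input batch across the visible GPUs.
        model = nn.DataParallel(model)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return model.to(device), device


if __name__ == "__main__":
    toy_model, device = wrap_for_available_gpus(nn.Linear(4, 2))
    batch = torch.randn(3, 4).to(device)
    print(toy_model(batch).shape)
```

On a machine with zero or one GPU the model is returned unwrapped, so the same code path works everywhere.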