# **NiN: Network in Network**

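**Idea**: build a deep network by chaining together several small networks, each made of a convolutional layer and "fully connected" layers.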

## **NiN block**

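NiN replaces the fully connected layers with 1×1 convolutions. This avoids the flatten operation, so the channel information is passed on naturally to the following layers.  
Each block is: convolutional layer -> 1×1 convolution -> 1×1 convolution, with a ReLU after every convolution.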
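
To make the "1×1 convolution as a fully connected layer" claim concrete, here is a minimal sketch that checks the equivalence numerically. The channel counts (3 in, 4 out) and the 5×5 input are arbitrary values chosen for illustration, not taken from the model in this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A 1x1 convolution mapping 3 input channels to 4 output channels.
conv = nn.Conv2d(3, 4, kernel_size=1)

# A fully connected layer that shares the same weights and bias.
fc = nn.Linear(3, 4)
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(4, 3))  # (4, 3, 1, 1) -> (4, 3)
    fc.bias.copy_(conv.bias)

X = torch.randn(2, 3, 5, 5)  # (batch, channels, height, width)

Y_conv = conv(X)
# Move channels last so the linear layer acts on each pixel's channel vector,
# then move them back to the (batch, channels, height, width) layout.
Y_fc = fc(X.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(Y_conv, Y_fc, atol=1e-5)
print(Y_conv.shape, same)
```

In other words, a 1×1 convolution applies the same fully connected layer at every pixel position, treating the channels as features, which is why no flattening is needed.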

In [1]:
import torch 
from torch import nn, optim

In [2]:
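# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')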

In [3]:
def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    blk = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding),
                        nn.ReLU(),
                        nn.Conv2d(out_channels, out_channels, kernel_size=1),
                        nn.ReLU(),
                        nn.Conv2d(out_channels, out_channels, kernel_size=1),
                        nn.ReLU())
    return blk

## **NiN model**

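The NiN model does not use three fully connected layers to produce the output. Instead, the last block reduces the number of channels to the number of classes, and an average pooling layer then shrinks the height and width of each channel down to 1×1.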
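
As a rough illustration of this output stage, the sketch below averages each channel over its whole feature map and flattens the result into one score per class. The input shape (batch of 2, 10 channels, 5×5 map) is an assumption made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

X = torch.randn(2, 10, 5, 5)  # (batch, classes, height, width)

# Average pooling with a window as large as the feature map leaves a 1x1 map.
pooled = nn.AvgPool2d(kernel_size=5)(X)   # shape (2, 10, 1, 1)
scores = pooled.flatten(start_dim=1)      # shape (2, 10)

# The same result written as a plain mean over height and width.
scores_mean = X.mean(dim=(2, 3))

same = torch.allclose(scores, scores_mean, atol=1e-6)
print(scores.shape, same)
```

Because this stage has no learnable parameters, the classifier head adds nothing to the parameter count, unlike a stack of fully connected layers.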

In [5]:
import sys
sys.path.append(r'C:\D\ProgramFile\jupyter\torch_learn\dive_to_dp\utils') 
import d2lzh as d2l

In [7]:
net = nn.Sequential(nin_block(1, 96, kernel_size=11, stride=4, padding=0),
                    nn.MaxPool2d(kernel_size=3, stride=2),
                    nin_block(96, 256, kernel_size=5, stride=1, padding=2),
                    nn.MaxPool2d(kernel_size=3, stride=2),
                    nin_block(256, 384, kernel_size=3, stride=1, padding=1),
                    nn.MaxPool2d(kernel_size=3, stride=2),
                    # the number of label classes is 10
                    nin_block(384, 10, kernel_size=3, stride=1, padding=1),
                    nn.AvgPool2d(kernel_size=5),
                    d2l.FlattenLayer())
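
The choice of `kernel_size=5` for the final average pooling can be checked by hand. The sketch below traces the spatial size through the layers above using the standard output-size formula for convolution and pooling, assuming a 224×224 input (the size used elsewhere in this notebook). The helper `out_size` is a name introduced here for illustration.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# (name, kernel_size, stride, padding) for every layer that changes the spatial size.
# The 1x1 convolutions inside each NiN block keep the size unchanged, so they are omitted.
layers = [
    ("nin_block 1 (11x11 conv)", 11, 4, 0),
    ("max pool", 3, 2, 0),
    ("nin_block 2 (5x5 conv)", 5, 1, 2),
    ("max pool", 3, 2, 0),
    ("nin_block 3 (3x3 conv)", 3, 1, 1),
    ("max pool", 3, 2, 0),
    ("nin_block 4 (3x3 conv)", 3, 1, 1),
    ("avg pool", 5, 5, 0),  # AvgPool2d uses stride = kernel_size by default
]

size = 224  # input height and width
sizes = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:26s} -> {size} x {size}")
```

The last NiN block outputs a 5×5 map, so a 5×5 average pooling window covers it entirely and leaves one value per class channel.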

In [8]:
net = net.to(device)
import torchsummary
# torchsummary builds its dummy input on the device named here
torchsummary.summary(net, (1, 224, 224), device=device.type)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 96, 54, 54]          11,712
              ReLU-2           [-1, 96, 54, 54]               0
            Conv2d-3           [-1, 96, 54, 54]           9,312
              ReLU-4           [-1, 96, 54, 54]               0
            Conv2d-5           [-1, 96, 54, 54]           9,312
              ReLU-6           [-1, 96, 54, 54]               0
         MaxPool2d-7           [-1, 96, 26, 26]               0
            Conv2d-8          [-1, 256, 26, 26]         614,656
              ReLU-9          [-1, 256, 26, 26]               0
           Conv2d-10          [-1, 256, 26, 26]          65,792
             ReLU-11          [-1, 256, 26, 26]               0
           Conv2d-12          [-1, 256, 26, 26]          65,792
             ReLU-13          [-1, 256, 26, 26]               0
        MaxPool2d-14          [-1, 256,
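
The parameter counts in the summary can be reproduced with a little arithmetic: a 2D convolution has `in_channels * out_channels * kernel_size^2` weights plus one bias per output channel. The sketch below applies this to the first two NiN blocks. The helper name `conv2d_params` is introduced here for illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

first_block = [
    conv2d_params(1, 96, 11),    # Conv2d-1
    conv2d_params(96, 96, 1),    # Conv2d-3
    conv2d_params(96, 96, 1),    # Conv2d-5
]
second_block = [
    conv2d_params(96, 256, 5),   # Conv2d-8
    conv2d_params(256, 256, 1),  # Conv2d-10
    conv2d_params(256, 256, 1),  # Conv2d-12
]
print(first_block, second_block)
```

These values match the `Param #` column for the corresponding rows. The ReLU and pooling layers have no parameters.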

## **Loading the data and training the model**

In [9]:
batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

In [10]:
loss = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.002)

In [11]:
nums_epochs = 5
for epoch in range(nums_epochs):
    train_l_sum, train_acc_sum, n, batch_count = 0.0, 0.0, 0, 0
    for X, y in train_iter:
        X = X.to(device)
        y = y.to(device)
        y_hat = net(X)
        l = loss(y_hat, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        # .item() converts to plain Python floats, so no tensors are accumulated
        train_l_sum += l.item()
        train_acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
        batch_count += 1
    test_acc = d2l.evaluate_accuracy(test_iter, net)
    print(f'epoch{epoch + 1}: train_loss {train_l_sum / batch_count :.4f} train_acc {train_acc_sum/n:.3f} test_acc {test_acc:.3f}')

epoch1: train_loss 1.7199 train_acc 0.392 test_acc 0.515


KeyboardInterrupt: 