# Personal Notes: Multilayer Perceptron

softmax regression notes: https://www.kesci.com/org/boyuai/project/share/54b451b9329f1a38
Linear regression notes: https://www.kesci.com/org/boyuai/project/share/4b72c6515eb2ec7f

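Unlike the single-layer networks used earlier for linear regression and softmax regression, the multilayer perceptron (MLP) stacks several layers, which brings it closer to the neural networks used in deep learning. Besides the input and output layers, an MLP has one or more layers in between, called hidden layers. Each layer is also no longer a purely linear function: the linear output of each hidden layer is passed through a nonlinear transformation, and the function that implements it is called an activation function. Otherwise, the implementation follows much the same steps as the single-layer models.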

![Image Name](https://cdn.kesci.com/upload/image/q5mzdvvft9.jpg?imageView2/0/w/960/h/960)
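
To see why the nonlinear transformation matters, the sketch below checks that two linear layers stacked without an activation are equivalent to a single linear layer, and that putting ReLU between them breaks this equivalence. The tensor sizes and variable names are made up for illustration and are not part of the original notes.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
X = torch.randn(4, 6, dtype=torch.float64)
W1, b1 = torch.randn(6, 5, dtype=torch.float64), torch.randn(5, dtype=torch.float64)
W2, b2 = torch.randn(5, 3, dtype=torch.float64), torch.randn(3, dtype=torch.float64)

# Two stacked linear layers with no activation in between...
two_layers = (X @ W1 + b1) @ W2 + b2

# ...equal one linear layer with merged weight and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_layer = X @ W_merged + b_merged
collapses = torch.allclose(two_layers, one_layer)

# With ReLU in between, the merged linear layer no longer reproduces the output.
with_relu = torch.relu(X @ W1 + b1) @ W2 + b2
differs = not torch.allclose(with_relu, one_layer)

print(collapses, differs)
```

Without an activation function, adding hidden layers would therefore add parameters but not expressive power.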

In [13]:
import torch 
from torch import nn
from torch.nn import init
import numpy as np 
import sys 
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

## Implementation from scratch

In [9]:
# Download and load the data
batch_size = 256 
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

In [10]:
# Define the model parameters
num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)

params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)

### Activation functions
Common activation functions include:

- ReLU: max(x, 0)
- sigmoid: 1 / (1 + exp(-x))
- tanh: (1 - exp(-2x)) / (1 + exp(-2x))
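
The three formulas above can be written out directly and compared against PyTorch's built-in versions. This is a minimal sketch: the function names and the sample input range are my own choices.

```python
import torch

def relu(x):
    # max(x, 0), applied elementwise
    return torch.clamp(x, min=0)

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1 / (1 + torch.exp(-x))

def tanh(x):
    # (1 - exp(-2x)) / (1 + exp(-2x))
    return (1 - torch.exp(-2 * x)) / (1 + torch.exp(-2 * x))

# Sample inputs from -3 to 3 in steps of 0.5
x = torch.arange(-3.0, 3.5, 0.5)

relu_ok = torch.allclose(relu(x), torch.relu(x))
sigmoid_ok = torch.allclose(sigmoid(x), torch.sigmoid(x))
tanh_ok = torch.allclose(tanh(x), torch.tanh(x))
print(relu_ok, sigmoid_ok, tanh_ok)
```

In practice the built-in `torch.relu`, `torch.sigmoid` and `torch.tanh` are preferable, since the hand-written forms can overflow for inputs of large magnitude.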

In [11]:
# Define the activation function; ReLU is used here
def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

# Define the model
def net(X):
    X = X.view((-1, num_inputs))
    H = relu(torch.matmul(X, W1) + b1)
    return torch.matmul(H, W2) + b2

# Define the loss function
loss = torch.nn.CrossEntropyLoss()
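
The model above returns raw scores (logits) with no softmax applied, because `CrossEntropyLoss` applies log-softmax internally. The following sketch traces the shapes through one forward pass and checks that equivalence. It uses a dummy random batch instead of Fashion-MNIST, and redefines the parameters locally so that it runs on its own.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Locally defined parameters, initialized like the ones in the notes.
W1 = torch.randn(num_inputs, num_hiddens) * 0.01
b1 = torch.zeros(num_hiddens)
W2 = torch.randn(num_hiddens, num_outputs) * 0.01
b2 = torch.zeros(num_outputs)

# Dummy batch (an assumption for illustration): 2 grayscale 28x28 images.
X = torch.rand(2, 1, 28, 28)
y = torch.tensor([3, 7])  # dummy class labels

H = torch.relu(X.view(-1, num_inputs) @ W1 + b1)  # hidden layer: (2, 256)
logits = H @ W2 + b2                              # output layer: (2, 10)

# CrossEntropyLoss = log-softmax followed by negative log-likelihood.
ce = torch.nn.CrossEntropyLoss()(logits, y)
manual = F.nll_loss(F.log_softmax(logits, dim=1), y)
print(H.shape, logits.shape, torch.allclose(ce, manual))
```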

In [12]:
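# Train the model
# Note: lr=100.0 looks unusually large. It presumably works here because the SGD helper
# behind d2l.train_ch3 divides the gradient by batch_size, although CrossEntropyLoss
# has already averaged the loss over the batch.
num_epochs, lr = 5, 100.0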
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)

epoch 1, loss 0.0031, train acc 0.712, test acc 0.797
epoch 2, loss 0.0019, train acc 0.821, test acc 0.834
epoch 3, loss 0.0017, train acc 0.845, test acc 0.843
epoch 4, loss 0.0015, train acc 0.854, test acc 0.835
epoch 5, loss 0.0014, train acc 0.865, test acc 0.853


## Concise implementation

In [14]:
# Define the model
num_inputs, num_outputs, num_hiddens = 784, 10, 256

net = nn.Sequential(
    d2l.FlattenLayer(),
    nn.Linear(num_inputs, num_hiddens),
    nn.ReLU(),
    nn.Linear(num_hiddens, num_outputs),
)

# Initialize all parameters (weights and biases) from a normal distribution with std 0.01
for param in net.parameters():
    init.normal_(param, mean=0, std=0.01)

In [15]:
# Load the data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

# Train the model
loss = torch.nn.CrossEntropyLoss()  # loss function
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)  # optimizer
num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)


epoch 1, loss 0.0032, train acc 0.695, test acc 0.817
epoch 2, loss 0.0019, train acc 0.824, test acc 0.788
epoch 3, loss 0.0017, train acc 0.845, test acc 0.832
epoch 4, loss 0.0015, train acc 0.857, test acc 0.809
epoch 5, loss 0.0014, train acc 0.864, test acc 0.817
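
The from-scratch run uses `lr = 100.0` while the concise run uses `lr = 0.5`, yet both end up with similar accuracy. One plausible explanation, which is an assumption about the `d2l` helper and is not confirmed in these notes, is that its hand-written SGD divides the gradient by `batch_size`, whereas `torch.optim.SGD` does not. The sketch below uses a hypothetical `sgd_divided` function and a toy linear model to show that, under this assumption, a step with `lr = 100.0` matches a `torch.optim.SGD` step with `lr = 100.0 / 256`.

```python
import torch

torch.manual_seed(0)
batch_size = 256

def sgd_divided(params, lr, batch_size):
    # Hypothetical stand-in for the d2l helper (assumed behaviour):
    # the gradient is divided by batch_size before the update.
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad / batch_size

# Toy data and a toy linear classifier, for illustration only.
X = torch.randn(batch_size, 4)
y = torch.randint(0, 3, (batch_size,))
loss_fn = torch.nn.CrossEntropyLoss()  # averages over the batch by default

w_a = torch.randn(4, 3, requires_grad=True)
w_b = w_a.detach().clone().requires_grad_(True)

# One step with the divided update at lr = 100.0
loss_fn(X @ w_a, y).backward()
sgd_divided([w_a], lr=100.0, batch_size=batch_size)

# One step with torch.optim.SGD at the rescaled learning rate
effective_lr = 100.0 / batch_size
opt = torch.optim.SGD([w_b], lr=effective_lr)
loss_fn(X @ w_b, y).backward()
opt.step()

same_step = torch.allclose(w_a, w_b, atol=1e-6)
print(effective_lr, same_step)
```

If the assumption holds, the two runs train with effective learning rates of about 0.39 and 0.5, which would be consistent with their similar results.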
