### Implementation from Scratch

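#### Following the definition of dropout, the implementation is straightforward. The `dropout` function below drops each element of `X` with probability `drop_prob` and rescales the surviving elements by `1 / (1 - drop_prob)`.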

In [1]:
import torch
import torch.nn as nn
import numpy as np
import d2lzh_pytorch as d2l

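def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # When keep_prob is 0, every element is dropped
    if keep_prob == 0:
        return torch.zeros_like(X)
    # Keep each element independently with probability keep_prob
    mask = (torch.rand(X.shape) < keep_prob).float()
    # Rescale the survivors so that the expected value is unchanged
    return mask * X / keep_prob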

#### Let's run a few examples to test the `dropout` function, with drop probabilities of 0, 0.5, and 1.

In [2]:
X = torch.arange(16).view(2,8)
dropout(X,0)


tensor([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14., 15.]])

In [20]:
dropout(X,0.5)

tensor([[ 0.,  0.,  0.,  6.,  8.,  0.,  0., 14.],
        [ 0.,  0.,  0., 22.,  0.,  0.,  0.,  0.]])

In [17]:
dropout(X,1.0)

tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]])
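The outputs above suggest that surviving elements are doubled when `drop_prob` is 0.5. As a rough sanity check of why, the sketch below applies the same `dropout` function to a large tensor of ones and measures the fraction of kept elements and the mean of the output. The tensor size and the random seed are arbitrary choices made for illustration; the mean should stay close to 1 because the rescaling compensates for the dropped elements.

```python
import torch

def dropout(X, drop_prob):
    """Inverted dropout, mirroring the function defined above."""
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    if keep_prob == 0:
        return torch.zeros_like(X)
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob

torch.manual_seed(0)          # arbitrary seed, only for reproducibility
X = torch.ones(1000, 1000)    # one million elements, all equal to 1
Y = dropout(X, 0.5)

# Fraction of elements that survived, and the mean of the rescaled output
kept_fraction = (Y != 0).float().mean().item()
output_mean = Y.mean().item()
print(f"kept fraction: {kept_fraction:.3f}, output mean: {output_mean:.3f}")
```

Every surviving element equals 2 here, so the output mean is simply twice the kept fraction.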

### Defining the Model

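#### The model defined below chains fully connected layers with ReLU activations and applies dropout to the output of each activation function. The drop probability can be set separately for each layer. The usual advice is to use a smaller drop probability for layers closer to the input. In this experiment, the drop probability is set to 0.2 for the first hidden layer and 0.5 for the second.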

In [23]:
x = (torch.rand(2,8) < 0.1)
print(x)

tensor([[False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False]])


In [27]:
x = (torch.rand(2,8) < 0.5).float()
print(x)

tensor([[1., 1., 1., 1., 1., 0., 1., 0.],
        [0., 1., 0., 1., 1., 0., 0., 1.]])


In [3]:
num_inputs,num_outputs,num_hiddens1,num_hiddens2 = 784,10,256,256

W1 = torch.tensor(np.random.normal(0,0.01,size = (num_inputs,num_hiddens1)),dtype = torch.float,requires_grad = True)
b1 = torch.zeros(num_hiddens1,requires_grad = True)
W2 = torch.tensor(np.random.normal(0,0.01,size = (num_hiddens1,num_hiddens2)),dtype = torch.float,requires_grad = True)
b2 = torch.zeros(num_hiddens2,requires_grad = True)
W3 = torch.tensor(np.random.normal(0,0.01,size = (num_hiddens2,num_outputs)),dtype = torch.float,requires_grad = True)
b3 = torch.zeros(num_outputs,requires_grad = True)

In [5]:
params = [W1,b1,W2,b2,W3,b3]

In [6]:
drop_prob1,drop_prob2 = 0.2,0.5

def net(X,is_training = True):
    X = X.view(-1,num_inputs)
    H1 = (torch.matmul(X,W1) + b1).relu()
    if is_training:    # apply dropout only when training the model
        H1 = dropout(H1,drop_prob1)
    H2 = (torch.matmul(H1,W2) +b2).relu()
    if is_training:
        H2 = dropout(H2,drop_prob2)
    return torch.matmul(H2,W3) + b3
    

#### Dropout should not be applied when evaluating the model, so the `evaluate_accuracy` function in `d2lzh_pytorch` is modified to disable it during evaluation.
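The modified function itself is not reproduced in this section. The following is a minimal sketch of what such a function might look like, not the library's actual code. It assumes that hand-written models expose an `is_training` keyword (as `net` above does) and that `nn.Module` models are switched with `eval()`. The `toy_net` function and the tiny dataset are hypothetical and exist only to exercise the sketch.

```python
import inspect
import torch

def evaluate_accuracy(data_iter, net):
    """Return the classification accuracy of `net` with dropout disabled."""
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(net, torch.nn.Module):
                was_training = net.training
                net.eval()               # evaluation mode turns nn.Dropout off
                y_hat = net(X)
                net.train(was_training)  # restore the previous mode
            elif "is_training" in inspect.signature(net).parameters:
                # Hand-written model function that exposes an is_training flag
                y_hat = net(X, is_training=False)
            else:
                y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

# Hypothetical toy model: returns all zeros in training mode and the input
# itself (used as class scores) in evaluation mode
def toy_net(X, is_training=True):
    return torch.zeros_like(X) if is_training else X

X = torch.eye(4)                 # sample i scores highest for class i
y = torch.tensor([0, 1, 2, 0])   # the last label is deliberately wrong
toy_acc = evaluate_accuracy([(X, y)], toy_net)
print(toy_acc)
```

Restoring the previous mode instead of unconditionally calling `net.train()` is a design choice of this sketch; it keeps the function safe to call on a model that was already in evaluation mode.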

### Training and Testing the Model

#### This part is similar to the training and testing of the multilayer perceptron earlier.

In [7]:
num_epochs,lr,batch_size = 5,100.0,256
loss = torch.nn.CrossEntropyLoss()
train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params,lr)

epoch 1, loss 0.0045, train acc 0.556, test acc 0.650
epoch 2, loss 0.0023, train acc 0.783, test acc 0.763
epoch 3, loss 0.0019, train acc 0.821, test acc 0.822
epoch 4, loss 0.0017, train acc 0.839, test acc 0.820
epoch 5, loss 0.0016, train acc 0.850, test acc 0.833
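The learning rate of 100.0 may look surprisingly large. One plausible explanation rests on an assumption about the helper library that is not shown in this section: that the SGD update used by `d2l.train_ch3` divides each gradient by `batch_size`. Since `CrossEntropyLoss` already averages over the batch by default, that extra division would shrink the step, making the effective learning rate roughly `lr / batch_size`. The sketch below illustrates the arithmetic with a hypothetical `sgd_step` helper and made-up data.

```python
import torch

def sgd_step(params, lr, batch_size):
    """Hypothetical SGD update that divides the gradient by batch_size."""
    for param in params:
        param.data -= lr * param.grad / batch_size

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
lr, batch_size = 100.0, 256
w = torch.zeros(3, requires_grad=True)

# CrossEntropyLoss uses reduction='mean' by default: the loss is a batch average
loss = torch.nn.CrossEntropyLoss()
X = torch.randn(batch_size, 3)
y = torch.randint(0, 3, (batch_size,))
l = loss(X * w, y)    # toy logits that depend elementwise on w
l.backward()

grad = w.grad.clone()
sgd_step([w], lr, batch_size)

# Because w started at zero, the update equals -effective_lr * grad
effective_lr = lr / batch_size
print(effective_lr)
```

Under this assumption the step size is comparable to the learning rate of 0.5 passed to `torch.optim.SGD` in the concise implementation.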


### Concise Implementation

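#### In PyTorch, we only need to add a `Dropout` layer after each fully connected hidden layer (following its activation) and specify the drop probability. During training, the `Dropout` layer randomly drops output elements of the previous layer with the specified probability. During testing, that is, after calling `model.eval()`, the `Dropout` layer has no effect.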

In [10]:
net = nn.Sequential(
    d2l.FlattenLayer(),
    nn.Linear(num_inputs,num_hiddens1),
    nn.ReLU(),
    nn.Dropout(drop_prob1),
    nn.Linear(num_hiddens1,num_hiddens2),
    nn.ReLU(),
    nn.Dropout(drop_prob2),
    nn.Linear(num_hiddens2,10)
)
for param in net.parameters():
    nn.init.normal_(param,mean = 0,std = 0.01)
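To make the difference between the two modes concrete, the sketch below applies a standalone `nn.Dropout` layer to a tensor of ones, first in training mode and then in evaluation mode. The input shape and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)     # arbitrary seed, only for reproducibility
layer = nn.Dropout(0.5)
X = torch.ones(2, 8)

layer.train()            # training mode: elements are zeroed at random
train_out = layer(X)     # survivors are scaled by 1 / (1 - 0.5) = 2

layer.eval()             # evaluation mode: dropout acts as the identity
eval_out = layer(X)

print(train_out)
print(eval_out)
```

Calling `net.eval()` on the `nn.Sequential` model above switches every `Dropout` layer inside it in the same way, and `net.train()` switches them back.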

### Next, we train and test the model

In [11]:
optimizer = torch.optim.SGD(net.parameters(),lr = 0.5)
d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,None,None,optimizer)

epoch 1, loss 0.0045, train acc 0.558, test acc 0.750
epoch 2, loss 0.0022, train acc 0.786, test acc 0.812
epoch 3, loss 0.0019, train acc 0.824, test acc 0.812
epoch 4, loss 0.0019, train acc 0.826, test acc 0.804
epoch 5, loss 0.0017, train acc 0.844, test acc 0.801
