## What is PyTorch

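PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can also use GPUs
- It can be used to define deep learning models, and to train and run them flexibly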

### Tensors

A Tensor is similar to NumPy's ndarray; the main difference is that a Tensor can also run on a GPU to accelerate computation.

In [1]:
import torch

Construct an uninitialized 5x3 matrix:

In [18]:
x = torch.empty(5,3)
x

tensor([[2.8251e+05, 7.8893e-43, 2.8251e+05],
        [7.8893e-43, 2.8251e+05, 7.8893e-43],
        [2.8251e+05, 7.8893e-43, 2.8251e+05],
        [7.8893e-43, 2.8251e+05, 7.8893e-43],
        [2.8251e+05, 7.8893e-43, 2.8251e+05]])

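Construct a randomly initialized matrix. `torch.rand` samples from the uniform distribution on [0, 1), and `torch.randn` samples from the standard normal distribution: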

In [19]:
x = torch.rand(5,3)
x

tensor([[0.7054, 0.0115, 0.9597],
        [0.9421, 0.4642, 0.4273],
        [0.5290, 0.8471, 0.4369],
        [0.3095, 0.1544, 0.9498],
        [0.0889, 0.9291, 0.5059]])

Construct a matrix filled with zeros and of dtype long:

In [20]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [21]:
x = torch.zeros(5,3).long()  # cast to long (int64)
x.dtype

torch.int64

Construct a tensor directly from data:

In [22]:
x = torch.tensor([5.5,3]) # a 1-D tensor with 2 elements
x

tensor([5.5000, 3.0000])

A tensor can also be created from an existing tensor. These methods reuse the properties of the input tensor, such as its dtype, unless new values are provided.

In [23]:
x = x.new_ones(5,3, dtype=torch.double) # new_* methods take a size; the dtype is overridden here
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [24]:
x = torch.randn_like(x, dtype=torch.float) # random tensor with the same shape as x, dtype overridden to float
x

tensor([[ 0.5924,  0.9863,  1.0319],
        [ 0.4186, -0.8391, -0.0890],
        [-1.7492, -0.2189,  0.3351],
        [ 1.1020, -1.2133,  1.4746],
        [-0.1461,  1.0083,  0.8271]])

Get the shape of a tensor:

In [25]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a tuple, so it supports all tuple operations.</p></div>
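
Because ``torch.Size`` behaves like a tuple, a shape can be unpacked or indexed directly. A minimal sketch (the names `rows`, `cols`, and `n_elements` are only illustrative and do not appear elsewhere in this notebook):

```python
import torch

x = torch.zeros(5, 3)

# torch.Size is a subclass of tuple, so the usual tuple operations work on it.
rows, cols = x.shape                 # tuple unpacking
is_tuple = isinstance(x.shape, tuple)
n_elements = rows * cols

print(rows, cols, is_tuple, n_elements)
print(x.size(0), x.dim(), x.numel())  # size along dim 0, number of dims, element count
```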

Operations


There are many tensor operations. We start with addition.

In [26]:
y = torch.rand(5,3)
y

tensor([[0.9765, 0.4238, 0.2948],
        [0.7273, 0.9435, 0.3466],
        [0.6223, 0.9892, 0.7007],
        [0.1638, 0.5064, 0.0386],
        [0.5196, 0.8638, 0.3709]])

In [27]:
x + y # add two tensors directly

tensor([[ 1.5689,  1.4100,  1.3267],
        [ 1.1459,  0.1044,  0.2576],
        [-1.1269,  0.7703,  1.0358],
        [ 1.2658, -0.7070,  1.5132],
        [ 0.3736,  1.8721,  1.1980]])

Another syntax for addition:

In [28]:
torch.add(x, y)  # functional form of addition

tensor([[ 1.5689,  1.4100,  1.3267],
        [ 1.1459,  0.1044,  0.2576],
        [-1.1269,  0.7703,  1.0358],
        [ 1.2658, -0.7070,  1.5132],
        [ 0.3736,  1.8721,  1.1980]])

Addition: providing an output tensor as an argument:

In [29]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 1.5689,  1.4100,  1.3267],
        [ 1.1459,  0.1044,  0.2576],
        [-1.1269,  0.7703,  1.0358],
        [ 1.2658, -0.7070,  1.5132],
        [ 0.3736,  1.8721,  1.1980]])

In-place addition:

In [30]:
y.add_(x)  # modifies y in place: the result is stored in y
y

tensor([[ 1.5689,  1.4100,  1.3267],
        [ 1.1459,  0.1044,  0.2576],
        [-1.1269,  0.7703,  1.0358],
        [ 1.2658, -0.7070,  1.5132],
        [ 0.3736,  1.8721,  1.1980]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place is suffixed with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
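
To make the difference between the two flavours concrete, the sketch below contrasts `add` (returns a new tensor) with `add_` (overwrites the tensor it is called on). The small tensors `a`, `b`, and `c` are made up for illustration:

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 3.0)

c = a.add(b)             # out-of-place: returns a new tensor, a is unchanged
a_before = a.clone()     # snapshot of a before the in-place call

a.add_(b)                # in-place: a itself now holds a + b

print(torch.equal(a_before, torch.ones(2, 2)))  # a was untouched by a.add(b)
print(torch.equal(a, c))                        # after add_, a matches the out-of-place result
```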

NumPy-style indexing works on PyTorch tensors as well.

In [31]:
x[1:, 1:] # slicing

tensor([[-0.8391, -0.0890],
        [-0.2189,  0.3351],
        [-1.2133,  1.4746],
        [ 1.0083,  0.8271]])

Resizing: to resize/reshape a tensor, you can use ``Tensor.view``:

In [32]:
x = torch.randn(4,4)
y = x.view(16)   # flatten x into a 1-D tensor with 16 elements
z = x.view(-1,8) # a dimension given as -1 is inferred from the other dimensions
z

tensor([[-0.0205,  0.5863, -1.3809, -0.7456, -2.0674, -0.7317,  2.5107, -0.1470],
        [-0.8988, -1.4296,  1.7932,  0.3143, -0.4821,  0.1191,  1.0049,  0.2906]])
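
A point worth knowing about `view` is that it does not copy data: the result shares storage with the original tensor. The sketch below illustrates this under the assumption of a small `arange` tensor (not the random `x` used above), and also shows `reshape`, which can be used when a tensor is not contiguous:

```python
import torch

x = torch.arange(16.0).view(4, 4)
flat = x.view(16)        # same 16 values seen as a 1-D tensor
z = x.view(-1, 8)        # -1 is inferred as 16 / 8 = 2

flat[0] = 100.0          # writes through to x, because view shares storage
shares_storage = flat.data_ptr() == x.data_ptr()

xt = x.t()               # transposing makes the tensor non-contiguous
r = xt.reshape(16)       # reshape copies when a view is not possible

print(z.shape, x[0, 0].item(), shares_storage, xt.is_contiguous())
```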

If you have a one-element tensor, use ``.item()`` to get its value as a Python number.

In [33]:
x = torch.randn(1)
x

tensor([-1.2214])

In [34]:
x.item()  # tensor ==> Python number

-1.2214466333389282

In [35]:
z.transpose(1,0)  # swap the two dimensions of the matrix

tensor([[-0.0205, -0.8988],
        [ 0.5863, -1.4296],
        [-1.3809,  1.7932],
        [-0.7456,  0.3143],
        [-2.0674, -0.4821],
        [-0.7317,  0.1191],
        [ 2.5107,  1.0049],
        [-0.1470,  0.2906]])

**Further reading**


  Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  <https://pytorch.org/docs/torch>.

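### Converting between NumPy and Tensor

Converting between a Torch Tensor and a NumPy array is easy.

**A CPU Torch Tensor and the corresponding NumPy array share the same memory, so changing one also changes the other.**

Converting a Torch Tensor to a NumPy array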

In [2]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [3]:
b = a.numpy()    # tensor ===> numpy
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array and observe that the tensor changes as well.

In [4]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [5]:
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor

In [6]:
import numpy as np

In [7]:
a = np.ones(5)
b = torch.from_numpy(a)    # numpy ===> tensor
np.add(a, 1, out=a)        # NumPy addition, written in place into a
print(a)

[2. 2. 2. 2. 2.]


In [8]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All Tensors on the CPU support converting to NumPy and back.
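
The memory sharing only applies to `torch.from_numpy` (and `Tensor.numpy()`). If an independent copy is wanted, `torch.tensor` copies the data instead. A small sketch of the contrast, with illustrative names `shared` and `copied`:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data out of a

a[0] = 5.0                     # modify the NumPy array afterwards

print(shared)   # reflects the change made to a
print(copied)   # still holds the original values
```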

### CUDA Tensors

Tensors can be moved onto another device using the ``.to`` method.

In [36]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)    # move the tensor to the GPU
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))    # move the tensor back to the CPU (and cast to double)
    

tensor([-0.2214], device='cuda:0')
tensor([-0.2214], dtype=torch.float64)


In [37]:
# A tensor on the GPU cannot be converted to NumPy directly; it must be moved back to the CPU first, because NumPy arrays live in CPU memory
y.to("cpu").data.numpy()
y.cpu().data.numpy()

array([1.], dtype=float32)

In [None]:
# Move a model to CUDA (assumes `model` is an nn.Module defined elsewhere)
model = model.cuda()
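
A common way to write code that runs with or without a GPU is to pick the device once and pass it around. The following is a sketch of that pattern, not something taken from this notebook; it falls back to the CPU when CUDA is unavailable:

```python
import torch

# Pick the GPU when available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, device=device)   # created directly on the chosen device
y = torch.ones_like(x)              # inherits x's device and dtype
z = (x + y).cpu().numpy()           # move to the CPU before converting to NumPy

print(device, z.shape)
```

The same `device` object can be passed to `model.to(device)` so that the model and its inputs end up on the same device.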

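### A two-layer neural network in NumPy

A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x using an L2 loss.
- $h = W_1X$
- $a = \max(0, h)$
- $\hat{y} = W_2a$

This implementation uses only NumPy to compute the forward pass, the loss, and the backward pass.
- forward pass
- loss
- backward pass

A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.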
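
For reference, the backward pass can be written out by applying the chain rule layer by layer. The derivation below is a sketch in the row-major layout used by the code in this section, where each row of $X$ is one training example, so the hidden activations are computed as $h = XW_1$ and the prediction as $\hat{y} = aW_2$:

$$
\begin{aligned}
L &= \sum_{n,k} (\hat{y}_{nk} - y_{nk})^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$

Each line corresponds to one of the `grad_*` variables in the code. The code zeroes the gradient where `h < 0`, which differs from the indicator above only at the single point $h = 0$.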

In [10]:
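# 64 training examples, 1000-dimensional input, 100-dimensional hidden layer, 10-dimensional output
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data from a standard normal distribution
x = np.random.randn(N, D_in)    # 64 training examples, each with 1000 features
y = np.random.randn(N, D_out)   # 10-dimensional targets

w1 = np.random.randn(D_in, H)   # input layer ==> hidden layer
w2 = np.random.randn(H, D_out)  # hidden layer ==> output layer

learning_rate = 1e-6            # learning rate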
for it in range(500):
    # Forward pass
    h = x.dot(w1)               # N * H
    h_relu = np.maximum(h, 0)   # N * H
    y_pred = h_relu.dot(w2)     # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 31589010.97757206
1 32032361.415442623
2 36532286.92206487
3 38370295.270497814
4 32607056.55221738
5 20820217.776104927
6 10550873.812148217
7 4884393.869130288
8 2492604.7173278416
9 1525314.8289688863
10 1093955.1952010193
11 861240.2854857739
12 710396.2568568155
13 599738.5452808375
14 512760.62678585073
15 441878.9869910692
16 383086.90872362204
17 333786.5498639076
18 292110.2385405301
19 256550.88948888215
20 226137.89787798238
21 200003.8651590041
22 177469.78649526235
23 157927.17434944655
24 140922.51610387655
25 126068.12137287592
26 113057.2017217832
27 101609.05986083572
28 91507.39374227289
29 82560.44391924681
30 74621.71579432023
31 67571.28595227773
32 61289.69514277215
33 55677.382032707894
34 50651.41427469961
35 46147.40230452048
36 42097.54446727611
37 38451.06413643714
38 35160.844602185316
39 32188.421539585855
40 29500.32697434475
41 27058.404850120576
42 24841.87571415577
43 22832.064123187614
44 21007.401921595723
45 19347.685780595522
46 17834.335648431414

358 0.0020136527013626046
359 0.0019245706631653133
360 0.0018395008358377683
361 0.0017581383060139802
362 0.001680376752287538
363 0.001606052153033938
364 0.0015350380863258764
365 0.0014672169929995595
366 0.0014023423249155441
367 0.0013403456065524994
368 0.001281116845712786
369 0.0012244926890471483
370 0.0011703752264242066
371 0.0011186955919421245
372 0.0010693017465316092
373 0.0010220938779933139
374 0.0009769594926719825
375 0.0009338044349611398
376 0.0008925746643684667
377 0.0008531598459401686
378 0.0008154866829858569
379 0.000779484668504332
380 0.0007450799387795288
381 0.0007122107663852029
382 0.0006807724738051952
383 0.000650765381256051
384 0.0006220513425231174
385 0.0005946014935396558
386 0.0005683683001121029
387 0.0005433039239757075
388 0.0005193400989004649
389 0.0004964472150734609
390 0.00047455419439608037
391 0.0004536382024296203
392 0.0004336349315978706
393 0.0004145164088530977
394 0.00039624858300219435
395 0.0003787954326159242
396 0.000362097
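
A hand-written backward pass is easy to get wrong, so it can be useful to compare it against a numerical gradient. The sketch below is an addition rather than part of the original notebook: it uses much smaller, made-up layer sizes and a fixed seed, and checks the analytic `grad_w1` against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2          # tiny sizes, chosen only for the check
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    # Same forward pass and L2 loss as in the training loop.
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient, same formulas as in the training loop.
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

# Numerical gradient by central differences, one entry of w1 at a time.
eps = 1e-6
num_grad_w1 = np.zeros_like(w1)
for i in range(D_in):
    for j in range(H):
        w_plus, w_minus = w1.copy(), w1.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        num_grad_w1[i, j] = (loss_fn(w_plus, w2) - loss_fn(w_minus, w2)) / (2 * eps)

max_err = np.abs(num_grad_w1 - grad_w1).max()
print(max_err)   # should be tiny if the backward pass is correct
```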

### PyTorch: Tensors

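This time we use PyTorch tensors to implement the forward pass, compute the loss, and run the backward pass.

A PyTorch Tensor is very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, the Tensor needs to be moved to a CUDA device.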

In [11]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1)             # N * H, matrix multiplication
    h_relu = h.clamp(min=0)  # N * H, clamp values below at 0 (ReLU)
    y_pred = h_relu.mm(w2)   # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()   # convert the scalar tensor to a Python number
    print(it, loss)
    
    # Backward pass 
    # compute the gradient (manually)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 34576312.0
1 32510804.0
2 34094456.0
3 33462306.0
4 27375170.0
5 17991892.0
6 9881738.0
7 5087395.5
8 2782959.5
9 1734153.25
10 1228542.625
11 950661.5
12 773756.625
13 647202.75
14 549723.5625
15 471403.875
16 406932.3125
17 353084.46875
18 307790.5
19 269375.8125
20 236593.90625
21 208498.21875
22 184310.5625
23 163393.125
24 145251.8125
25 129458.296875
26 115659.046875
27 103561.484375
28 92932.5625
29 83571.8203125
30 75300.2109375
31 67973.671875
32 61469.37109375
33 55681.40625
34 50518.44140625
35 45905.5390625
36 41776.42578125
37 38071.5234375
38 34743.53125
39 31748.9765625
40 29049.58984375
41 26611.625
42 24406.646484375
43 22409.712890625
44 20598.01171875
45 18953.95703125
46 17459.67578125
47 16098.291015625
48 14856.2099609375
49 13722.8232421875
50 12687.0947265625
51 11739.0966796875
52 10870.513671875
53 10073.9970703125
54 9342.740234375
55 8670.7529296875
56 8052.81787109375
57 7483.79443359375
58 6959.6083984375
59 6476.2353515625
60 6030.0576171875
61 5618.171

372 0.0014411802403628826
373 0.001391457743011415
374 0.0013453233987092972
375 0.0012995358556509018
376 0.0012558529851958156
377 0.0012115391436964273
378 0.0011727800592780113
379 0.0011330812703818083
380 0.0010954414028674364
381 0.001058986410498619
382 0.0010246583260595798
383 0.0009920638985931873
384 0.000960951205343008
385 0.0009319901582784951
386 0.0009013290982693434
387 0.0008743526414036751
388 0.0008464703569188714
389 0.0008192691020667553
390 0.0007957531488500535
391 0.0007711197249591351
392 0.000747534038964659
393 0.0007259614649228752
394 0.000703289988450706
395 0.0006827063043601811
396 0.0006629342096857727
397 0.0006426958134397864
398 0.0006252044113352895
399 0.000607246533036232
400 0.0005891717737540603
401 0.0005724278744310141
402 0.0005563916638493538
403 0.000541627814527601
404 0.0005272082053124905
405 0.0005129955243319273
406 0.0004990817396901548
407 0.0004859707551077008
408 0.0004727276391349733
409 0.000460826006019488
410 0.00044788033119

A simple autograd example:

In [12]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b     # y = 2*1+3

y.backward()    # compute dy/dw, dy/dx, and dy/db

# dy/dw = x = 1, dy/dx = w = 2, dy/db = 1
print(w.grad)
print(x.grad)
print(b.grad)

tensor(1.)
tensor(2.)
tensor(1.)
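
One detail that matters for training loops is that `backward()` adds to `.grad` instead of overwriting it. The sketch below demonstrates this on a single made-up weight `w`:

```python
import torch

x = torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)

(w * x).backward()
first = w.grad.item()      # gradient of w*x with respect to w is x = 1

(w * x).backward()         # a second backward pass accumulates into w.grad
second = w.grad.item()

w.grad.zero_()             # reset the accumulated gradient
(w * x).backward()
third = w.grad.item()

print(first, second, third)  # 1.0 2.0 1.0
```

This is why gradients are zeroed on every iteration of a training loop.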


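### PyTorch: Tensors and autograd

**One of PyTorch's key features is autograd: once the forward pass is defined and the loss is computed, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.**

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after a backward pass ``x.grad`` is another Tensor holding the gradient of some scalar (usually the loss) with respect to ``x``.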


In [13]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)   # model parameter W1 requires gradients
w2 = torch.randn(H, D_out, requires_grad=True)  # model parameter W2 requires gradients

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()                  # automatic differentiation
    
    # update weights of w1 and w2
    with torch.no_grad():              # do not record the updates in the computation graph
        w1 -= learning_rate * w1.grad  # gradient descent step
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()                # gradients must be zeroed, otherwise they accumulate across iterations
        w2.grad.zero_()

0 30404994.0
1 25766830.0
2 24082324.0
3 21928268.0
4 18106224.0
5 13207399.0
6 8685804.0
7 5391984.5
8 3338013.0
9 2152406.5
10 1479057.375
11 1084818.25
12 840327.75
13 677745.875
14 562169.0
15 475117.5
16 406805.84375
17 351438.21875
18 305637.5
19 267208.90625
20 234634.734375
21 206842.109375
22 182954.859375
23 162364.578125
24 144476.53125
25 128903.2421875
26 115301.6015625
27 103363.3125
28 92890.09375
29 83643.1171875
30 75463.6015625
31 68207.453125
32 61756.71484375
33 56002.2265625
34 50856.8828125
35 46242.8828125
36 42112.625
37 38400.4921875
38 35057.40625
39 32040.736328125
40 29314.3828125
41 26848.125
42 24612.9609375
43 22583.958984375
44 20741.703125
45 19064.09765625
46 17535.896484375
47 16143.1767578125
48 14870.91796875
49 13708.2177734375
50 12645.310546875
51 11672.44921875
52 10780.8671875
53 9963.33203125
54 9212.75390625
55 8523.4814453125
56 7889.79736328125
57 7306.9697265625
58 6770.46923828125
59 6276.3115234375
60 5820.9130859375
61 5401.083984375
62

370 0.0005328041152097285
371 0.0005164493923075497
372 0.000500017311424017
373 0.00048426532885059714
374 0.00046968174865469337
375 0.00045600911835208535
376 0.0004418102907948196
377 0.00042840218520723283
378 0.00041617045644670725
379 0.0004022885113954544
380 0.00039072285289876163
381 0.0003796450619120151
382 0.0003684004186652601
383 0.00035825936356559396
384 0.000347874709405005
385 0.00033767143031582236
386 0.00032777353771962225
387 0.0003186811809428036
388 0.0003095779102295637
389 0.00030074219102971256
390 0.0002925771404989064
391 0.00028477847808972
392 0.0002767588593997061
393 0.00026941223768517375
394 0.00026266323402523994
395 0.0002555839892011136
396 0.00024919724091887474
397 0.00024217991449404508
398 0.0002357291814405471
399 0.00022972399892751127
400 0.00022320299467537552
401 0.00021784694399684668
402 0.00021212741557974368
403 0.0002063465362880379
404 0.00020168907940387726
405 0.0001961248053703457
406 0.0001914820313686505
407 0.00018643178918864
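
To see that autograd computes the same quantities as the hand-written backward pass from the previous section, the two can be compared directly. This is a sketch with small, made-up layer sizes and a fixed seed; the `manual_grad_*` names are illustrative:

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 4, 5, 3, 2          # tiny sizes, chosen only for the comparison
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Forward pass and loss, tracked by autograd.
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()                          # fills w1.grad and w2.grad

# The same gradients, computed by hand.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual_grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    manual_grad_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, manual_grad_w1, atol=1e-4))
print(torch.allclose(w2.grad, manual_grad_w2, atol=1e-4))
```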

### PyTorch: nn  

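This time we use PyTorch's nn package to build the network.
The layers are combined into a model, while autograd still builds the computation graph
and computes the gradients for us automatically.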

In [14]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define a model as a sequence of layers (linear ==> nonlinearity ==> linear)
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False),  # x @ W_1^T (no bias term, since bias=False)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

# Initialize the weights of the first and third layers from a standard normal distribution
torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# To run on CUDA
# model = model.cuda()

# Define the loss function
loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters():          # each param holds both its data and its .grad
            param -= learning_rate * param.grad   # update each parameter using its gradient
            
    model.zero_grad()    # zero all gradients in the model before the next backward pass

0 30584082.0
1 27407488.0
2 27411188.0
3 26371430.0
4 22369450.0
5 16013351.0
6 9947126.0
7 5671097.0
8 3233256.0
9 1955288.875
10 1294840.5
11 934720.375
12 721134.125
13 581780.5
14 482886.8125
15 407987.0625
16 348679.3125
17 300402.1875
18 260369.46875
19 226770.265625
20 198309.1875
21 174049.203125
22 153255.453125
23 135330.75
24 119811.8984375
25 106337.0859375
26 94587.265625
27 84326.15625
28 75330.28125
29 67426.015625
30 60455.8203125
31 54293.1328125
32 48837.6328125
33 43995.109375
34 39691.5859375
35 35856.9765625
36 32435.734375
37 29382.40234375
38 26646.267578125
39 24191.486328125
40 21987.462890625
41 20006.26171875
42 18222.091796875
43 16612.62890625
44 15159.1015625
45 13843.458984375
46 12652.052734375
47 11571.60546875
48 10592.5322265625
49 9703.494140625
50 8894.634765625
51 8158.60546875
52 7488.68115234375
53 6877.5263671875
54 6319.71240234375
55 5810.2685546875
56 5344.748046875
57 4919.052734375
58 4529.76416015625
59 4173.2646484375
60 3846.3330078125
6

365 0.0001878506736829877
366 0.00018308285507373512
367 0.00017855956684798002
368 0.0001748212380334735
369 0.00017010618466883898
370 0.00016587047139182687
371 0.00016235324437730014
372 0.0001590313040651381
373 0.0001551359164295718
374 0.00015127583174034953
375 0.00014766555977985263
376 0.0001444069785065949
377 0.00014071290206629783
378 0.00013767943892162293
379 0.00013477879110723734
380 0.0001322646567132324
381 0.00012937886640429497
382 0.00012646598042920232
383 0.0001234419905813411
384 0.00012085858907084912
385 0.00011868035653606057
386 0.00011608334898483008
387 0.00011425316915847361
388 0.00011180944420630112
389 0.00010900234337896109
390 0.00010676450619939715
391 0.00010481813660589978
392 0.00010279849084326997
393 0.0001007954851957038
394 9.859858255367726e-05
395 9.672382293501869e-05
396 9.456236875848845e-05
397 9.289037552662194e-05
398 9.047762432601303e-05
399 8.896305371308699e-05
400 8.69113500812091e-05
401 8.499843534082174e-05
402 8.416582568315

In [15]:
# Get the weights of the first layer of the model
model[0].weight

Parameter containing:
tensor([[-0.4157,  0.5799,  0.8094,  ...,  0.7169, -0.1234,  0.0714],
        [ 0.3616,  0.3003, -0.7448,  ...,  1.7456,  0.4496,  1.4171],
        [ 0.5597,  0.5766, -0.1384,  ..., -0.7784,  0.2010, -0.4795],
        ...,
        [ 0.7863,  1.4226,  0.0921,  ..., -0.7318,  0.4703, -1.4407],
        [-1.2247,  0.7996,  0.3422,  ...,  1.0790, -0.7005, -1.3105],
        [ 0.6267, -0.2989, -1.8842,  ..., -0.2625,  0.9338, -1.2195]],
       requires_grad=True)
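
Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the first layer's weight is 100 x 1000, the transpose of the `w1` used in the earlier sections. A short sketch that lists the parameters of an equivalent freshly built model (the `shapes` and `n_params` names are illustrative):

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

# Parameter names in a Sequential are "<layer index>.<parameter name>".
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
n_params = sum(p.numel() for p in model.parameters())

print(shapes)
print(n_params)
```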

### PyTorch: optim

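This time we no longer update the model weights by hand; instead we use the optim package to update the parameters.
The optim package provides a variety of optimization algorithms, including SGD with momentum, RMSProp, and Adam.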

In [16]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T (no bias term, since bias=False)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
# Define the optimizer; it must be given the model's parameters
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()     # zero the gradients before calling backward()
    
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()

0 25701622.0
1 19079542.0
2 16068579.0
3 14051148.0
4 12072919.0
5 9941292.0
6 7767226.0
7 5804557.5
8 4201435.0
9 2997906.0
10 2139265.0
11 1546468.125
12 1141855.875
13 865508.25
14 674153.5
15 538986.375
16 440507.375
17 366907.28125
18 310321.09375
19 265740.375
20 229801.53125
21 200220.109375
22 175543.28125
23 154712.125
24 136921.203125
25 121619.90625
26 108366.5625
27 96822.8984375
28 86730.4609375
29 77876.015625
30 70082.109375
31 63193.2265625
32 57075.359375
33 51639.19921875
34 46800.8203125
35 42481.48046875
36 38616.953125
37 35158.9609375
38 32057.267578125
39 29273.125
40 26763.650390625
41 24498.216796875
42 22449.0390625
43 20593.20703125
44 18911.91796875
45 17385.615234375
46 15998.322265625
47 14736.3427734375
48 13585.8681640625
49 12536.798828125
50 11579.095703125
51 10702.66796875
52 9900.484375
53 9166.28125
54 8492.826171875
55 7874.24658203125
56 7305.67138671875
57 6782.3173828125
58 6300.42822265625
59 5856.1904296875
60 5446.380859375
61 5068.076171875

372 0.0008906740695238113
373 0.000860811211168766
374 0.0008342242217622697
375 0.000808666693046689
376 0.0007823614869266748
377 0.0007579271332360804
378 0.0007335823611356318
379 0.0007114170002751052
380 0.0006893493700772524
381 0.0006672799936495721
382 0.0006469915388152003
383 0.0006274401093833148
384 0.0006093415431678295
385 0.0005897798691876233
386 0.0005732934223487973
387 0.0005575974937528372
388 0.0005410705343820155
389 0.0005251782131381333
390 0.0005095519009046257
391 0.0004958304343745112
392 0.0004805728094652295
393 0.00046700192615389824
394 0.00045495288213714957
395 0.0004424403887242079
396 0.0004310978692956269
397 0.00041862763464450836
398 0.00040753581561148167
399 0.00039663855568505824
400 0.0003858022391796112
401 0.0003751878102775663
402 0.0003657260676845908
403 0.00035614476655609906
404 0.0003474010736681521
405 0.0003383298171684146
406 0.0003288863517809659
407 0.00032129453029483557
408 0.00031325648888014257
409 0.0003063096955884248
410 0.
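
For plain SGD without momentum, `optimizer.step()` performs the same update that was written by hand in the previous sections: each parameter is moved by `-lr * grad`. A minimal sketch on a made-up two-element parameter `w`:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()                    # the gradient of this loss is 2 * w
optimizer.zero_grad()
loss.backward()

expected = w.detach() - 0.1 * w.grad     # the manual update rule, computed before stepping
optimizer.step()                         # the optimizer applies the same rule in place

print(w.detach(), expected)
```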

### PyTorch: Custom nn Modules

We can define a model as a class that inherits from nn.Module. This is needed whenever the model is more complex than what a Sequential model can express.

In [17]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# A custom two-layer model
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))  # compute the prediction: linear ==> ReLU ==> linear
        return y_pred

# Instantiate the model
model = TwoLayerNet(D_in, H, D_out)

# Define the loss function
loss_fn = nn.MSELoss(reduction='sum')

# Define the optimizer
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Start training
for it in range(500):
    # Forward pass
    y_pred = model(x)          # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y)  # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()

0 612.8851928710938
1 596.4942626953125
2 580.6094360351562
3 565.1251220703125
4 550.0615234375
5 535.4725341796875
6 521.3463745117188
7 507.6250915527344
8 494.3796081542969
9 481.5726318359375
10 469.1031188964844
11 457.09967041015625
12 445.44354248046875
13 434.09368896484375
14 423.11212158203125
15 412.403076171875
16 401.92144775390625
17 391.70135498046875
18 381.7684326171875
19 372.1032409667969
20 362.67822265625
21 353.500244140625
22 344.55584716796875
23 335.8111572265625
24 327.3143310546875
25 319.0458984375
26 311.01580810546875
27 303.20648193359375
28 295.5889587402344
29 288.1719665527344
30 280.933349609375
31 273.88067626953125
32 267.0078125
33 260.31976318359375
34 253.7830047607422
35 247.40472412109375
36 241.19911193847656
37 235.16195678710938
38 229.24734497070312
39 223.470703125
40 217.82260131835938
41 212.28904724121094
42 206.87698364257812
43 201.5718536376953
44 196.37486267089844
45 191.2836151123047
46 186.3026580810547
47 181.42311096191406
48 

351 8.305721712531522e-05
352 7.918832125142217e-05
353 7.550574810011312e-05
354 7.198997627710924e-05
355 6.864087481517345e-05
356 6.54560572002083e-05
357 6.24128442723304e-05
358 5.951173807261512e-05
359 5.6746710470179096e-05
360 5.4109827033244073e-05
361 5.159877036930993e-05
362 4.919943239656277e-05
363 4.6911678509786725e-05
364 4.473177978070453e-05
365 4.26536425948143e-05
366 4.066774999955669e-05
367 3.8775269786128774e-05
368 3.6969780921936035e-05
369 3.52479110006243e-05
370 3.360582559253089e-05
371 3.2037216442404315e-05
372 3.05416360788513e-05
373 2.9115142751834355e-05
374 2.775425855361391e-05
375 2.64569534920156e-05
376 2.521764690754935e-05
377 2.4034838133957237e-05
378 2.2906850063009188e-05
379 2.183021388191264e-05
380 2.080512058455497e-05
381 1.9824223272735253e-05
382 1.889078521344345e-05
383 1.7998743714997545e-05
384 1.7148997358162887e-05
385 1.633744250284508e-05
386 1.5563589840894565e-05
387 1.4824500794929918e-05
388 1.4120972991804592e-05
389

## FizzBuzz

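FizzBuzz is a simple game with the following rules: count upwards from 1; say "fizz" for a multiple of 3, "buzz" for a multiple of 5, and "fizzbuzz" for a multiple of 15; otherwise say the number itself.

We can write a small program that decides whether to return the number, fizz, buzz, or fizzbuzz.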

In [38]:
# Encode the desired output as a class index: 0 = the number itself, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
def fizz_buzz_encode(i):
    if   i % 15 == 0: return 3
    elif i % 5  == 0: return 2
    elif i % 3  == 0: return 1
    else:             return 0
    
def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

print(fizz_buzz_decode(1, fizz_buzz_encode(1)))
print(fizz_buzz_decode(2, fizz_buzz_encode(2)))
print(fizz_buzz_decode(5, fizz_buzz_encode(5)))
print(fizz_buzz_decode(12, fizz_buzz_encode(12)))
print(fizz_buzz_decode(15, fizz_buzz_encode(15)))

1
2
buzz
fizz
fizzbuzz


First we define the model's inputs and outputs (the training data).

In [46]:
import numpy as np
import torch

NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])  # binary digits of i, least significant bit first, num_digits long

# Define the training data: inputs have shape 923*10, labels are a vector of length 923
trX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = torch.LongTensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])
print(trX.shape, trY.shape)

torch.Size([923, 10]) torch.Size([923])
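
To get a feel for the training data, the sketch below prints the encoding of a single number and counts how many training examples fall into each class. It is an illustration only: `binary_encode_list` is a hypothetical plain-list variant of `binary_encode`, and `fizz_buzz_encode` is repeated so that the block runs on its own.

```python
from collections import Counter

NUM_DIGITS = 10

def binary_encode_list(i, num_digits):
    # Plain-list variant of binary_encode: least significant bit first.
    return [i >> d & 1 for d in range(num_digits)]

def fizz_buzz_encode(i):
    if i % 15 == 0:
        return 3
    elif i % 5 == 0:
        return 2
    elif i % 3 == 0:
        return 1
    else:
        return 0

bits = binary_encode_list(5, NUM_DIGITS)
print(bits)  # [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Number of training examples per class for i in 101..1023.
label_counts = Counter(fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS))
print(sorted(label_counts.items()))  # [(0, 493), (1, 246), (2, 122), (3, 62)]
```

The classes are imbalanced: always predicting class 0 would already be correct on 493 of the 923 training examples (about 53%), which is a useful baseline when judging the trained model.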


Next we define the model in PyTorch.

In [47]:
# Define the model.
NUM_HIDDEN = 100
model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN, 4)    # 4-class classification problem
)
# if torch.cuda.is_available():
#     model = model.cuda()

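- To teach the model FizzBuzz, we need to define a loss function and an optimization algorithm.
- The optimizer repeatedly updates the model parameters to reduce the loss, so that the model reaches as low a loss as possible on this task.
- A low loss usually means the model performs well; a high loss means it performs poorly.
- Since FizzBuzz is essentially a classification problem, we use the Cross Entropy Loss.
- For the optimizer we use Stochastic Gradient Descent.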
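
As a reminder of what the cross entropy loss measures, `torch.nn.CrossEntropyLoss` takes raw scores (logits) and integer class indices, and is equivalent to the negative log-softmax of the correct class. The sketch below checks this on made-up logits for a single example:

```python
import math
import torch
import torch.nn.functional as F

loss_fn = torch.nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.5, 0.1, -1.0]])  # raw scores for one input, 4 classes
target = torch.tensor([0])                      # index of the correct class

loss = loss_fn(logits, target)
manual = -F.log_softmax(logits, dim=1)[0, target[0]]   # same quantity, computed by hand

# With identical scores for all 4 classes the loss equals ln(4).
uniform = loss_fn(torch.zeros(1, 4), target)

print(loss.item(), manual.item(), uniform.item(), math.log(4))
```

A loss close to ln 4 (about 1.386) therefore corresponds to a model that is guessing uniformly among the four classes.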

In [48]:
# Define the loss function and the optimizer for the classification problem
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.05)

The training code for the model follows.

In [49]:
# Start training it
BATCH_SIZE = 128
for epoch in range(10000):                          # train for 10000 epochs
    for start in range(0, len(trX), BATCH_SIZE):    # iterate over mini-batches within each epoch
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]
        
        # move the batch to CUDA
#         if torch.cuda.is_available():
#             batchX = batchX.cuda()
#             batchY = batchY.cuda()
        
        # forward pass
        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)

        optimizer.zero_grad()
        # backward pass
        loss.backward()
        # gradient descent
        optimizer.step()

    # Find loss on training data
    loss = loss_fn(model(trX), trY).item()
    print('Epoch:', epoch, 'Loss:', loss)

Epoch: 0 Loss: 1.1671295166015625
Epoch: 1 Loss: 1.1505825519561768
Epoch: 2 Loss: 1.1473468542099
Epoch: 3 Loss: 1.1460673809051514
Epoch: 4 Loss: 1.1453193426132202
Epoch: 5 Loss: 1.1447807550430298
Epoch: 6 Loss: 1.1443426609039307
Epoch: 7 Loss: 1.1439651250839233
Epoch: 8 Loss: 1.143627405166626
Epoch: 9 Loss: 1.1433199644088745
Epoch: 10 Loss: 1.1430312395095825
Epoch: 11 Loss: 1.1427569389343262
Epoch: 12 Loss: 1.1424994468688965
Epoch: 13 Loss: 1.14225435256958
Epoch: 14 Loss: 1.1420193910598755
Epoch: 15 Loss: 1.1417971849441528
Epoch: 16 Loss: 1.1415824890136719
Epoch: 17 Loss: 1.1413750648498535
Epoch: 18 Loss: 1.1411782503128052
Epoch: 19 Loss: 1.140988826751709
Epoch: 20 Loss: 1.1408054828643799
Epoch: 21 Loss: 1.1406307220458984
Epoch: 22 Loss: 1.14046049118042
Epoch: 23 Loss: 1.140296220779419
Epoch: 24 Loss: 1.1401362419128418
Epoch: 25 Loss: 1.1399821043014526
Epoch: 26 Loss: 1.1398342847824097
Epoch: 27 Loss: 1.1396851539611816
Epoch: 28 Loss: 1.1395463943481445
Epoch

Epoch: 232 Loss: 1.1187427043914795
Epoch: 233 Loss: 1.118605613708496
Epoch: 234 Loss: 1.118482232093811
Epoch: 235 Loss: 1.118325114250183
Epoch: 236 Loss: 1.118159294128418
Epoch: 237 Loss: 1.1180609464645386
Epoch: 238 Loss: 1.1179159879684448
Epoch: 239 Loss: 1.1177430152893066
Epoch: 240 Loss: 1.117587685585022
Epoch: 241 Loss: 1.117477536201477
Epoch: 242 Loss: 1.117321252822876
Epoch: 243 Loss: 1.117167353630066
Epoch: 244 Loss: 1.1170377731323242
Epoch: 245 Loss: 1.1168949604034424
Epoch: 246 Loss: 1.1167738437652588
Epoch: 247 Loss: 1.1166027784347534
Epoch: 248 Loss: 1.1164711713790894
Epoch: 249 Loss: 1.1163445711135864
Epoch: 250 Loss: 1.1161770820617676
Epoch: 251 Loss: 1.1160564422607422
Epoch: 252 Loss: 1.1158812046051025
Epoch: 253 Loss: 1.1157456636428833
Epoch: 254 Loss: 1.1156063079833984
Epoch: 255 Loss: 1.1154855489730835
Epoch: 256 Loss: 1.115277886390686
Epoch: 257 Loss: 1.115141749382019
Epoch: 258 Loss: 1.1150747537612915
Epoch: 259 Loss: 1.1148408651351929
Ep

Epoch: 461 Loss: 1.0576612949371338
Epoch: 462 Loss: 1.05746591091156
Epoch: 463 Loss: 1.0572350025177002
Epoch: 464 Loss: 1.0565558671951294
Epoch: 465 Loss: 1.056056261062622
Epoch: 466 Loss: 1.055806040763855
Epoch: 467 Loss: 1.0552853345870972
Epoch: 468 Loss: 1.0549802780151367
Epoch: 469 Loss: 1.0545657873153687
Epoch: 470 Loss: 1.0539137125015259
Epoch: 471 Loss: 1.0535575151443481
Epoch: 472 Loss: 1.053863286972046
Epoch: 473 Loss: 1.0530118942260742
Epoch: 474 Loss: 1.0524027347564697
Epoch: 475 Loss: 1.05244779586792
Epoch: 476 Loss: 1.0515512228012085
Epoch: 477 Loss: 1.0511388778686523
Epoch: 478 Loss: 1.0507194995880127
Epoch: 479 Loss: 1.050369381904602
Epoch: 480 Loss: 1.0503205060958862
Epoch: 481 Loss: 1.0492684841156006
Epoch: 482 Loss: 1.0495191812515259
Epoch: 483 Loss: 1.0488884449005127
Epoch: 484 Loss: 1.0481494665145874
Epoch: 485 Loss: 1.048030138015747
Epoch: 486 Loss: 1.0474677085876465
Epoch: 487 Loss: 1.047338604927063
Epoch: 488 Loss: 1.0470020771026611
Ep

Epoch: 689 Loss: 0.9402035474777222
Epoch: 690 Loss: 0.9396965503692627
Epoch: 691 Loss: 0.9394804239273071
Epoch: 692 Loss: 0.9383803606033325
Epoch: 693 Loss: 0.9381263256072998
Epoch: 694 Loss: 0.9378105401992798
Epoch: 695 Loss: 0.937196671962738
Epoch: 696 Loss: 0.9359325170516968
Epoch: 697 Loss: 0.9357682466506958
Epoch: 698 Loss: 0.9363159537315369
Epoch: 699 Loss: 0.9349292516708374
Epoch: 700 Loss: 0.9337396025657654
Epoch: 701 Loss: 0.9325191378593445
Epoch: 702 Loss: 0.9320878386497498
Epoch: 703 Loss: 0.9331204295158386
Epoch: 704 Loss: 0.9317018985748291
Epoch: 705 Loss: 0.9308800101280212
Epoch: 706 Loss: 0.9301913380622864
Epoch: 707 Loss: 0.9293592572212219
Epoch: 708 Loss: 0.9280061721801758
Epoch: 709 Loss: 0.9275913238525391
Epoch: 710 Loss: 0.9270132184028625
Epoch: 711 Loss: 0.9266520142555237
Epoch: 712 Loss: 0.9265201091766357
Epoch: 713 Loss: 0.9251084327697754
Epoch: 714 Loss: 0.9243316650390625
Epoch: 715 Loss: 0.9247979521751404
Epoch: 716 Loss: 0.9232249855

Epoch: 917 Loss: 0.7658336758613586
Epoch: 918 Loss: 0.7644246816635132
Epoch: 919 Loss: 0.7642066478729248
Epoch: 920 Loss: 0.7636033296585083
Epoch: 921 Loss: 0.7617868185043335
Epoch: 922 Loss: 0.7605236172676086
Epoch: 923 Loss: 0.7596681118011475
Epoch: 924 Loss: 0.7591968178749084
Epoch: 925 Loss: 0.7579259276390076
Epoch: 926 Loss: 0.7570577263832092
Epoch: 927 Loss: 0.755851686000824
Epoch: 928 Loss: 0.7550821900367737
Epoch: 929 Loss: 0.7555748820304871
Epoch: 930 Loss: 0.7537617683410645
Epoch: 931 Loss: 0.7533740997314453
Epoch: 932 Loss: 0.7519095540046692
Epoch: 933 Loss: 0.7503618597984314
Epoch: 934 Loss: 0.7508559226989746
Epoch: 935 Loss: 0.7491081357002258
Epoch: 936 Loss: 0.7479967474937439
Epoch: 937 Loss: 0.7467458844184875
Epoch: 938 Loss: 0.7469519376754761
Epoch: 939 Loss: 0.7451224327087402
Epoch: 940 Loss: 0.7452773451805115
Epoch: 941 Loss: 0.7438775897026062
Epoch: 942 Loss: 0.7440063953399658
Epoch: 943 Loss: 0.7416478991508484
Epoch: 944 Loss: 0.7409866452

Epoch: 1141 Loss: 0.5578786134719849
Epoch: 1142 Loss: 0.5572047829627991
Epoch: 1143 Loss: 0.5561752319335938
Epoch: 1144 Loss: 0.554753303527832
Epoch: 1145 Loss: 0.5540217757225037
Epoch: 1146 Loss: 0.5529293417930603
Epoch: 1147 Loss: 0.5527163147926331
Epoch: 1148 Loss: 0.5513776540756226
Epoch: 1149 Loss: 0.5501625537872314
Epoch: 1150 Loss: 0.5498420000076294
Epoch: 1151 Loss: 0.5484408736228943
Epoch: 1152 Loss: 0.5476326942443848
Epoch: 1153 Loss: 0.5470748543739319
Epoch: 1154 Loss: 0.5459793210029602
Epoch: 1155 Loss: 0.5454975366592407
Epoch: 1156 Loss: 0.5441316366195679
Epoch: 1157 Loss: 0.5430188775062561
Epoch: 1158 Loss: 0.5423689484596252
Epoch: 1159 Loss: 0.54164057970047
Epoch: 1160 Loss: 0.5406807661056519
Epoch: 1161 Loss: 0.5396685600280762
Epoch: 1162 Loss: 0.5391404032707214
Epoch: 1163 Loss: 0.538253128528595
Epoch: 1164 Loss: 0.5368703007698059
Epoch: 1165 Loss: 0.5366700291633606
Epoch: 1166 Loss: 0.5359495878219604
Epoch: 1167 Loss: 0.5345215797424316

Epoch: 1361 Loss: 0.3813348710536957
Epoch: 1362 Loss: 0.3804241716861725
Epoch: 1363 Loss: 0.38091710209846497
Epoch: 1364 Loss: 0.3790685534477234
Epoch: 1365 Loss: 0.37828388810157776
Epoch: 1366 Loss: 0.37794965505599976
Epoch: 1367 Loss: 0.37711581587791443
Epoch: 1368 Loss: 0.376871794462204
Epoch: 1369 Loss: 0.3755810260772705
Epoch: 1370 Loss: 0.3763425052165985
Epoch: 1371 Loss: 0.375156044960022
Epoch: 1372 Loss: 0.374043345451355
Epoch: 1373 Loss: 0.37314683198928833
Epoch: 1374 Loss: 0.3726581931114197
Epoch: 1375 Loss: 0.37171319127082825
Epoch: 1376 Loss: 0.3712460994720459
Epoch: 1377 Loss: 0.37088415026664734
Epoch: 1378 Loss: 0.3695302903652191
Epoch: 1379 Loss: 0.3689679801464081
Epoch: 1380 Loss: 0.368332177400589
Epoch: 1381 Loss: 0.367782324552536
Epoch: 1382 Loss: 0.3667169511318207
Epoch: 1383 Loss: 0.3672838509082794
Epoch: 1384 Loss: 0.36589735746383667
Epoch: 1385 Loss: 0.3650367558002472
Epoch: 1386 Loss: 0.3644337058067322
Epoch: 1387 Loss: 0.363414824008941

Epoch: 1580 Loss: 0.25483015179634094
Epoch: 1581 Loss: 0.25377357006073
Epoch: 1582 Loss: 0.2532642185688019
Epoch: 1583 Loss: 0.25255560874938965
Epoch: 1584 Loss: 0.25359776616096497
Epoch: 1585 Loss: 0.25196704268455505
Epoch: 1586 Loss: 0.251701682806015
Epoch: 1587 Loss: 0.2513778507709503
Epoch: 1588 Loss: 0.2506248950958252
Epoch: 1589 Loss: 0.2500211000442505
Epoch: 1590 Loss: 0.24985864758491516
Epoch: 1591 Loss: 0.24937964975833893
Epoch: 1592 Loss: 0.24945852160453796
Epoch: 1593 Loss: 0.24892711639404297
Epoch: 1594 Loss: 0.24874241650104523
Epoch: 1595 Loss: 0.24771364033222198
Epoch: 1596 Loss: 0.2481001615524292
Epoch: 1597 Loss: 0.2468033730983734
Epoch: 1598 Loss: 0.24642738699913025
Epoch: 1599 Loss: 0.2458091676235199
Epoch: 1600 Loss: 0.245590940117836
Epoch: 1601 Loss: 0.24502046406269073
Epoch: 1602 Loss: 0.24440829455852509
Epoch: 1603 Loss: 0.2440701723098755
Epoch: 1604 Loss: 0.24364666640758514
Epoch: 1605 Loss: 0.24319373071193695
Epoch: 1606 Loss: 0.2425256

Epoch: 1797 Loss: 0.1782175749540329
Epoch: 1798 Loss: 0.17774489521980286
Epoch: 1799 Loss: 0.17766305804252625
Epoch: 1800 Loss: 0.1773190200328827
Epoch: 1801 Loss: 0.17686007916927338
Epoch: 1802 Loss: 0.17690642178058624
Epoch: 1803 Loss: 0.17646771669387817
Epoch: 1804 Loss: 0.17642800509929657
Epoch: 1805 Loss: 0.17585347592830658
Epoch: 1806 Loss: 0.17583943903446198
Epoch: 1807 Loss: 0.17536661028862
Epoch: 1808 Loss: 0.17530374228954315
Epoch: 1809 Loss: 0.17478276789188385
Epoch: 1810 Loss: 0.17495375871658325
Epoch: 1811 Loss: 0.17449894547462463
Epoch: 1812 Loss: 0.1740904003381729
Epoch: 1813 Loss: 0.17379769682884216
Epoch: 1814 Loss: 0.17366880178451538
Epoch: 1815 Loss: 0.17384852468967438
Epoch: 1816 Loss: 0.17306627333164215
Epoch: 1817 Loss: 0.17292238771915436
Epoch: 1818 Loss: 0.17275463044643402
Epoch: 1819 Loss: 0.17227086424827576
Epoch: 1820 Loss: 0.17200851440429688
Epoch: 1821 Loss: 0.17191989719867706
Epoch: 1822 Loss: 0.17207300662994385

Epoch: 2013 Loss: 0.134088397026062
Epoch: 2014 Loss: 0.13401709496974945
Epoch: 2015 Loss: 0.1337732970714569
Epoch: 2016 Loss: 0.13364332914352417
Epoch: 2017 Loss: 0.13344885408878326
Epoch: 2018 Loss: 0.13342343270778656
Epoch: 2019 Loss: 0.13308849930763245
Epoch: 2020 Loss: 0.1330573856830597
Epoch: 2021 Loss: 0.13291144371032715
Epoch: 2022 Loss: 0.13264153897762299
Epoch: 2023 Loss: 0.1325129121541977
Epoch: 2024 Loss: 0.13241155445575714
Epoch: 2025 Loss: 0.13228698074817657
Epoch: 2026 Loss: 0.13211336731910706
Epoch: 2027 Loss: 0.1320229321718216
Epoch: 2028 Loss: 0.13181649148464203
Epoch: 2029 Loss: 0.13162218034267426
Epoch: 2030 Loss: 0.13148058950901031
Epoch: 2031 Loss: 0.13129925727844238
Epoch: 2032 Loss: 0.13116848468780518
Epoch: 2033 Loss: 0.1310192346572876
Epoch: 2034 Loss: 0.13089904189109802
Epoch: 2035 Loss: 0.13075819611549377
Epoch: 2036 Loss: 0.13060802221298218
Epoch: 2037 Loss: 0.1304006427526474
Epoch: 2038 Loss: 0.1302492767572403

Epoch: 2229 Loss: 0.10650036484003067
Epoch: 2230 Loss: 0.1063622385263443
Epoch: 2231 Loss: 0.1062159538269043
Epoch: 2232 Loss: 0.10617890954017639
Epoch: 2233 Loss: 0.10606302320957184
Epoch: 2234 Loss: 0.10593526065349579
Epoch: 2235 Loss: 0.1058071106672287
Epoch: 2236 Loss: 0.10571444779634476
Epoch: 2237 Loss: 0.10560999810695648
Epoch: 2238 Loss: 0.10556701570749283
Epoch: 2239 Loss: 0.10543933510780334
Epoch: 2240 Loss: 0.10537666827440262
Epoch: 2241 Loss: 0.10525594651699066
Epoch: 2242 Loss: 0.10512358695268631
Epoch: 2243 Loss: 0.105152927339077
Epoch: 2244 Loss: 0.10492606461048126
Epoch: 2245 Loss: 0.10481537133455276
Epoch: 2246 Loss: 0.10470451414585114
Epoch: 2247 Loss: 0.1045757308602333
Epoch: 2248 Loss: 0.10452643781900406
Epoch: 2249 Loss: 0.10438501089811325
Epoch: 2250 Loss: 0.10435175895690918
Epoch: 2251 Loss: 0.10421611368656158
Epoch: 2252 Loss: 0.10409204661846161
Epoch: 2253 Loss: 0.1039891466498375
Epoch: 2254 Loss: 0.10393382608890533

Epoch: 2445 Loss: 0.0876474604010582
Epoch: 2446 Loss: 0.08762016892433167
Epoch: 2447 Loss: 0.08747073262929916
Epoch: 2448 Loss: 0.08747440576553345
Epoch: 2449 Loss: 0.08735126256942749
Epoch: 2450 Loss: 0.08744354546070099
Epoch: 2451 Loss: 0.08726350963115692
Epoch: 2452 Loss: 0.08713896572589874
Epoch: 2453 Loss: 0.08712287992238998
Epoch: 2454 Loss: 0.08701343834400177
Epoch: 2455 Loss: 0.08688829094171524
Epoch: 2456 Loss: 0.08687110245227814
Epoch: 2457 Loss: 0.08676072210073471
Epoch: 2458 Loss: 0.08675374835729599
Epoch: 2459 Loss: 0.08679033815860748
Epoch: 2460 Loss: 0.08658914268016815
Epoch: 2461 Loss: 0.08649464696645737
Epoch: 2462 Loss: 0.0865168496966362
Epoch: 2463 Loss: 0.08637602627277374
Epoch: 2464 Loss: 0.08626803755760193
Epoch: 2465 Loss: 0.0862186849117279
Epoch: 2466 Loss: 0.08617985248565674
Epoch: 2467 Loss: 0.08611767739057541
Epoch: 2468 Loss: 0.08598871529102325
Epoch: 2469 Loss: 0.08607611805200577
Epoch: 2470 Loss: 0.08585193753242493

Epoch: 2661 Loss: 0.0740671157836914
Epoch: 2662 Loss: 0.07396484911441803
Epoch: 2663 Loss: 0.07387436926364899
Epoch: 2664 Loss: 0.07390188425779343
Epoch: 2665 Loss: 0.07376675307750702
Epoch: 2666 Loss: 0.0737120658159256
Epoch: 2667 Loss: 0.07372020184993744
Epoch: 2668 Loss: 0.07360817492008209
Epoch: 2669 Loss: 0.07362392544746399
Epoch: 2670 Loss: 0.07347884774208069
Epoch: 2671 Loss: 0.07349720597267151
Epoch: 2672 Loss: 0.07339026778936386
Epoch: 2673 Loss: 0.07331346720457077
Epoch: 2674 Loss: 0.07327642291784286
Epoch: 2675 Loss: 0.07323070615530014
Epoch: 2676 Loss: 0.07315930724143982
Epoch: 2677 Loss: 0.0731341689825058
Epoch: 2678 Loss: 0.07303822785615921
Epoch: 2679 Loss: 0.07300972938537598
Epoch: 2680 Loss: 0.07297687977552414
Epoch: 2681 Loss: 0.0728595182299614
Epoch: 2682 Loss: 0.07283385097980499
Epoch: 2683 Loss: 0.0727756917476654
Epoch: 2684 Loss: 0.07270215451717377
Epoch: 2685 Loss: 0.0727299153804779
Epoch: 2686 Loss: 0.0726659819483757

Epoch: 2877 Loss: 0.06360799819231033
Epoch: 2878 Loss: 0.06356651335954666
Epoch: 2879 Loss: 0.06348317861557007
Epoch: 2880 Loss: 0.06342194229364395
Epoch: 2881 Loss: 0.06345760822296143
Epoch: 2882 Loss: 0.0633319616317749
Epoch: 2883 Loss: 0.06330428272485733
Epoch: 2884 Loss: 0.0632735937833786
Epoch: 2885 Loss: 0.06320297718048096
Epoch: 2886 Loss: 0.06327451020479202
Epoch: 2887 Loss: 0.06317539513111115
Epoch: 2888 Loss: 0.06311025470495224
Epoch: 2889 Loss: 0.06308712065219879
Epoch: 2890 Loss: 0.06301146745681763
Epoch: 2891 Loss: 0.06299193203449249
Epoch: 2892 Loss: 0.06292460858821869
Epoch: 2893 Loss: 0.06289585679769516
Epoch: 2894 Loss: 0.06285305321216583
Epoch: 2895 Loss: 0.06281646341085434
Epoch: 2896 Loss: 0.0628601610660553
Epoch: 2897 Loss: 0.06273584067821503
Epoch: 2898 Loss: 0.06269146502017975
Epoch: 2899 Loss: 0.06267240643501282
Epoch: 2900 Loss: 0.0626172199845314
Epoch: 2901 Loss: 0.06256529688835144
Epoch: 2902 Loss: 0.06254670023918152

Epoch: 3091 Loss: 0.05529477074742317
Epoch: 3092 Loss: 0.05525246635079384
Epoch: 3093 Loss: 0.05526745691895485
Epoch: 3094 Loss: 0.05518920347094536
Epoch: 3095 Loss: 0.055153556168079376
Epoch: 3096 Loss: 0.05513948202133179
Epoch: 3097 Loss: 0.055093973875045776
Epoch: 3098 Loss: 0.055090487003326416
Epoch: 3099 Loss: 0.055023182183504105
Epoch: 3100 Loss: 0.05499022454023361
Epoch: 3101 Loss: 0.054966818541288376
Epoch: 3102 Loss: 0.05492798238992691
Epoch: 3103 Loss: 0.05487947165966034
Epoch: 3104 Loss: 0.054848767817020416
Epoch: 3105 Loss: 0.054788876324892044
Epoch: 3106 Loss: 0.05477519333362579
Epoch: 3107 Loss: 0.05473906919360161
Epoch: 3108 Loss: 0.05473509058356285
Epoch: 3109 Loss: 0.05466680973768234
Epoch: 3110 Loss: 0.05461501330137253
Epoch: 3111 Loss: 0.0546061210334301
Epoch: 3112 Loss: 0.05457378923892975
Epoch: 3113 Loss: 0.054580289870500565
Epoch: 3114 Loss: 0.054497528821229935
Epoch: 3115 Loss: 0.05446775257587433
Epoch: 3116 Loss: 0.05442870035767555

Epoch: 3305 Loss: 0.04858308658003807
Epoch: 3306 Loss: 0.0485985092818737
Epoch: 3307 Loss: 0.048588771373033524
Epoch: 3308 Loss: 0.04849693179130554
Epoch: 3309 Loss: 0.04848721995949745
Epoch: 3310 Loss: 0.04842355102300644
Epoch: 3311 Loss: 0.04841632395982742
Epoch: 3312 Loss: 0.048388198018074036
Epoch: 3313 Loss: 0.04837508127093315
Epoch: 3314 Loss: 0.048339053988456726
Epoch: 3315 Loss: 0.048307813704013824
Epoch: 3316 Loss: 0.04828929901123047
Epoch: 3317 Loss: 0.04831169918179512
Epoch: 3318 Loss: 0.04822473227977753
Epoch: 3319 Loss: 0.0482025109231472
Epoch: 3320 Loss: 0.048156023025512695
Epoch: 3321 Loss: 0.04814111068844795
Epoch: 3322 Loss: 0.04811404272913933
Epoch: 3323 Loss: 0.04808041453361511
Epoch: 3324 Loss: 0.0480370819568634
Epoch: 3325 Loss: 0.04804364591836929
Epoch: 3326 Loss: 0.048013053834438324
Epoch: 3327 Loss: 0.0479828380048275
Epoch: 3328 Loss: 0.04795282334089279
Epoch: 3329 Loss: 0.04792524874210358
Epoch: 3330 Loss: 0.047881271690130234

Epoch: 3519 Loss: 0.04314957931637764
Epoch: 3520 Loss: 0.04313018172979355
Epoch: 3521 Loss: 0.043101776391267776
Epoch: 3522 Loss: 0.04309447482228279
Epoch: 3523 Loss: 0.04307859390974045
Epoch: 3524 Loss: 0.043071117252111435
Epoch: 3525 Loss: 0.04301359876990318
Epoch: 3526 Loss: 0.042984578758478165
Epoch: 3527 Loss: 0.04297308251261711
Epoch: 3528 Loss: 0.0429404079914093
Epoch: 3529 Loss: 0.04292804002761841
Epoch: 3530 Loss: 0.042902871966362
Epoch: 3531 Loss: 0.042886875569820404
Epoch: 3532 Loss: 0.042831871658563614
Epoch: 3533 Loss: 0.042847126722335815
Epoch: 3534 Loss: 0.04279229789972305
Epoch: 3535 Loss: 0.04278385266661644
Epoch: 3536 Loss: 0.04275187849998474
Epoch: 3537 Loss: 0.04277915880084038
Epoch: 3538 Loss: 0.04273777827620506
Epoch: 3539 Loss: 0.042712800204753876
Epoch: 3540 Loss: 0.04267468303442001
Epoch: 3541 Loss: 0.042635951191186905
Epoch: 3542 Loss: 0.042645134031772614
Epoch: 3543 Loss: 0.042598433792591095
Epoch: 3544 Loss: 0.042578183114528656

Epoch: 3733 Loss: 0.03863080218434334
Epoch: 3734 Loss: 0.03865200653672218
Epoch: 3735 Loss: 0.0386202335357666
Epoch: 3736 Loss: 0.038585685193538666
Epoch: 3737 Loss: 0.038553424179553986
Epoch: 3738 Loss: 0.038538843393325806
Epoch: 3739 Loss: 0.03853343799710274
Epoch: 3740 Loss: 0.03851146250963211
Epoch: 3741 Loss: 0.03848296031355858
Epoch: 3742 Loss: 0.038466718047857285
Epoch: 3743 Loss: 0.03843557834625244
Epoch: 3744 Loss: 0.03842293843626976
Epoch: 3745 Loss: 0.038407906889915466
Epoch: 3746 Loss: 0.03839730843901634
Epoch: 3747 Loss: 0.0383693091571331
Epoch: 3748 Loss: 0.038352251052856445
Epoch: 3749 Loss: 0.03832171857357025
Epoch: 3750 Loss: 0.03831395506858826
Epoch: 3751 Loss: 0.03828231245279312
Epoch: 3752 Loss: 0.03828957676887512
Epoch: 3753 Loss: 0.038245826959609985
Epoch: 3754 Loss: 0.03824087232351303
Epoch: 3755 Loss: 0.03820888325572014
Epoch: 3756 Loss: 0.03820405900478363
Epoch: 3757 Loss: 0.038175296038389206
Epoch: 3758 Loss: 0.03818564862012863

Epoch: 3946 Loss: 0.03480196371674538
Epoch: 3947 Loss: 0.034811053425073624
Epoch: 3948 Loss: 0.03478027880191803
Epoch: 3949 Loss: 0.0347663052380085
Epoch: 3950 Loss: 0.034769583493471146
Epoch: 3951 Loss: 0.034731674939394
Epoch: 3952 Loss: 0.034740149974823
Epoch: 3953 Loss: 0.03470452129840851
Epoch: 3954 Loss: 0.034696463495492935
Epoch: 3955 Loss: 0.034682583063840866
Epoch: 3956 Loss: 0.034650176763534546
Epoch: 3957 Loss: 0.0346502885222435
Epoch: 3958 Loss: 0.03461654111742973
Epoch: 3959 Loss: 0.03460745885968208
Epoch: 3960 Loss: 0.03458546847105026
Epoch: 3961 Loss: 0.03457333520054817
Epoch: 3962 Loss: 0.03459347411990166
Epoch: 3963 Loss: 0.03455909341573715
Epoch: 3964 Loss: 0.03452179580926895
Epoch: 3965 Loss: 0.0345202274620533
Epoch: 3966 Loss: 0.03450445458292961
Epoch: 3967 Loss: 0.03448176383972168
Epoch: 3968 Loss: 0.034482911229133606
Epoch: 3969 Loss: 0.034443385899066925
Epoch: 3970 Loss: 0.03443125635385513
Epoch: 3971 Loss: 0.034398771822452545

Epoch: 4160 Loss: 0.031588662415742874
Epoch: 4161 Loss: 0.03158016875386238
Epoch: 4162 Loss: 0.03156815469264984
Epoch: 4163 Loss: 0.03155025467276573
Epoch: 4164 Loss: 0.031543150544166565
Epoch: 4165 Loss: 0.031523317098617554
Epoch: 4166 Loss: 0.0315127857029438
Epoch: 4167 Loss: 0.03148986026644707
Epoch: 4168 Loss: 0.031485363841056824
Epoch: 4169 Loss: 0.03146245703101158
Epoch: 4170 Loss: 0.03144361823797226
Epoch: 4171 Loss: 0.031446464359760284
Epoch: 4172 Loss: 0.03142792731523514
Epoch: 4173 Loss: 0.031410470604896545
Epoch: 4174 Loss: 0.03139812499284744
Epoch: 4175 Loss: 0.031393811106681824
Epoch: 4176 Loss: 0.03137332946062088
Epoch: 4177 Loss: 0.03135902062058449
Epoch: 4178 Loss: 0.03135116025805473
Epoch: 4179 Loss: 0.031320054084062576
Epoch: 4180 Loss: 0.03132839873433113
Epoch: 4181 Loss: 0.03132300823926926
Epoch: 4182 Loss: 0.03131265565752983
Epoch: 4183 Loss: 0.03127366676926613
Epoch: 4184 Loss: 0.03127559274435043
Epoch: 4185 Loss: 0.03124488890171051

Epoch: 4372 Loss: 0.028874104842543602
Epoch: 4373 Loss: 0.02885342389345169
Epoch: 4374 Loss: 0.02886348031461239
Epoch: 4375 Loss: 0.028829989954829216
Epoch: 4376 Loss: 0.02885289303958416
Epoch: 4377 Loss: 0.028825094923377037
Epoch: 4378 Loss: 0.028806529939174652
Epoch: 4379 Loss: 0.028808800503611565
Epoch: 4380 Loss: 0.02877996116876602
Epoch: 4381 Loss: 0.02877817675471306
Epoch: 4382 Loss: 0.028748977929353714
Epoch: 4383 Loss: 0.02876213751733303
Epoch: 4384 Loss: 0.028731053695082664
Epoch: 4385 Loss: 0.02874632366001606
Epoch: 4386 Loss: 0.02870936132967472
Epoch: 4387 Loss: 0.028700949624180794
Epoch: 4388 Loss: 0.028684353455901146
Epoch: 4389 Loss: 0.028681935742497444
Epoch: 4390 Loss: 0.028658859431743622
Epoch: 4391 Loss: 0.028644489124417305
Epoch: 4392 Loss: 0.02863495424389839
Epoch: 4393 Loss: 0.028632666915655136
Epoch: 4394 Loss: 0.028606900945305824
Epoch: 4395 Loss: 0.02859797701239586
Epoch: 4396 Loss: 0.028579341247677803
Epoch: 4397 Loss: 0.028568591922521

Epoch: 4584 Loss: 0.02650831826031208
Epoch: 4585 Loss: 0.026502549648284912
Epoch: 4586 Loss: 0.02648783102631569
Epoch: 4587 Loss: 0.026474803686141968
Epoch: 4588 Loss: 0.026474975049495697
Epoch: 4589 Loss: 0.026460379362106323
Epoch: 4590 Loss: 0.02644963562488556
Epoch: 4591 Loss: 0.026437778025865555
Epoch: 4592 Loss: 0.0264251958578825
Epoch: 4593 Loss: 0.026422036811709404
Epoch: 4594 Loss: 0.026402892544865608
Epoch: 4595 Loss: 0.026397066190838814
Epoch: 4596 Loss: 0.026393307372927666
Epoch: 4597 Loss: 0.02637767605483532
Epoch: 4598 Loss: 0.026388607919216156
Epoch: 4599 Loss: 0.026369476690888405
Epoch: 4600 Loss: 0.026348022744059563
Epoch: 4601 Loss: 0.026339272037148476
Epoch: 4602 Loss: 0.026331881061196327
Epoch: 4603 Loss: 0.026311229914426804
Epoch: 4604 Loss: 0.02630537562072277
Epoch: 4605 Loss: 0.02629939094185829
Epoch: 4606 Loss: 0.02627985179424286
Epoch: 4607 Loss: 0.026276525110006332
Epoch: 4608 Loss: 0.026269955560564995
Epoch: 4609 Loss: 0.02624501101672

Epoch: 4796 Loss: 0.024475684389472008
Epoch: 4797 Loss: 0.0244619008153677
Epoch: 4798 Loss: 0.024462934583425522
Epoch: 4799 Loss: 0.02443576604127884
Epoch: 4800 Loss: 0.024438951164484024
Epoch: 4801 Loss: 0.024418462067842484
Epoch: 4802 Loss: 0.02440616860985756
Epoch: 4803 Loss: 0.02440042607486248
Epoch: 4804 Loss: 0.02439405955374241
Epoch: 4805 Loss: 0.024389345198869705
Epoch: 4806 Loss: 0.02437085658311844
Epoch: 4807 Loss: 0.02437242679297924
Epoch: 4808 Loss: 0.0243550892919302
Epoch: 4809 Loss: 0.02434581331908703
Epoch: 4810 Loss: 0.024335499852895737
Epoch: 4811 Loss: 0.024330895394086838
Epoch: 4812 Loss: 0.024321621283888817
Epoch: 4813 Loss: 0.024315886199474335
Epoch: 4814 Loss: 0.02430216409265995
Epoch: 4815 Loss: 0.02429346740245819
Epoch: 4816 Loss: 0.02428961731493473
Epoch: 4817 Loss: 0.024273250252008438
Epoch: 4818 Loss: 0.024264460429549217
Epoch: 4819 Loss: 0.02426619827747345
Epoch: 4820 Loss: 0.02424491196870804
Epoch: 4821 Loss: 0.02424008399248123

Epoch: 5008 Loss: 0.022664718329906464
Epoch: 5009 Loss: 0.022655921056866646
Epoch: 5010 Loss: 0.022654902189970016
Epoch: 5011 Loss: 0.02263820916414261
Epoch: 5012 Loss: 0.022644851356744766
Epoch: 5013 Loss: 0.022620242089033127
Epoch: 5014 Loss: 0.022610072046518326
Epoch: 5015 Loss: 0.022608313709497452
Epoch: 5016 Loss: 0.022599443793296814
Epoch: 5017 Loss: 0.02258887328207493
Epoch: 5018 Loss: 0.02259071171283722
Epoch: 5019 Loss: 0.022572103887796402
Epoch: 5020 Loss: 0.022577723488211632
Epoch: 5021 Loss: 0.02255808562040329
Epoch: 5022 Loss: 0.02255302295088768
Epoch: 5023 Loss: 0.022560536861419678
Epoch: 5024 Loss: 0.022538594901561737
Epoch: 5025 Loss: 0.022539332509040833
Epoch: 5026 Loss: 0.022525422275066376
Epoch: 5027 Loss: 0.022525090724229813
Epoch: 5028 Loss: 0.022509397938847542
Epoch: 5029 Loss: 0.02250605635344982
Epoch: 5030 Loss: 0.022485975176095963
Epoch: 5031 Loss: 0.02248804271221161
Epoch: 5032 Loss: 0.022474220022559166
Epoch: 5033 Loss: 0.022475993260

Epoch: 5219 Loss: 0.02106330916285515
Epoch: 5220 Loss: 0.021058961749076843
Epoch: 5221 Loss: 0.021054577082395554
Epoch: 5222 Loss: 0.02103879675269127
Epoch: 5223 Loss: 0.0210349690169096
Epoch: 5224 Loss: 0.021032122895121574
Epoch: 5225 Loss: 0.02102215588092804
Epoch: 5226 Loss: 0.02101711370050907
Epoch: 5227 Loss: 0.021000072360038757
Epoch: 5228 Loss: 0.020994046702980995
Epoch: 5229 Loss: 0.02099994197487831
Epoch: 5230 Loss: 0.020983409136533737
Epoch: 5231 Loss: 0.020993608981370926
Epoch: 5232 Loss: 0.020975880324840546
Epoch: 5233 Loss: 0.020960833877325058
Epoch: 5234 Loss: 0.0209753829985857
Epoch: 5235 Loss: 0.020948203280568123
Epoch: 5236 Loss: 0.020940646529197693
Epoch: 5237 Loss: 0.020949769765138626
Epoch: 5238 Loss: 0.020928846672177315
Epoch: 5239 Loss: 0.020926978439092636
Epoch: 5240 Loss: 0.020917434245347977
Epoch: 5241 Loss: 0.020907577127218246
Epoch: 5242 Loss: 0.02090400457382202
Epoch: 5243 Loss: 0.020892206579446793
Epoch: 5244 Loss: 0.020883832126855

Epoch: 5431 Loss: 0.019630402326583862
Epoch: 5432 Loss: 0.01963251456618309
Epoch: 5433 Loss: 0.019622886553406715
Epoch: 5434 Loss: 0.019616374745965004
Epoch: 5435 Loss: 0.019615592435002327
Epoch: 5436 Loss: 0.019603539258241653
Epoch: 5437 Loss: 0.01960066705942154
Epoch: 5438 Loss: 0.019586961716413498
Epoch: 5439 Loss: 0.019593341276049614
Epoch: 5440 Loss: 0.019579291343688965
Epoch: 5441 Loss: 0.019574251025915146
Epoch: 5442 Loss: 0.019566316157579422
Epoch: 5443 Loss: 0.019557740539312363
Epoch: 5444 Loss: 0.01955810748040676
Epoch: 5445 Loss: 0.019545666873455048
Epoch: 5446 Loss: 0.019538583233952522
Epoch: 5447 Loss: 0.019535765051841736
Epoch: 5448 Loss: 0.019532736390829086
Epoch: 5449 Loss: 0.019522326067090034
Epoch: 5450 Loss: 0.0195145383477211
Epoch: 5451 Loss: 0.019507339224219322
Epoch: 5452 Loss: 0.01949773170053959
Epoch: 5453 Loss: 0.01949547789990902
Epoch: 5454 Loss: 0.019488176330924034
Epoch: 5455 Loss: 0.01948007382452488
Epoch: 5456 Loss: 0.0194825232028

Epoch: 5643 Loss: 0.018366504460573196
Epoch: 5644 Loss: 0.01836196519434452
Epoch: 5645 Loss: 0.01836235821247101
Epoch: 5646 Loss: 0.018352510407567024
Epoch: 5647 Loss: 0.018343111500144005
Epoch: 5648 Loss: 0.018339697271585464
Epoch: 5649 Loss: 0.01833920180797577
Epoch: 5650 Loss: 0.01832304708659649
Epoch: 5651 Loss: 0.018324173986911774
Epoch: 5652 Loss: 0.018320312723517418
Epoch: 5653 Loss: 0.018309645354747772
Epoch: 5654 Loss: 0.01830052211880684
Epoch: 5655 Loss: 0.018294520676136017
Epoch: 5656 Loss: 0.01829470694065094
Epoch: 5657 Loss: 0.018296385183930397
Epoch: 5658 Loss: 0.01827872544527054
Epoch: 5659 Loss: 0.018281789496541023
Epoch: 5660 Loss: 0.018271364271640778
Epoch: 5661 Loss: 0.01827244833111763
Epoch: 5662 Loss: 0.018257858231663704
Epoch: 5663 Loss: 0.018254686146974564
Epoch: 5664 Loss: 0.01824389211833477
Epoch: 5665 Loss: 0.018249809741973877
Epoch: 5666 Loss: 0.018236706033349037
Epoch: 5667 Loss: 0.018231716006994247
Epoch: 5668 Loss: 0.01822692900896

Epoch: 5854 Loss: 0.017235027626156807
Epoch: 5855 Loss: 0.017233042046427727
Epoch: 5856 Loss: 0.01722843572497368
Epoch: 5857 Loss: 0.017220064997673035
Epoch: 5858 Loss: 0.01721598580479622
Epoch: 5859 Loss: 0.017212482169270515
Epoch: 5860 Loss: 0.01720256917178631
Epoch: 5861 Loss: 0.017202502116560936
Epoch: 5862 Loss: 0.017199188470840454
Epoch: 5863 Loss: 0.017195727676153183
Epoch: 5864 Loss: 0.017191380262374878
Epoch: 5865 Loss: 0.017182299867272377
Epoch: 5866 Loss: 0.01717516779899597
Epoch: 5867 Loss: 0.017173875123262405
Epoch: 5868 Loss: 0.017168357968330383
Epoch: 5869 Loss: 0.017160281538963318
Epoch: 5870 Loss: 0.01715925894677639
Epoch: 5871 Loss: 0.01715238392353058
Epoch: 5872 Loss: 0.01714947074651718
Epoch: 5873 Loss: 0.017138788476586342
Epoch: 5874 Loss: 0.01713680848479271
Epoch: 5875 Loss: 0.017122777178883553
Epoch: 5876 Loss: 0.017125548794865608
Epoch: 5877 Loss: 0.017120348289608955
Epoch: 5878 Loss: 0.017120102420449257
Epoch: 5879 Loss: 0.0171088818460

Epoch: 6066 Loss: 0.016208970919251442
Epoch: 6067 Loss: 0.016211193054914474
Epoch: 6068 Loss: 0.01620197854936123
Epoch: 6069 Loss: 0.01620187610387802
Epoch: 6070 Loss: 0.016191786155104637
Epoch: 6071 Loss: 0.016184154897928238
Epoch: 6072 Loss: 0.016181642189621925
Epoch: 6073 Loss: 0.01618550345301628
Epoch: 6074 Loss: 0.016172295436263084
Epoch: 6075 Loss: 0.01617502048611641
Epoch: 6076 Loss: 0.0161634162068367
Epoch: 6077 Loss: 0.016164781525731087
Epoch: 6078 Loss: 0.01615702547132969
Epoch: 6079 Loss: 0.01615230180323124
Epoch: 6080 Loss: 0.016150273382663727
Epoch: 6081 Loss: 0.01614685356616974
Epoch: 6082 Loss: 0.016139307990670204
Epoch: 6083 Loss: 0.016134057193994522
Epoch: 6084 Loss: 0.016128214076161385
Epoch: 6085 Loss: 0.016126956790685654
Epoch: 6086 Loss: 0.016122017055749893
Epoch: 6087 Loss: 0.01611149124801159
Epoch: 6088 Loss: 0.016112226992845535
Epoch: 6089 Loss: 0.016112612560391426
Epoch: 6090 Loss: 0.016100358217954636
Epoch: 6091 Loss: 0.016102651134133

Epoch: 6277 Loss: 0.015293457545340061
Epoch: 6278 Loss: 0.015281787142157555
Epoch: 6279 Loss: 0.015279457904398441
Epoch: 6280 Loss: 0.015277737751603127
Epoch: 6281 Loss: 0.01527487114071846
Epoch: 6282 Loss: 0.015264777466654778
Epoch: 6283 Loss: 0.015266494825482368
Epoch: 6284 Loss: 0.015255718491971493
Epoch: 6285 Loss: 0.01525099016726017
Epoch: 6286 Loss: 0.015252144075930119
Epoch: 6287 Loss: 0.015254057012498379
Epoch: 6288 Loss: 0.01524109672755003
Epoch: 6289 Loss: 0.01523952092975378
Epoch: 6290 Loss: 0.015235040336847305
Epoch: 6291 Loss: 0.01522713154554367
Epoch: 6292 Loss: 0.015223617665469646
Epoch: 6293 Loss: 0.01522841677069664
Epoch: 6294 Loss: 0.015213387086987495
Epoch: 6295 Loss: 0.015218283049762249
Epoch: 6296 Loss: 0.015207178890705109
Epoch: 6297 Loss: 0.015199312940239906
Epoch: 6298 Loss: 0.015199176967144012
Epoch: 6299 Loss: 0.015200613997876644
Epoch: 6300 Loss: 0.015187595039606094
Epoch: 6301 Loss: 0.015187143348157406
Epoch: 6302 Loss: 0.01518675033

Epoch: 6488 Loss: 0.014441343024373055
Epoch: 6489 Loss: 0.014437396079301834
Epoch: 6490 Loss: 0.014440896920859814
Epoch: 6491 Loss: 0.014431515708565712
Epoch: 6492 Loss: 0.014426948502659798
Epoch: 6493 Loss: 0.014420179650187492
Epoch: 6494 Loss: 0.014427611604332924
Epoch: 6495 Loss: 0.014415138401091099
Epoch: 6496 Loss: 0.014410548843443394
Epoch: 6497 Loss: 0.014409594237804413
Epoch: 6498 Loss: 0.014405421912670135
Epoch: 6499 Loss: 0.014402884058654308
Epoch: 6500 Loss: 0.014394918456673622
Epoch: 6501 Loss: 0.014392444863915443
Epoch: 6502 Loss: 0.014389054849743843
Epoch: 6503 Loss: 0.01439059991389513
Epoch: 6504 Loss: 0.014384816400706768
Epoch: 6505 Loss: 0.014378857798874378
Epoch: 6506 Loss: 0.014374110847711563
Epoch: 6507 Loss: 0.014369262382388115
Epoch: 6508 Loss: 0.01436582487076521
Epoch: 6509 Loss: 0.014364433474838734
Epoch: 6510 Loss: 0.014360551722347736
Epoch: 6511 Loss: 0.014356265775859356
Epoch: 6512 Loss: 0.01435016468167305
Epoch: 6513 Loss: 0.01434578

Epoch: 6699 Loss: 0.013677502050995827
Epoch: 6700 Loss: 0.013670291751623154
Epoch: 6701 Loss: 0.013669594191014767
Epoch: 6702 Loss: 0.01366325281560421
Epoch: 6703 Loss: 0.013665442354977131
Epoch: 6704 Loss: 0.013658369891345501
Epoch: 6705 Loss: 0.013654530048370361
Epoch: 6706 Loss: 0.01365496776998043
Epoch: 6707 Loss: 0.013643614947795868
Epoch: 6708 Loss: 0.013641893863677979
Epoch: 6709 Loss: 0.013645777478814125
Epoch: 6710 Loss: 0.013638176023960114
Epoch: 6711 Loss: 0.013634759932756424
Epoch: 6712 Loss: 0.013629346154630184
Epoch: 6713 Loss: 0.013628237880766392
Epoch: 6714 Loss: 0.013624166138470173
Epoch: 6715 Loss: 0.013616535812616348
Epoch: 6716 Loss: 0.01361448410898447
Epoch: 6717 Loss: 0.013613563030958176
Epoch: 6718 Loss: 0.01361231692135334
Epoch: 6719 Loss: 0.013604959473013878
Epoch: 6720 Loss: 0.013606392778456211
Epoch: 6721 Loss: 0.013602356426417828
Epoch: 6722 Loss: 0.01359637826681137
Epoch: 6723 Loss: 0.013591623865067959
Epoch: 6724 Loss: 0.0135896690

Epoch: 6910 Loss: 0.012975716963410378
Epoch: 6911 Loss: 0.01297629065811634
Epoch: 6912 Loss: 0.012972857803106308
Epoch: 6913 Loss: 0.012969987466931343
Epoch: 6914 Loss: 0.012963468208909035
Epoch: 6915 Loss: 0.012964988127350807
Epoch: 6916 Loss: 0.012958734296262264
Epoch: 6917 Loss: 0.012955163605511189
Epoch: 6918 Loss: 0.012952114455401897
Epoch: 6919 Loss: 0.01295043807476759
Epoch: 6920 Loss: 0.012946098111569881
Epoch: 6921 Loss: 0.012940588407218456
Epoch: 6922 Loss: 0.012940720655024052
Epoch: 6923 Loss: 0.012936641462147236
Epoch: 6924 Loss: 0.01293251384049654
Epoch: 6925 Loss: 0.01293143816292286
Epoch: 6926 Loss: 0.012925473041832447
Epoch: 6927 Loss: 0.012924447655677795
Epoch: 6928 Loss: 0.012922566384077072
Epoch: 6929 Loss: 0.012920045293867588
Epoch: 6930 Loss: 0.012913551181554794
Epoch: 6931 Loss: 0.012911789119243622
Epoch: 6932 Loss: 0.01290962565690279
Epoch: 6933 Loss: 0.012906743213534355
Epoch: 6934 Loss: 0.012903264723718166
Epoch: 6935 Loss: 0.0128963869

Epoch: 7121 Loss: 0.012338750064373016
Epoch: 7122 Loss: 0.012336580082774162
Epoch: 7123 Loss: 0.012332262471318245
Epoch: 7124 Loss: 0.012327105738222599
Epoch: 7125 Loss: 0.012328977696597576
Epoch: 7126 Loss: 0.012323540635406971
Epoch: 7127 Loss: 0.012324693612754345
Epoch: 7128 Loss: 0.012317649088799953
Epoch: 7129 Loss: 0.012318364344537258
Epoch: 7130 Loss: 0.012310488149523735
Epoch: 7131 Loss: 0.012313244864344597
Epoch: 7132 Loss: 0.012305942364037037
Epoch: 7133 Loss: 0.012304417788982391
Epoch: 7134 Loss: 0.012300257571041584
Epoch: 7135 Loss: 0.012297755107283592
Epoch: 7136 Loss: 0.012294451706111431
Epoch: 7137 Loss: 0.012295006774365902
Epoch: 7138 Loss: 0.01229019183665514
Epoch: 7139 Loss: 0.012287821620702744
Epoch: 7140 Loss: 0.012283770367503166
Epoch: 7141 Loss: 0.012279285117983818
Epoch: 7142 Loss: 0.012281577102839947
Epoch: 7143 Loss: 0.012273618020117283
Epoch: 7144 Loss: 0.012273786589503288
Epoch: 7145 Loss: 0.012269181199371815
Epoch: 7146 Loss: 0.012266

Epoch: 7332 Loss: 0.011751516722142696
Epoch: 7333 Loss: 0.011749742552638054
Epoch: 7334 Loss: 0.01174484845250845
Epoch: 7335 Loss: 0.011746757663786411
Epoch: 7336 Loss: 0.011740592308342457
Epoch: 7337 Loss: 0.011739619076251984
Epoch: 7338 Loss: 0.011737870052456856
Epoch: 7339 Loss: 0.011731612496078014
Epoch: 7340 Loss: 0.011730342172086239
Epoch: 7341 Loss: 0.011728283949196339
Epoch: 7342 Loss: 0.011724792420864105
Epoch: 7343 Loss: 0.01172205526381731
Epoch: 7344 Loss: 0.01172073557972908
Epoch: 7345 Loss: 0.01171799749135971
Epoch: 7346 Loss: 0.01171465776860714
Epoch: 7347 Loss: 0.011712812818586826
Epoch: 7348 Loss: 0.011710315942764282
Epoch: 7349 Loss: 0.011705719865858555
Epoch: 7350 Loss: 0.011704226024448872
Epoch: 7351 Loss: 0.011701486073434353
Epoch: 7352 Loss: 0.011697920970618725
Epoch: 7353 Loss: 0.011694717220962048
Epoch: 7354 Loss: 0.011693190783262253
Epoch: 7355 Loss: 0.011689376085996628
Epoch: 7356 Loss: 0.011692219413816929
Epoch: 7357 Loss: 0.0116850044

Epoch: 7542 Loss: 0.011211843229830265
Epoch: 7543 Loss: 0.011212269775569439
Epoch: 7544 Loss: 0.01120558101683855
Epoch: 7545 Loss: 0.011205676943063736
Epoch: 7546 Loss: 0.011202927678823471
Epoch: 7547 Loss: 0.011200660839676857
Epoch: 7548 Loss: 0.011197700165212154
Epoch: 7549 Loss: 0.011198450811207294
Epoch: 7550 Loss: 0.011193188838660717
Epoch: 7551 Loss: 0.011190082877874374
Epoch: 7552 Loss: 0.011188091710209846
Epoch: 7553 Loss: 0.011184507049620152
Epoch: 7554 Loss: 0.01118476502597332
Epoch: 7555 Loss: 0.011182622984051704
Epoch: 7556 Loss: 0.011176747269928455
Epoch: 7557 Loss: 0.011175629682838917
Epoch: 7558 Loss: 0.011172179132699966
Epoch: 7559 Loss: 0.011173378676176071
Epoch: 7560 Loss: 0.011167734861373901
Epoch: 7561 Loss: 0.011166310869157314
Epoch: 7562 Loss: 0.011162039823830128
Epoch: 7563 Loss: 0.011161628179252148
Epoch: 7564 Loss: 0.011159946210682392
Epoch: 7565 Loss: 0.011155006475746632
Epoch: 7566 Loss: 0.0111548388376832
Epoch: 7567 Loss: 0.011152147

Epoch: 7752 Loss: 0.01071358472108841
Epoch: 7753 Loss: 0.010709492489695549
Epoch: 7754 Loss: 0.010707016102969646
Epoch: 7755 Loss: 0.0107034957036376
Epoch: 7756 Loss: 0.010702506639063358
Epoch: 7757 Loss: 0.01070021279156208
Epoch: 7758 Loss: 0.010701454244554043
Epoch: 7759 Loss: 0.0106947747990489
Epoch: 7760 Loss: 0.010692443698644638
Epoch: 7761 Loss: 0.010692368261516094
Epoch: 7762 Loss: 0.010689185000956059
Epoch: 7763 Loss: 0.010687173344194889
Epoch: 7764 Loss: 0.010682260617613792
Epoch: 7765 Loss: 0.010682106018066406
Epoch: 7766 Loss: 0.010679584927856922
Epoch: 7767 Loss: 0.01067686453461647
Epoch: 7768 Loss: 0.010676591657102108
Epoch: 7769 Loss: 0.01067190058529377
Epoch: 7770 Loss: 0.010670258663594723
Epoch: 7771 Loss: 0.01067042350769043
Epoch: 7772 Loss: 0.010665043257176876
Epoch: 7773 Loss: 0.010662705637514591
Epoch: 7774 Loss: 0.010659780353307724
Epoch: 7775 Loss: 0.010659157298505306
Epoch: 7776 Loss: 0.010658832266926765
Epoch: 7777 Loss: 0.01065355818718

Epoch: 7963 Loss: 0.010245437733829021
Epoch: 7964 Loss: 0.01024144422262907
Epoch: 7965 Loss: 0.01024196483194828
Epoch: 7966 Loss: 0.010238809511065483
Epoch: 7967 Loss: 0.010238075628876686
Epoch: 7968 Loss: 0.010234435088932514
Epoch: 7969 Loss: 0.010233636945486069
Epoch: 7970 Loss: 0.010229585692286491
Epoch: 7971 Loss: 0.010227646678686142
Epoch: 7972 Loss: 0.010225696489214897
Epoch: 7973 Loss: 0.010227516293525696
Epoch: 7974 Loss: 0.010223123244941235
Epoch: 7975 Loss: 0.010221190750598907
Epoch: 7976 Loss: 0.01021785568445921
Epoch: 7977 Loss: 0.010216020978987217
Epoch: 7978 Loss: 0.010214261710643768
Epoch: 7979 Loss: 0.01021112222224474
Epoch: 7980 Loss: 0.010209384374320507
Epoch: 7981 Loss: 0.010208181105554104
Epoch: 7982 Loss: 0.010206608101725578
Epoch: 7983 Loss: 0.01020297035574913
Epoch: 7984 Loss: 0.010200988501310349
Epoch: 7985 Loss: 0.010199165903031826
Epoch: 7986 Loss: 0.010198703035712242
Epoch: 7987 Loss: 0.010194633156061172
Epoch: 7988 Loss: 0.0101922387

Epoch: 8174 Loss: 0.009814353659749031
Epoch: 8175 Loss: 0.009812308475375175
Epoch: 8176 Loss: 0.009809087961912155
Epoch: 8177 Loss: 0.00980925653129816
Epoch: 8178 Loss: 0.009805773384869099
Epoch: 8179 Loss: 0.009803352877497673
Epoch: 8180 Loss: 0.009803073480725288
Epoch: 8181 Loss: 0.009801332838833332
Epoch: 8182 Loss: 0.009798821993172169
Epoch: 8183 Loss: 0.009796719066798687
Epoch: 8184 Loss: 0.009794076904654503
Epoch: 8185 Loss: 0.00979231670498848
Epoch: 8186 Loss: 0.009790013544261456
Epoch: 8187 Loss: 0.009789232164621353
Epoch: 8188 Loss: 0.009785255417227745
Epoch: 8189 Loss: 0.009786093607544899
Epoch: 8190 Loss: 0.009782669134438038
Epoch: 8191 Loss: 0.0097826411947608
Epoch: 8192 Loss: 0.009777930565178394
Epoch: 8193 Loss: 0.009777247905731201
Epoch: 8194 Loss: 0.009776277467608452
Epoch: 8195 Loss: 0.009772338904440403
Epoch: 8196 Loss: 0.009769017808139324
Epoch: 8197 Loss: 0.009767847135663033
Epoch: 8198 Loss: 0.00976862758398056
Epoch: 8199 Loss: 0.0097646294

Epoch: 8385 Loss: 0.009410257451236248
Epoch: 8386 Loss: 0.009408164769411087
Epoch: 8387 Loss: 0.009407280012965202
Epoch: 8388 Loss: 0.00940706767141819
Epoch: 8389 Loss: 0.009403713978827
Epoch: 8390 Loss: 0.009402341209352016
Epoch: 8391 Loss: 0.009399773553013802
Epoch: 8392 Loss: 0.00939889531582594
Epoch: 8393 Loss: 0.009396613575518131
Epoch: 8394 Loss: 0.009392747655510902
Epoch: 8395 Loss: 0.009393873624503613
Epoch: 8396 Loss: 0.009391027502715588
Epoch: 8397 Loss: 0.009388111531734467
Epoch: 8398 Loss: 0.009387001395225525
Epoch: 8399 Loss: 0.00938517414033413
Epoch: 8400 Loss: 0.009384898468852043
Epoch: 8401 Loss: 0.009381471201777458
Epoch: 8402 Loss: 0.009378673508763313
Epoch: 8403 Loss: 0.009377563372254372
Epoch: 8404 Loss: 0.009375834837555885
Epoch: 8405 Loss: 0.009374462068080902
Epoch: 8406 Loss: 0.009373062290251255
Epoch: 8407 Loss: 0.009369083680212498
Epoch: 8408 Loss: 0.009367423132061958
Epoch: 8409 Loss: 0.00936795398592949
Epoch: 8410 Loss: 0.009366708807

Epoch: 8596 Loss: 0.009039564989507198
Epoch: 8597 Loss: 0.009037237614393234
Epoch: 8598 Loss: 0.009034101851284504
Epoch: 8599 Loss: 0.00903293676674366
Epoch: 8600 Loss: 0.0090297507122159
Epoch: 8601 Loss: 0.009030956774950027
Epoch: 8602 Loss: 0.009028551168739796
Epoch: 8603 Loss: 0.009027065709233284
Epoch: 8604 Loss: 0.009025794453918934
Epoch: 8605 Loss: 0.009021949954330921
Epoch: 8606 Loss: 0.009020415134727955
Epoch: 8607 Loss: 0.00901717133820057
Epoch: 8608 Loss: 0.009017235599458218
Epoch: 8609 Loss: 0.009016643278300762
Epoch: 8610 Loss: 0.00901438295841217
Epoch: 8611 Loss: 0.009013702161610126
Epoch: 8612 Loss: 0.009010478854179382
Epoch: 8613 Loss: 0.009007961489260197
Epoch: 8614 Loss: 0.00900834146887064
Epoch: 8615 Loss: 0.009006131440401077
Epoch: 8616 Loss: 0.009004208259284496
Epoch: 8617 Loss: 0.009003369137644768
Epoch: 8618 Loss: 0.009001177735626698
Epoch: 8619 Loss: 0.008998207747936249
Epoch: 8620 Loss: 0.008996511809527874
Epoch: 8621 Loss: 0.00899466220

Epoch: 8807 Loss: 0.008691161870956421
Epoch: 8808 Loss: 0.008690773509442806
Epoch: 8809 Loss: 0.0086869727820158
Epoch: 8810 Loss: 0.008687221445143223
Epoch: 8811 Loss: 0.008685048669576645
Epoch: 8812 Loss: 0.008684951812028885
Epoch: 8813 Loss: 0.008681927807629108
Epoch: 8814 Loss: 0.008682546205818653
Epoch: 8815 Loss: 0.008678250014781952
Epoch: 8816 Loss: 0.008676857687532902
Epoch: 8817 Loss: 0.008675388991832733
Epoch: 8818 Loss: 0.008674715645611286
Epoch: 8819 Loss: 0.008673912845551968
Epoch: 8820 Loss: 0.008671076968312263
Epoch: 8821 Loss: 0.008669481612741947
Epoch: 8822 Loss: 0.00866849347949028
Epoch: 8823 Loss: 0.008665661327540874
Epoch: 8824 Loss: 0.008664622902870178
Epoch: 8825 Loss: 0.008663220331072807
Epoch: 8826 Loss: 0.008661550469696522
Epoch: 8827 Loss: 0.008659928105771542
Epoch: 8828 Loss: 0.008658980950713158
Epoch: 8829 Loss: 0.008656993508338928
Epoch: 8830 Loss: 0.008657140657305717
Epoch: 8831 Loss: 0.008653772063553333
Epoch: 8832 Loss: 0.00865108

Epoch: 9018 Loss: 0.008366095833480358
Epoch: 9019 Loss: 0.008365835063159466
Epoch: 9020 Loss: 0.008362245745956898
Epoch: 9021 Loss: 0.008363158442080021
Epoch: 9022 Loss: 0.008361408486962318
Epoch: 9023 Loss: 0.008359527215361595
Epoch: 9024 Loss: 0.008358237333595753
Epoch: 9025 Loss: 0.008355372585356236
Epoch: 9026 Loss: 0.00835416465997696
Epoch: 9027 Loss: 0.008353564888238907
Epoch: 9028 Loss: 0.008353703655302525
Epoch: 9029 Loss: 0.008350304327905178
Epoch: 9030 Loss: 0.008349190466105938
Epoch: 9031 Loss: 0.008347251452505589
Epoch: 9032 Loss: 0.008344486355781555
Epoch: 9033 Loss: 0.008344761095941067
Epoch: 9034 Loss: 0.008342038840055466
Epoch: 9035 Loss: 0.0083413515239954
Epoch: 9036 Loss: 0.008341345004737377
Epoch: 9037 Loss: 0.008338822983205318
Epoch: 9038 Loss: 0.008337332867085934
Epoch: 9039 Loss: 0.008334635756909847
Epoch: 9040 Loss: 0.008334430865943432
Epoch: 9041 Loss: 0.008331706747412682
Epoch: 9042 Loss: 0.008330631069839
Epoch: 9043 Loss: 0.00832854770

Epoch: 9229 Loss: 0.008065410889685154
Epoch: 9230 Loss: 0.008063126355409622
Epoch: 9231 Loss: 0.008062590844929218
Epoch: 9232 Loss: 0.008059917017817497
Epoch: 9233 Loss: 0.008057991042733192
Epoch: 9234 Loss: 0.008056676015257835
Epoch: 9235 Loss: 0.008056667633354664
Epoch: 9236 Loss: 0.008054149337112904
Epoch: 9237 Loss: 0.008053140714764595
Epoch: 9238 Loss: 0.008051207289099693
Epoch: 9239 Loss: 0.008051753975450993
Epoch: 9240 Loss: 0.008048572577536106
Epoch: 9241 Loss: 0.008048422634601593
Epoch: 9242 Loss: 0.008046719245612621
Epoch: 9243 Loss: 0.008043814450502396
Epoch: 9244 Loss: 0.008043681271374226
Epoch: 9245 Loss: 0.008042174391448498
Epoch: 9246 Loss: 0.008040567860007286
Epoch: 9247 Loss: 0.008041037246584892
Epoch: 9248 Loss: 0.008038198575377464
Epoch: 9249 Loss: 0.008037485182285309
Epoch: 9250 Loss: 0.008035843260586262
Epoch: 9251 Loss: 0.00803342740982771
Epoch: 9252 Loss: 0.008032134734094143
Epoch: 9253 Loss: 0.008033109828829765
Epoch: 9254 Loss: 0.008029

Epoch: 9440 Loss: 0.0077818105928599834
Epoch: 9441 Loss: 0.00778015935793519
Epoch: 9442 Loss: 0.007779161911457777
Epoch: 9443 Loss: 0.007778869941830635
Epoch: 9444 Loss: 0.0077768657356500626
Epoch: 9445 Loss: 0.00777604803442955
Epoch: 9446 Loss: 0.00777348130941391
Epoch: 9447 Loss: 0.007772673387080431
Epoch: 9448 Loss: 0.007770674768835306
Epoch: 9449 Loss: 0.007770088966935873
Epoch: 9450 Loss: 0.007769712246954441
Epoch: 9451 Loss: 0.007767428178340197
Epoch: 9452 Loss: 0.007767190225422382
Epoch: 9453 Loss: 0.007764569018036127
Epoch: 9454 Loss: 0.007764304522424936
Epoch: 9455 Loss: 0.00776159530505538
Epoch: 9456 Loss: 0.00776186166331172
Epoch: 9457 Loss: 0.007759660482406616
Epoch: 9458 Loss: 0.007759453728795052
Epoch: 9459 Loss: 0.0077573680318892
Epoch: 9460 Loss: 0.007756866980344057
Epoch: 9461 Loss: 0.0077538564801216125
Epoch: 9462 Loss: 0.007754669990390539
Epoch: 9463 Loss: 0.0077517107129096985
Epoch: 9464 Loss: 0.007750430144369602
Epoch: 9465 Loss: 0.00774921

Epoch: 9649 Loss: 0.007519821170717478
Epoch: 9650 Loss: 0.0075194574892520905
Epoch: 9651 Loss: 0.007516226731240749
Epoch: 9652 Loss: 0.007516443729400635
Epoch: 9653 Loss: 0.007514316122978926
Epoch: 9654 Loss: 0.007512956392019987
Epoch: 9655 Loss: 0.007511714473366737
Epoch: 9656 Loss: 0.007510228548198938
Epoch: 9657 Loss: 0.007510131224989891
Epoch: 9658 Loss: 0.0075102560222148895
Epoch: 9659 Loss: 0.007507494185119867
Epoch: 9660 Loss: 0.007507339119911194
Epoch: 9661 Loss: 0.007504912093281746
Epoch: 9662 Loss: 0.007502904161810875
Epoch: 9663 Loss: 0.007503639906644821
Epoch: 9664 Loss: 0.0075004748068749905
Epoch: 9665 Loss: 0.007500283420085907
Epoch: 9666 Loss: 0.0074988496489822865
Epoch: 9667 Loss: 0.0074975620955228806
Epoch: 9668 Loss: 0.007495954167097807
Epoch: 9669 Loss: 0.007496558595448732
Epoch: 9670 Loss: 0.0074934023432433605
Epoch: 9671 Loss: 0.007493134122341871
Epoch: 9672 Loss: 0.007492254488170147
Epoch: 9673 Loss: 0.0074903578497469425
Epoch: 9674 Loss: 

Epoch: 9858 Loss: 0.007271279580891132
Epoch: 9859 Loss: 0.007270950824022293
Epoch: 9860 Loss: 0.007270221132785082
Epoch: 9861 Loss: 0.0072693112306296825
Epoch: 9862 Loss: 0.007266562897711992
Epoch: 9863 Loss: 0.00726598035544157
Epoch: 9864 Loss: 0.0072646550834178925
Epoch: 9865 Loss: 0.0072630527429282665
Epoch: 9866 Loss: 0.007262321189045906
Epoch: 9867 Loss: 0.00726131210103631
Epoch: 9868 Loss: 0.007261947728693485
Epoch: 9869 Loss: 0.007259030360728502
Epoch: 9870 Loss: 0.0072572557255625725
Epoch: 9871 Loss: 0.007256685756146908
Epoch: 9872 Loss: 0.007256521377712488
Epoch: 9873 Loss: 0.007253957912325859
Epoch: 9874 Loss: 0.0072546727024018764
Epoch: 9875 Loss: 0.007252660114318132
Epoch: 9876 Loss: 0.007251051254570484
Epoch: 9877 Loss: 0.007249149028211832
Epoch: 9878 Loss: 0.0072498260997235775
Epoch: 9879 Loss: 0.007247134577482939
Epoch: 9880 Loss: 0.007246037945151329
Epoch: 9881 Loss: 0.007246627006679773
Epoch: 9882 Loss: 0.007244156673550606
Epoch: 9883 Loss: 0.0

Finally, we use the trained model to play FizzBuzz on the numbers 1 through 100.

In [50]:
# Output now
testX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(1, 101)])
with torch.no_grad():
    testY = model(testX)
predictions = zip(range(1, 101), list(testY.max(1)[1].data.tolist()))

print([fizz_buzz_decode(i, x) for (i, x) in predictions])

['fizzbuzz', '2', 'fizz', 'buzz', 'buzz', 'fizz', '7', '8', '9', 'buzz', '11', '12', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19', 'buzz', 'fizz', '22', '23', '24', 'buzz', '26', 'fizz', '28', '29', 'fizzbuzz', '31', '32', 'fizz', 'buzz', 'buzz', 'fizz', '37', '38', 'fizz', 'buzz', '41', 'fizz', '43', '44', 'fizzbuzz', '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', '53', 'fizz', 'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', 'buzz', 'buzz', 'fizz', '67', '68', 'fizz', 'buzz', '71', 'fizz', '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', '81', '82', '83', 'fizz', 'buzz', '86', '87', '88', '89', 'fizzbuzz', '91', '92', 'fizz', '94', 'buzz', 'fizz', '97', '98', 'fizz', 'buzz']


In [51]:
print(np.sum(testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])))
testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])

91


array([False,  True,  True, False,  True,  True,  True,  True, False,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True, False,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])
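
The count of 91 comes from comparing each predicted label with the true label. As a rough sketch of that comparison in plain Python, the snippet below also lists which numbers were misclassified. It assumes the label scheme 0 = the number itself, 1 = fizz, 2 = buzz, 3 = fizzbuzz; `accuracy_report` and the prediction list are hypothetical and only serve the illustration.

```python
def fizz_buzz_encode(i):
    # Assumed label scheme: 3 = fizzbuzz, 2 = buzz, 1 = fizz, 0 = the number itself
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0


def accuracy_report(predicted, numbers):
    """Return the number of correct predictions and the misclassified numbers."""
    expected = [fizz_buzz_encode(i) for i in numbers]
    wrong = [i for i, p, e in zip(numbers, predicted, expected) if p != e]
    return len(numbers) - len(wrong), wrong


numbers = list(range(1, 101))
# Hypothetical predictions: perfect except for the numbers 1 and 4
predicted = [fizz_buzz_encode(i) for i in numbers]
predicted[0] = 3   # 1 predicted as "fizzbuzz"
predicted[3] = 2   # 4 predicted as "buzz"

correct, wrong = accuracy_report(predicted, numbers)
print(correct, wrong)  # 98 [1, 4]
```

With real model output, `predicted` would be the list of class indices taken from the model's predictions.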

## CNN-Image-Classification  

References
- [Stanford CS231n](http://cs231n.github.io/convolutional-networks/)
- [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
- [VGG](https://arxiv.org/pdf/1409.1556.pdf)
- [ResNet](https://arxiv.org/pdf/1512.03385.pdf)
- [DenseNet](https://arxiv.org/pdf/1608.06993.pdf)

In [52]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
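# torchvision is a package separate from PyTorch that provides convenience utilities for working with images.
# An overview of torchvision: https://pypi.org/project/torchvision/0.1.8/
# Its main subpackages are:
# torchvision.datasets : several commonly used vision datasets, which can be downloaded and loaded
# torchvision.models : popular architectures such as AlexNet, VGG, and ResNet, along with pretrained weights
# torchvision.transforms : common image operations, such as random cropping and rotation
# torchvision.utils : saves tensors of shape (3 x H x W) to disk as images, and builds an image grid from a mini-batch of images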

print("PyTorch Version: ",torch.__version__)

PyTorch Version:  1.4.0


### Loading the data

In [54]:
torch.manual_seed(53113)  # seed the CPU random number generator

# The GPU-related settings below only take effect when a GPU is available
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
batch_size = test_batch_size = 32
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}


# torch.utils.data.DataLoader splits the training data into batches.
# It yields one batch per iteration until all data has been consumed, i.e. it is an iterator over the data.
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data',
                   train=True,     # if True, build the dataset from training.pt
                   download=True,  # if True, download the data automatically when it is not already on disk
                   # transform is a function that takes an image and returns the transformed image (preprocessing).
                   # Common operations include ToTensor, RandomCrop, and Normalize;
                   # they can be chained with transforms.Compose.
                   transform=transforms.Compose([
                       # ToTensor converts a PIL image or numpy.ndarray of shape (H, W, C)
                       # into a tensor of shape (C, H, W) and scales uint8 pixel values
                       # to [0, 1] by dividing by 255.
                       transforms.ToTensor(),
                       # Normalize then applies (x - mean) / std per channel. 0.1307 and 0.3081 are the mean
                       # and standard deviation of the MNIST pixels, so the result has roughly zero mean and
                       # unit variance; it is not confined to (-1, 1).
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),  # first argument of DataLoader: the dataset
    batch_size=batch_size,
    shuffle=True,  # shuffle the data
    **kwargs)  # the GPU settings defined above


# Test dataset
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data',
                   train=False,  # if False, build the dataset from test.pt
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=test_batch_size,
    shuffle=True,
    **kwargs)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./mnist_data\MNIST\raw\train-images-idx3-ubyte.gz


Extracting ./mnist_data\MNIST\raw\train-images-idx3-ubyte.gz to ./mnist_data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./mnist_data\MNIST\raw\train-labels-idx1-ubyte.gz


Extracting ./mnist_data\MNIST\raw\train-labels-idx1-ubyte.gz to ./mnist_data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./mnist_data\MNIST\raw\t10k-images-idx3-ubyte.gz


Extracting ./mnist_data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./mnist_data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./mnist_data\MNIST\raw\t10k-labels-idx1-ubyte.gz


Extracting ./mnist_data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./mnist_data\MNIST\raw
Processing...
Done!
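
To make the two preprocessing steps concrete, here is a small plain-Python sketch of what `ToTensor` followed by `Normalize((0.1307,), (0.3081,))` does to a single pixel value. The helper names `to_unit_range` and `normalize` are made up for this illustration; they are not part of torchvision.

```python
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def to_unit_range(pixel):
    """Mimic the scaling step of ToTensor for one uint8 pixel value."""
    return pixel / 255.0


def normalize(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Mimic Normalize for one value: (x - mean) / std."""
    return (value - mean) / std


lowest = normalize(to_unit_range(0))     # a black pixel
highest = normalize(to_unit_range(255))  # a white pixel
print(round(lowest, 2), round(highest, 2))  # roughly -0.42 and 2.82
```

The normalized values therefore span roughly -0.42 to 2.82 rather than (-1, 1): the transform standardizes the data, it does not rescale it to a fixed interval.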


### Defining the CNN model

First, we define a simple neural network based on a ConvNet.

In [53]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1)
        # in_channels: number of input channels (1 for grayscale handwritten digits, 3 for color images)
        # out_channels: number of output channels, equal to the number of convolution kernels
        # kernel_size: size of the convolution kernel
        # stride: step size of the kernel
        self.conv1 = nn.Conv2d(1, 20, 5, 1)  # 28x28 image, 5x5 kernel, stride 1: 28+1-5 = 24 positions per axis, giving 24x24 features

        # The previous layer's out_channels becomes this layer's in_channels, hence 20; out_channels: 50 kernels.
        # The input here is the 12x12 map left by the first max-pool, so 12+1-5 = 8 positions per axis, giving 8x8 features.
        self.conv2 = nn.Conv2d(20, 50, 5, 1)

        # Fully connected layer: torch.nn.Linear(in_features, out_features)
        # in_features: input feature dimension; 4*4*50 is worked out by hand and depends on the input image size
        # out_features: output feature dimension
        # The in_features x out_features matrix is the right-hand operand of the layer's matrix multiplication.
        self.fc1 = nn.Linear(4*4*50, 500)

        self.fc2 = nn.Linear(500, 10)  # 10 outputs, one per class

    def forward(self, x):
        # print(x.shape)  # input shape (N,1,28,28): N is the batch size, 1 the single grayscale channel, 28x28 the resolution
        x = F.relu(self.conv1(x)) # x = (N,20,24,24): the 5x5 kernel yields 24x24 feature maps
        x = F.max_pool2d(x, 2, 2) # x = (N,20,12,12): 2x2 max-pooling downsamples to 12x12
        x = F.relu(self.conv2(x)) # x = (N,50,8,8): the 5x5 kernel yields (12+1-5) x (12+1-5) = 8x8
        x = F.max_pool2d(x, 2, 2) # x = (N,50,4,4): 2x2 max-pooling downsamples to 4x4
        x = x.view(-1, 4*4*50)    # x = (N,4*4*50): flatten the 4-D N x 50 x 4 x 4 tensor into a 2-D N x (4*4*50) matrix
        x = F.relu(self.fc1(x))   # x = (N,4*4*50) * (4*4*50, 500) = (N,500): a fully connected layer is a matrix multiplication
        x = self.fc2(x)           # x = (N,500) * (500, 10) = (N,10)
        return F.log_softmax(x, dim=1)  # log-softmax over the classes: 10 log-probabilities per image
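
The `4*4*50` input size of `fc1` follows from the standard output-size formula for convolution and pooling windows, `(size + 2*padding - kernel) // stride + 1`. The sketch below traces the spatial size through the network in plain Python; `conv_out` is a helper written for this illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                  # MNIST images are 28x28
size = conv_out(size, kernel=5)            # conv1    -> 24
size = conv_out(size, kernel=2, stride=2)  # max-pool -> 12
size = conv_out(size, kernel=5)            # conv2    -> 8
size = conv_out(size, kernel=2, stride=2)  # max-pool -> 4
flat_features = size * size * 50           # 50 channels after conv2
print(size, flat_features)  # 4 800
```

Changing the input resolution or a kernel size changes `flat_features`, which is why `fc1` has to be adjusted whenever the earlier layers or the image size change.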

### Initializing the model and defining the optimizer

In [55]:
lr = 0.01
momentum = 0.5

# Instantiate the model and move it to the device
model = Net().to(device)

# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

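Definition of the NLL (negative log-likelihood) loss, where $x_{n,c}$ is the log-probability the model assigns to class $c$ for sample $n$, and $y_n$ is the target class of sample $n$: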

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_{y_n} x_{n,y_n}, \quad
        w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\}$
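
As a minimal sketch of this definition, the snippet below computes the unweighted NLL loss (all $w_c = 1$, no `ignore_index`) in plain Python from a log-softmax. The function names mirror the PyTorch ones for readability, but they are reimplemented here for illustration, and the scores are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one list of raw scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, targets, reduction="mean"):
    """Unweighted NLL: minus the log-probability of each target class."""
    losses = [-row[t] for row, t in zip(log_probs, targets)]
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total


logits = [[2.0, 0.5, 0.1], [0.2, 0.2, 3.0]]  # two samples, three classes
targets = [0, 2]
log_probs = [log_softmax(row) for row in logits]

mean_loss = nll_loss(log_probs, targets)        # averaged over the batch
sum_loss = nll_loss(log_probs, targets, "sum")  # summed over the batch
print(mean_loss, sum_loss)
```

The loss is small when the target class already has a log-probability close to 0, that is, a probability close to 1.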

### Defining the training and test functions

In [56]:
def train(model, device, train_loader, optimizer, epoch, log_interval=100):
    model.train()  # switch to training mode
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)  # move data and target to the device
        optimizer.zero_grad()  # reset the gradients
        output = model(data)   # shape [N,10]; data is the argument x of forward
        # F.nll_loss(F.log_softmax(input), target) is the cross-entropy loss for
        # single-label classification (each image belongs to exactly one class).
        # For multi-label classification (an image may carry several classes),
        # the outputs go through a sigmoid instead of a softmax.
        loss = F.nll_loss(output, target)  # averaged over the batch by default

        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
                epoch,
                batch_idx * len(data),      # samples processed so far, e.g. 100*32
                len(train_loader.dataset),  # 60000
                100. * batch_idx / len(train_loader),  # len(train_loader) = 60000/32 = 1875
                loss.item()
            ))
            #print(len(train_loader))


In [57]:
def test(model, device, test_loader):
    model.eval()  # switch to evaluation mode
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # reduction='sum' adds up the loss of every sample in the batch; the default, 'mean', averages them
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss

            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability

            #print(target.shape) #torch.Size([32])
            #print(pred.shape) #torch.Size([32, 1])
            # pred and target have different shapes, so target is reshaped to match pred.
            # pred.eq() is True where the entries match and False otherwise; the result has shape (32, 1).
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
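
Summing the per-sample losses and dividing once by the dataset size gives the exact average even when the last batch is smaller than the others (with 10000 test images and a batch size of 32, the last batch holds only 16). A small plain-Python sketch with made-up loss values shows how averaging per-batch means would differ; `chunks` is a hypothetical helper.

```python
def chunks(values, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


per_sample_loss = [0.1, 0.2, 0.3, 0.4, 1.0]  # made-up losses for 5 samples
batches = list(chunks(per_sample_loss, 2))   # batch sizes 2, 2, 1

# What test() does: sum within each batch, divide by the dataset size
exact = sum(sum(b) for b in batches) / len(per_sample_loss)

# Averaging per-batch means over-weights the small final batch
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

print(round(exact, 3), round(mean_of_means, 3))  # 0.4 0.5
```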

### Viewing the results

In [58]:
epochs = 2
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

save_model = True
if save_model:
    torch.save(model.state_dict(), "mnist_cnn.pt")  # state_dict() is a dictionary that holds only the model's parameters
    


Test set: Average loss: 0.0657, Accuracy: 9794/10000 (98%)


Test set: Average loss: 0.0463, Accuracy: 9852/10000 (99%)
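
Because only the `state_dict` is saved, loading it later requires building a model with the same architecture first. The sketch below shows the round trip on a tiny stand-in model; the `nn.Linear` layer and the in-memory buffer are assumptions made to keep the example self-contained, whereas the notebook itself saves `Net` to `mnist_cnn.pt`.

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                 # stand-in for the trained Net
buffer = io.BytesIO()                   # stands in for a file such as "mnist_cnn.pt"
torch.save(model.state_dict(), buffer)  # save the parameters only
buffer.seek(0)

restored = nn.Linear(4, 2)              # must have the same architecture
restored.load_state_dict(torch.load(buffer))
restored.eval()                         # switch to evaluation mode before inference

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```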



In [59]:
# Same setup as above, applied to FashionMNIST (the MNIST normalization constants are reused as-is)
torch.manual_seed(53113)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
batch_size = test_batch_size = 32
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.FashionMNIST('./fashion_mnist_data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,)) 
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.FashionMNIST('./fashion_mnist_data', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=test_batch_size, shuffle=True, **kwargs)


lr = 0.01
momentum = 0.5
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

epochs = 2
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

save_model = True
if (save_model):
    torch.save(model.state_dict(),"fashion_mnist_cnn.pt")

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw\train-images-idx3-ubyte.gz


Extracting ./fashion_mnist_data\FashionMNIST\raw\train-images-idx3-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw\train-labels-idx1-ubyte.gz


Extracting ./fashion_mnist_data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


Extracting ./fashion_mnist_data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


Extracting ./fashion_mnist_data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to ./fashion_mnist_data\FashionMNIST\raw
Processing...
Done!


Test set: Average loss: 0.4423, Accuracy: 8379/10000 (84%)


Test set: Average loss: 0.3729, Accuracy: 8643/10000 (86%)



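## Transfer Learning with CNN Models

- Often, when we need to train a model for a new image classification task, we do not start from a randomly initialized model. Instead, we use a _pretrained_ model to speed up training. Models pretrained on `ImageNet` are a common choice.
- This is a form of transfer learning. Two approaches are commonly used:
    - fine tuning: start from a pretrained model, modify part of its architecture, then continue training all of the model's parameters.
    - feature extraction: leave the pretrained parameters unchanged and update only the parameters of the part we modified. It is called feature extraction because the pretrained CNN is treated as a feature extractor, and the extracted features are used to solve our task.
    
The basic steps for building and training a transfer learning model are:
- Initialize the pretrained model
- Change the final output layer to the number of classes we want to predict
- Define an optimizer to update the parameters
- Train the model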
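
The difference between the two approaches comes down to which parameters have `requires_grad=True`. The sketch below illustrates feature extraction on a tiny stand-in network; the `nn.Sequential` model is hypothetical and merely plays the role of a pretrained CNN with a 1000-way output layer.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a "backbone" plus a 1000-way head
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1000),
)

feature_extract = True
if feature_extract:
    for param in model.parameters():
        param.requires_grad = False  # freeze every existing parameter

# Replace the output layer; a newly created layer has requires_grad=True by default
model[2] = nn.Linear(8, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']
```

With `feature_extract = False` nothing is frozen, so every parameter stays trainable, which corresponds to fine tuning.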

In [None]:
import numpy as np
import torchvision
from torchvision import datasets, transforms, models

import matplotlib.pyplot as plt
import time
import os
import copy
print("Torchvision Version: ",torchvision.__version__)

### Data

We will use the *hymenoptera_data* dataset ([download](https://download.pytorch.org/tutorial/hymenoptera_data.zip)).

This dataset contains two classes of images, **bees** and **ants**, arranged so that it can be read with [ImageFolder](https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.ImageFolder). We only need to set ``data_dir`` to the root directory of the data and ``model_name`` to the pretrained model we want to use:

    [resnet, alexnet, vgg, squeezenet, densenet, inception]

The other parameters are:
- ``num_classes``: the number of classes in the dataset
- ``batch_size``
- ``num_epochs``
- ``feature_extract``: whether training uses fine tuning or feature extraction. If ``feature_extract = False``, the whole model is updated. If ``feature_extract = True``, only the last layer of the model is updated.

### Inspecting the data

In [None]:
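# Top level data directory. Here we assume the format of the directory conforms 
#   to the ImageFolder structure
data_dir = "./hymenoptera_data"
# Batch size for training (change depending on how much memory you have)
batch_size = 32
# Input size expected by ResNet; initialize_model returns the same value further below
input_size = 224


# The bees-and-ants dataset is not downloaded automatically: download it from the
# link above and place it in the same directory as this code.
# os.path.join() joins path components, giving .../data_dir/train
all_imgs = datasets.ImageFolder(os.path.join(data_dir, "train"),
                                transforms.Compose([
        transforms.RandomResizedCrop(input_size),  # crop and resize each image to the 224x224 input ResNet expects
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ]))
# Split the training data into batches, giving an iterator over tensors
loader = torch.utils.data.DataLoader(all_imgs, batch_size=batch_size, shuffle=True, num_workers=4)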

In [None]:
img = next(iter(loader))[0]  # img is the image tensor of one batch

In [None]:
img.shape

In [None]:
unloader = transforms.ToPILImage()  # reconvert into PIL image
# transforms is the torchvision submodule with common image operations.
# ToPILImage() converts a tensor or array into a PIL image.
# Details of the conversion: https://blog.csdn.net/qq_37385726/article/details/81811466

plt.ion()  # turn on interactive mode; optional in a notebook
# More on interactive mode: https://blog.csdn.net/SZuoDao/article/details/52973621
#plt.ioff()

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)      # remove the fake batch dimension 
    # squeeze(0) only drops dimension 0 when its size is 1. Each img[i] passed in below
    # already has shape (3, H, W), so here it is a no-op and can be omitted.
    
    image = unloader(image)  # convert the tensor into an image
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(1) # pause a bit so that plots are updated
    # This only delays the display; it can be removed.


plt.figure()
imshow(img[8], title='Image') 
imshow(img[9], title='Image')
imshow(img[10], title='Image')

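### Batching the training and validation sets into iterators

Now that we know the input size the model expects, we can preprocess the data into the matching format.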

In [None]:
data_transforms = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    "val": transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")


# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}
# The loaders are stored in a dictionary keyed by "train" and "val", so later code can look them up by key.

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
inputs, labels = next(iter(dataloaders_dict["train"]))  # one batch
print(inputs.shape)
print(labels)

In [None]:
for inputs, labels in dataloaders_dict["train"]:
    #print(inputs)
    #print(labels)
    print(labels.size())  # the last batch has fewer than 32 samples
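
The short final batch appears whenever the dataset size is not a multiple of the batch size. The plain-Python sketch below reproduces the batch sizes a loader would yield; `batch_sizes` is a helper invented for this illustration, and the dataset size of 100 is made up.

```python
import math


def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches produced when iterating once over a dataset."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # the final, smaller batch
    return sizes


sizes = batch_sizes(100, 32)  # 100 is a made-up dataset size
print(len(sizes), sizes)      # 4 [32, 32, 32, 4]
```

`DataLoader` accepts a `drop_last=True` argument that discards the incomplete batch, which corresponds to `drop_last=True` in this sketch.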

### Loading the ResNet model and replacing the fully connected layer

In [None]:
# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "resnet"
# Number of classes in the dataset
num_classes = 2
# Number of epochs to train for 
num_epochs = 2
# Flag for feature extracting. When False, we finetune the whole model, 
#   when True we only update the reshaped layer params
feature_extract = True  # only update the layer we replaced

In [None]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False  # frozen: no gradient is computed, so the parameter is not updated

In [None]:
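def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    if model_name == "resnet":
        # If True, returns the model with parameters pretrained on ImageNet.
        # (Newer torchvision releases replace `pretrained` with a `weights` argument.)
        model_ft = models.resnet18(pretrained=use_pretrained) 
        
        set_parameter_requires_grad(model_ft, feature_extract)  # freeze the pretrained parameters when feature extracting
        #print(model_ft)  # print to inspect the architecture
        # model_ft.fc is ResNet's final fully connected layer:
        # (fc): Linear(in_features=512, out_features=1000, bias=True)
        # in_features is the input feature dimension of that layer
        num_ftrs = model_ft.fc.in_features 
        #print(num_ftrs)
        # Replace out_features=1000 with num_classes=2
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224  # resnet18 expects 224x224 inputs, as do resnet34, 50, 101, and 152
    else:
        # Only the resnet branch is implemented in this notebook
        raise ValueError("Unsupported model_name: {}".format(model_name))
        
    return model_ft, input_size
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)
print(model_ft)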

### Inspecting the parameters to update and defining the optimizer

In [None]:
next(iter(model_ft.named_parameters()))

In [None]:
len(next(iter(model_ft.named_parameters())))  # each item is a (name, parameter) tuple, so its length is 2

In [None]:
for name,param in model_ft.named_parameters():
    print(name)  # list the names of all parameters

In [None]:
# Send the model to GPU
model_ft = model_ft.to(device)

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are 
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.
params_to_update = model_ft.parameters()  # parameters to update (all of them when fine tuning)
print("Params to learn:")
if feature_extract:
    params_to_update = []  # collect the parameters to update here
    for name,param in model_ft.named_parameters(): 
        # See the cells above for what model_ft.named_parameters() yields.
        # Layers before the fully connected layer have requires_grad == False;
        # the newly added fully connected layer has requires_grad == True.
        if param.requires_grad == True: 
            params_to_update.append(param)
            print("\t",name)
else:  # otherwise, every parameter is updated
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

# Only the parameters in params_to_update are passed to the optimizer
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)  # define the optimizer
# Setup the loss fxn
criterion = nn.CrossEntropyLoss()  # define the loss function
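
`nn.CrossEntropyLoss` expects raw scores (logits) and applies the log-softmax internally, which is why the ResNet output is passed to it directly, whereas the MNIST model earlier returned `F.log_softmax` and used `F.nll_loss`. The sketch below checks that the two routes agree; the logits and labels are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)           # made-up raw scores: 4 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0])

ce = nn.CrossEntropyLoss()(logits, labels)
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))  # True
```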

### Defining the training function

In [None]:
#训练测试合一起了
def train_model(model, dataloaders, criterion, optimizer, num_epochs=5):
    since = time.time()
    val_acc_history = [] 
    best_model_wts = copy.deepcopy(model.state_dict())#深拷贝上面resnet模型参数
#.copy和.deepcopy区别看这个：https://blog.csdn.net/u011630575/article/details/78604226 
    best_acc = 0.
    for epoch in range(num_epochs):
        print("Epoch {}/{}".format(epoch, num_epochs-1))
        print("-"*10)
        
        for phase in ["train", "val"]:
            running_loss = 0.
            running_corrects = 0.
            if phase == "train":
                model.train()
            else: 
                model.eval()
            
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                with torch.autograd.set_grad_enabled(phase=="train"):
                    # set_grad_enabled is a context manager that switches gradient tracking on or off
                    # phase=="train" evaluates to True or False (note the double equals sign)
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    
                _, preds = torch.max(outputs, 1)
                # torch.max returns each row's maximum value and its index; preds holds the indices
                # equivalently: preds = outputs.argmax(dim=1)
                if phase == "train":
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    
                running_loss += loss.item() * inputs.size(0)  # the loss is a batch mean, so scale it by the batch size
                running_corrects += torch.sum(preds.view(-1) == labels.view(-1)).item()
                # .view(-1) flattens to 1-D; the length is inferred automatically
            
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects / len(dataloaders[phase].dataset)
       
            print("{} Loss: {} Acc: {}".format(phase, epoch_loss, epoch_acc))
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                # validation accuracy improved, so snapshot the updated weights
                
            if phase == "val":
                val_acc_history.append(epoch_acc)  # record the validation accuracy of each epoch
            
        print()
    
    time_elapsed = time.time() - since
    print("Training complete in {}m {}s".format(time_elapsed // 60, time_elapsed % 60))
    print("Best val Acc: {}".format(best_acc))
    
    model.load_state_dict(best_model_wts)  # load the best weights (by validation accuracy) back into the model
    return model, val_acc_history
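
Why `train_model` uses `copy.deepcopy` on the state dict can be seen in a small experiment. This is a sketch under the assumption that a single hypothetical `nn.Linear` layer is enough to show the effect; the names `alias` and `snapshot` are illustrative. The tensors returned by `state_dict()` share storage with the live parameters, so a plain reference would keep changing as training continues.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(2, 1)  # hypothetical tiny model

alias = model.state_dict()                    # tensors share storage with the model
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the weights

# Simulate a training step that changes the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

alias_follows = torch.equal(alias["weight"], model.weight)        # still tracks the model
snapshot_follows = torch.equal(snapshot["weight"], model.weight)  # frozen at the old values
print(alias_follows, snapshot_follows)
```

Without the deep copy, `best_model_wts` would simply mirror the weights from the last epoch rather than those from the best one.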

### Run the model

In [None]:
# Train and evaluate
model_ft, ohist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs)
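
The epoch loss that `train_model` prints is weighted by sample count: each batch mean is multiplied by the batch size before dividing by the dataset size. The sketch below uses plain Python and made-up per-sample losses (not results from this notebook) to show why that matters when the last batch is smaller than the rest.

```python
# Hypothetical per-sample losses: 5 samples, batch size 2, so the last batch has 1 sample.
per_sample_losses = [1.0, 3.0, 2.0, 4.0, 10.0]
batch_size = 2
batches = [per_sample_losses[i:i + batch_size]
           for i in range(0, len(per_sample_losses), batch_size)]

running_loss = 0.0
batch_means = []
for batch in batches:
    batch_mean = sum(batch) / len(batch)     # what a mean-reduced loss returns
    batch_means.append(batch_mean)
    running_loss += batch_mean * len(batch)  # mirrors loss.item() * inputs.size(0)

epoch_loss = running_loss / len(per_sample_losses)  # weighted by sample count
naive_loss = sum(batch_means) / len(batch_means)    # plain average of batch means
print(epoch_loss, naive_loss)  # 4.0 5.0
```

The weighted version equals the true mean over all samples, while the plain average of batch means overweights the small final batch.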

In [None]:
ohist

In [None]:
model_ft

In [None]:
# Initialize the non-pretrained version of the model used for this run
scratch_model,_ = initialize_model(model_name, 
                                   num_classes, 
                                   feature_extract=False,  # train all parameters
                                   use_pretrained=False)  # do not load the ImageNet weights
scratch_model = scratch_model.to(device)
scratch_optimizer = optim.SGD(scratch_model.parameters(), 
                              lr=0.001, momentum=0.9)
scratch_criterion = nn.CrossEntropyLoss()
_,scratch_hist = train_model(scratch_model, 
                             dataloaders_dict, 
                             scratch_criterion, 
                             scratch_optimizer, 
                             num_epochs=num_epochs)

In [None]:
# Plot the training curves of validation accuracy vs. number 
#  of training epochs for the transfer learning method and
#  the model trained from scratch
# ohist = []
# shist = []

# ohist = [h.cpu().numpy() for h in ohist]
# shist = [h.cpu().numpy() for h in scratch_hist]

plt.title("Validation Accuracy vs. Number of Training Epochs")
plt.xlabel("Training Epochs")
plt.ylabel("Validation Accuracy")
plt.plot(range(1,num_epochs+1),ohist,label="Pretrained")
plt.plot(range(1,num_epochs+1),scratch_hist,label="Scratch")
plt.ylim((0,1.))
plt.xticks(np.arange(1, num_epochs+1, 1.0))
plt.legend()
plt.show()