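### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then do the same with PyTorch's Autograd.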

### Implementing differentiation and backpropagation with PyTorch

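Here we implement a simple two-layer neural network for a regression problem. The loss function is the L2 loss, which the code below sums over all elements: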

$$
L2\_loss = (y_{pred}-y)^2
$$
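
To make the formula concrete, here is a minimal sketch on a hand-picked pair of tensors. The values are made up for illustration, and the sum over all elements is an assumption taken from how the training code in this section computes its loss.

```python
import torch

# Hypothetical toy values, chosen so the arithmetic is easy to follow
y_pred = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[0.0, 4.0]])

# Element-wise squared error: (1 - 0)^2 = 1 and (2 - 4)^2 = 4
squared_error = torch.square(y_pred - y)

# Summed L2 loss: 1 + 4 = 5
loss = squared_error.sum()
print(loss.item())
```

Because the loss is summed rather than averaged, its magnitude grows with the batch size and the output dimension.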

The two-layer neural network is defined as follows:
$$
y_{pred} = ReLU(XW_1)W_2
$$
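
The manual backward pass used in this section follows from applying the chain rule to this network. As a sketch, assuming the loss $L$ is summed over all entries and writing $H = XW_1$ and $A = ReLU(H)$ (names introduced here only for illustration), the gradients are:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= A^{\top} \frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial H} &= \left(\frac{\partial L}{\partial y_{pred}} W_2^{\top}\right) \odot \mathbb{1}[H > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top} \frac{\partial L}{\partial H}
\end{aligned}
$$

Here $\odot$ is element-wise multiplication, and $\mathbb{1}[H > 0]$ is 1 where the pre-activation is positive and 0 elsewhere. This is the derivative of ReLU, taken to be 0 at $H = 0$ by convention.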

In [8]:
import torch
device = torch.device('cpu')

In [14]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

# Initialize the weights W1 and W2
w1 = torch.rand((D_in, H)).to(device)
w2 = torch.rand((H, D_out)).to(device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred = relu(x @ w1) @ w2
  h = torch.matmul(x, w1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, w2)

  # Compute the loss
  loss = torch.square(y_pred - y).sum()
  print('step:', t,'  loss:' , loss.item())

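  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  y_pred_grad = 2. * (y_pred - y)                  # dL/dy_pred
  w2_grad = h_relu.T.matmul(y_pred_grad)           # dL/dW2
  h_grad = y_pred_grad.matmul(w2.T) * (h > 0.)     # dL/dh, masked by the ReLU derivative
  w1_grad = x.T.matmul(h_grad)                     # dL/dW1
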
  # Update the parameters with one gradient descent step
  w1 -= learning_rate * w1_grad
  w2 -= learning_rate * w2_grad

step: 0   loss: 169438688.0
step: 1   loss: 7283528.5
step: 2   loss: 2385691.5
step: 3   loss: 1025069.6875
step: 4   loss: 503857.65625
step: 5   loss: 281201.59375
step: 6   loss: 180641.875
step: 7   loss: 133140.0625
step: 8   loss: 109414.109375
step: 9   loss: 96563.8828125
step: 10   loss: 88776.390625
step: 11   loss: 83405.3359375
step: 12   loss: 79240.2734375
step: 13   loss: 75722.40625
step: 14   loss: 72581.8203125
step: 15   loss: 69694.78125
step: 16   loss: 66998.4375
step: 17   loss: 64459.4921875
step: 18   loss: 62057.34375
step: 19   loss: 59778.7421875
step: 20   loss: 57613.35546875
step: 21   loss: 55553.7265625
step: 22   loss: 53593.515625
step: 23   loss: 51726.6484375
step: 24   loss: 49947.49609375
step: 25   loss: 48251.23828125
step: 26   loss: 46633.66796875
step: 27   loss: 45090.2734375
step: 28   loss: 43615.72265625
step: 29   loss: 42206.4921875
step: 30   loss: 40858.8125
step: 31   loss: 39569.96875
step: 32   loss: 38337.59375
...

step: 439   loss: 659.8140869140625
step: 440   loss: 657.5277099609375
step: 441   loss: 655.2560424804688
step: 442   loss: 652.9986572265625
step: 443   loss: 650.7556762695312
step: 444   loss: 648.527587890625
step: 445   loss: 646.3128051757812
step: 446   loss: 644.11279296875
step: 447   loss: 641.9264526367188
step: 448   loss: 639.7538452148438
step: 449   loss: 637.5944213867188
step: 450   loss: 635.4494018554688
step: 451   loss: 633.3182373046875
step: 452   loss: 631.1998291015625
step: 453   loss: 629.0948486328125
step: 454   loss: 627.0020141601562
step: 455   loss: 624.9228515625
step: 456   loss: 622.856201171875
step: 457   loss: 620.802978515625
step: 458   loss: 618.7622680664062
step: 459   loss: 616.73388671875
step: 460   loss: 614.7181396484375
step: 461   loss: 612.7149047851562
step: 462   loss: 610.723388671875
step: 463   loss: 608.7442016601562
step: 464   loss: 606.7772827148438
step: 465   loss: 604.8221435546875
step: 466   loss: 602.8787841796875
...
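
One way to gain confidence in hand-written gradients like the ones above is to compare them with the gradients that Autograd computes for the same forward pass. The following is a minimal sketch of such a check. The smaller dimensions, the fixed seed, and the use of `float64` are assumptions made here to keep the comparison fast and numerically tight; they are not part of the original assignment.

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes: enough to exercise every matrix product
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, identical in structure to the training loop
h = x.matmul(w1)
h_relu = torch.relu(h)
y_pred = h_relu.matmul(w2)
loss = torch.square(y_pred - y).sum()

# Reference gradients from Autograd
loss.backward()

# Hand-written gradients, computed without recording a graph
with torch.no_grad():
    y_pred_grad = 2.0 * (y_pred - y)
    w2_grad = h_relu.T.matmul(y_pred_grad)
    h_grad = y_pred_grad.matmul(w2.T) * (h > 0)
    w1_grad = x.T.matmul(h_grad)

w1_match = torch.allclose(w1_grad, w1.grad)
w2_match = torch.allclose(w2_grad, w2.grad)
print('w1 gradients match:', w1_match)
print('w2 gradients match:', w2_match)
```

If either comparison fails, the usual suspects are a missing transpose or a ReLU mask applied to the wrong tensor.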

### Using PyTorch's Autograd

In [None]:
import torch
device = torch.device('cpu')

In [None]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in)).to(device)
y = torch.randn((N, D_out)).to(device)

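# Initialize the weights W1 and W2 directly on `device`, so that they remain
# leaf tensors whose .grad is populated by loss.backward()
w1 = torch.randn((D_in, H), device=device, requires_grad=True)
w2 = torch.randn((H, D_out), device=device, requires_grad=True)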

# Set the learning rate
learning_rate = 1e-6

# Train for 500 iterations (each one is a full-batch gradient step)
for t in range(500):
  # Forward pass: compute y_pred
  h = torch.matmul(x, w1)
  h_relu = torch.relu(h)
  y_pred = torch.matmul(h_relu, w2)

  # Compute the loss
  loss = torch.square(y_pred - y).sum()
  print(t, loss.item())

  # Backward pass: Autograd computes the gradients of the loss with respect to W1 and W2
  loss.backward()

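  # Parameter update: the update itself should not be tracked by Autograd,
  # so it is wrapped in torch.no_grad()
  with torch.no_grad():
    # Update the parameters W1 and W2 in place
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Clear the accumulated gradients (the parameters have already been updated)
    w1.grad.zero_()
    w2.grad.zero_()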

0 34531528.0
1 37563884.0
2 45605692.0
3 47743296.0
4 37549408.0
5 19983466.0
6 8218459.0
7 3358081.75
8 1769839.75
9 1200070.875
10 933849.8125
11 768861.875
12 647828.125
13 552127.8125
14 474321.3125
15 409854.5
16 355987.53125
17 310554.46875
18 271993.3125
19 239127.03125
20 210943.40625
21 186672.421875
22 165673.984375
23 147426.53125
24 131516.640625
25 117607.96875
26 105414.3125
27 94697.796875
28 85241.1640625
29 76882.578125
30 69470.8125
31 62886.90625
32 57027.1640625
33 51796.1875
34 47115.85546875
35 42919.08203125
36 39149.59375
37 35757.9609375
38 32700.904296875
39 29940.25390625
40 27445.466796875
41 25186.919921875
42 23138.40234375
43 21277.962890625
44 19586.13671875
45 18047.291015625
46 16644.80859375
47 15364.494140625
48 14194.1484375
49 13123.5673828125
50 12142.908203125
51 11244.23046875
52 10419.24609375
53 9661.384765625
54 8964.6005859375
55 8323.494140625
56 7732.97802734375
57 7188.859375
58 6686.65673828125
59 6223.0908203125
60 5794.66650390625
...

431 0.00015894130046945065
432 0.00015560245083179325
433 0.00015244056703522801
434 0.00014918063243385404
435 0.00014623792958445847
436 0.00014322339848149568
437 0.0001401646004524082
438 0.0001371508842566982
439 0.00013460413902066648
440 0.00013200324610807002
441 0.00012920792505610734
442 0.0001268077758140862
443 0.0001243362348759547
444 0.00012227214756421745
445 0.00011963630095124245
446 0.00011724793148459867
447 0.00011501400149427354
448 0.00011302570783300325
449 0.00011090389307355508
450 0.00010875487350858748
451 0.00010680056584533304
452 0.00010480251512490213
453 0.00010332858073525131
454 0.00010135072079719976
455 9.952658729162067e-05
456 9.77267773123458e-05
457 9.615623275749385e-05
458 9.459262946620584e-05
459 9.324532584287226e-05
460 9.124297503149137e-05
461 8.984180021798238e-05
462 8.840537338983268e-05
463 8.65921683725901e-05
464 8.517434616805986e-05
465 8.388777496293187e-05
466 8.267535304185003e-05
467 8.130612695822492e-05
468 8.00859561422839
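
The training loop clears `w1.grad` and `w2.grad` after every update because `backward()` adds to any gradient that is already stored, rather than overwriting it. The sketch below illustrates this on a toy tensor `w`, which is a hypothetical example and not part of the network above.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w = [2, 4]
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zeroing, a backward pass yields the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print('first:      ', first)
print('accumulated:', accumulated)
print('after reset:', after_reset)
```

Without the `zero_()` calls, each step of the training loop would therefore update the weights with the sum of all gradients seen so far instead of the current one.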