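### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, then do the same with PyTorch's Autograd.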

### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$

The two-layer neural network is defined as:
$$
y_{pred} = ReLU(XW_1)W_2
$$
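
The manual backward pass follows from the chain rule. As a sketch, write $h = XW_1$ and $a = ReLU(h)$ (intermediate names introduced here for illustration) and assume the loss is summed over all elements, as the code in this section does. The gradients are then:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= a^{\top}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial y_{pred}}\,W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

Here $\odot$ is the element-wise product and $\mathbb{1}[h > 0]$ is 1 where the hidden pre-activation is positive and 0 elsewhere.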

In [51]:
import torch
import time
device = torch.device('cuda')

In [57]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
# Initialize the weights W1 and W2
W1 = torch.randn(D_in, H, device=device)
W2 = torch.randn(H, D_out, device=device)

# Set the learning rate
lr = 1e-6
t1 = time.time()
# Run 501 iterations (t = 0..500) so that the loss at round 500 is printed
for t in range(501):
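  # Forward pass: compute y_pred.
  # Note: no ReLU is applied to h in this cell, so the network here is purely
  # linear (y_pred = x W1 W2), and the backward pass below matches that.
  h = x.mm(W1)
  y_pred = h.mm(W2)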

  # Compute the loss
  loss = (y_pred - y).pow(2).sum()
  if t % 50 == 0:
    print(f'round {t}, loss: {loss.item()}')

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h.t().mm(grad_y_pred)
  grad_h = grad_y_pred.mm(W2.t())
  grad_w1 = x.t().mm(grad_h)

  # Update the parameters
  W1 = W1 - lr * grad_w1
  W2 = W2 - lr * grad_w2
print(f'Takes {round(time.time() - t1, 2)}s to finish')

round 0, loss: 71942704.0
round 50, loss: 140.9893035888672
round 100, loss: 0.34629976749420166
round 150, loss: 0.0014960408443585038
round 200, loss: 4.160517346463166e-05
round 250, loss: 8.722498932911549e-06
round 300, loss: 3.6732722037413623e-06
round 350, loss: 2.1441210265038535e-06
round 400, loss: 1.4016502518643392e-06
round 450, loss: 9.892359003060847e-07
round 500, loss: 7.624586828569591e-07
Takes 0.33s to finish
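
The network formula above includes a ReLU, so a manual backward pass for it needs one extra step: the gradient is zeroed wherever the hidden pre-activation was negative. The following is a minimal sketch, not the cell above. The small sizes, the fixed seed, and `float64` are assumptions chosen here only to make the comparison fast and numerically tight. It checks the hand-written gradients against Autograd:

```python
import torch

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Small, hypothetical sizes chosen only to keep the check fast
N, D_in, H, D_out = 8, 20, 10, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass with ReLU: y_pred = ReLU(x W1) W2
h = x.mm(W1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(W2)
loss = (y_pred - y).pow(2).sum()

# Manual backward pass, computed without Autograd
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(W2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
    grad_w1 = x.t().mm(grad_h)

# Autograd backward pass for comparison
loss.backward()
manual_matches_autograd = (
    torch.allclose(grad_w1, W1.grad) and torch.allclose(grad_w2, W2.grad)
)
print(manual_matches_autograd)
```

Comparing against `W1.grad` and `W2.grad` in this way is a cheap sanity check for any hand-derived gradient before using it in a training loop.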


### Using PyTorch's Autograd

In [53]:
import torch
import time
device = torch.device('cpu')

In [55]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

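# Randomly generate x and y
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
# Initialize the weights W1 and W2; requires_grad=True makes Autograd track them
W1 = torch.randn(D_in, H, device=device, requires_grad=True)
W2 = torch.randn(H, D_out, device=device, requires_grad=True)

# Set the learning rate
lr = 1e-6
t1 = time.time()
# Run 501 iterations (t = 0..500) so that the loss at round 500 is printed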
for t in range(501):
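  # Forward pass: compute y_pred (clamp(min=0) is the ReLU)
  y_pred = x.mm(W1).clamp(min=0).mm(W2)

  # Compute the loss
  loss = (y_pred - y).pow(2).sum()
  if t % 50 == 0:
    print(f'round {t}, loss: {loss.item()}')

  # Backward pass: compute the gradients of the loss with respect to W1 and W2
  loss.backward()

  # Parameter update: the update itself should not be tracked by Autograd,
  # so it is wrapped in torch.no_grad()
  with torch.no_grad():
    # Update the parameters W1 and W2
    W1 -= lr * W1.grad
    W2 -= lr * W2.grad

    # Clear the stored gradients now that the parameters have been updated
    # (backward() adds to .grad rather than overwriting it)
    W1.grad.zero_()
    W2.grad.zero_()
print(f'Takes {round(time.time() - t1, 2)}s to finish')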

round 0, loss: 35622304.0
round 50, loss: 14778.615234375
round 100, loss: 373.82574462890625
round 150, loss: 13.941972732543945
round 200, loss: 0.6314912438392639
round 250, loss: 0.03267338126897812
round 300, loss: 0.002095188247039914
round 350, loss: 0.0002926356391981244
round 400, loss: 8.399760554311797e-05
round 450, loss: 3.687238859129138e-05
round 500, loss: 2.0867588318651542e-05
Takes 0.41s to finish
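
The training loop clears `W1.grad` and `W2.grad` after every update because `backward()` accumulates into `.grad` instead of replacing it. A minimal sketch of that behavior, using a hypothetical two-element tensor `w` that is not part of the assignment code:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
(w ** 2).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
grad_accumulated = w.grad.clone()

# After zeroing, a fresh backward pass gives 2w again
w.grad.zero_()
(w ** 2).sum().backward()
grad_after_zero = w.grad.clone()

print(grad_first)        # tensor([2., 4.])
print(grad_accumulated)  # tensor([4., 8.])
print(grad_after_zero)   # tensor([2., 4.])
```

Without the `zero_()` calls, each step of the loop would apply the sum of all gradients seen so far rather than the gradient of the current loss.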
