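### Assignment goal: differentiation and backpropagation with PyTorch
In this assignment we implement differentiation and backpropagation by hand, and then use PyTorch's Autograd to do the same job.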

### Implementing differentiation and backpropagation with PyTorch

Here we implement a simple two-layer neural network for a regression problem, using the L2 loss as the loss function:

$$
L2\_loss = \sum (y_{pred}-y)^2
$$

The two-layer neural network is defined as
$$
y_{pred} = ReLU(XW_1)W_2
$$
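
To make the two formulas concrete, here is a minimal sketch that evaluates the forward pass and the loss on tiny tensors. The sizes (4, 3, 5, 2) and the seed are assumptions chosen only for illustration, and the loss is summed over all elements, as in the training code of this assignment.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for reproducibility

# Tiny illustrative sizes (assumptions, not the values used in this assignment)
N, D_in, H, D_out = 4, 3, 5, 2

X = torch.randn(N, D_in)
y = torch.randn(N, D_out)
W1 = torch.randn(D_in, H)
W2 = torch.randn(H, D_out)

# Forward pass: y_pred = ReLU(X W1) W2
h = X @ W1                  # shape (N, H)
h_relu = torch.relu(h)      # negative entries are clamped to 0
y_pred = h_relu @ W2        # shape (N, D_out)

# L2 loss, summed over every element of the batch
loss = ((y_pred - y) ** 2).sum()

print(y_pred.shape, loss.item())
```

The loss is a single scalar, which is what later allows it to be differentiated with respect to both weight matrices.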

In [7]:
import torch
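# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')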

In [35]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly generate x and y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1 and W2. The gradients are computed by hand below,
# so requires_grad is not needed in this version.
W1 = torch.randn((D_in, H), device=device)
W2 = torch.randn((H, D_out), device=device)

# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    h = torch.matmul(x, W1)
    h_relu = torch.relu(h)
    y_pred = torch.matmul(h_relu, W2)

    # Compute the loss
    loss = torch.square(y_pred - y).sum()
    if t % 50 == 0:
        print(t, loss.item())

    # Backward pass: compute the gradients of the loss with respect to W1 and W2
    y_pred_grad = 2 * (y_pred - y)
    W2_grad = h_relu.T.matmul(y_pred_grad)
    h_grad = y_pred_grad.matmul(W2.T) * (h > 0)  # ReLU passes gradient only where h > 0
    W1_grad = x.T.matmul(h_grad)

    # Update the parameters
    W1 -= learning_rate * W1_grad
    W2 -= learning_rate * W2_grad

0 41974892.0
50 15187.625
100 756.4373168945312
150 63.35027313232422
200 6.909194469451904
250 0.9250452518463135
300 0.1443679928779602
350 0.02511386200785637
400 0.004950829781591892
450 0.001248681452125311
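
The hand-written backward pass above follows from the chain rule. As a sketch, writing $h = XW_1$ and $h_{relu} = ReLU(h)$ (names taken from the code), and assuming the loss is summed over all elements, the gradients are:

$$
\begin{aligned}
\frac{\partial L}{\partial y_{pred}} &= 2\,(y_{pred} - y) \\
\frac{\partial L}{\partial W_2} &= h_{relu}^{T}\,\frac{\partial L}{\partial y_{pred}} \\
\frac{\partial L}{\partial h} &= \left(\frac{\partial L}{\partial y_{pred}}\,W_2^{T}\right) \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{T}\,\frac{\partial L}{\partial h}
\end{aligned}
$$

Here $\odot$ is the element-wise product and $\mathbb{1}[h > 0]$ is the derivative of ReLU, which corresponds to the `(h > 0)` mask in the code.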


### Using PyTorch's Autograd
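
Before the full training loop, here is a minimal sketch of what Autograd does on a single tensor. The tensor `w` and the function being differentiated are hypothetical examples, not part of the assignment. The snippet shows that `backward()` stores the gradient in `.grad` and that repeated calls accumulate, which is why the training loop that follows zeroes the gradients after each update.

```python
import torch

# Hypothetical example tensor; requires_grad=True asks autograd to track it
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# loss = sum(w^2), so d(loss)/dw = 2 * w
loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()      # gradient after one backward pass

# A second backward pass adds to .grad instead of overwriting it
loss = (w ** 2).sum()
loss.backward()
second = w.grad.clone()     # twice the first gradient

# Reset the stored gradient in place, as the training loop does
w.grad.zero_()

print(first, second, w.grad)
```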

In [12]:
import torch
device = torch.device('cpu')

In [25]:
# N: batch size
# D_in: input dimension
# H: hidden dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

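# Randomly generate x and y
x = torch.randn((N, D_in), device=device)
y = torch.randn((N, D_out), device=device)

# Initialize the weights W1 and W2 directly on the device so they stay leaf tensors.
# Calling .to(device) afterwards can return a non-leaf copy whose .grad is never filled in.
W1 = torch.randn((D_in, H), device=device, requires_grad=True)
W2 = torch.randn((H, D_out), device=device, requires_grad=True)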

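# Set the learning rate
learning_rate = 1e-6

# Train for 500 epochs
for t in range(500):
    # Forward pass: compute y_pred
    h = torch.matmul(x, W1)
    h_relu = torch.relu(h)
    y_pred = torch.matmul(h_relu, W2)

    # Compute the loss
    loss = torch.square(y_pred - y).sum()
    if t % 50 == 0:
        print(t, loss.item())

    # Backward pass: autograd computes the gradients of the loss with respect to W1 and W2
    loss.backward()

    # Parameter update: the update itself should not be tracked by autograd,
    # so it runs inside torch.no_grad()
    with torch.no_grad():
        # Update the parameters W1 and W2
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad

        # Clear the stored gradients now that the parameters are updated;
        # otherwise the next backward() call would accumulate onto them
        W1.grad.zero_()
        W2.grad.zero_()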

0 25384656.0
50 7066.51611328125
100 159.8881378173828
150 6.105778694152832
200 0.28546658158302307
250 0.014956336468458176
300 0.0010549200233072042
350 0.0001698558626230806
400 5.5406046158168465e-05
450 2.6562072889646515e-05
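
The two training loops should produce the same gradients, since Autograd applies the same chain rule that was written out by hand. The following sketch checks this on a single forward pass. The small sizes, the seed, and the use of `float64` are assumptions made to keep the comparison fast and numerically tight.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for reproducibility

# Small illustrative sizes (assumptions); float64 keeps rounding error low
N, D_in, H, D_out = 8, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
W1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass and loss, same as in the training loops above
h = x @ W1
h_relu = torch.relu(h)
y_pred = h_relu @ W2
loss = torch.square(y_pred - y).sum()

# Gradients from autograd
loss.backward()

# Gradients from the hand-written chain rule
with torch.no_grad():
    y_pred_grad = 2 * (y_pred - y)
    W2_grad = h_relu.T @ y_pred_grad
    h_grad = (y_pred_grad @ W2.T) * (h > 0)
    W1_grad = x.T @ h_grad

print(torch.allclose(W1.grad, W1_grad), torch.allclose(W2.grad, W2_grad))
```

If both comparisons come out `True`, the manual derivation and Autograd agree for this network.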
