<a href="https://colab.research.google.com/github/twwhatever/cs101/blob/master/ml/backprop/pytorch/Backprop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Backprop example

End-to-end illustration of backpropagation using PyTorch.


In [None]:
import torch

We will use the XOR dataset as an example.

In [None]:
x = torch.tensor(
    [
      [0, 0],
      [0, 1],
      [1, 0],
      [1, 1],
    ],
    dtype=torch.float32,
)
y = torch.tensor(
    [
      0,
      1,
      1,
      0,
    ],
    dtype=torch.float32,
)

We'll set up a two-layer network.  PyTorch defines a bunch of ready-made layers and architectures, but for this example we'll explicitly build each of the parameters.

For exposition purposes, we're setting the weights to a known-good initial point.

In [None]:
# Weights for layer 1
w1 = torch.tensor([[0.1, 0.3], [0.2, 0.1]], requires_grad=True)
# bias for layer 1
b1 = torch.tensor([0.2, -0.2], requires_grad=True)
# Weights for layer 2
w2 = torch.tensor([[0.1], [-0.4]], requires_grad=True)
# bias for layer 2
b2 = torch.tensor([0.0], requires_grad=True)

The first thing we need to do is compute the forward pass through the network and the loss.  Here we run only the first example, `x[0]`, and for simplicity we use the squared error on that example (the per-example term of MSE) as the loss.

In [None]:
o1 = torch.relu(torch.matmul(x[0], w1) + b1)
print(o1)
o2 = torch.sigmoid(torch.matmul(o1, w2) + b2)
print(o2)
# Squared-error loss on the first example
loss = (y[0] - o2) ** 2
print(loss)

tensor([0.2000, 0.0000], grad_fn=<ReluBackward0>)
tensor([0.5050], grad_fn=<SigmoidBackward>)
tensor([0.2550], grad_fn=<PowBackward0>)
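
As a sanity check, the same forward pass can be reproduced without PyTorch. The sketch below is not part of the original notebook: it uses plain Python lists, and the names `w1_vals`, `b1_vals`, `w2_vals`, `b2_val`, `x0` and `y0` are hypothetical stand-ins chosen so they do not overwrite the tensors above. The results should agree with the printed tensors to about four decimal places.

```python
import math

# Parameter values copied from the cells above (illustrative names, not the tensors).
w1_vals = [[0.1, 0.3], [0.2, 0.1]]
b1_vals = [0.2, -0.2]
w2_vals = [0.1, -0.4]   # the (2, 1) weight matrix, flattened
b2_val = 0.0

x0, y0 = [0.0, 0.0], 0.0  # first XOR example and its label

# Layer 1: z1[j] = sum_i x0[i] * w1[i][j] + b1[j], followed by ReLU
z1 = [sum(x0[i] * w1_vals[i][j] for i in range(2)) + b1_vals[j] for j in range(2)]
o1_vals = [max(0.0, z) for z in z1]

# Layer 2: a single sigmoid unit
z2 = sum(o1_vals[j] * w2_vals[j] for j in range(2)) + b2_val
o2_val = 1.0 / (1.0 + math.exp(-z2))

# Squared error on this one example
loss_val = (y0 - o2_val) ** 2
print(o1_vals, round(o2_val, 4), round(loss_val, 4))
```

Because the input is `[0, 0]`, the first layer's pre-activations are just the biases, which is why the second hidden unit (bias `-0.2`) is clipped to zero by the ReLU.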


The forward pass records the operations applied to the parameters so that gradients can be computed later.  For example, you can see the `.grad_fn` attribute in the tensors above.

Now we use the loss to compute the backward pass.

In [None]:
loss.backward()
print(w2.grad, b2.grad)
print(w1.grad, b1.grad)

tensor([[0.0505],
        [0.0000]]) tensor([0.2525])
tensor([[0., 0.],
        [0., 0.]]) tensor([0.0252, 0.0000])


The backward pass has computed the partial derivatives of the loss with respect to each parameter and stored the results in the `.grad` attribute for each parameter.  
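
To see where these numbers come from, the chain rule can be applied by hand for this one example. The following is a minimal sketch in plain Python, not the notebook's own code; the variable names (`dL_dz2`, `grad_w2_hand`, and so on) are hypothetical, and the forward values are recomputed inline so the block runs on its own.

```python
import math

# Forward pass on the first example (x = [0, 0], y = 0), recomputed in plain Python.
x0, y0 = [0.0, 0.0], 0.0
w2_vals = [0.1, -0.4]
z1 = [0.2, -0.2]                                  # layer-1 pre-activations (equal to b1, since x = 0)
o1_vals = [max(0.0, z) for z in z1]               # ReLU
z2 = sum(o * w for o, w in zip(o1_vals, w2_vals)) # b2 is 0
o2_val = 1.0 / (1.0 + math.exp(-z2))              # sigmoid

# Chain rule, from the loss back to the parameters.
dL_do2 = -2.0 * (y0 - o2_val)                     # derivative of (y - o2)^2 w.r.t. o2
dL_dz2 = dL_do2 * o2_val * (1.0 - o2_val)         # sigmoid'(z) = s(z) * (1 - s(z))

grad_b2_hand = dL_dz2
grad_w2_hand = [dL_dz2 * o for o in o1_vals]

# ReLU passes the gradient only where its input was positive.
dL_dz1 = [dL_dz2 * w2_vals[j] * (1.0 if z1[j] > 0 else 0.0) for j in range(2)]
grad_b1_hand = dL_dz1
grad_w1_hand = [[x0[i] * dL_dz1[j] for j in range(2)] for i in range(2)]

print(grad_w2_hand, grad_b2_hand)
print(grad_w1_hand, grad_b1_hand)
```

Two of the zeros in the printed gradients have different causes: `w1.grad` is zero because the input `x[0]` is all zeros, while the second entries of `w2.grad` and `b1.grad` are zero because the second hidden unit was inactive under the ReLU.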

Now we just need to perform the gradient step.

In [None]:
# learning rate
n = 0.01
# We don't want to track updates from gradient descent!
with torch.no_grad():
  w1 -= n * w1.grad
  b1 -= n * b1.grad
  print(w1, b1)
  w2 -= n * w2.grad
  b2 -= n * b2.grad
  print(w2, b2)

tensor([[0.1000, 0.3000],
        [0.2000, 0.1000]], requires_grad=True) tensor([ 0.1997, -0.2000], requires_grad=True)
tensor([[ 0.0995],
        [-0.4000]], requires_grad=True) tensor([-0.0025], requires_grad=True)
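
The printed values can be checked with the update rule `parameter - learning_rate * gradient`. This is a small arithmetic sketch using the rounded values shown in the outputs above; the names `lr`, `b1_old`, `w2_old`, `b2_old` and the `*_new` variables are hypothetical and exist only for this check.

```python
lr = 0.01  # the learning rate, called n in the notebook

# Parameters before the step and their gradients, as printed above (rounded to four decimals).
b1_old, b1_grad = [0.2, -0.2], [0.0252, 0.0]
w2_old, w2_grad = [0.1, -0.4], [0.0505, 0.0]
b2_old, b2_grad = 0.0, 0.2525

# One gradient-descent step per parameter.
b1_new = [p - lr * g for p, g in zip(b1_old, b1_grad)]
w2_new = [p - lr * g for p, g in zip(w2_old, w2_grad)]
b2_new = b2_old - lr * b2_grad

print([round(v, 4) for v in b1_new])
print([round(v, 4) for v in w2_new], round(b2_new, 4))
```

`w1` is left out because its gradient is zero for this example, so the step leaves it unchanged.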


Now that we've gone through the basics, we'll write some convenience functions that allow us to illustrate a full training run.

In [None]:
# Forward pass
def forward(x):
  ol1 = torch.relu(torch.matmul(x, w1) + b1)
  ol2 = torch.sigmoid(torch.matmul(ol1, w2) + b2)
  return ol2

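# Gradient update: plain gradient descent with the global learning rate n
def step(ts):
  with torch.no_grad():
    for t in ts:
      t -= n * t.grad

# Zero gradients: backward() accumulates into .grad, so reset before each pass
def zero_grad(ts):
  for t in ts:
    if t.grad is not None:
      t.grad.zero_()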


In [None]:
EPOCHS = 5000

for epoch in range(EPOCHS):
  # "batch size" of 1
  for i in range(4):
    zero_grad([w1, b1, w2, b2])
    loss = (y[i] - forward(x[i])) ** 2
    loss.backward()
    step([w1, b1, w2, b2])
  # report loss on full dataset every 1000 epochs
  if epoch % 1000 == 0:
    loss = torch.mean((y - forward(x).squeeze()) ** 2)
    print(f"EPOCH {epoch + 1} complete, loss {loss.item()}")
print(w1, b1)
print(w2, b2)
for i in range(4):
  print(f"{y[i], forward(x[i])}")

EPOCH 1 complete, loss 0.24752041697502136
EPOCH 1001 complete, loss 0.2040688395500183
EPOCH 2001 complete, loss 0.13945071399211884
EPOCH 3001 complete, loss 0.05831518396735191
EPOCH 4001 complete, loss 0.025512760505080223
tensor([[1.6383, 2.2101],
        [1.6393, 2.2098]], requires_grad=True) tensor([-5.1987e-04, -2.2097e+00], requires_grad=True)
tensor([[ 2.2965],
        [-3.8227]], requires_grad=True) tensor([-1.5349], requires_grad=True)
(tensor(0.), tensor([0.1773], grad_fn=<SigmoidBackward>))
(tensor(1.), tensor([0.9028], grad_fn=<SigmoidBackward>))
(tensor(1.), tensor([0.9025], grad_fn=<SigmoidBackward>))
(tensor(0.), tensor([0.0789], grad_fn=<SigmoidBackward>))
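
One way to read the result is to plug the learned parameters back into the network by hand and threshold the outputs. The sketch below is an illustration in plain Python rather than part of the notebook: the parameter values are copied from the rounded printout above, `forward_plain` is a hypothetical re-implementation of `forward`, and the 0.5 decision threshold is an assumption.

```python
import math

# Learned parameters, copied from the printed tensors above (rounded, so results are approximate).
w1_learned = [[1.6383, 2.2101], [1.6393, 2.2098]]
b1_learned = [-5.1987e-04, -2.2097]
w2_learned = [2.2965, -3.8227]
b2_learned = -1.5349

def forward_plain(x):
    """Plain-Python version of the two-layer forward pass for a single input."""
    hidden = [
        max(0.0, sum(x[i] * w1_learned[i][j] for i in range(2)) + b1_learned[j])
        for j in range(2)
    ]
    z = sum(hidden[j] * w2_learned[j] for j in range(2)) + b2_learned
    return 1.0 / (1.0 + math.exp(-z))

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 0]
THRESHOLD = 0.5  # assumed decision threshold, not used anywhere in the notebook

outputs = [forward_plain(x) for x in inputs]
predictions = [int(o > THRESHOLD) for o in outputs]
for x, t, o, p in zip(inputs, targets, outputs, predictions):
    print(x, t, round(o, 4), p)
```

With these weights the second hidden unit only becomes strongly active when both inputs are 1, and its negative output weight pulls the prediction for `[1, 1]` back toward 0.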
