## Computer Vision Project 2, Task 2: Backpropagation
In this task, you will implement the following to train an MLP:
1. forward pass
2. backward pass
3. weights update

The MLP has an input layer, one hidden layer, and one output layer.

The input, hidden, and output layers have 784, 128, and 10 nodes, respectively.

You may use only the given sigmoid function as the activation function.

You cannot use library functions except:
* torch.add
* torch.mul
* torch.transpose
* torch.mm
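
As a rough illustration of how far the four permitted functions go, the sketch below builds one affine layer and a transpose from them. The initialization values, the `0.1` scale factor, and the variable names are assumptions made for the example, not part of the assignment.

```python
import torch

torch.manual_seed(0)

# Layer sizes taken from the task description.
D_in, H = 784, 128

x = torch.rand(1, D_in)            # one flattened image as a row vector
w1 = torch.randn(D_in, H) * 0.01   # illustrative initialization only
b1 = torch.zeros(1, H)

# Affine map z1 = x @ w1 + b1, written with torch.mm and torch.add only.
z1 = torch.add(torch.mm(x, w1), b1)

# Multiplying by a constant (for example a learning rate) with torch.mul.
scaled = torch.mul(z1, 0.1)

# Transposing a row vector, as the weight gradients require: (1, 784) -> (784, 1).
x_t = torch.transpose(x, 0, 1)

print(z1.shape, scaled.shape, x_t.shape)
```

Treating each sample as a `(1, n)` row vector keeps every operation a plain matrix product, so no reshaping helpers are needed.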

In [39]:
import torch
from torchvision import transforms, datasets
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np

print(torch.__version__)

torch.manual_seed(77)

1.8.1+cu101


<torch._C.Generator at 0x7f76bcbf6d90>

In [40]:
def sigmoid(x):
  return torch.div(torch.tensor(1.0), torch.add(torch.tensor(1.0), torch.exp(torch.negative(x))))

def sigmoid_prime(x):
  return torch.mul(sigmoid(x), torch.subtract(torch.tensor(1.0), sigmoid(x)))
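
A quick way to convince yourself that `sigmoid_prime` is the derivative of `sigmoid` is a central finite-difference check. The sketch below repeats the two given functions so it runs on its own; the test points, the step size `eps`, and the use of `float64` are choices made for this example. It also shows that the argument is expected to be the pre-activation `z`, not the activation `sigmoid(z)`.

```python
import torch

def sigmoid(x):
    return torch.div(torch.tensor(1.0), torch.add(torch.tensor(1.0), torch.exp(torch.negative(x))))

def sigmoid_prime(x):
    return torch.mul(sigmoid(x), torch.subtract(torch.tensor(1.0), sigmoid(x)))

# Pre-activation values to test; float64 keeps the finite difference accurate.
z = torch.linspace(-3.0, 3.0, 7, dtype=torch.float64)
eps = 1e-5

# Central difference approximation of d sigmoid / dz.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_prime(z)
max_err = (numeric - analytic).abs().max().item()
print("max abs error:", max_err)

# Feeding the activation instead of the pre-activation gives a different value.
wrong = sigmoid_prime(sigmoid(z))
print("same as analytic:", torch.allclose(wrong, analytic))
```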

In [41]:
train_MNIST = datasets.MNIST("MNIST_data/", train=True, transform=transforms.ToTensor(), download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_MNIST,
                                          shuffle=True,
                                          drop_last=True)
dtype = torch.float32
D_in, H, D_out = 784, 128, 10

Refer to the following equations to implement the forward pass:

$$ z_1 = W_1 x + b_1 $$
$$ a_1 = \sigma(z_1) $$
$$ z_2 = W_2 a_1 + b_2 $$
$$ a_2 = \sigma(z_2) $$
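
The backward pass follows from the chain rule applied to these equations, with the hidden activation $a_1$ as the input to the output layer. The sketch below assumes a squared-error loss $L = \frac{1}{2}\lVert a_2 - y \rVert^2$ with one-hot target $y$ and learning rate $\eta$. The task text does not name the loss, so this is an assumption, although it is consistent with the `diff = a2 - y_onehot` term in the training cell.

$$ \delta_2 = (a_2 - y) \odot \sigma'(z_2) $$
$$ \frac{\partial L}{\partial W_2} = \delta_2 a_1^\top, \qquad \frac{\partial L}{\partial b_2} = \delta_2 $$
$$ \delta_1 = (W_2^\top \delta_2) \odot \sigma'(z_1) $$
$$ \frac{\partial L}{\partial W_1} = \delta_1 x^\top, \qquad \frac{\partial L}{\partial b_1} = \delta_1 $$
$$ W_k \leftarrow W_k - \eta \frac{\partial L}{\partial W_k}, \qquad b_k \leftarrow b_k - \eta \frac{\partial L}{\partial b_k}, \qquad k \in \{1, 2\} $$

Here $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ is evaluated at the pre-activations $z_1$ and $z_2$. These formulas use column vectors; with row vectors, as in the code, the products are transposed (for example `torch.mm(torch.transpose(a1, 0, 1), d_z2)` for the gradient of the second weight matrix).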


In [49]:

# Weights and bias of the input-to-hidden layer: standard normal draws scaled by sqrt(1 / fan_in).
# Gradients are computed by hand below, so autograd (Variable, requires_grad) is not needed.
w1 = torch.randn(D_in, H, dtype=dtype) * np.sqrt(1. / D_in)
b1 = torch.randn(1, H, dtype=dtype) * np.sqrt(1. / D_in)

# Weights and bias of the hidden-to-output layer
w2 = torch.randn(H, D_out, dtype=dtype) * np.sqrt(1. / H)
b2 = torch.randn(1, D_out, dtype=dtype) * np.sqrt(1. / H)

learning_rate = 0.1

for epoch in range(5): 
  corrects = 0
  for i, data in enumerate(train_loader):
    x, y = data
    x = x.reshape((1,-1))
    y_onehot = torch.zeros((1,10))
    y_onehot[0,y] += 1


    ############################################################################
    # TODO: Implement the forward pass for the two-layer net                   #
    #                                                                          #
    ############################################################################
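    # Center the input by subtracting its mean pixel value
    a, b = x.shape
    x = (x - (x.sum()/(a*b)))
    # Hidden layer (row-vector convention: x @ w1 + b1)
    z1 = torch.add(torch.mm(x, w1), b1)
    a1 = sigmoid(z1)
    # Output layer, fed by the hidden activation a1
    z2 = torch.add(torch.mm(a1, w2), b2)
    a2 = sigmoid(z2)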

    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################
    
    diff = a2 - y_onehot
    
    ############################################################################
    # TODO: Implement the backward pass for the two-layer net and update the   #
    # parameters                                                               #
    ############################################################################

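    # backward pass
    # sigmoid_prime applies sigmoid internally, so it takes the pre-activation z2, not a2
    d_z2 = torch.mul(diff, sigmoid_prime(z2))
    d_b2 = 1.0 * d_z2
    d_w2 = torch.mm(torch.transpose(a1, 0, 1), d_z2)
    d_a1 = torch.mm(d_z2, torch.transpose(w2, 0, 1))

    # same for the hidden layer: differentiate at the pre-activation z1
    d_z1 = torch.mul(d_a1, sigmoid_prime(z1))
    d_b1 = 1.0 * d_z1
    d_w1 = torch.mm(torch.transpose(x, 0, 1), d_z1)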

    # weight update
    w1 -= learning_rate * d_w1
    b1 -= learning_rate * d_b1
    w2 -= learning_rate * d_w2
    b2 -= learning_rate * d_b2

    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################
    
    if torch.argmax(a2) == y:
      corrects += 1

    if i % 10000 == 0:
      print("Epoch {}: {}/{}".format(epoch+1, i, len(train_MNIST)))
      
  print("Epoch {}, Accuracy: {:.3f}".format(epoch+1, corrects/len(train_MNIST))) 



Epoch 1: 0/60000
Epoch 1: 10000/60000
Epoch 1: 20000/60000
Epoch 1: 30000/60000
Epoch 1: 40000/60000
Epoch 1: 50000/60000
Epoch 1, Accuracy: 0.887
Epoch 2: 0/60000
Epoch 2: 10000/60000
Epoch 2: 20000/60000
Epoch 2: 30000/60000
Epoch 2: 40000/60000
Epoch 2: 50000/60000
Epoch 2, Accuracy: 0.914
Epoch 3: 0/60000
Epoch 3: 10000/60000
Epoch 3: 20000/60000
Epoch 3: 30000/60000
Epoch 3: 40000/60000
Epoch 3: 50000/60000
Epoch 3, Accuracy: 0.921
Epoch 4: 0/60000
Epoch 4: 10000/60000
Epoch 4: 20000/60000
Epoch 4: 30000/60000
Epoch 4: 40000/60000
Epoch 4: 50000/60000
Epoch 4, Accuracy: 0.926
Epoch 5: 0/60000
Epoch 5: 10000/60000
Epoch 5: 20000/60000
Epoch 5: 30000/60000
Epoch 5: 40000/60000
Epoch 5: 50000/60000
Epoch 5, Accuracy: 0.930
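
One way to sanity-check a hand-written backward pass is to compare it with autograd on a tiny network. The sketch below is only a check, not a solution to the task: it uses operators outside the permitted list, small made-up layer sizes, random data, and an assumed squared-error loss `0.5 * sum((a2 - y) ** 2)`, which is the loss whose output gradient equals `a2 - y`.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64  # double precision makes the comparison tight

def sigmoid(x):
    return 1.0 / (1.0 + torch.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Small hypothetical sizes so the check runs instantly.
D_in, H, D_out = 6, 4, 3
x = torch.randn(1, D_in, dtype=dtype)
y = torch.zeros(1, D_out, dtype=dtype)
y[0, 1] = 1.0  # arbitrary one-hot target

w1 = torch.randn(D_in, H, dtype=dtype, requires_grad=True)
b1 = torch.randn(1, H, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=dtype, requires_grad=True)
b2 = torch.randn(1, D_out, dtype=dtype, requires_grad=True)

# Forward pass (row-vector convention).
z1 = torch.mm(x, w1) + b1
a1 = sigmoid(z1)
z2 = torch.mm(a1, w2) + b2
a2 = sigmoid(z2)

# Assumed loss; autograd fills in .grad for each parameter.
loss = 0.5 * ((a2 - y) ** 2).sum()
loss.backward()

# Manual backward pass, differentiating at the pre-activations z2 and z1.
with torch.no_grad():
    d_z2 = (a2 - y) * sigmoid_prime(z2)
    d_w2 = torch.mm(a1.t(), d_z2)
    d_b2 = d_z2
    d_a1 = torch.mm(d_z2, w2.t())
    d_z1 = d_a1 * sigmoid_prime(z1)
    d_w1 = torch.mm(x.t(), d_z1)
    d_b1 = d_z1

for name, manual, param in [("w1", d_w1, w1), ("b1", d_b1, b1),
                            ("w2", d_w2, w2), ("b2", d_b2, b2)]:
    print(name, "matches autograd:", torch.allclose(manual, param.grad))
```

If any line reports a mismatch, the usual suspects are a missing transpose or a derivative evaluated at the activation instead of the pre-activation.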
