## Back Propagation

In part 3, we simplified the calculation of the grad considerably by using the SGD method.  
However, one thing still has to be derived by hand: the gradient term inside the update rule below:
$$
\omega = \omega - \alpha  \cdot x_n \cdot (x_n \cdot \omega - y_n)
$$
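As a reminder of what "by hand" means here, the derivation can be written out with the chain rule. This is a sketch that assumes the squared-error loss $(x_n \cdot \omega - y_n)^2$ used later in this part; the constant factor 2 does not appear in the update rule above, which is equivalent to rescaling $\alpha$:
$$
\frac{\partial}{\partial \omega}(x_n \cdot \omega - y_n)^2 = 2 \cdot (x_n \cdot \omega - y_n) \cdot \frac{\partial}{\partial \omega}(x_n \cdot \omega - y_n) = 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)
$$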
In practice, however, a network has many layers, multiple inputs/outputs and non-linear functions, which makes deriving these kinds of gradients by hand impractical. An automatic method is therefore needed, and this part implements one.  

To do so, we build a so-called **computation graph** and let it calculate the grad.  
We use the tensor type from PyTorch. Note that whenever you compute with a tensor that has `requires_grad=True`, you are actually building a computation graph.  
<center>!!! A tensor stores both the data and the grad !!!</center>
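To make the "data and grad" point concrete, here is a minimal sketch (the variable names `w` and `out` are only illustrative) showing what a tensor holds before and after a backward pass:

```python
import torch

# A leaf tensor that asks autograd to track operations on it
w = torch.tensor([1.0], requires_grad=True)

# Any operation on w produces a new tensor that remembers how it was created
out = 2.0 * w

data_value = w.data.item()        # the stored value
grad_before = w.grad              # no grad has been computed yet
has_graph = out.grad_fn is not None  # out is a node in the computation graph

# Walk the graph backwards; d(out)/d(w) = 2 is written into w.grad
out.backward()
grad_after = w.grad.item()

print(data_value, grad_before, has_graph, grad_after)
```

The grad lives on the leaf tensor `w`, not on `out`, which is why the training loop later reads `weight.grad`.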

In [7]:
import numpy as np
import torch

### Step 1: Define the model

In [8]:
# Define our linear model in here:
class BackPropagationModel:
    def __init__(self, weight):
        self.weight = weight

    def forward(self, input_x):
        # Note: if the weight is a tensor, the return value of forward
        # is a tensor as well, and evaluating this expression extends
        # the computation graph
        return input_x * self.weight

    def loss_function(self, input_x, true_output_y):
        output_prediction = self.forward(input_x)
        return (output_prediction - true_output_y) ** 2

    # no need to define a grad method here, PyTorch will calculate the grad automatically

# Training set
x = np.arange(1.0, 10.0, 1.0)
y = 3 * x

# We need the grad with respect to the weight, therefore set requires_grad=True
weight = torch.tensor([1.0], requires_grad=True)
lr = 0.01
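As a sanity check, the grad that PyTorch computes can be compared with the hand-derived expression `2 * x * (x * w - y)` for the squared-error loss. The sketch below assumes the same training set and initial weight as above; the names `w` and `results` are introduced only for this illustration:

```python
import numpy as np
import torch

x = np.arange(1.0, 10.0, 1.0)
y = 3 * x
w = torch.tensor([1.0], requires_grad=True)

results = []
for x_n, y_n in zip(x, y):
    x_n, y_n = float(x_n), float(y_n)

    # autograd: build the graph for one sample and differentiate it
    loss = (x_n * w - y_n) ** 2
    loss.backward()
    auto_grad = w.grad.item()

    # by hand: chain rule applied to (x * w - y) ** 2
    hand_grad = 2 * x_n * (x_n * w.item() - y_n)

    results.append((auto_grad, hand_grad))
    w.grad.zero_()  # reset, otherwise the next sample's grad is added on top

for auto_grad, hand_grad in results:
    print(auto_grad, hand_grad)
```

The weight is deliberately not updated here, so every sample is evaluated at the same initial weight.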

Let's have a look at the initial weight, which is a tensor.  
For a single-element tensor, you can always use *.item()* to read its value as a plain Python number.

In [9]:
print('Before starting, the weight is: ', weight.data.item())

Before starting, the weight is:  1.0


### Step 2: Training
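Each training iteration in this step goes through forward, backward, update and reset. The sketch below runs a single iteration in isolation on a hypothetical sample (`x_n = 2.0`, `y_n = 6.0`) to show why the reset matters. It also uses `torch.no_grad()` for the update, which is an alternative to going through `.data` and not the method used in this notebook:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
lr = 0.01
x_n, y_n = 2.0, 6.0  # hypothetical sample with y = 3 * x

# forward + backward: the grad is written into w.grad
loss = (x_n * w - y_n) ** 2
loss.backward()
first_grad = w.grad.item()

# a second backward without a reset ADDS to the existing grad
loss = (x_n * w - y_n) ** 2
loss.backward()
accumulated_grad = w.grad.item()

# update outside of autograd tracking, using the grad of a single pass
with torch.no_grad():
    w -= lr * first_grad

# reset so the next iteration starts from a clean grad
w.grad.zero_()

print(first_grad, accumulated_grad, w.item(), w.grad.item())
```

Without the reset, the second grad is twice the correct value, so the weight update would be twice as large as intended.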

In [10]:
for epoch in range(100):
    for training_x, training_y in zip(x, y):
        
        model = BackPropagationModel(weight)
        
        # forward to get the loss
        loss = model.loss_function(training_x, training_y)
        
        # backward pass to compute the grad of the loss with respect to the weight
        loss.backward()
        print('\tGrad = ', weight.grad.item(), '\t| Input = ', training_x, '\t| Output = ', model.forward(training_x).item())
        
        # update the weight through .data, so the update itself is not recorded in the graph
        weight.data -= lr * weight.grad.data
        
        # the grad computed by .backward() is accumulated -> therefore always reset it after the update
        weight.grad.data.zero_()
        
    print('Epoch: ', epoch, '\t | Weight = ', weight.data.item(), '\t | Loss = ', loss.item())
print('After training, the weight is: ', weight.data.item())

	Grad =  -4.0 	| Input =  1.0 	| Output =  1.0
	Grad =  -15.680000305175781 	| Input =  2.0 	| Output =  2.0799999237060547
	Grad =  -32.457603454589844 	| Input =  3.0 	| Output =  3.590399980545044
	Grad =  -47.31596755981445 	| Input =  4.0 	| Output =  6.085504055023193
	Grad =  -50.273216247558594 	| Input =  5.0 	| Output =  9.972678184509277
	Grad =  -36.19672393798828 	| Input =  6.0 	| Output =  14.983606338500977
	Grad =  -13.794975280761719 	| Input =  7.0 	| Output =  20.014644622802734
	Grad =  -0.3603515625 	| Input =  8.0 	| Output =  23.97747802734375
	Grad =  0.12768173217773438 	| Input =  9.0 	| Output =  27.00709342956543
Epoch:  0 	 | Weight =  2.999511480331421 	 | Loss =  5.0316742999712005e-05
	Grad =  -0.0009770393371582031 	| Input =  1.0 	| Output =  2.999511480331421
	Grad =  -0.0038299560546875 	| Input =  2.0 	| Output =  5.999042510986328
	Grad =  -0.007925033569335938 	| Input =  3.0 	| Output =  8.998679161071777
	Grad =  -0.01155853271484375 	| Input =