## Stochastic Gradient Descent

In the previous part, the gradient descent algorithm was developed to find the optimal weight.
However, it is not always practical to compute the gradient over all the samples (more specifically, to sum up all the per-sample gradients and average them).
We therefore simplify the update by using the gradient of a single sample at a time. This method is known as **stochastic gradient descent (SGD)** and is the subject of this part.
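
To make the difference concrete, here is a minimal sketch that computes both kinds of gradient for the same starting weight. It assumes the toy data `y = 3 * x` and the squared-error loss used in the code of this part; the helper name `sample_grad` is hypothetical and chosen only for this illustration.

```python
import numpy as np

# Toy data, assumed to match the training set used in this part: y = 3 * x
x = np.arange(1.0, 10.0, 1.0)
y = 3 * x
weight = 1.0

def sample_grad(w, x_i, y_i):
    # Hypothetical helper: gradient of (x_i * w - y_i) ** 2 with respect to w
    return 2 * x_i * (x_i * w - y_i)

# Gradient descent: average the per-sample gradients over the whole set
full_grad = np.mean(sample_grad(weight, x, y))

# SGD: use the gradient of one sample only (here, the first one)
single_grad = sample_grad(weight, x[0], y[0])

print('Full-batch gradient:   ', full_grad)
print('Single-sample gradient:', single_grad)
```

The single-sample gradient here is `-4.0`, while the averaged gradient is roughly `-126.67`. The single-sample value is much cheaper to compute but is only a noisy estimate of the averaged one, which is the trade-off SGD accepts.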

![SGD_vs_GD](./img/P2_et_P3/SGD_vs_GD.png)
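
The per-sample gradient used by the `grad` method in the code of this part can be recovered from the squared-error loss. The derivation below is a sketch that assumes the single-weight linear model $\hat{y} = x w$; the symbols $\ell$, $w$ and $\eta$ (the learning rate) are notation chosen here and may differ from the figure.

$$
\ell(w) = (x w - y)^2
\quad\Longrightarrow\quad
\frac{\partial \ell}{\partial w} = 2x\,(x w - y),
\qquad
w \leftarrow w - \eta \cdot 2x\,(x w - y)
$$

For example, with $w = 1$, $x = 1$, $y = 3$ and $\eta = 0.001$, the gradient is $2 \cdot 1 \cdot (1 - 3) = -4$ and the updated weight is $1 - 0.001 \cdot (-4) = 1.004$.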

The structure of the code is more or less the same as in P2, so it is not explained in detail again.

In [3]:
import numpy as np

# Define our model:
class SGDModel:
    def __init__(self, weight):
        self.weight = weight

    def forward(self, input_x):
        return input_x * self.weight

    # N.B. Here we use the loss function again rather than the cost function,
    # because SGD computes the gradient on a single sample at a time.
    # The definitions of loss and cost are given in P2.
    def loss(self, input_x, true_output_y):
        # predicted output for the currently assigned weight
        output_prediction = self.forward(input_x)
        # squared difference between the predicted output and the true output
        return (output_prediction - true_output_y) ** 2

    # Gradient of the loss with respect to the weight, using the formula in the image above
    def grad(self, input_x, output_y):
        return 2 * input_x * (input_x * self.weight - output_y)

# Initial values:
weight = 1
lr = 0.001  # learning rate; try different values here

# Training set
x = np.arange(1.0, 10.0, 1.0)
y = 3 * x

print('Before starting, the weight is: ', weight)

for epoch in range(100):
    # The difference from P2 is that the weight is updated
    # after every single sample of the training set, using that sample's gradient
    for training_x, training_y in zip(x, y):
        model = SGDModel(weight)
        gradient = model.grad(training_x, training_y)
        weight -= lr * gradient
        print('\tGrad = ', gradient, '\t| Input = ', training_x, '\t| Output = ', model.forward(training_x))
        # loss of this sample, evaluated with the weight from before the update
        loss = model.loss(training_x, training_y)
    print('Epoch: ', epoch, '\t | Weight = ', weight, '\t | Loss = ', loss)

print('After training, the weight is: ', weight)

Before starting, the weight is:  1
	Grad =  -4.0 	| Input =  1.0 	| Output =  1.0
	Grad =  -15.968 	| Input =  2.0 	| Output =  2.008
	Grad =  -35.640576 	| Input =  3.0 	| Output =  3.059904
	Grad =  -62.220525568 	| Input =  4.0 	| Output =  4.222434304
	Grad =  -94.1085449216 	| Input =  5.0 	| Output =  5.58914550784
	Grad =  -128.74048945274882 	| Input =  6.0 	| Output =  7.271625878937599
	Grad =  -162.61354267764983 	| Input =  7.0 	| Output =  9.38474695159644
	Grad =  -191.57866513664018 	| Input =  8.0 	| Output =  12.026333428959989
	Grad =  -211.43100431142452 	| Input =  9.0 	| Output =  15.25383309380975
Epoch:  0 	 | Weight =  1.9063013480680633 	 | Loss =  137.97243698807904
	Grad =  -2.1873973038638734 	| Input =  1.0 	| Output =  1.9063013480680633
	Grad =  -8.732090037024582 	| Input =  2.0 	| Output =  3.8169774907438545
	Grad =  -19.490024962638866 	| Input =  3.0 	| Output =  5.751662506226856
	Grad =  -34.025252468109095 	| Input =  4.0 	| Output =  7.7468434414