## Gradient with NumPy (Manual)

In [1]:
import numpy as np

In [2]:
X = np.array([1, 2, 3, 4], dtype=np.float32) # features
Y = np.array([2, 4, 6, 8], dtype=np.float32) # ground truth

Comparing the inputs (`X`) with the outputs (`Y`), we can see that their relation is `Y = 2 * X`.

But the model does not know this. It only sees the relation as `Y = w * X`

It has to train itself to figure out the optimum value of `w` that minimizes the loss.
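To make "minimizes the loss" concrete, here is a minimal sketch that evaluates the mean squared error for a few candidate weights. The helper name `mse_for` and the list of candidates are assumptions made for illustration; they are not part of the notebook's own cells.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)  # same features as above
Y = np.array([2, 4, 6, 8], dtype=np.float32)  # same ground truth as above

def mse_for(w_candidate):
    """Mean squared error of the model y = w * x for one candidate weight."""
    return float(((Y - w_candidate * X) ** 2).mean())

candidates = [0.0, 1.0, 2.0, 3.0]  # hypothetical weights to probe
losses = {c: mse_for(c) for c in candidates}

for c, loss_value in losses.items():
    print(f"w = {c:.1f} -> loss = {loss_value:.3f}")
```

Among these candidates the loss is smallest at `w = 2.0`, which is the value training should move toward.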

We initialize `w` to an arbitrary starting value, here `0.0`.

In [3]:
w = 0.0

Model training happens in iterations (epochs). Each epoch looks like the following:
- `forward()` pass that uses the current value of `w` to calculate the predicted value of Y (`y_pred`)
- Then the loss is calculated via the `calculate_loss()` method, here the mean squared error between `y` and `y_pred`
- `backward()` pass that calculates the gradient of the loss with respect to `w` via the `gradient()` method
- Finally, the gradient, scaled by the learning rate, is subtracted from `w`

The gradient tells us where `w` stands relative to its optimum: its sign gives the direction in which the loss increases, and its magnitude shrinks toward zero as `w` approaches the optimum value.
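As a rough illustration of how the gradient's sign points the way, the sketch below evaluates the MSE gradient at a weight below, at, and above the optimum. The helper `mse_gradient` is a hypothetical name that takes the weight explicitly; it is not the `gradient()` method defined in the next cell.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)

def mse_gradient(w_value):
    """d(MSE)/dw for y_pred = w * x, averaged over all samples."""
    return float((2 * X * (w_value * X - Y)).mean())

grad_below = mse_gradient(1.0)  # w too small: negative gradient, so w should increase
grad_at = mse_gradient(2.0)     # w at the optimum: gradient is zero
grad_above = mse_gradient(3.0)  # w too large: positive gradient, so w should decrease

print(grad_below, grad_at, grad_above)
```

Subtracting a positive multiple of the gradient therefore moves `w` toward 2 from either side.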

We'll create the methods ourselves.

In [4]:
# model prediction
def forward(x):
    return w * x

# loss = MSE
def calculate_loss(y, y_pred):
    return ((y - y_pred) ** 2).mean()

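# gradient
# MSE   = 1/N * sum((w*x - y)**2)
# dL/dw = 1/N * sum(2 * x * (w*x - y))
# Note: np.dot already sums over the samples and returns a scalar, so the
# trailing .mean() has no effect. The returned value is N times the true
# MSE gradient, which acts like a learning rate that is N times larger.
def gradient(x, y, y_pred):
    return np.dot(2*x, y_pred - y).mean()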
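One way to sanity-check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does that for the MSE formula in the comments above, and also evaluates the dot-product expression used in `gradient()`. The names `mse`, `analytic_mean_grad`, `dot_grad`, `w_test`, and `eps` are assumptions for illustration, and `float64` arrays are used here (unlike the notebook's `float32`) to keep the numerical estimate precise.

```python
import numpy as np

# float64 is an assumption made for this check only
X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = np.array([2, 4, 6, 8], dtype=np.float64)

def mse(w_value):
    """Mean squared error for y_pred = w * x."""
    return float(((w_value * X - Y) ** 2).mean())

def analytic_mean_grad(w_value):
    """dL/dw = 1/N * sum(2 * x * (w*x - y))."""
    return float((2 * X * (w_value * X - Y)).mean())

def dot_grad(w_value):
    """Same expression as gradient(): np.dot sums over the samples."""
    return float(np.dot(2 * X, w_value * X - Y))

w_test = 0.5  # hypothetical weight at which to check the derivative
eps = 1e-3    # step for the central difference

numeric = (mse(w_test + eps) - mse(w_test - eps)) / (2 * eps)

print(f"numeric:       {numeric:.4f}")
print(f"analytic mean: {analytic_mean_grad(w_test):.4f}")
print(f"np.dot value:  {dot_grad(w_test):.4f}")
```

The numeric estimate agrees with the averaged formula, while the dot-product value is `len(X)` times larger because it sums instead of averaging.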

In [5]:
print(f"Prediction before training: f(5) = {forward(5):.3f}")

Prediction before training: f(5) = 0.000


It should be `f(5) = 10.000`, but since `w = 0.0`, we get `y_pred = 0.0 * x = 0.0`

Now we need to train the model so that `w` is adjusted to minimize the loss.

In [6]:
learning_rate = 0.01 # step size
epochs = 10 # num of iterations
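To give a feel for why the step size matters, here is a small sketch that reruns the same update rule with a few learning rates. The function `train` and the chosen rates are assumptions for illustration; it uses the same dot-product gradient as `gradient()` so the behaviour is comparable with this notebook, but with `float64` arrays.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = np.array([2, 4, 6, 8], dtype=np.float64)

def train(learning_rate, epochs=10):
    """Run plain gradient descent from w = 0 and return the final weight."""
    w_local = 0.0
    for _ in range(epochs):
        y_pred = w_local * X
        dw = np.dot(2 * X, y_pred - Y)  # summed over samples, as in gradient()
        w_local -= learning_rate * dw
    return float(w_local)

# hypothetical learning rates: too small, the notebook's value, too large
results = {lr: train(lr) for lr in (0.001, 0.01, 0.05)}

for lr, w_final in results.items():
    print(f"learning_rate = {lr}: w after 10 epochs = {w_final:.3f}")
```

With this data, the smallest rate is still far from 2 after 10 epochs, `0.01` lands very close to 2, and `0.05` overshoots on every step and diverges.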

In [7]:
# training loop:
for epoch in range(epochs):
    # forward pass to calculate predicted output
    y_pred = forward(X) 

    # calculate the loss
    loss = calculate_loss(Y, y_pred)

    # calculate the gradient
    dw = gradient(X, Y, y_pred)

    # adjust weight
    w -= learning_rate * dw

    print(f"epoch {epoch + 1}: w = {w:.3f}, loss = {loss:.3f}")

epoch 1: w = 1.200, loss = 30.000
epoch 2: w = 1.680, loss = 4.800
epoch 3: w = 1.872, loss = 0.768
epoch 4: w = 1.949, loss = 0.123
epoch 5: w = 1.980, loss = 0.020
epoch 6: w = 1.992, loss = 0.003
epoch 7: w = 1.997, loss = 0.001
epoch 8: w = 1.999, loss = 0.000
epoch 9: w = 1.999, loss = 0.000
epoch 10: w = 2.000, loss = 0.000


Let's test it with inference

In [8]:
print(f"Prediction after training: f(5) = {forward(5):.3f}")

Prediction after training: f(5) = 9.999


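At three decimal places the loss prints as `0.000` and the weight as `2.000`, but `w` is still slightly below 2, which is why the prediction is `9.999` rather than `10.000`.

The model can get closer if we train it for more `epochs`. Note that `w` is not reset before the next cell, so the loop continues from the already trained weight.

Let's run 20 more epochs.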

In [9]:
learning_rate = 0.01 # step size
epochs = 20 # num of iterations

# training loop:
for epoch in range(epochs):
    # forward pass to calculate predicted output
    y_pred = forward(X) 

    # calculate the loss
    loss = calculate_loss(Y, y_pred)

    # calculate the gradient
    dw = gradient(X, Y, y_pred)

    # adjust weight
    w -= learning_rate * dw

    print(f"epoch {epoch + 1}: w = {w:.3f}, loss = {loss:.3f}")

print(f"Prediction after training: f(5) = {forward(5):.3f}")

epoch 1: w = 2.000, loss = 0.000
epoch 2: w = 2.000, loss = 0.000
epoch 3: w = 2.000, loss = 0.000
epoch 4: w = 2.000, loss = 0.000
epoch 5: w = 2.000, loss = 0.000
epoch 6: w = 2.000, loss = 0.000
epoch 7: w = 2.000, loss = 0.000
epoch 8: w = 2.000, loss = 0.000
epoch 9: w = 2.000, loss = 0.000
epoch 10: w = 2.000, loss = 0.000
epoch 11: w = 2.000, loss = 0.000
epoch 12: w = 2.000, loss = 0.000
epoch 13: w = 2.000, loss = 0.000
epoch 14: w = 2.000, loss = 0.000
epoch 15: w = 2.000, loss = 0.000
epoch 16: w = 2.000, loss = 0.000
epoch 17: w = 2.000, loss = 0.000
epoch 18: w = 2.000, loss = 0.000
epoch 19: w = 2.000, loss = 0.000
epoch 20: w = 2.000, loss = 0.000
Prediction after training: f(5) = 10.000


Yay! Now the prediction prints as `10.000`, which is correct to three decimal places.
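The last cell starts from the already trained `w` rather than from `0.0`, which is why its first epoch already prints `w = 2.000`. Below is a minimal sketch of the difference between continuing training and starting over. The helper `run_epochs` is a hypothetical name, and `float64` arrays are an assumption made so the extra digits are meaningful.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = np.array([2, 4, 6, 8], dtype=np.float64)

def run_epochs(w_start, epochs, learning_rate=0.01):
    """Apply the notebook's update rule for a number of epochs."""
    w_local = w_start
    for _ in range(epochs):
        dw = np.dot(2 * X, w_local * X - Y)  # same expression as gradient()
        w_local -= learning_rate * dw
    return float(w_local)

w_after_10 = run_epochs(0.0, 10)          # first training run
w_continued = run_epochs(w_after_10, 20)  # 20 more epochs, 30 in total
w_fresh_20 = run_epochs(0.0, 20)          # 20 epochs from scratch

print(f"after 10 epochs:       w = {w_after_10:.8f}")
print(f"continued (30 total):  w = {w_continued:.8f}")
print(f"fresh run, 20 epochs:  w = {w_fresh_20:.8f}")
```

Printing more decimals shows that the weight only approaches 2; each additional epoch shrinks the remaining error further.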