# Autograd

So far, we have seen how to do calculations with torch, and how to build a data generator.

So, in theory we have enough knowledge to deliver the data in batches to our machine learning model to perform calculations on the data.

But how do we adjust the weights? How does the model learn which weights should be adjusted, and in which direction?


Let's start by guessing the weights $w$ and $b$:

In [None]:
import torch

w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

Note `requires_grad=True`: it tells torch to keep track of all calculations that involve these tensors,
so that we can calculate the gradients later on.
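
To make the tracking a bit more tangible, here is a small sketch of what torch records. The names `out` and `x` are only used for this illustration; `requires_grad`, `grad_fn`, `is_leaf` and `grad` are standard tensor attributes.

```python
import torch

# A tensor created directly by the user is a "leaf" of the computation graph.
w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])  # plain data, not tracked

out = w * x  # the result depends on a tracked tensor

print(w.requires_grad, x.requires_grad)  # w is tracked, x is not
print(out.requires_grad)                 # tracking propagates to the result
print(out.grad_fn)                       # the operation that produced `out`
print(w.is_leaf, w.grad)                 # w is a leaf; no gradient yet, so grad is None
```

The gradient stays empty until a backward pass is run.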

We create a new output tensor $Q$ with a calculation:
$$
Q = w x + b
$$

This is our output, with the variables we have guessed.


First we need some data $x$:

In [None]:
x = torch.tensor([1.0, 2.0, 3.0])
Q = w * x + b
Q

This gives a certain outcome. But how do we know if this is correct? For that, we need:

- some sort of ground truth
- a way to calculate the error

A common way to calculate the error is the mean squared error (MSE):

In [None]:
def mse(y: torch.Tensor, yhat: torch.Tensor) -> torch.Tensor:
    return ((y - yhat)**2).mean()
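
For reference, the function above corresponds to the usual definition of the mean squared error. The count $n$ (number of data points) and the index $i$ are notation introduced here for illustration:

$$
\text{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$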

Now, let's assume the real values for $w$ and $b$ are 4 and 1:

In [None]:
y = 4 * x + 1
y

Normally, you don't have access to these "real" values. You only have access to the outcome $y$, and your guess is that this
outcome is produced by something that is close to a model, in our case the linear model.

We can now calculate the error of our estimates of $w$ and $b$:

In [None]:
loss = mse(y, Q)
loss

So, we have some loss, in this case a mean loss of about 3.67. We want to minimize this loss, and need to adjust our guessed weights in order to do so.
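
As a sanity check, the loss can be reproduced step by step. This is a minimal sketch that repeats the values used above; the names `residuals`, `squared` and `loss_by_hand` are only introduced for this illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
Q = 2.0 * x + 6.0   # predictions with the guessed w=2, b=6 -> [8, 10, 12]
y = 4.0 * x + 1.0   # targets with the "real" w=4, b=1      -> [5, 9, 13]

residuals = y - Q               # [-3, -1, 1]
squared = residuals ** 2        # [9, 1, 1]
loss_by_hand = squared.mean()   # 11 / 3, roughly 3.67

print(residuals, squared, loss_by_hand)
```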

During training, we need the gradients of the error as defined by the loss function $\mathcal{L}$ with respect to the parameters $w$ and $b$. This means we want:

$$
\frac{\partial \mathcal{L}}{\partial w}
$$
and
$$
\frac{\partial \mathcal{L}}{\partial b}
$$

With the `.backward()` method, torch will calculate all the derivatives. They are stored in the `.grad` attribute of the parameters.

In [None]:
loss.backward()

We could calculate the derivatives by hand, which is tedious, especially if you have many nested calculations. But because our two parameters `w` and `b` were marked with `requires_grad=True`, torch tracked the calculations and could compute the gradients for us.

In [None]:
w.grad, b.grad

You see? Calling `.backward()` filled the `.grad` attribute of the parameters that are tracked with `requires_grad`.
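
For a model this small, the derivatives can still be worked out by hand, which makes a nice check on autograd. With $Q = w x + b$ and the mean squared error as loss, the chain rule gives (with $n$ the number of data points and $i$ an index over them, notation introduced here):

$$
\frac{\partial \mathcal{L}}{\partial w} = -\frac{2}{n} \sum_{i} x_i (y_i - Q_i),
\qquad
\frac{\partial \mathcal{L}}{\partial b} = -\frac{2}{n} \sum_{i} (y_i - Q_i)
$$

The sketch below recomputes everything from scratch and compares both results. The names `grad_w_by_hand` and `grad_b_by_hand` are made up for this example.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([6.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = 4 * x + 1

Q = w * x + b
loss = ((y - Q) ** 2).mean()
loss.backward()

# The same derivatives from the formulas above, computed without autograd.
residuals = (y - Q).detach()
grad_w_by_hand = (-2 * x * residuals).mean()
grad_b_by_hand = (-2 * residuals).mean()

print(w.grad, grad_w_by_hand)  # both approximately 1.3333
print(b.grad, grad_b_by_hand)  # both 2.0
```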

Typically, we adjust the weights by taking a small step against the gradient. The size of that step is scaled by a factor, the learning rate. A common starting value is `1e-3`, but it can be as big as `1e-1` and as small as `1e-5`.

It can even vary during training: you start with `1e-1`, and if the improvement of the learning slows down you decrease the learning rate by a certain factor, e.g. to `1e-2`.
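
One way to picture such a schedule is the rough sketch below. The helper `decay_on_plateau` and its parameters `patience`, `factor` and `min_delta` are hypothetical and written only for this illustration; torch ships ready-made schedulers in `torch.optim.lr_scheduler` (for example `ReduceLROnPlateau`) that follow a similar idea.

```python
def decay_on_plateau(lr, losses, patience=3, factor=0.1, min_delta=1e-4):
    """Hypothetical helper: shrink lr if the loss did not improve recently."""
    if len(losses) <= patience:
        return lr  # not enough history yet
    best_before = min(losses[:-patience])   # best loss before the recent window
    best_recent = min(losses[-patience:])   # best loss in the last `patience` steps
    if best_before - best_recent < min_delta:
        return lr * factor                  # improvement stalled: decrease
    return lr                               # still improving: keep as is


lr = 1e-1
improving = [3.0, 2.0, 1.5, 1.0, 0.7]      # loss keeps going down
stalled = [3.0, 2.0, 1.0, 1.0, 1.0, 1.0]   # loss is stuck at 1.0

lr_improving = decay_on_plateau(lr, improving)  # stays at 1e-1
lr_stalled = decay_on_plateau(lr, stalled)      # drops to about 1e-2
print(lr_improving, lr_stalled)
```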

Let's adjust the weights:

In [None]:
learning_rate = 1e-1
# Step against the gradient. Note that this rebinds `w` and `b` to new tensors.
w = w - learning_rate * w.grad
b = b - learning_rate * b.grad
w, b

And run a new prediction:

In [None]:
Q = w * x + b
loss = mse(y, Q)
loss

That worked! Our loss is lower!

After the adjustment of the weights, the training continues:

- make a prediction
- calculate the loss
- calculate the gradients
- adjust the weights against the gradient, scaled by the learning rate

And this is how the model learns!
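
The four steps can be sketched as a hand-written loop. This is a minimal sketch under a few assumptions, not a definitive implementation: the number of steps (200) is chosen arbitrarily, the update is done in place inside `torch.no_grad()` so that `w` and `b` stay the tracked parameters, and the gradients are reset manually with `.grad.zero_()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = 4 * x + 1
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([6.0], requires_grad=True)
learning_rate = 1e-1
losses = []

for step in range(200):  # arbitrary number of steps for this sketch
    Q = w * x + b                      # make a prediction
    loss = ((y - Q) ** 2).mean()       # calculate the loss
    loss.backward()                    # calculate the gradients
    with torch.no_grad():              # the update itself must not be tracked
        w -= learning_rate * w.grad    # adjust the weights in place
        b -= learning_rate * b.grad
    w.grad.zero_()                     # reset, otherwise gradients accumulate
    b.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss shrinks over the steps
print(w, b)                   # the parameters move towards 4 and 1
```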

For a true learning loop, the gradients also have to be reset after every step, because torch accumulates them across `.backward()` calls. An optimizer takes care of both the update and the reset, which we will see later on.
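
Why a reset is needed at all can be seen in a tiny experiment. This sketch uses a simpler function than our loss, the sum of $w x$, so that the derivative is easy to check; the names `first` and `second` are only used here.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

(w * x).sum().backward()
first = w.grad.clone()     # d/dw sum(w * x) = sum(x) = 6

(w * x).sum().backward()   # second backward pass without resetting
second = w.grad.clone()    # 6 + 6 = 12: the new gradient was added

w.grad.zero_()             # manual reset
print(first, second, w.grad)
```

Without the reset, every update would use the sum of all gradients seen so far instead of the gradient of the current step.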
