In [1]:
import torch
import numpy as np

### Calculation of gradients

In [14]:
x = torch.randn(size=(3,), requires_grad=True)
print(x)


tensor([-0.6297,  1.8933, -0.7387], requires_grad=True)


Whenever a tensor with ```requires_grad=True``` is used in a computation, PyTorch records the operation in a computational graph. This graph is traversed later, during backpropagation.

In [15]:
y = x + 2
print(y)

tensor([1.3703, 3.8933, 1.2613], grad_fn=<AddBackward0>)


The ```grad_fn``` attribute references the function that backpropagates the gradient through the operation that produced the tensor. Because ```y``` is the result of an addition, its ```grad_fn``` is ```AddBackward0```.
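As a minimal sketch of the difference between a tensor created by the user (a leaf) and a tensor produced by an operation, the snippet below inspects ```grad_fn``` and ```is_leaf```. The tensor values are chosen only for illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)  # leaf tensor created directly by the user
y = x + 2                              # produced by an addition

# A leaf tensor has no grad_fn; a result tensor points to its backward function
print(x.is_leaf, x.grad_fn)
print(y.is_leaf, type(y.grad_fn).__name__)
```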

In [27]:
z = y * y * 2
z  = z.mean()
print(z)

tensor(12.4175, grad_fn=<MeanBackward0>)


The ```grad_fn``` of ```z``` is now ```MeanBackward0```, because the last operation applied was ```mean()```. The intermediate result of ```y * y * 2``` carried ```MulBackward0```, as it came from a multiplication.

Computation of the gradients involves invoking the ```backward()``` method.

In [28]:
z.backward() # dz/dx

Without arguments, ```backward()``` only works on scalar tensors. For a non-scalar tensor, autograd computes a **vector-Jacobian product**, so a ```gradient``` argument with the same shape as that tensor must be passed to ```backward()```.
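A minimal sketch of the non-scalar case, assuming a vector of ones as the weights of the vector-Jacobian product (the name ```v``` and the input values are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x  # non-scalar output, with dy_i/dx_i = 2 * x_i

# backward() on a non-scalar tensor needs a vector with the same shape as y
v = torch.ones_like(y)
y.backward(gradient=v)

print(x.grad)  # tensor([2., 4., 6.])
```

With a vector of ones, the result is the same as calling ```y.sum().backward()```.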

The gradients of a tensor can be found using the ```grad``` attribute.

In [29]:
print(x.grad)

tensor([ 7.3081, 20.7642,  6.7272])


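**Prevention of gradient tracking**
- ```x.requires_grad_(False)```: disables tracking on ```x``` in place
- ```x.detach()```: returns a new tensor that shares data with ```x``` but is not tracked
- ```with torch.no_grad():```: operations inside the block are not recorded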
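The three options can be compared side by side. This is a small sketch; the names ```a```, ```b``` and ```c``` are illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)

# detach() returns a new tensor that is cut off from the graph
a = x.detach()

# Operations inside no_grad() are not recorded
with torch.no_grad():
    b = x + 2

# requires_grad_(False) switches tracking off in place on a leaf tensor
c = torch.randn(3, requires_grad=True)
c.requires_grad_(False)

print(a.requires_grad, b.requires_grad, c.requires_grad)  # False False False
print(x.requires_grad)  # True: x itself is unaffected by detach() and no_grad()
```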

To reset the ```grad``` attribute to zero:

In [31]:
x.grad.zero_()
print(x.grad)

tensor([0., 0., 0.])


In a training loop, each call to ```backward()``` adds to the values already stored in ```grad``` instead of overwriting them. Resetting the values to zero with ```.grad.zero_()``` after each update keeps gradients from earlier iterations out of later ones.
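The accumulation can be observed directly. The sketch below uses a toy function whose gradient is 3 for every element and stores a copy of ```grad``` after each call (the name ```history``` is illustrative):

```python
import torch

w = torch.ones(3, requires_grad=True)
history = []

for step in range(3):
    out = (w * 3).sum()            # d(out)/dw = 3 for every element
    out.backward()                 # adds to w.grad instead of replacing it
    history.append(w.grad.clone())

for g in history:
    print(g)                       # 3s, then 6s, then 9s

w.grad.zero_()                     # reset in place
print(w.grad)                      # tensor([0., 0., 0.])
```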

### Backpropagation

As operations are executed, PyTorch builds a computational graph in which every operation knows its own local gradient. The chain rule then combines these local gradients into the gradient of the final output with respect to each input.

Training overview:
- Forward pass: Computation of loss
- Computation of local gradients
- Backward pass: Computation of $\frac{\partial Loss}{\partial Weights}$
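The steps above can be checked on a one-weight model. The sketch below compares the gradient from ```backward()``` with the chain-rule result $\frac{\partial Loss}{\partial w} = \frac{1}{N}\sum_i 2\,(w x_i - y_i)\,x_i$ for a mean squared error loss. The data and the name ```manual``` are chosen for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([2.0, 4.0, 6.0, 8.0])
w = torch.tensor(0.0, requires_grad=True)

# Forward pass: prediction and mean squared error
y_hat = w * x
loss = ((y_hat - y) ** 2).mean()

# Backward pass: autograd applies the chain rule through the graph
loss.backward()

# Same gradient from the local gradients, written out by hand
with torch.no_grad():
    manual = (2 * (y_hat - y) * x).mean()

print(w.grad.item(), manual.item())  # -30.0 -30.0
```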

**Manual** training loop using ```backward()```

In [44]:
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

def forward(x):
    # Linear model without bias: y_hat = w * x
    return w * x

def loss(y, y_hat):
    # Mean squared error
    return ((y - y_hat) ** 2).mean()

lr = 0.01
n_iters = 50

for epoch in range(n_iters):
    y_pred = forward(x)
    
    l = loss(y, y_pred)
    l.backward() # gradient calculation
    with torch.no_grad():
        w -= lr * w.grad
    
    w.grad.zero_()
    
    if (epoch % 10 == 0):
        print(f"epoch:\t{epoch}\tweight:\t{w.item()}\tloss:\t{l}")
        
with torch.no_grad():
    print(forward(5).item())

epoch:	0	weight:	0.29999998211860657	loss:	30.0
epoch:	10	weight:	1.6653136014938354	loss:	1.1627856492996216
epoch:	20	weight:	1.934108853340149	loss:	0.0450688973069191
epoch:	30	weight:	1.987027645111084	loss:	0.0017468547448515892
epoch:	40	weight:	1.9974461793899536	loss:	6.770494655938819e-05
9.997042655944824

