In [1]:
import torch
import numpy as np

### Calculation of gradients

In [2]:
x = torch.randn(size=(3,), requires_grad=True)
print(x)


tensor([-0.6587,  1.2707, -0.9851], requires_grad=True)


Whenever a PyTorch tensor with ```requires_grad=True``` is used in a computation, a computational graph is created that is utilized during backpropagation.

In [3]:
y = x + 2
print(y)

tensor([1.3413, 3.2707, 1.0149], grad_fn=<AddBackward0>)


The attribute ```grad_fn``` is utilized for the backpropagation of the gradient. As a value is being added, the value of the attribute is ```AddBackward0```.

In [4]:
z = y * y * 2
z  = z.mean()
print(z)

tensor(9.0178, grad_fn=<MeanBackward0>)


It can be seen that the value of ```grad_fn``` is now ```MulBackward0``` as the operation is a multiplication operation.

Computation of the gradients involves invoking the ```backward()``` method.

In [5]:
z.backward() # dz/dx

```backward()``` only works on scalar values. For the method to work with non-scalar tensors, one must keep in mind that the gradients are calculated by a **Jacobian product** and would thus need a vector to be provided that would match the shape of the gradients.

The gradients of a tensor can be found using the ```grad``` attribute.

In [6]:
print(x.grad)

tensor([1.7883, 4.3610, 1.3532])


**Prevention of gradient tracking**
- ```x.requires_grad_(False)```
- ```x.detach()```
- ```with torch.no_grad():```
  ```x ...```

To set the ```grad``` attribute to reset:

In [7]:
x.grad.zero_()
print(x.grad)

tensor([0., 0., 0.])


In a training loop, the gradient in the ```grad``` attribute accumulates with each iteration. This is prevented by resetting the values to zero with the ```.grad.zero_()``` method

### Backpropagation

For each operation, PyTorch creates a computational graph that allows for calculation of local gradients which makes gradient calculation simpler.

Training overview:
- Forward pass: Computation of loss
- Computation of local gradients
- Backward pass: Computation of $\frac{\partial Loss}{\partial Weights}$

### Manual training loop using ```backward()```

In [8]:
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

def forward(x):
    return w * x

def loss(y, y_hat):
    return((y - y_hat)**2).mean()

lr = 0.01
n_iters = 50

for epoch in range(n_iters):
    y_pred = forward(x)
    
    l = loss(y, y_pred)
    l.backward() # gradient calculation
    with torch.no_grad():
        w -= lr * w.grad
    
    w.grad.zero_()
    
    if (epoch % 10 == 0):
        print(f"epoch:\t{epoch}\tweight:\t{w.item()}\tloss:\t{l}")
        
with torch.no_grad():
    print(forward(5).item())

epoch:	0	weight:	0.29999998211860657	loss:	30.0
epoch:	10	weight:	1.6653136014938354	loss:	1.1627856492996216
epoch:	20	weight:	1.934108853340149	loss:	0.0450688973069191
epoch:	30	weight:	1.987027645111084	loss:	0.0017468547448515892
epoch:	40	weight:	1.9974461793899536	loss:	6.770494655938819e-05
9.997042655944824


### Automating the training loop

Steps:
1) Design model (input, output size, forward size)

2) Construct loss and optimizer

3) Training loop:

    - forward pass: compute prediction
    
    - backward pass: gradients
    
    - update weights

In [10]:
import torch.nn as nn # Neural network model

```torch.nn.Linear```: Applies a linear transformation to the incoming data: 
$y = xA^T + b$

When using Pytorch utilities to train a model:
- Define the model: Can be done using layers given in ```torch.nn```.
- Define the loss and optimizer: Optimizers are under ```torch.optim```.
- One forward pass is done by simpling running the model on the data 

Data to be passed to a model cannot be 1-dimensional. Shape should be ```(n_samples, n_features)```

In [21]:
x = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

n_samples, n_features = x.shape

x_test = torch.tensor([5], dtype=torch.float32)

# Defining the model
# model = nn.Linear(in_features=n_features, 
#                 out_features=n_features,
#                 bias=False)

lr = 0.01
n_iters = 50

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for epoch in range(n_iters):
    y_pred = model(x)
    
    l = loss(y, y_pred)
    l.backward() # gradient calculation

    optimizer.step() # Automatic updation of weights
    optimizer.zero_grad() # Automatic resetting of weights
    
    if (epoch % 10 == 0):
        [w,b] = model.parameters()
        print(f"epoch:\t{epoch}\tweight:\t{w[0][0].item()}\tloss:\t{l}")
        
with torch.no_grad():
    print(forward(5).item())

epoch:	0	weight:	1.1769566535949707	loss:	7.69447135925293
epoch:	10	weight:	1.813909649848938	loss:	0.20492786169052124
epoch:	20	weight:	1.9179493188858032	loss:	0.010814713314175606
epoch:	30	weight:	1.9362235069274902	loss:	0.005471664480865002
epoch:	40	weight:	1.9406569004058838	loss:	0.005031246691942215
9.713160514831543


### Custom model
Follows the same setup as TensorFlow. The method for one forward pass is ```forward()```

In [19]:
class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(in_features=input_dim,
                               out_features=output_dim)
        
    def forward(self, x):
        return self.linear(x)
    
model = LinearRegression(n_features, n_features)