# Automatic Differentiation with `torch.autograd`

**Backpropagation** is the most frequently used algorithm for training neural networks.
In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.
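
As a minimal sketch of that idea, autograd can compute a gradient that is then used for one gradient-descent update. The scalar parameter `theta`, the toy loss, and the learning rate `lr` are illustrative assumptions and are not part of the network defined further down.

```python
import torch

# Hypothetical scalar parameter and toy loss, chosen only for illustration
theta = torch.tensor(3.0, requires_grad=True)
loss = (theta - 1.0) ** 2

# Autograd computes d(loss)/d(theta) = 2 * (theta - 1), which is 4 at theta = 3
loss.backward()
grad = theta.grad.clone()
print(grad)

# One gradient-descent step; lr is an assumed learning rate
lr = 0.1
with torch.no_grad():  # the update itself should not be recorded in the graph
    theta -= lr * theta.grad
print(theta)
```

The update moves `theta` against the gradient, from 3.0 towards the minimum of the toy loss at 1.0.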

Consider a simple one-layer neural network with:

- input `x`
- parameters `w` and `b`
- some loss function

In [13]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
# Parameters to optimize; requires_grad=True tells autograd to track them
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
# print(loss)

### Tensors, Functions and Computational Graph

In [14]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x000001DEB0DBAE90>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000001DEB0DB87C0>
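
The outputs above show that `z` and `loss` each carry a `grad_fn`, a reference to the backward function of the operation that produced them. The sketch below shows the same thing from the tensor side, re-creating `x`, `w`, and `b` so that it runs on its own. Tensors created directly by the user are leaves of the graph and have no `grad_fn`, while results of operations on tracked tensors do.

```python
import torch

x = torch.ones(5)                          # plain input, not tracked
w = torch.randn(5, 3, requires_grad=True)  # tracked parameter (leaf)
b = torch.randn(3, requires_grad=True)     # tracked parameter (leaf)
z = torch.matmul(x, w) + b                 # result of tracked operations

# Leaf tensors were not produced by an operation, so grad_fn is None
print(w.grad_fn, w.is_leaf)

# z was produced by an addition, so it records a backward function
print(z.grad_fn, z.is_leaf)

# The input x does not require gradients at all
print(x.requires_grad)
```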


### Computing Gradients

To optimize the parameters of the neural network, we need to compute the derivatives of the loss function with respect to those parameters; namely, we need `∂loss/∂w` and `∂loss/∂b` for some fixed values of `x` and `y`. To compute those derivatives, we call `loss.backward()` and then retrieve the values from `w.grad` and `b.grad`:

In [15]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3250, 0.1356, 0.0164],
        [0.3250, 0.1356, 0.0164],
        [0.3250, 0.1356, 0.0164],
        [0.3250, 0.1356, 0.0164],
        [0.3250, 0.1356, 0.0164]])
tensor([0.3250, 0.1356, 0.0164])
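
Every row of `w.grad` above equals `b.grad`. One way to see why is to compare autograd's result with a hand-derived gradient, as sketched below. The sketch assumes the default `mean` reduction of `binary_cross_entropy_with_logits`, for which `∂loss/∂z = (sigmoid(z) - y) / n` with `n` the number of outputs. The seed and the `expected_*` names are illustrative.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients, computed without tracking
with torch.no_grad():
    dloss_dz = (torch.sigmoid(z) - y) / y.numel()
    expected_b_grad = dloss_dz                  # dz/db is the identity
    expected_w_grad = torch.outer(x, dloss_dz)  # dz_j/dw_ij = x_i

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```

Because `x` is all ones, the outer product repeats `∂loss/∂z` in every row, which matches the pattern in the printed `w.grad`.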


### Disabling Gradient Tracking

In [16]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

# Operations inside torch.no_grad() are not tracked by autograd
with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False


In [17]:
z = torch.matmul(x, w) + b
z_det = z.detach()  # same values as z, but cut off from the graph
print(z_det.requires_grad)

False
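
The two cells above reach the same result in different ways. The sketch below puts them side by side: `torch.no_grad()` switches tracking off for a whole block of code, while `detach()` produces an untracked view of one tensor that already exists. It re-creates `x`, `w`, and `b` so that it runs on its own; `z_ng` is an illustrative name.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# Tracked computation: z is part of the graph
z = torch.matmul(x, w) + b

# detach(): a new tensor with no history that shares z's memory
z_det = z.detach()
print(z_det.grad_fn)                     # None
print(z_det.data_ptr() == z.data_ptr())  # True, no copy is made

# no_grad(): nothing computed inside the block is recorded
with torch.no_grad():
    z_ng = torch.matmul(x, w) + b
print(z_ng.grad_fn)                      # None

# The original z is unaffected and still tracked
print(z.requires_grad)                   # True
```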


### More on Computational Graphs

### Optional Reading: Tensor Gradients and Jacobian Products
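
For a non-scalar output, `backward()` needs a `gradient` argument `v` with the same shape as the output. Autograd then computes the vector-Jacobian product `vᵀ·J` instead of building the full Jacobian `J`. A minimal sketch, with `inp`, `out`, and `v` as illustrative names:

```python
import torch

inp = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = inp ** 2  # vector-valued output; its Jacobian is diag(2 * inp)

# v weights each output component in the product v^T J
v = torch.tensor([1.0, 0.5, 0.1])
out.backward(v)

# For this diagonal Jacobian, v^T J reduces to v * 2 * inp
expected = v * 2 * inp.detach()
print(inp.grad)
print(torch.allclose(inp.grad, expected))
```

Passing `torch.ones_like(out)` as `v` would give the same gradients as calling `out.sum().backward()`.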