# Getting familiar with torch.autograd

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

- **Forward Propagation**: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

- **Backward Propagation**: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent. 
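The two steps above can be sketched on a toy problem. This is a minimal illustration, not a full training loop: the data, the single linear function, the names (`inputs`, `targets`, `weight`, `bias`) and the learning rate are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)  # make the sketch reproducible

# Hypothetical toy data: 8 samples with 3 features, one target per sample
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

# Parameters (weights and bias) stored in tensors that track gradients
weight = torch.randn(3, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
learning_rate = 0.05  # assumed value, chosen small for a stable step

# Forward propagation: run the inputs through the function and measure the error
prediction = inputs @ weight + bias
loss = ((prediction - targets) ** 2).mean()

# Backward propagation: collect d(loss)/d(parameter) into .grad
loss.backward()

# Gradient descent step, done under no_grad so the update itself is not tracked
with torch.no_grad():
    weight -= learning_rate * weight.grad
    bias -= learning_rate * bias.grad

# Recompute the error with the updated parameters
new_loss = ((inputs @ weight + bias - targets) ** 2).mean()
print(loss.item(), new_loss.item())
```

After one small step along the negative gradient, the second printed value should be lower than the first.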

In [1]:
import torch

### Vector to scalar

In [2]:
# define the x vector
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

In [None]:
# track gradients on x, then define y = f(x), which returns a scalar
x.requires_grad_(True)
y = 2 * torch.dot(x, x)  # y = 2 * (x · x)
y

tensor(28., grad_fn=<MulBackward0>)

We can now compute the gradient of y with respect to x ($\frac{\partial y}{\partial x}$) by calling y's `backward` method.

The result is then available through x's `grad` attribute.

In [5]:
y.backward()    #take gradient of y w.r.t x
x.grad          # x = tensor([0., 1., 2., 3.])

tensor([ 0.,  4.,  8., 12.])

In [6]:
4 * x

tensor([ 0.,  4.,  8., 12.], grad_fn=<MulBackward0>)

In [7]:
x.grad == 4 * x

tensor([True, True, True, True])
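The comparison with `4 * x` can also be justified by hand. A short derivation, using only the definition $y = 2\,x^\top x$ from the cells above:

$$
y = 2\,x^\top x = 2\sum_{i} x_i^2
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_i} = 4\,x_i ,
\qquad
\nabla_x\, y = 4\,x .
$$

For $x = [0, 1, 2, 3]$ this gives $[0, 4, 8, 12]$, which is the value stored in `x.grad`.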

### Vector to vector

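When y is a vector rather than a scalar, we must pass a `gradient` argument to `backward()`.

This argument is a vector $v$ with the same shape as y; `backward()` then computes the vector-Jacobian product $v^\top J$, where $J$ is the Jacobian of y with respect to x. We start with a $v$ whose entries are all 1.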

In [9]:
x = torch.tensor([1., 2.], requires_grad=True)
print('x:', x)

y = 3 * x**2
print('y:', y)

gradient_value = [1., 1.]  # the vector v passed as the gradient argument of backward()
# y is not a scalar, so this argument is required; values other than 1 weight each component's gradient (see the next example)

y.backward(torch.tensor(gradient_value)) 
print('x.grad:', x.grad)

x: tensor([1., 2.], requires_grad=True)
y: tensor([ 3., 12.], grad_fn=<MulBackward0>)
x.grad: tensor([ 6., 12.])
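Passing a vector of ones as `gradient` yields the same `x.grad` as summing y first and calling `backward()` on the resulting scalar. A minimal sketch comparing the two routes (the names `x_a`, `x_b`, `y_a`, `y_b` are illustrative only):

```python
import torch

# Route 1: non-scalar y with an explicit vector of ones
x_a = torch.tensor([1., 2.], requires_grad=True)
y_a = 3 * x_a**2
y_a.backward(torch.ones_like(y_a))

# Route 2: reduce y to a scalar first, then call backward() without arguments
x_b = torch.tensor([1., 2.], requires_grad=True)
y_b = 3 * x_b**2
y_b.sum().backward()

print(x_a.grad)  # tensor([ 6., 12.])
print(x_b.grad)  # tensor([ 6., 12.])
```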


In [10]:
x = torch.tensor([1., 2.], requires_grad=True)
print('x:', x)

y = 3 * x**2
print('y:', y)

gradient_value = [1., 10.]  # weight the gradient of y[0] by 1 and the gradient of y[1] by 10
y.backward(torch.tensor(gradient_value)) 
print('x.grad:', x.grad)


x: tensor([1., 2.], requires_grad=True)
y: tensor([ 3., 12.], grad_fn=<MulBackward0>)
x.grad: tensor([  6., 120.])
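The `gradient` argument can be read as the vector $v$ in a vector-Jacobian product $v^\top J$. The sketch below checks this reading for the example above by building the full Jacobian with `torch.autograd.functional.jacobian`; the helper name `g` is introduced here only for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def g(x):
    # same function as in the cells above: y = 3 * x^2, applied elementwise
    return 3 * x**2

x = torch.tensor([1., 2.], requires_grad=True)
v = torch.tensor([1., 10.])

# backward() with a gradient argument stores v^T J in x.grad
y = g(x)
y.backward(v)

# Build the full Jacobian explicitly (here a 2x2 diagonal matrix with entries 6 * x_i)
J = jacobian(g, x.detach())
vjp = v @ J

print(J)
print(x.grad, vjp)
```

Building the full Jacobian is only practical for small inputs; it is used here purely as a check.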


One benefit of using automatic differentiation is that even if building the computational graph of a function required passing through a maze of Python control flow (e.g., conditionals, loops, and arbitrary function calls), we can still calculate the gradient of the resulting variable.

In [11]:
def f(a): 
    b = a * 2
    while b.norm() < 1000: 
        b = b * 2
    if b.sum() > 0: 
        c = b
    else:
        c = 100 * b
    return c

In [12]:
a = torch.randn(size=(), requires_grad=True)
print('a:', a)
d = f(a)
d.backward()
print('a.grad:', a.grad)
# check that the gradient equals f(a)/a, as expected since f(a) = constant * a for a given input a
a.grad == f(a)/a

a: tensor(1.2009, requires_grad=True)
a.grad: tensor(1024.)


tensor(True)
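Because `a` is drawn at random, a single run exercises only one of the two branches of `f`. The sketch below repeats the same function with two fixed inputs (the values 3.0 and -3.0 are arbitrary choices for illustration) so that both branches are visible in the gradient.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

# Positive input: the loop doubles b up to 1536 = 512 * 3, then `c = b`
a_pos = torch.tensor(3.0, requires_grad=True)
f(a_pos).backward()

# Negative input: same number of doublings, then `c = 100 * b`
a_neg = torch.tensor(-3.0, requires_grad=True)
f(a_neg).backward()

print(a_pos.grad)  # tensor(512.)
print(a_neg.grad)  # tensor(51200.)
```

In both cases the gradient is the constant that multiplies the input along the path actually taken through the control flow.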

## Exercises

### Ex 1 -- Practice with this topic by following the step-by-step tutorial below
 
[*The Gradient Argument in PyTorch’s `backward()` Function Explained by Examples*](https://zhang-yang.medium.com/the-gradient-argument-in-pytorchs-backward-function-explained-by-examples-68f266950c29).

### Ex 2 -- Let $f(x) = \sin(x)$. Plot the graph of $f$ and of its derivative $f'$. Do not exploit the fact that $f'(x) = \cos(x)$; use automatic differentiation to get the result instead.

### Ex 5 -- Let $f(x) = \left(\log(x^2) \cdot \sin x\right) + x^{-1}$.

Write out a dependency graph tracing results from $x$ to $f(x)$. We identify the intermediate steps, starting from $x$ and building up to $f(x)$:

- Start from: x

- Intermediate computations:
    - $x^2$ 
        - log($x^2$)
    - sin(x) 
        - log($x^2$) * sin(x)
    - 1/x
        - log($x^2$) * sin(x) + 1/x
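The intermediate steps listed above can be mirrored one-to-one in PyTorch, which records the same dependency graph as it executes. This is a sketch only: the variable names (`u`, `v`, `w`, `p`, `q`) and the test point `x = 2.0` are arbitrary choices, and it stops short of computing the derivative.

```python
import math
import torch

x = torch.tensor(2.0, requires_grad=True)  # arbitrary positive test point

u = x**2          # x^2
v = torch.log(u)  # log(x^2)
w = torch.sin(x)  # sin(x)
p = v * w         # log(x^2) * sin(x)
q = 1 / x         # 1/x
f = p + q         # log(x^2) * sin(x) + 1/x

# Each intermediate result remembers the operation that produced it
for name, node in [("u", u), ("v", v), ("w", w), ("p", p), ("q", q), ("f", f)]:
    print(name, type(node.grad_fn).__name__)

# Sanity check of the forward value against the closed-form expression
print(f.item(), math.log(4.0) * math.sin(2.0) + 0.5)
```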

Now use the chain rule to compute the analytical derivative of the function. Then compute the gradient using autograd, and compare the two solutions by plotting them to see whether they overlap.