## Autograd: Automatic Differentiation

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py

Central to all neural networks in PyTorch is the `autograd` package.  

The `autograd` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
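To make "define-by-run" concrete, here is a minimal sketch in which ordinary Python control flow decides how many operations get recorded, so the graph (and the gradient) changes from one call to the next. The helper name `forward` and the step counts are illustrative assumptions, not part of the original tutorial.

```python
import torch

def forward(x, n_steps):
    # Hypothetical helper: the loop length decides which operations
    # are recorded, so the graph can differ on every call.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

grads = []
for n_steps in (1, 3):
    x.grad = None  # clear the gradient left over from the previous pass
    forward(x, n_steps).backward()
    grads.append(x.grad.clone())

print(grads[0])  # every entry is 2**1
print(grads[1])  # every entry is 2**3
```

The same tensor `x` yields different gradients because each call builds a fresh graph from the code that actually ran.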

### Tensor

- `torch.Tensor` is the central class of the package. If you set its `.requires_grad` attribute to `True`, it starts to track all operations on it. When you finish your computation, you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor is accumulated into its `.grad` attribute.

- To stop a tensor from tracking history, you can call `.detach()`, which detaches it from the computation history and prevents future computation from being tracked.

- To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This is particularly helpful when evaluating a model, because the model may have trainable parameters with `requires_grad=True` whose gradients we don’t need.
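Two of the points above are easy to check directly: gradients accumulate into `.grad` across `backward()` calls, and a forward pass under `torch.no_grad()` is not tracked even when the parameters require gradients. The following is a small sketch; the tensor `w` and the `torch.nn.Linear` model are stand-ins chosen for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Each backward() call adds to w.grad rather than overwriting it.
(w * 3).sum().backward()
first = w.grad.clone()    # gradient of sum(3 * w) is 3 per element
(w * 3).sum().backward()
second = w.grad.clone()   # 3 + 3, accumulated over both calls

# Reset the accumulated gradient in place before the next pass.
w.grad.zero_()

# Evaluation: the parameters require gradients, but the output is untracked.
model = torch.nn.Linear(2, 1)  # illustrative model, not from the tutorial
with torch.no_grad():
    pred = model(torch.ones(1, 2))

print(first, second, w.grad)
print(model.weight.requires_grad, pred.requires_grad)
```

Accumulation is why training loops typically zero the gradients between steps.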

<br>

- One more class is central to the autograd implementation: `Function`.

- `Tensor` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references the `Function` that created it (tensors created by the user are the exception: their `grad_fn` is `None`).

- If you want to compute the derivatives, you can call `.backward()` on a `Tensor`. If the `Tensor` is a scalar (i.e. it holds a single element), you don’t need to pass any arguments to `backward()`. If it has more elements, you need to pass a `gradient` argument that is a tensor of matching shape.
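The points above about `grad_fn` and the `gradient` argument can be sketched as follows. The tensors `a` and `b` are illustrative; the sketch assumes only that calling `backward()` with no argument on a non-scalar tensor raises a `RuntimeError`.

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user
b = a * 2                              # created by an operation

print(a.grad_fn)  # None for user-created tensors
print(b.grad_fn)  # the Function node that produced b

# b has two elements, so backward() without a gradient argument fails.
try:
    b.backward()
    raised = False
except RuntimeError:
    raised = True

# Supplying a tensor of matching shape works.
b.backward(gradient=torch.ones_like(b))
print(raised, a.grad)
```

Passing `torch.ones_like(b)` here gives the same result as summing `b` to a scalar before calling `backward()`.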

In [1]:
import torch

In [2]:
# Create a tensor and set requires_grad=True to track computation with it
def create_tensor():
    x = torch.ones(2, 2, requires_grad=True)
    print(x)
    return x

x = create_tensor()

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [3]:
# Do a tensor operation:
def tensor_operation(x):
    y = x + 2
    print(y)
    return y

y = tensor_operation(x)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [4]:
# Do more operations on y
def tensor_operation2(y):
    z = y * y * 3
    out = z.mean()
    print(z, out)
    return out

out = tensor_operation2(y)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


In [5]:
# .requires_grad_(...) changes an existing Tensor’s requires_grad flag in place.
# A tensor's requires_grad flag is False unless set at creation; the method's own argument defaults to True.
def tensor_operation3():
    a = torch.randn(2, 2)
    a = ((a * 3) / (a - 1))
    print(a.requires_grad)
    a.requires_grad_(True)
    print(a.requires_grad)
    b = (a * a).sum()
    print(b.grad_fn)
    
tensor_operation3()

False
True
<SumBackward0 object at 0x10e9bfed0>


### Gradients

In [7]:
# Let’s backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).
out.backward()

# Print gradients d(out)/dx
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
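The value 4.5 can be checked by hand. Writing the earlier cells as formulas (with `z = 3 * (x + 2)^2` elementwise and `out` the mean over the four entries), a short derivation is:

```latex
\begin{aligned}
\mathrm{out} &= \frac{1}{4}\sum_i z_i, \qquad z_i = 3\,(x_i + 2)^2 \\
\frac{\partial\,\mathrm{out}}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\,\mathrm{out}}{\partial x_i}\right|_{x_i = 1} &= \frac{9}{2} = 4.5
\end{aligned}
```

Every entry of `x` is 1, so every entry of `x.grad` is 4.5, matching the printed tensor.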


In [12]:
# Now let’s take a look at an example of vector-Jacobian product:
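def tensor_product():
    x = torch.randn(3, requires_grad=True)

    y = x * 2
    # Keep doubling until the norm reaches 1000; detach() keeps the check itself untracked
    while y.detach().norm() < 1000:
        y = y * 2

    print(y)
    # Return x as well: its .grad is needed after the backward pass
    return x, y
    
x_vec, y = tensor_product()

tensor([-511.5748, -816.6202, -738.9502], grad_fn=<MulBackward0>)


In [14]:
# Now in this case y is no longer a scalar.
# torch.autograd cannot compute the full Jacobian directly, but if we just want the vector-Jacobian product, we simply pass the vector to backward as an argument:
def torch_autograd(x, y):
    v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
    y.backward(v)

    # x must be the tensor y was computed from, not the 2x2 x from the earlier cells.
    # Prints v scaled by dy/dx, a power of two whose exponent depends on the random x.
    print(x.grad)
    
torch_autograd(x_vec, y)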
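To see what `backward(v)` computes, the vector-Jacobian product can be compared against an explicitly built Jacobian. The sketch below uses `torch.autograd.functional.jacobian` and a fixed scaling function `f` as a deterministic stand-in for the random doubling loop; both choices are assumptions made for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical stand-in for the doubling loop: a fixed scaling by 4
    return x * 4

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Vector-Jacobian product via backward
f(x).backward(v)
vjp = x.grad.clone()

# Full Jacobian, built explicitly for comparison
J = jacobian(f, x.detach())
expected = v @ J  # v^T J

print(vjp)
print(expected)
```

For an elementwise scaling the Jacobian is diagonal, so the product is just `v` multiplied by the scale factor. Building the full Jacobian costs far more than a single backward pass, which is why `backward` only ever computes the product.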


In [15]:
# You can also stop autograd from tracking history on Tensors with requires_grad=True by wrapping the code block in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


In [16]:
# Or by using .detach() to get a new Tensor with the same content but that does not require gradients:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)
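Two consequences of `.detach()` are worth a quick check: the detached tensor is treated as a constant during backpropagation, and it shares storage with the original rather than copying it. A minimal sketch, with illustrative tensor names:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# The detached factor is a constant for autograd, so d(out)/dx is
# x.detach() rather than 2 * x.
out = (x * x.detach()).sum()
out.backward()
print(x.grad)

# detach() returns a view of the same memory, not a copy.
shared = x.detach().data_ptr() == x.data_ptr()
print(shared)
```

Because the memory is shared, in-place changes to a detached tensor also change the original; use `.detach().clone()` when an independent copy is needed.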
