Tutorial 2 Autograd: an Automatic Differentiation Package
===

<br> Central to all neural networks in PyTorch is the ``autograd`` package. We will first look at it briefly and then move on to training our first neural network.

<br> The ``autograd`` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

<br> Let us see this in simpler terms with some examples.
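
<br> As a first sketch of what define-by-run means, the hypothetical function ``forward`` below (not part of PyTorch) multiplies its input a number of times that is only decided when the code runs. Each call builds a different graph, so the gradients differ between calls.

```python
import torch

def forward(x, n_steps):
    # Hypothetical helper: the graph depth depends on n_steps at run time
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

# Two doublings: d(sum(4x))/dx = 4 for every element
forward(x, 2).backward()
g2 = x.grad.clone()
print(g2)

# Reset the accumulated gradient before the next pass
x.grad.zero_()

# Three doublings: d(sum(8x))/dx = 8 for every element
forward(x, 3).backward()
g3 = x.grad.clone()
print(g3)
```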

<br> 
## Tensor
<br> ``torch.Tensor`` is the central class of the package. If you set its attribute ``.requires_grad`` to ``True``, it starts to track all operations on it. When you finish your computation, you can call ``.backward()`` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its ``.grad`` attribute.

<br> To stop a tensor from tracking history, you can call ``.detach()``, which detaches it from the computation history and prevents future computation from being tracked.
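
<br> A minimal sketch of detaching follows. Note that ``.detach()`` returns a new tensor that is cut off from the graph and leaves the original untouched. The variable names are chosen for illustration only.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3                 # tracked: y has a grad_fn
y_detached = y.detach()   # same values, but no longer part of the graph

print(y.requires_grad)            # True
print(y_detached.requires_grad)   # False
print(y_detached.grad_fn)         # None

# Operations on the detached tensor are not recorded either
w = y_detached * 2
print(w.requires_grad)            # False
```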

<br> To prevent tracking history (and using memory), you can also wrap the code block in ``with torch.no_grad():``. This is particularly helpful when evaluating a model, because the model may have trainable parameters with ``requires_grad=True`` for which we don’t need the gradients.
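
<br> The effect of ``torch.no_grad()`` can be sketched as below. The tensor ``weight`` is a hypothetical stand-in for a model's trainable parameter, not an actual model.

```python
import torch

# Hypothetical stand-in for a trainable parameter
weight = torch.randn(3, 3, requires_grad=True)
inputs = torch.ones(3)

# Outside the context manager, the result is tracked
tracked = weight @ inputs
print(tracked.requires_grad)     # True

# Inside it, no history is recorded for the result
with torch.no_grad():
    untracked = weight @ inputs
print(untracked.requires_grad)   # False

# The parameter itself keeps its flag
print(weight.requires_grad)      # True
```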

<br>There’s one more class which is very important for the autograd implementation: ``Function``.

<br>``Tensor`` and ``Function`` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a ``.grad_fn`` attribute that references the ``Function`` that created the Tensor (except for Tensors created by the user, whose ``grad_fn`` is ``None``).

<br>If you want to compute the derivatives, you can call ``.backward()`` on a Tensor. If the Tensor is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to ``backward()``. If it has more elements, however, you need to specify a ``gradient`` argument that is a tensor of matching shape.

<br> The following block is an example to illustrate:


In [9]:
import torch
# Create a tensor and set ``requires_grad=True`` to track computation with it
x = torch.ones(2, 2, requires_grad=True)
print(x)
# Do a tensor operation:
y = x + 2
print(y)
# y was created as a result of an operation, so it has a grad_fn
print(y.grad_fn)
# Do more operations on y
z = y * y * 3
out = z.mean()
print(z, out)

# ``.requires_grad_( ... )`` changes an existing tensor's ``requires_grad``
# flag in-place. A tensor created without the flag has
# ``requires_grad=False`` by default.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001F25E3C93C8>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
False
True
<SumBackward0 object at 0x000001F25E3C9358>
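
<br> The ``gradient`` argument required for non-scalar tensors, mentioned earlier, can be sketched as follows. The vector ``v`` is an arbitrary choice for illustration. Autograd multiplies it with the Jacobian of ``y`` with respect to ``x``, so here ``x.grad`` ends up as ``v * 2 * x``.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x   # non-scalar result, so backward() needs a gradient argument

# Arbitrary weights with the same shape as y (an assumption for this sketch)
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=v)

# dy_i/dx_i = 2 * x_i, so x.grad = v * 2 * x
print(x.grad)
```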


## Gradient

<br> Let's backprop now. Because ``out`` holds a single scalar, ``out.backward()`` needs no arguments.

In [12]:
out.backward()
print(x.grad)

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
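
<br> As the message says, this error appears when ``.backward()`` is called a second time on the same graph, presumably because the cell above was run more than once. The sketch below rebuilds the graph so that it runs on its own, and uses ``retain_graph=True`` as the message suggests. By hand, ``out = mean(3 * (x + 2)^2)`` over four elements, so each entry of the gradient is ``1.5 * (x + 2) = 4.5`` at ``x = 1``.

```python
import torch

# Rebuild the graph from the earlier cell so this block is self-contained
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

# First backward pass; retain_graph=True keeps the saved buffers alive
out.backward(retain_graph=True)
first = x.grad.clone()
print(first)    # every entry is 4.5

# A second pass is now allowed; gradients accumulate into x.grad
out.backward()
second = x.grad.clone()
print(second)   # every entry is 9.0

# The buffers were freed by the second pass, so a third pass fails
try:
    out.backward()
    third_failed = False
except RuntimeError:
    third_failed = True
print(third_failed)
```

<br> Because gradients accumulate, re-running the graph-building cell (or zeroing ``x.grad``) before calling ``.backward()`` again is usually the simpler fix in a notebook.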