### Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the ```autograd``` package. The ```autograd``` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and every single iteration can be different.


### Tensor

```torch.Tensor``` is the central class of the package. If you set its ```.requires_grad``` attribute to ```True```, it starts to track all operations on it. When you finish your computation, you can call ```.backward()``` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its ```.grad``` attribute.



In [1]:
import torch

In [2]:
# Create a tensor and set requires_grad = True to track computation with it.
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [3]:
# Perform an operation on the tensor.
y = x + 2 
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


```y``` was created as the result of an operation, so it has a ```grad_fn```.

In [4]:
print(y.grad_fn)

<AddBackward0 object at 0x7f8cc94316a0>


In [5]:
# Do more operations on y
z = y * y * 3 
out = z.mean()
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)


```.requires_grad_()``` changes an existing Tensor's ```requires_grad``` flag in-place. A Tensor's ```requires_grad``` flag defaults to ```False``` if it is not given at creation, which is why the first ```print``` below shows ```False```.

In [6]:
a = torch.randn(2, 2)
a = ((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7f8c6c7cd400>


### Gradient

Let's backprop now. Because ```out``` contains a single scalar, ```out.backward()``` is equivalent to ```out.backward(torch.tensor(1.))```.

In [7]:
out.backward()

In [8]:
# print gradients d(out)/dx
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix of ```4.5```. Let's call the ```out``` Tensor "$o$". We have $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i + 2)^2$ and $z_i\big|_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2)$, hence $\frac{\partial o}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$.
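The derivation above can be checked numerically without PyTorch. The following is a minimal sketch in plain Python; the helper names (`out_fn`, `analytic_grad`, `numeric_grad`) are made up for this illustration and are not part of any library. It compares the closed-form gradient with a central finite-difference estimate.

```python
# Pure-Python check of the derivation above; no PyTorch required.
# All function names here are hypothetical helpers for illustration.

def out_fn(xs):
    """o = mean of z_i, where z_i = 3 * (x_i + 2)^2."""
    return sum(3.0 * (x + 2.0) ** 2 for x in xs) / len(xs)

def analytic_grad(xs):
    """do/dx_i = 6 * (x_i + 2) / n, which is (3/2) * (x_i + 2) when n = 4."""
    return [6.0 * (x + 2.0) / len(xs) for x in xs]

def numeric_grad(f, xs, eps=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grads = []
    for i in range(len(xs)):
        hi = list(xs)
        lo = list(xs)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

xs = [1.0, 1.0, 1.0, 1.0]          # the 2x2 tensor of ones, flattened
value = out_fn(xs)                 # should match out = 27
exact = analytic_grad(xs)          # closed-form gradient
approx = numeric_grad(out_fn, xs)  # numerical estimate

print(value)
print(exact)
print(approx)
```

Both gradient lists should agree with the ```4.5``` entries that ```x.grad``` reports, up to floating-point error in the finite-difference estimate.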

In [9]:
# Let's take a look at an example of a vector-Jacobian product:
x = torch.randn(3, requires_grad=True)

y = x * 2 

while y.data.norm() < 1000:
    y = y * 2 
    
print(y)

tensor([ 456.1063, -466.0776, 1430.5864], grad_fn=<MulBackward0>)


In [10]:
# In this case y is no longer a scalar, so torch.autograd cannot compute the full Jacobian directly.
# If we just want the vector-Jacobian product, simply pass the vector to backward as an argument.

v = torch.tensor([0.1, 1.0, 0.0001], dtype = torch.float)
y.backward(v)
print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
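To make the product concrete, here is a small deterministic sketch (the input values are chosen for illustration and are not from the cells above). For an elementwise function the Jacobian is diagonal, so the result of ```y.backward(v)``` can be compared against the product of ```v``` with an explicitly built Jacobian.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                      # elementwise, so the Jacobian dy/dx is diag(2 * x)
v = torch.tensor([0.1, 1.0, 0.0001])

y.backward(v)                  # accumulates the product v^T J into x.grad

# Build the same product by hand from the explicit Jacobian.
J = torch.diag(2 * x.detach())
manual = v @ J

print(x.grad)
print(manual)
```

The two printed tensors should match, which shows that the vector passed to ```backward``` weights each output's contribution to the gradient.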


In [11]:
# Calling backward() without an argument on a non-scalar tensor raises an error.
y.backward()


RuntimeError: grad can be implicitly created only for scalar outputs
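The error above can be avoided in two common ways, sketched below with illustrative values: reduce the output to a scalar before calling ```backward()```, or pass an explicit vector of ones. For this simple function both give the same gradient.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# backward() with no argument only works for scalar outputs.
raised = False
try:
    (x * 2).backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Option 1: reduce to a scalar first, then call backward() with no argument.
y = x * 2
y.sum().backward()
grad_from_sum = x.grad.clone()

# Option 2: pass an explicit vector of ones with the same shape as y.
x.grad.zero_()                 # clear the accumulated gradient before the second run
y = x * 2                      # rebuild the graph for a fresh backward pass
y.backward(torch.ones_like(y))
grad_from_ones = x.grad.clone()

print(grad_from_sum)
print(grad_from_ones)
```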

### ```backward()``` Experiment

In [12]:
# Simple gradient of a scalar output
a = torch.tensor([2.,3.], requires_grad=True)
b = a + 3 
c = b * b * 3 
out = c.mean()
out.backward()

In [13]:
print('input: ', a.data)

input:  tensor([2., 3.])


In [14]:
print('compute result is ', out.data)

compute result is  tensor(91.5000)


In [15]:
print('input gradients are: ', a.grad.data)

input gradients are:  tensor([15., 18.])


```backward()``` in the non-scalar case.

In [16]:
m = torch.tensor([[2., 3.]], requires_grad=True)
n = torch.zeros((1, 2))
print(n)
# Fill n element by element: n = [m_0^2, m_1^3]
n[0, 0] = m[0, 0] ** 2
n[0, 1] = m[0, 1] ** 3

tensor([[0., 0.]])


In [17]:
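# Use m's own values [2, 3] as the weight vector: [2 * (2*2), 3 * (3*3^2)] = [8, 81].
# retain_graph=True keeps the graph so backward can be called again.
n.backward(m.data, retain_graph=True)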

In [18]:
m.grad

tensor([[ 8., 81.]])

In [19]:
# Gradients accumulate in m.grad: [8, 81] + [1 * 4, 1 * 27] = [12, 108].
n.backward(torch.tensor([[1., 1.]]))

In [20]:
m.grad 

tensor([[ 12., 108.]])
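The second result is the sum of both backward passes, because gradients are accumulated into ```.grad``` rather than overwritten. The sketch below isolates that behavior with the same two functions; the ```forward``` helper is a hypothetical name introduced for this illustration, and ```m.grad.zero_()``` is shown as one way to reset the gradient between independent computations.

```python
import torch

m = torch.tensor([2.0, 3.0], requires_grad=True)

def forward(m):
    # Same functions as in the cells above: n_0 = m_0^2, n_1 = m_1^3
    return torch.stack([m[0] ** 2, m[1] ** 3])

ones = torch.tensor([1.0, 1.0])

forward(m).backward(ones)
first = m.grad.clone()         # [2*2, 3*3^2] = [4, 27]

forward(m).backward(ones)
accumulated = m.grad.clone()   # the second call adds to the first: [8, 54]

m.grad.zero_()                 # reset before an independent computation
forward(m).backward(ones)
after_reset = m.grad.clone()   # back to [4, 27]

print(first, accumulated, after_reset)
```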