# PyTorch Gradients
This section covers PyTorch's <a><strong><tt>autograd</tt></strong></a> package, which computes the gradients that gradient descent relies on. Its two main entry points are:
* <a><tt><strong>torch.autograd.backward()</strong></tt></a>
* <a><tt><strong>torch.autograd.grad()</strong></tt></a>

## Autograd - Automatic Differentiation

The PyTorch <a><strong><tt>autograd</tt></strong></a> package provides automatic differentiation for all operations on tensors.
This works because every tensor produced by an operation keeps a reference to that operation in its <tt>.grad_fn</tt> attribute.

When a tensor's <tt>.requires_grad</tt> attribute is set to True, PyTorch starts tracking every operation performed on it.

When the computation is finished, you can call <tt>.backward()</tt> on the result and have all the gradients computed automatically.

The gradient for a tensor will be accumulated into its <tt>.grad</tt> attribute.
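
The accumulation behaviour described above can be seen in a minimal sketch. The names `w` and `loss` are hypothetical and chosen only for illustration; the point is that a second `.backward()` call adds to `.grad` rather than overwriting it, so the gradient usually has to be zeroed between passes.

```python
import torch

# Hypothetical example tensor; w and loss are illustrative names only.
w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w**2)/dw = 2*w = 6
loss = w ** 2
loss.backward()
first = w.grad.item()

# Second backward pass on a freshly built graph:
# the new gradient (6) is added to the stored one (6).
loss = w ** 2
loss.backward()
second = w.grad.item()

# Reset the accumulated gradient in-place before the next pass.
w.grad.zero_()
cleared = w.grad.item()

print(first, second, cleared)  # 6.0 12.0 0.0
```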

<h4><a><u>Back-propagation</u></a></h4>

In [12]:
# STEP 1
import torch

In [13]:
# STEP 2
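# Create a tensor and track all the operations performed on it
x = torch.tensor(2.0, requires_grad=True)

# A floating-point value is required: an integer such as 2 won't work,
# because only floating-point (and complex) tensors can require gradients.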
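
As a small sketch of the floating-point requirement noted in this cell, the snippet below tries to build an integer tensor with `requires_grad=True` and catches the resulting error. The variable names `int_ok` and `x_float` are illustrative assumptions, not part of the original notebook.

```python
import torch

# An integer tensor cannot require gradients, so this raises a RuntimeError.
try:
    torch.tensor(2, requires_grad=True)
    int_ok = True
except RuntimeError as err:
    int_ok = False
    print("integer tensor rejected:", err)

# Passing an explicit floating-point dtype makes the same value usable.
x_float = torch.tensor(2, dtype=torch.float32, requires_grad=True)
print(x_float)
```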

In [14]:
# STEP 3
# Define a function of x
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

print(y)

tensor(63., grad_fn=<AddBackward0>)


###### The output is the value of $y$ at $x=2.0$, the point defined above.
###### <a>Since $y$ was created as the result of an operation, it has an associated gradient function, accessible as <tt>y.grad_fn</tt>.</a><br>
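
To make the difference between the two tensors concrete, here is a short sketch that inspects `grad_fn` and `is_leaf` on both. It rebuilds `x` and `y` so that it runs on its own.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

# x was created directly by the user: it is a leaf and has no grad_fn.
print(x.is_leaf, x.grad_fn)

# y was produced by operations on x: it is not a leaf and carries a grad_fn.
print(y.is_leaf, y.grad_fn)
```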


In [15]:
# STEP 4
# Backprop
# Compute the gradient of y with respect to every tensor that requires grad
y.backward()


In [16]:
# STEP 5
# Display the resulting gradient
print(x.grad)

tensor(93.)


<h7> <tt>x.grad</tt> is an attribute of tensor $x$, not a method, so we don't use parentheses. </h7>

<h7> The slope of the polynomial at the point $(2, 63)$ is 93. </h7>
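
The value 93 can be cross-checked by hand: differentiating $y = 2x^4 + x^3 + 3x^2 + 5x + 1$ gives $y' = 8x^3 + 3x^2 + 6x + 5$, and $y'(2) = 64 + 12 + 12 + 5 = 93$. The sketch below compares that hand-derived expression with `torch.autograd.grad()`, which returns the gradient directly instead of storing it in `x.grad`. The names `dy_dx` and `analytic` are illustrative.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1

# torch.autograd.grad returns a tuple with one gradient per input tensor.
(dy_dx,) = torch.autograd.grad(y, x)

# Hand-derived derivative, evaluated on a detached copy of x.
x0 = x.detach()
analytic = 8*x0**3 + 3*x0**2 + 6*x0 + 5

print(dy_dx, analytic)
print(x.grad)  # None: autograd.grad does not accumulate into .grad
```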

##########################################################################

<h4><a><u>Back-propagation through multiple steps</u></a></h4>

In [17]:
# STEP 1
# create tensor
x = torch.tensor([[1.,2,3],[3,2,1]], requires_grad= True)
print(x)

tensor([[1., 2., 3.],
        [3., 2., 1.]], requires_grad=True)


In [18]:
# STEP 2
# Create the first layer with y = 3x+2
y = 3*x + 2
print(y)

tensor([[ 5.,  8., 11.],
        [11.,  8.,  5.]], grad_fn=<AddBackward0>)


In [19]:
# STEP 3
# Create the second layer with z = 2y^2
z = 2*y**2
print(z)

tensor([[ 50., 128., 242.],
        [242., 128.,  50.]], grad_fn=<MulBackward0>)


In [20]:
# STEP 4
# Set the output to be the matrix mean
out = z.mean()
print(out)

# General mean of all values

tensor(140., grad_fn=<MeanBackward0>)


In [21]:
# STEP 5
# Now perform back-propagation to find the gradient of out w.r.t. x

out.backward()

print(x.grad)


tensor([[10., 16., 22.],
        [22., 16., 10.]])


You should see a 2x3 matrix. If we call the final <tt>out</tt> tensor "$o$", we can calculate the partial derivative of $o$ with respect to $x_i$ as follows:<br>

$o = \frac {1} {6}\sum_{i=1}^{6} z_i$<br>

$z_i = 2(y_i)^2 = 2(3x_i+2)^2$<br>

To differentiate $z_i$ we use the <a>chain rule</a>, $\frac{d}{dx}f(g(x)) = f'(g(x))\,g'(x)$:<br>

$\frac{\partial z_i}{\partial x_i} = 4(3x_i+2)\times 3 = 12(3x_i+2)$<br>

Therefore,<br>

$\frac{\partial o}{\partial x_i} = \frac{1}{6}\times 12(3x_i+2) = 2(3x_i+2)$<br>

$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = 2(3(1)+2) = 10$

$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=2} = 2(3(2)+2) = 16$

$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=3} = 2(3(3)+2) = 22$
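
As a quick numerical check of this derivation, the sketch below rebuilds the same computation and compares `x.grad` with the closed form $2(3x_i+2)$. The name `expected` is an illustrative assumption.

```python
import torch

x = torch.tensor([[1., 2, 3], [3, 2, 1]], requires_grad=True)

# Same pipeline as above: y = 3x + 2, z = 2y^2, out = mean(z)
out = (2 * (3*x + 2)**2).mean()
out.backward()

# Closed-form gradient from the chain rule: 2(3x_i + 2)
expected = 2 * (3 * x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, expected))  # True
```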

<h4><a><u>Turn off tracking</u></a></h4>

###### There may be times when we don't want or need to track the computational history.

###### You can reset a tensor's <tt>requires_grad</tt> attribute in-place using `.requires_grad_(True)` (or False) as needed.

###### When performing evaluations, it's often helpful to wrap a set of operations in `with torch.no_grad():`

###### A less-used method is to call `.detach()` on a tensor. It returns a new tensor that shares the same data but is excluded from gradient tracking, so computations on it are not recorded. This can be handy when cloning a tensor.
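
The three options above can be sketched side by side. All tensor names here (`a`, `b`, `c`, `d`) are hypothetical and used only for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# 1. torch.no_grad(): results computed inside the block are not tracked.
with torch.no_grad():
    a = x * 2
print(a.requires_grad)  # False

# 2. .detach(): a new tensor sharing x's data, outside the graph.
b = x.detach()
print(b.requires_grad, b.data_ptr() == x.data_ptr())  # False True

# Detach and clone to get an untracked, independent copy of the data.
c = x.detach().clone()
print(c.data_ptr() == x.data_ptr())  # False

# 3. .requires_grad_(): toggle tracking in-place on a leaf tensor.
d = torch.ones(2)
d.requires_grad_(True)
d.requires_grad_(False)
print(d.requires_grad)  # False
```

Note that `b` shares memory with `x`, so an in-place change to `b` also changes `x`; cloning avoids that.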