# <b>Autograd</b>

This section covers automatic differentiation on tensors. <br>
 <b>Why do we need autograd?</b><br>
 During backpropagation we differentiate the loss function with respect to the model's parameters, and an optimizer uses those gradients to update them. A training loop repeats this computation on every step, applying the chain rule through each operation. <br>PyTorch automates this with <b>'autograd'.</b> <br> <br> 
We will look closely at <b>requires_grad, backward, the chain rule (differentiation), and custom forward and backward operations.</b>


### <i>"Creating a Tensor Is the First Step of Backpropagation"</i>

We say this because, when we create a tensor, the requires_grad property specifies whether it will be included in gradient calculations.

In [1]:
import torch

In [2]:
x = torch.tensor(3.0, requires_grad=True)  # create a scalar tensor that tracks gradients
x.requires_grad

True

In [3]:
x2 = torch.tensor( [0.57, 0.86, 0.95],requires_grad=True)
x2

tensor([0.5700, 0.8600, 0.9500], requires_grad=True)
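
As a small illustrative sketch (the tensor names `w`, `z`, and `y_no_grad` are made up for this example), the following shows how the requires_grad flag propagates: a result is tracked whenever at least one of its inputs is tracked, unless the operation runs inside `torch.no_grad()` or the result is detached.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # tracked leaf tensor
w = torch.tensor([3.0, 4.0])                       # requires_grad defaults to False

y = x * w  # depends on a tracked tensor, so it is tracked too
z = w * 2  # depends only on untracked tensors

print(y.requires_grad)  # True
print(z.requires_grad)  # False

# Inside torch.no_grad(), operations are not recorded in the graph.
with torch.no_grad():
    y_no_grad = x * w
print(y_no_grad.requires_grad)  # False

# detach() returns a tensor that shares data but is cut off from the graph.
print(y.detach().requires_grad)  # False
```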

### Basic Computation Graph and Backward Process

The <code>backward()</code> method can be called without arguments only when the output tensor is a scalar.<br>
For a vector-valued output, autograd cannot infer which combination of the output elements to differentiate. In such cases, we must either pass a <b>gradient argument</b> to <code>backward()</code> with the same shape as the output, or compute the full <b>Jacobian matrix</b> for vector, matrix (2D), or higher-dimensional outputs.<br><br>

<b>&lt;&lt;&lt; WARNING &gt;&gt;&gt;</b><br>
If we create a tensor like this and try to use autograd:<br><br>

<pre>
<code>
x = torch.tensor([0.5, 0.8, 0.95], requires_grad=True)
y = x**3 * 2 + 4
y.backward()
x.grad
</code>
</pre>

the output will be something like:<br>
<i>RuntimeError: grad can be implicitly created only for scalar outputs</i>
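
A minimal sketch of two common ways around this error, reusing the tensor from the warning above: reduce the output to a scalar first, or pass an explicit gradient argument. The names `grad_from_sum` and `grad_from_arg` are introduced only for this illustration. Both should give 6x², the derivative of 2x³ + 4.

```python
import torch

# Option 1: reduce the vector output to a scalar before calling backward().
x = torch.tensor([0.5, 0.8, 0.95], requires_grad=True)
y = x**3 * 2 + 4
y.sum().backward()
grad_from_sum = x.grad.clone()

# Option 2: pass an explicit gradient argument with the same shape as y.
x.grad.zero_()  # gradients accumulate, so reset before the second pass
y = x**3 * 2 + 4  # rebuild the graph; the first backward() freed it
y.backward(gradient=torch.ones_like(y))
grad_from_arg = x.grad.clone()

print(grad_from_sum)
print(grad_from_arg)
```

Passing a vector of ones weights every output element equally, which is why it matches the `sum()` version.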


In [4]:
x = torch.tensor(4.0, requires_grad=True)
y = 3 * x**2 + 2 * x + 1
y.backward()  # works because the output tensor y is a scalar

print("y =", y.item())
print("dy/dx =", x.grad.item())

y = 57.0
dy/dx = 26.0


Chain rule example:

In [5]:
x = torch.tensor(2.0, requires_grad=True)

a = x**2       
b = 3*a + 5     
c = torch.sin(b) 
d = 2 * c  

d.backward()

print(f"x: {x.item()}")
print(f"a: {a.item()}")
print(f"b: {b.item()}")
print(f"c: {c.item()}")
print(f"d: {d.item()}")
print(f"Gradient dd/dx: {x.grad.item()}")

x: 2.0
a: 4.0
b: 17.0
c: -0.9613974690437317
d: -1.9227949380874634
Gradient dd/dx: -6.6039204597473145
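
To check the result by hand, the chain rule for this graph is dd/dx = dd/dc · dc/db · db/da · da/dx = 2 · cos(b) · 3 · 2x. The sketch below multiplies those local derivatives and compares the product with what autograd stored in `x.grad`. The variable names for the local derivatives are chosen here for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
a = x**2
b = 3 * a + 5
c = torch.sin(b)
d = 2 * c
d.backward()

# Local derivative of each step, written out by hand.
with torch.no_grad():
    dd_dc = torch.tensor(2.0)  # d = 2c
    dc_db = torch.cos(b)       # c = sin(b)
    db_da = torch.tensor(3.0)  # b = 3a + 5
    da_dx = 2 * x              # a = x^2
    manual = dd_dc * dc_db * db_da * da_dx

print(manual.item(), x.grad.item())
```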


 ![Derivatives](data/images/image.png) ![Chain-Rule](data/images/image-1.png) ![Derivatives2](data/images/image-2.png) ![Derivatives3](data/images/image-3.png) ![Derivative4](data/images/image-4.png)

Another simple example with a computation graph, from:
[Image Link from Another Repo](https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L06_pytorch/code/pytorch-autograd.ipynb)

<img src="data/images/computationGraph.png" alt="computation graph" width="800"/>

### Jacobian Matrix Calculation <br>
 The Jacobian matrix contains all first-order partial derivatives of a vector-valued function. For $y = f(x)$, where $y$ and $x$ are vectors, the Jacobian is a matrix whose element $(i, j)$ is $J_{ij} = \partial y_i / \partial x_j$.


In [6]:
# Define input vector
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Define a vector-valued function
def vector_func(x):
    # y = [x0 * x1, x1 ** 2, x2 + x0]
    return torch.stack([
        x[0] * x[1],
        x[1] ** 2,
        x[2] + x[0]
    ])

# Calculate the Jacobian matrix
jacobian = torch.autograd.functional.jacobian(vector_func, x)

print("Jacobian matrix:\n", jacobian)

Jacobian matrix:
 tensor([[2., 1., 0.],
        [0., 4., 0.],
        [1., 0., 1.]])


<img src="data/images/jacobian.png" alt="Calculating the Jacobian matrix" width="500"/>
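
The Jacobian also explains what the gradient argument of `backward()` does: calling `y.backward(gradient=v)` computes the vector-Jacobian product of `v` with the Jacobian without building the full matrix. The sketch below checks this on the same function. The vector `v` is an arbitrary choice made for this illustration.

```python
import torch

def vector_func(x):
    # y = [x0 * x1, x1 ** 2, x2 + x0]
    return torch.stack([x[0] * x[1], x[1] ** 2, x[2] + x[0]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
jacobian = torch.autograd.functional.jacobian(vector_func, x)

v = torch.tensor([1.0, 1.0, 1.0])  # illustrative weighting vector
y = vector_func(x)
y.backward(gradient=v)

print(x.grad)        # vector-Jacobian product computed by backward()
print(v @ jacobian)  # same product computed from the full Jacobian
```

With `v` set to all ones, each entry of the result is the sum of one Jacobian column, here `[3., 5., 1.]`.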

### Custom Autograd Function (Forward-Backward) <br>
Now that we have seen how PyTorch’s autograd automatically handles gradients and the chain rule, let’s take a step back and implement the same logic manually. By coding the forward and backward passes ourselves, we can clearly see what happens during backpropagation—and appreciate how much easier autograd makes our lives.

In [7]:
class MyManualPolyTensor:
    def forward(self, x):
        self.x = x
        self.y = 3 * x**2 + 2 * x + 1
        self.z = self.y**3 * 5 + 8
        return self.z

    def backward(self):
        # dy/dx = 6x + 2
        # dz/dy = 15 * y^2
        dy_dx = 6 * self.x + 2
        dz_dy = 15 * self.y ** 2
        dz_dx = dz_dy * dy_dx
        return dz_dx

x = torch.tensor(4.0)  # requires_grad=False!

m = MyManualPolyTensor()
z = m.forward(x)
grad = m.backward()
print(f"x: {x.item()}, z: {z.item()}, dz/dx: {grad.item()}")

x: 4.0, z: 925973.0, dz/dx: 1267110.0
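
As a sanity check, here is a short sketch that lets autograd differentiate the same polynomial and compares the result with the hand-derived formula dz/dx = 15y² · (6x + 2). The name `manual` is used only in this example.

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
y = 3 * x**2 + 2 * x + 1
z = y**3 * 5 + 8
z.backward()  # autograd applies the chain rule for us

# Same formula as the manual backward() above.
with torch.no_grad():
    manual = 15 * y**2 * (6 * x + 2)

print(x.grad.item(), manual.item())
```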


When we need custom behavior in the autograd system—such as defining our own forward and backward passes—we subclass torch.autograd.Function and implement the forward and backward static methods ourselves.
<ul>
    <li>In the forward method, we compute and return the output(s) given the input(s), and we can save any tensors needed for the backward pass using ctx.save_for_backward.</li>
    <li>In the backward method, we receive the gradient of the output with respect to some loss and must compute and return the gradient(s) with respect to each input.</li>
</ul>

In [8]:
class MySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # In the forward pass, we receive a Tensor containing the input and return a Tensor containing the output.
        ctx.save_for_backward(input)  # We save input for use in the backward pass.
        return input ** 2

    @staticmethod
    def backward(ctx, grad_output):
        # In the backward pass, we receive a Tensor containing the gradient of the loss with respect to the output,
        # and we need to compute the gradient of the loss with respect to the input.
        input, = ctx.saved_tensors
        grad_input = 2 * input * grad_output  # We apply the chain rule.
        return grad_input

# Now we use our custom autograd function.
x = torch.tensor(3.0, requires_grad=True)
y = MySquare.apply(x)
y.backward()
print(f"x: {x.item()}, y: {y.item()}, dy/dx: {x.grad.item()}")

x: 3.0, y: 9.0, dy/dx: 6.0
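
One way to gain confidence in a hand-written backward is `torch.autograd.gradcheck`, which compares the analytical gradient against a finite-difference estimate. The sketch below repeats the `MySquare` class so that it runs on its own. The random input of size 4 and the tolerance values are arbitrary choices for this illustration.

```python
import torch

class MySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input ** 2

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return 2 * input * grad_output

# gradcheck works best with double-precision inputs that require gradients.
x = torch.randn(4, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(MySquare.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)
```

If the backward formula were wrong (for example, returning `input * grad_output`), `gradcheck` would raise an error describing the mismatch instead of returning `True`.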
