<a href="https://vigneashpandiyan.github.io/publications/Codes/" target="_blank" rel="noopener noreferrer">
  <img src="https://vigneashpandiyan.github.io/images/Link.png"
       style="max-width: 800px; width: 100%; height: auto;">
</a>

Auto-differentiation
====================

Auto-differentiation (autograd) is a feature of libraries such as PyTorch and TensorFlow that computes derivatives, or gradients, for you. The library records every operation as your code runs and then applies the chain rule through that record, so gradients are available no matter how complicated the calculation is. This makes it much easier and faster to train machine learning models.

If you flag a torch Tensor by setting `x.requires_grad=True`, PyTorch automatically keeps track of the computational history of all tensors that are derived from `x`. This allows PyTorch to compute the derivative of any scalar result with respect to the components of `x`.
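As a minimal sketch of this tracking (the tensors `c`, `y` and `z` are illustrative names, not part of the tutorial's own example), the snippet below checks which derived tensors carry a computation history. A tracked tensor reports `requires_grad=True` and has a `grad_fn` that records the operation that produced it.

```python
import torch

# Only `x` is flagged; `c` is an ordinary constant tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = torch.tensor([10.0, 20.0, 30.0])

y = x * c   # derived from x, so autograd records how it was made
z = c * 2   # never touches x, so nothing is recorded

print(y.requires_grad, y.grad_fn is not None)   # True True
print(z.requires_grad, z.grad_fn)               # False None
```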

The function `torch.autograd.grad(output_scalar, [list of input_tensors])` computes `d(output_scalar)/d(input_tensor)` for each input tensor in the list and returns one gradient tensor per input. For it to work, the output must have been computed from the input tensors, and those tensors must have `requires_grad=True`.

In the example here, `x` is explicitly marked `requires_grad=True`, so `y.sum()`, which is derived from `x`, automatically comes along with the computation history, and can be differentiated.

In [None]:
import torch
from matplotlib import pyplot as plt

x = torch.linspace(0, 5, 100,
          requires_grad=True)
y = (x**2).cos()
s = y.sum()
[dydx] = torch.autograd.grad(s, [x])

plt.plot(x.detach(), y.detach(), label='y')
plt.plot(x.detach(), dydx, label='dy/dx')
plt.legend()
plt.show()

(Note that in the example above, each `y[j]` depends only on `x[j]`, so `dy[j] / dx[i] == 0` when `j != i`, and therefore `d(y.sum())/dx[i] = dy[i]/dx[i]`. That means computing a single gradient vector of the sum `s` is equivalent to computing the elementwise derivatives `dy/dx`.)
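One way to check this claim numerically is to build the full Jacobian and compare it with the gradient of the sum. The sketch below assumes a smaller grid of 6 points and a helper `f` (both chosen here for illustration) and uses `torch.autograd.functional.jacobian`.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same elementwise function as in the example above.
    return (x**2).cos()

x = torch.linspace(0, 5, 6)

# Full Jacobian: J[j, i] = dy[j] / dx[i], shape (6, 6).
J = jacobian(f, x)

# Gradient of the sum, computed as in the tutorial example.
x_tracked = x.clone().requires_grad_(True)
[dydx] = torch.autograd.grad(f(x_tracked).sum(), [x_tracked])

# Remove the diagonal; whatever is left should be zero.
off_diagonal = J - torch.diag(torch.diagonal(J))
print(off_diagonal.abs().max())
print(torch.allclose(torch.diagonal(J), dydx))
```

The Jacobian costs one backward pass per output element, which is why the sum trick is preferable whenever the outputs are independent like this.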

**Detaching tensors from the computation history.**

Every tensor that depends on `x` will have `requires_grad=True` and stay connected to the complete computation history. But if you were to convert a tensor to a regular Python number, PyTorch would not be able to see the calculations done with it and would not be able to compute gradients through them.

To avoid programming mistakes where some computation invisibly goes through non-PyTorch data that cannot be tracked, PyTorch refuses to convert requires-grad tensors to untrackable data such as NumPy arrays (for example, calling `.numpy()` on such a tensor raises an error). You need to call `x.detach()` or `y.detach()` first, to say explicitly that you want an untracked reference, before plotting the data or using it as non-PyTorch numbers.
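The following sketch illustrates that behaviour on a small tensor. The variable names `conversion_failed`, `y_plain` and `values` are chosen here for illustration.

```python
import torch

x = torch.linspace(0, 1, 5, requires_grad=True)
y = x**2

try:
    y.numpy()                  # y is still attached to the graph
    conversion_failed = False
except RuntimeError as err:
    conversion_failed = True
    print("refused:", err)

y_plain = y.detach()           # same data as y, but no history
print(y_plain.requires_grad)   # False
values = y_plain.numpy()       # conversion is now allowed
```

Note that `detach()` does not copy the data: the detached tensor shares storage with the original, so in-place changes to one are visible in the other.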

### Exercise

Plot the polynomial y = x<sup>3</sup> - 6x<sup>2</sup> + 8x and its derivative.

In [None]:
import torch
import matplotlib.pyplot as plt
import numpy as np

def plot_pytorch_derivative():
    # 1. SETUP DATA
    # Create x values from -2 to 6
    x_np = np.linspace(-2, 6, 100)

    # Convert to PyTorch Tensor and enable gradient tracking
    x = torch.tensor(x_np, requires_grad=True, dtype=torch.float32)

    # 2. DEFINE FUNCTION
    # y = x^3 - 6x^2 + 8x
    y = x**3 - 6*x**2 + 8*x

    # 3. COMPUTE DERIVATIVE
    # We call backward() to compute gradients.
    # Since y is a vector, we pass torch.ones_like(x) to get element-wise derivatives.
    y.backward(torch.ones_like(x))

    # Retrieve the calculated derivative from x.grad
    dy = x.grad

    # 4. PLOT
    plt.figure(figsize=(10, 6))

    # We must use .detach().numpy() to convert PyTorch tensors back to standard NumPy arrays for plotting
    plt.plot(x.detach().numpy(), y.detach().numpy(), label='$y = x^3 - 6x^2 + 8x$')
    plt.plot(x.detach().numpy(), dy.detach().numpy(), label="Derivative ($y'$)", linestyle='--')

    plt.axhline(0, color='black', alpha=0.3)
    plt.axvline(0, color='black', alpha=0.3)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.title("Function vs Derivative (Calculated by PyTorch)")
    plt.xlabel("x")
    plt.ylabel("y")

    plt.show()

if __name__ == "__main__":
    plot_pytorch_derivative()

More tricks
-----------

**Gradients over intermediate values.** Normally `.grad` is populated only for the original input variables, not for intermediate values, but you can ask for an intermediate gradient to be stored by calling `v.retain_grad()` on the intermediate tensor `v` before the backward pass.
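A minimal sketch of `retain_grad()`, assuming a toy computation `s = sum(3 * x**2)` chosen for illustration:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = x**2            # intermediate value
v.retain_grad()     # ask autograd to keep d(s)/d(v) after backward
s = (3 * v).sum()
s.backward()

print(v.grad)   # tensor([3., 3., 3.])      since s = sum(3 * v)
print(x.grad)   # tensor([ 6., 12., 18.])   since s = sum(3 * x**2)
```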

**Second derivatives.** If you want higher-order derivatives, then you want PyTorch to build a computation graph while it is computing the gradient itself, so that the gradient can be differentiated again. To do this, pass `create_graph=True` to `torch.autograd.grad` or to `backward`.
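As a quick numerical check of the idea, the sketch below differentiates `y = x**3` twice at `x = 2` (a function and point picked here only because the answers are easy to verify by hand).

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**3

# First derivative, itself tracked so it can be differentiated again.
[dy] = torch.autograd.grad(y, [x], create_graph=True)   # 3 * x**2 = 12

# Second derivative: differentiate the first derivative.
[d2y] = torch.autograd.grad(dy, [x])                    # 6 * x = 12

print(dy.item(), d2y.item())
```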

**Gradients of more than one objective.** Usually you only need to call `y.backward()` once per computation graph, and PyTorch will not let you call it again: to save memory, it deallocates the computation graph after you have computed a single gradient. But if you need more than one gradient (e.g., if you have different objectives that you want to apply to different subsets of parameters, as sometimes happens with GANs), you can pass `retain_graph=True` to every call except the last.
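A minimal sketch with two objectives that share part of their graph. The names `h`, `loss_a` and `loss_b` are hypothetical stand-ins for real losses.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
h = x**2                   # shared part of the graph
loss_a = h.sum()           # objective A = sum(x**2)
loss_b = (h * x).sum()     # objective B = sum(x**3)

# Keep the graph alive after the first gradient so the shared part can be reused.
[grad_a] = torch.autograd.grad(loss_a, [x], retain_graph=True)

# Last use of the graph, so it can be freed afterwards.
[grad_b] = torch.autograd.grad(loss_b, [x])

print(grad_a)   # tensor([2., 4., 6.])      = 2 * x
print(grad_b)   # tensor([ 3., 12., 27.])   = 3 * x**2
```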


In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

def visualize_advanced_autograd():
    # 1. SETUP DATA
    # Create a vector of x values
    x_np = np.linspace(-6, 6, 200)
    x = torch.tensor(x_np, dtype=torch.float32, requires_grad=True)

    # 2. DEFINE FUNCTION
    # y = sin(x) * (x^2 / 4)
    # We use this function because it has interesting curves
    y = torch.sin(x) * (x**2) / 4

    # 3. FIRST DERIVATIVE (The Trick: create_graph=True)
    # We sum() y because autograd expects a scalar, but mathematically
    # this computes element-wise derivatives for our independent x values.
    # create_graph=True is the KEY: It lets us differentiate this result again.
    grads_1 = torch.autograd.grad(y.sum(), x, create_graph=True)[0]

    # 4. SECOND DERIVATIVE
    # Now we differentiate the gradients themselves!
    grads_2 = torch.autograd.grad(grads_1.sum(), x)[0]

    # 5. VISUALIZATION
    plt.figure(figsize=(12, 7))

    # Detach tensors to convert to numpy for plotting
    x_val = x.detach().numpy()
    y_val = y.detach().numpy()
    dy_val = grads_1.detach().numpy()
    ddy_val = grads_2.detach().numpy()

    # Plot all three curves
    plt.plot(x_val, y_val, label=r'Function $f(x)$', linewidth=3, color='blue')
    plt.plot(x_val, dy_val, label=r"1st Derivative $f'(x)$", linewidth=2, linestyle='--', color='orange')
    plt.plot(x_val, ddy_val, label=r"2nd Derivative $f''(x)$", linewidth=2, linestyle=':', color='green')

    plt.title("Advanced Autograd: Visualizing Higher-Order Derivatives\n(Using create_graph=True)", fontsize=14)
    plt.xlabel("Input (x)")
    plt.ylabel("Output (y)")
    plt.axhline(0, color='black', alpha=0.3)
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Add a note about the trick
    plt.text(-5.5, 4,
             "TRICK USED:\ncreate_graph=True\n\nAllows differentiating\nthe gradients themselves!",
             fontsize=10, bbox=dict(facecolor='white', alpha=0.9))

    plt.show()

if __name__ == "__main__":
    visualize_advanced_autograd()