# 04 - Autograd and Automatic Differentiation in PyTorch

This notebook covers PyTorch's autograd system for automatic differentiation: gradients and backpropagation; `requires_grad`, `.backward()`, and `torch.autograd.grad()`; higher-order gradients, Jacobians, and Hessians; and memory management with PyTorch's dynamic computation graph.

## 1. Understanding Gradients and Backpropagation

Gradients represent the partial derivatives of a function with respect to its inputs. Backpropagation is the process of computing these gradients in reverse order from the output to the input of a neural network, allowing for weight updates during training.
In PyTorch, autograd records the operations applied to tensors created with `requires_grad=True` and computes their gradients when `.backward()` is called on the result.
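One way to build trust in an autograd result is to compare it with a numerical estimate. The sketch below is illustrative only: the function `f`, the evaluation point, and the step size `h` are assumptions chosen for the demonstration, not part of the original notebook.

```python
import torch

def f(x):
    # Hypothetical test function: f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2
    return x ** 3 + 2 * x

x0 = 1.5   # assumed evaluation point
h = 1e-6   # assumed finite-difference step

# Gradient from autograd (float64 keeps the numerical comparison tight)
x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

# Central finite-difference approximation of the same derivative
x_plus = torch.tensor(x0 + h, dtype=torch.float64)
x_minus = torch.tensor(x0 - h, dtype=torch.float64)
numeric_grad = (f(x_plus) - f(x_minus)).item() / (2 * h)

print('autograd:', autograd_grad)
print('finite difference:', numeric_grad)
```

The two values should agree to several decimal places; a large gap usually points to a mistake in how the function was written rather than in autograd itself.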


In [None]:
# Example 1: Computing Gradients with PyTorch
import torch

# Define a tensor with requires_grad=True
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2  # y = x^2

# Compute gradient: dy/dx = 2x, which is 4 at x = 2
y.backward()
print('Gradient of y with respect to x:', x.grad)

In [None]:
# Example 2: Gradient of a Multi-variable Function
# Define tensors
x = torch.tensor(1.0, requires_grad=True)
z = torch.tensor(2.0, requires_grad=True)
y = x * z + x ** 2

# Compute gradients: dy/dx = z + 2x = 4 and dy/dz = x = 1
y.backward()
print('Gradient (dy/dx):', x.grad)
print('Gradient (dy/dz):', z.grad)

## 2. Using `requires_grad`, `.backward()`, and `torch.autograd.grad()`

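These are the core tools for gradient computation in PyTorch:
- **`requires_grad`**: Tells autograd to record the operations applied to a tensor.
- **`.backward()`**: Computes gradients of a scalar output and accumulates them into the `.grad` attribute of each leaf tensor that requires gradients. A non-scalar output needs an explicit `gradient` argument.
- **`torch.autograd.grad()`**: Returns the gradients of the given outputs with respect to chosen inputs as a tuple, without modifying `.grad`.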
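The difference between the two entry points listed above can be sketched as follows. The tensor `w` and the helper names (`grad_attr_after_grad`, `backward_grad`) are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# torch.autograd.grad returns the gradient and leaves w.grad untouched
loss = (w ** 2).sum()
(grad_w,) = torch.autograd.grad(loss, w)
grad_attr_after_grad = w.grad  # still None at this point

# .backward() returns nothing and accumulates into w.grad instead
loss = (w ** 2).sum()
loss.backward()
backward_grad = w.grad.clone()

# A non-scalar output needs an explicit `gradient` argument,
# which plays the role of the vector v in the product v^T J
w.grad = None
out = w ** 2
out.backward(gradient=torch.ones_like(out))

print('autograd.grad result:', grad_w)
print('w.grad after autograd.grad:', grad_attr_after_grad)
print('w.grad after backward:', backward_grad)
print('w.grad after non-scalar backward:', w.grad)
```

Passing a vector of ones as `gradient` is equivalent to calling `.backward()` on `out.sum()`, which is why all three gradients match here.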


In [None]:
# Example 3: Using torch.autograd.grad()
# Define tensors
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)
c = a * b

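# Compute dc/da = b directly; the result is returned as a tuple and a.grad is not populated.
# retain_graph=True keeps the graph so c could be differentiated again (e.g. with respect to b).
grad_c_a = torch.autograd.grad(c, a, retain_graph=True)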
print('Gradient of c with respect to a:', grad_c_a[0])

In [None]:
# Example 4: Using retain_graph in backward()
# Define tensor
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

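# First backward pass: dy/dx = 3x^2 = 12; retain_graph=True keeps the graph for reuse
y.backward(retain_graph=True)
print('First Gradient:', x.grad)

# Second backward pass adds to x.grad rather than overwriting it (12 + 12 = 24)
y.backward()
print('Accumulated Gradient:', x.grad)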

## 3. Higher-Order Gradients, Jacobian, and Hessian Calculations

Higher-order gradients (second derivatives and beyond) are required by some optimization algorithms. In PyTorch, passing `create_graph=True` makes a gradient itself differentiable, and `torch.autograd.functional` provides `jacobian` and `hessian` helpers.
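As a minimal sketch of a Hessian computation, the block below builds the matrix in two ways: by differentiating each gradient component, and with `torch.autograd.functional.hessian`. The function `f` and the evaluation point are assumptions picked for illustration.

```python
import torch
from torch.autograd.functional import hessian

def f(x):
    # Hypothetical scalar function: f(x) = x0^2 * x1 + x1^3
    return x[0] ** 2 * x[1] + x[1] ** 3

point = torch.tensor([1.0, 2.0])  # assumed evaluation point

# Gradient via autograd.grad; create_graph=True makes it differentiable again
x = point.clone().requires_grad_(True)
(grad,) = torch.autograd.grad(f(x), x, create_graph=True)

# Each Hessian row is the gradient of one component of the gradient
rows = [torch.autograd.grad(g, x, retain_graph=True)[0] for g in grad]
hess_manual = torch.stack(rows)

# The same matrix from the built-in helper
hess_builtin = hessian(f, point)

print('Gradient:', grad.detach())
print('Hessian (manual):', hess_manual)
print('Hessian (built-in):', hess_builtin)
```

The row-by-row version makes the role of `create_graph=True` visible; the helper is shorter and is usually the more convenient choice.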


In [None]:
# Example 5: Calculating Higher-Order Gradients
# Define tensor
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# Compute first derivative
grad1 = torch.autograd.grad(y, x, create_graph=True)[0]
print('First Derivative:', grad1)

# Compute second derivative
grad2 = torch.autograd.grad(grad1, x)[0]
print('Second Derivative:', grad2)

In [None]:
# Example 6: Calculating Jacobian Matrix
x = torch.tensor([1.0, 2.0, 3.0])

# Element-wise square; its Jacobian is diagonal with entries 2 * x
def square(t):
    return t ** 2

jacobian = torch.autograd.functional.jacobian(square, x)
print('Jacobian Matrix:', jacobian)

## 4. Managing and Optimizing Memory Usage with PyTorch's Dynamic Computation Graph

PyTorch builds its computation graph dynamically at runtime. The graph holds the intermediate tensors needed for the backward pass until it is freed, so controlling what gets recorded directly affects memory usage. Clearing gradients, detaching tensors, and using `torch.no_grad()` are the main techniques.
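These techniques typically appear together in a training loop. The sketch below assumes a toy objective, a learning rate of `0.1`, and 100 steps, all chosen only for illustration.

```python
import torch

# Hypothetical objective: minimize (w - 3)^2, starting from w = 0
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # assumed learning rate

for step in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()        # this iteration's graph is freed after the call
    with torch.no_grad():  # the parameter update should not be recorded
        w -= lr * w.grad
    w.grad = None          # drop the gradient so the next step starts fresh

# detach() returns the value without any graph history attached
w_value = w.detach()
print('w after training:', w_value)
```

Setting `w.grad = None` releases the gradient tensor, whereas `w.grad.zero_()` keeps it allocated and only resets its values. Either one prevents gradients from accumulating across iterations.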


In [None]:
# Example 7: Clearing Accumulated Gradients
x = torch.tensor(1.0, requires_grad=True)
y = x ** 2
y.backward()
print('Gradient before clearing:', x.grad)
x.grad.zero_()  # Resets the values in place; set x.grad = None to release the tensor itself
print('Gradient after clearing:', x.grad)

In [None]:
# Example 8: Using torch.no_grad() to Skip Graph Construction
a = torch.tensor(1.0, requires_grad=True)

# Operations inside the block are not recorded, so no graph is stored for them
with torch.no_grad():
    b = a * 2
print('Result without tracking gradients:', b)
print('b.requires_grad:', b.requires_grad)  # False

In [None]:
# Example 9: Detaching Tensors from the Computation Graph
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x ** 2
z = y.detach()  # Shares y's data but carries no graph history
print('Detached Tensor:', z)

In [None]:
# Example 10: Inspecting GPU Memory with torch.cuda.memory_allocated()
# Reports the bytes currently occupied by tensors (only relevant if CUDA is available)
if torch.cuda.is_available():
    print('Memory allocated on GPU (bytes):', torch.cuda.memory_allocated())

## Exercises

1. Compute the gradient of a multi-variable function and interpret the results.
2. Use `torch.autograd.grad()` to compute custom gradients for a specific function.
3. Calculate the Jacobian matrix for a non-linear function.
4. Calculate higher-order gradients for different functions and observe their values.
5. Use `torch.no_grad()` to prevent gradient tracking during certain operations.
6. Detach tensors from the computation graph to save memory and compare results with tracked gradients.
7. Check memory usage before and after performing tensor operations on the GPU (if available).
8. Implement a function that computes both the gradient and the Hessian matrix of a scalar-valued function.
9. Explore the effect of gradient accumulation and how to manage it using the `.zero_()` method.
10. Demonstrate the use of `retain_graph` in different scenarios to manage memory usage during backpropagation.