<a href="https://colab.research.google.com/github/vijaygwu/IntroToDeepLearning/blob/main/AutoGradPyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Automatic Differentiation (Autograd) in PyTorch: A Key to Deep Learning

**Summary:** Automatic differentiation, called "autograd" in PyTorch, is the cornerstone of how neural networks are trained efficiently. It is a technique for automatically calculating the gradients (derivatives) of a function with respect to its inputs. In deep learning, this function is typically your model's loss, and the inputs of interest are the model's parameters (weights and biases). By automating this computation, autograd makes it far easier to train complex deep learning models.
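To make "the gradient of the loss with respect to the parameters" concrete, here is a minimal sketch on a one-feature linear regression. The data values are made up for illustration. It compares the gradients autograd computes with the hand-derived formulas for mean squared error.

```python
import torch

# Made-up toy data, roughly y = 2x + 1
x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([1.1, 2.9, 5.2, 6.8])

# Model parameters: the "inputs" we differentiate the loss with respect to
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Loss as a function of the parameters (mean squared error)
pred = w * x + b
loss = ((pred - y) ** 2).mean()
loss.backward()  # fills w.grad and b.grad

# Hand-derived gradients for comparison:
#   dL/dw = mean(2 * (pred - y) * x),  dL/db = mean(2 * (pred - y))
with torch.no_grad():
    r = pred - y
    dw = (2 * r * x).mean()
    db = (2 * r).mean()

print(w.grad, dw)
print(b.grad, db)
```

The two columns agree; autograd simply spares you from deriving and coding the right-hand formulas yourself.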


**How it Works**

1. **Dynamic Computational Graph:**
   * **Building the Graph:** As you perform operations on PyTorch tensors, PyTorch records a dynamic computational graph behind the scenes. Each node in this graph represents an operation (e.g., addition, matrix multiplication), and the edges represent the flow of data (tensors) between operations.
   * **Tracking Operations:** PyTorch records every operation whose inputs include a tensor with `requires_grad=True` (or a tensor computed from one), together with the function that produced the result. The result's `grad_fn` attribute points to that recorded operation, forming a chain of dependencies back to the inputs.

2. **Forward Pass:**
   * **Computation:** During the forward pass, you feed your input data through the model. PyTorch executes each operation immediately, recording it in the graph as it goes and saving the intermediate results that the backward pass will need.

3. **Backward Pass (Backpropagation):**
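   * **Gradient Calculation:** Once you have the loss (a scalar measuring how well your model is performing), you call `.backward()` on the loss tensor. This triggers the backward pass through the computational graph. Calling `.backward()` without arguments requires a single-element tensor; for a non-scalar output you must pass a `gradient` argument.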
   * **Chain Rule:**  PyTorch applies the chain rule of calculus to automatically compute the gradients of the loss with respect to each parameter in the model. It does this by traversing the graph in reverse, using the stored intermediate results and the derivatives of each operation.
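   * **Gradient Accumulation:** The gradient for each leaf tensor created with `requires_grad=True` (such as a model parameter) is *added* to that tensor's `.grad` attribute rather than overwriting it. This is why training loops reset gradients, e.g., with `optimizer.zero_grad()`, before each backward pass.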

4. **Parameter Update:**
   * **Optimizer:**  An optimizer (e.g., `torch.optim.SGD`, `torch.optim.Adam`) uses the computed gradients to update the model's parameters, aiming to minimize the loss.
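The graph-building and chain-rule steps above can be seen on a tiny scalar function. This is a minimal sketch with arbitrary values (1.2 and 0.5): it inspects the `grad_fn` nodes recorded during the forward pass and checks the result of `.backward()` against the hand-applied chain rule, d/dx [sin(x²) + c] = 2x·cos(x²).

```python
import math
import torch

x = torch.tensor(1.2, requires_grad=True)  # leaf tensor: operations on it are recorded
c = torch.tensor(0.5)                       # requires_grad=False: not tracked

# Forward pass: each operation involving x adds a node to the graph
u = x * x          # multiplication node
v = torch.sin(u)   # sine node
z = v + c          # addition node

print(z.grad_fn)                 # the node that produced z
print(z.grad_fn.next_functions)  # edges pointing back toward the inputs

# Backward pass: autograd applies the chain rule in reverse
z.backward()

expected = 2 * 1.2 * math.cos(1.2 ** 2)  # dz/dx derived by hand
print(x.grad.item(), expected)
print(c.grad)  # None: c was never tracked
```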

**Key Advantages:**

* **Development Speed:** Autograd eliminates the need to manually derive and implement complex gradient calculations, making deep learning model development much faster and less error-prone.
* **Flexibility:**  The dynamic graph construction allows for easy experimentation with different model architectures and control flow within your code.
* **GPU Acceleration:**  Since PyTorch tensors can reside on GPUs, the gradient computations can also be performed on the GPU, leading to significant speedups.
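The flexibility and GPU points can be illustrated together. In this sketch, the function `f` is a made-up example: it uses ordinary Python `for`/`if` logic, so the graph that gets recorded depends on the input's value and is rebuilt on every call. It runs on a GPU when one is available and falls back to the CPU otherwise.

```python
import torch

# Use a GPU if one is available; otherwise everything runs on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

def f(x: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Hypothetical function whose recorded graph depends on runtime values."""
    y = x
    for _ in range(n_steps):
        if y.sum() > 0:
            y = y * 2   # doubling branch
        else:
            y = y - 1   # decrement branch
    return y.sum()

# Positive input: 1.5 -> 3.0 -> 6.0 -> 12.0, so out = 8 * x and d(out)/dx = 8
x = torch.tensor([1.5], device=device, requires_grad=True)
out = f(x, n_steps=3)
out.backward()

# Negative input takes the other branch: -1 -> -2 -> -3 -> -4, so d(out)/dx = 1
x2 = torch.tensor([-1.0], device=device, requires_grad=True)
out2 = f(x2, n_steps=3)
out2.backward()

print(out.item(), x.grad, x.grad.device)
print(out2.item(), x2.grad)
```

Each call records only the operations that actually ran, which is why the two inputs receive different gradients from the same Python function. The gradients live on the same device as the tensors they belong to.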




**Example 1**

```python
import torch

# Create tensors and require gradients
x = torch.tensor([2.0], requires_grad=True)
w = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

# Define a simple computation
y = w * x + b

# Compute gradients
y.backward()

# Access the computed gradients
print(x.grad)  # Output: tensor([3.])
print(w.grad)  # Output: tensor([2.])
print(b.grad)  # Output: tensor([1.])
```

Output:

```
tensor([3.])
tensor([2.])
tensor([1.])
```


**Explanation:**

1. We create tensors `x`, `w`, and `b` and set `requires_grad=True` to tell PyTorch to track operations on them.
2. We perform a computation (`y = w * x + b`), building a computational graph.
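3. Calling `y.backward()` triggers backpropagation, calculating the gradients of `y` with respect to `x`, `w`, and `b`. Since `y = w * x + b`, these are ∂y/∂x = w = 3, ∂y/∂w = x = 2, and ∂y/∂b = 1, matching the printed values.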
4. The gradients are stored in the `.grad` attribute of each tensor.
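Because gradients accumulate in `.grad`, calling `.backward()` a second time without clearing them adds to the stored value. This minimal sketch reuses Example 1's tensors. The graph is rebuilt before the second call because, by default, PyTorch frees a graph's saved intermediate values once `.backward()` has run on it.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
w = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

# First pass: dy/dw = x = 2
y = w * x + b
y.backward()
first = w.grad.clone()   # tensor([2.])

# Rebuild the graph and backpropagate again WITHOUT clearing .grad:
# the new gradient (2) is added to the stored one, giving 4
y = w * x + b
y.backward()
second = w.grad.clone()  # tensor([4.])

print(first, second)

# Reset before the next pass; optimizer.zero_grad() performs the
# equivalent reset for every parameter it manages
for t in (x, w, b):
    t.grad = None
```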

**Example 2**

```python
import torch
import torch.nn as nn

# Define a simple neural network: Linear(5 -> 3) -> ReLU -> Linear(3 -> 1)
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=5, out_features=3)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(in_features=3, out_features=1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create an instance of the network
model = MyNet()

# Random input (one sample with 5 features) and target (ground truth)
input_data = torch.randn(1, 5)
target = torch.randn(1, 1)

# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Forward pass: PyTorch records the computational graph
output = model(input_data)
loss = criterion(output, target)

# Backward pass (autograd in action!)
optimizer.zero_grad()  # clear gradients left over from any previous step
loss.backward()        # fill param.grad for every parameter

# Print gradients for each parameter
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"Gradient for {name}: {param.grad}")

# Parameter update
optimizer.step()

print("Updated Parameters:")
for name, param in model.named_parameters():
    print(f"{name}: {param.detach()}")

print("Loss:", loss.item())
print("Output:", output.item())
print("Target:", target.item())

# Run the forward pass again with the updated parameters
new_output = model(input_data)
print("Output after update:", new_output.item())
print("Output tensor after update:", new_output)
```

Output (values differ between runs because no random seed is set):

```
Gradient for fc1.weight: tensor([[ 0.2684,  0.6162,  0.2490,  0.5346, -0.2217],
        [ 0.0000,  0.0000,  0.0000,  0.0000, -0.0000],
        [-0.0974, -0.2235, -0.0903, -0.1939,  0.0804]])
Gradient for fc1.bias: tensor([ 0.5501,  0.0000, -0.1995])
Gradient for fc2.weight: tensor([[1.1551, 0.0000, 0.8438]])
Gradient for fc2.bias: tensor([1.2540])
Updated Parameters:
fc1.weight: tensor([[ 0.1212,  0.2940,  0.1535,  0.4415, -0.0450],
        [-0.3487, -0.1964, -0.2726,  0.0353,  0.2814],
        [ 0.2427, -0.0500,  0.4210, -0.1063, -0.2250]])
fc1.bias: tensor([-0.0050,  0.0597,  0.4400])
fc2.weight: tensor([[ 0.4271,  0.2673, -0.1675]])
fc2.bias: tensor([-0.0380])
Loss: 0.3931587040424347
Output: 0.2715688645839691
Target: -0.35545486211776733
Output after update: 0.23250102996826172
Output tensor after update: tensor([[0.2325]], grad_fn=<AddmmBackward0>)
```




**Explanation**

1. **Neural Network Definition:** We define a simple neural network with two fully connected layers and a ReLU activation function in between.

2. **Input and Target:** We create random input data and a target value to simulate a training example.

3. **Loss and Optimizer:**
   * **Loss Function:** We use Mean Squared Error (MSE) loss to measure the discrepancy between the model's output and the target.
   * **Optimizer:** We choose Stochastic Gradient Descent (SGD) to update the model's parameters.

4. **Forward Pass:**
   * `output = model(input_data)`: This line feeds the input data through the network, executing the `forward` method of `MyNet`.
   * `loss = criterion(output, target)`: Calculates the loss by comparing the model's output to the target.
   * **Dynamic Graph:** During this forward pass, PyTorch constructs a dynamic computational graph, recording every operation that involves the model's parameters (which have `requires_grad=True` by default).

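5. **Backward Pass (Autograd's Magic):**
   * `optimizer.zero_grad()`: Clears any gradients left over from a previous step. Because `.backward()` adds to `.grad` rather than overwriting it, skipping this would mix gradients from different iterations.
   * `loss.backward()`: This single line initiates the backward pass (backpropagation) through the computational graph.
   * **Gradient Computation:** PyTorch automatically applies the chain rule to calculate the gradients of the loss with respect to each parameter (weights and biases) in the network. For example, MSE on a single output gives ∂loss/∂output = 2 × (output − target), and the output depends on `fc2.bias` with slope 1, so its gradient is 2 × (0.2716 − (−0.3555)) ≈ 1.254, matching the printed `tensor([1.2540])`.
   * **Gradient Storage:** The computed gradients are stored in the `.grad` attribute of each parameter tensor. In the printed output, the second row of the `fc1.weight` gradient and the second entries of the `fc1.bias` and `fc2.weight` gradients are zero: the second hidden unit's pre-activation was not positive for this input, so ReLU output 0 and passed no gradient back through that unit.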

6. **Parameter Update:**
   * `optimizer.step()`:  The optimizer uses the calculated gradients to update the model's parameters, nudging them in a direction that aims to reduce the loss.
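For plain SGD (no momentum or weight decay, as configured in Example 2), `optimizer.step()` amounts to `param ← param − lr × param.grad`. The sketch below checks this on a hypothetical single `nn.Linear` layer with random data; the seed is fixed only so the run is repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed for a repeatable illustration
model = nn.Linear(3, 1)
lr = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(4, 3)
target = torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.MSELoss()(model(x), target)
loss.backward()

# Predict the plain-SGD update by hand before the optimizer applies it
expected = [(p - lr * p.grad).detach() for p in model.parameters()]

optimizer.step()

# Compare the optimizer's result with the hand-computed update
matches = [torch.allclose(p, e) for p, e in zip(model.parameters(), expected)]
print(matches)
```

Optimizers such as `torch.optim.Adam` apply more elaborate update rules, but they consume the same `.grad` values that autograd produced.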

**Key Points:**

* **No Manual Gradient Calculation:** You don't need to explicitly derive or implement the gradient calculations for each operation in your network. PyTorch handles it automatically.
* **Dynamic Graph Construction:** The graph is built as your code executes, offering flexibility for complex models and control flow.
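* **Efficiency:** Reverse-mode automatic differentiation computes the gradient with respect to all parameters in a single backward pass, whose cost is within a small constant factor of the forward pass; when the tensors live on a GPU, these computations run there too.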
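To see what "no manual gradient calculation" saves you, the sketch below derives backpropagation by hand for a network shaped like `MyNet` (Linear 5→3, ReLU, Linear 3→1, MSE loss on one sample) and compares the result with autograd. The standalone `fc1`/`fc2` layers and the fixed seed are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed for a repeatable illustration
fc1, fc2 = nn.Linear(5, 3), nn.Linear(3, 1)
x = torch.randn(1, 5)
target = torch.randn(1, 1)

# Forward pass, keeping the intermediates
z1 = fc1(x)          # pre-activation, shape (1, 3)
h = torch.relu(z1)   # hidden activations, shape (1, 3)
y_hat = fc2(h)       # prediction, shape (1, 1)
loss = nn.functional.mse_loss(y_hat, target)
loss.backward()      # autograd's gradients land in .grad

# Hand-derived backpropagation for the same network
with torch.no_grad():
    d_yhat = 2 * (y_hat - target)      # dL/dy_hat for MSE over a single element
    g_fc2_w = d_yhat.T @ h             # shape (1, 3), like fc2.weight
    g_fc2_b = d_yhat.sum(0)            # shape (1,), like fc2.bias
    d_h = d_yhat @ fc2.weight          # dL/dh, shape (1, 3)
    d_z1 = d_h * (z1 > 0).float()      # ReLU derivative: 1 where z1 > 0, else 0
    g_fc1_w = d_z1.T @ x               # shape (3, 5), like fc1.weight
    g_fc1_b = d_z1.sum(0)              # shape (3,), like fc1.bias

results = {
    name: torch.allclose(manual, auto, atol=1e-6)
    for name, manual, auto in [
        ("fc2.weight", g_fc2_w, fc2.weight.grad),
        ("fc2.bias", g_fc2_b, fc2.bias.grad),
        ("fc1.weight", g_fc1_w, fc1.weight.grad),
        ("fc1.bias", g_fc1_b, fc1.bias.grad),
    ]
}
print(results)
```

Even for this two-layer network the manual version needs careful shape bookkeeping and a hand-coded ReLU mask; autograd produces the same numbers from the single call to `loss.backward()`.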

**In Conclusion:**

This example showcases how PyTorch's automatic differentiation simplifies the training process of neural networks. By automatically computing gradients, PyTorch empowers you to focus on designing and experimenting with models without the burden of manual gradient calculations.


