In [None]:
import torch

# Step 1: Define tensors with 'requires_grad=True' to track gradients
a = torch.tensor([2.0], requires_grad=True)  # Leaf node in the computational graph
b = torch.tensor([1.0], requires_grad=True)  # Leaf node in the computational graph

# Print the initial tensors
print(f"Initial Tensors:\na = {a.item()}, b = {b.item()}\n")

# Step 2: Perform operations to build the computational graph
c = a + b  # Intermediate node (not a leaf node)
d = b + 1  # Intermediate node (not a leaf node)
e = c * d  # Final result (output)

# Retain gradients for non-leaf tensors (by default, .grad is populated only for leaf tensors)
c.retain_grad()  # Keep de/dc after backward()
d.retain_grad()  # Keep de/dd after backward()
e.retain_grad()  # Keep de/de after backward()

# Print the intermediate results
print(f"Intermediate Results:\nc = a + b = {c.item()}\nd = b + 1 = {d.item()}\ne = c * d = {e.item()}\n")

# Step 3: Backpropagate to compute gradients
e.backward()  # Compute gradients of 'e' with respect to the tensors in the graph

# Step 4: Display gradients
print("Gradients after backpropagation:")
print(f"a.grad = {a.grad.item()}")  # Gradient of 'e' with respect to 'a'
print(f"b.grad = {b.grad.item()}")  # Gradient of 'e' with respect to 'b'
print(f"c.grad = {c.grad.item()}")  # Gradient of 'e' with respect to 'c'
print(f"d.grad = {d.grad.item()}")  # Gradient of 'e' with respect to 'd'
print(f"e.grad = {e.grad.item()}")  # Gradient of 'e' with respect to itself, which is 1
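
The gradients printed above can be checked by hand with the chain rule. The following is a worked derivation using only the values defined in the cell (`a = 2`, `b = 1`, hence `c = 3`, `d = 2`, `e = 6`); the partial-derivative notation is introduced here for illustration.

```latex
\begin{aligned}
e &= c \cdot d, \qquad c = a + b, \qquad d = b + 1 \\[4pt]
\frac{\partial e}{\partial e} &= 1 \\[4pt]
\frac{\partial e}{\partial c} &= d = 2, \qquad
\frac{\partial e}{\partial d} = c = 3 \\[4pt]
\frac{\partial e}{\partial a} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial a} = 2 \cdot 1 = 2 \\[4pt]
\frac{\partial e}{\partial b} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial b}
  + \frac{\partial e}{\partial d}\,\frac{\partial d}{\partial b} = 2 \cdot 1 + 3 \cdot 1 = 5
\end{aligned}
```

Note that `b` feeds into both `c` and `d`, so its gradient is the sum of the contributions from the two paths.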

### Key Explanations

1. **Leaf Nodes (`a` and `b`)**:  
   These are the initial tensors where we set `requires_grad=True`. PyTorch tracks all operations involving these tensors to enable automatic differentiation.

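2. **Intermediate Nodes (`c` and `d`)**:  
   These are results of operations (`a + b` and `b + 1` respectively); the output `e` is likewise a non-leaf tensor. By default, PyTorch populates `.grad` only for leaf tensors, so reading `.grad` on a non-leaf tensor returns `None` (with a warning). We explicitly call `retain_grad()` on `c`, `d`, and `e` so their gradients are kept for demonstration.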

3. **The Computational Graph**:  
   PyTorch dynamically constructs a computational graph behind the scenes. Each operation (e.g., addition, multiplication) creates new nodes in the graph, linking inputs to outputs. When we call `backward()` on the final result (`e`), gradients are propagated back through the graph.

4. **Gradient Calculation**:
   - `e.backward()` computes the gradient of `e` with respect to `a`, `b`, `c`, and `d`.
   - Leaf nodes (`a` and `b`) have their gradients populated, and we can print them using `a.grad` and `b.grad`.
   - Intermediate nodes (`c` and `d`) expose their gradients through `c.grad` and `d.grad` only because we called `retain_grad()` on them before `backward()`.
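
To make the graph and the backward pass described above more concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. The `Value` class and its attributes are hypothetical and written only for this illustration; this is not how PyTorch is implemented internally, and it handles only scalar addition and multiplication. Unlike PyTorch, it keeps gradients for every node, so there is no equivalent of `retain_grad()`.

```python
class Value:
    """A scalar node in a tiny computational graph (illustrative only, not PyTorch)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(x + y)/dx = 1 and d(x + y)/dy = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(x * y)/dx = y and d(x * y)/dy = x
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order so each node is processed after all of its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # gradient of the output with respect to itself
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                # Chain rule: accumulate contributions from every path
                parent.grad += local * node.grad


# Same graph as in the PyTorch cell
a = Value(2.0)  # leaf
b = Value(1.0)  # leaf
c = a + b       # intermediate
d = b + 1       # intermediate
e = c * d       # output

e.backward()
print(a.grad, b.grad, c.grad, d.grad, e.grad)  # 2.0 5.0 2.0 3.0 1.0
```

The `+=` accumulation in `backward()` is what makes `b.grad` equal to 5: `b` receives one contribution through `c` and another through `d`.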