# Feature Visualisation 

In this chapter we uncover:
- What patterns individual neurons actually detect
- Where in the image the network focuses to make a decision
- How to understand which features the model extracts

We'll follow two main approaches:
1. Feature-based explanations $\to$ DeconvNet
2. Decision-based explanations $\to$ CAM

## Visualizing The Weights 

The most basic inspection approach is to directly visualize the learned weights, primarily applicable to the **first convolutional layer**.

---




### **What We See in First Layer Weights**

First layer filters operate on raw RGB pixels, making them interpretable as visual templates:

<div align="center">
<img src="../images/chap8/InitialLayer.png" width="400"/>
</div>

| **Pattern** | **What It Detects** |
|-------------|---------------------|
| **Edge detectors** | Oriented gradients (horizontal, vertical, diagonal) |
| **Color blobs** | Specific color combinations (red, green, blue) |
| **Gabor-like patterns** | Textures at various orientations and frequencies |

---

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

Let the first convolutional layer have weights
$$
\mathbf{W} \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}} \times K_h \times K_w}
$$

where
- $C_{out}$ = number of output channels
- $C_{in}$ = number of input channels (e.g. 3 for RGB)
- $K_h, K_w$ = kernel height and width (e.g. $11 \times 11$ in AlexNet)

Each filter $\mathbf{W}_i$ is a tensor of shape $[C_{in}, K_h, K_w]$.

To visualize filter $i$:
1. **Extract** $\mathbf{W}_i \in \mathbb{R}^{C_{in} \times K_h \times K_w}$
2. **Permute** to $[K_h, K_w, C_{in}]$ for image display
3. **Normalize** values to $[0, 1]$ for visualisation:

$$\mathbf{W}_i^{\text{norm}} = \frac{\mathbf{W}_i - \min(\mathbf{W}_i)}{\max(\mathbf{W}_i) - \min(\mathbf{W}_i)}$$

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
import torch
import torchvision
import matplotlib.pyplot as plt

# Load pretrained model (newer torchvision versions prefer the `weights=` argument)
model = torchvision.models.alexnet(pretrained=True)

# Extract first conv layer weights: [out_channels, in_channels, height, width]
first_layer_weights = model.features[0].weight.detach()  # Shape: [64, 3, 11, 11]

# Visualize first 64 filters
fig, axes = plt.subplots(8, 8, figsize=(10, 10))
for i, ax in enumerate(axes.flat):
    # Normalize to [0, 1] for display
    weight = first_layer_weights[i].permute(1, 2, 0)  # [11, 11, 3]
    weight = (weight - weight.min()) / (weight.max() - weight.min())
    ax.imshow(weight.cpu().numpy())
    ax.axis('off')
plt.tight_layout()
plt.show()
```

</td>
</tr>
</table>

### **Limitations**

| **Problem** | **Why It Fails** |
|-------------|------------------|
| **Only Layer 1** | Deeper layers have abstract, high-dimensional features (e.g., 3×3×256 tensors) that render as uninterpretable blobs |
| **No context** | Shows *what* filters detect, not *where* they activate on real images |
| **Static view** | Ignores how features compose hierarchically through the network |

**Better approach:** Visualize **activations** and **gradients** instead → DeconvNet, CAM.

---
---

## Maximally Activating Patches

**Key Question:** "What kind of input patterns cause a specific neuron to activate most strongly?"

---

### **The Approach**

Since a neuron's **receptive field** determines what it can "see" (and this grows with network depth), we can find image patches that maximally excite specific neurons:

**Algorithm:**
1. Select a target neuron/feature map at any layer
2. Run thousands of images through the network
3. Record activation values for that neuron
4. Extract and visualize the top-K image patches with highest activations
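
Step 4 does not require holding every candidate in memory. A minimal sketch, assuming each candidate has already been reduced to an `(activation, item_id)` pair (the helper `update_top_k` and the `img_*` identifiers are illustrative names, not part of any library), keeps only the K strongest responses with a bounded min-heap:

```python
import heapq

def update_top_k(heap, k, activation, item_id):
    """Keep the k largest (activation, item_id) pairs seen so far."""
    entry = (activation, item_id)
    if len(heap) < k:
        heapq.heappush(heap, entry)
    elif entry > heap[0]:
        # heap[0] is the weakest of the current top-k; replace it
        heapq.heapreplace(heap, entry)

heap = []
stream = [(0.3, "img_a"), (2.1, "img_b"), (0.9, "img_c"),
          (1.7, "img_d"), (0.1, "img_e")]
for activation, item_id in stream:
    update_top_k(heap, 3, activation, item_id)

top = sorted(heap, reverse=True)
print(top)  # [(2.1, 'img_b'), (1.7, 'img_d'), (0.9, 'img_c')]
```

Memory stays proportional to K rather than to the dataset size, which matters when each entry carries an image patch.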


---
<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

Let $\mathbf{A} \in \mathbb{R}^{C \times H \times W}$ be a convolutional layer's activation tensor for a given image, where:

- $C$ = number of channels (filters)
- $H, W$ = spatial dimensions

For a chosen filter $c^*$:
1. **Extract the activation map** for filter $c^*$: $$\mathbf{a}_{c^*} \in \mathbb{R}^{H \times W}$$
2. **Find the maximally activating location:** $$(i^*, j^*) = \arg\max_{(i,j)}\mathbf{a}_{c^*}[i,j]$$
3. **Extract the corresponding receptive field (RF) patch** from the input image that led to this activation at $(i^*, j^*)$
   - **Retrieve layer parameters:** for each convolutional and pooling layer up to the target layer, collect {kernel size $k$, stride $s$, padding $p$, dilation (if used)}.
   - **Compute the RF size**
   - **Compute the input coordinates corresponding to $(i^*, j^*)$.** For a single layer:
     - $x_{center} = s \cdot i^* - p + \lfloor\frac{k-1}{2}\rfloor$
     - $y_{center} = s \cdot j^* - p + \lfloor\frac{k-1}{2}\rfloor$
   - **Determine the patch bounds:** a window of the RF size centered at $(x_{center}, y_{center})$, clipped to the image borders.
   - **Crop the patch from the input image**
4. **Repeat for all images** in the dataset, and collect the top-$K$ patches with the highest activation for filter $c^*$

</td>
<td width="50%" valign="top">

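**💻 Implementation**

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load pretrained model
model = models.vgg16(pretrained=True).eval()

# Standard ImageNet preprocessing (resize, crop, normalize)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hook to capture activations
activations = {}
def get_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hook on the ReLU after conv3_3
# (in VGG16, features[14] is the conv, features[15] its ReLU, features[16] the pooling layer)
model.features[15].register_forward_hook(get_activation('conv3_3'))

# Process images (`image_dataset` is an iterable of image paths, defined elsewhere)
patches = []
for img_path in image_dataset:
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img).unsqueeze(0)

    # Forward pass (no gradients needed)
    with torch.no_grad():
        model(img_tensor)

    # Extract activation for specific filter (e.g., filter 42)
    activation_map = activations['conv3_3'][0, 42]  # [H, W]
    flat_idx = activation_map.argmax().item()
    max_row, max_col = divmod(flat_idx, activation_map.shape[1])

    # Extract corresponding receptive field patch from original image
    # (`extract_receptive_field_patch` is a helper defined elsewhere)
    patch = extract_receptive_field_patch(img, max_row, max_col, layer_depth=16)
    patches.append((activation_map.max().item(), patch))

# Display top-9 patches (sort on the activation only; patches are not comparable)
top_patches = sorted(patches, key=lambda p: p[0], reverse=True)[:9]
visualize_patches(top_patches)  # plotting helper defined elsewhere
```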

</td>
</tr>
</table>
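
The receptive-field bookkeeping in step 3 can be sketched in plain Python. This is a minimal sketch under stated assumptions: layers are given as `(kernel, stride, padding)` triples, dilation is ignored, pixel centers sit at integer coordinates, and the function names `receptive_field` and `patch_bounds` are illustrative rather than taken from any library. The layer list describes VGG16 from the input up to conv3_3.

```python
def receptive_field(layers):
    """Accumulate RF size, jump (input pixels per output step) and the
    input coordinate of the center of output index 0."""
    rf, jump, start = 1, 1, 0.0
    for kernel, stride, padding in layers:
        start += ((kernel - 1) / 2 - padding) * jump
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump, start

def patch_bounds(index, rf, jump, start, image_size):
    """Inclusive [lo, hi] input range seen by output `index`, clipped to the image."""
    center = start + index * jump
    lo = int(round(center - (rf - 1) / 2))
    hi = int(round(center + (rf - 1) / 2))
    return max(lo, 0), min(hi, image_size - 1)

# (kernel, stride, padding) for VGG16 up to conv3_3
vgg16_to_conv3_3 = [
    (3, 1, 1), (3, 1, 1), (2, 2, 0),             # conv1_1, conv1_2, pool1
    (3, 1, 1), (3, 1, 1), (2, 2, 0),             # conv2_1, conv2_2, pool2
    (3, 1, 1), (3, 1, 1), (3, 1, 1),             # conv3_1, conv3_2, conv3_3
]

rf, jump, start = receptive_field(vgg16_to_conv3_3)
print(rf, jump, start)                            # 40 4 1.5
print(patch_bounds(5, rf, jump, start, 224))      # (2, 41)
```

For a single layer the `start` update reduces to the per-layer center formula above; the same bounds are applied independently along rows and columns.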

<div align="center">
<img src="../images/chap8/Layer1.png" width="270"/>
<img src="../images/chap8/Layer2.png" width="370"/>
<img src="../images/chap8/Layer4.png" width="350"/>
<img src="../images/chap8/Layer6.png" width="350"/>
<p><i>Example: maximally activating patches for neurons in layers 1, 2, 4 and 6</i></p>
</div>


### **Advantages & Limitations**

| **✅ Advantages** | **❌ Limitations** |
|-------------------|-------------------|
| Works for **any layer** (not just Layer 1) | Doesn't explain **which class** the features belong to |
| Data-driven and accurate | Doesn't explain the **final decision** |
| No gradients needed (only forward passes) | Limited by **dataset coverage** (only sees patterns in training data) |
| Produces interpretable patterns at every depth | Doesn't work well for **fully connected layers** (no spatial structure) |

**Key Insight:** Shows *what* activates neurons, but not *why* the network makes specific predictions. For decision explanations, we need **CAM** (Class Activation Mapping).

---
---

## Visualizing the Representation Space

**Key Question:** "What does the network represent as a whole? How does it organize different images internally?"

---



### **The Concept**

**At each layer**, an image is transformed into a high-dimensional vector:
- **Early layers**: Low-level features (edges, textures) → vectors encode spatial patterns
- **Deep layers**: High-level features (objects, concepts) → vectors encode semantic meaning
- **Key insight**: Similar images produce similar vectors in representation space

**The Problem:** 
Fully connected layers produce high-dimensional vectors (e.g., 4096-dimensional) with **no spatial structure**, so we can't use patch-based visualization anymore.

---

### **Solution 1: Nearest Neighbors in Feature Space**

**Idea:** Extract feature vectors from a chosen layer, then find images with the most similar vectors.

**Algorithm:**
1. Choose a layer (typically the last FC layer before classification)
2. Extract feature vectors for all images: $\mathbf{x}_i \in \mathbb{R}^d$ where $d$ = 4096
3. For a query image, compute distances: $\text{distance}(\mathbf{x}_{\text{query}}, \mathbf{x}_i) = \|\mathbf{x}_{\text{query}} - \mathbf{x}_i\|_2$
4. Retrieve top-K nearest neighbors (smallest distances)

<div align="center">
<img src="../images/chap8/NNRespSpace.png" width="570"/>
<p><i>Test image (left of red line), L2 nearest neighbors in feature space (right)</i></p>
</div>

**What This Reveals:**
- **Learned similarity**: What the network considers "similar" (may differ from human perception)
- **Category structure**: How well classes are separated in feature space
- **Invariances**: Which variations (pose, lighting, background) the network has learned to ignore

**Use Cases:**
- **Debugging**: Find mislabeled images (distant from their class cluster)
- **Class confusion**: Identify which classes have overlapping representations
- **Dataset bias**: Detect spurious correlations (e.g., "boats" always near water)

---

### **Implementation: Nearest Neighbors**

```python
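import torch
import numpy as np
from PIL import Image
from torchvision import models, transforms
from sklearn.neighbors import NearestNeighbors

# Load pretrained model
model = models.resnet50(pretrained=True).eval()

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Remove final classification layer to get features
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])  # Output: [N, 2048, 1, 1]

# Extract features for all images (`dataset` is an iterable of image paths, defined elsewhere)
features = []
for img_path in dataset:
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img).unsqueeze(0)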
    
    with torch.no_grad():
        feature = feature_extractor(img_tensor).flatten()  # [2048]
        features.append(feature.cpu().numpy())

features = np.array(features)  # [N, 2048]

# Find nearest neighbors for a query image
nn_model = NearestNeighbors(n_neighbors=10, metric='euclidean')
nn_model.fit(features)

query_feature = features[query_idx].reshape(1, -1)
distances, indices = nn_model.kneighbors(query_feature)

# Visualize results
visualize_nearest_neighbors(query_idx, indices[0])
```

---

### **Solution 2: Low-Dimensional Embeddings**

**The Problem:** Feature vectors are 2048- to 4096-dimensional, which makes them impossible to visualize directly.

**The Solution:** Project high-dimensional vectors into 2D/3D while **preserving local distances**.

| **Method** | **How It Works** | **Best For** |
|------------|------------------|--------------|
| **t-SNE** | Non-linear; preserves local neighborhoods via probability distributions | Exploring clusters and local structure |
| **UMAP** | Non-linear; typically faster than t-SNE and often preserves more global structure | Large datasets, hierarchical structure |
| **PCA** | Linear projection onto principal components | Quick overview, preserving variance |

**Key Principle:** Points close in high-dimensional space should remain close in 2D.

<div align="center">
<img src="../images/chap8/2dembed.png" width="600"/>
<p><i>t-SNE visualization of ImageNet features: each color = different class</i></p>
</div>

---

### **Implementation: t-SNE Visualization**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Extract features (same as above); `labels` holds one class index per image
features = np.array(features)  # [N, 2048]
labels = np.array(labels)       # [N] class labels

# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
features_2d = tsne.fit_transform(features)  # [N, 2]

# Visualize
plt.figure(figsize=(12, 10))
scatter = plt.scatter(features_2d[:, 0], features_2d[:, 1], 
                      c=labels, cmap='tab10', alpha=0.6, s=10)
plt.colorbar(scatter)
plt.title('t-SNE Visualization of Feature Space')
plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.show()
```
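
For the PCA row of the method table, a linear projection needs nothing beyond NumPy. The following is a minimal sketch on synthetic features (the data, the cluster shift, and the function name `pca_project` are assumptions for illustration, not part of the pipeline above); it centers the features and projects them onto the top principal directions obtained from an SVD.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project [N, d] features onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]

# Synthetic stand-in for extracted features: two clusters in 64 dimensions
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))
features[:100, 0] += 8.0  # shift the first 100 points along one axis

coords, ratio = pca_project(features)
print(coords.shape)  # (200, 2)
```

Because the projection is linear and deterministic, it is a quick first look before running t-SNE, whose output depends on perplexity and random seed.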

---

### **What Embeddings Reveal**

| **Observation** | **Interpretation** |
|-----------------|-------------------|
| **Tight clusters** | Class is well-learned, features are consistent |
| **Overlapping clusters** | Model confuses these classes (e.g., "husky" vs "wolf") |
| **Outliers** | Mislabeled images or unusual examples |
| **Smooth transitions** | Network learns continuous representations (e.g., dog breeds form a continuum) |

---

### **Comparison: When to Use Each**

| **Method** | **When to Use** | **Limitation** |
|------------|-----------------|----------------|
| **Nearest Neighbors** | Find similar images, debug specific examples | Doesn't show overall structure |
| **t-SNE/UMAP** | Visualize overall dataset structure, find clusters | Slow for large datasets; can distort global structure |
| **PCA** | Quick linear projection, preserve variance | May miss non-linear relationships |

**Key Insight:** Representation space visualization shows **how the network organizes its knowledge**, revealing both its strengths (semantic clustering) and weaknesses (class confusion).

---
---

## Model Inversion: DeconvNet & Guided Backpropagation

**Key Question:** "Which input pixels caused a specific neuron to activate strongly?"

---



### **The Goal**

We want to **reverse-engineer** what the network "sees" by:
1. Identifying which neurons activate for a given image
2. Tracing back through the network to find which input pixels caused those activations
3. Creating a visualization that highlights the important image regions

**Analogy:** If a neuron fires strongly, we want to ask: "What part of the original image made you so excited?"

---

### **When Does This Happen?**

**⏰ After Training (Inference/Analysis Phase)**

- The network is **already trained** and frozen (weights are fixed)
- We're not updating any parameters, just analyzing what the network has learned
- This is a **post-hoc analysis** tool for understanding trained models

**Why after training?**
- We want to see what patterns the network *has learned* to detect
- If we did this during training, the weights would still be changing
- Think of it as "interrogating" a trained expert about their decision-making process

---

### **Step-by-Step: How Guided Backpropagation Works**

---

#### **Step 1: Forward Pass an Image**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

For a network with $L$ layers, compute forward activations:

$$\mathbf{a}^{(0)} = \mathbf{x}_{\text{input}}$$

For each layer $l = 1, 2, \ldots, L$:

$$\mathbf{z}^{(l)} = \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}$$

$$\mathbf{a}^{(l)} = f(\mathbf{z}^{(l)})$$

Where:
- $\mathbf{x}_{\text{input}} \in \mathbb{R}^{3 \times H \times W}$ (RGB image)
- $\mathbf{W}^{(l)}$ = weights (conv kernels or FC weights)
- $f(\cdot)$ = activation function (ReLU, etc.)
- $\mathbf{a}^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$ = activation at layer $l$

**At layer $l$:** Output is a 3D tensor with $C_l$ channels (feature maps)

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load trained model (frozen)
model = models.vgg16(pretrained=True).eval()

# Load and preprocess image
img = Image.open('dog.jpg').convert('RGB')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # ImageNet statistics expected by torchvision's pretrained models
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# x_input: [1, 3, 224, 224]
img_tensor = preprocess(img).unsqueeze(0)
img_tensor.requires_grad_(True)  # Enable gradient tracking

# Forward pass: compute all activations a^(l)
output = model(img_tensor)
# output: [1, 1000] class logits (apply softmax to get probabilities)
```

**What we have:**
- Input: $224 \times 224 \times 3$ RGB image
- Activations at every layer stored internally
- Model weights are frozen (no training)

</td>
</tr>
</table>

---

#### **Step 2: Choose a Feature Map and a Specific Activation**

**What does "feature map" mean?**
- At any convolutional layer, the output is a 3D tensor: `[channels, height, width]`
- Each **channel** is a 2D "feature map" (also called an activation map)
- Each feature map represents one learned "detector" (e.g., edge detector, texture detector)

**What does "choose an activation" mean?**
- Within a chosen feature map (channel), pick a **specific spatial location** (one pixel/position)
- This location had a strong activation, and we want to know why

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

Choose a target layer $l$ and extract activations:

$$\mathbf{A}^{(l)} = \left[\mathbf{a}_1^{(l)}, \mathbf{a}_2^{(l)}, \ldots, \mathbf{a}_{C_l}^{(l)}\right]$$

Where:
- $\mathbf{A}^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$ = all feature maps at layer $l$
- $\mathbf{a}_c^{(l)} \in \mathbb{R}^{H_l \times W_l}$ = feature map for channel $c$

**Choose a specific neuron:**
- Channel: $c^* \in \{1, \ldots, C_l\}$
- Position: $(i^*, j^*) \in \{1, \ldots, H_l\} \times \{1, \ldots, W_l\}$
- Activation value: $a_{c^*, i^*, j^*}^{(l)}$

**Find maximum activation:**

$$c^*, i^*, j^* = \arg\max_{c, i, j} \mathbf{A}^{(l)}[c, i, j]$$

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
import numpy as np

# Hook to capture activations at target layer
activations = {}
def get_activation(name):
    def hook(module, inputs, output):
        # output: a^(l) for layer l
        # (not detached, so it stays attached to the autograd graph)
        activations[name] = output
    return hook

# Register hook at Conv5_3 (layer 28 in VGG16)
# This captures A^(l) with shape [1, 512, 14, 14]
model.features[28].register_forward_hook(
    get_activation('conv5_3')
)

# Run the forward pass again so the newly registered hook fires
model(img_tensor)

# Extract: A^(l) with C_l=512 channels, H_l=W_l=14
layer_activations = activations['conv5_3']
# Shape: [1, 512, 14, 14]

# Choose channel c* = 42
chosen_channel = 42
feature_map = layer_activations[0, chosen_channel]
# Shape: [14, 14] - one feature map a_c^(l)

# Find position (i*, j*) with max activation
max_row, max_col = np.unravel_index(
    feature_map.argmax().item(),
    feature_map.shape
)
# Example: max_row=7, max_col=9
```

**What we've identified:**
- $c^* = 42$ (this detector/filter)
- $(i^*, j^*) = (7, 9)$ in $14 \times 14$ grid
- Activation: $a_{42,7,9}^{(l)} = 5.7$ (example value)

</td>
</tr>
</table>
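
The formulation above takes the argmax jointly over channel and position, while the implementation fixes the channel first. A minimal NumPy sketch of the joint version, using a small random array as a stand-in for $\mathbf{A}^{(l)}$ (the array sizes and the planted maximum are assumptions for illustration; note that indices are 0-based in code and 1-based in the math):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5, 5))   # stand-in for A^(l) with C_l = 4, H_l = W_l = 5
A[2, 3, 1] = 9.0            # plant a known maximum (all other values are < 1)

# argmax works on the flattened array; unravel_index maps it back to (c, i, j)
c_star, i_star, j_star = (int(v) for v in np.unravel_index(A.argmax(), A.shape))
print(c_star, i_star, j_star)  # 2 3 1
```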

---

#### **Step 3: Zero Out All Values Except the One of Interest**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

Create a **mask** $\mathbf{M}^{(l)}$ that isolates one neuron:

$$\mathbf{M}^{(l)}[c, i, j] = \begin{cases} 
\mathbf{A}^{(l)}[c^*, i^*, j^*] & \text{if } c = c^*, i = i^*, j = j^* \\
0 & \text{otherwise}
\end{cases}$$

**Properties:**
- $\mathbf{M}^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$ (same shape as $\mathbf{A}^{(l)}$)
- Only one non-zero entry: $\mathbf{M}^{(l)}[c^*, i^*, j^*] = a_{c^*, i^*, j^*}^{(l)}$
- All other entries: $\mathbf{M}^{(l)}[c, i, j] = 0$ for $(c, i, j) \neq (c^*, i^*, j^*)$

**This mask represents:** "Only the signal from neuron $(c^*, i^*, j^*)$"

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# Create mask M^(l) with same shape as A^(l)
mask = torch.zeros_like(layer_activations)
# Shape: [1, 512, 14, 14], all zeros

# Set only the chosen position to its activation
mask[0, chosen_channel, max_row, max_col] = \
    layer_activations[0, chosen_channel, max_row, max_col]

# Now mask has:
# M^(l)[42, 7, 9] = 5.7 (example)
# M^(l)[c, i, j] = 0 for all other (c,i,j)
```

**Visual Example:**
```
Original A^(l) for channel 42:
┌────────────────┐
│ 0.2  0.5  0.1  │
│ 0.4  0.8  0.6  │
│ ...  ...  ...  │
│ 0.1  5.7  0.4  │ ← position (7,9)
└────────────────┘

After masking M^(l):
┌────────────────┐
│ 0.0  0.0  0.0  │
│ 0.0  0.0  0.0  │
│ ...  ...  ...  │
│ 0.0  5.7  0.0  │ ← only this survives!
└────────────────┘
```

**Why?** Isolate contribution of this single neuron

</td>
</tr>
</table>

---

#### **Step 4: Propagate Back to Input (Guided Backpropagation)**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formulation**

**Standard Backpropagation:**

For layer $l$ with ReLU activation:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{z}^{(l)}} = \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{(l)}} \odot \mathbb{1}[\mathbf{z}^{(l)} > 0]$$

Where $\mathbb{1}[\cdot]$ is the indicator function.

**Guided Backpropagation (Modified ReLU):**

$$\frac{\partial \mathcal{L}}{\partial \mathbf{z}^{(l)}} = \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{(l)}} \odot \mathbb{1}[\mathbf{z}^{(l)} > 0] \odot \mathbb{1}\left[\frac{\partial \mathcal{L}}{\partial \mathbf{a}^{(l)}} > 0\right]$$

**Key Difference:** Two conditions must both be true:
1. Forward activation was positive: $\mathbf{z}^{(l)} > 0$
2. Backward gradient is positive: $\frac{\partial \mathcal{L}}{\partial \mathbf{a}^{(l)}} > 0$

**Chain Rule (going backwards from the target layer $l$):**

$$\frac{\partial \mathcal{L}}{\partial \mathbf{x}_{\text{input}}} = \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{(l)}} \cdot \frac{\partial \mathbf{a}^{(l)}}{\partial \mathbf{z}^{(l)}} \cdot \ldots \cdot \frac{\partial \mathbf{a}^{(1)}}{\partial \mathbf{x}_{\text{input}}}$$

Here $\mathcal{L}$ is not a training loss but the selected activation, $\mathcal{L} = a_{c^*, i^*, j^*}^{(l)}$, the only entry that the mask $\mathbf{M}^{(l)}$ keeps. Backpropagating it gives the gradient w.r.t. the input.

</td>
<td width="50%" valign="top">

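**💻 Implementation**

```python
# Guided backprop rule as a custom autograd function
class GuidedReLUFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Standard ReLU forward pass
        # z^(l) -> a^(l) = ReLU(z^(l))
        ctx.save_for_backward(x)
        return torch.clamp(x, min=0)

    @staticmethod
    def backward(ctx, grad_output):
        # grad_output = dL/da^(l)
        x, = ctx.saved_tensors  # z^(l)

        grad_input = grad_output.clone()

        # Condition 1: forward was positive
        grad_input[x <= 0] = 0

        # Condition 2: gradient is positive
        grad_input[grad_output <= 0] = 0

        # Returns: dL/dz^(l)
        return grad_input

# Module wrapper so it can replace nn.ReLU inside the model
class GuidedReLU(torch.nn.Module):
    def forward(self, x):
        return GuidedReLUFunction.apply(x)

# Replace all ReLUs in model
def replace_relu_with_guided_relu(model):
    for child_name, child in model.named_children():
        if isinstance(child, torch.nn.ReLU):
            setattr(model, child_name, GuidedReLU())
        else:
            # Recursive for nested modules
            replace_relu_with_guided_relu(child)

# Apply to model
replace_relu_with_guided_relu(model)

# Re-run the forward pass so the graph is built with the guided ReLUs,
# keeping the target layer's output attached to the graph
captured = {}
handle = model.features[28].register_forward_hook(
    lambda module, inputs, output: captured.update(conv5_3=output)
)
model(img_tensor)
handle.remove()

# Backpropagate from the single chosen activation
# (the entry the mask isolates; all other entries contribute nothing)
model.zero_grad()
if img_tensor.grad is not None:
    img_tensor.grad.zero_()
target = captured['conv5_3'][0, chosen_channel, max_row, max_col]
target.backward()

# Extract gradient: dL/dx_input
gradient = img_tensor.grad.detach()
# Shape: [1, 3, 224, 224]
```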

**What we computed:**

$$\nabla_{\mathbf{x}} \mathbf{M}^{(l)} = \frac{\partial \mathbf{M}^{(l)}}{\partial \mathbf{x}_{\text{input}}}$$

This shows which input pixels contributed to neuron $(c^*, i^*, j^*)$

</td>
</tr>
</table>

---

### **Why Guided Backpropagation Works Better**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Comparison**

**Standard ReLU Backward:**

$$\frac{\partial f}{\partial x} = \begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x \leq 0
\end{cases}$$

Passes all gradients where forward was positive (including negative gradients).

**Guided ReLU Backward:**

$$\frac{\partial f}{\partial x} = \begin{cases}
1 & \text{if } x > 0 \text{ AND } \frac{\partial \mathcal{L}}{\partial f(x)} > 0 \\
0 & \text{otherwise}
\end{cases}$$

Only passes positive gradients where forward was also positive.

**Effect on Visualization:**
- Standard: Negative gradients create noisy artifacts
- Guided: Only positive contributions → sharper, cleaner images

</td>
<td width="50%" valign="top">

**💻 Comparison Table**

| **Method** | **ReLU Backward Rule** | **Output Quality** |
|------------|------------------------|-------------------|
| **Standard** | Pass if $x > 0$ | Blurry, noisy |
| **Deconvolution** | Pass if $\nabla > 0$ (ignores the forward input $x$) | Somewhat sharp |
| **Guided** | Pass if $x > 0$ AND $\nabla > 0$ | Sharp, clean |

**Example:**
```python
# Forward input:      x    = [-1,   2,   -3,   4]
# Forward output:     ReLU = [ 0,   2,    0,   4]
# Incoming gradient:  g    = [0.5, -0.8, -0.2, 1.1]

# Standard backprop (zero where x <= 0):
#   [0, -0.8, 0, 1.1]   <- the negative gradient survives

# Guided backprop (zero where x <= 0 OR g <= 0):
#   [0,  0,   0, 1.1]   <- only positive gradients on active units remain
```

**Result:** Guided produces interpretable heatmaps!

</td>
</tr>
</table>
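
The three backward rules can be compared side by side on a toy vector. This is a minimal NumPy sketch, not the PyTorch implementation from Step 4; the function name `relu_backward` and the input values are chosen for illustration. It treats the DeconvNet rule as gating on the sign of the incoming gradient only.

```python
import numpy as np

def relu_backward(z, grad, rule):
    """Gradient passed below a ReLU under one of three backward rules."""
    if rule == "standard":      # gate on the forward input only
        keep = z > 0
    elif rule == "deconvnet":   # gate on the incoming gradient only
        keep = grad > 0
    elif rule == "guided":      # gate on both
        keep = (z > 0) & (grad > 0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return np.where(keep, grad, 0.0)

z = np.array([1.0, -2.0, 3.0, -4.0, 5.0])      # forward inputs to the ReLU
grad = np.array([0.3, 0.7, -0.6, -0.1, 0.9])   # gradient arriving from above

results = {rule: relu_backward(z, grad, rule)
           for rule in ("standard", "deconvnet", "guided")}
for rule, value in results.items():
    print(rule, value)
```

The guided result is the elementwise intersection of the other two masks, which is why it is the sparsest of the three.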

---

### **What the Output Shows**

**Interpretation of the Gradient Visualization:**

<table>
<tr>
<td width="50%">

**📐 Mathematical Meaning**

The output $\nabla_{\mathbf{x}} \mathbf{M}^{(l)}$ represents:

$$\frac{\partial a_{c^*, i^*, j^*}^{(l)}}{\partial \mathbf{x}[p, q, r]}$$

For each input pixel $(p, q)$ in channel $r$.

**High magnitude = strong influence:**
- Large positive: Increasing this pixel increases the activation
- Values shown: How much each pixel "contributed" to neuron firing

**Bright regions:** Pixels that caused strong activation

</td>
<td width="50%">

**💻 Visual Output**

```
Original image: Picture of a dog
Chosen neuron: Channel 42, position (7,9)

Heatmap shows:
🔥 Bright areas → strong contribution
🌑 Dark areas → little/no contribution

Example:
- Dog's ears: Bright (high gradient)
- Dog's eyes: Bright (high gradient)
- Background: Dark (low gradient)

Interpretation:
Neuron 42 may have learned to detect
"furry pointed structures" like ears!
```

</td>
</tr>
</table>
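
To display the input gradient as a heatmap, the three color channels have to be collapsed into one value per pixel. One common convention, sketched here with NumPy under the assumption that the gradient has shape `[3, H, W]` (the function name `gradient_to_heatmap` is illustrative), takes the largest absolute value across channels and rescales the result to $[0, 1]$:

```python
import numpy as np

def gradient_to_heatmap(gradient):
    """gradient: array of shape [3, H, W]; returns an [H, W] map in [0, 1]."""
    saliency = np.abs(gradient).max(axis=0)   # strongest channel per pixel
    span = saliency.max() - saliency.min()
    if span == 0:
        # Constant gradient: nothing to highlight
        return np.zeros_like(saliency)
    return (saliency - saliency.min()) / span

# Random stand-in for a real input gradient
rng = np.random.default_rng(0)
fake_gradient = rng.normal(size=(3, 8, 8))
heatmap = gradient_to_heatmap(fake_gradient)
print(heatmap.shape)  # (8, 8)
```

With the PyTorch tensors from Step 4, the input would be something like `gradient[0].cpu().numpy()`; the result can be passed to `plt.imshow` directly.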

---

## Class Activation Mapping (CAM)

**Key Question:** "Which regions of the image did the network look at to make its classification decision?"

---



### **The Problem CAM Solves**

**Guided Backpropagation tells us:** Which pixels activated a specific neuron<br>
**CAM tells us:** Which regions influenced the final **class prediction**

**Example:**
- Image of a dog
- Network predicts: "Golden Retriever" (95% confidence)
- **CAM shows:** Heatmap highlighting the dog's face and body (regions that made the network say "Golden Retriever")

---
<div align="center">
<img src="../images/chap8/CAMsteps.png" width="800" />
<img src="../images/chap8/whatCAm.png" width="600" />
</div>

---

### **How CAM Works: Step-by-Step**

---

#### **Step 1: Forward Pass and Extract Last Conv Features**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Setup**

**Last convolutional layer output:**

$$\mathbf{F} \in \mathbb{R}^{C \times H \times W}$$

Where:
- $C$ = number of channels (feature maps)
- $H \times W$ = spatial dimensions
- $\mathbf{F}_k \in \mathbb{R}^{H \times W}$ = feature map for channel $k$

**Example dimensions:**
- $C = 512$ channels
- $H = W = 7$ (7×7 spatial grid)
- Each $\mathbf{F}_k$ is a 7×7 feature map

**Notation:**
$$\mathbf{F}_k[i, j] = \text{activation at position } (i,j) \text{ in channel } k$$

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
import torch
from torchvision import models
from PIL import Image
import matplotlib.pyplot as plt

# Load model with GAP architecture
# (ResNet-18: its last conv block outputs 512 channels; ResNet-50 would give 2048)
model = models.resnet18(pretrained=True).eval()

# Hook to capture last conv layer
last_conv_features = {}
def get_features(name):
    def hook(module, inputs, output):
        last_conv_features[name] = output.detach()
    return hook

# Register hook at last conv layer (layer4 in ResNet)
model.layer4.register_forward_hook(get_features('layer4'))

# Forward pass (`preprocess` is the ImageNet transform pipeline defined earlier)
img = Image.open('dog.jpg').convert('RGB')
img_tensor = preprocess(img).unsqueeze(0)  # [1, 3, 224, 224]
with torch.no_grad():
    output = model(img_tensor)  # [1, 1000]

# Extract features: F ∈ R^{C×H×W}
F = last_conv_features['layer4']  # Shape: [1, 512, 7, 7]
# 512 channels, 7×7 spatial grid
```

**What we have:**
- $\mathbf{F}$: [1, 512, 7, 7] tensor
- 512 feature maps, each 7×7
- Each position $(i,j)$ in each channel has a value

</td>
</tr>
</table>

---

#### **Step 2: Apply Global Average Pooling (GAP)**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Operation**

**Global Average Pooling:** For each channel $k$, compute the **average** over all spatial positions:

$$f_k = \text{GAP}(\mathbf{F}_k) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{F}_k[i, j]$$

**Result:** A vector $\mathbf{f} \in \mathbb{R}^C$ where:

$$\mathbf{f} = [f_1, f_2, \ldots, f_C]$$

**Intuition:**
- Each $f_k$ = "How much channel $k$ was activated overall"
- Collapses 7×7 grid into 1 number per channel

**Example:**
```
Channel 1: 7×7 feature map
┌────────────────┐
│ 0.1  0.3  0.2  │
│ 0.4  0.5  0.1  │
│ ...  ...  ...  │
└────────────────┘
         ↓ Average all values
      f_1 = 0.28
```

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# Apply GAP manually
# F shape: [1, 512, 7, 7]

# Average over spatial dimensions (H, W)
f = F.mean(dim=[2, 3])  # Shape: [1, 512]
# f[k] = average of all 7×7 values in channel k

# Alternatively, use PyTorch's built-in
gap = torch.nn.AdaptiveAvgPool2d((1, 1))
f = gap(F).squeeze()  # Shape: [512]

# Now f is a vector of 512 values
# f[0] = average activation of channel 0
# f[1] = average activation of channel 1
# ...
# f[511] = average activation of channel 511
```

**What we have:**
- $\mathbf{f}$: [512] vector
- Each value = average of one 7×7 feature map
- This is the input to the final classification layer

</td>
</tr>
</table>

---

#### **Step 3: Final Classification Layer**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Operation**

**Fully Connected Layer:** Compute class scores

$$S_c = \sum_{k=1}^{C} w_{c,k} \cdot f_k$$

Where:
- $S_c$ = score for class $c$ (before softmax)
- $w_{c,k}$ = weight connecting channel $k$ to class $c$
- $\mathbf{W} \in \mathbb{R}^{N \times C}$ = weight matrix (N classes, C channels)
- The FC layer's bias $b_c$ is omitted: it shifts the score but carries no spatial information

**In matrix form:**

$$\mathbf{S} = \mathbf{W} \cdot \mathbf{f}$$

Where $\mathbf{S} \in \mathbb{R}^N$ (N = 1000 classes for ImageNet)

**Intuition:**
- Each class has a weight vector $\mathbf{w}_c = [w_{c,1}, w_{c,2}, \ldots, w_{c,C}]$
- $w_{c,k}$ = "How much does channel $k$ contribute to class $c$?"

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# Extract weights from final FC layer
fc_weights = model.fc.weight.detach()  # Shape: [1000, 512]
# fc_weights[c, k] = weight from channel k to class c

# Compute class scores (bias omitted, as in the formula)
# S = W · f
S = torch.matmul(fc_weights, f)  # Shape: [1000]
# S[c] = score for class c

# Find predicted class
predicted_class = S.argmax().item()
print(f"Predicted class: {predicted_class}")
# Example: 207 (Golden Retriever)

# Extract weights for predicted class
w_c = fc_weights[predicted_class]  # Shape: [512]
# w_c[k] = importance of channel k for this class
```

**What we have:**
- $\mathbf{w}_c$: [512] vector
- $w_c[k]$ = weight from channel $k$ to predicted class $c$
- These weights tell us which channels matter for this class!

</td>
</tr>
</table>

---

#### **Step 4: Generate Class Activation Map**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formula**

**CAM for class $c$:** Weighted sum of feature maps

$$\text{CAM}_c(i, j) = \sum_{k=1}^{C} w_{c,k} \cdot \mathbf{F}_k[i, j]$$

Where:
- $(i, j)$ = spatial position in the 7×7 grid
- $w_{c,k}$ = weight from channel $k$ to class $c$
- $\mathbf{F}_k[i,j]$ = activation at position $(i,j)$ in channel $k$

**In matrix form:**

$$\text{CAM}_c = \sum_{k=1}^{C} w_{c,k} \cdot \mathbf{F}_k$$

**Result:** $\text{CAM}_c \in \mathbb{R}^{H \times W}$ (a 7×7 heatmap)

**Intuition:**
- At each spatial location $(i,j)$:
  - Look at all 512 channels
  - Weight each by its importance to class $c$
  - Sum them up
- High value = "this region strongly indicates class $c$"

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# w_c shape: [512] (weights for predicted class)
# F shape: [1, 512, 7, 7] (feature maps)

# Compute weighted sum
# CAM = Σ_k w_c[k] · F[k]
CAM = torch.zeros(7, 7)  # Initialize 7×7 heatmap

for k in range(512):
    # Add weighted feature map k
    CAM += w_c[k] * F[0, k, :, :]

# Alternatively, vectorized:
CAM = torch.einsum('k,khw->hw', w_c, F[0])
# 'k' = channels, 'h' = height, 'w' = width

# Normalize to [0, 1]
CAM = torch.relu(CAM)  # Remove negative values
CAM = (CAM - CAM.min()) / (CAM.max() - CAM.min() + 1e-8)  # epsilon avoids division by zero

# Upsample to original image size (224×224)
from torch.nn.functional import interpolate
CAM_upsampled = interpolate(
    CAM.unsqueeze(0).unsqueeze(0),
    size=(224, 224),
    mode='bilinear',
    align_corners=False
).squeeze()

# Shape: [224, 224] heatmap
```

**What we have:**
- $\text{CAM}$: [224, 224] heatmap
- High values = regions important for the prediction
- Can overlay on original image!

</td>
</tr>
</table>
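
The weighted sum is easy to check by hand on a tiny example. The sketch below uses hypothetical values for 2 channels on a 2×2 grid, in plain Python so it runs without a model. It is an illustration of the formula, not a replacement for the PyTorch code.

```python
# Hypothetical toy values: C = 2 channels on a 2x2 grid.
F = [
    [[1.0, 0.0],
     [0.0, 2.0]],      # channel 0
    [[0.0, 3.0],
     [1.0, 0.0]],      # channel 1
]
w_c = [2.0, -1.0]      # class weights; channel 1 counts against the class
rows, cols = 2, 2

# CAM_c(i, j) = sum_k w_c[k] * F[k][i][j]
cam_raw = [[sum(w_c[k] * F[k][i][j] for k in range(len(w_c)))
            for j in range(cols)] for i in range(rows)]
# cam_raw == [[2.0, -3.0], [-1.0, 4.0]]

# ReLU, then min-max scaling; `or 1.0` guards against a constant map.
cam_pos = [[max(0.0, v) for v in row] for row in cam_raw]
lo = min(min(row) for row in cam_pos)
hi = max(max(row) for row in cam_pos)
scale = (hi - lo) or 1.0
cam_norm = [[(v - lo) / scale for v in row] for row in cam_pos]
# cam_norm == [[0.5, 0.0], [0.0, 1.0]]
```

Position (0, 1) is strongly active in channel 1, which has a negative weight, so its raw value is negative and the ReLU zeroes it out.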

---

### **Why CAM Works: The Math Behind It**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Proof**

**Recall class score:**

$$S_c = \sum_{k=1}^{C} w_{c,k} \cdot f_k$$

**Substitute GAP definition:**

$$S_c = \sum_{k=1}^{C} w_{c,k} \cdot \left(\frac{1}{H \times W} \sum_{i,j} \mathbf{F}_k[i,j]\right)$$

**Rearrange:**

$$S_c = \frac{1}{H \times W} \sum_{i,j} \underbrace{\sum_{k=1}^{C} w_{c,k} \cdot \mathbf{F}_k[i,j]}_{\text{CAM}_c(i,j)}$$

**Result:**

$$S_c = \frac{1}{H \times W} \sum_{i,j} \text{CAM}_c(i,j)$$

**Interpretation:**
- The class score is the **average** of the CAM
- High CAM values = high contribution to class score
- CAM directly shows spatial importance!

</td>
<td width="50%" valign="top">

**💻 Verification**

```python
# Verify the math
# S_c should equal the average of the raw CAM
# (before ReLU and normalization, ignoring the FC bias)

# Method 1: Direct computation
S_c_direct = (fc_weights[predicted_class] @ f).item()

# Method 2: Average of the raw CAM
CAM_raw = torch.einsum('k,khw->hw', w_c, F[0])
S_c_from_cam = CAM_raw.mean().item()

print(f"Direct: {S_c_direct:.4f}")
print(f"From CAM: {S_c_from_cam:.4f}")
# Should be approximately equal!

# This confirms that the raw CAM is the spatial
# decomposition of the class score
```

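**Why this matters:**
- CAM isn't arbitrary: it's mathematically grounded
- Each cell of the raw 7×7 map is proportional to that location's contribution to the class score (before ReLU, normalization, and upsampling)
- This makes CAM straightforward to interpret, at the coarse resolution of the last conv layer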

</td>
</tr>
</table>
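
The identity $S_c = \frac{1}{HW}\sum_{i,j}\text{CAM}_c(i,j)$ can also be checked on numbers small enough to follow by hand. The sketch below uses hypothetical toy values and no bias term, as in the derivation. It also shows that the identity holds only for the raw map, not after ReLU.

```python
# Hypothetical toy values: 2 channels on a 2x2 grid.
F = [
    [[1.0, 0.0], [0.0, 2.0]],   # channel 0
    [[0.0, 3.0], [1.0, 0.0]],   # channel 1
]
w_c = [2.0, -1.0]
n_pos = 4  # H * W

# Route 1: global average pooling, then the linear layer.
f = [sum(sum(row) for row in F_k) / n_pos for F_k in F]
score_direct = sum(w * x for w, x in zip(w_c, f))

# Route 2: build the raw CAM first, then average it over positions.
cam_raw = [[sum(w_c[k] * F[k][i][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
score_from_cam = sum(sum(row) for row in cam_raw) / n_pos

# After ReLU the negative cells are gone, so the mean no longer matches.
mean_after_relu = sum(max(0.0, v) for row in cam_raw for v in row) / n_pos

print(score_direct, score_from_cam, mean_after_relu)
```

Both routes give the same score because they sum the same products $w_{c,k} \cdot \mathbf{F}_k[i,j]$ in a different order. The post-ReLU mean is larger because the negative contributions were discarded.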

---


### **Summary: CAM in 4 Steps**

| **Step** | **Mathematical** | **Result** |
|----------|------------------|------------|
| **1. Extract features** | $\mathbf{F} \in \mathbb{R}^{C \times H \times W}$ | Last conv features (512×7×7) |
| **2. Global Average Pooling** | $f_k = \frac{1}{HW}\sum_{i,j} \mathbf{F}_k[i,j]$ | Feature vector (512,) |
| **3. Get class weights** | $S_c = \sum_k w_{c,k} f_k$ | Weights $\mathbf{w}_c$ for class $c$ |
| **4. Weighted sum** | $\text{CAM}_c = \sum_k w_{c,k} \mathbf{F}_k$ | Heatmap (7×7 → 224×224) |

**Final Output:** A heatmap showing which regions of the image led to the class prediction! 🔥

| Advantages | Limitations |
|-----------|-------------|
| No backward pass | Requires learning separate class-specific weights |
| Uses spatial information already in conv layers | Overhead grows with classes/filters |
| Produces intuitive, class-specific heatmaps | Requires GAP $\to$ restricts design |
| Works without bounding-box supervision | Mostly limited to classification problems |

---

## Grad-CAM (Gradient-weighted Class Activation Mapping)

**Key Question:** "Can we obtain class-specific heatmaps **without modifying the model architecture** (no GAP requirement)?"

---




### **The Problem Grad-CAM Solves**

**CAM's Limitation:**
- Requires **Global Average Pooling (GAP)** + single FC layer
- Must retrain model with this specific architecture
- Doesn't work with existing pretrained models that use different architectures

**Grad-CAM's Solution:**
- Works with **any CNN architecture** (VGG, ResNet, Inception, etc.)
- No retraining needed
- Uses **gradients** instead of learned weights to measure feature map importance

---

### **CAM vs Grad-CAM: The Key Difference**

<table>
<tr>
<td width="50%" valign="top">

**CAM: Learned Weights**

**How it determines importance:**

$$\alpha_k^c = w_{c,k}$$

Where $w_{c,k}$ is the **learned weight** from the FC layer connecting:
- Feature map $k$ 
- To class $c$

**Problem:**
- These weights only exist if you have GAP → FC architecture
- Can't use on arbitrary pretrained models

**CAM Formula:**

$$\text{CAM}_c = \sum_{k} w_{c,k} \cdot \mathbf{F}_k$$

</td>
<td width="50%" valign="top">

**Grad-CAM: Gradient-derived Weights**

**How it determines importance:**

$$\alpha_k^c = \frac{1}{Z} \sum_{i,j} \frac{\partial S_c}{\partial \mathbf{F}_k[i,j]}$$

Where:
- $S_c$ = score for class $c$ (before softmax)
- $\frac{\partial S_c}{\partial \mathbf{F}_k[i,j]}$ = gradient of class score w.r.t. feature map $k$ at position $(i,j)$
- $Z = H \times W$ = number of spatial positions in the feature map
- We **average the gradients** to get the importance $\alpha_k^c$

**Advantage:**
- Works with any architecture
- Gradients exist for any differentiable model

**Grad-CAM Formula:**

$$\text{Grad-CAM}_c = \text{ReLU}\left(\sum_{k} \alpha_k^c \cdot \mathbf{F}_k\right)$$

</td>
</tr>
</table>
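
The two definitions of $\alpha_k^c$ are closely related. For a GAP + single FC head, $\frac{\partial S_c}{\partial \mathbf{F}_k[i,j]} = \frac{w_{c,k}}{HW}$ at every position, so the Grad-CAM weight is the CAM weight scaled by the constant $\frac{1}{HW}$. The sketch below checks this numerically with central differences on hypothetical toy values. The helper names `score` and `grad_cam_weights` are illustrative, not part of any library.

```python
def score(F, w):
    """S_c for a GAP + linear head: sum_k w[k] * mean(F[k])."""
    return sum(w_k * sum(sum(row) for row in F_k) / (len(F_k) * len(F_k[0]))
               for w_k, F_k in zip(w, F))

def grad_cam_weights(F, w, eps=1e-6):
    """alpha_k = spatial mean of dS_c/dF_k[i][j], estimated numerically."""
    alphas = []
    for k, F_k in enumerate(F):
        n_pos = len(F_k) * len(F_k[0])
        total = 0.0
        for i in range(len(F_k)):
            for j in range(len(F_k[0])):
                saved = F[k][i][j]
                F[k][i][j] = saved + eps
                up = score(F, w)
                F[k][i][j] = saved - eps
                down = score(F, w)
                F[k][i][j] = saved           # restore the original value
                total += (up - down) / (2 * eps)
        alphas.append(total / n_pos)
    return alphas

F = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
w_c = [2.0, -1.0]
alpha = grad_cam_weights(F, w_c)
# Each alpha[k] should be close to w_c[k] / (H * W) = w_c[k] / 4
print(alpha)
```

Since every channel is scaled by the same factor, the two heatmaps differ only by a constant in this architecture, and normalizing to [0, 1] removes it.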

---

### **The Intuition Behind Grad-CAM**

**Question:** How much does feature map $k$ contribute to class $c$?

**Answer:** Look at the gradient $\frac{\partial S_c}{\partial \mathbf{F}_k}$

**Why gradients?**

| **Gradient Value** | **Meaning** | **Interpretation** |
|-------------------|-------------|-------------------|
| **Large positive** | Increasing this feature map increases class score | This feature **supports** the class |
| **Near zero** | Changes don't affect class score | This feature is **irrelevant** to this class |
| **Large negative** | Increasing this feature map decreases class score | This feature **contradicts** the class |

**By averaging gradients across spatial locations:**
- We get a single importance weight $\alpha_k^c$ per feature map $k$ for class $c$
- High $\alpha_k^c$ = "feature map $k$ strongly supports class $c$"

---

### **Grad-CAM Step-by-Step**

---

#### **Step 1: Forward Pass and Extract Feature Maps**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Setup**

**Choose any convolutional layer** (typically the last one):

$$\mathbf{F} \in \mathbb{R}^{C \times H \times W}$$

Where:
- $C$ = number of feature maps (channels)
- $H \times W$ = spatial dimensions
- $\mathbf{F}_k[i, j]$ = activation at position $(i,j)$ in channel $k$

**Forward pass to get class scores:**

$$\mathbf{S} = f_{\text{model}}(\mathbf{x}) \in \mathbb{R}^N$$

Where:
- $\mathbf{x}$ = input image
- $N$ = number of classes
- $S_c$ = score for class $c$ (logit, before softmax)

**No architecture constraints!** Can be:
- VGG with multiple FC layers
- ResNet with adaptive pooling
- Any CNN architecture

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
import torch
from torchvision import models
from PIL import Image
import matplotlib.pyplot as plt

# Load ANY pretrained model (no GAP requirement!)
# (newer torchvision prefers weights=models.VGG16_Weights.DEFAULT)
model = models.vgg16(pretrained=True).eval()
# Or: models.resnet50(), models.inception_v3(), etc.

# Hook to capture feature maps
feature_maps = {}

def forward_hook(name):
    def hook(module, input, output):
        feature_maps[name] = output
        # Enable gradient tracking
        output.requires_grad_(True)
        # Keep the gradient of this non-leaf tensor after backward()
        output.retain_grad()
    return hook

# Register hook at target layer
# For VGG16: features[29], the ReLU right after the last conv layer (features[28])
# For ResNet: layer4
target_layer = model.features[29]
target_layer.register_forward_hook(forward_hook('target'))

# Forward pass
img = Image.open('dog.jpg')
img_tensor = preprocess(img).unsqueeze(0)  # [1, 3, 224, 224]
img_tensor.requires_grad_(True)

output = model(img_tensor)  # [1, 1000]

# Extract feature maps F
F = feature_maps['target']  # Shape: [1, 512, 14, 14]

```

**What we have:**
- $\mathbf{F}$: [1, 512, 14, 14] feature maps
- $\mathbf{S}$: [1, 1000] class scores
- Model can have ANY architecture after conv layers!

</td>
</tr>
</table>

---

#### **Step 2: Compute Gradients of Class Score w.r.t. Feature Maps**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Operation**

**For target class $c$, compute:**

$$\frac{\partial S_c}{\partial \mathbf{F}_k[i, j]}$$

For all:
- Channels $k = 1, 2, \ldots, C$
- Spatial positions $(i, j)$

**Result:** A gradient tensor

$$\nabla_{\mathbf{F}} S_c \in \mathbb{R}^{C \times H \times W}$$

**Intuition at each position $(i,j)$ in channel $k$:**
- "If I increase $\mathbf{F}_k[i,j]$ by a small amount, how much does $S_c$ increase?"
- Positive gradient = this activation helps predict class $c$
- Negative gradient = this activation hurts prediction of class $c$

**This works via backpropagation:**
- From output class score $S_c$
- Back through all layers
- To the chosen feature map layer

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# Choose target class
# Option 1: Use predicted class
predicted_class = output.argmax(dim=1).item()
# Option 2: Use ground truth class
# predicted_class = true_label

# Extract score for class c
S_c = output[0, predicted_class]

# Backward pass: compute ∂S_c/∂F
model.zero_grad()
S_c.backward(retain_graph=True)

# Extract gradients: ∂S_c/∂F_k[i,j]
gradients = F.grad  # Shape: [1, 512, 14, 14]

# gradients[0, k, i, j] = ∂S_c/∂F_k[i,j]
```

**What we computed:**

$$\nabla_{\mathbf{F}} S_c = \begin{bmatrix}
\frac{\partial S_c}{\partial \mathbf{F}_1[1,1]} & \cdots & \frac{\partial S_c}{\partial \mathbf{F}_1[H,W]} \\
\vdots & \ddots & \vdots \\
\frac{\partial S_c}{\partial \mathbf{F}_C[1,1]} & \cdots & \frac{\partial S_c}{\partial \mathbf{F}_C[H,W]}
\end{bmatrix}$$

**Visual Example:**
```
Feature Map k=42 (14×14):
┌────────────────┐
│ 0.2  0.5  0.1  │
│ 0.4  0.8  0.6  │
│ ...  ...  ...  │
└────────────────┘

Gradient ∂S_c/∂F_42 (14×14):
┌────────────────┐
│ 0.1  0.3 -0.1  │ ← gradients show
│ 0.5  0.9  0.2  │   which positions
│ ...  ...  ...  │   matter for class c
└────────────────┘
```

</td>
</tr>
</table>
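
The question "if I increase $\mathbf{F}_k[i,j]$ slightly, how much does $S_c$ change?" can be answered directly by nudging one activation and re-evaluating the score. The sketch below does this for a toy head that is deliberately not GAP-based. Global max pooling followed by a linear layer is an assumption chosen for illustration, and all values are hypothetical. Backpropagation computes the same quantities analytically and for all positions at once.

```python
def score(F, w):
    """Toy head (an assumption): global max pooling, then a linear layer."""
    return sum(w_k * max(max(row) for row in F_k) for w_k, F_k in zip(w, F))

def partial(F, w, k, i, j, eps=1e-6):
    """Central-difference estimate of dS_c / dF_k[i][j]."""
    saved = F[k][i][j]
    F[k][i][j] = saved + eps
    up = score(F, w)
    F[k][i][j] = saved - eps
    down = score(F, w)
    F[k][i][j] = saved               # restore the original activation
    return (up - down) / (2 * eps)

F = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
w_c = [2.0, -1.0]

g_peak = partial(F, w_c, k=0, i=1, j=1)   # position holding channel 0's maximum
g_flat = partial(F, w_c, k=0, i=0, j=0)   # non-maximal position in channel 0
g_neg = partial(F, w_c, k=1, i=0, j=1)    # channel 1's maximum, negative weight

print(g_peak, g_flat, g_neg)
```

With this head, only the position that holds a channel's maximum receives a nonzero gradient, and its sign follows the sign of the channel's weight. Unlike the GAP case, the gradient here varies across positions.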

---

#### **Step 3: Global Average Pooling of Gradients to Get Weights**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Operation**

**For each feature map $k$, compute importance weight:**

$$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial S_c}{\partial \mathbf{F}_k[i, j]}$$

**Result:** A vector of importance weights

$$\boldsymbol{\alpha}^c = [\alpha_1^c, \alpha_2^c, \ldots, \alpha_C^c] \in \mathbb{R}^C$$

**Intuition:**
- $\alpha_k^c$ = "On average, how much does feature map $k$ contribute to class $c$?"
- We average across spatial locations because:
  - Some positions may have high positive gradients
  - Some may have negative gradients
  - We want the **overall** importance of the entire feature map

**Why GAP (averaging)?**
- Reduces spatial dimensions: $(C \times H \times W) \to (C)$
- Gives us one weight per channel
- Same operation as CAM, but applied to **gradients** instead of activations

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# Compute α_k^c for each channel k
# gradients shape: [1, 512, 14, 14]

# Average over spatial dimensions (H, W)
alpha = gradients.mean(dim=[2, 3])  # Shape: [1, 512]
# alpha[0, k] = α_k^c

# Remove batch dimension
alpha = alpha.squeeze(0)  # Shape: [512]

# Now we have importance weights:
# alpha[0] = α_1^c (importance of channel 1 for class c)
# alpha[1] = α_2^c (importance of channel 2 for class c)
# ...
# alpha[511] = α_512^c (importance of channel 512 for class c)
```

**Example values:**
```python
alpha[0] = 0.42   # Channel 0: moderately important
alpha[1] = -0.15  # Channel 1: slightly suppresses class c
alpha[2] = 0.89   # Channel 2: very important!
alpha[3] = 0.01   # Channel 3: nearly irrelevant
...
```

**Comparison with CAM:**

| **Method** | **Weight Source** | **Formula** |
|------------|------------------|-------------|
| **CAM** | Learned FC weights | $\alpha_k^c = w_{c,k}$ |
| **Grad-CAM** | Gradient averages | $\alpha_k^c = \frac{1}{HW}\sum_{i,j} \frac{\partial S_c}{\partial \mathbf{F}_k[i,j]}$ |

Both give us: "importance of channel $k$ for class $c$"

</td>
</tr>
</table>
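
A small worked example shows what the spatial average does to gradients of mixed sign. The gradient values below are hypothetical and chosen so that each channel illustrates one case.

```python
# Hypothetical gradients dS_c/dF for C = 3 channels on a 2x2 grid.
grads = [
    [[0.4, 0.6], [0.2, 0.8]],       # consistently positive
    [[0.9, -0.9], [-0.5, 0.5]],     # strong, but mixed signs
    [[-0.1, -0.3], [-0.2, -0.2]],   # consistently negative
]

# alpha_k = (1 / (H * W)) * sum_{i,j} grads[k][i][j]
alpha = [sum(sum(row) for row in g_k) / (len(g_k) * len(g_k[0]))
         for g_k in grads]

print(alpha)  # approximately [0.5, 0.0, -0.2]
```

The middle channel has the largest individual gradients, yet its weight is zero because the positive and negative values cancel. Averaging therefore measures the net effect of a channel, and can hide channels whose influence is strong but location-dependent.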

---

#### **Step 4: Compute Grad-CAM (Weighted Sum + ReLU)**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Formula**

**Grad-CAM for class $c$:**

$$\text{Grad-CAM}_c = \text{ReLU}\left(\sum_{k=1}^{C} \alpha_k^c \cdot \mathbf{F}_k\right)$$

Where:
- $\alpha_k^c$ = importance weight for channel $k$
- $\mathbf{F}_k \in \mathbb{R}^{H \times W}$ = feature map for channel $k$
- ReLU = $\max(0, x)$ (zero out negative values)

**Result:** $\text{Grad-CAM}_c \in \mathbb{R}^{H \times W}$ (heatmap)

**Breakdown:**

1. **Weighted sum:** $\sum_k \alpha_k^c \cdot \mathbf{F}_k$
   - At each spatial location $(i,j)$
   - Sum the weighted feature maps
   - High value = many important features activate here

2. **ReLU:** Keep only positive values
   - Positive = regions that **support** the class
   - Negative = regions that **contradict** the class
   - We only visualize supporting regions

**Why ReLU?**
- Negative values mean "this region suppresses the class"
- We want to show "what made the network confident"
- Not "what made it less confident"
- ReLU removes distracting negative influences

</td>
<td width="50%" valign="top">

**💻 Implementation**

```python
# alpha shape: [512] (importance weights)
# F shape: [1, 512, 14, 14] (feature maps)

# Remove batch dimension from F and detach it from the autograd graph
F = F.squeeze(0).detach()  # Shape: [512, 14, 14]

# Compute weighted sum: Σ α_k^c · F_k
# Method 1: Loop
Grad_CAM = torch.zeros(14, 14)
for k in range(512):
    Grad_CAM += alpha[k] * F[k, :, :]

# Method 2: Vectorized (faster)
Grad_CAM = torch.einsum('k,khw->hw', alpha, F)
# 'k'=channels, 'h'=height, 'w'=width

# Shape: [14, 14] raw heatmap

# Apply ReLU (remove negative values)
Grad_CAM = torch.relu(Grad_CAM)

# Normalize to [0, 1] for visualization
Grad_CAM = Grad_CAM - Grad_CAM.min()
# Small epsilon avoids 0/0 if the map is all zeros after ReLU
Grad_CAM = Grad_CAM / (Grad_CAM.max() + 1e-8)

# Upsample to original image size (224×224)
from torch.nn.functional import interpolate
Grad_CAM = interpolate(
    Grad_CAM.unsqueeze(0).unsqueeze(0),
    size=(224, 224),
    mode='bilinear',
    align_corners=False
).squeeze()

# Final shape: [224, 224] heatmap
```

**What we have:**
- Grad-CAM heatmap: [224, 224]
- Values range [0, 1]
- High values = important regions for class prediction

</td>
</tr>
</table>
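
Steps 3 and 4 together fit in a few lines once the activations and gradients are available. The following is a minimal sketch on nested lists with hypothetical values. The function name `grad_cam` and the `eps` guard are illustrative choices, and upsampling is left out.

```python
def grad_cam(F, grads, eps=1e-8):
    """Toy Grad-CAM on nested lists shaped [C][H][W]; returns (raw, heatmap)."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    # Step 3: one weight per channel, the spatial mean of its gradients.
    alpha = [sum(sum(row) for row in grads[k]) / (H * W) for k in range(C)]
    # Step 4a: weighted sum of the feature maps at every position.
    raw = [[sum(alpha[k] * F[k][i][j] for k in range(C)) for j in range(W)]
           for i in range(H)]
    # Step 4b: ReLU, then scale by the maximum (eps guards an all-zero map).
    pos = [[max(0.0, v) for v in row] for row in raw]
    peak = max(max(row) for row in pos)
    heat = [[v / (peak + eps) for v in row] for row in pos]
    return raw, heat

# Hypothetical post-ReLU activations and gradients for C = 2 channels.
F = [[[2.0, 0.0], [1.0, 0.0]],
     [[0.0, 4.0], [0.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]],            # alpha_0 = 0.5
         [[-0.25, -0.25], [-0.25, -0.25]]]    # alpha_1 = -0.25
raw, heat = grad_cam(F, grads)

# Degenerate case: every raw value is negative, so the heatmap is all zeros.
zero_raw, zero_heat = grad_cam([[[1.0]]], [[[-1.0]]])
```

Position (0, 1) is active only in the negative-weight channel, so its raw value is negative and the heatmap shows 0 there. The degenerate case shows why the guard matters: without `eps`, an all-zero map would raise a division-by-zero error.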

---

### **Why ReLU? The Importance of Positive vs Negative Gradients**

<table>
<tr>
<td width="50%" valign="top">

**📐 Mathematical Interpretation**

**Gradient sign tells us:**

$$\frac{\partial S_c}{\partial \mathbf{F}_k[i,j]} \begin{cases}
> 0 & \text{Region supports class } c \\
< 0 & \text{Region suppresses class } c \\
\approx 0 & \text{Region irrelevant to class } c
\end{cases}$$

**After weighted sum, before ReLU:**

$$\text{Raw value} = \sum_{k} \alpha_k^c \cdot \mathbf{F}_k[i,j]$$

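Can be negative when:
- Channels with negative weights ($\alpha_k^c < 0$) are strongly active at $(i,j)$
- And the positive-weight channels are only weakly active there ($\mathbf{F}_k[i,j]$ small)

When $\mathbf{F}$ is taken after a ReLU, as is typical, every $\mathbf{F}_k[i,j] \geq 0$, so a negative sum always means the negative-weight channels outweigh the positive ones.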

**ReLU decision:**

$$\text{Grad-CAM}[i,j] = \max\left(0, \sum_{k} \alpha_k^c \cdot \mathbf{F}_k[i,j]\right)$$

**Result:**
- Keep regions that **positively** contribute to class $c$
- Remove regions that **negatively** contribute to class $c$

</td>
<td width="50%" valign="top">

**💻 Visual Example**

**Example: Dog vs Cat Classification**

```
# For class c = "dog"
# Before ReLU:

┌───────────────────┐
│  0.8   0.9   0.7  │ ← Dog's face (high positive)
│  0.6   0.5   0.4  │
│ -0.2  -0.3  -0.1  │ ← Cat toy in corner (negative)
└───────────────────┘

# After ReLU:

┌───────────────────┐
│  0.8   0.9   0.7  │ ← Dog features highlighted
│  0.6   0.5   0.4  │
│  0.0   0.0   0.0  │ ← Confusing regions removed
└───────────────────┘
```

**Interpretation:**
- **Bright regions:** "This is why I said dog"
- **Dark regions:** Either:
  - Irrelevant (weighted sum ≈ 0)
  - Contradictory (weighted sum < 0, removed by ReLU)

**Why this is useful:**
- Shows **positive evidence** for the prediction
- Ignores **negative evidence** (what it's not)
- Creates intuitive "what made the network confident" visualizations

</td>
</tr>
</table>


### **Grad-CAM: Mathematical Summary**

| **Step** | **Mathematical Operation** | **Result** |
|----------|---------------------------|------------|
| **1. Forward** | $\mathbf{F}, \mathbf{S} = f_{\text{model}}(\mathbf{x})$ | Feature maps $\mathbf{F} \in \mathbb{R}^{C \times H \times W}$, scores $\mathbf{S} \in \mathbb{R}^N$ |
| **2. Backward** | $\frac{\partial S_c}{\partial \mathbf{F}_k[i,j]}$ | Gradients $\nabla_{\mathbf{F}} S_c \in \mathbb{R}^{C \times H \times W}$ |
| **3. GAP gradients** | $\alpha_k^c = \frac{1}{HW} \sum_{i,j} \frac{\partial S_c}{\partial \mathbf{F}_k[i,j]}$ | Importance weights $\boldsymbol{\alpha}^c \in \mathbb{R}^C$ |
| **4. Weighted sum + ReLU** | $\text{Grad-CAM}_c = \text{ReLU}\left(\sum_k \alpha_k^c \cdot \mathbf{F}_k\right)$ | Heatmap $\in \mathbb{R}^{H \times W}$ |

**Final Output:** A class-specific heatmap showing which regions contributed to the prediction! 🔥

---

### **Advantages & Limitations**

| **✅ Advantages** | **❌ Limitations** |
|-------------------|-------------------|
| Works with **any CNN** (VGG, ResNet, Inception, etc.) | Requires **backward pass** (slower than CAM) |
| **No retraining** required | Gradients can be **noisy** |
| Can visualize **any layer** (not just last conv) | Lower resolution than input (inherits from feature map size) |
| **Class-discriminative** (specific to predicted class) | Doesn't show pixel-level details (use Guided Grad-CAM for that) |
| Mathematically grounded (gradient-based importance) | ReLU removes negative evidence (both good and limiting) |

---

### **Key Takeaways**

1. **Grad-CAM uses gradients** instead of learned weights to measure feature importance
2. **Works with any CNN**: no architectural constraints beyond a differentiable path from the feature maps to the class score
3. **Gradients show sensitivity:** $\frac{\partial S_c}{\partial \mathbf{F}_k}$ = "how much does feature $k$ affect class $c$?"
4. **GAP of gradients** gives per-channel importance: $\alpha_k^c$
5. **ReLU keeps positive contributions** (regions that support the prediction)
6. **Class-discriminative visualizations** without modifying the model!

---
```