# Feature Visualisation 

In this chapter we uncover:
- What patterns individual neurons actually detect
- Where in the image the network focuses when making a decision
- How to understand which features the model extracts

We'll follow two main approaches:
1. Feature-based explanations $\to$ DeconvNet
2. Decision-based explanations $\to$ CAM

## Visualizing The Weights 

The most basic inspection approach is to visualize the learned weights directly. This is mainly useful for the **first convolutional layer**.

---

### **What We See in First Layer Weights**

First layer filters operate on raw RGB pixels, making them interpretable as visual templates:

<div align="center">
<img src="../images/chap8/InitialLayer.png" width="400"/>
</div>

| **Pattern** | **What It Detects** |
|-------------|---------------------|
| **Edge detectors** | Oriented gradients (horizontal, vertical, diagonal) |
| **Color blobs** | Specific color combinations (red, green, blue) |
| **Gabor-like patterns** | Textures at various orientations and frequencies |
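
To make "Gabor-like" concrete, the sketch below builds a Gabor kernel with NumPy: a cosine wave windowed by a Gaussian envelope. The function name `gabor_kernel` and all parameter values are illustrative assumptions, not values taken from a trained network.

```python
import numpy as np

def gabor_kernel(size=11, wavelength=4.0, theta=0.0, sigma=2.5, gamma=0.5, phase=0.0):
    """Return a size x size Gabor kernel (illustrative parameters)."""
    half = size // 2
    # Pixel coordinates centred on the middle of the kernel
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by theta
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope (gamma controls elongation) times a cosine carrier
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier

kernel_vertical = gabor_kernel(theta=0.0)         # vertical stripes
kernel_diagonal = gabor_kernel(theta=np.pi / 4)   # diagonal stripes
```

Changing `theta` rotates the stripes and changing `wavelength` sets their spatial frequency, which mirrors the variety of orientations and frequencies seen among first-layer filters.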

---

### **Implementation**

```python
import torch
import torchvision
import matplotlib.pyplot as plt

# Load pretrained model (newer torchvision versions use the `weights=` argument instead)
model = torchvision.models.alexnet(pretrained=True)

# Extract first conv layer weights: [out_channels, in_channels, height, width]
first_layer_weights = model.features[0].weight.data  # Shape: [64, 3, 11, 11]

# Visualize first 64 filters
fig, axes = plt.subplots(8, 8, figsize=(10, 10))
for i, ax in enumerate(axes.flat):
    # Normalize to [0, 1] for display
    weight = first_layer_weights[i].permute(1, 2, 0)  # [11, 11, 3]
    weight = (weight - weight.min()) / (weight.max() - weight.min())
    ax.imshow(weight.cpu().numpy())
    ax.axis('off')
plt.tight_layout()
plt.show()
```

---

### **Limitations**

| **Problem** | **Why It Fails** |
|-------------|------------------|
| **Only Layer 1** | Deeper filters span many input channels (e.g., 3×3×256 tensors), so they cannot be rendered as RGB images and look like uninterpretable blobs |
| **No context** | Shows *what* filters detect, not *where* they activate on real images |
| **Static view** | Ignores how features compose hierarchically through the network |

**Better approach:** Visualize **activations** and **gradients** instead → DeconvNet, CAM.

---
---

## Maximally Activating Patches

**Key Question:** "What kind of input patterns cause a specific neuron to activate most strongly?"

---

### **The Approach**

Since a neuron's **receptive field** determines what it can "see" (and this grows with network depth), we can find image patches that maximally excite specific neurons:
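
As a rough illustration of how the receptive field grows with depth, the sketch below applies the standard recurrence for stacked convolution and pooling layers: each layer adds `(kernel - 1) * jump` input pixels, where `jump` is the product of the strides so far. The helper name `receptive_field` is hypothetical, and dilation is ignored.

```python
def receptive_field(layers):
    """Receptive field size and cumulative stride for a stack of layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each layer widens the field by (k - 1) input-space steps
        jump *= stride               # distance in input pixels between adjacent units
    return size, jump

# VGG16 up to conv3_3: 3x3 convs (stride 1) and 2x2 max-pools (stride 2)
conv, pool = (3, 1), (2, 2)
vgg16_to_conv3_3 = [conv, conv, pool, conv, conv, pool, conv, conv, conv]
rf_size, rf_stride = receptive_field(vgg16_to_conv3_3)
print(rf_size, rf_stride)  # 40 4
```

Under these assumptions, each unit in conv3_3 sees a 40×40 pixel region of the input, and neighbouring units are 4 pixels apart. The exact position of the region also depends on padding, which this sketch does not track.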

**Algorithm:**
1. Select a target neuron/feature map at any layer
2. Run thousands of images through the network
3. Record activation values for that neuron
4. Extract and visualize the top-K image patches with highest activations

<div align="center">
<img src="../images/chap8/Layer1.png" width="270"/>
<img src="../images/chap8/Layer2.png" width="370"/>
<img src="../images/chap8/Layer4.png" width="350"/>
<img src="../images/chap8/Layer6.png" width="350"/>
<p><i>Example: image patches that maximally activate neurons in layers 1, 2, 4 and 6</i></p>
</div>

---

### **Implementation**

```python
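import torch
from torchvision import models, transforms
from PIL import Image

# Load pretrained model
model = models.vgg16(pretrained=True).eval()

# ImageNet normalization; no resizing, so feature-map coordinates map back to the original image
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hook to capture activations
activations = {}
def get_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hook on the ReLU after conv3_3
# (in torchvision's VGG16, features[14] is conv3_3 and features[16] is the following max-pool)
model.features[15].register_forward_hook(get_activation('conv3_3'))

# Process images; `image_dataset` is assumed to be an iterable of image file paths
patches = []
for img_path in image_dataset:
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img).unsqueeze(0)

    # Forward pass (no gradients needed)
    with torch.no_grad():
        model(img_tensor)

    # Extract activation for specific filter (e.g., filter 42)
    activation_map = activations['conv3_3'][0, 42]  # [H, W]
    # argmax returns a flattened index; convert it to (row, col)
    max_row, max_col = divmod(activation_map.argmax().item(), activation_map.shape[1])

    # Extract corresponding receptive field patch from original image
    # (`extract_receptive_field_patch` is a placeholder helper, not a library function)
    patch = extract_receptive_field_patch(img, max_row, max_col, layer_depth=15)
    patches.append((activation_map.max().item(), patch))

# Display top-9 patches, sorted by activation value only
# (`visualize_patches` is likewise a placeholder helper)
top_patches = sorted(patches, key=lambda p: p[0], reverse=True)[:9]
visualize_patches(top_patches)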
```

---

### **Advantages & Limitations**

| **✅ Advantages** | **❌ Limitations** |
|-------------------|-------------------|
| Works for **any layer** (not just Layer 1) | Doesn't explain **which class** the features belong to |
| Data-driven and accurate | Doesn't explain the **final decision** |
| No gradients needed (only forward passes) | Limited by **dataset coverage** (only finds patterns present in the images you search) |
| Produces interpretable patterns at every depth | Doesn't work well for **fully connected layers** (no spatial structure) |

**Key Insight:** Shows *what* activates neurons, but not *why* the network makes specific predictions. For decision explanations, we need **CAM** (Class Activation Mapping).

---
---

## Visualizing the Representation Space

**Key Question:** "What does the network represent as a whole? How does it organize different images internally?"

---

### **The Concept**

**At each layer**, an image is transformed into a high-dimensional vector:
- **Early layers**: Low-level features (edges, textures) → vectors encode spatial patterns
- **Deep layers**: High-level features (objects, concepts) → vectors encode semantic meaning
- **Key insight**: Similar images produce similar vectors in representation space

**The Problem:**
Fully connected layers produce high-dimensional vectors (e.g., 4096-dimensional) with **no spatial structure**, so patch-based visualization no longer applies.

---

### **Solution 1: Nearest Neighbors in Feature Space**

**Idea:** Extract feature vectors from a chosen layer, then find images with the most similar vectors.

**Algorithm:**
1. Choose a layer (typically the last FC layer before classification)
2. Extract feature vectors for all images: $\mathbf{x}_i \in \mathbb{R}^d$ (e.g., $d = 4096$)
3. For a query image, compute distances: $\text{distance}(\mathbf{x}_{\text{query}}, \mathbf{x}_i) = \|\mathbf{x}_{\text{query}} - \mathbf{x}_i\|_2$
4. Retrieve top-K nearest neighbors (smallest distances)
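
Steps 2 to 4 above can be sketched on toy data with plain NumPy. The four 2-D vectors stand in for real feature vectors, and `l2_nearest_neighbors` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def l2_nearest_neighbors(features, query_idx, k):
    """Indices of the k rows closest to features[query_idx] in L2 distance, excluding the query."""
    distances = np.linalg.norm(features - features[query_idx], axis=1)
    order = np.argsort(distances, kind="stable")
    # Drop the query itself (distance 0) before taking the top k
    return [int(i) for i in order if i != query_idx][:k]

# Toy "feature vectors": one row per image
toy_features = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [5.0, 5.0],
])
neighbors = l2_nearest_neighbors(toy_features, query_idx=0, k=2)
print(neighbors)  # [1, 2]
```

With real networks the rows would be the extracted layer activations, and a library index is usually preferable to this brute-force search once the dataset grows.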

<div align="center">
<img src="../images/chap8/NNRespSpace.png" width="570"/>
<p><i>Test image (left of red line), L2 nearest neighbors in feature space (right)</i></p>
</div>

**What This Reveals:**
- **Learned similarity**: What the network considers "similar" (may differ from human perception)
- **Category structure**: How well classes are separated in feature space
- **Invariances**: Which variations (pose, lighting, background) the network has learned to ignore, visible when neighbors differ in these respects but share a class

**Use Cases:**
- **Debugging**: Find mislabeled images (distant from their class cluster)
- **Class confusion**: Identify which classes have overlapping representations
- **Dataset bias**: Detect spurious correlations (e.g., "boats" always near water)

---

### **Implementation: Nearest Neighbors**

```python
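import torch
import numpy as np
from PIL import Image
from torchvision import models, transforms
from sklearn.neighbors import NearestNeighbors

# Load pretrained model
model = models.resnet50(pretrained=True).eval()

# Remove final classification layer to get features
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])  # Output: [N, 2048, 1, 1]

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Extract features for all images; `dataset` is assumed to be a list of image file paths
features = []
for img_path in dataset:
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img).unsqueeze(0)

    with torch.no_grad():
        feature = feature_extractor(img_tensor).flatten()  # [2048]
        features.append(feature.cpu().numpy())

features = np.array(features)  # [N, 2048]

# Find nearest neighbors for a query image
nn_model = NearestNeighbors(n_neighbors=10, metric='euclidean')
nn_model.fit(features)

# `query_idx` is the position of the query image in `dataset`.
# The query is part of the fitted set, so its closest neighbor is itself
query_feature = features[query_idx].reshape(1, -1)
distances, indices = nn_model.kneighbors(query_feature)

# Visualize results (`visualize_nearest_neighbors` is a placeholder helper)
visualize_nearest_neighbors(query_idx, indices[0])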
```

---

### **Solution 2: Low-Dimensional Embeddings**

**The Problem:** Feature vectors have 2048 to 4096 dimensions, which is impossible to visualize directly.

**The Solution:** Project high-dimensional vectors into 2D/3D while **preserving local distances**.

| **Method** | **How It Works** | **Best For** |
|------------|------------------|--------------|
| **t-SNE** | Non-linear; preserves local neighborhoods via probability distributions | Exploring clusters and local structure |
| **UMAP** | Non-linear; typically faster than t-SNE and often preserves more global structure | Large datasets, hierarchical structure |
| **PCA** | Linear projection onto principal components | Quick overview, preserving variance |

**Key Principle:** Points close in high-dimensional space should remain close in 2D.
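
The PCA row of the table can be sketched with NumPy alone: centre the features, take the singular value decomposition, and project onto the top two principal directions. The random matrix below is only a stand-in for real feature vectors, and `pca_project` is a hypothetical helper name. Note that PCA preserves variance rather than local neighborhoods, so it serves as a quick linear baseline.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
fake_features = rng.normal(size=(200, 64))   # stand-in for [N, d] network features
features_2d = pca_project(fake_features)     # [N, 2], ready for a scatter plot
```

The resulting `features_2d` array can be passed to `plt.scatter` in the same way as a t-SNE output.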

<div align="center">
<img src="../images/chap8/2dembed.png" width="600"/>
<p><i>t-SNE visualization of ImageNet features: each color = different class</i></p>
</div>

---

### **Implementation: t-SNE Visualization**

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Reuse the features extracted above; `labels` is assumed to hold one integer class label per image
features = np.array(features)  # [N, 2048]
labels = np.array(labels)       # [N] class labels

# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
features_2d = tsne.fit_transform(features)  # [N, 2]

# Visualize
plt.figure(figsize=(12, 10))
scatter = plt.scatter(features_2d[:, 0], features_2d[:, 1], 
                      c=labels, cmap='tab10', alpha=0.6, s=10)
plt.colorbar(scatter)
plt.title('t-SNE Visualization of Feature Space')
plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.show()
```

---

### **What Embeddings Reveal**

| **Observation** | **Interpretation** |
|-----------------|-------------------|
| **Tight clusters** | Class is well-learned, features are consistent |
| **Overlapping clusters** | Model confuses these classes (e.g., "husky" vs "wolf") |
| **Outliers** | Mislabeled images or unusual examples |
| **Smooth transitions** | Network learns continuous representations (e.g., dog breeds form a continuum) |

---

### **Comparison: When to Use Each**

| **Method** | **When to Use** | **Limitation** |
|------------|-----------------|----------------|
| **Nearest Neighbors** | Find similar images, debug specific examples | Doesn't show overall structure |
| **t-SNE/UMAP** | Visualize overall dataset structure, find clusters | Slow for large datasets; can distort global structure |
| **PCA** | Quick linear projection, preserve variance | May miss non-linear relationships |

**Key Insight:** Representation space visualization shows **how the network organizes its knowledge**, revealing both its strengths (semantic clustering) and weaknesses (class confusion).

---