
### Quantitatively Comparing Saliency Methods: The AOPC\_MoRF Metric

In this project each group member implemented a different post-hoc interpretation method
(Grad-CAM, SmoothGrad, LRP) on the same CNN trained to classify brain tumour MRI slices.
To compare these explanations in a principled way we require a fidelity metric: a measure
of how well a saliency map identifies the specific image regions that truly drive the model’s
prediction. Importantly, the goal here is explainability fidelity, not segmentation accuracy.
We are not judging how well a method outlines the tumour anatomically, but whether it
correctly highlights the regions that this particular CNN relies on.

---

### What is a saliency map?

A saliency map is an array with the same spatial dimensions as the input image that assigns
an *importance score* to each pixel or region with respect to a particular prediction.
Formally, for an image $x$ and model output $f(x)$, a saliency method produces a map:

$$
R = \{ R_i \mid i \in \text{pixels of } x \},
$$

where $R_i$ measures how influential pixel $i$ is for the model’s chosen class.  
Different methods compute $R$ differently:

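- **Grad-CAM**: weights the feature maps of a convolutional layer by the spatially averaged
  gradients of the target-class score, then applies a ReLU and upsamples to the input size.
- **SmoothGrad**: averages input gradients over several noise-perturbed copies of the image
  to reduce visual noise.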
- **LRP**: redistributes the output score backwards according to conservation rules.

These maps are qualitative visualisations, but we require a **quantitative** way to assess how
faithfully they reflect what the model actually uses.
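
To make the notion of a map $R$ concrete, the following is a minimal NumPy sketch of the SmoothGrad averaging step. It is not the project's implementation: `grad_fn` is a hypothetical callable standing in for the network's input gradient, and the toy function, `n_samples` and `noise_std` are illustrative assumptions.

```python
import numpy as np

def smoothgrad(x, grad_fn, n_samples=25, noise_std=0.1, seed=0):
    """Average input gradients over noisy copies of x (SmoothGrad-style).

    grad_fn is a hypothetical callable returning the gradient of the
    class score with respect to the input, with the same shape as x.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        total += grad_fn(noisy)
    return total / n_samples

# Toy stand-in for a network (assumption): f(x) = sum(w * x**2),
# whose input gradient is 2 * w * x.
w = np.array([[1.0, 0.0], [0.0, 3.0]])

def toy_grad(x):
    return 2.0 * w * x

x = np.ones((2, 2))
R = smoothgrad(x, toy_grad)  # one relevance value R_i per pixel
```

The resulting `R` has the same shape as `x`, and pixels with a zero weight in the toy function receive zero relevance.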

---

### The AOPC\_MoRF Metric

To evaluate fidelity, we use AOPC\_MoRF (Area Over the Perturbation Curve – Most Relevant First).
The core idea is simple:

> If a saliency map is faithful, then removing the pixels it marks as “important” should quickly
> reduce the model’s confidence.

Given an image $x$, its original score $f(x)$, and a ranking of regions from most to least
important, we progressively **delete** (perturb) the top-$k$ regions at each step $k$ and
record the model's confidence $f(x^{(k)})$ on the modified image $x^{(k)}$.
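
One way to obtain such a ranking is sketched below, assuming that regions are non-overlapping square patches and that each patch is scored by its mean relevance. The function name `rank_regions` and the patch size are hypothetical choices, not part of the metric's definition.

```python
import numpy as np

def rank_regions(saliency, patch=2):
    """Return patch indices ordered from most to least relevant.

    The map is split into non-overlapping patch x patch blocks (an assumed
    region definition) and each block is scored by its mean relevance.
    Patch indices follow row-major order over the patch grid.
    """
    h, w = saliency.shape
    if h % patch or w % patch:
        raise ValueError("saliency map must tile evenly into patches")
    blocks = saliency.reshape(h // patch, patch, w // patch, patch)
    region_scores = blocks.mean(axis=(1, 3)).ravel()
    return np.argsort(-region_scores, kind="stable")

saliency = np.array([
    [0.1, 0.1, 0.9, 0.8],
    [0.1, 0.1, 0.7, 0.9],
    [0.4, 0.5, 0.0, 0.0],
    [0.5, 0.4, 0.0, 0.0],
])
order = rank_regions(saliency, patch=2)
print(order.tolist())  # [1, 2, 0, 3]: top-right patch first, bottom-right last
```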

The AOPC\_MoRF score for a single image is:

$$
\text{AOPC}_{\text{MoRF}}(x)
= \frac{1}{K} \sum_{k=1}^{K} \big[ f(x) - f(x^{(k)}) \big],
$$

where $K$ is the number of perturbation steps.  
A higher score indicates that deleting the most relevant regions causes a rapid drop in the class
score, meaning the explanation is more faithful to the model’s behaviour.
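
As a worked instance of the formula, the sketch below computes the score from a list of already-recorded confidences. The function name and the numbers are illustrative only and are not results from this project.

```python
def aopc_morf(original_score, perturbed_scores):
    """Mean drop f(x) - f(x^(k)) over the K recorded perturbation steps."""
    if not perturbed_scores:
        raise ValueError("at least one perturbation step is required")
    drops = [original_score - s for s in perturbed_scores]
    return sum(drops) / len(drops)

# Illustrative values: f(x) = 0.9, then 0.7, 0.4, 0.2 after deleting
# the top-1, top-2 and top-3 regions respectively.
score = aopc_morf(0.9, [0.7, 0.4, 0.2])
print(round(score, 3))  # (0.2 + 0.5 + 0.7) / 3, approximately 0.467
```

A map whose top-ranked regions leave the confidence unchanged gives a score of zero, and a map whose deletions increase the confidence gives a negative score.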

We adopt standard perturbations such as zeroing or mean-value replacement of selected
regions, following reproducible approaches in the literature.
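
The two perturbations can be sketched as follows for a single-channel image, assuming the same non-overlapping square patches indexed in row-major order. The helper name `perturb_regions` is hypothetical, and using the whole-image mean as the fill value is one possible reading of mean-value replacement.

```python
import numpy as np

def perturb_regions(image, region_ids, patch=2, mode="zero"):
    """Return a copy of a 2-D image with the selected patches replaced.

    mode="zero" fills with 0; mode="mean" fills with the mean of the
    original image (an assumed choice of mean value).
    """
    if mode not in ("zero", "mean"):
        raise ValueError("mode must be 'zero' or 'mean'")
    fill = 0.0 if mode == "zero" else float(image.mean())
    out = image.astype(float)  # astype returns a copy, so the input is untouched
    n_cols = image.shape[1] // patch
    for rid in region_ids:
        r, c = divmod(int(rid), n_cols)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = fill
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
zeroed = perturb_regions(image, [1], patch=2, mode="zero")     # top-right patch set to 0
meaned = perturb_regions(image, [1, 2], patch=2, mode="mean")  # two patches set to 7.5
```

For multi-channel inputs the fill value would need to be computed per channel, which this sketch does not cover.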

---

### Why we chose AOPC\_MoRF

Following the analysis in *Sanity Checks for Saliency Metrics* (Tomsett et al., 2020), we select
AOPC\_MoRF as our primary comparison metric for three reasons:

1. **Direct fidelity assessment**  
   It directly tests whether the highlighted regions are genuinely decision-critical for the CNN.

2. **Method comparability**  
   Grad-CAM, SmoothGrad and LRP all produce scalar importance maps, allowing a unified
   ranking and deletion-based evaluation.

3. **Relative robustness in prior research**  
   Among the evaluated metrics, AOPC\_MoRF demonstrated the most stable cross-method
   behaviour, whereas alternatives such as LeRF and single-pixel faithfulness exhibited
   sensitivity and instability.
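
Because every method yields one scalar map per image, a single evaluation routine can be applied to each of them. The sketch below illustrates this with a toy scoring function in place of the CNN and two hand-made maps in place of real explanations. All names (`morf_curve`, `toy_score`, the map labels) are hypothetical, and zero filling with 2x2 patches is an assumed configuration.

```python
import numpy as np

def morf_curve(image, saliency, score_fn, n_steps, patch=2):
    """Scores f(x^(1)), ..., f(x^(K)) after cumulatively zeroing the top-k patches."""
    h, w = saliency.shape
    patch_scores = saliency.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    order = np.argsort(-patch_scores.ravel(), kind="stable")  # most relevant first
    n_cols = w // patch
    x = image.astype(float)  # working copy; the original image is not modified
    curve = []
    for rid in order[:n_steps]:
        r, c = divmod(int(rid), n_cols)
        x[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
        curve.append(score_fn(x))
    return curve

def aopc_morf(image, saliency, score_fn, n_steps, patch=2):
    """Mean drop in score over the perturbation steps."""
    base = score_fn(image)
    curve = morf_curve(image, saliency, score_fn, n_steps, patch)
    return sum(base - s for s in curve) / len(curve)

# Toy stand-in for the CNN (assumption): the score depends only on the top-right patch.
def toy_score(x):
    return float(x[0:2, 2:4].mean())

image = np.ones((4, 4))

faithful = np.zeros((4, 4))
faithful[0:2, 2:4] = 1.0      # highlights the patch the toy model uses

misleading = np.zeros((4, 4))
misleading[2:4, 0:2] = 1.0    # highlights patches the toy model ignores
misleading[2:4, 2:4] = 0.5

maps = {"faithful": faithful, "misleading": misleading}
results = {name: aopc_morf(image, m, toy_score, n_steps=2) for name, m in maps.items()}
print(results)  # {'faithful': 1.0, 'misleading': 0.0}
```

The same loop would accept Grad-CAM, SmoothGrad or LRP maps unchanged, provided they are resized to the input resolution first.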

---

### Limitations and interpretation caution

It is important to emphasise that AOPC\_MoRF is not a universally reliable or perfectly stable
metric. The Sanity Checks paper highlights:

- high variance across images,
- sensitivity to the perturbation strategy,
- moderate inter-method reliability,
- and the risk of evaluating off-manifold images.
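
Given the high variance across images, it can help to report a spread measure alongside each mean score. The sketch below computes the mean and standard error per method, assuming per-image scores are already available. The method labels and numbers are placeholders, not results from this project.

```python
from math import sqrt
from statistics import mean, stdev

def summarise(scores):
    """Return (mean, standard error) of per-image AOPC_MoRF scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two images to estimate the spread")
    return mean(scores), stdev(scores) / sqrt(n)

# Placeholder per-image scores for two hypothetical methods.
per_image = {
    "method_a": [0.2, 0.4, 0.6],
    "method_b": [0.1, 0.1, 0.1],
}
summary = {name: summarise(s) for name, s in per_image.items()}
for name, (m, se) in summary.items():
    print(f"{name}: mean={m:.3f}, standard error={se:.3f}")
```

When the standard errors of two methods overlap substantially, a difference in mean score alone is weak evidence that one explanation is more faithful.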

This metric evaluates fidelity to the model, not correctness relative to ground truth.
A saliency map can score highly while highlighting spurious regions that the model has
learned to rely on. This is an inherent limitation of post-hoc explainability and is not
specific to any one method.

---

### Summary

Given these considerations, AOPC\_MoRF serves as a practical and defensible choice for
comparing our three saliency approaches in the tumour-classification context. It measures how
tightly an explanation aligns with the model’s actual decision process, while acknowledging that
no single metric captures explanation quality in a complete or reliable manner. We therefore use
AOPC\_MoRF as our **primary fidelity metric**, interpreted with transparency about its limitations
and with a focus on relative, not absolute, comparisons.


## Grad-CAM Implementation:

In [None]:
# Grad-CAM implementation

### Results:

## SmoothGrad Implementation:

In [None]:
# SmoothGrad implementation


### Results:

## LRP Implementation:

In [None]:
# LRP implementation

### Results:

## Comparison:

In [None]:
# Report all metrics in one output

## Conclusion: