# HW2 Interpretability Notebook (Complete)

This notebook is a complete, runnable companion to the IEEE report. It covers all required parts of Homework 2:

1. Tabular EDA, preprocessing, MLP training, deterministic evaluation.
2. LIME and SHAP local explanations for three samples, SHAP force plots, and cross-method comparison.
3. NAM modeling and feature-function interpretability.
4. Bonus GRACE-style contrastive perturbation analysis.
5. Vision interpretability with VGG16: Grad-CAM, Guided Backprop, Guided Grad-CAM, SmoothGrad, adversarial FGSM, and feature visualization with TV + random shifts.

All generated artifacts are saved to `report/figures/` and summarized in `report/figures/metrics_summary.json`.


## Reproducibility Notes

- Use the same Python environment as the report build.
- This notebook reuses the project modules in `code/`.
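- Seeds are fixed for `random`, NumPy, and PyTorch (CPU and CUDA) to keep outputs stable; some GPU kernels can still be nondeterministic, so small numerical differences across machines are possible.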

Activate your project virtual environment before running (e.g. `source .venv/bin/activate` from `HomeWorks/HW2/code`, or your preferred venv path).


In [None]:
import json
import random
import sys
from pathlib import Path

import numpy as np
import torch

ROOT = Path.cwd().resolve()
if ROOT.name == "notebooks":
    ROOT = ROOT.parent
# If run from repo root, point to HomeWorks/HW2
CODE_DIR = ROOT / "code"
if not (CODE_DIR / "generate_report_plots.py").exists():
    hw2 = ROOT / "HomeWorks" / "HW2"
    if (hw2 / "code" / "generate_report_plots.py").exists():
        ROOT = hw2
        CODE_DIR = ROOT / "code"

REPORT_DIR = ROOT / "report"
FIG_DIR = REPORT_DIR / "figures"
METRICS_JSON = FIG_DIR / "metrics_summary.json"

if str(CODE_DIR) not in sys.path:
    sys.path.insert(0, str(CODE_DIR))

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

print("Root:", ROOT)
print("Code dir:", CODE_DIR)
print("Figures dir:", FIG_DIR)


## Theoretical Foundation (Tabular)

For binary classification with logit score $f_\theta(x)$, the predicted probability is

$$
P(Y=1\mid x)=\sigma(f_\theta(x)),\quad \sigma(z)=\frac{1}{1+e^{-z}}.
$$

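Training minimizes the empirical binary cross-entropy (BCE) risk, where $\ell$ is the per-sample BCE loss evaluated on the logit: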

$$
\hat\theta=\arg\min_\theta\frac{1}{N}\sum_{i=1}^N\ell\big(y_i,f_\theta(x_i)\big).
$$
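The sigmoid link and the BCE risk above can be checked numerically. The following is a minimal NumPy sketch, not the training code in `code/`; the helper names `sigmoid` and `bce_with_logits` and the toy labels and logits are illustrative assumptions. It evaluates the loss directly from logits, which avoids taking `log(0)` for confident predictions.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])          # safe: z < 0 here, so exp(z) < 1
    out[~pos] = ez / (1.0 + ez)
    return out

def bce_with_logits(y, logits):
    """Mean BCE from logits: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(logits, dtype=float)
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))

# Toy labels and logits (illustrative values only).
y = np.array([1, 0, 1, 0])
logits = np.array([2.0, -1.0, 0.0, 3.0])

probs = sigmoid(logits)
# Textbook form of the same loss, computed from probabilities.
naive = float(-np.mean(y * np.log(probs) + (1 - y) * np.log(1 - probs)))
loss = bce_with_logits(y, logits)
print("BCE from logits:", loss, "| BCE from probabilities:", naive)
```

Both forms agree on well-behaved inputs; the logit form stays finite when probabilities saturate at 0 or 1.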

LIME fits a local weighted surrogate around a point $x$:

$$
\xi(x)=\arg\min_{g\in\mathcal{G}}\mathcal{L}(f,g,\pi_x)+\Omega(g).
$$
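To make the LIME objective concrete, here is a minimal sketch of a LIME-style surrogate in NumPy. It is not the `lime` package and not the pipeline's implementation: Gaussian perturbations, an exponential proximity kernel for $\pi_x$, and a ridge penalty standing in for $\Omega(g)$ are all simplifying assumptions, and `black_box` is a hypothetical model used only for illustration.

```python
import numpy as np

def lime_style_surrogate(predict_fn, x, n_samples=2000, scale=0.5,
                         kernel_width=1.0, ridge=1e-3, seed=0):
    """Fit a proximity-weighted, ridge-regularized linear surrogate around x."""
    rng = np.random.default_rng(seed)
    # Sample perturbed points in a neighborhood of x.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    # Proximity kernel pi_x: points closer to x get larger weights.
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / kernel_width ** 2)
    # Linear surrogate g(z) = b0 + b^T (z - x).
    design = np.hstack([np.ones((n_samples, 1)), Z - x])
    penalty = ridge * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    weighted_design = design * weights[:, None]
    beta = np.linalg.solve(design.T @ weighted_design + penalty,
                           weighted_design.T @ predict_fn(Z))
    return beta[0], beta[1:]

def black_box(Z):
    """Hypothetical model: depends on features 0 and 1, ignores feature 2."""
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 1.0 * Z[:, 1])))

x0 = np.array([0.5, 1.0, -0.3])
intercept, coefs = lime_style_surrogate(black_box, x0)
print("local intercept:", intercept)
print("local coefficients:", coefs)
```

The coefficients approximate the local slope of the black box near `x0`, so the unused third feature should receive a coefficient close to zero.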

SHAP allocates additive contributions:

$$
f(x)\approx \phi_0+\sum_j\phi_j,
$$

with Shapley axioms (efficiency, symmetry, dummy, additivity), giving a principled attribution decomposition.
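For a handful of features, Shapley values can be computed exactly by enumerating coalitions, which makes the efficiency and dummy axioms easy to verify. The sketch below is illustrative only: the toy `model`, the all-zeros baseline, and the "replace absent features with the baseline" value function are assumptions, not the estimator used by the `shap` library in the pipeline.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return phi

def masked_value(model, x, baseline):
    """Value of a coalition: features outside it are set to the baseline."""
    def value_fn(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return model(z)
    return value_fn

def model(z):
    """Toy model with an interaction term; feature 3 is unused (a dummy)."""
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0, 5.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi0 = model(baseline)
phi = exact_shapley(masked_value(model, x, baseline), len(x))

print("phi_0:", phi0)
print("phi:", phi)
print("phi_0 + sum(phi):", phi0 + sum(phi), "| f(x):", model(x))
```

The interaction `z[1] * z[2]` is split equally between features 1 and 2 (symmetry), the unused feature receives zero (dummy), and the attributions sum to $f(x)-\phi_0$ (efficiency).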

NAM enforces structural interpretability through additive subnetworks:

$$
f(x)=\sum_{j=1}^d g_j(x_j),\quad \hat y=\sigma(f(x)).
$$
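The additive structure can be illustrated with a forward pass alone. This is a minimal NumPy sketch with randomly initialized, untrained per-feature networks; the helper names, hidden width, and toy inputs are assumptions, and the project's NAM in `code/` may differ in architecture and training.

```python
import numpy as np

def make_feature_net(rng, hidden=8):
    """Random one-hidden-layer network g_j: R -> R (untrained, illustration only)."""
    return {
        "w1": rng.normal(size=(1, hidden)),
        "b1": rng.normal(size=hidden),
        "w2": rng.normal(size=(hidden, 1)) / hidden,
    }

def feature_net_forward(params, xj):
    """Evaluate g_j on a 1-D array of values of feature j."""
    h = np.maximum(xj[:, None] @ params["w1"] + params["b1"], 0.0)  # ReLU
    return (h @ params["w2"])[:, 0]

def nam_forward(nets, X):
    """Return per-feature contributions g_j(x_j), the logit f(x), and sigma(f(x))."""
    contributions = np.column_stack(
        [feature_net_forward(p, X[:, j]) for j, p in enumerate(nets)]
    )
    logits = contributions.sum(axis=1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return contributions, logits, probs

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
nets = [make_feature_net(rng) for _ in range(X.shape[1])]
contributions, logits, probs = nam_forward(nets, X)

# Editing feature 0 changes only its own contribution column.
X_edit = X.copy()
X_edit[:, 0] += 1.0
contrib_edit, _, _ = nam_forward(nets, X_edit)
print("columns changed by editing feature 0:",
      np.where(~np.isclose(contrib_edit, contributions).all(axis=0))[0])
```

Because each $g_j$ sees only $x_j$, plotting a contribution column against its feature is the model itself rather than a post-hoc approximation.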


## Generate All Artifacts (Tabular + Vision)

This single run produces all report figures and updates `metrics_summary.json`.


In [None]:
from generate_report_plots import (
    METRICS_JSON as PIPELINE_METRICS_JSON,
    _ensure_dirs,
    _set_seed,
    generate_tabular_figures,
    generate_vision_figures,
)

_set_seed(SEED)
_ensure_dirs()

tabular_summary = generate_tabular_figures()
vision_summary = generate_vision_figures()
combined_summary = {"tabular": tabular_summary, "vision": vision_summary}
PIPELINE_METRICS_JSON.write_text(json.dumps(combined_summary, indent=2), encoding="utf-8")

print("Saved:", PIPELINE_METRICS_JSON)


In [None]:
from pprint import pprint

summary = json.loads(METRICS_JSON.read_text(encoding="utf-8"))
print("Top-level keys:", list(summary.keys()))
print("\nTabular metrics:")
pprint(summary["tabular"]["mlp"])
pprint(summary["tabular"]["nam"])
print("\nVision keys:", list(summary["vision"].keys()))


## Tabular EDA Outputs

The following figures cover the assignment EDA requirements: class balance, correlation matrix, pairplot, and outlier dispersion.


In [None]:
from IPython.display import Image as DisplayImage, display

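def show_fig(name, width=900):
    """Display a saved figure from FIG_DIR, or report that it is missing."""
    path = FIG_DIR / name
    if not path.exists():
        print("Missing figure:", path)
        return
    print(path)
    display(DisplayImage(filename=str(path), width=width))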

show_fig("class_distribution.png", width=760)
show_fig("eda_correlation_matrix.png", width=760)
show_fig("eda_pairplot.png", width=900)
show_fig("eda_outlier_boxplots.png", width=900)


### EDA Interpretation

The class distribution shows moderate imbalance (about 34.8% positives), so threshold-aware metrics such as recall and F1 are mandatory next to accuracy. The correlation matrix and pairplot indicate no single dominant collinearity pattern in this dataset, which means model decisions are not trivially reducible to one redundant feature pair; instead, multiple weak-to-moderate signals combine nonlinearly. Outlier plots show limited but nonzero tails (especially for insulin-like dimensions), motivating robust preprocessing and caution when interpreting local linear surrogate coefficients near atypical points.


## Core Predictive Diagnostics

This block covers optimization behavior and thresholded/threshold-free evaluation.


In [None]:
show_fig("training_loss_curves.png", width=900)
show_fig("confusion_matrix_comparison.png", width=760)
show_fig("roc_pr_comparison.png", width=900)
show_fig("calibration_comparison.png", width=760)
show_fig("threshold_sensitivity.png", width=900)
show_fig("permutation_importance_comparison.png", width=800)


### Diagnostic Interpretation

Training curves show stable convergence, with the MLP reaching a lower validation loss than the NAM, consistent with its stronger test metrics. Confusion matrices reveal similar true-negative behavior but slightly better minority-class recovery for the MLP, which drives its higher recall and F1. The MLP's ROC and PR curves dominate the NAM's, confirming better ranking quality, while reliability curves and Brier scores indicate slightly better probability calibration. Threshold sensitivity shows both models improve substantially around $t\approx 0.2$, so the decision threshold is itself part of the operational model and should be chosen deliberately. Permutation importance identifies glucose as the dominant global signal, with smaller marginal reliance on the other features.
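The threshold-sensitivity and Brier-score diagnostics discussed above can be reproduced on any vector of predicted probabilities. The following is a minimal sketch with made-up labels and scores (not the notebook's data); `threshold_report` and `brier_score` are hypothetical helper names.

```python
import numpy as np

def threshold_report(y_true, y_prob, thresholds):
    """Precision, recall, and F1 of the positive class at each threshold."""
    rows = []
    for t in thresholds:
        pred = (y_prob >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_true == 1)))
        fp = int(np.sum((pred == 1) & (y_true == 0)))
        fn = int(np.sum((pred == 0) & (y_true == 1)))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        rows.append({"threshold": float(t), "precision": precision,
                     "recall": recall, "f1": f1})
    return rows

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and binary labels."""
    return float(np.mean((y_prob - y_true) ** 2))

# Made-up labels and probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.15, 0.30, 0.25, 0.45, 0.55, 0.80])

report = threshold_report(y_true, y_prob, thresholds=[0.2, 0.5])
for row in report:
    print(row)
print("Brier score:", brier_score(y_true, y_prob))
```

On this toy data, lowering the threshold from 0.5 to 0.2 recovers the positives that were scored below 0.5 at the cost of extra false positives, which is the trade-off the threshold-sensitivity figure summarizes.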


## Local Explainability: LIME, SHAP, Force Plots, and Correlation Linkage


In [None]:
show_fig("lime_shap_agreement.png", width=760)
show_fig("lime_shap_compare_sample_0.png", width=900)
show_fig("lime_shap_compare_sample_1.png", width=900)
show_fig("lime_shap_compare_sample_2.png", width=900)
show_fig("shap_force_sample_0.png", width=900)
show_fig("shap_force_sample_1.png", width=900)
show_fig("shap_force_sample_2.png", width=900)
show_fig("correlation_vs_shap_importance.png", width=760)


### Local Explanation Interpretation

Agreement diagnostics show strong rank-level consistency between LIME and SHAP across the three analyzed samples, while per-sample bar charts show expected differences in exact magnitudes due to different estimators (local weighted surrogate fit versus Shapley-based additive credit). Force plots validate additive contribution flow from baseline to final prediction for each sample and provide intuitive sign-consistent evidence about why borderline points remain near decision boundaries. The correlation-versus-SHAP comparison confirms that raw correlation with target and learned attribution are related but not equivalent: SHAP reflects model-conditional contribution, not just pairwise data statistics.
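One simple way to quantify rank-level agreement between two attribution vectors is a Spearman correlation on absolute attributions, plus the fraction of features whose signs match. The sketch below uses made-up attribution values and hypothetical helper names; the metric behind `lime_shap_agreement.png` may be defined differently, and ties are not handled here.

```python
import numpy as np

def ranks(values):
    """Ranks 0..n-1 in ascending order (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    out = np.empty(len(values))
    out[order] = np.arange(len(values))
    return out

def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def sign_agreement(a, b):
    """Fraction of features whose attributions share the same sign."""
    return float(np.mean(np.sign(a) == np.sign(b)))

# Made-up attributions for one sample (illustration only).
lime_attr = np.array([0.31, -0.12, 0.05, -0.22])
shap_attr = np.array([0.40, -0.10, 0.02, -0.18])

rho = spearman(np.abs(lime_attr), np.abs(shap_attr))
agree = sign_agreement(lime_attr, shap_attr)
print("rank agreement (Spearman on |attribution|):", rho)
print("sign agreement:", agree)
```

Here the magnitudes differ between the two methods while the ordering and signs coincide, which is the pattern described in the interpretation above.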


## NAM and Bonus GRACE-Style Contrastive Analysis


In [None]:
show_fig("nam_feature_functions.png", width=900)
show_fig("grace_counterfactual_shap_shift.png", width=760)

grace = summary["tabular"].get("grace_counterfactual", {})
print("GRACE summary:")
print(grace)


### NAM + GRACE Interpretation

NAM feature-function plots provide intrinsic interpretability because each subplot is a direct estimate of one additive component $g_j(x_j)$, making local slopes and regime changes directly inspectable without post-hoc approximation. The GRACE-style contrastive plot shows that controlled perturbation of a selected feature produces coherent movement in both predicted probability and SHAP attribution mass, which is exactly the expected behavior for contrastive reasoning: meaningful feature edits should create aligned changes in outputs and explanations.
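The contrastive idea can be sketched on a model where attributions have a closed form. The example below assumes a logistic model with independent features, for which the SHAP value of feature $j$ in logit space is $w_j(x_j-\mathbb{E}[x_j])$; the weights, feature means, and the size of the edit are made up, and this is not the GRACE procedure or the pipeline's implementation.

```python
import numpy as np

# Hypothetical logistic model and feature means (illustration only).
w = np.array([1.2, -0.8, 0.5])
b = -0.2
mu = np.array([0.0, 0.5, 1.0])

def logit(x):
    return float(x @ w + b)

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-logit(x)))

def linear_shap(x):
    """Logit-space SHAP values of a linear model with independent features."""
    return w * (x - mu)

x = np.array([0.1, 0.9, 0.4])

# Contrastive edit: perturb only feature 0.
x_cf = x.copy()
x_cf[0] += 1.0

p, p_cf = predict_proba(x), predict_proba(x_cf)
delta_phi = linear_shap(x_cf) - linear_shap(x)
delta_logit = logit(x_cf) - logit(x)

print("probability before / after:", p, p_cf)
print("attribution shift per feature:", delta_phi)
print("logit shift:", delta_logit)
```

The edit moves the prediction across 0.5, and the whole logit shift appears in the edited feature's attribution, which is the aligned movement of output and explanation that the contrastive plot is meant to show.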


## Vision Theory Summary

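For class score $y^c$ and feature maps $A^k$ of the chosen convolutional layer, with $Z$ the number of spatial positions per map, Grad-CAM uses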

$$
\alpha_k^c = \frac{1}{Z}\sum_i\sum_j\frac{\partial y^c}{\partial A_{ij}^k},\qquad
L_{\text{Grad-CAM}}^c=\mathrm{ReLU}\left(\sum_k\alpha_k^cA^k\right).
$$
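Given a layer's activations and the gradients of the class score with respect to them, the two Grad-CAM formulas reduce to a few array operations. This is a minimal NumPy sketch on hand-made arrays; in the actual pipeline the activations and gradients would come from VGG16 hooks, and the final max-normalization is a common display convention rather than part of the formula.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM from activations A (K, H, W) and gradients dy^c/dA (K, H, W)."""
    # alpha_k^c: global average of the gradients over spatial positions.
    alpha = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps followed by ReLU.
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    # Normalize to [0, 1] for display (assumed convention).
    if cam.max() > 0:
        cam = cam / cam.max()
    return alpha, cam

# Hand-made example: channel 0 fires at the center and supports the class;
# channel 1 fires everywhere and counts against the class.
A = np.zeros((2, 3, 3))
A[0, 1, 1] = 4.0
A[1] = 1.0
G = np.stack([np.full((3, 3), 0.5), np.full((3, 3), -1.0)])

alpha, cam = grad_cam(A, G)
print("alpha:", alpha)
print(cam)
```

Only the center pixel survives the ReLU, because everywhere else the negatively weighted channel dominates.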

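Guided Backprop passes a gradient back through each ReLU only where both the forward activation and the incoming gradient are positive. Guided Grad-CAM fuses localization and fine detail by multiplying the Guided Backprop map element-wise with the upsampled Grad-CAM map. SmoothGrad estimates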

$$
\hat S(x)=\frac{1}{K}\sum_{k=1}^K \nabla_x y^c(x+\epsilon_k),\quad \epsilon_k\sim\mathcal N(0,\sigma^2I),
$$

which reduces high-frequency gradient variance.
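The averaging effect can be demonstrated without a neural network. The sketch below applies the SmoothGrad estimator to a toy function with a rapidly oscillating gradient, $y(x)=\sum_i\sin(\omega x_i)$, for which the expected smoothed gradient has the closed form $\omega\cos(\omega x_i)\,e^{-\omega^2\sigma^2/2}$; the function, $\omega$, $\sigma$, and sample counts are illustrative assumptions.

```python
import numpy as np

FREQ = 20.0  # omega: controls how fast the toy gradient oscillates

def grad_fn(x):
    """Gradient of y(x) = sum(sin(FREQ * x)) with respect to x."""
    return FREQ * np.cos(FREQ * x)

def smoothgrad(grad_fn, x, n_samples, sigma, rng):
    """Average the gradient over Gaussian-perturbed copies of x."""
    total = np.zeros_like(x)
    for _ in range(n_samples):
        total += grad_fn(x + rng.normal(0.0, sigma, size=x.shape))
    return total / n_samples

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
sigma = 0.1

# Closed-form expectation of the smoothed gradient for this toy function.
expected = FREQ * np.cos(FREQ * x) * np.exp(-0.5 * (FREQ * sigma) ** 2)

def rms_error(n_samples):
    estimate = smoothgrad(grad_fn, x, n_samples, sigma, rng)
    return float(np.sqrt(np.mean((estimate - expected) ** 2)))

err_small = rms_error(5)
err_large = rms_error(2000)
print("RMS error with K=5:", err_small)
print("RMS error with K=2000:", err_large)
```

The error of the Monte Carlo estimate shrinks roughly as $1/\sqrt{K}$, which is the variance-reduction behavior the convergence figures track.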

FGSM adversarial perturbation:

$$
x_{adv}=x+\epsilon\,\mathrm{sign}(\nabla_x \mathcal{L}(f(x),y)).
$$
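The FGSM step can be written out for a model whose input gradient is known in closed form. The sketch below uses a logistic model, where the gradient of the BCE loss with respect to the input is $(\sigma(w^\top x+b)-y)\,w$; the weights, input, and $\epsilon$ are made up, and for images the result would normally also be clipped to the valid pixel range.

```python
import numpy as np

# Hypothetical logistic model (illustration only).
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def bce_loss(x, y):
    p = predict_proba(x)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def input_gradient(x, y):
    """Gradient of the BCE loss with respect to the input x."""
    return (predict_proba(x) - y) * w

def fgsm(x, y, eps):
    """One-step FGSM: move each coordinate by eps in the loss-increasing direction."""
    return x + eps * np.sign(input_gradient(x, y))

x = np.array([0.2, -0.4, 1.0])
y = 1
eps = 0.25

x_adv = fgsm(x, y, eps)
p_clean, p_adv = predict_proba(x), predict_proba(x_adv)
print("P(y=1) clean / adversarial:", p_clean, p_adv)
print("loss clean / adversarial:", bce_loss(x, y), bce_loss(x_adv, y))
print("max |x_adv - x|:", np.max(np.abs(x_adv - x)))
```

The perturbation is bounded by $\epsilon$ in the $\ell_\infty$ norm, yet it lowers the probability of the true class and raises the loss.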

Activation maximization for class visualization:

$$
x^*=\arg\max_x\left[\,y^c(x)-\lambda_{TV}\,\mathrm{TV}(x)\,\right],
$$

and random shifts improve translation-consistent structure.
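The two regularizers can be sketched in isolation. The example below assumes the anisotropic ($\ell_1$) form of total variation and implements random shifts as circular rolls with `np.roll`; the pipeline may use a different TV definition or padding-based jitter, and the helper names are hypothetical.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighboring pixels."""
    dh = np.abs(np.diff(img, axis=-2)).sum()
    dw = np.abs(np.diff(img, axis=-1)).sum()
    return float(dh + dw)

def random_shift(img, max_shift, rng):
    """Circularly shift the last two axes by a random offset in [-max_shift, max_shift]."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return np.roll(img, shift=(dy, dx), axis=(-2, -1)), (dy, dx)

rng = np.random.default_rng(0)

flat = np.ones((4, 4))
step = np.zeros((4, 4))
step[:, 2:] = 1.0                      # one vertical edge
noise = rng.normal(size=(32, 32))
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))

print("TV flat:", total_variation(flat))
print("TV step:", total_variation(step))
print("TV noise vs smooth:", total_variation(noise), total_variation(smooth))

shifted, offset = random_shift(step, max_shift=2, rng=rng)
print("shift applied:", offset)
```

In an activation-maximization loop, the image would be shifted before each gradient step and shifted back afterwards, while $\lambda_{TV}\,\mathrm{TV}(x)$ is subtracted from the objective to penalize pixel-level noise.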


## Vision Outputs


In [None]:
show_fig("vgg16_six_image_predictions.png", width=900)
show_fig("gradcam_demo.png", width=600)
show_fig("gradcam_overlay_demo.png", width=900)
show_fig("guided_gradcam_example.png", width=900)
show_fig("smoothgrad_guided_comparison.png", width=900)
show_fig("smoothgrad_guided_backprop.png", width=900)
show_fig("smoothgrad_guided_gradcam.png", width=900)
show_fig("smoothgrad_sample_sweep.png", width=900)
show_fig("smoothgrad_convergence_metrics.png", width=900)
show_fig("adversarial_fgsm_comparison.png", width=980)
show_fig("feature_visualization_hen.png", width=900)


### Vision Interpretation

The six-image panel provides class-diverse probes for explainability stress tests. Grad-CAM and overlay plots confirm coherent class-localized activation concentration. Guided Grad-CAM sharpens details while preserving localization, and SmoothGrad demonstrates the expected variance-reduction behavior as $K$ grows (supported by cosine, entropy, and total variation trends in the summary metrics). FGSM comparison shows prediction and saliency can shift under bounded adversarial perturbation, highlighting robustness concerns in explanation reliability. Feature-visualization results show that TV regularization plus random shifts transform noisy maximization artifacts into more stable, structured class-relevant patterns.


## Requirement Coverage Check (Programmatic)

This quick check verifies that all expected report artifact files exist after notebook execution.


In [None]:
required_figures = [
    "class_distribution.png",
    "eda_correlation_matrix.png",
    "eda_pairplot.png",
    "eda_outlier_boxplots.png",
    "training_loss_curves.png",
    "confusion_matrix_comparison.png",
    "roc_pr_comparison.png",
    "calibration_comparison.png",
    "threshold_sensitivity.png",
    "permutation_importance_comparison.png",
    "lime_shap_agreement.png",
    "lime_shap_compare_sample_0.png",
    "lime_shap_compare_sample_1.png",
    "lime_shap_compare_sample_2.png",
    "shap_force_sample_0.png",
    "shap_force_sample_1.png",
    "shap_force_sample_2.png",
    "correlation_vs_shap_importance.png",
    "grace_counterfactual_shap_shift.png",
    "nam_feature_functions.png",
    "vgg16_six_image_predictions.png",
    "gradcam_demo.png",
    "gradcam_overlay_demo.png",
    "guided_gradcam_example.png",
    "smoothgrad_guided_comparison.png",
    "smoothgrad_guided_backprop.png",
    "smoothgrad_guided_gradcam.png",
    "smoothgrad_sample_sweep.png",
    "smoothgrad_convergence_metrics.png",
    "adversarial_fgsm_comparison.png",
    "feature_visualization_hen.png",
]

missing = [f for f in required_figures if not (FIG_DIR / f).exists()]
print("Total required figures:", len(required_figures))
print("Missing:", len(missing))
if missing:
    print(missing)
else:
    print("All required figures are present.")
print("Metrics summary exists:", METRICS_JSON.exists())


## Final Notes

- This notebook is intentionally aligned with the report pipeline for strict reproducibility.
- For PDF regeneration run:

```bash
cd ../report
make pdf
```

- The appendix-level theory in the report expands the mathematical assumptions for each method (LIME, SHAP, NAM, calibration, FGSM, activation maximization, SmoothGrad convergence).
