# NeuroTrace

This repository contains the implementation of our paper *NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples*. NeuroTrace supports:

  • IPG graph extraction from benign + adversarial/poisoned examples
  • Graph re-use via loading precomputed .pt files (no meta files)
  • GNN training / validation / testing across three experiment modes

## 1) Quickstart (recommended)

From the repository root:

```bash
conda create -n NeuroTrace python=3.10
conda activate NeuroTrace
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install git+https://github.com/fra31/auto-attack
```

Next, download the precomputed graphs from Google Drive (see Section 2).

Then run using the precomputed graphs (`load` policy):

```bash
python main.py attack_on_attack \
  --graphs-policy load \
  --graph-root ./graphs \
  --results-root ./results \
  --attacks FGSM,PGD,APGD-DLR,SQUARE,SPSA,SIA \
  --epochs 30 --batch-size 16
```


If you **don’t** have precomputed graphs, use `--graphs-policy generate` instead.

---


## 2) Dataset (graphs) download + placement

Google Drive:

https://drive.google.com/file/d/1JgT8JfQ4_Eie13W-9gCmai66zm16ce-B/view?usp=sharing


### 2.1 What you download
We distribute **precomputed IPG graphs** as a compressed archive (zip or tar.gz).

### 2.2 Where to extract
Extract the archive into the repository so you get a top-level folder like `./graphs/`:

```bash
unzip graphs.zip -d .
```

Place the `graphs/` folder in the NeuroTrace repository root (the same directory as `main.py`).

### 2.3 Expected folder layout after extraction

Your `--graph-root` should contain one folder per TAG (attack or mixed setting), and inside each tag the four split files:

```
graphs/
  FGSM/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  PGD/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  APGD-DLR/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  SQUARE/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  SPSA/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  SIA/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  ALL_MIXED/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  WHITE2BLACK_MIXED/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt

  BLACK2WHITE_MIXED/
    train/benign.pt
    train/perturbed.pt
    val/benign.pt
    val/perturbed.pt
```
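A quick way to confirm the layout is complete is a short stand-alone check (a hypothetical helper, not part of the repo; the tag and file names are taken from the listing above):

```python
from pathlib import Path

# Tags and per-tag files expected under --graph-root (from the layout above).
TAGS = ["FGSM", "PGD", "APGD-DLR", "SQUARE", "SPSA", "SIA",
        "ALL_MIXED", "WHITE2BLACK_MIXED", "BLACK2WHITE_MIXED"]
EXPECTED = ["train/benign.pt", "train/perturbed.pt",
            "val/benign.pt", "val/perturbed.pt"]

def missing_graphs(graph_root="./graphs"):
    """Return (tag, relative_path) pairs that are absent from graph_root."""
    root = Path(graph_root)
    return [(tag, rel) for tag in TAGS for rel in EXPECTED
            if not (root / tag / rel).is_file()]

if __name__ == "__main__":
    for tag, rel in missing_graphs():
        print(f"missing: {tag}/{rel}")
```

An empty result means the `load` policy should find everything it needs.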

## 3) Graph policy: `generate` vs `load`

### `--graphs-policy load`

  • Will not generate anything
  • Expects all four files to exist for each tag:
    • train/benign.pt
    • train/perturbed.pt
    • val/benign.pt
    • val/perturbed.pt
  • If any file is missing, the run fails with an error.

### `--graphs-policy generate`

  • Builds graphs from the victim model + data loaders
  • Saves graphs into:
    • `<graph-root>/<TAG>/train/*.pt`
    • `<graph-root>/<TAG>/val/*.pt`
  • Then trains/evaluates the GNN.
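The save/load round-trip underlying both policies can be sketched with `torch.save`/`torch.load`; the tensor stand-ins below are an assumption (the real `.pt` files hold IPG graph objects):

```python
import os
import tempfile

import torch

# Hypothetical stand-ins for IPG graphs; the actual serialized objects
# may be PyG Data instances, but the torch round-trip is identical.
dummy_graphs = [torch.zeros(3, 3) for _ in range(4)]

split_dir = os.path.join(tempfile.mkdtemp(), "FGSM", "train")
os.makedirs(split_dir, exist_ok=True)
path = os.path.join(split_dir, "benign.pt")

torch.save(dummy_graphs, path)   # generate: one file written per split
reloaded = torch.load(path)      # load: read the precomputed file back
print(len(reloaded))             # → 4
```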

## 4) Outputs

### 4.1 Results directory layout

Results are stored under:

```
<results-root>/<TAG>/
  <TAG>_metrics.csv
  <TAG>_summary.csv
  <TAG>_model_hgt.pt
  (optional) <TAG>_per_attack_eval.csv
```
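After several runs, the per-tag summary files can be collected with a short glob (a hypothetical helper, not part of the repo; it relies only on the naming scheme above):

```python
from pathlib import Path

def find_summaries(results_root="./results"):
    """Collect every <TAG>_summary.csv under the results root."""
    return sorted(Path(results_root).glob("*/*_summary.csv"))
```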

## 5) The three experiment modes

### Mode 1 — `attack_on_attack`

Train & test separately per attack (loops over attacks).

```bash
python main.py attack_on_attack \
  --graphs-policy load \
  --graph-root ./graphs \
  --results-root ./results \
  --attacks FGSM,PGD,APGD-DLR,SQUARE,SPSA,SIA \
  --epochs 30 --batch-size 16 \
  --seed 0
```

Outputs: per-attack folders under `<results-root>/<ATTACK>/...`.


### Mode 2 — `all_attacks`

Train one model on a mixture of all attacks, then evaluate per attack.

```bash
python main.py all_attacks \
  --graphs-policy load \
  --graph-root ./graphs \
  --results-root ./results \
  --attacks FGSM,PGD,APGD-DLR,SQUARE,SPSA,SIA \
  --mixed-tag ALL_MIXED \
  --epochs 30 --batch-size 16 \
  --seed 0
```

Outputs include:

  • results/ALL_MIXED/ALL_MIXED_metrics.csv
  • results/ALL_MIXED/ALL_MIXED_summary.csv
  • results/ALL_MIXED/ALL_MIXED_model_hgt.pt
  • results/ALL_MIXED/ALL_MIXED_per_attack_eval.csv
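The per-attack evaluation CSV can be inspected with the standard `csv` module. The column names below are illustrative only (the real header comes from `main.py`); substitute the actual `results/ALL_MIXED/ALL_MIXED_per_attack_eval.csv` when available:

```python
import csv
import io

# Illustrative CSV with hypothetical columns standing in for the real file.
sample = "attack,accuracy\nFGSM,0.97\nPGD,0.95\n"

rows = list(csv.DictReader(io.StringIO(sample)))
best = max(rows, key=lambda r: float(r["accuracy"]))
print(best["attack"])  # → FGSM
```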

### Mode 3 — `white_black`

Transfer experiments between white-box and black-box attack groups.

```bash
python main.py white_black \
  --graphs-policy load \
  --graph-root ./graphs \
  --results-root ./results \
  --whitebox-attacks FGSM,PGD,APGD-DLR \
  --blackbox-attacks SQUARE,SPSA,SIA \
  --direction both \
  --epochs 30 --batch-size 16 \
  --seed 0
```

Directions:

  • white2black
  • black2white
  • both

Outputs appear under:

  • <results-root>/WHITE2BLACK_MIXED/...
  • <results-root>/BLACK2WHITE_MIXED/...
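The direction-to-output mapping above can be captured as a small lookup (a sketch; the keys mirror the CLI values and the tags mirror the output folders listed above):

```python
# Map of --direction values to the mixed output tags they produce.
DIRECTION_TAGS = {
    "white2black": ["WHITE2BLACK_MIXED"],
    "black2white": ["BLACK2WHITE_MIXED"],
    "both": ["WHITE2BLACK_MIXED", "BLACK2WHITE_MIXED"],
}

def output_dirs(results_root, direction):
    """Return the result directories a given --direction run writes to."""
    return [f"{results_root}/{tag}" for tag in DIRECTION_TAGS[direction]]
```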

## 6) Hardware / runtime notes

  • Graph extraction is expensive → a GPU is strongly recommended for `--graphs-policy generate`
  • For stable results:
    • Set `--seed`
    • Use a consistent PyTorch + PyG stack
    • Prefer `--graphs-policy load` when using our released graphs

Suggested: 64 GB RAM and a GPU with ~24 GB VRAM.
