PanopTag: Simultaneously Tagging All Jets in a Particle Collision Event

Overview

This is the official repository for PanopTag, a novel deep learning approach for simultaneous jet flavor tagging in high-energy particle physics. PanopTag processes all jets in a collision event jointly using a transformer-based architecture with an EdgeConv+ISAB encoder and a DETR-inspired decoder. The model significantly outperforms existing baselines that process one jet at a time.

Abstract

Jet tagging, the task of identifying the origin of jets produced in particle collisions, is a critical classification task in high-energy physics. Despite the revolutionary impact of deep learning on jet tagging over the past decade, the paradigm has remained unchanged: jets are classified independently, one at a time. This single-jet approach ignores correlations and overlaps between jets, as well as the wider event context. We introduce PanopTag, a new paradigm for jet tagging that departs from traditional single-jet tagging approaches. Rather than classifying jets independently, PanopTag simultaneously tags all jets by employing an encoder-decoder architecture that uses jet kinematics as queries to cross-attend to particle flow object embeddings. We evaluate PanopTag on heavy-flavor $(b/c)$-tagging and demonstrate remarkable performance improvements over state-of-the-art single-jet baselines. These improvements are accessible only by exploiting event-level features and correlations between jets.
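
To make the "jet kinematics as queries" idea concrete, here is a minimal NumPy sketch of masked, single-head cross-attention. It is not the PanopTag decoder: the function name `cross_attend` is hypothetical, the learned query/key/value projections and multi-head structure are omitted, and it assumes each event has at least one unmasked particle flow object.

```python
import numpy as np

def cross_attend(jet_queries, pfc_embeddings, pfc_mask):
    """Single-head scaled dot-product cross-attention (illustrative only).

    jet_queries:    [n_jets, d]  one query vector per jet (assumed already embedded)
    pfc_embeddings: [n_pfcs, d]  one embedding per particle flow object
    pfc_mask:       [n_pfcs]     True for real objects, False for padding
    """
    d = jet_queries.shape[-1]
    # Similarity between every jet query and every particle embedding.
    scores = jet_queries @ pfc_embeddings.T / np.sqrt(d)      # [n_jets, n_pfcs]
    # Padded particles get -inf so they receive zero attention weight.
    scores = np.where(pfc_mask[None, :], scores, -np.inf)
    # Numerically stable softmax over the particle axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each jet's output is a weighted sum of particle embeddings.
    return weights @ pfc_embeddings, weights
```

Because every jet query attends over the same set of particle embeddings, all jets in the event are updated in one pass and each one sees the full event context rather than only its own constituents.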

Key Highlights

Usage

Prepare your data (npz format with PFC and jet information):

# Data should contain:
# - pfcs_list: [n_events] array of PFC features [n_pfcs, 14]
# - pfcs_mask_list: [n_events] array of PFC masks [n_pfcs]
# - jets_list: [n_events] array of jet kinematics [n_jets, 4]
# - jets_mask_list: [n_events] array of jet masks [n_jets]
# - jets_label: [n_events] array of jet labels [n_jets, 3] (one-hot)
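
As a rough illustration of the layout above, the following sketch builds a toy dataset with random values. Several things here are assumptions rather than facts about this repository: the helper name `make_toy_events`, the use of NumPy object arrays holding one variable-length array per event (the data loader may instead expect padded arrays), boolean masks, and the content and order of the 14 PFC features and 3 label classes.

```python
import numpy as np

def make_toy_events(n_events=4, n_pfc_features=14, n_classes=3, seed=0):
    """Build random events using the keys listed above (placeholder values only)."""
    rng = np.random.default_rng(seed)
    keys = ("pfcs_list", "pfcs_mask_list", "jets_list", "jets_mask_list", "jets_label")
    # Assumption: one variable-length array per event, stored in object arrays.
    data = {k: np.empty(n_events, dtype=object) for k in keys}
    for i in range(n_events):
        n_pfcs = int(rng.integers(5, 20))
        n_jets = int(rng.integers(1, 5))
        data["pfcs_list"][i] = rng.normal(size=(n_pfcs, n_pfc_features)).astype(np.float32)
        data["pfcs_mask_list"][i] = np.ones(n_pfcs, dtype=bool)   # True = real PFC
        data["jets_list"][i] = rng.normal(size=(n_jets, 4)).astype(np.float32)
        data["jets_mask_list"][i] = np.ones(n_jets, dtype=bool)   # True = real jet
        classes = rng.integers(0, n_classes, size=n_jets)
        # One-hot encode the class index of each jet.
        data["jets_label"][i] = np.eye(n_classes, dtype=np.float32)[classes]
    return data
```

Such a dictionary can be written with `np.savez("toy_data.npz", **make_toy_events())`. Object arrays are pickled on save, so reading the file back requires `np.load("toy_data.npz", allow_pickle=True)`.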

Train the model:

python main.py \
  --data_npz path/to/data.npz \
  --epochs 31 \
  --batch_size 256 \
  --lr 2e-4 \
  --output_dir ./panoptag_model

Model evaluation. The training script automatically:
- Saves the best model based on validation accuracy
- Computes test set performance metrics
- Saves predictions and attention weights
- Outputs test results to {output_dir}/test_results.npz

Test results include:
- logits_list: Raw model outputs [n_events, n_jets, 3]
- probabilities_list: Softmax probabilities [n_events, n_jets, 3]
- predictions_list: Argmax predictions [n_events, n_jets]
- true_labels_list: Ground truth labels [n_events, n_jets, 3]
- attention_list: Attention weights for interpretability
- Full PFC and jet information for analysis
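
The saved arrays can be post-processed with a few lines of NumPy. The sketch below computes per-jet accuracy from `predictions_list` and `true_labels_list`. The helper name `jet_accuracy` is hypothetical, and the optional per-event jet mask is an assumption: the exact key under which masks are stored in test_results.npz is not documented here, so pass one only if your file contains it.

```python
import numpy as np

def jet_accuracy(predictions_list, true_labels_list, jets_mask_list=None):
    """Fraction of (unmasked) jets whose argmax prediction matches the one-hot label."""
    correct, total = 0, 0
    for i in range(len(predictions_list)):
        pred = np.asarray(predictions_list[i])                   # [n_jets]
        true = np.asarray(true_labels_list[i]).argmax(axis=-1)   # one-hot -> class index
        if jets_mask_list is not None:
            mask = np.asarray(jets_mask_list[i], dtype=bool)     # True = real jet
        else:
            mask = np.ones(pred.shape, dtype=bool)               # assume no padding
        correct += int(((pred == true) & mask).sum())
        total += int(mask.sum())
    return correct / max(total, 1)
```

With the default output directory, usage would look like `results = np.load("panoptag_model/test_results.npz", allow_pickle=True)` followed by `jet_accuracy(results["predictions_list"], results["true_labels_list"])`. If the arrays are padded to a fixed number of jets, supply the jet mask so that padding does not count toward the accuracy.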

Command Line Arguments

--data_npz              Path(s) to .npz training data (required)
--train_split           Training data fraction (default: 0.8)
--val_split             Validation data fraction (default: 0.1)
--test_split            Test data fraction (default: 0.1)
--seed                  Random seed (default: 42)
--epochs                Number of epochs (default: 31)
--batch_size            Batch size (default: 256)
--lr                    Learning rate (default: 2e-4)
--wd                    Weight decay (default: 1e-3)
--dim_hidden            Hidden dimension (default: 256)
--num_heads             Number of attention heads (default: 32)
--num_inds              Number of inducing points (default: 48)
--enc_depth             Encoder depth (default: 3)
--dec_depth             Decoder depth (default: 3)
--num_workers           DataLoader workers (default: 2)
--device                Device (default: cuda if available, else cpu)
--output_dir            Output directory for models (default: ./panoptag_model)
--num_local             EdgeConv layers for local structure (default: 2)
--k                     Number of neighbors for k-NN (default: 20)
--warmup_epochs         Linear warmup epochs (default: 1)
--restart_interval      Cosine annealing restart interval (default: 16)
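
To show how `--lr`, `--warmup_epochs`, and `--restart_interval` might interact, here is one common reading of "linear warmup followed by cosine annealing with restarts". It is a sketch under assumptions, not the schedule implemented in train.py: the function name `lr_at_epoch`, the per-epoch (rather than per-step) granularity, and the `min_lr` floor of zero are all hypothetical.

```python
import math

def lr_at_epoch(epoch, base_lr=2e-4, warmup_epochs=1, restart_interval=16, min_lr=0.0):
    """Learning rate for a 0-indexed epoch under warmup + cosine restarts (assumed form)."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Position within the current cosine cycle; wraps to 0 at every restart.
    t = (epoch - warmup_epochs) % restart_interval
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / restart_interval))
    return min_lr + (base_lr - min_lr) * cosine

# Schedule for the default 31 epochs.
schedule = [lr_at_epoch(e) for e in range(31)]
```

Under these assumptions the rate starts each cycle at the base value, decays along a half cosine, and jumps back to the base value every `restart_interval` epochs after warmup.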

Dependencies

pytorch, torchvision
numpy
tqdm
pandas

TODO

Add dataset creation code
Add Pythia + Delphes simulation code
Add plotting macros
