
(Official) PanopTag: Simultaneously Tagging All Jets in a Particle Collision Event


umarsqureshi/PanopTag


PanopTag: Simultaneously Tagging All Jets in a Particle Collision Event

[Figure: PanopTag architecture]

Overview

This is the official repository for PanopTag, a novel deep learning approach for simultaneous jet flavor tagging in high-energy particle physics. PanopTag processes all jets in a collision event jointly, using a transformer-based architecture with an EdgeConv+ISAB encoder and a DETR-inspired decoder. The model significantly outperforms existing baselines that process one jet at a time.
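The ISAB (Induced Set Attention Block, from the Set Transformer) part of the encoder routes attention through a small set of inducing points, so its cost grows with `n_pfcs × num_inds` rather than `n_pfcs²`. Below is a minimal single-head NumPy sketch, assuming the standard Set Transformer form ISAB(X) = MAB(X, MAB(I, X)), with query/key/value projections, LayerNorm, and feed-forward sublayers omitted and random arrays standing in for learned inducing points. It illustrates the idea only and is not the repository's implementation.

```python
import numpy as np

def masked_attention(q, k, v, key_mask):
    """Single-head scaled dot-product attention.

    q: [n_q, d], k/v: [n_k, d], key_mask: [n_k] bool (True = real element).
    Padded keys get -inf scores and therefore zero attention weight.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])            # [n_q, n_k]
    scores = np.where(key_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def mab(x, y, y_mask):
    """Simplified multihead attention block: residual + attention only
    (the Set Transformer's LayerNorm and feed-forward sublayers are omitted)."""
    return x + masked_attention(x, y, y, y_mask)

def isab(x, x_mask, inducing):
    """ISAB(X) = MAB(X, MAB(I, X)): the m inducing points summarize the set,
    then every element attends to that m-element summary."""
    h = mab(inducing, x, x_mask)                       # [m, d]
    h_mask = np.ones(len(inducing), dtype=bool)        # inducing points are never padded
    return mab(x, h, h_mask)                           # [n, d]; padded rows are ignored downstream

rng = np.random.default_rng(0)
n_pfcs, dim_hidden, num_inds = 50, 16, 8   # toy sizes; README defaults are dim_hidden=256, num_inds=48
pfcs = rng.normal(size=(n_pfcs, dim_hidden))
mask = np.arange(n_pfcs) < 37              # last 13 rows are padding
inducing = rng.normal(size=(num_inds, dim_hidden))  # learned parameters in a real model
out = isab(pfcs, mask, inducing)
print(out.shape)  # (50, 16)
```

Because padded PFCs are masked out when the inducing points attend to the set, changing the padded rows leaves the outputs for real PFCs unchanged.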

Abstract

Jet tagging, identifying the origin of jets produced in particle collisions, is a critical classification task in high-energy physics. Despite the revolutionary impact of deep learning on jet tagging over the past decade, the paradigm has remained unchanged: jets are classified independently, one at a time. This single-jet approach ignores correlations, overlaps, and wider event context between jets. We introduce PanopTag, a new paradigm for jet tagging that departs from traditional single-jet tagging approaches. Rather than classifying jets independently, PanopTag simultaneously tags all jets by employing an encoder-decoder architecture that uses jet kinematics as queries to cross-attend to particle flow object embeddings. We evaluate PanopTag on heavy-flavor $(b/c)$-tagging and demonstrate remarkable performance improvements over state-of-the-art single-jet baselines, improvements that are only accessible by exploiting event-level features and correlations between jets.
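The core mechanism described above, jet kinematics acting as queries that cross-attend to particle flow object embeddings, can be sketched as one masked cross-attention step. This NumPy sketch assumes 4 kinematic inputs per jet (matching `jets_list`), a linear embedding of them into the hidden dimension, a single head with random weights, and no self-attention or feed-forward sublayers; the function and parameter names are hypothetical. The actual DETR-inspired decoder stacks `dec_depth` layers with learned parameters.

```python
import numpy as np

def softmax_masked(scores, key_mask):
    # Padded PFCs (key_mask == False) receive exactly zero attention weight.
    scores = np.where(key_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def jet_cross_attention(jets, pfc_emb, pfc_mask, w_jet, w_q, w_k, w_v, w_out):
    """One decoder-style step: every jet query attends over all PFC embeddings in the event.

    jets:    [n_jets, 4]  jet kinematics
    pfc_emb: [n_pfcs, d]  encoder output
    Returns per-jet class logits [n_jets, n_classes] and attention weights [n_jets, n_pfcs].
    """
    queries = jets @ w_jet                            # embed kinematics -> [n_jets, d]
    q, k, v = queries @ w_q, pfc_emb @ w_k, pfc_emb @ w_v
    attn = softmax_masked(q @ k.T / np.sqrt(q.shape[-1]), pfc_mask)
    updated = queries + attn @ v                      # residual update of each jet query
    return updated @ w_out, attn

rng = np.random.default_rng(1)
d, n_jets, n_pfcs, n_classes = 16, 5, 40, 3
params = [rng.normal(scale=0.1, size=s)
          for s in [(4, d), (d, d), (d, d), (d, d), (d, n_classes)]]
jets = rng.normal(size=(n_jets, 4))
pfc_emb = rng.normal(size=(n_pfcs, d))
pfc_mask = np.arange(n_pfcs) < 30                     # last 10 PFCs are padding
logits, attn = jet_cross_attention(jets, pfc_emb, pfc_mask, *params)
print(logits.shape, attn.shape)  # (5, 3) (5, 40)
```

Since every jet query attends over the same event-wide set of PFC embeddings, each jet's output can depend on particles outside its own cone, which is the event-level context that single-jet taggers do not see.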

Key Highlights

[Figures: b-jet tagging performance; c-jet tagging performance]

Performance Summary

Usage

  1. Prepare your data (npz format with PFC and jet information):

    # Data should contain:
    # - pfcs_list: [n_events] array of PFC features [n_pfcs, 14]
    # - pfcs_mask_list: [n_events] array of PFC masks [n_pfcs]
    # - jets_list: [n_events] array of jet kinematics [n_jets, 4]
    # - jets_mask_list: [n_events] array of jet masks [n_jets]
    # - jets_label: [n_events] array of jet labels [n_jets, 3] (one-hot)
  2. Train the model:

    python main.py \
      --data_npz path/to/data.npz \
      --epochs 31 \
      --batch_size 256 \
      --lr 2e-4 \
      --output_dir ./panoptag_model
  3. Model evaluation: The training script automatically:

    • Saves the best model based on validation accuracy
    • Computes test-set performance metrics
    • Saves predictions and attention weights
    • Outputs test results to {output_dir}/test_results.npz
  4. Test results include:

    • logits_list: Raw model outputs [n_events, n_jets, 3]
    • probabilities_list: Softmax probabilities [n_events, n_jets, 3]
    • predictions_list: Argmax predictions [n_events, n_jets]
    • true_labels_list: Ground truth labels [n_events, n_jets, 3]
    • attention_list: Attention weights for interpretability
    • Full PFC and jet information for analysis
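The data layout in step 1 can be illustrated with a small synthetic file. This sketch assumes each event is zero-padded to fixed sizes (hypothetical `max_pfcs=64`, `max_jets=8`) and stacked into regular arrays, which is what the accompanying masks suggest; the meaning of the 14 PFC features and the 4 jet kinematic variables, and the order of the 3 label classes, are not specified here, so random values stand in for them. The helper `make_toy_event` is hypothetical.

```python
import numpy as np

def make_toy_event(rng, n_pfcs, n_jets, max_pfcs=64, max_jets=8, n_pfc_features=14):
    """Build one zero-padded event in the step-1 layout (assumed padding scheme)."""
    pfcs = np.zeros((max_pfcs, n_pfc_features), dtype=np.float32)
    pfcs[:n_pfcs] = rng.normal(size=(n_pfcs, n_pfc_features))
    pfcs_mask = np.arange(max_pfcs) < n_pfcs          # True = real PFC
    jets = np.zeros((max_jets, 4), dtype=np.float32)
    jets[:n_jets] = rng.normal(size=(n_jets, 4))
    jets_mask = np.arange(max_jets) < n_jets          # True = real jet
    labels = np.zeros((max_jets, 3), dtype=np.float32)
    labels[np.arange(n_jets), rng.integers(0, 3, size=n_jets)] = 1.0  # one-hot; padded rows stay zero
    return pfcs, pfcs_mask, jets, jets_mask, labels

rng = np.random.default_rng(0)
events = [make_toy_event(rng, rng.integers(10, 64), rng.integers(1, 8)) for _ in range(100)]
keys = ["pfcs_list", "pfcs_mask_list", "jets_list", "jets_mask_list", "jets_label"]
arrays = {key: np.stack([ev[i] for ev in events]) for i, key in enumerate(keys)}
np.savez("toy_data.npz", **arrays)

with np.load("toy_data.npz") as data:
    for key in keys:
        print(key, data[key].shape)
```

The resulting `toy_data.npz` can be passed via `--data_npz` for a smoke test, provided the loader in `main.py` accepts padded arrays rather than ragged per-event object arrays; check the loader before relying on this layout.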
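When analyzing `test_results.npz`, padded jet slots have to be excluded before computing metrics. A minimal sketch of a masked accuracy and a per-class efficiency on arrays shaped like `predictions_list` [n_events, n_jets] and `true_labels_list` [n_events, n_jets, 3]; the helper names are hypothetical, and a boolean jet mask is assumed to be available (for example from the saved jet information, whose key names are not listed above, so inspect `data.files` first). Which class index corresponds to b, c, or light jets is also an assumption.

```python
import numpy as np

def masked_accuracy(predictions, true_onehot, jet_mask):
    """Fraction of real (unpadded) jets whose argmax prediction matches the one-hot label."""
    true_class = true_onehot.argmax(axis=-1)          # [n_events, n_jets]
    correct = (predictions == true_class) & jet_mask
    return correct.sum() / jet_mask.sum()

def per_class_efficiency(predictions, true_onehot, jet_mask, n_classes=3):
    """For each true class c: fraction of real class-c jets that are predicted as c."""
    true_class = true_onehot.argmax(axis=-1)
    eff = []
    for c in range(n_classes):
        sel = jet_mask & (true_class == c)
        eff.append((predictions[sel] == c).mean() if sel.any() else float("nan"))
    return np.array(eff)

# With real output (path follows --output_dir; the mask key name is an assumption):
# data = np.load("panoptag_model/test_results.npz", allow_pickle=True)
# print(data.files)

# Toy check: 2 events, 3 jet slots each, last slot of event 2 is padding.
preds = np.array([[0, 1, 2], [2, 2, 0]])
labels = np.eye(3)[[[0, 1, 1], [2, 0, 0]]]           # one-hot, shape [2, 3, 3]
mask = np.array([[True, True, True], [True, True, False]])
print(masked_accuracy(preds, labels, mask))  # 0.6
print(per_class_efficiency(preds, labels, mask))
```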

Command Line Arguments

--data_npz              Path(s) to .npz training data (required)
--train_split           Training data fraction (default: 0.8)
--val_split             Validation data fraction (default: 0.1)
--test_split            Test data fraction (default: 0.1)
--seed                  Random seed (default: 42)
--epochs                Number of epochs (default: 31)
--batch_size            Batch size (default: 256)
--lr                    Learning rate (default: 2e-4)
--wd                    Weight decay (default: 1e-3)
--dim_hidden            Hidden dimension (default: 256)
--num_heads             Number of attention heads (default: 32)
--num_inds              Number of inducing points (default: 48)
--enc_depth             Encoder depth (default: 3)
--dec_depth             Decoder depth (default: 3)
--num_workers           DataLoader workers (default: 2)
--device                Device (default: cuda if available, else cpu)
--output_dir            Output directory for models (default: ./panoptag_model)
--num_local             EdgeConv layers for local structure (default: 2)
--k                     Number of neighbors for k-NN (default: 20)
--warmup_epochs         Linear warmup epochs (default: 1)
--restart_interval      Cosine annealing restart interval (default: 16)
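`--warmup_epochs` and `--restart_interval` describe a linear warmup followed by cosine annealing with warm restarts. A pure-Python sketch of such a schedule, assuming the restart period is counted from the end of warmup, a minimum learning rate of 0, a fixed period with no multiplier, and a fractional epoch argument so it can be evaluated per optimizer step; these are assumptions, not the repository's exact settings, and `lr_at_epoch` is a hypothetical name.

```python
import math

def lr_at_epoch(epoch, base_lr=2e-4, warmup_epochs=1, restart_interval=16, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch: linear warmup from 0 to base_lr,
    then cosine annealing from base_lr to min_lr that restarts every restart_interval epochs."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    t = (epoch - warmup_epochs) % restart_interval    # position within the current cycle
    cos = 0.5 * (1 + math.cos(math.pi * t / restart_interval))
    return min_lr + (base_lr - min_lr) * cos

for epoch in [0, 0.5, 1, 9, 16.9, 17, 30]:
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

In PyTorch, a comparable schedule can be assembled from `LinearLR`, `CosineAnnealingWarmRestarts`, and `SequentialLR` in `torch.optim.lr_scheduler`; whether `main.py` does exactly this is not stated here.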
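`--num_local` and `--k` refer to EdgeConv layers over a k-nearest-neighbor graph of PFCs. As a sketch of the standard EdgeConv step from DGCNN, h_i = max_j ReLU(W [x_i, x_j - x_i]) over the k neighbors j of i, assuming neighbors are found in the current feature space, padded PFCs are never chosen as neighbors, and a single random linear layer stands in for the learned edge MLP. The helper names are hypothetical and this is not the repository's code.

```python
import numpy as np

def knn_indices(x, mask, k):
    """k nearest real neighbors (excluding self) for every point.

    x: [n, d] features, mask: [n] bool. Assumes k < n.
    Returns neighbor indices [n, k] and a validity flag [n, k]
    (False when fewer than k real neighbors exist).
    """
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # pairwise squared distances [n, n]
    d2[:, ~mask] = np.inf                                  # padded points are never neighbors
    np.fill_diagonal(d2, np.inf)                           # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :k]
    valid = np.isfinite(np.take_along_axis(d2, idx, axis=1))
    return idx, valid

def edge_conv(x, mask, k, weight):
    """EdgeConv with edge features [x_i, x_j - x_i], ReLU, and max aggregation."""
    idx, valid = knn_indices(x, mask, k)
    xi = np.repeat(x[:, None, :], k, axis=1)               # [n, k, d]
    xj = x[idx]                                            # [n, k, d]
    edges = np.concatenate([xi, xj - xi], axis=-1)         # [n, k, 2d]
    h = np.maximum(edges @ weight, 0.0)                    # [n, k, d_out], all >= 0
    h = np.where(valid[..., None], h, 0.0)                 # invalid edges cannot raise the max
    out = h.max(axis=1)                                    # aggregate over neighbors
    return np.where(mask[:, None], out, 0.0)               # zero the padded rows

rng = np.random.default_rng(0)
n, d, k = 30, 14, 20            # k=20 is the README default; 14 PFC features per the data format
x = rng.normal(size=(n, d))
mask = np.arange(n) < 25        # last 5 PFCs are padding
weight = rng.normal(scale=0.1, size=(2 * d, 32))
out = edge_conv(x, mask, k, weight)
print(out.shape)  # (30, 32)
```

Stacking `num_local` such layers, with the graph rebuilt from each layer's output, is how DGCNN-style models capture local structure; the number of layers and their widths in PanopTag are set by the arguments above.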

Dependencies

  • PyTorch, torchvision
  • NumPy
  • tqdm
  • pandas

TODO

  • Add dataset creation code
  • Add Pythia + Delphes simulation code
  • Add plotting macros
