LHCb RUN-II topological trigger upgrade
- For details, see the paper
For RUN-II, a new trigger scheme is applied:
- HLT1 track
- HLT1 2-body
- HLT2 n-body
- Sample: one proton-proton bunch collision, called an event (40 MHz rate)
- An event consists of tracks or secondary vertices (SVRs), the points where particles are produced
- Features: physical characteristics of an SVR, of tracks, and of their decay products, reconstructed from the detectors (momentum, mass, angles, impact parameter)
- Training data are the set of SVRs (for HLT2 n-body and HLT1 2-body) or tracks (for HLT1 track) from all events
- Monte Carlo 2015 data (signal data) were simulated for various types of interesting events (different decays):
- all decays are used in HLT1 2-body and HLT1 track training
- six decay types are used for HLT2 n-body training, and all decay types for testing
- Minimum bias data (real data collected over a short period of time) are used as background data
- An event is interesting from a physics point of view if it contains at least one SVR in which the searched-for decay occurs
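
Because an event counts as signal when at least one of its SVRs contains the searched-for decay, SVR-level classifier outputs have to be turned into one decision per event. A minimal sketch, assuming the event score is the maximum score over its SVRs (the `event_id` values, scores, and the 0.5 threshold are illustrative, not taken from the repository):

```python
from collections import defaultdict


def event_scores(svr_rows):
    """Map event_id -> maximum classifier score over the event's SVRs.

    svr_rows is an iterable of (event_id, svr_score) pairs; this layout
    is an assumption made for the example.
    """
    best = defaultdict(lambda: float("-inf"))
    for event_id, score in svr_rows:
        best[event_id] = max(best[event_id], score)
    return dict(best)


# Hypothetical SVR-level predictions for three events.
rows = [(1, 0.10), (1, 0.85), (2, 0.20), (3, 0.05), (3, 0.40)]
scores = event_scores(rows)

# An event passes if at least one of its SVRs scores above the threshold.
passed = {event_id: score > 0.5 for event_id, score in scores.items()}
```

Here only event 1 passes, since it is the only event with an SVR scoring above 0.5.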
- SumPT (sumpt): sum of transverse momenta (pt) for all tracks in the SVR;
- MCOR (mcor): "corrected" mass of the SVR;
- IPChi2 (ipchi2): impact parameter chi2 of the SVR;
- MinPT (minpt): minimum track pt in the SVR;
- FDChi2 (fdchi2): flight distance chi2 of the SVR from the p-p collision;
- NIPChi2LT16 (nlt16): number of tracks in the primary vertex with IPChi2 < 16;
- N (n): number of tracks in the SVR;
- NHLT1 (n1trk): number of tracks passing HLT1 (high level trigger first stage);
- VChi2 (chi2): vertex chi2 of the SVR;
- Eta (eta): pseudorapidity;
- PT (pt): transverse momentum;
- M (m): mass of the SVR;
- MinFDR (fdr): min radial (x-y plane) flight distance to any p-p collision;
- SumIPchi2 (sumipchi2): sum of IPchi2 for all tracks in the SVR;
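
Several of the features above are simple aggregates over the tracks of one SVR. A minimal sketch of how `n`, `sumpt`, `minpt` and `sumipchi2` could be computed, assuming each track is a dict with hypothetical keys `pt` and `ipchi2` (the real preprocessing reads these quantities from the ROOT files and may differ):

```python
def svr_features(tracks):
    """Aggregate per-track quantities into SVR-level features.

    tracks: list of dicts with hypothetical keys 'pt' and 'ipchi2'.
    """
    if not tracks:
        raise ValueError("an SVR needs at least one track")
    pts = [track["pt"] for track in tracks]
    return {
        "n": len(tracks),                                   # N
        "sumpt": sum(pts),                                  # SumPT
        "minpt": min(pts),                                  # MinPT
        "sumipchi2": sum(t["ipchi2"] for t in tracks),      # SumIPchi2
    }


# Illustrative values only.
example = svr_features([
    {"pt": 1500.0, "ipchi2": 30.0},
    {"pt": 700.0, "ipchi2": 12.0},
    {"pt": 300.0, "ipchi2": 50.0},
])
```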
- The output rate is fixed, so the event-level false positive rate (FPR) is fixed
- The goal is to improve efficiency for each type of signal event
- We therefore improve the event-level true positive rate (TPR) at fixed FPR
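
The metric above can be sketched in two steps: pick the score threshold that keeps the background acceptance at the target FPR, then measure the fraction of signal events above that threshold. This is an illustrative sketch with made-up scores; the function names are hypothetical and not part of the repository:

```python
def threshold_for_fpr(background_scores, target_fpr):
    """Return a threshold t such that at most target_fpr of the
    background events have score > t."""
    if not background_scores:
        raise ValueError("need at least one background event")
    ordered = sorted(background_scores, reverse=True)
    n_allowed = int(target_fpr * len(ordered))  # background events we may accept
    if n_allowed >= len(ordered):
        return float("-inf")  # everything may pass
    # Events must score strictly above the (n_allowed + 1)-th highest
    # background score, so at most n_allowed background events pass.
    return ordered[n_allowed]


def rate_above(scores, threshold):
    """Fraction of events with score strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Illustrative event-level scores.
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
signal = [0.85, 0.9, 0.5, 0.99]

threshold = threshold_for_fpr(background, target_fpr=0.2)
fpr = rate_above(background, threshold)
tpr = rate_above(signal, threshold)
```

Comparing two models then means comparing their `tpr` values, computed separately for each signal decay type, at the same `target_fpr`.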
There are two ways to speed up prediction in production:
- Bonsai boosted decision tree format (BBDT)
- Hash features into bins before training
- Convert the decision trees to an n-dimensional lookup table
- Table size is limited by RAM, so the number of bins per feature must be small
- Discretization reduces quality
- Post-pruning (MatrixNet includes several thousand trees)
- Train MatrixNet with several thousand trees
- Reduce the ensemble to about a hundred trees
- Greedily choose trees in a sequence from the initial ensemble to minimize a modified loss function (exploss for background and logloss for signal)
- Update the values in the leaves (tree structure is preserved)
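
The BBDT idea described above (bin the features, then replace the trees with a precomputed table) can be illustrated as follows. This is a toy sketch, not the repository's `.bbdt` format: the bin edges, the two features used, and the stand-in ensemble `toy_ensemble` are all assumptions made for the example.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical bin edges; k edges define k + 1 bins per feature.
BIN_EDGES = {
    "sumpt": [2000.0, 4000.0],  # 3 bins
    "ipchi2": [16.0],           # 2 bins
}
FEATURES = list(BIN_EDGES)


def bin_index(feature, value):
    """Index of the bin that contains value."""
    return bisect_right(BIN_EDGES[feature], value)


def toy_ensemble(binned):
    """Stand-in for a tree ensemble trained on bin indices."""
    score = 0.0
    score += 0.5 if binned["sumpt"] >= 1 else -0.5   # tree 1
    score += 0.3 if binned["ipchi2"] >= 1 else -0.3  # tree 2
    score += 0.2 if binned["sumpt"] >= 2 else 0.0    # tree 3
    return score


def build_lookup_table():
    """Evaluate the ensemble once for every combination of bins."""
    sizes = [len(BIN_EDGES[f]) + 1 for f in FEATURES]
    return {
        cell: toy_ensemble(dict(zip(FEATURES, cell)))
        for cell in product(*(range(size) for size in sizes))
    }


TABLE = build_lookup_table()


def predict(svr):
    """Production-time prediction: bin the features, then one lookup."""
    cell = tuple(bin_index(f, svr[f]) for f in FEATURES)
    return TABLE[cell]
```

The table has one entry per bin combination (here 3 x 2 = 6), which is why the number of bins per feature has to stay small: the size grows as the product of the bin counts over all features.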
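
The greedy selection step of post-pruning can be sketched as below. This is a simplified illustration under several assumptions: each tree is represented only by its precomputed outputs on the training samples, trees are chosen without replacement, the modified loss is taken as exploss `exp(F)` on background and logloss `log(1 + exp(-F))` on signal, and the leaf-value update is omitted. It does not use the MatrixNet API.

```python
import math


def mixed_loss(scores, labels):
    """Mean loss: logloss for signal (label 1), exploss for background (label 0)."""
    total = 0.0
    for score, label in zip(scores, labels):
        if label == 1:
            total += math.log1p(math.exp(-score))
        else:
            total += math.exp(score)
    return total / len(labels)


def greedy_prune(tree_outputs, labels, n_keep):
    """Pick n_keep trees one at a time, each time adding the tree that
    gives the lowest mixed loss for the running sum of outputs.

    tree_outputs[t][i] is the output of tree t on sample i.
    Returns (indices of chosen trees, final loss).
    """
    current = [0.0] * len(labels)
    remaining = set(range(len(tree_outputs)))
    chosen = []
    for _ in range(min(n_keep, len(tree_outputs))):
        best_tree, best_loss, best_scores = None, math.inf, None
        for t in sorted(remaining):
            candidate = [c + o for c, o in zip(current, tree_outputs[t])]
            loss = mixed_loss(candidate, labels)
            if loss < best_loss:
                best_tree, best_loss, best_scores = t, loss, candidate
        chosen.append(best_tree)
        remaining.remove(best_tree)
        current = best_scores
    return chosen, mixed_loss(current, labels)


# Toy data: two signal and two background samples, four candidate trees.
labels = [1, 1, 0, 0]
tree_outputs = [
    [0.5, 0.5, -0.5, -0.5],   # useful tree
    [0.1, -0.1, 0.1, -0.1],   # noise
    [-0.5, -0.5, 0.5, 0.5],   # harmful tree
    [0.4, 0.6, -0.6, -0.4],   # useful tree
]
chosen, final_loss = greedy_prune(tree_outputs, labels, n_keep=2)
```

On this toy input the two useful trees are selected and the noisy and harmful ones are left out.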
- Download the ROOT files to the folder datasets
- Run preprocessing to create .csv files with tracks and SVRs
- HLT1 track creates models and plots for the HLT1 track trigger
- HLT1 2-body creates models and plots for the HLT1 2-body trigger
- [HLT2 n-body](https://github.com/tata-antares/LHCb-topo-trigger/blob/master/HLT2.ipynb) creates models and plots for the HLT2 n-body trigger
- BBDT and post-pruning creates production models for HLT2 n-body in BBDT format and with post-pruning
- BBDT format creates .bbdt files (lookup tables) from trained trees