# B-tagging



### Authors: 
* Tatiana Likhomanenko (contact)
* Alexey Rogozhnikov
* Denis Derkach

In [1]:
from IPython.display import Image
import pandas

# Opposite side (OS) tagging (Current version)

https://github.com/tata-antares/tagging_LHCb/blob/master/old-tagging.ipynb

We first tested the current algorithm (OS taggers: muon, electron, kaon, vertex), comparing the original TMVA method with XGBoost.

* isotonic symmetric calibration
* use different train-test divisions to calculate $D^2$
* compute the mean and std of $D^2$ over these divisions
* for details, see below (the same formulas are used)

### Electron, muon, kaon and vertex taggers

In [2]:
pandas.set_option('display.precision', 4)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/old-tagging-parts.csv').drop(['AUC, with untag', r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$"
0,vtx_xgboost,18.1978,0.0495,0.0506,0.0011,0.9212,0.0198
1,vtx_tmva,18.1978,0.0495,0.042,0.0009,0.7649,0.0171
2,$K$_xgboost,17.0796,0.0479,0.0546,0.0011,0.9332,0.0182
3,$K$_tmva,17.0796,0.0479,0.0476,0.0013,0.8129,0.0219
4,$e$_xgboost,1.6424,0.0149,0.1745,0.0063,0.2865,0.0107
5,$e$_tmva,1.6424,0.0149,0.1716,0.0072,0.2818,0.012
6,$\mu$_xgboost,5.4385,0.0271,0.1709,0.0036,0.9294,0.0202
7,$\mu$_tmva,5.4385,0.0271,0.1649,0.003,0.8967,0.0168


### Combined taggers

We then tested a combination with two calibrations for individual taggers: 

* isotonic regression 
* logistic regression. 

Combination was calibrated using isotonic regression.

In [3]:
pandas.set_option('display.precision', 4)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/old-tagging.csv').drop([r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag"
0,iso-xgb_combined,32.735,0.0664,0.0703,0.0018,2.3003,0.0583,56.7023
1,iso-tmva_combined,32.735,0.0664,0.0665,0.0019,2.1778,0.0611,56.5846
2,log-xgb_combined,32.735,0.0664,0.0736,0.0008,2.4104,0.0277,56.6686
3,log-tmva_combined,32.735,0.0664,0.0682,0.0008,2.2317,0.0278,56.5878


# Opposite side (OS) tagging (NEW)

## Data: 

* real data $B^+ \to J/\psi K^+$, $B^- \to J/\psi K^-$ (RECO 14)
* apply sPlot to obtain sWeight ~ P(B)
* build `event_id` from EventNumber and RunNumber

## Tracking inclusive OS tagging
https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging.ipynb


### Selections

* Input: all possible tracks for all B-events.
* (PIDNNp < tr) & (PIDNNpi < tr) & (ghostProb < 0.4), tr=0.6
* ((PIDNNk > trk) | (PIDNNm > trm) | (PIDNNe > tre)), trk=0.7, trm=0.4, tre=0.6
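
A minimal sketch of these two cuts, assuming each track is a plain dict keyed by the variable names above. The dict layout, the function name `passes_selection` and the toy values are illustrative and not taken from the notebook.

```python
def passes_selection(track, tr=0.6, trk=0.7, trm=0.4, tre=0.6):
    """Return True if a track passes both cuts listed above."""
    # Veto proton-like, pion-like and ghost tracks.
    veto_ok = (
        track["PIDNNp"] < tr
        and track["PIDNNpi"] < tr
        and track["ghostProb"] < 0.4
    )
    # Require the track to look like a kaon, a muon or an electron.
    pid_ok = (
        track["PIDNNk"] > trk
        or track["PIDNNm"] > trm
        or track["PIDNNe"] > tre
    )
    return veto_ok and pid_ok


# Toy tracks with made-up values, only to exercise the cuts.
kaon_like = {"PIDNNp": 0.1, "PIDNNpi": 0.2, "ghostProb": 0.05,
             "PIDNNk": 0.9, "PIDNNm": 0.0, "PIDNNe": 0.0}
pion_like = dict(kaon_like, PIDNNpi=0.95, PIDNNk=0.1)

print(passes_selection(kaon_like), passes_selection(pion_like))
```

The thresholds are keyword arguments so that the same function can be called with other values of `tr`, `trk`, `trm` and `tre`.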


### $\epsilon_{tag}$ calculation

$$N (\text{B events, passed selection}) = \sum_{\text{B events, passed selection}} sw_i$$

$$N (\text{all B events}) = \sum_{\text{all B events}} sw_i,$$

where $sw_i$ is the sPlot weight of event $i$.

$$\epsilon_{tag} = \frac{N (\text{passed selection})} {N (\text{all events})}$$

$$\Delta\epsilon_{tag} = \frac{\sqrt{N (\text{passed selection})}} {N (\text{all events})}$$
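
The two formulas can be evaluated directly from lists of sWeights. The sketch below assumes the weights are already available as Python lists; the function name and the toy weights are illustrative.

```python
import math


def tagging_efficiency(sweights_passed, sweights_all):
    """Return (eps_tag, delta_eps_tag) from per-event sPlot weights."""
    n_passed = sum(sweights_passed)   # N(B events, passed selection)
    n_all = sum(sweights_all)         # N(all B events)
    eps_tag = n_passed / n_all
    delta_eps_tag = math.sqrt(n_passed) / n_all
    return eps_tag, delta_eps_tag


# Toy weights: four B events, three of which pass the selection.
sw_all = [1.2, 0.9, 1.1, 0.8]
sw_passed = [1.2, 1.1, 0.8]
eps_tag, delta_eps_tag = tagging_efficiency(sw_passed, sw_all)
print(eps_tag, delta_eps_tag)
```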



### Data for training

* `data_sw_passed` - tracks with **B-sWeight > 1**; these are used for training
* `data_sw_not_passed` - tracks with **B-sWeight <= 1**; these are tagged after training

### B mass before sWeight cut

In [4]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass.png')

### B mass after sWeight cut

In [5]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_selected.png')

### Number of tracks for events

In [6]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/tracks_number.png')

### PIDNN distributions after selection

In [7]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected.png')

In [8]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected_hist.png')

### Training 

#### Features (sig = signal, part = tagger track):

* `cos_diff_phi` = $\cos(\phi^{sig} - \phi^{\rm part})$
* `diff_pt` = $\max(p_T)^{part} - p_T(B^{sig})$
* `partPt`= $p_T^{part}$
* `max_PID_e_mu` = $\max(PIDNN(e), PIDNN(\mu))^{part}$
* `partP` = $p^{part}$
* `nnkrec` = Number of reconstructed vertices
* `diff_eta` = $(\eta^{sig} - \eta^{\rm part})$
* `EOverP` = E/P (from CALO)
* `sum_PID_k_mu` = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(\mu))$
* `ptB` = $p_T^{sig}$

* `sum_PID_e_mu` = $\sum\limits_{i\in part}(PIDNN(e)+PIDNN(\mu))$
* `sum_PID_k_e` = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(e))$
* `proj` = $(\vec{p}^{sig},\vec{p}^{part})$
* `PIDNNe` = $PIDNN(e)$
* `PIDNNk` = $PIDNN(K)$
* `PIDNNm` = $PIDNN(\mu)$
* `phi` = $\phi^{sig}$
* `IP` = number of IPs in the event

* `max_PID_k_mu` = $\max(PIDNN(K)+PIDNN(\mu))$
* `IPerr` = error of IP
* `IPs` = IP/IPerr
* `veloch` = dE/dx track charge from the VELO system
* `max_PID_k_e`  = $\max(PIDNN(K)+PIDNN(e))$
* `diff_phi`  = $(\phi^{sig} - \phi^{\rm part})$
* `ghostProb` = ghost probability 
* `IPPU` =  impact parameter with respect to any other reconstructed primary vertex.
* `eta` = pseudorapidity of signal particle
* `partlcs` = chi2PerDoF for a track 

#### Classifier

Predict the sign of the B from the sign of the track, i.e. decide whether the two have the same sign or opposite signs.

`target` = `signB` * `signTrack` > 0

* classifier returns 
$$P(\text{track same sign as B| B sign}) = $$
$$ =P(\text{B same sign as track| track sign})$$
* 2-fold training on the full training sample, so that the full sample can be used for further analysis (with the folding scheme each track is scored by a model that was not trained on it, so the predictions are not overfitted; details: http://yandex.github.io/rep/metaml.html#module-rep.metaml.folding)
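
The target definition and the folding idea can be illustrated without any ML library. The sketch below is not the REP folding API: `two_fold_predict` and `fit_constant` are hypothetical helpers, and the "classifier" is a trivial stand-in that predicts the target rate of its training half.

```python
import random


def two_fold_predict(fit, features, targets, seed=0):
    """Out-of-fold predictions: each item is scored by the model
    trained on the other half of the sample."""
    indices = list(range(len(targets)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    folds = [indices[:half], indices[half:]]
    predictions = [None] * len(targets)
    for k in (0, 1):
        train_idx, test_idx = folds[1 - k], folds[k]
        model = fit([features[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = model(features[i])
    return predictions


def fit_constant(train_features, train_targets):
    """Hypothetical stand-in for a real classifier."""
    rate = sum(train_targets) / len(train_targets)
    return lambda feature_row: rate


# target = signB * signTrack > 0, on toy signs.
sign_b = [+1, +1, -1, -1, +1, -1]
sign_track = [+1, -1, -1, +1, +1, -1]
targets = [int(b * t > 0) for b, t in zip(sign_b, sign_track)]
features = [[0.0] for _ in targets]   # placeholder feature rows

probabilities = two_fold_predict(fit_constant, features, targets, seed=42)
print(targets, probabilities)
```

Because every prediction comes from the model that did not see that item, the whole sample can be reused downstream without a separate hold-out set.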

#### Calibration of $P(\text{track same sign as B| B sign})$

* use 2-fold logistic calibration for the track classifier's predictions
* compared with isotonic calibration (worse)
* compared with no calibration (worse: the predictions are shifted)

#### Computation of $p(B^+)$ using $P(\text{track same sign as B| B sign})$

Compute $p(B^+)$ using this probabilistic model representation (similar to the previous tagging combination):

$$ \frac{P(B^+)}{P(B^-)} = \prod_{track, vertex} \frac{P(\text{track/vertex}|B^+)} {P(\text{track/vertex} |B^-)} = \alpha
\qquad $$
$$\Rightarrow\qquad P(B^+) = \frac {\alpha}{1+\alpha},   \qquad \qquad    [1] $$
where

$$
\frac{P(B^+)}{P(B^-)} = \prod_{track, vertex} 
\begin{cases}
\frac{P(\text{track/vertex same sign as } B| B)}{P(\text{track/vertex opposite sign as } B| B)}, \text{if track/vertex}^+ \\ \\
\frac{P(\text{track/vertex opposite sign as } B| B)}{P(\text{track/vertex same sign as } B| B)}, \text{if track/vertex}^- 
\end{cases}
$$

$$p_{mistag} = \min(p(B^+), p(B^-))$$
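
A minimal sketch of formula [1] for one event, assuming the calibrated classifier outputs $P(\text{same sign as } B)$ lie strictly between 0 and 1. The product is accumulated in log space to avoid underflow when an event has many tracks; the function name and the toy inputs are illustrative.

```python
import math


def b_plus_probability(p_same_sign, charges):
    """Combine per-track (or per-vertex) probabilities into P(B+).

    p_same_sign: P(track has the same sign as B), one value per track.
    charges: +1 or -1 for each track.
    """
    log_alpha = 0.0
    for p, charge in zip(p_same_sign, charges):
        log_ratio = math.log(p) - math.log(1.0 - p)
        # Positive track: same/opposite ratio; negative track: its inverse.
        log_alpha += log_ratio if charge > 0 else -log_ratio
    # alpha / (1 + alpha) written in terms of log(alpha).
    return 1.0 / (1.0 + math.exp(-log_alpha))


# Toy event with one positive and one negative track.
p_b_plus = b_plus_probability([0.6, 0.55], [+1, -1])
p_mistag = min(p_b_plus, 1.0 - p_b_plus)
print(p_b_plus, p_mistag)
```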

#### Intermediate estimation $<D^2>$ for tracking

Calibrate $p(B^+)$ and compute $<D^2>$:

* use isotonic calibration (a generalization of fitting bins with a linear function): a piecewise-constant monotonic function
* randomly divide events into two parts (one to fit the calibration, the other to evaluate it)
* fit the symmetric isotonic calibration on the first part and compute $<D^2>$ on the second
* take the mean and std of the computed $<D^2>$ values
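
One possible reading of "symmetric isotonic fitting" is sketched below: fit a weighted isotonic regression (pool-adjacent-violators) on the sample together with its mirror image $(1 - p, 1 - \text{label})$, so that the fitted map treats $B^+$ and $B^-$ alike. This interpretation of "symmetric", the function names and the toy data are assumptions; the notebooks may implement it differently. Weights are assumed positive.

```python
def isotonic_fit(x, y, w):
    """Weighted pool-adjacent-violators.

    Returns x sorted in increasing order and the non-decreasing
    fitted value for each sorted point.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block: [weighted mean, total weight, point count]
    for i in order:
        blocks.append([y[i], w[i], 1])
        # Merge neighbouring blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return [x[i] for i in order], fitted


def symmetric_isotonic_fit(p_b_plus, is_b_plus, weights):
    """Fit on the sample plus its mirror image (1 - p, 1 - label)."""
    x = list(p_b_plus) + [1.0 - p for p in p_b_plus]
    y = list(is_b_plus) + [1 - t for t in is_b_plus]
    w = list(weights) + list(weights)
    return isotonic_fit(x, y, w)


# Toy sample: uncalibrated p(B+), true label (1 for B+), unit weights.
p_raw = [0.2, 0.3, 0.6, 0.7]
labels = [0, 1, 0, 1]
xs, fitted = isotonic_fit(p_raw, labels, [1, 1, 1, 1])
xs_sym, fitted_sym = symmetric_isotonic_fit(p_raw, labels, [1, 1, 1, 1])
print(fitted)
```

Applying the fit to the evaluation half would additionally need a step-function lookup of the fitted values, which is omitted here.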

$<D^2>$ formula for a sample:
$$<D^2> =  \frac{\sum_i[2(p^{mistag}_i - 0.5)]^2 \cdot sw_i}{\sum_i sw_i} = $$
$$ =  \frac{\sum_i[2(p_i(B^+) - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}$$

The formula is symmetric under $p \to 1 - p$, so the mistag probability does not need to be computed explicitly.
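
A short sketch of the $<D^2>$ formula, which also checks the symmetry numerically: using $p(B^+)$ or $p_{mistag} = \min(p(B^+), 1 - p(B^+))$ gives the same value. The function name and the toy inputs are illustrative.

```python
def mean_d2(probabilities, sweights):
    """sWeighted mean of the squared dilution [2 (p - 0.5)]^2."""
    numerator = sum((2.0 * (p - 0.5)) ** 2 * w
                    for p, w in zip(probabilities, sweights))
    return numerator / sum(sweights)


# Toy calibrated p(B+) values and sWeights.
p_b_plus = [0.7, 0.4, 0.5]
sw = [1.0, 1.0, 2.0]
p_mistag = [min(p, 1.0 - p) for p in p_b_plus]

d2_from_p = mean_d2(p_b_plus, sw)
d2_from_mistag = mean_d2(p_mistag, sw)
print(d2_from_p, d2_from_mistag)
```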

## Vertex OS tagging
https://github.com/tata-antares/tagging_LHCb/blob/master/vertex-based-tagging.ipynb


### Selections

* All selections in the C++ code are removed except the DaVinci probability cuts


### Data for training

* `data_sw_passed` - tracks with **B-sWeight > 1**; these are used for training
* `data_sw_not_passed` - tracks with **B-sWeight <= 1**; these are tagged after training

### Training 

#### Features:

* `mult` = multiplicity in the event
* `nnkrec` = number of reconstructed vertices 
* `ptB` = signal B transverse momentum 
* `vflag` = number of tracks in the vertex
* `ipsmean` = mean distance to IP of the tracks
* `ptmean`  = mean pt of the tracks
* `vcharge` = charge of the vertex weighted by pt
* `svm` = mass of the vertex 
* `svp` = momentum of the vertex
* `BDphiDir` = distance between B and vertex direction
* `svtau`  = lifetime of the vertex
* `docamax` = mean DOCA of the tracks

#### Classifier

Predict the sign of the B from the sign of the vertex, i.e. decide whether the two have the same sign or opposite signs.

`target` = `signB` * `signTrack` > 0

* classifier returns 
$$P(\text{vertex same sign as B| B sign}) = $$
$$ = P(\text{B same sign as vertex| vertex sign})$$
* 2-fold training on the full training sample, so that the full sample can be used for further analysis (with the folding scheme each vertex is scored by a model that was not trained on it, so the predictions are not overfitted; details: http://yandex.github.io/rep/metaml.html#module-rep.metaml.folding)

#### Calibration of $P(\text{vertex same sign as B| B sign})$

* use 2-fold isotonic calibration for the vertex classifier's predictions
* compared with logistic calibration (worse)

#### Computation of $p(B^+)$ using $P(\text{vertex same sign as B| B sign})$

The same formula as for tracks.

#### Intermediate estimation $<D^2>$ for vertices

Calibrate $p(B^+)$ and compute $<D^2>$:

* use isotonic calibration (a generalization of fitting bins with a linear function): a piecewise-constant monotonic function
* randomly divide events into two parts (one to fit the calibration, the other to evaluate it)
* fit the symmetric isotonic calibration on the first part and compute $<D^2>$ on the second
* take the mean and std of the computed $<D^2>$ values

$<D^2>$ formula for a sample:
$$<D^2> =  \frac{\sum_i[2(p^{mistag}_i - 0.5)]^2 \cdot sw_i}{\sum_i sw_i} = $$
$$ =  \frac{\sum_i[2(p_i(B^+) - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}$$

The formula is symmetric under $p \to 1 - p$, so the mistag probability does not need to be computed explicitly.

## Preliminary estimation

### $\epsilon$ calculation

$$\epsilon = <D^2> * \epsilon_{tag}$$

$$\Delta \epsilon = \epsilon \sqrt{ \left(\frac{\Delta <D^2>}{<D^2>}\right)^2 + \left(\frac{\Delta \epsilon_{tag} }{\epsilon_{tag}} \right)^2 }$$
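
A sketch of the error propagation, assuming the relative uncertainties of $<D^2>$ and $\epsilon_{tag}$ are added in quadrature and the result is scaled by $\epsilon$ to give an absolute uncertainty. With the rounded inputs of the "Inclusive tagging" row of the table in this section, this comes out close to the tabulated $\epsilon$ and $\Delta \epsilon$. The function name is illustrative.

```python
import math


def effective_efficiency(d2, d2_err, eps_tag, eps_tag_err):
    """Return (eps, delta_eps) for eps = <D^2> * eps_tag."""
    eps = d2 * eps_tag
    relative_err = math.sqrt((d2_err / d2) ** 2
                             + (eps_tag_err / eps_tag) ** 2)
    return eps, eps * relative_err


# Rounded inputs from the "Inclusive tagging" row (eps_tag in percent).
eps, delta_eps = effective_efficiency(
    d2=0.03449, d2_err=0.00046, eps_tag=77.78995, eps_tag_err=0.10233
)
print(round(eps, 3), round(delta_eps, 3))
```

Small differences from the table are expected because the tabulated $D^2$ and $\Delta D^2$ are themselves rounded.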

* Combine track-based and vertex-based tagging using formula [1]
* symmetric isotonic calibration on random subsample with $D^2$ calculation
* take mean and std for computed $<D^2>$

https://github.com/tata-antares/tagging_LHCb/blob/master/combined-tagging.ipynb

In [9]:
pandas.set_option('display.precision', 5)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/new-tagging.csv').drop([r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag"
0,Inclusive tagging,77.78995,0.10233,0.03449,0.00046,2.68331,0.03576,57.92576


## Full estimation of systematic error

https://github.com/tata-antares/tagging_LHCb/blob/master/tagging-full-systematic.ipynb

* set random state
* train the best model (track and vertex taggers with 2-folding with fixed random state)
* do calibration for track and vertex taggers with 2-folding with fixed random state
* compute $p(B^+)$
* do calibration with isotonic 2-folding (random state is fixed)
* compute $<D^2>$

This procedure is repeated from scratch for 30 different random states, and we then compute the **mean** and **std** of these 30 values of $<D^2>$.
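
The outer loop over random states can be sketched as follows. `run_full_pipeline` is a hypothetical placeholder for the whole chain (training, calibration, $p(B^+)$, $<D^2>$) and returns a synthetic number, so the printed values carry no physical meaning. The sketch uses the sample standard deviation; the source does not say which convention the notebook uses.

```python
import random
import statistics


def run_full_pipeline(random_state):
    """Hypothetical placeholder: returns a synthetic <D^2>-like value
    that depends only on the random state."""
    rng = random.Random(random_state)
    return 0.5 + rng.gauss(0.0, 0.01)


# Repeat the whole procedure for 30 random states.
d2_values = [run_full_pipeline(seed) for seed in range(30)]
d2_mean = statistics.mean(d2_values)
d2_std = statistics.stdev(d2_values)  # sample standard deviation
print(d2_mean, d2_std)
```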

In [10]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging-systematic.csv')

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag","$\Delta$ AUC, with untag"
0,NEW + full systematic,77.78995,0.10233,0.03492,0.0001,2.71629,0.00877,57.9586,0.02456


## Calibration checking

In [11]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_iso_calibrated.png')

### Isotonic transformation

In [12]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/iso_transformation.png')

### Check calibration of mistag
* x axis: predicted mistag probability 
$$p_{mistag} = \min(p(B^+), p(B^-))$$
* y axis: true mistag probability (computed per bin)
$$p_{mistag} = \frac{N_{wrong}} {N_{wrong} + N_{right}}$$

$$\Delta p_{mistag} = \frac{\sqrt{N_{wrong} N_{right}}} {(N_{wrong} + N_{right})^{1.5}}$$
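
A sketch of the per-bin quantities, assuming $N_{wrong}$ and $N_{right}$ are the (possibly weighted) counts in one bin. The uncertainty is the binomial error $\sqrt{p(1-p)/N}$ rewritten in terms of the two counts; the function name and the toy counts are illustrative.

```python
import math


def mistag_in_bin(n_wrong, n_right):
    """Return (p_mistag, delta_p_mistag) for one bin."""
    n_total = n_wrong + n_right
    p_mistag = n_wrong / n_total
    delta_p = math.sqrt(n_wrong * n_right) / n_total ** 1.5
    return p_mistag, delta_p


# Toy bin: 40 wrongly tagged and 60 correctly tagged events.
p_bin, delta_p_bin = mistag_in_bin(40, 60)
print(p_bin, delta_p_bin)
```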

#### before calibration

In [13]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_uniform.png')

In [14]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_percentile.png')

#### Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)


In [15]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_iso_uniform.png')

In [16]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_iso_percentile.png')

# UPDATE 3:

# Inclusive tagging (NEW)

## Tracking inclusive OS tagging 
https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-OS.ipynb


### Selections

* Input: all possible tracks for all B-events.
* (PIDNNp < tr) & (PIDNNpi < tr) & (ghostProb < 0.4), tr=0.6
* ((PIDNNk > trk) | (PIDNNm > trm) | (PIDNNe > tre)), trk=0., trm=0., tre=0.

### B mass before sWeight cut

In [17]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_OS.png')

### B mass after sWeight cut

In [18]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_selected_OS.png')

### Number of tracks for events

In [19]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/tracks_number_OS.png')

### PIDNN distributions after selection

In [20]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected_OS.png')

## Preliminary estimation (track OS + vertex OS)

https://github.com/tata-antares/tagging_LHCb/blob/master/combined-tagging-OS.ipynb

In [21]:
pandas.set_option('display.precision', 5)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/eff_OS.csv').drop([r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag"
0,"Inclusive tagging, PID less",99.11029,0.11551,0.03808,0.00044,3.77408,0.04411,60.79013


### Check calibration of mistag

In [22]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_iso_calibrated_OS.png')

#### before calibration

In [23]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_percentile_OS.png')

#### Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)


In [24]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_iso_percentile_OS.png')

## Tracking inclusive SS tagging without cuts
https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-SS.ipynb


### Selections

* Input: all possible tracks for all B-events.
* (IPs < 3) & (abs(diff_eta) < 0.6) & (abs(diff_phi) < 0.825) & (ghostProb < 0.4)
* ((PIDNNk > {trk}) | (PIDNNm > {trm}) | (PIDNNe > {tre}) | (PIDNNpi > {trpi}) | (PIDNNp > {trp})), trk=0, trm=0, tre=0, trpi=0, trp=0

### B mass before sWeight cut

In [25]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_SS.png')

### B mass after sWeight cut

In [26]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_selected_SS.png')

### Number of tracks for events

In [27]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/tracks_number_SS.png')

### PIDNN distributions after selection

In [28]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected_SS.png')

## Preliminary estimation (track SS only)
https://github.com/tata-antares/tagging_LHCb/blob/master/combined-tagging-SS.ipynb

In [29]:
pandas.set_option('display.precision', 5)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/eff_tracking_SS.csv').drop([r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag"
0,"Inclusive tagging, PID less",72.39764,0.09872,0.03077,0.00035,2.22756,0.02573,57.419


### Check calibration of mistag

In [30]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_iso_calibrated_SS.png')

#### before calibration

In [31]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_percentile_SS.png')

#### Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)

In [32]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_iso_percentile_SS.png')

## Tracking inclusive OS+SS tagging
https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-PID-less.ipynb


### Selections

* Input: all possible tracks for all B-events.
* (ghostProb < 0.4)
* ((PIDNNk > {trk}) | (PIDNNm > {trm}) | (PIDNNe > {tre}) | (PIDNNpi > {trpi}) | (PIDNNp > {trp})), trk=0, trm=0, tre=0, trpi=0, trp=0

### B mass before sWeight cut

In [33]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_less_PID.png')

### B mass after sWeight cut

In [34]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_selected_less_PID.png')

### Number of tracks for events

In [35]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/tracks_number_less_PID.png')

### PIDNN distributions after selection

In [36]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected_less_PID.png')

## Preliminary estimation (track: OS+SS, vertex: OS)

https://github.com/tata-antares/tagging_LHCb/blob/master/combined-tagging-all.ipynb

In [37]:
pandas.set_option('display.precision', 5)
# Raw string avoids the invalid "\D" escape sequence in the column name.
pandas.read_csv('img/new-tagging-PID-less.csv').drop([r'$\Delta$ AUC, with untag'], axis=1)

Unnamed: 0,name,"$\epsilon_{tag}, \%$","$\Delta \epsilon_{tag}, \%$",$D^2$,$\Delta D^2$,"$\epsilon, \%$","$\Delta \epsilon, \%$","AUC, with untag"
0,"Inclusive tagging, PID less",99.98595,0.11601,0.05873,0.00043,5.87239,0.04359,64.08899


### Check calibration of mistag

In [38]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_iso_calibrated_PID_less.png')

#### before calibration

In [39]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_percentile_PID_less.png')

#### Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)

In [40]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bprob_calibration_check_iso_percentile_PID_less.png')

## TODO in progress

* convert python models to xml format (for production stage)
* add feature idea (Stefania)
* get all vertices from DaVinci