In [1]:
from IPython.display import Image
import pandas

# Opposite side (OS) tagging (current version)

https://github.com/tata-antares/tagging_LHCb/blob/master/old-tagging.ipynb

In [None]:
pandas.read_csv('img/old-tagging.csv')

# Opposite side (OS) tagging (NEW)

## Data

* real data $B^+ \to J/\psi K^+$, $B^- \to J/\psi K^-$
* apply sPlot to obtain the sWeight ~ P(B)
* set `event_id` from EventNumber and RunNumber
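
A minimal sketch of how such an `event_id` could be built, assuming the run and event numbers are available as integer arrays. The function name `make_event_id` and the `run_event` string format are illustrative assumptions, not taken from the analysis code.

```python
import numpy as np


def make_event_id(run_number, event_number):
    """Combine run and event numbers into one string key per row."""
    return np.array([f"{run}_{event}" for run, event in zip(run_number, event_number)])


# Hypothetical values: three tracks, the first two from the same event.
event_id = make_event_id(np.array([101, 101, 102]), np.array([7, 7, 7]))
```

Rows that share an `event_id` can then be grouped as belonging to the same B candidate.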

## Tracking inclusive OS tagging
https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging.ipynb


### Selections

* Input: all possible tracks for all B-events.
* (PIDNNp < tr) & (PIDNNpi < tr) & (ghostProb < 0.4), tr=0.6
* ((PIDNNk > trk) | (PIDNNm > trm) | (PIDNNe > tre)), trk=0.7, trm=0.4, tre=0.6
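
The two cuts above can be sketched as a boolean mask. This assumes the two lines are combined with a logical AND and that the columns are named as in the cuts; `select_tracks` is a hypothetical helper and accepts anything indexable by column name (a `pandas.DataFrame` or a dict of arrays).

```python
import numpy as np


def select_tracks(tracks, tr=0.6, trk=0.7, trm=0.4, tre=0.6, ghost_max=0.4):
    """Return a boolean mask of tracks passing both selection lines."""
    # Reject proton-like, pion-like and ghost-like tracks.
    not_p_pi_ghost = (
        (tracks["PIDNNp"] < tr)
        & (tracks["PIDNNpi"] < tr)
        & (tracks["ghostProb"] < ghost_max)
    )
    # Keep tracks that look like a kaon, a muon or an electron.
    k_mu_or_e = (
        (tracks["PIDNNk"] > trk)
        | (tracks["PIDNNm"] > trm)
        | (tracks["PIDNNe"] > tre)
    )
    return not_p_pi_ghost & k_mu_or_e


# Hypothetical toy input with three tracks.
toy_tracks = {
    "PIDNNp": np.array([0.1, 0.8, 0.1]),
    "PIDNNpi": np.array([0.2, 0.1, 0.1]),
    "ghostProb": np.array([0.1, 0.1, 0.1]),
    "PIDNNk": np.array([0.9, 0.9, 0.5]),
    "PIDNNm": np.array([0.1, 0.1, 0.3]),
    "PIDNNe": np.array([0.1, 0.1, 0.5]),
}
mask = select_tracks(toy_tracks)
```

Here only the first track passes: the second is proton-like, and the third fails all three of the kaon, muon and electron thresholds.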



### $\epsilon_{tag}$ calculation

$$N (\text{passed selection}) = \sum_{\text{passed selection}} sw_i$$

$$N (\text{all events}) = \sum_{\text{all events}} sw_i,$$

where $sw_i$ is the sPlot weight of event $i$.

$$\epsilon_{tag} = \frac{N (\text{passed selection})} {N (\text{all events})}$$

$$\Delta\epsilon_{tag} = \frac{\sqrt{N (\text{passed selection})}} {N (\text{all events})}$$
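
A short sketch of these formulas, assuming one sWeight per event and a boolean flag telling whether the event has at least one track passing the selection. The function and argument names are illustrative.

```python
import numpy as np


def tagging_efficiency(sweights, passed):
    """Return (eps_tag, delta_eps_tag) from per-event sWeights."""
    sweights = np.asarray(sweights, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    n_passed = sweights[passed].sum()  # N(passed selection)
    n_all = sweights.sum()             # N(all events)
    return n_passed / n_all, np.sqrt(n_passed) / n_all


# Hypothetical toy sample: four events with unit weights, three of them tagged.
eps_tag, delta_eps_tag = tagging_efficiency([1.0, 1.0, 1.0, 1.0], [True, True, True, False])
```

With these toy numbers $\epsilon_{tag} = 3/4$ and $\Delta\epsilon_{tag} = \sqrt{3}/4$.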


### Data for training

* `data_sw_passed` - tracks with **B-sWeight > 1**; these are used for training
* `data_sw_not_passed` - tracks with **B-sWeight <= 1**; these are tagged after training

### B mass before sWeight cut

In [2]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass.png')

### B mass after sWeight cut

In [3]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/Bmass_selected.png')

### Number of tracks for events

In [4]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/tracks_number.png')

### PIDNN distributions after selection

In [5]:
Image(url='https://raw.githubusercontent.com/tata-antares/tagging_LHCb/master/img/PID_selected.png')

### Training 

#### Features:

* `cos_diff_phi` = $\cos(\phi^{sig} - \phi^{\rm part})$
* `diff_pt` = $\max(p_T)^{part} - p_T(B^{sig})$
* `partPt`= $p_T^{part}$
* `max_PID_e_mu` = $\max(PIDNN(e), PIDNN(\mu))^{part}$
* `partP` = $p^{part}$
* `nnkrec` = Number of reconstructed vertices
* `diff_eta` = $\eta^{sig} - \eta^{\rm part}$
* `EOverP` = E/P (from CALO)
* `sum_PID_k_mu` = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(\mu))$
* `ptB` = $p_T^{sig}$
* `sum_PID_e_mu` = $\sum\limits_{i\in part}(PIDNN(e)+PIDNN(\mu))$
* `sum_PID_k_e` = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(e))$
* `proj` = $(\vec{p}^{sig},\vec{p}^{part})$
* `PIDNNe` = $PIDNN(e)$
* `PIDNNk` = $PIDNN(K)$
* `PIDNNm` = $PIDNN(\mu)$
* `phi` = $\phi^{sig}$
* `IP` = $IP$
* `max_PID_k_mu` = $\max(PIDNN(K)+PIDNN(\mu))$
* `IPerr` = error of IP
* `IPs` = No of IPs
* `ID` = ID of the track (depends on the track container, where the track comes from)
* `veloch` = The dE/dx charge from the VELO system
* `max_PID_k_e` = $\max(PIDNN(K)+PIDNN(e))$
* `diff_phi`  = $(\phi^{sig} - \phi^{\rm part})$
* `ghostProb` = ghost probability (NN)
* `IPPU` =  impact parameter with respect to any other reconstructed primary vertex.
* `eta` = pseudorapidity of the signal particle
* `partlcs` = chi2PerDoF for a track 
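
As an illustration, a few of the derived features can be computed directly from their definitions above. This is a sketch only: the function name and argument names are assumptions, and only features whose definition is unambiguous in the list are included.

```python
import numpy as np


def derived_track_features(phi_sig, phi_part, eta_sig, eta_part, pidnn_e, pidnn_mu):
    """Compute a few per-track features from signal-B and track quantities."""
    diff_phi = np.asarray(phi_sig) - np.asarray(phi_part)
    return {
        "diff_phi": diff_phi,
        "cos_diff_phi": np.cos(diff_phi),
        "diff_eta": np.asarray(eta_sig) - np.asarray(eta_part),
        "max_PID_e_mu": np.maximum(pidnn_e, pidnn_mu),
    }


# Hypothetical toy values for two tracks.
toy_features = derived_track_features(
    phi_sig=[0.0, np.pi], phi_part=[0.0, 0.0],
    eta_sig=[3.0, 3.0], eta_part=[2.5, 3.5],
    pidnn_e=[0.1, 0.7], pidnn_mu=[0.4, 0.2],
)
```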

#### Classifier

Predict the B sign from the track sign: the classifier decides whether the track and the signal B have the same sign or opposite signs.

`target` = `signB` * `signTrack` > 0

* the classifier returns $P(\text{track same sign as B| B sign}) = P(\text{B same sign as track| track sign})$
* 2-fold training on the full training sample, so that the full sample can be used for further analysis (the folding scheme gives predictions from a model that was not trained on the same rows, so they are not overfitted; details: http://yandex.github.io/rep/metaml.html#module-rep.metaml.folding)
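
The folding idea can be sketched without any particular library: split the sample into two folds, train on one and predict on the other, then swap. The sketch below is a simplified stand-in for the REP folding scheme linked above, not its API. Splitting by event (so that all tracks of one event land in the same fold) and the `fit` / `predict` callables are assumptions made for illustration.

```python
import numpy as np


def two_fold_predict(features, target, event_ids, fit, predict, seed=0):
    """Out-of-fold predictions: each row is scored by the model trained on the other fold."""
    rng = np.random.RandomState(seed)
    unique_events, event_index = np.unique(event_ids, return_inverse=True)
    # Assign whole events to fold 0 or 1, roughly half each.
    fold_of_event = rng.permutation(len(unique_events)) % 2
    fold = fold_of_event[event_index]
    predictions = np.empty(len(target), dtype=float)
    for k in (0, 1):
        train, test = fold != k, fold == k
        model = fit(features[train], target[train])
        predictions[test] = predict(model, features[test])
    return predictions, fold


# Target as defined above: True when track and B have the same sign.
sign_b = np.array([1, 1, -1, -1, 1, 1, -1, -1])
sign_track = np.array([1, -1, -1, 1, 1, 1, -1, 1])
target = sign_b * sign_track > 0

# Hypothetical stand-in "classifier": always predicts the training-set mean of the target.
toy_events = np.array(["a", "a", "b", "b", "c", "c", "d", "d"])
toy_X = np.zeros((8, 1))
toy_pred, toy_fold = two_fold_predict(
    toy_X, target, toy_events,
    fit=lambda X, y: y.mean(),
    predict=lambda model, X: np.full(len(X), model),
)
```

Any real classifier can be plugged in through `fit` and `predict`; the important property is that no row is scored by a model that saw it during training.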

#### Calibration of $P(\text{track same sign as B| B sign})$

* use 2-fold logistic calibration for the track classifier's predictions
* compared with isotonic calibration (worse)
* compared with no calibration (worse: the predictions are shifted)
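
Logistic calibration fits $\sigma(a \cdot \mathrm{logit}(p) + b)$ to the true labels, which can remove a shift in the raw predictions. The sketch below is a plain NumPy version fitted with Newton iterations; it is an illustration under assumed names and positive (or unit) weights, not the calibration code used in the notebook.

```python
import numpy as np


def _design(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    return np.column_stack([np.log(p / (1 - p)), np.ones_like(p)])


def fit_logistic_calibration(p, y, w=None, n_iter=50):
    """Fit (a, b) of sigmoid(a * logit(p) + b) by weighted maximum likelihood."""
    X = _design(p)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.array([1.0, 0.0])  # start from the identity calibration
    for _ in range(n_iter):
        q = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (q - y))
        hess = (X * (w * q * (1 - q))[:, None]).T @ X + 1e-9 * np.eye(2)
        beta = beta - np.linalg.solve(hess, grad)
    return beta


def apply_logistic_calibration(p, beta):
    return 1.0 / (1.0 + np.exp(-_design(p) @ beta))
```

In a 2-fold setup the calibration would be fitted on one fold and applied to the other, in the same way as the classifier itself.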


#### Computation of $p(B^+)$ using $P(\text{track same sign as B| B sign})$

Compute $p(B^+)$ with the following probabilistic model (similar to the previous tagging combination):

$$ \frac{P(B^+)}{P(B^-)} = \prod_{track, vertex} \frac{P(\text{track/vertex}|B^+)} {P(\text{track/vertex} |B^-)} = P
\qquad \Rightarrow\qquad P(B^+) = \frac {P}{1+P},   \qquad \qquad    [1] $$
where each factor of the product is

$$
\frac{P(\text{track/vertex}|B^+)}{P(\text{track/vertex}|B^-)} =  
\begin{cases}
\frac{P(\text{track/vertex same sign as } B| B)}{P(\text{track/vertex opposite sign as } B| B)}, \text{ if track/vertex}^+ \\ \\
\frac{P(\text{track/vertex opposite sign as } B| B)}{P(\text{track/vertex same sign as } B| B)}, \text{ if track/vertex}^- 
\end{cases}
$$
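
Formula [1] is easiest to evaluate in log space: each track (or vertex) contributes $\pm\log\frac{p}{1-p}$, where $p$ is its calibrated same-sign probability and the sign is its charge. A minimal sketch, assuming $P(\text{opposite sign}) = 1 - P(\text{same sign})$ and an integer `event_index` per track (both the helper name and the argument names are illustrative):

```python
import numpy as np


def combine_to_p_b_plus(p_same, sign, event_index, n_events):
    """Per-event P(B+) from per-track same-sign probabilities, following formula [1]."""
    p_same = np.clip(np.asarray(p_same, dtype=float), 1e-6, 1 - 1e-6)
    # Positive track: ratio p/(1-p); negative track: the inverse ratio.
    log_ratio = np.asarray(sign) * (np.log(p_same) - np.log1p(-p_same))
    # Sum of logs per event is the log of the product P.
    log_p = np.bincount(event_index, weights=log_ratio, minlength=n_events)
    # P / (1 + P) written as a sigmoid of log P.
    return 1.0 / (1.0 + np.exp(-log_p))


# Hypothetical toy input: four tracks in events 0, 1, 2, 2; event 3 has no tracks.
p_b_plus = combine_to_p_b_plus(
    p_same=[0.8, 0.8, 0.8, 0.8],
    sign=[+1, -1, +1, +1],
    event_index=[0, 1, 2, 2],
    n_events=4,
)
```

An event without any selected track gets $P = 1$ and therefore $P(B^+) = 0.5$, i.e. it is effectively untagged.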

#### Intermediate estimation $<D^2>$ for tracking

Calibrate $p(B^+)$ and compute $<D^2>$:

* use isotonic calibration, a piecewise-constant monotonic function (a generalization of fitting binned values with a linear function)
* randomly divide the events into two parts (train and test)
* fit the isotonic calibration on the train part and compute $<D^2>$ on the test part
* take the mean and std of the computed $<D^2>$ values
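
For reference, isotonic regression can be written in a few lines with the pool-adjacent-violators algorithm. The version below is a self-contained sketch with a piecewise-constant prediction; it assumes positive weights and distinct `x` values, and it is not necessarily the implementation used in the notebook.

```python
import numpy as np


def fit_isotonic(x, y, w=None):
    """Weighted non-decreasing fit of y versus x (pool adjacent violators)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    order = np.argsort(x, kind="mergesort")
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y[order], w[order]):
        blocks.append([yi, wi, 1])
        # Merge neighbouring blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = np.repeat([b[0] for b in blocks], [b[2] for b in blocks])
    return x[order], fitted


def predict_isotonic(knots, fitted, x_new):
    """Step-function lookup: value of the last knot at or below x_new."""
    idx = np.searchsorted(knots, x_new, side="right") - 1
    return fitted[np.clip(idx, 0, len(fitted) - 1)]


knots, fitted = fit_isotonic([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

In the procedure above, `x` would be the combined $p(B^+)$ on the train part and `y` the true B sign encoded as 0/1; the fitted step function is then applied to the test part.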

$<D^2>$ formula for a sample:
$$<D^2> =  \frac{\sum_i[2(p^{mistag}_i - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}
=  \frac{\sum_i[2(p_i(B^+) - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}$$

The formula is symmetric under $p \to 1 - p$, so it is not necessary to compute the mistag probability explicitly.
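
The same formula in code, as a small sketch with illustrative names; the second call checks the symmetry numerically on toy values.

```python
import numpy as np


def mean_d2(p_b_plus, sweights):
    """sWeighted average of the squared dilution D = 2 * (p - 0.5)."""
    p = np.asarray(p_b_plus, dtype=float)
    sw = np.asarray(sweights, dtype=float)
    return np.sum((2.0 * (p - 0.5)) ** 2 * sw) / np.sum(sw)


# Hypothetical toy values.
toy_p = np.array([0.8, 0.2, 0.5])
toy_sw = np.array([1.0, 1.0, 2.0])
d2 = mean_d2(toy_p, toy_sw)
d2_flipped = mean_d2(1.0 - toy_p, toy_sw)  # same value: p and 1 - p give the same D^2
```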

## Vertex OS tagging
https://github.com/tata-antares/tagging_LHCb/blob/master/vertex-based-tagging.ipynb


### Selections

* All selections in the C++ code are removed except the DaVinci probability cuts


### Data for training

* `data_sw_passed` - vertices with **B-sWeight > 1**; these are used for training
* `data_sw_not_passed` - vertices with **B-sWeight <= 1**; these are tagged after training

### Training 

#### Features:

* `mult` = multiplicity in the event
* `nnkrec` = number of reconstructed vertices 
* `ptB` = signal B transverse momentum 
* `vflag` = number of tracks in the vertex
* `ipsmean` = mean distance to IP of the tracks
* `ptmean`  = mean pt of the tracks
* `vcharge` = charge of the vertex weighted by pt
* `svm` = mass of the vertex 
* `svp` = momentum of the vertex
* `BDphiDir` = distance between B and vertex direction
* `svtau`  = lifetime of the vertex
* `docamax` = mean DOCA of the tracks
             
#### Classifier

Predict the B sign from the vertex sign: the classifier decides whether the vertex and the signal B have the same sign or opposite signs.

`target` = `signB` * `signTrack` > 0

* the classifier returns $P(\text{vertex same sign as B| B sign}) = P(\text{B same sign as vertex| vertex sign})$
* 2-fold training on the full training sample, so that the full sample can be used for further analysis (the folding scheme gives predictions from a model that was not trained on the same rows, so they are not overfitted; details: http://yandex.github.io/rep/metaml.html#module-rep.metaml.folding)

#### Calibration of $P(\text{vertex same sign as B| B sign})$

* use 2-folding isotonic calibration for vertex classifier's prediction
* compare with isotonic calibration (bad)

#### Computation of $p(B^+)$ using $P(\text{vertex same sign as B| B sign})$

The same formula as for tracks.

#### Intermediate estimation $<D^2>$ for vertices

Calibrate $p(B^+)$ and compute $<D^2>$:

* use isotonic calibration, a piecewise-constant monotonic function (a generalization of fitting binned values with a linear function)
* randomly divide the events into two parts (train and test)
* fit the isotonic calibration on the train part and compute $<D^2>$ on the test part
* take the mean and std of the computed $<D^2>$ values

$<D^2>$ formula for a sample:
$$<D^2> =  \frac{\sum_i[2(p^{mistag}_i - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}
=  \frac{\sum_i[2(p_i(B^+) - 0.5)]^2 \cdot sw_i}{\sum_i sw_i}$$

The formula is symmetric under $p \to 1 - p$, so it is not necessary to compute the mistag probability explicitly.

## Preliminary estimation

* combine track-based and vertex-based tagging using formula [1]
* isotonic calibration on a random subsample, followed by the $<D^2>$ calculation
* take the mean and std of the computed $<D^2>$ values

https://github.com/tata-antares/tagging_LHCb/blob/master/combined-tagging.ipynb

In [6]:
pandas.read_csv('img/new-tagging.csv')

| | name | $\epsilon_{tag}, \%$ | $\Delta \epsilon_{tag}, \%$ | $D^2$ | $\Delta D^2$ | $\epsilon, \%$ | $\Delta \epsilon, \%$ | AUC, with untag | $\Delta$ AUC, with untag |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Inclusive tagging | 77.789955 | 0.102331 | 0.034975 | 0.000439 | 2.720682 | 0.034318 | 57.982113 | 0 |
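
The $\epsilon$ column is consistent with the standard effective tagging efficiency $\epsilon = \epsilon_{tag} \cdot <D^2>$ and uncorrelated error propagation. That the table was produced exactly this way is an assumption; the sketch below only reproduces the numbers from the rounded table entries.

```python
import numpy as np


def effective_efficiency(eps_tag, eps_tag_err, d2, d2_err):
    """Effective efficiency eps = eps_tag * D^2 with uncorrelated relative errors added in quadrature."""
    eps = eps_tag * d2
    rel_err = np.sqrt((eps_tag_err / eps_tag) ** 2 + (d2_err / d2) ** 2)
    return eps, eps * rel_err


# Values copied from the table above (efficiencies in percent).
eps, eps_err = effective_efficiency(77.789955, 0.102331, 0.034975, 0.000439)
```

The result agrees with the tabulated $\epsilon \approx 2.72\%$ and $\Delta\epsilon \approx 0.034\%$ up to the rounding of the inputs.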


## Full estimation of systematic error

https://github.com/tata-antares/tagging_LHCb/blob/master/tagging-full-systematic.ipynb

* set random state
* train the best model (track and vertex taggers with 2-folding with fixed random state)
* do calibration for track and vertex taggers with 2-folding with fixed random state
* compute $p(B^+)$
* do calibration with isotonic 2-folding (random state is fixed)
* compute $<D^2>$

This procedure is repeated from scratch for 30 different random states, and then we compute the **mean** and **std** of these 30 values of $<D^2>$.
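
Schematically, the repetition looks like the loop below. `run_pipeline` is a hypothetical callable that performs all the steps listed above for one random state and returns $<D^2>$; using the seeds `0..29` is also an assumption.

```python
import numpy as np


def estimate_systematic(run_pipeline, n_repeats=30):
    """Run the full chain once per random state and summarise the resulting <D^2> values."""
    values = np.array([run_pipeline(seed) for seed in range(n_repeats)])
    return values.mean(), values.std()


# Dummy pipeline for illustration only: returns the seed itself instead of a real <D^2>.
toy_mean, toy_std = estimate_systematic(lambda seed: float(seed))
```

The std obtained this way reflects the spread caused by the random fold assignment and calibration splits, which is what the $\Delta D^2$ column of the next table reports.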

In [7]:
pandas.read_csv('img/new-tagging-systematic.csv')

| | name | $\epsilon_{tag}, \%$ | $\Delta \epsilon_{tag}, \%$ | $D^2$ | $\Delta D^2$ | $\epsilon, \%$ | $\Delta \epsilon, \%$ | AUC, with untag | $\Delta$ AUC, with untag |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NEW + full systematic | 77.789955 | 0.102331 | 0.034897 | 8.3e-05 | 2.714597 | 0.007391 | 57.949854 | 0.025131 |


## TODO in progress

* convert the Python models to XML format (for the production stage)
* add feature idea (Stephania)
* get all vertices from DaVinci