PCA estimation + validation, cleanups #122

kmcdermo · 2018-01-19T07:33:45Z

This PR is a mix of things, namely cleanups and PCA estimation and validation. For the cleanups, I followed up on issues raised in PR #119 (validation and removing the ifdef for the Phi-Q arrays), as well as some miscellaneous cleanup in the validation.

I added a new flag called "quality_val" that must be specified at run time to run the printout-only validation. Until now, the benchmarks actually ran the printout validation, which includes remapping hits (expensive) as well as track matching, and merely suppressed the printouts. Disabling this section during the benchmarks lets them run faster.

The main addition, however, is the inclusion of a PCA estimator, which we use to propagate the backward fit tracks to the PCA. Since this PR is 90% complete (see to-do list below), I do not have any fancy diagrams or presentations to show yet. The main idea is that we can project a straight line back in r-z (using theta) to compute the perpendicular distance between the line and a reference point (for now, the origin) to yield r-z coordinates of the PCA. Then, we propagate all tracks to their respective z-PCA using helixAtZ propagation.
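As a rough illustration of the r-z projection described above (a minimal sketch, not the code in this PR): treat the track as a straight line in the r-z plane with direction (sin θ, cos θ), and take the foot of the perpendicular from the reference point onto that line. The names `RZPoint`, `estimatePCA` and `distanceToRef` are hypothetical, and θ is assumed to be the polar angle measured from the +z axis.

```cpp
#include <cmath>

// Hypothetical helper type for illustration; not taken from the mkFit sources.
struct RZPoint
{
  double r;
  double z;
};

// Foot of the perpendicular from `ref` onto the straight line that passes
// through `pos` with polar angle `theta` (assumed measured from the +z axis).
// The returned r is signed: it can be negative if the line crosses the z axis
// before reaching the point of closest approach.
RZPoint estimatePCA(const RZPoint& pos, double theta, RZPoint ref = RZPoint{0.0, 0.0})
{
  const double ur = std::sin(theta);  // unit direction, r component
  const double uz = std::cos(theta);  // unit direction, z component

  // Signed path length along the line from `pos` to the closest point.
  const double s = (ref.r - pos.r) * ur + (ref.z - pos.z) * uz;

  return RZPoint{pos.r + s * ur, pos.z + s * uz};
}

// Perpendicular distance between the reference point and the line.
double distanceToRef(const RZPoint& pos, double theta, RZPoint ref = RZPoint{0.0, 0.0})
{
  const RZPoint pca = estimatePCA(pos, theta, ref);
  return std::hypot(pca.r - ref.r, pca.z - ref.z);
}
```

The z component of the result would then serve as the target for the z-based helix propagation. Passing a different `ref` corresponds to using an external reference point instead of the origin.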

For validation of this process, I dump the results of the PCA computation into the fitTracks_ collection, so as to be compared separately from the candidateTracks_. And since this was always motivated by comparing with CMSSW track parameters at the PCA, I use the track parameter matching for the fit tracks (although the default for candidate tracks is also the track param matching).

I have a number of problems I still want to fix (the biggest being that BkFitInputTracks() crashes on standard building when multithreaded) before this can be merged. However, I wanted to show off the validation by comparing built to fit track efficiencies:

Things to do and discuss:

* ~~drop sim track validation from benchmarking: I don't really check it any more, although we could use it as a diagnostic by running it over the high stats 10-muon sample~~ UPDATE: use 10mu sample for root val
* ~~fix the standard building with the backwards fit: @osschar , I have traced it down to BkFitInputTracks() ... but only when multithreaded...~~ DONE
* ~~turn off param b-field for fitting and see effect on validation (already saw that this helped in the last PR)~~ disabled for now
* move CMSSW validation for candidate tracks to hit matching (default is track param based): we have seen that the track param matching fails anyhow for low pt tracks as dphi goes to NaN. We also know (for now) the hits are the same between the candidate tracks and the fit tracks (could add hit rejection during backward fit at some point) DONE
* tune windows for chi2 (mom. theta, 1/pt) and dphi for fit tracks, or move to 3-parameter chi2 (mom. theta, 1/pt, mom. phi)
* use external reference point other than origin to define PCA. Have demonstrated internally with @cerati @slava77 @osschar that we can further improve track parameter matching by storing the point used by CMSSW per event to define the reference for the PCA. --> UPDATE: I made a few short slides about various PCA estimates and their uses: pca.pdf
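For the matching items in the list above, a 2-parameter chi2 on (1/pt, theta) plus a separate dphi window could look roughly like the following. This is a sketch under assumptions, not the validation code in this PR: `TrackParams`, `Cov2`, `chi2TwoParam`, `deltaPhi` and `isMatch` are hypothetical names, the choice of covariance (for example, that of the reference track) is an assumption, and the cut values are left as parameters because they are still to be tuned.

```cpp
#include <cmath>

// Hypothetical parameter container for illustration only.
struct TrackParams
{
  double invpt;  // 1/pt
  double theta;  // momentum polar angle
  double phi;    // momentum azimuthal angle
};

// Symmetric 2x2 covariance of (1/pt, theta): [[c00, c01], [c01, c11]].
struct Cov2
{
  double c00;
  double c01;
  double c11;
};

// chi2 = d^T C^{-1} d with d = (delta 1/pt, delta theta).
double chi2TwoParam(const TrackParams& a, const TrackParams& b, const Cov2& cov)
{
  const double d0 = a.invpt - b.invpt;
  const double d1 = a.theta - b.theta;
  const double det = cov.c00 * cov.c11 - cov.c01 * cov.c01;
  // Inverse of the symmetric 2x2 matrix is (1/det) * [[c11, -c01], [-c01, c00]].
  return (d0 * d0 * cov.c11 - 2.0 * d0 * d1 * cov.c01 + d1 * d1 * cov.c00) / det;
}

// Difference in phi wrapped into [-pi, pi].
double deltaPhi(double phi1, double phi2)
{
  const double twoPi = 2.0 * std::acos(-1.0);
  return std::remainder(phi1 - phi2, twoPi);
}

// A pair matches only if both the chi2 and the |dphi| windows are satisfied.
// Comparisons with NaN evaluate to false, so a NaN chi2 or dphi never matches.
bool isMatch(const TrackParams& a, const TrackParams& b, const Cov2& cov,
             double chi2Cut, double dphiCut)
{
  return chi2TwoParam(a, b, cov) < chi2Cut && std::fabs(deltaPhi(a.phi, b.phi)) < dphiCut;
}
```

Moving to a 3-parameter chi2 would fold phi into the difference vector and use a 3x3 covariance, with the phi difference still wrapped before use.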


kmcdermo · 2018-01-19T15:38:08Z

I should note that above, there is some improvement at low pT, although not as much as I would have liked... retrying with the param b-field turned off for the backwards fit again improves things...

eff vs pt:

build tracks:
https://kmcdermo.web.cern.ch/kmcdermo/pr122/CMSSWVAL/build/SNB_CMSSW_TTbar_PU70_eff_pt_build_pt0.0_CMSSWVAL.png
fit tracks PR 120 w/ param b-field (i.e. as it is now):
https://kmcdermo.web.cern.ch/kmcdermo/pr122/CMSSWVAL/fit/SNB_CMSSW_TTbar_PU70_eff_pt_fit_pt0.0_CMSSWVAL.png
fit tracks PR 120 w/o param b-field: https://kmcdermo.web.cern.ch/kmcdermo/pr122/no_param_b_field/CMSSWVAL/fit/SNB_CMSSW_TTbar_PU70_eff_pt_fit_pt0.0_CMSSWVAL.png

As you can see, the matching of fit tracks with the param b-field is improved at low pT, and without the param b-field it is improved significantly (10-20%).

Looking at vs. eta:

build tracks:
https://kmcdermo.web.cern.ch/kmcdermo/pr122/CMSSWVAL/build/SNB_CMSSW_TTbar_PU70_eff_eta_build_pt0.0_CMSSWVAL.png
fit tracks PR 120 w/ param b-field (i.e. as it is now):
https://kmcdermo.web.cern.ch/kmcdermo/pr122/CMSSWVAL/fit/SNB_CMSSW_TTbar_PU70_eff_eta_fit_pt0.0_CMSSWVAL.png
fit tracks PR 120 w/o param b-field: https://kmcdermo.web.cern.ch/kmcdermo/pr122/no_param_b_field/CMSSWVAL/fit/SNB_CMSSW_TTbar_PU70_eff_eta_fit_pt0.0_CMSSWVAL.png

It is clear that the param b-field is hurting us for some reason, particularly at low pT near the transition region... although it helps in the very forward regions, as one might expect, where the b-field can differ by as much as 10% compared to the central region.

** Aside: the HEAD of devel matches exactly with build track validation, as expected.


kmcdermo · 2018-01-23T22:19:40Z

I am done making commits for this PR for now. I will upload the full set of validation plots once done (of course, without KNL). There is quite a bit to be discussed, but some of it is best tabled until the right pieces are in place.

Things to do:

* Investigate why parameterized magnetic field seems to hurt more than help...
* Decide which PCA estimation we wish to use. I uploaded a pdf in the PR description depicting what we could do and what we would need for input.
* Optimize cuts for track param matching in bins of pt/eta for 2-param chi2 (1/pt, eta) + dphi. OR investigate possibility of 3-param chi2 (include phi). This optimization should be performed once we have decided which PCA estimation to use, as I do not want to do this optimization iteratively.
* Potentially move BkFit back inside building. After discussing with Matevz, this is really the right thing to do, although there is no real harm as it is now, as we are first trying to validate this routine. This would mean copying out whatever is stored in matriplex format to the ev.candidateTracks_ vector after building, and then after the BkFit (while still inside the TBB tasks), copy out matriplex info again to ev.fitTracks_. This may require abstracting the copy I/O. We could also include the BkFit + I/O gymnastics in the benchmark timing as well.

kmcdermo · 2018-01-24T00:30:25Z

Full plots are here: https://kmcdermo.web.cern.ch/kmcdermo/pr122/full_benchmarks/

I will post an analysis of this some time tomorrow.

dan131riley · 2018-01-24T16:01:29Z

mkFit/MkBuilder.cc

    //   printf("  %4d with q=%+d chi2=%7.3f pT=%7.3f eta=% 7.3f x=%.3f y=%.3f z=%.3f nHits=%2d  label=%4d findable=%d\n",
    //          i, t.charge(), t.chi2(), t.pT(), t.momEta(), t.x(), t.y(), t.z(), t.nFoundHits(), t.label(), t.isFindable());
    // }
  }
 }
+
+void MkBuilder::BackwardFitFV()


I don't think this routine is necessary. MkBuilder::BackwardFit() should work for the FV case; all it should care about is that the EventOfCombCandidates is filled.

It can be dropped, for sure. I tested without, and it returns the same results:
https://kmcdermo.web.cern.ch/kmcdermo/pr122/noFVbkFit/CMSSWVAL/

kmcdermo added 5 commits January 18, 2018 19:23

Validation cleanup:

38fe02c

* remove KNC from benchmarks
* move plotting code from main benchmark script to subscripts
* update and simplify web scripts
* add Config::quality_val flag for doing printout validation

Validation cleanup: address issues raised in PR 119

f9d7420

* Properly use make distclean
* Adjust KNL and SNB ranges
* Add a common SSH variable

Make LOH_USE_PHI_Q_ARRAYS a configurable parameter (remove ifdefs)

ed99696

add PCA calc for all finding methods

7346a88

compiles for validation with fit cmssw tracks

c678ec7

kmcdermo added 4 commits January 22, 2018 19:59

fix STD building with help from MT

83859fc

validation cleanups:

281ec69

* Set hit based matching as default for CMSSW validation with built tracks
* Fit tracks use track parameter matching (2-param chi2 + dphi: to be optimized)
* Use ROOTVAL with 10mu sample
* Update scripts to reflect sample names

disable param b-field in bkwd fit for now

a9a2265

Sorta hack to get bkfit to play nice with I/O in BH

ce10ce3

small fixes for scripts

eaf6fac

dan131riley reviewed Jan 24, 2018

remove unnecessary BackwardFitFV()

8819f26

osschar merged commit 67185c2 into trackreco:devel Jan 30, 2018

kmcdermo mentioned this pull request Jan 30, 2018

Validation tasks for follow-up #125

Closed

kmcdermo deleted the canonize-prop-kpm branch July 18, 2018 20:43
