Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCA estimation + validation, cleanups #122

Merged
merged 11 commits into from
Jan 30, 2018

Conversation

kmcdermo
Copy link
Collaborator

@kmcdermo kmcdermo commented Jan 19, 2018

This PR is a mix of things, namely cleanups and PCA estimation and validation. For the cleanups, I followed up on issues raised in PR #119 (validation and removing the ifdef for the Phi-Q arrays), as well as some miscellaneous cleanup in the validation.

I added a new flag called "quality_val" that needs to be specified at run time if we want to run the printout-only validation. In the benchmarking, we actually run the printout validation which includes remapping hits (expensive) as well as track matching, but suppress the printouts anyways. So by disabling this section during the benchmarks, we can run them even faster now.

The main addition, however, is the inclusion of a PCA estimator, which we use to propagate the backward fit tracks to the PCA. Since this PR is 90% complete (see to-do list below), I do not have any fancy diagrams or presentations to show yet. The main idea is that we can project a straight line back in r-z (using theta) to compute the perpendicular distance between the line and a reference point (for now, the origin) to yield r-z coordinates of the PCA. Then, we propagate all tracks to their respective z-PCA using helixAtZ propagation.

For validation of this process, I dump the results of the PCA computation into the fitTracks_ collection, so as to be compared separately from the candidateTracks_. And since this was always motivated by comparing with CMSSW track parameters at the PCA, I use the track parameter matching for the fit tracks (although the default for candidate tracks is also the track param matching).

I have a number of problems I still want to fix (the biggest being that BkFitInputTracks() crashes on standard building when multithreaded) before this can be merged. However, I wanted to show off the validation by comparing built to fit track efficiencies:


Things to do and discuss:

  • drop sim track validation from benchmarking: I don't really check it any more, although we could use it as a diagnostic by running it over the high stats 10-muon sample UPDATE: use 10mu sample for root val
  • fix the standard building with the backwards fit: @osschar , I have traced it down to BkFitInputTracks() ... but only when multithreaded... DONE
  • turn off param b-field for fitting and see effect on validation (already saw that this helped in the last PR) disabled for now
  • move CMSSW validation for candidate tracks to hit matching (default is track param based): we have seen that the track param matching fails anyhow for low pt tracks as dphi goes to NaN. We also know (for now) the hits are the same between the candidate tracks and the fit tracks (could add hit rejection during backward fit at some point) DONE
  • tune windows for chi2 (mom. theta, 1/pt) and dphi for fit tracks, or move to 3-parameter chi2 (mom. theta, 1/pt, mom. phi)
  • use external reference point other than origin to define PCA. Have demonstrated internally with @cerati @slava77 @osschar that we can further improve track parameter matching by storing the point used by CMSSW per event to define the reference for the PCA. --> UPDATE: I made a few short slides about various PCA estimates and their uses: pca.pdf

	   * remove KNC from benchmarks
	   * move plotting code from main benchmark script to subscripts
	   * update and simplify web scripts
	   * add Config::quality_val flag for doing printout validation
    * Properly use make distclean
    * Adjust KNL and SNB ranges
    * Add a common SSH variable
@kmcdermo
Copy link
Collaborator Author

I should note that above, there is some improvement at low pT, although not as much as I would have liked... retrying with the param b-field turned off for the backwards fit again improves things...

eff vs pt:

As you can see, the matching of fit tracks with the param b-field are improved at low pT, and w/o the param b-field are improved significantly (10-20%).

Looking at vs. eta:

It is clear that b-field is hurting us for some reason, particularly at low pT near the transition region... although it helps in the very forward reasons as one might expect where the b-field can be different by as much as 10% compared to the central region.


** Aside: the HEAD of devel matches exactly with build track validation, as expected.

	   * Set hit based matching as default for CMSSW validation with built tracks
	   * Fit tracks use track parameter matching (2-param chi2 + dphi: to be optimized)
	   * Use ROOTVAL with 10mu sample
	   * Update scripts to reflect sample names
@kmcdermo
Copy link
Collaborator Author

kmcdermo commented Jan 23, 2018

I am done making commits for this PR for now. I will upload the full set of validation plots once done (of course, without KNL). There is quite a bit to be discussed, but some of it best tabled until the right pieces are in place.

Things to do:

  • Investigate why parameterized magnetic field seems to hurt more than help...

  • Decide which PCA estimation we wish to use. I uploaded a pdf in the PR description depicting what we could do and what we would need for input.

  • Optimize cuts for track param matching in bins of pt/eta for 2-param chi2 (1/pt, eta) + dphi. OR investigate possibility of 3-param chi2 (include phi). This optimization should be performed once we have decided which PCA estimation to use, as I do not want to do this optimization iteratively.

  • Potentially move BkFit back inside building. After discussing with Matevz, this is really the right thing to do, although there is no real harm as it is now, as we are first trying to validate this routine. This would mean copying out whatever is stored in matriplex format to the ev.candidateTracks_ vector after building, and then after the BkFit (while still inside the TBB tasks), copy out matriplex info again to ev.fitTracks_. This may require abstracting the copy I/O. We could also include the BkFit + I/O gymnastics in the benchmark timing as well.

@kmcdermo
Copy link
Collaborator Author

kmcdermo commented Jan 24, 2018

Full plots are here: https://kmcdermo.web.cern.ch/kmcdermo/pr122/full_benchmarks/

I will post an analysis of this some time tomorrow.

// printf(" %4d with q=%+d chi2=%7.3f pT=%7.3f eta=% 7.3f x=%.3f y=%.3f z=%.3f nHits=%2d label=%4d findable=%d\n",
// i, t.charge(), t.chi2(), t.pT(), t.momEta(), t.x(), t.y(), t.z(), t.nFoundHits(), t.label(), t.isFindable());
// }
}
}

void MkBuilder::BackwardFitFV()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this routine is necessary, MkBuilder::BackwardFit() should work for the FV case, all it should care about is that the EventOfCombCandidates is filled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be dropped, for sure. I tested without, and it returns the same results:
https://kmcdermo.web.cern.ch/kmcdermo/pr122/noFVbkFit/CMSSWVAL/

@osschar osschar merged commit 67185c2 into trackreco:devel Jan 30, 2018
@kmcdermo kmcdermo deleted the canonize-prop-kpm branch July 18, 2018 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants