Recovering short tracks continuing saga: adding overlapping hits + outlier rejection #223
Investigating the short-track inefficiency, I have tracked built track candidates that are found and matched by CMSSW, and looked for those tracks in MkFit. Aim: identify potential For N(layers)=4:
For N(layers)=10:
Few observations:
Following the discussion in today's meeting, we have a new proposal for how we want to attack duplicate hits. We want to hear from @cerati and @makortel on this new proposal: New proposal:
In order to distinguish hits in different modules, we will need a change in the data format of the Hit class, adding a member that encodes the module number. This needs to happen for both the pixels and strips, and will require a refresh of the binary files. There was some discussion on re-ordering stereo layers, which can happen simultaneously or sequentially. One possible addendum to this proposal is to perform outlier rejection by storing the chi2 of each hit added to the track, or its r-phi/r-z distance from the track, and do a cheap rejection after 1. and 3. In the meantime, we are making a few plots to check whether overlapping pixels or overlapping strips are more important for efficiency. We will eventually need volunteers to tackle the segments of this proposal, as they are mostly orthogonal.
I was wondering whether we can figure out if the hit is on another module without the detailed layer info and the format update.
First of all, I still think that it's worth exploring ways to increase the efficiency by avoiding adding spurious hits, for instance by avoiding two consecutive '-1' hits or a sequence of alternating '>0' and '-1' hits. The proposed plan for overlapping hits sounds very ambitious, so I would like us to figure out whether all steps are needed and whether we can stage the development. So here are my thoughts:
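A minimal sketch of how the two spurious-hit patterns mentioned above could be detected on a candidate's hit-index list. The function names and the `maxIsolated` threshold are hypothetical and not part of mkFit; the sketch also assumes that any index `>= 0` is a real hit and that `-1` marks a missing hit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if the list contains two consecutive -1 (missing) entries.
bool hasConsecutiveMisses(const std::vector<int>& hitIdx) {
  for (std::size_t i = 1; i < hitIdx.size(); ++i) {
    if (hitIdx[i] == -1 && hitIdx[i - 1] == -1) return true;
  }
  return false;
}

// Hypothetical helper: counts -1 entries sandwiched between two real hits,
// i.e. the "hit, miss, hit" alternation pattern.
int countIsolatedMisses(const std::vector<int>& hitIdx) {
  int n = 0;
  for (std::size_t i = 1; i + 1 < hitIdx.size(); ++i) {
    if (hitIdx[i] == -1 && hitIdx[i - 1] >= 0 && hitIdx[i + 1] >= 0) ++n;
  }
  return n;
}

// Assumed policy: flag a candidate with any double miss, or with more than
// maxIsolated alternations. The threshold would need tuning on real data.
bool looksSpurious(const std::vector<int>& hitIdx, int maxIsolated) {
  assert(maxIsolated >= 0);
  return hasConsecutiveMisses(hitIdx) || countIsolatedMisses(hitIdx) > maxIsolated;
}
```

Such a check could run either while building (to veto the next '-1') or as a cheap filter on finished candidates; which is better is not settled in this thread.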
The final fit is a complicated topic. CMSSW does many things in this step (outlier rejection based on chi2, outlier rejection based on pixel templates, evaluation of residuals for alignment, ...). I do not think we want to replace it. We may convince people that we can make a good enough estimation of the track parameters at the interaction point for most of the tracks, so that the slow CMSSW final fit becomes useful only for a subset of tracks (muons, bjets, alignment). In order to do the chi2-based outlier rejection, CMSSW has to store the fwd and bwd track states at each hit, and I think this would kill us. We could come up with poor man's approaches, like somehow combining the fwd and bwd chi2 (statisticians may not like it, but that's fine as long as we make it work...).
@cerati : allow me to add some clarification and some context to the discussion today
Sure, we don't necessarily need to do the update. We don't even need to trigger a seed region rebuilding, if we see that we recover enough efficiency. The idea for doing the update was that it helps narrow the search for more duplicates in the next layers of the pixels.
So the pickup of strips in the backward fit is actually the original proposal in this issue. However, as @osschar was pointing out, we already have the select hit indices + chi2 testing ready to go in the forward propagation, and we would be effectively relaunching the track finding on the backward fit if we did the pickup for strips on the bwd fit, where there are many more strip layers than pixel layers (so it would be quite expensive). As @mmasciov has been showing, there are plenty of duplicate hits in very long tracks (i.e. those that reach the outer layers), and removing them from CMSSW tracks lowers CMSSW's efficiency. So it would seem there is a need to add overlaps in the outer layers.
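One possible reading of the "poor man's" idea, as a sketch only: keep a single chi2 increment per hit from each pass instead of full track states, and average the two. The averaging rule, the function name and the cut are assumptions made for illustration; this is not what CMSSW does and it is not statistically rigorous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: returns the position of the hit with the largest combined chi2
// above `cut`, or -1 if no hit exceeds the cut. chi2Fwd[i] and chi2Bwd[i] are
// the per-hit chi2 increments stored during the forward and backward passes.
int worstOutlier(const std::vector<float>& chi2Fwd,
                 const std::vector<float>& chi2Bwd,
                 float cut) {
  assert(chi2Fwd.size() == chi2Bwd.size());
  int worst = -1;
  float worstVal = cut;
  for (std::size_t i = 0; i < chi2Fwd.size(); ++i) {
    // Assumed combination: plain average of the two increments.
    const float combined = 0.5f * (chi2Fwd[i] + chi2Bwd[i]);
    if (combined > worstVal) {
      worstVal = combined;
      worst = static_cast<int>(i);
    }
  }
  return worst;
}
```

This needs two floats per hit rather than two full states with covariances, which is the storage cost the comment is worried about.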
We mean that we can decide if we want to basically merge the secondary hits into the main hit array, and perform the prop+update with them. May not be needed, but can be an option (since we are proposing to NOT do the prop+update with secondary overlap hits in the forward propagation).
Yes, this is similar to what I originally proposed as well, basically a single bit that is written out in the binary file that determines an "inner" and "outer" overlap. @osschar thinks we may be able to use a short or some shortened int type to encode a module number (by removing bits that encode layer number).
re: final fit+outliers @cerati My understanding from @makortel was that if we give CMSSW our tracks after the building, the fitting in CMSSW handles the outlier rejection for us. Are you saying this is not actually the case because we are not passing the fwd+bwd hit chi2s? I guess a simple test is just to see, for mkFit tracks, whether nHits before CMSSW fitting differs from nHits after CMSSW fitting. I think we are proposing a poor-man's outlier rejection with a final fit on our side because we know how well fitting scales.
I have not seen the evidence for the impact of strip duplicate hits on the efficiency (I know @mmasciov has been posting stuff on the Skype chat but I was not able to figure out how it relates to this). The CMSSW final fit/outlier rejection happens regardless of what we do in mkFit. I am talking about ways to have the outlier rejection in a final fit within mkFit - my main point is that it is probably doable but not straightforward, since our fitter stores significantly less information than the CMSSW one, so we cannot replicate what CMSSW does. I agree it would be nice if we also took care of the fit (and made it faster), but if a CMSSW final fit will stay for other reasons (alignment, proper handling of dead modules, etc.) then we should not waste time working on our own version.
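A sketch of the shortened-module-number idea, under loudly stated assumptions: the bit layout below (low 5 bits for layer, module number above them) is invented for illustration and is NOT the CMSSW DetId layout, and the `Hit` struct is a stand-in for the real mkFit class. The premise is that the layer is already implied by the container a hit lives in, so only the module part needs storing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical raw-ID layout (not the real DetId): [ module | layer(5 bits) ].
constexpr unsigned kLayerBits = 5;
constexpr std::uint32_t kLayerMask = (1u << kLayerBits) - 1u;

constexpr std::uint32_t makeRawId(std::uint32_t layer, std::uint32_t module) {
  return (module << kLayerBits) | (layer & kLayerMask);
}

// Drop the layer bits and keep the module number in 16 bits.
inline std::uint16_t compactModuleId(std::uint32_t rawId) {
  const std::uint32_t module = rawId >> kLayerBits;
  assert(module <= 0xFFFFu && "module number does not fit in 16 bits");
  return static_cast<std::uint16_t>(module);
}

// Stand-in for the Hit class with the proposed extra member.
struct Hit {
  float x, y, z;
  std::uint16_t moduleId;
};

// Two hits on the same layer are overlap candidates only if their modules differ.
inline bool onDifferentModule(const Hit& a, const Hit& b) {
  return a.moduleId != b.moduleId;
}
```

Compared with the single inner/outer bit, a module number costs two bytes per hit in this sketch but does not require deciding the overlap side when the binary file is written.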
One addition to the list is the pixel cluster position estimation based on the templates (in offline reco, HLT uses generic CPE).
Why should we specifically care about alignment?
I don't remember off the top of my head how the final fit deals with dead modules (i.e. whether dead modules are dealt with solely in track building or in final fit, or in both), so it should be checked.
My point is that replacing the CMSSW final fit is something we should definitely consider, but before investing time working on it we need to know which outputs we need to produce (and if we can/want to produce those outputs). These outputs are not just the track parameters at the beamline. Another example is the propagation to the calorimeter in PF: does it use the outermost state from the track final fit? By the way, you are probably right that the dead modules are identified just in the building.
As discussed extensively on the group chat, we are proposing a three-step plan for our algorithm to improve the efficiency of short tracks (on top of later tunings of layer window settings, chi2, etc).
The proposal is the following:
For starters, we will rely on CMSSW to give us the final fit with outlier rejection, so we should focus on implementing 2. As @mmasciov pointed out, this means that the mtv-like-val in standalone validation will still be sub-optimal, but hopefully improved. We can always run MTV on the CMSSW side with our tracks after the final fit in CMSSW to see how we do. However, the hope is that by adding some extra hits, even without outlier rejection, we raise our shared hits fraction and recover some efficiency for short tracks.
To implement 2 properly, we will perhaps need to extend the current max length of the hit array from 32 to something greater, so as not to overwrite the last hit index in the array, which would defeat the whole purpose of appending to the list (although now that we stopped appending -1s ad nauseam, it is probably okay...).
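A toy illustration of the capacity concern, assuming (as the sentence above implies) that adding to a full fixed-size hit list overwrites its last slot. `HitList` is a hypothetical stand-in, not the actual mkFit track class, and the capacities used are only examples.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity hit-index list.
template <std::size_t MaxHits>
struct HitList {
  std::array<int, MaxHits> idx{};
  std::size_t n = 0;

  // Assumed behaviour: once full, the last slot is overwritten, so the
  // previously stored last hit index is lost.
  void addHit(int hitIdx) {
    if (n < MaxHits) {
      idx[n++] = hitIdx;
    } else {
      idx[MaxHits - 1] = hitIdx;
    }
  }

  int last() const {
    assert(n > 0);
    return idx[n - 1];
  }
};
```

With a capacity of 4 and five hits added, the fourth hit is silently replaced by the fifth; with a larger capacity all five survive, which is the reason for raising the limit before appending overlap hits.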
We would then have to adapt the current backward fit in MkBuilder to have a window search in between the propagate and update step. This would require that the search is only over hits that are on the opposing overlapping section, so the hits would need an extra bit to say which side of the overlap they are on.
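A sketch of what the window search between the propagate and update steps could look like, restricted to hits on the opposite overlap side. The `LayerHit` struct, the `outerSide` bit, the window parameters and the distance measure are all hypothetical; the real backward fit in MkBuilder works on vectorized structures that are not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-layer hit with the proposed extra overlap-side bit.
struct LayerHit {
  float phi;
  float z;
  bool outerSide;  // which side of the overlap the hit's module is on
};

// Returns the index of the closest hit on the opposite overlap side that lies
// inside the (dphiWin, dzWin) window around the propagated position, or -1.
int findOverlapHit(const std::vector<LayerHit>& hits, float phi, float z,
                   bool primaryOuterSide, float dphiWin, float dzWin) {
  assert(dphiWin > 0.f && dzWin > 0.f);
  const float twoPi = 6.2831853f;
  int best = -1;
  float bestDist = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < hits.size(); ++i) {
    if (hits[i].outerSide == primaryOuterSide) continue;  // same side: not an overlap
    // std::remainder wraps the phi difference into [-pi, pi].
    const float dphi = std::fabs(std::remainder(hits[i].phi - phi, twoPi));
    const float dz = std::fabs(hits[i].z - z);
    if (dphi > dphiWin || dz > dzWin) continue;
    const float dist = dphi / dphiWin + dz / dzWin;  // assumed distance measure
    if (dist < bestDist) {
      bestDist = dist;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

The side-bit test comes first so that hits on the primary hit's own module side are skipped before any geometry is evaluated.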
For 3., we can think about the technical implementation. We could still do this in MkBuilder, or pass the completed built tracks to MkFitter (although all the issues of fitting with different nhits / cand will need to be re-addressed). Perhaps it is better to just re-use the final candidates again from the backward propagation to do the final, final fit with outlier rejection.
In any case, in order to remain vectorized, we can perform the propagation + update regardless of hit goodness, but choose whether to store the update based on an evaluation of the hit's goodness on each layer in outlier rejection (and if it is a hit to be rejected, replace it in the hit list with a -4 or something and simply keep the previously updated parameters).
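A minimal sketch of the "compute for every lane, then select" pattern described above, using one toy parameter per track. The lane count, the `Batch` struct, the chi2 cut and the use of -4 as the rejection marker (taken from the "-4 or something" suggestion) are assumptions for illustration; the real code operates on full parameter vectors and covariance matrices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;   // toy vector width
constexpr int kRejectedHit = -4;    // assumed marker for a rejected hit

// Hypothetical batch of tracks, one lane per track.
struct Batch {
  std::array<float, kLanes> par;    // one toy track parameter per lane
  std::array<int, kLanes> hitIdx;   // hit index used on the current layer
};

// The update has already been computed for every lane (updatedPar, chi2).
// Here we only choose, lane by lane and without branching on control flow,
// whether to store it or to keep the previous parameters.
void selectUpdate(Batch& trk,
                  const std::array<float, kLanes>& updatedPar,
                  const std::array<float, kLanes>& chi2,
                  float maxChi2) {
  assert(maxChi2 > 0.f);
  for (std::size_t i = 0; i < kLanes; ++i) {
    const bool keep = chi2[i] <= maxChi2;
    trk.par[i] = keep ? updatedPar[i] : trk.par[i];
    trk.hitIdx[i] = keep ? trk.hitIdx[i] : kRejectedHit;
  }
}
```

Because both branches of each select are cheap and side-effect free, a compiler can turn the loop into masked or blended vector operations, which is the point of doing the update unconditionally.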
This is tied to Issues #195, #193, #196, and #71 (although more indirectly).