
vhbb#561 #564

Merged
merged 8 commits into vhbb:vhbbHeppy80X on Jan 10, 2017

Conversation

@veelken commented Nov 7, 2016

Dear all,

we would like to propose three modifications to vhbb.py and vhbbobj.py for the ttH, H->tautau analysis. We believe the modifications will not introduce any problems for anyone, except for an increase in file size for some of the samples. Please let us know what you think.

Cheers,

Karl and Christian
for ttH, H->tautau

vhbb.py

In the VHbbAnalyzer config we would like to set the flag passall=True to disable the numJets >= 2 cut.
The motivation for this modification is that in the ttH multilepton and ttH, H->tautau analyses we measure the electron charge misidentification rate in Z->ee events. Because the electron charge misidentification rate is very small, we need the full event statistics.
We would like to add
  JetAna.lepSelCut = lambda lep : (abs(lep.pdgId()) == 11 and lep.relIso03 < 0.4) or (abs(lep.pdgId()) == 13 and lep.relIso04 < 0.4)
so that jets don't get cleaned with respect to leptons that pass only the miniIso but not the standard isolation cuts. If the jet collection is cleaned with respect to leptons passing the miniIso, about 1% of b-jets get cleaned, which we would like to recover.
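To illustrate what this selector feeds into (a schematic sketch only, not the actual Heppy JetAnalyzer code; clean_jets and delta_r are illustrative names): only leptons passing lepSelCut take part in the jet-lepton cross-cleaning, roughly like

  import math

  def delta_r(a, b):
      # deltaR between two objects exposing eta()/phi() accessors
      dphi = abs(a.phi() - b.phi())
      if dphi > math.pi:
          dphi = 2.0 * math.pi - dphi
      return math.hypot(a.eta() - b.eta(), dphi)

  def clean_jets(jets, leptons, lepSelCut, drMax=0.4):
      # drop jets that overlap with any lepton passing lepSelCut
      cleaningLeptons = [lep for lep in leptons if lepSelCut(lep)]
      return [jet for jet in jets
              if all(delta_r(jet, lep) > drMax for lep in cleaningLeptons)]

With the selector above, leptons that pass only the miniIso never enter cleaningLeptons, so the nearby jets are kept.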
vhbbobj.py

We would like to replace the line
  NTupleVariable("eleooEmooP", lambda x : abs(1.0/x.ecalEnergy() - x.eSuperClusterOverP()/x.ecalEnergy()) if abs(x.pdgId())==11 and x.ecalEnergy()>0.0 else 9e9 , help="Electron 1/E - 1/P"),
by
  NTupleVariable("eleooEmooP", lambda x : (1.0/x.ecalEnergy() - x.eSuperClusterOverP()/x.ecalEnergy()) if abs(x.pdgId())==11 and x.ecalEnergy()>0.0 else 9e9 , help="Electron 1/E - 1/P"),
i.e. remove the abs function. In the ttH multilepton and ttH, H->tautau analyses, events with negative 1/E - 1/P values are cut; the abs in the computation of eleooEmooP means we cannot apply that cut, which causes a synchronization problem with other groups. In our opinion it is safe to remove the abs function, as it can always be applied later at analysis level.
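To illustrate that last point (a sketch only, not code in this PR): the old unsigned value is trivially recovered downstream with a single abs, e.g.

  def eleooEmooP_unsigned(signedValue):
      # recover the pre-change |1/E - 1/P| from the signed value stored after this PR
      return abs(signedValue)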

Christian Veelken added 5 commits September 6, 2016 16:54
  JetAna.lepSelCut = lambda lep : (abs(lep.pdgId()) == 11 and lep.relIso03 < 0.4) or (abs(lep.pdgId()) == 13 and lep.relIso04 < 0.4)
for jet cleaning, i.e. don't clean the jet collection with respect to leptons that pass only mini-isolation and not "standard" isolation
  "eleooEmooP", lambda x : abs(1.0/x.ecalEnergy() - x.eSuperClusterOverP()/x.ecalEnergy())
by
  "eleooEmooP", lambda x : 1.0/x.ecalEnergy() - x.eSuperClusterOverP()/x.ecalEnergy()
i.e. keep sign of (E-P)/E for electron ID variable
…ts are kept in Ntuples

(needed for measurement of electron charge misidentification rate and of jet->lepton fake-rate in ttH, H->tautau analysis)
Christian Veelken added 2 commits November 28, 2016 13:46
- resolved merge conflicts
- added triggers for ttH, H->tautau analysis for full 2016 dataset

Conflicts:
	VHbbAnalysis/Heppy/python/TriggerTable.py
	VHbbAnalysis/Heppy/python/TriggerTableData.py
@veelken commented Dec 21, 2016

Hi Andrea,
I merged the trigger changes from Michele with mine. Please merge this PR now.
Thank you,
Christian

@arizzi commented Dec 21, 2016

this PR sets passall=true, which is not ok. If there is a specific class of events you want to save, we have to let those pass explicitly; we cannot set passall=true for space reasons, especially when running on fully hadronic stuff.

@veelken commented Dec 21, 2016

Hi Andrea,

the effect of the passall=true flag is that events with fewer than 2 jets no longer get cut. The nJets >= 2 cut is very loose for fully hadronic events; I expect it mainly removes Z->ll and W->lnu events. Unfortunately, we do need an inclusive sample of Z->ee events for estimating the backgrounds arising from electron charge misidentification in the ttH, H->tautau analysis. I would prefer to keep the event processing simple and not run different configs (with and without the nJets >= 2 cut) on different samples. If disk space is a problem, we can store the VHbb Ntuples in Tallinn if you like (we have enough disk space).

@arizzi commented Dec 21, 2016

you only need Z->ee? Then why have passall=true? Can't we just whitelist Vtype=1?

@veelken commented Dec 21, 2016 via email

@degrutto

Hi Andrea, Christian,

sorry to chime in:
having passall=true is actually a recurrent question/request that I have heard from many people using the vhbb ntuples (and many who never said so on GitHub).

May I ask you, @veelken, whether you know, for example for the QCD backgrounds, how many additional events we would put on tape?

I think this is the only sample that we are afraid of exploding, @arizzi, true? Maybe also the data (MET dataset? BTagCSV?)

 Michele

@arizzi commented Dec 21, 2016

Michele, asking with no motivation is not going to go anywhere. What is the reason others want passall=true?

@jpata commented Dec 21, 2016

Ciao, I tend to agree with @arizzi that we have to be conservative about space, for the following reasons:

  1. At T2_CH_CSCS, where I monitor the space, vhbb ntuples are among the largest individual user datasets, on the scale of tens of TB (and that is a subset of the full vhbb datasets).
  2. Inflating the file sizes with rarely-used events directly affects analyses downstream, where the mostly-useless events have to be filtered out every time, with costs in I/O, CPU, and analysis job reliability.

Possibly for the special cases one can consider making a separate crab run, but we have to keep throwing events away at every stage we can.

@degrutto

the most common comment is that it makes life easier for signal and background cut flow/efficiency studies
(so I guess it is more relevant for signal samples)

btw, it would be interesting to quantify this for QCD

@veelken commented Dec 21, 2016 via email

@degrutto

Hi,
actually it is very easy to check, just by looking at the samples we have for V24.

QCD_HT200to300:

  root -l root://stormgf1.pi.infn.it:1094///store/user/arizzi/VHBBHeppyV24/QCD_HT200to300_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/VHBB_HEPPY_V24_QCD_HT200to300_TuneCUETP8M1_13TeV-madgraphMLM-Py8__spr16MAv2-puspr16_80r2as_2016_MAv2_v0-v1/160909_064004/0000/tree_4.root
  tree->GetEntries()/Count->Integral()
  (const double)8.96610860519146069e-01

but for the lower HT bin, QCD_HT100to200:

  root -l root://stormgf1.pi.infn.it:1094///store/user/arizzi/VHBBHeppyV24/QCD_HT100to200_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/VHBB_HEPPY_V24_QCD_HT100to200_TuneCUETP8M1_13TeV-madgraphMLM-Py8__spr16MAv2-puspr16_80r2as_2016_MAv2_v0-v1/160909_063817/0000/tree_10.root
  root [1] tree->GetEntries()/Count->Integral()
  (const double)8.72890038750593900e-02

so this one will be problematic: only about 9% of the processed events are currently stored, so passall=true would inflate it by roughly a factor of 11.

@veelken commented Dec 21, 2016

Hi Michele,

thank you for the numbers. I didn't know it was that easy to get them!
How shall we proceed now?
Andrea, would it be OK with you to merge PR #561, and we then submit the crab jobs for the QCD MC samples with passall set to false?

Cheers,

Christian

@arizzi commented Dec 21, 2016

no, we are not going to do it.
The DY sample (i.e. a big one) has a reduction factor of 0.13.
As Joosep explained, we do not waste resources just because people don't like to get the ratios out of the histograms. We cannot pay a factor of 5-10x on the ntuple sizes. We currently distribute 2-3 copies of the ntuples, so having space at a given T2 for one version of the ntuples is not the point (we need several sites, plus we need one site storing the whole history of VHbb ntuples for reproducibility, reanalysis for combinations, etc.).
Let me add that our event size is already too big and that, considering many of the variables we compute are jet-based, I see no point in storing events without any jets.

PS: QCD events fail for other reasons too (not just the 2-jet requirement), and passall would let those through as well.

@veelken commented Dec 22, 2016

Hi,

I have set passall=False so that PR #561 can be merged.
Would it be an option to build two versions of vhbbHeppy for the ReReco data and MC, one with passall=True and one with passall=False, so that a few VHbb Ntuples could be produced centrally with passall set to True?
The alternative is of course that the people working on ttH, H->tautau organize the production of samples with passall=True themselves.
What do you think?

Cheers,

Christian

@arizzi commented Dec 22, 2016 via email

@veelken commented Jan 5, 2017

Hi Andrea,

we don't apply a cut on vtype in the ttH, H->tautau analysis. We do apply a cut of 60 < mll < 120 GeV in some of our control regions/auxiliary measurements, but not in all of them. The best option in my opinion would still be to set passall=true based on the sample name.
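Schematically, I mean something like this in vhbb.py (illustrative only; the analyzer handle and the sample-name list below are placeholders, not actual names in the repository):

  # hypothetical: enable passall only for the samples where we need inclusive events
  samplesNeedingPassall = ["DYJetsToLL", "DoubleEG"]   # placeholder sample names
  VHbbAna.passall = any(name in sample.name for name in samplesNeedingPassall)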

I noticed that PR #561 is not merged yet. Can you please merge it now?
(passall is set to false by default now)
What is the status/plan/timescale for the VHbb Ntuple production for the ReReco data and MC?

Cheers,

Christian

@arizzi commented Jan 6, 2017

what does "mll" means for VTypes where there are not two leptons selected?!?

@veelken commented Jan 6, 2017

Hi Andrea,

we compute mll by looping over the selLeptons branch, applying some lepton selection criteria, and then adding the four-vectors of the pairs of leptons that pass the selection.
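Schematically (illustrative names only; passesSelection stands for our lepton selection criteria, and p4() is the usual four-vector accessor):

  import itertools

  def mll_values(selLeptons, passesSelection):
      # invariant mass of every pair of leptons passing the selection
      goodLeptons = [lep for lep in selLeptons if passesSelection(lep)]
      return [(lep1.p4() + lep2.p4()).M()
              for lep1, lep2 in itertools.combinations(goodLeptons, 2)]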

As I mentioned before, I think it is best not to use vtype and mll, but to set passall=true based on the sample name.

Cheers,

Christian

@arizzi commented Jan 6, 2017

well, "it is better" is a relative concept, for sure it is not better for who has to babysit tens of thousands of jobs and add the complication of different settings.
I'm not sure why you do not use vtype to classify Zee events of a control region. I mean VHbb ntuples are based on the vtype to setup cuts and fill variables, so just asking "remove any selection you do because we do not like the vtype" is not helping here. At some point we can decide to have different production campaign if different analysis have different needs. The ttH bb guys are already running their own campaign with additional MEM stuff, so it could be better to prepare a ttHtt.py and ttHtt-data.py config that you run for ttHtt with passall=true and we clean up vhbb.py from what we do not need in H->bb related analysis. We should understand what is the cost of different choices (cpu,diskspace,people time)

@veelken commented Jan 6, 2017

Hi Andrea,

sorry for not being clearer about it: Z->ee is only one of the control regions we need for ttH, H->tautau. We need other control regions to measure tight/loose lepton ratios, and these control regions use events with single leptons (they are dominated by QCD; we don't need QCD MC for these measurements though, only data).
The selection of the control regions used to measure the tight/loose lepton ratios is work in progress, and I would very much prefer to avoid hardcoding the cuts at Ntuple production time, at least for this round of the VHbb Ntuple production.

Cheers,

Christian

@arizzi commented Jan 6, 2017

this doesn't help. On data, passall=false is even more important, because that's where we get most of the reduction.
For the VH channels the ntuple production always assumes that the lepton selection has already been defined/optimized in dedicated sample productions (e.g. with passall=true) or in previous studies. This is needed because we have to avoid sharing events between analyses in order to keep them statistically independent.

@@ -263,6 +263,7 @@
from PhysicsTools.Heppy.analyzers.objects.JetAnalyzer import JetAnalyzer
JetAna = JetAnalyzer.defaultConfig
JetAna.calculateSeparateCorrections = True # CV: needed for ttH prompt lepton MVA
JetAna.lepSelCut = lambda lep : (abs(lep.pdgId()) == 11 and lep.relIso03 < 0.4) or (abs(lep.pdgId()) == 13 and lep.relIso04 < 0.4)

what is this meant for? We already have a selection for the selectedLeptons used here.
The only difference seems to be that you do not OR with the miniIsolation; is that the purpose, to remove the miniIsolation?

@veelken commented Jan 10, 2017

Hi Andrea,

yes, the motivation for this change is to not clean the jets with respect to leptons that pass the miniIsolation but fail the standard isolation. As we studied with Lorenzo, the effect of cleaning the jets with respect to leptons passing miniIsolation vs standard isolation is small, at the level of 1%. The main motivation for restoring the "old" jet cleaning behavior is to avoid differences in synchronization with other groups.

@arizzi arizzi merged commit c731df8 into vhbb:vhbbHeppy80X Jan 10, 2017
@arizzi arizzi added this to the V25 milestone Jan 10, 2017