## Produce plots comparing distribution shapes with ROOT

Running CutLang produces ```histoOut-<adlname>.root``` files, which contain the histograms defined for each analysis region.
This notebook uses ```PyROOT``` to make plots that compare distributions between two such files.

Note that the code compares only the shapes of the distributions: each histogram is scaled so that its integral equals 1.
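
To make the normalization concrete, here is a minimal pure-Python sketch (no ROOT needed) of what scaling to unit area does to a list of bin contents. The helper name `normalize_to_unit_area` and the toy bin contents are illustrative assumptions, not part of CutLang or ROOT.

```python
def normalize_to_unit_area(bin_contents):
    """Return bin contents scaled so that they sum to 1 (hypothetical helper)."""
    total = sum(bin_contents)
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    return [c / total for c in bin_contents]

# Toy example: two samples with very different event counts but the same shape.
signal = [10.0, 20.0, 10.0]
background = [1000.0, 2000.0, 1000.0]

signal_shape = normalize_to_unit_area(signal)
background_shape = normalize_to_unit_area(background)
print(signal_shape)
print(background_shape)
```

After normalization the two toy samples are identical, which is the point of a shape comparison: differences in overall event yield are removed and only the distribution of events across bins remains.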

The code requires only the following inputs:
  * Paths to the two files to compare
  * A list of the regions and histograms you would like to draw
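
As a rough illustration of the second input, the sketch below turns `[region, histogram]` pairs into the `region/histogram` paths that are used to look objects up in the files. The helper name `object_paths` is a hypothetical choice made for this example only.

```python
def object_paths(histoinfos):
    """Turn [region, histogram] pairs into "region/histogram" paths (hypothetical helper)."""
    paths = []
    for entry in histoinfos:
        # Each entry must be exactly two non-empty strings.
        if len(entry) != 2 or not all(isinstance(s, str) and s for s in entry):
            raise ValueError(f"expected [region, histogram], got {entry!r}")
        region, hname = entry
        paths.append(region + "/" + hname)
    return paths

print(object_paths([["boostedH1b", "hST"], ["boostedH2b", "hST"]]))
```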

In [None]:
# Let's start by importing the modules we need
from ROOT import gStyle, TFile, TH1, TH1D, TH2D, TCanvas, TLegend, TColor, TLatex

# Now let's set some ROOT styling parameters:
# You do not need to know what they mean, but can directly use these settings

gStyle.SetOptStat(0)
gStyle.SetPalette(1)

gStyle.SetTextFont(42)

gStyle.SetTitleStyle(0000)
gStyle.SetTitleBorderSize(0)
gStyle.SetTitleFont(42)
gStyle.SetTitleFontSize(0.055)

gStyle.SetTitleFont(42, "xyz")
gStyle.SetTitleSize(0.05, "xyz")
gStyle.SetLabelFont(42, "xyz")
gStyle.SetLabelSize(0.045, "xyz")


In [None]:
# Let's open two histoOut files produced by CutLang, whose histograms we will compare.
# These could, for example, be a signal file and a background file.
fs = TFile("../../src/CMS-B2G-16-024_histoouts/histoOut-CMS-B2G-16-024_RunIIFall15MiniAODv2_TprimeTprime_M-1200_TuneCUETP8M1_13TeV-madgraph-pythia8_flat.root")
fb = TFile("../../src/CMS-B2G-16-024_histoouts/histoOut-CMS-B2G-16-024_RunIIFall15MiniAODv2_TT_TuneCUETP8M1_13TeV-powheg-pythia8_flat.root")
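
A mistyped path is a common reason for the later cells to fail, so it can help to check the paths before opening the files. The following is a minimal pure-Python sketch; the helper name `missing_files` and the file names are assumptions made for illustration.

```python
from pathlib import Path

def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]

# Hypothetical file names, for illustration only.
candidates = ["histoOut-signal.root", "histoOut-background.root"]
for p in missing_files(candidates):
    print("File not found:", p)
```

On the ROOT side, calling `IsZombie()` on a `TFile` after opening it tells you whether the file could actually be read.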

In [None]:
# Now let's draw some histograms. 
# We will compare distributions for different variables.
# You can try this with different histograms in different regions.
# Provide the region and histo names in the following list:
histoinfos = [
# ["<regionname>", "<histoname>"]
    ["boostedH1b", "hST"],
    ["boostedH2b", "hST"],
#    ["boostedHAK82b", "hmAK8jet2b"],
#    ["boostedHCRttjets", "hST"],
#    ["boostedHCRWjets", "hST"],
#    ["boostedW", "hdRlepWjet2"],
#    ["boostedW", "hnbjets"],
#    ["boostedW", "hnWjets"],
#    ["boostedWtau", "hWjetsm"],
#    ["boostedWm", "hWjetstau21"],
#    ["boostedW1", "hminmlj"],
#    ["boostedW2", "hminmlb"],
#    ["boostedW3", "hminmlb"],
#    ["boostedW4", "hminmlb"],
#    ["boostedW5", "hminmlj"],
#    ["boostedW6", "hminmlb"],
#    ["boostedW7", "hminmlb"],
#    ["boostedW8", "hminmlb"],
#    ["boostedWCRWjets0W", "hminmlj"],
#    ["boostedWCRWjetsW", "hminmlj"],
#    ["boostedWCRttjets1W", "hminmlb"],
#    ["boostedWCRttjets2W", "hminmlb"],    
]
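
If you are unsure which region and histogram names exist in a file, you can list them instead of typing them by hand. The sketch below assumes that each region is stored as a directory directly under the file and holds histogram objects, which matches how the notebook addresses them as `region/histogram`. The function name `list_histograms` is hypothetical.

```python
def list_histograms(tfile):
    """Return [region, histogram] pairs found one directory level below the file.

    Assumption: regions are directories, and histograms are objects whose
    class name starts with "TH" (for example TH1D or TH2D).
    """
    pairs = []
    for key in tfile.GetListOfKeys():
        if not key.GetClassName().startswith("TDirectory"):
            continue  # not a region directory
        region_dir = key.ReadObj()
        for hkey in region_dir.GetListOfKeys():
            if hkey.GetClassName().startswith("TH"):
                pairs.append([key.GetName(), hkey.GetName()])
    return pairs

# Usage, with `fs` being an opened ROOT file:
# for region, hname in list_histograms(fs):
#     print(region + "/" + hname)
```

The returned pairs have the same `[region, histogram]` layout as `histoinfos`, so a selection of them can be used directly as input.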


In [None]:
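# Make lists for canvases, texts and legends. Keeping references to these objects
# prevents them from being deleted before they are displayed (a fix needed for ROOT in Jupyter).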
canvases = []
texts = []
legends = []

# Loop over the histograms and prepare the plots:
for hinfo in histoinfos:
    # In which region would you like to draw? You can change the region name. 
    region = hinfo[0]
    # Which histogram would you like to plot?
    hname = hinfo[1]
    print(region+"/"+hname)
    # Get the histograms from the file:
    hsg = fs.Get(region+"/"+hname)
    hbg = fb.Get(region+"/"+hname)

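    # Skip this entry if a histogram is missing or empty (this avoids dividing by zero below)
    if not hsg or not hbg or hsg.Integral() == 0 or hbg.Integral() == 0:
        print("  skipping: histogram missing or empty")
        continue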

    # Format the histograms: scaling, lines, colors, axis titles, etc.
    # You do not need to learn the commands here unless you are really curious.
    # Otherwise just execute the cell.

    # Our purpose in this exercise is to compare the shapes of the signal and background distributions.
    # For a fair comparison, the histograms being compared should have the same area.
    # Therefore we scale each histogram so that its integral equals 1.
    hsg.Scale(1./hsg.Integral())
    hbg.Scale(1./hbg.Integral())
    if hsg.GetMaximum() > hbg.GetMaximum(): 
        hbg.SetMaximum(hsg.GetMaximum()*1.1)
        
    # Histogram style settings:
    hsg.SetLineWidth(2)
    hbg.SetLineWidth(2)

    # Set the colors:
    # Color numbers can be retrieved from https://root.cern.ch/doc/master/classTColor.html
    # (see the color wheel)
    hbg.SetFillColor(400-7) # kYellow - 7
    hsg.SetLineColor(600) # kBlue
    hbg.SetLineColor(400+2) # kYellow + 2

    # Titles and labels.
    # It is enough to set these ONLY FOR THE FIRST HISTOGRAM YOU DRAW,
    # i.e., the one you draw with .Draw(). Histograms drawn afterwards with .Draw("same")
    # only contribute their histogram curve.
    hbg.SetTitle("")
    hbg.GetXaxis().SetTitle(hsg.GetTitle())
    hbg.GetXaxis().SetTitleOffset(1.25)
    hbg.GetXaxis().SetTitleSize(0.05)
    hbg.GetXaxis().SetLabelSize(0.045)
    hbg.GetXaxis().SetNdivisions(8, 5, 0)
    hbg.GetYaxis().SetTitle("fraction of events")
    hbg.GetYaxis().SetTitleOffset(1.4)
    hbg.GetYaxis().SetTitleSize(0.05)
    hbg.GetYaxis().SetLabelSize(0.045)
    
    # Write the region name on the histogram
    t = TLatex(0.60, 0.85, region)
    t.SetTextSize(0.042)
    t.SetTextFont(42)
    t.SetNDC()
    texts.append(t)

    # Make a plot legend
    # Change the entry names to reflect processes you are plotting!
    l = TLegend(0.55, 0.70, 0.88, 0.82)
    l.SetBorderSize(0)
    l.SetFillStyle(0000)
    l.AddEntry(hsg,"TT m=1200", "l")
    l.AddEntry(hbg,"tt+jets", "f")
    legends.append(l)
    
    # Now we make a canvas and draw our histograms
    c = TCanvas("c_"+region+"_"+hname, "c_"+region+"_"+hname, 620, 500)
    c.SetBottomMargin(0.15)
    c.SetLeftMargin(0.15)
    c.SetRightMargin(0.15)
    #c.SetLogy(1)
    hbg.Draw("hist")
    hsg.Draw("hist same")
    c.Draw()
    l.Draw("same")
    t.Draw("same")
    canvases.append(c)
    
# Draw the canvases:
for c in canvases:
    c.Draw()
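
To keep the plots outside the notebook, each canvas can be written to an image file. The sketch below is one possible way to do this, relying on the canvas methods `GetName()` and `SaveAs()`. The function name `save_canvases`, the output directory `plots`, and the default file extension are assumptions made for illustration.

```python
import os

def save_canvases(canvases, outdir="plots", ext="png"):
    """Save each canvas as <outdir>/<canvas name>.<ext> and return the file paths."""
    os.makedirs(outdir, exist_ok=True)
    saved = []
    for c in canvases:
        path = os.path.join(outdir, c.GetName() + "." + ext)
        c.SaveAs(path)  # the output format is chosen from the file extension
        saved.append(path)
    return saved

# Usage, after the canvases list has been filled:
# save_canvases(canvases, outdir="plots", ext="pdf")
```

Because the canvas names are built from the region and histogram names, each output file is easy to trace back to the entry in `histoinfos` that produced it.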
