# HIP RDataFrame Demonstration 3a: Loading external files for filtering

In this demonstration we will go through:
1. How to load files external to the program on the C++ side so that the RDF functions can use them
2. How to apply a Golden JSON based cut on events

-- Made by Nico Toikka

## Initialization

Import the only library you'll ever need.

In [1]:
import ROOT

Welcome to JupyROOT 6.30/04


ROOT and RDataFrames (RDFs) can use multithreading implicitly. Enable it with the following command, before any RDataFrame is constructed. If no number is specified, ROOT defaults to all threads that are available to it, so be careful!

In [2]:
ROOT.EnableImplicitMT(4)

## Loading a JSON

Since RDF cannot see what happens on the Python side of things, we need to load the JSON that we use for filtering in C++. Let's do that using [nlohmann's JSON package for C++](https://github.com/nlohmann/json), which is included by default in the LCG stack we are using in SWAN.

In [3]:
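JSON_initialization = """
#include <nlohmann/json.hpp>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

// Global JSON object, visible to every function declared afterwards
nlohmann::json golden_json;

// Parse the given Golden JSON file into the global object
void init_json(const std::string& jsonFile) {
    std::cout << "Initializing JSON file" << std::endl;
    std::ifstream f(jsonFile);
    if (!f.is_open()) {
        throw std::runtime_error("Could not open JSON file: " + jsonFile);
    }
    golden_json = nlohmann::json::parse(f);
}
"""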

ROOT.gInterpreter.Declare(JSON_initialization)

True

With the function `init_json` we can now load the JSON file on the C++ side from Python. We'll be looking at the data-taking era 2023D, so let's use the Golden JSON corresponding to it.

**Note:** C++ functions declared with `gInterpreter` are accessible through the `ROOT` module in Python.

In [4]:
golden_json = "/eos/user/c/cmsdqm/www/CAF/certification/Collisions23/Cert_Collisions2023_eraD_369803_370790_Golden.json"

ROOT.init_json(golden_json)

Initializing JSON file
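For orientation, a Golden JSON maps each certified run number (as a string key) to a list of inclusive `[first, last]` luminosity block ranges. The sketch below writes a toy file with that layout and inspects it with Python's `json` module. The run numbers and ranges are made up for illustration and are not taken from the real 2023D file.

```python
import json
import os
import tempfile

# Toy certification data: made-up runs and ranges, NOT the real 2023D content.
toy_golden = {
    "100001": [[1, 50], [60, 75]],
    "100002": [[10, 20]],
}

# Write the toy data to a temporary file and read it back,
# the same way one would open a real Golden JSON.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_golden.json")
    with open(path, "w") as f:
        json.dump(toy_golden, f)
    with open(path) as f:
        loaded = json.load(f)

# Count certified luminosity blocks per run; ranges are inclusive on both ends.
certified_blocks = {
    run: sum(last - first + 1 for first, last in ranges)
    for run, ranges in loaded.items()
}
print(certified_blocks)
```

Note that the keys are strings, which is why the C++ filter function converts the run number with `std::to_string` before the lookup.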


## Using the JSON in a function

Now the JSON exists on the C++ side. Let's define a function that takes the `run` and `luminosityBlock` values and tells us whether the event passes the Golden JSON.

In [5]:
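golden_function = """
// Return true if (run, lumi) lies inside a certified range of the Golden JSON
bool isGoodLumi(int run, int lumi) {
    // find() is read-only; operator[] would insert a null entry for unknown runs,
    // which is not thread-safe once implicit multithreading is enabled
    const auto runEntry = golden_json.find(std::to_string(run));
    if (runEntry == golden_json.end()) {
        return false;
    }
    for (const auto& lumiRange : *runEntry) {
        if (lumi >= lumiRange[0] && lumi <= lumiRange[1]) {
            return true;
        }
    }
    return false;
}
"""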

ROOT.gInterpreter.Declare(golden_function)

True

Great! Let's do a test with Python to see that the function is working.

In [6]:
print(ROOT.isGoodLumi(369927, 200), ROOT.isGoodLumi(369927, 500))

True False
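A pure-Python reference implementation of the same check can be handy for cross-checking the C++ function on a few (run, lumi) pairs. The sketch below is a minimal version run on toy data; `is_good_lumi` and the toy run number are names introduced here for illustration only.

```python
def is_good_lumi(golden, run, lumi):
    """Return True if (run, lumi) falls inside any certified range of `golden`."""
    # Runs missing from the JSON have no certified ranges at all.
    ranges = golden.get(str(run), [])
    # Both range edges are inclusive, as in the C++ version.
    return any(first <= lumi <= last for first, last in ranges)


# Toy data with made-up values, not the real Golden JSON.
toy_golden = {"100001": [[1, 50], [60, 75]]}

checks = [
    is_good_lumi(toy_golden, 100001, 50),  # upper edge of a range
    is_good_lumi(toy_golden, 100001, 55),  # gap between two ranges
    is_good_lumi(toy_golden, 999999, 1),   # run not present in the JSON
]
print(checks)
```

With the real file, one could load it with `json.load` and compare the result of `is_good_lumi` to `ROOT.isGoodLumi` for the same inputs.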


## Using the JSON in an RDF

Everything done previously was just function definitions with extra steps. A bit annoying, yes, but currently necessary and likely the cleanest way to do things. There are also options for loading C++ modules directly, or using Python functions with Numba (see [RDF documentation](https://root.cern/doc/master/classROOT_1_1RDataFrame.html#python)). Personally, I have found few examples of and little support for the Numba route, and defining things in the notebook was best for illustration. For real analyses, though, I would recommend keeping the functions you want to use in your RDF in a separate C++ file.
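As a rough sketch of the separate-file approach, the C++ code can live in a header that is pulled in with a single `#include` passed to `gInterpreter.Declare`. The file name `golden_json_helpers.h` and the helpers `write_header` and `declare_header` are hypothetical; in practice the header would be written by hand and kept next to the analysis code rather than generated from Python. The `inline` variable assumes a ROOT build using C++17 or newer.

```python
import tempfile
from pathlib import Path

# Hypothetical header content, mirroring the loader defined earlier in this notebook.
HEADER_SOURCE = r"""
#pragma once
#include <nlohmann/json.hpp>
#include <fstream>
#include <string>

// C++17 inline variable (assumption: ROOT is built with C++17 or newer)
inline nlohmann::json golden_json;

inline void init_json(const std::string& jsonFile) {
    std::ifstream f(jsonFile);
    golden_json = nlohmann::json::parse(f);
}
"""


def write_header(directory, name="golden_json_helpers.h"):
    """Write the helper header into `directory` and return its path."""
    path = Path(directory) / name
    path.write_text(HEADER_SOURCE)
    return path


def declare_header(path):
    """Make the header's functions available to RDF (requires a ROOT installation)."""
    import ROOT  # imported lazily so the file-writing part works without ROOT
    return ROOT.gInterpreter.Declare(f'#include "{path}"')


# Only the file-writing step is exercised here; declare_header needs ROOT.
with tempfile.TemporaryDirectory() as tmpdir:
    header_path = write_header(tmpdir)
    header_written = header_path.exists()
    header_text = header_path.read_text()
print(header_written)
```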

Anyhow, let's now create a dataframe and see how many events pass the golden JSON filter. This time we'll use a file from a ZeroBias dataset, which is stored in `eos`.

In [7]:
file = "/eos/cms/store/data/Run2023D/ZeroBias/NANOAOD/PromptReco-v2/000/370/749/00000/768e67bf-0a90-43b1-b6df-6e8904f3e008.root"
tree = "Events"
df = ROOT.RDataFrame(tree, file)

In [8]:
df_filtered = df.Filter("isGoodLumi(run, luminosityBlock)", "Golden JSON")
df_filtered.Report().Print()

Golden JSON: pass=319293     all=396102     -- eff=80.61 % cumulative eff=80.61 %




OK, so we can see the effect on the number of events. Let's look at the jet pt distribution to see how the filtering affects it. While we're at it, let's choose events that passed the ZeroBias trigger, just to be safe.

In [9]:
# Further filtering
df = df.Filter("HLT_ZeroBias", "ZB trigger")
df_filtered = df_filtered.Filter("HLT_ZeroBias", "ZB trigger")

# Histograms
jet_pt = df.Histo1D(("Jet_pt", "Jet p_{t};p_{t} (GeV);Count", 500, 15, 500), "Jet_pt")
jet_pt_filtered = df_filtered.Histo1D(("Jet_pt_filtered", "Filtered jet p_{t};p_{t} (GeV);Count", 500, 15, 500), "Jet_pt")

In [10]:
%jsroot on
canvas = ROOT.TCanvas("canv", "canv", 400, 400)
jet_pt.Draw()
jet_pt_filtered.Draw("same plc pmc")
canvas.BuildLegend()
canvas.Draw()

And out of curiosity, let's do a report of the cuts, now including the ZeroBias trigger:

In [11]:
df.Report().Print()

ZB trigger: pass=117002     all=396102     -- eff=29.54 % cumulative eff=29.54 %


In [12]:
df_filtered.Report().Print()

Golden JSON: pass=319293     all=396102     -- eff=80.61 % cumulative eff=80.61 %
ZB trigger: pass=94408      all=319293     -- eff=29.57 % cumulative eff=23.83 %
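The numbers in the reports can be reproduced by hand: `eff` is pass/all for that filter alone, while `cumulative eff` is the pass count relative to the events entering the first filter in the chain. The sketch below redoes the arithmetic with the counts printed above; `efficiency` is a helper name introduced here only for illustration.

```python
def efficiency(passed, total):
    """Efficiency in percent."""
    return 100.0 * passed / total


# Event counts copied from the reports above.
n_all = 396102        # events in the file
n_golden = 319293     # pass the Golden JSON filter
n_golden_zb = 94408   # pass Golden JSON and then the ZeroBias trigger
n_zb_only = 117002    # pass the ZeroBias trigger without the Golden JSON filter

golden_eff = efficiency(n_golden, n_all)
zb_after_golden_eff = efficiency(n_golden_zb, n_golden)
cumulative_eff = efficiency(n_golden_zb, n_all)
zb_only_eff = efficiency(n_zb_only, n_all)

print(f"Golden JSON:          {golden_eff:.2f} %")
print(f"ZB after Golden JSON: {zb_after_golden_eff:.2f} %")
print(f"Cumulative:           {cumulative_eff:.2f} %")
print(f"ZB alone:             {zb_only_eff:.2f} %")
```

The trigger efficiency is almost the same with and without the Golden JSON filter (29.57 % versus 29.54 %), which suggests that, at least in this file, the ZeroBias decision is roughly independent of the certification.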
