# Numba - Python JIT

Sometimes defining complex algorithms with columnar operations on awkward arrays can be cumbersome.

For some recursive or iterative algorithms, for-loop style programming is much easier to write.
The problem is that looping over awkward arrays event by event in Python is very slow: the loop runs in the Python interpreter, one element at a time, instead of being "lowered" into awkward's C++ backend.

The **numba** library can be very useful in this scenario: https://numba.pydata.org

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. 


```python
import random

from numba import njit


@njit
def monte_carlo_pi(nsamples):
    # Estimate pi from the fraction of random points in the unit square
    # that fall inside the quarter circle of radius 1
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples


# Identical function, executed by the plain Python interpreter
def monte_carlo_pi_nonumba(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
```

```python
%%timeit
monte_carlo_pi_nonumba(100_000)
```

```
27 ms ± 286 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


```python
%%timeit
monte_carlo_pi(100_000)
```

```
1.18 ms ± 8.49 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```


## Example with awkward arrays

Awkward arrays can easily be used in numba-compiled functions: https://awkward-array.org/doc/main/user-guide/how-to-use-in-numba-features.html

The `ak.ArrayBuilder` interface is available to build awkward arrays on the fly inside numba-compiled functions.

```python
import uproot
import awkward as ak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
import yaml
import numpy as np

# Dataset definitions (list of files per sample)
with open("datasets.yaml") as f:
    datasets = yaml.safe_load(f)

# Open the Events tree of the first DYJetsToLL file and read the first 20000 events
events = uproot.open(f"{datasets['DYJetsToLL']['files'][0]}:Events", num_workers=4)
df = events.arrays(entry_stop=20000)
```


```python
# Pure-Python version: every jets_pt[iev] access is handled by the Python interpreter
def sum_all_jet_pt(jets_pt):
    out = np.zeros(len(jets_pt))
    for iev in range(len(jets_pt)):
        for pt in jets_pt[iev]:
            out[iev] += pt
    return out


# Same loop, compiled by numba
@njit
def sum_all_jet_pt_numba(jets_pt):
    out = np.zeros(len(jets_pt))
    for iev in range(len(jets_pt)):
        for pt in jets_pt[iev]:
            out[iev] += pt
    return out
```

```python
%%timeit
sum_all_jet_pt(df.Jet_pt)
```

```
1.72 s ± 7.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
%%timeit
sum_all_jet_pt_numba(df.Jet_pt)
```

```
1.31 ms ± 9.86 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

```python
%%timeit
ak.sum(df.Jet_pt, axis=1)
```

```
996 μs ± 1.95 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```


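Numba brings little benefit when the operation is simple enough to be expressed with awkward's native functions, which are already implemented and optimized in C++: here `ak.sum` (about 1.0 ms) is even slightly faster than the compiled loop (about 1.3 ms), while the plain Python loop takes 1.7 s.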

```python
@njit
def sum_jets_with_electron_or_muon(jet_pt, ele_pt, mu_pt, builder):
    # Per event: first jet pT + first electron pT, or + first muon pT
    # if there is no electron; an empty list if no such pair exists
    for iev in range(len(jet_pt)):
        builder.begin_list()
        if len(jet_pt[iev]) > 0:
            if len(ele_pt[iev]) > 0:
                builder.append(jet_pt[iev][0] + ele_pt[iev][0])
            elif len(mu_pt[iev]) > 0:
                builder.append(jet_pt[iev][0] + mu_pt[iev][0])
        builder.end_list()
    return builder
```

```python
out = sum_jets_with_electron_or_muon(df.Jet_pt, df.Electron_pt, df.Muon_pt, ak.ArrayBuilder())
out
```

```
<ArrayBuilder [[], [85.6], [], ... [115], [55.4]] type='20000 * var * float64'>
```

```python
%%timeit
sum_jets_with_electron_or_muon(df.Jet_pt, df.Electron_pt, df.Muon_pt, ak.ArrayBuilder())
```

```
4.52 ms ± 37.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


In this case there is also a simple columnar expression for the function above, but it is noticeably slower (about 7.3 ms against 4.5 ms for the numba version).

```python
%%timeit
has_ele = ak.num(df.Electron_pt) > 0
ak.firsts(df.Jet_pt) + ak.firsts(ak.where(has_ele, df.Electron_pt, df.Muon_pt))
```

```
7.32 ms ± 140 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


### Combinations
Writing algorithms on combinations of objects can become cumbersome with columnar expressions (even though `ak.cartesian` and `ak.combinations` exist). Such an analysis is often clearer when expressed as an event loop.

Numba can be quite useful in this context.

# Profiling