# Introduction to Awkward Jagged Arrays

Collision events recorded at particle accelerators do not all have the same content. One event may contain, for example, two muons, another five, and another none.
The event information is stored in tuples in a ROOT file. Standard libraries such as NumPy do not handle this “uneven” or jagged data structure well.

### 1. The problem with NumPy

In [1]:
import awkward as ak
import numpy as np

In [2]:
# generates a ValueError
np.array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

The code fails because NumPy requires arrays to be rectangular: every row must have the same number of columns. Since the lists here have different lengths ([], [5.5], etc.), NumPy cannot build a regular array and raises a ValueError.
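As a small illustration of why NumPy workarounds fall short, the sketch below forces NumPy to accept the same jagged data by passing dtype=object. The variable names (data, ragged, lengths) are ours, not part of the tutorial. The result is only a one-dimensional array of ordinary Python lists, so per-event operations fall back to Python loops.

```python
import numpy as np

# Same jagged data as above: five "events" with 3, 0, 2, 1 and 4 values.
data = [[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]

# dtype=object avoids the ValueError, but NumPy only sees five opaque
# Python objects (lists), not a two-dimensional numeric array.
ragged = np.array(data, dtype=object)
print(ragged.shape)  # (5,)
print(ragged.dtype)  # object

# Per-event quantities need an explicit Python loop.
lengths = [len(row) for row in ragged]
print(lengths)       # [3, 0, 2, 1, 4]
```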

### 2. The solution: awkward-array

This is where the awkward-array library stands out. It was designed specifically to handle data with nested and variable-length structures, which is exactly what we need.

In [3]:
ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])

It works perfectly! awkward creates an ak.Array that preserves the original data structure.

### 3. Basic Awkward Array Manipulation

An ak.Array behaves very similarly to a NumPy array, but with "superpowers" for jagged data.

In [4]:
array = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
array.tolist()

[[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]

### 4. Slicing (Data selection)

In [6]:
# Access the third event (index 2)
#array[2]
# <Array [3.3, 4.4] type='2 * float64'>

# Access the second particle (index 1) of the last event (index -1)
#array[-1, 1]
# 7.7

# Access the first particle (index 0) of all events from the third onwards
#array[2:, 0]
# <Array [3.3, 5.5, 6.6] type='3 * float64'>

# Access particles from index 1 onwards, for events from the third onwards.
array[2:, 1:]
# <Array [[4.4], [], [7.7, 8.8, 9.9]] type='3 * var * float64'>

We can also select events using a list of Booleans (Boolean mask) or a list of indexes.

In [7]:
# Select the first, third, and fifth events
array[[True, False, True, False, True]]
# <Array [[0, 1.1, 2.2], [3.3, 4.4], [6.6, 7.7, ...]] type='3 * var * float64'>

# Select events by index (indices may repeat)
#array[[2, 3, 3, 1]]
# <Array [[3.3, 4.4], [5.5], [5.5], []] type='4 * var * float64'>

### 5. Advanced Functions and Cuts

This is where you see the true power of awkward for data analysis.

ak.num is one of the most useful functions: it counts the number of elements per event. For example, the number of muons in each collision.

In [20]:
ak.num(array)

We can use ak.num to create cuts at the event level. For example, select only events that have at least one particle.

In [8]:
ak.num(array) > 0

### 6. Combining cuts

Now we can combine these ideas to make complex selections.

In [22]:
array[ak.num(array) > 0, 0]

In [23]:
array[ak.num(array) > 1, 1]

### 7. Particle-level cuts

We can also create masks to filter particles within each event.

In [9]:
cut = array * 10 % 2 == 0

array[cut]

This array, cut, is not just an array of booleans. It’s a jagged array of booleans. All of its nested lists fit into array’s nested lists, so it can deeply select numbers, rather than selecting lists.

### 8. Application: Read ROOT File Data with uproot

As we saw in the previous tutorial, uproot is a library that allows us to read and write ROOT files natively in Python, without needing to have ROOT installed. It integrates perfectly with awkward-array.

In [10]:
import uproot

In [11]:
file = uproot.open("root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root")

file.classnames()

{'Events;75': 'TTree', 'Events;74': 'TTree'}

In [12]:
tree = file["Events"]

### 9. Reading data and applying cuts

Now let's use what we learned from awkward to analyze real data.

In [13]:
muon_pt = tree["Muon_pt"].array(entry_stop=10)
muon_pt

Particle-level cut: Select individual muons with p_T > 20 GeV.

In [14]:
particle_cut = muon_pt > 20

muon_pt[particle_cut]

Event-level cut: Select events that have at least one muon with p_T > 20 GeV.

In [15]:
event_cut = ak.any(muon_pt > 20, axis=1)

muon_pt[event_cut]

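Important note: The order of the cuts matters. It is generally more efficient to apply event cuts first to reduce the number of events to be processed. An event mask has one entry per event, so it must be computed from, and applied to, arrays with the same number of events. The following cells compute the event mask from the original muon_pt and then apply the particle cut followed by the event cut: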

In [41]:
# True for events with at least one muon above 20 GeV (False for empty events)
event_cut = ak.any(muon_pt > 20, axis=1)

muon_pt[event_cut]

Applying the particle-level cut and then the event-level cut gives the expected result: muons with p_T > 20 GeV from events that had at least one muon with p_T > 20 GeV.

In [42]:
cleaned = muon_pt[particle_cut]

final_result = cleaned[event_cut]

final_result.tolist()

[[32.911224365234375, 23.72175407409668],
 [57.6067008972168, 53.04507827758789],
 [23.906352996826172]]

### 10. Combinatorics in Awkward Array

A very common task is to combine particles within an event, for example, to calculate the invariant mass of all muon pairs. awkward has very powerful combinatorial functions.

ak.cartesian creates the Cartesian product of two lists, event by event.

In [16]:
numbers = ak.Array([[1, 2, 3], [], [5, 7], [11]])
letters = ak.Array([["a", "b"], ["c"], ["d"], ["e", "f"]])

pairs = ak.cartesian((numbers, letters))
pairs

The result is an array of “records” (similar to a dict). We can access each part by its field name ("0" or "1").

In [48]:
pairs["0"]

In [49]:
pairs["1"]

There’s also ak.unzip, which extracts every field into a separate array (opposite of ak.zip).

In [59]:
lefts, rights = ak.unzip(pairs)
lefts
rights

ak.combinations creates combinations of elements from the same array. This is perfect for making pairs of particles.

In [17]:
pairs = ak.combinations(numbers, 2)
pairs

In [18]:
lefts, rights = ak.unzip(pairs)

lefts * rights

This last step is the basis for calculating complex observables. For example, instead of lefts * rights, you could have a function that calculates the invariant mass from the four-vectors of lefts and rights.