# MySchema

This is a rendered copy of `myschema.ipynb`. You can optionally run it interactively on [Binder](https://mybinder.org/v2/gh/yihui-lai/coffea/5a506e83975baa75fdf3ac92de720aef79aa1446).

The interpretation of the TTree data is configurable via schema objects. A schema teaches the event processor how to group variables into collections, so that operations can be run over an entire collection at once.

In this demo, we will create our own schema and implement our own [behaviors](https://awkward-array.readthedocs.io/en/latest/ak.behavior.html). 

First, let's look at the ROOT file with `NanoAODSchema` and see what's inside it. The events object can be instantiated as follows:


In [1]:
from coffea.nanoevents import NanoEventsFactory, BaseSchema, NanoAODSchema

fname = "https://raw.githubusercontent.com/CoffeaTeam/coffea/master/tests/samples/nano_dy.root"
#fname = "data/nano_dy.root"
events = NanoEventsFactory.from_root(
           fname, 
           schemaclass=NanoAODSchema
         ).events()
print(events.Electron.fields)

['deltaEtaSC', 'dr03EcalRecHitSumEt', 'dr03HcalDepth1TowerSumEt', 'dr03TkSumPt', 'dr03TkSumPtHEEP', 'dxy', 'dxyErr', 'dz', 'dzErr', 'eCorr', 'eInvMinusPInv', 'energyErr', 'eta', 'hoe', 'ip3d', 'jetPtRelv2', 'jetRelIso', 'mass', 'miniPFRelIso_all', 'miniPFRelIso_chg', 'mvaFall17V1Iso', 'mvaFall17V1noIso', 'mvaFall17V2Iso', 'mvaFall17V2noIso', 'pfRelIso03_all', 'pfRelIso03_chg', 'phi', 'pt', 'r9', 'sieie', 'sip3d', 'mvaTTH', 'charge', 'cutBased', 'cutBased_Fall17_V1', 'jetIdx', 'pdgId', 'photonIdx', 'tightCharge', 'vidNestedWPBitmap', 'vidNestedWPBitmapHEEP', 'convVeto', 'cutBased_HEEP', 'isPFcand', 'lostHits', 'mvaFall17V1Iso_WP80', 'mvaFall17V1Iso_WP90', 'mvaFall17V1Iso_WPL', 'mvaFall17V1noIso_WP80', 'mvaFall17V1noIso_WP90', 'mvaFall17V1noIso_WPL', 'mvaFall17V2Iso_WP80', 'mvaFall17V2Iso_WP90', 'mvaFall17V2Iso_WPL', 'mvaFall17V2noIso_WP80', 'mvaFall17V2noIso_WP90', 'mvaFall17V2noIso_WPL', 'seedGain', 'genPartIdx', 'genPartFlav', 'cleanmask', 'jetIdxG', 'photonIdxG', 'genPartIdxG']


Now we can copy the skeleton of a schema class:

In [2]:
class MySchema(BaseSchema):
    """
    my schema
    """
    def __init__(self, base_form):
        super().__init__(base_form)
        self._form["contents"] = self._build_collections(self._form["contents"])

    def _build_collections(self, branch_forms):
        output = {}
        return output

    @property
    def behavior(self):
        """
        Behaviors necessary to implement this schema
        """
        behavior = {}
        return behavior

As you can see, this skeleton is too simple to be useful yet: `_build_collections` returns an empty dictionary. If we build `events` again with our own schema, we find that it contains no fields.

In [3]:
events = NanoEventsFactory.from_root(
           fname, 
           schemaclass=MySchema
         ).events()
events.fields

[]

## Create collections

In a schema, `branch_forms` is a Python dictionary, keyed by branch name, that is used to define how branches are grouped.

By default (`BaseSchema`), it is completely flat:
```python
branch_forms = {
    "particle_pt": {},
    "particle_eta": {},
    "particle_phi": {},
    "particle_mass": {},
    ...
}
```

What we want is to put related branches into the same collection:

```python
new_branch_forms = {
    "particle": zip_forms(
        {
            "pt": branch_forms["particle_pt"],
            "eta": branch_forms["particle_eta"],
            "phi": branch_forms["particle_phi"],
            "mass": branch_forms["particle_mass"],
        },
        "particle",
    )
}
```
Then, instead of accessing `particle_pt`, we access `particle.pt`.

All of this is implemented in the schema's `_build_collections` method.
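The grouping idea itself can be sketched without coffea. The snippet below is only an illustration under stated assumptions: `group_by_prefix` is a hypothetical helper, and the placeholder dictionaries stand in for real forms. The real `zip_forms` does more than rename keys, so treat this as a picture of the bookkeeping, not of the library's implementation.

```python
def group_by_prefix(flat_forms, prefix):
    """Collect every 'prefix_field' entry under its bare field name."""
    head = prefix + "_"
    return {
        name[len(head):]: form
        for name, form in flat_forms.items()
        if name.startswith(head)
    }


# Placeholder forms (hypothetical): real forms are produced by the file reader.
flat_forms = {
    "particle_pt": {"id": "pt-form"},
    "particle_eta": {"id": "eta-form"},
    "particle_phi": {"id": "phi-form"},
    "particle_mass": {"id": "mass-form"},
    "run": {"id": "run-form"},
}

# One collection named "particle" whose fields are pt, eta, phi and mass.
grouped = {"particle": group_by_prefix(flat_forms, "particle")}
print(sorted(grouped["particle"]))  # ['eta', 'mass', 'phi', 'pt']
```

Branches that do not start with the prefix (here `run`) are simply left out of the collection.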

For example, let's add the `Electron` collection to our schema. To do this we also need to import `zip_forms`.

In [4]:
# The import path depends on the version of coffea that you are using.
# If the line below fails, try:
# from coffea.nanoevents.schemas import zip_forms
from coffea.nanoevents.schemas.base import zip_forms
class MySchema(BaseSchema):
    """
    my schema
    """
    def __init__(self, base_form):
        super().__init__(base_form)
        self._form["contents"] = self._build_collections(self._form["contents"])

    def _build_collections(self, branch_forms):
        output = {}
        output["Electron"] = zip_forms(
            {
                "pt": branch_forms["Electron_pt"],
                "eta": branch_forms["Electron_eta"],
                "phi": branch_forms["Electron_phi"],
                "mass": branch_forms["Electron_mass"],
                # "xx": branch_forms["Electron_xx"],
            },
            "Electron",
        )
        return output

    @property
    def behavior(self):
        """
        Behaviors necessary to implement this schema
        """
        behavior = {}
        return behavior

We have now created a schema with one collection, `Electron`, built from the branches `Electron_pt`, `Electron_eta`, `Electron_phi`, and `Electron_mass`.
Let's build `events` again.

In [5]:
events = NanoEventsFactory.from_root(
           fname, 
           schemaclass=MySchema
         ).events()
print(events.fields)
print(events.Electron.fields)

['Electron']
['pt', 'eta', 'phi', 'mass']


We can now build a mask and apply a selection to the whole collection at once:

In [6]:
mask = (events.Electron.pt>3) & (events.Electron.pt<60)
good_elec = events.Electron[mask]
print(good_elec.pt)
print(good_elec.eta)

[[], [29.6], [51.7], [10.7, 8.6], [], [9.91, ... [], [15.6], [], [7.68], [], []]
[[], [1.83], [-0.904], [-2.19, 1.65], [], ... [], [-0.0595], [], [0.381], [], []]
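To make explicit what the mask does event by event, here is a plain-Python sketch of the same selection on nested lists. It uses the first four events of `Electron.pt` as printed by the last cell of this notebook. This is only an illustration of the logic: awkward arrays perform the equivalent operation without explicit Python loops.

```python
# First four events of Electron.pt, copied from the notebook output.
pt = [[], [29.6], [60.1, 51.7], [10.7, 8.6]]

# One boolean per electron, nested per event like the data itself.
mask = [[3 < value < 60 for value in event] for event in pt]

# Keep only the electrons whose flag is True, preserving the event structure.
good_pt = [
    [value for value, keep in zip(event, flags) if keep]
    for event, flags in zip(pt, mask)
]
print(good_pt)  # [[], [29.6], [51.7], [10.7, 8.6]]
```

Note that events with no selected electrons remain in the result as empty lists, so the number of events is unchanged.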


However, if you request a branch that your ROOT file does not contain, an error is raised.
For example, uncomment the following line in `MySchema`:
```python
"xx": branch_forms["Electron_xx"],
```
Run the code above again and you will see:
```bash
KeyError: 'Electron_xx'
```
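One possible way to avoid this error is to check which branches exist before building the collection. The sketch below assumes only that `branch_forms` behaves like a dictionary keyed by branch name. `pick_fields` is a hypothetical helper, not part of coffea, and the empty dictionaries are placeholders for real forms.

```python
def pick_fields(branch_forms, collection, fields):
    """Return (found, missing) for the requested fields of one collection."""
    found, missing = {}, []
    for field in fields:
        key = f"{collection}_{field}"
        if key in branch_forms:
            found[field] = branch_forms[key]
        else:
            missing.append(key)
    return found, missing


# Placeholder forms (hypothetical) for the branches used in this demo.
branch_forms = {
    "Electron_pt": {},
    "Electron_eta": {},
    "Electron_phi": {},
    "Electron_mass": {},
}

found, missing = pick_fields(
    branch_forms, "Electron", ["pt", "eta", "phi", "mass", "xx"]
)
print(list(found))  # ['pt', 'eta', 'phi', 'mass']
print(missing)      # ['Electron_xx']
```

Inside `_build_collections`, the `found` dictionary could then be passed to `zip_forms` in place of the hand-written one, and `missing` could be logged or ignored, depending on how strict you want the schema to be.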

## Create behavior

Aside from collections, we can also add `behavior` to collections. This means that additional awkward arrays are generated on the fly by predefined algorithms.

A bunch of common physics behaviors are already provided in coffea, and you can find them in [methods](https://github.com/CoffeaTeam/coffea/tree/a95401cad91e88ceac47a4c693068bc4cbc7d338/coffea/nanoevents/methods).

To write our own coffea behavior, we first need to define the `behavior`.
In the following code, we define `MyBehavior`. It has a single property, `plus1`, which returns `particle.pt + 1` when you access `particle.plus1`.

We also need to add the [`record_name`](https://github.com/CoffeaTeam/coffea/blob/a95401cad91e88ceac47a4c693068bc4cbc7d338/coffea/nanoevents/schemas/base.py#L24) to the collection in the schema's `_build_collections` method (the third argument of `zip_forms` in the code below) to tell the collection which `behavior` it should use.
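As a toy analogy for how a name string can select a behavior class, consider the library-free sketch below. It is not how awkward implements behaviors: `toy_behavior`, `toy_mixin_class`, `ToyRecord`, and `attach` are all hypothetical names invented for illustration. The point is only that a registry maps a record name to a class, and the record picks up that class's properties.

```python
toy_behavior = {}  # registry: record name -> mixin class


def toy_mixin_class(registry):
    """Register the decorated class in `registry` under its class name."""
    def decorator(cls):
        registry[cls.__name__] = cls
        return cls
    return decorator


@toy_mixin_class(toy_behavior)
class MyBehavior:
    @property
    def plus1(self):
        # Computed on access from the stored pt; nothing extra is stored.
        return self.pt + 1


class ToyRecord:
    """A bare record holding a single pt value."""
    def __init__(self, pt):
        self.pt = pt


def attach(record_name, registry, **fields):
    """Build a record whose class mixes in the behavior named `record_name`."""
    cls = type(record_name + "Record", (registry[record_name], ToyRecord), {})
    return cls(**fields)


electron = attach("MyBehavior", toy_behavior, pt=29.0)
print(electron.plus1)  # 30.0
```

If the name passed to `attach` is not in the registry, the lookup fails, which mirrors why the behavior dictionary returned by the schema must contain the class named by `record_name`.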




In [7]:
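import awkward
# import awkward1 as awkward  # depending on the installed version

# Dictionary that collects the behavior classes registered below.
mybehavior = {}

@awkward.mixin_class(mybehavior)
class MyBehavior:
    """
    A test behavior for records that have a `pt` field.
    """
    @property
    def plus1(self):
        """
        Return pt + 1.
        """
        return self.pt + 1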

class MySchema(BaseSchema):
    """
    my schema
    """
    def __init__(self, base_form):
        super().__init__(base_form)
        self._form["contents"] = self._build_collections(self._form["contents"])

    def _build_collections(self, branch_forms):
        output = {}
        output["Electron"] = zip_forms(
            {
                "pt": branch_forms["Electron_pt"],
                "eta": branch_forms["Electron_eta"],
                "phi": branch_forms["Electron_phi"],
                "mass": branch_forms["Electron_mass"],
                # "xx": branch_forms["Electron_xx"],
            },
            "Electron",
            "MyBehavior",
        )
        return output

    @property
    def behavior(self):
        """
        Behaviors necessary to implement this schema
        """
        behavior = {}
        behavior.update(mybehavior)
        return behavior

Now let's try our custom behavior:

In [8]:
events = NanoEventsFactory.from_root(
           fname, 
           schemaclass=MySchema
         ).events()
print(events.Electron.pt)
print(events.Electron.plus1)

[[], [29.6], [60.1, 51.7], [10.7, 8.6], [], ... [], [15.6], [], [7.68], [], []]
[[], [30.6], [61.1, 52.7], [11.7, 9.6], [], ... [], [16.6], [], [8.68], [], []]
