How to work with NanoEvents#
NanoEvents turns columnar ROOT or Parquet files into Pythonic objects with Awkward Array behaviors. This guide walks through exploring branches, creating selections, and reducing data inside a coffea processor.
Inspect collections interactively#
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
events = NanoEventsFactory.from_root(
    {"nano_dy.root": "Events"},
    schemaclass=NanoAODSchema,
    entry_stop=10_000,
).events()
print(events.fields) # top-level collections
print(events.Muon.fields) # attributes on the Muon collection
print(events.Muon.pt.type) # awkward type
Use this pattern in notebooks to discover the structure of a sample before writing a processor.
Columnar selections#
Selections stay lazy until you materialize them. Compose masks with vectorized operations.
import awkward as ak
tight_muons = events.Muon[
    (events.Muon.tightId)
    & (events.Muon.pt > 25)
    & (abs(events.Muon.eta) < 2.4)
]
os_pairs = (
    (tight_muons[:, :, None].charge + tight_muons[:, None, :].charge) == 0
)
tight_muons retains the Awkward structure, so per-event lengths remain variable.
Use vector behaviors#
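NanoAODSchema attaches Lorentz-vector behaviors to physics-object collections such as Muon, so candidates can be added together and the sum exposes kinematic properties like mass and pt.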
lead, trail = ak.unzip(ak.combinations(tight_muons, 2))
dimuon = lead + trail
mass = dimuon.mass # automatically computed invariant mass
pt = dimuon.pt
Behaviors follow you into the processor environment, enabling the same concise syntax.
Access metadata inside processors#
events.metadata carries dataset-level information from the fileset.
from coffea import processor
class ExampleProcessor(processor.ProcessorABC):
    ...
    def process(self, events):
        year = events.metadata["year"]
        is_mc = events.metadata.get("is_mc", False)
Attach cross sections, era flags, and other per-dataset attributes to the metadata when preparing the fileset.
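One way to picture where that metadata comes from is a fileset dictionary with a `metadata` entry per dataset. The layout below is an assumption (the exact fileset format depends on the coffea version). The dataset names, the file `nano_data.root`, and the helper `read_metadata` are hypothetical; the helper uses the same access pattern as the `process` method above.

```python
# Hypothetical fileset: dataset name -> files plus a free-form metadata dict.
fileset = {
    "DYJets_2018": {
        "files": {"nano_dy.root": "Events"},
        "metadata": {"year": 2018, "is_mc": True},
    },
    "SingleMuon_2018": {
        "files": {"nano_data.root": "Events"},
        "metadata": {"year": 2018},  # no "is_mc" key: treated as data
    },
}


def read_metadata(metadata):
    """Mirror the lookups done inside process() on events.metadata."""
    year = metadata["year"]               # required: KeyError if absent
    is_mc = metadata.get("is_mc", False)  # optional: defaults to data
    return year, is_mc


for name, entry in fileset.items():
    print(name, read_metadata(entry["metadata"]))
```

Indexing with `[...]` for required keys and using `.get` for optional ones makes a misconfigured fileset fail early instead of silently taking a default.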
Convert to pandas or numpy#
Use Awkward utilities when you require flat arrays.
import awkward as ak
flat_mass = ak.to_numpy(ak.flatten(mass, axis=None))
df = ak.to_dataframe(ak.zip({"mass": mass, "pt": pt}))
ak.to_dataframe preserves jagged offsets by creating a multi-index; flatten the data before conversion if you prefer a simple index.
Keep processing columnar#
Avoid explicit Python loops over events or particles. Coffea’s executors thrive on vectorized operations because they minimize interpreter overhead and play well with batching. If you must fall back to a loop, wrap the hot section in a numba.njit-decorated function—see the Awkward Array numba guide—so it compiles to machine code while preserving chunk-level parallelism.
Tips & tricks#
Call ak.num(collection, axis=1) to see how many objects each event contains.
If a branch is missing, confirm that NanoEvents can interpret it; schema warnings often identify incompatible forms.
Apply selections with boolean masks before combinations to avoid forming unnecessary pairings.