NanoEventsFactory

class coffea.nanoevents.NanoEventsFactory(schema, mapping, partition_key, mode='eager')

Bases: object

A factory class to build NanoEvents objects.

For most users, it is advisable to construct instances via the classmethods from_root, from_parquet, or from_preloaded, so that the constructor arguments are set properly.
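A minimal sketch of the recommended construction path, assuming a NanoAOD ROOT file whose tree is named "Events". The helper name load_nanoaod and any file path you pass to it are hypothetical. It uses the dict form of the file argument (filename mapped to tree path) together with the schemaclass and mode parameters documented under from_root below; coffea is imported inside the function only so that the sketch can be imported where coffea is not installed.

```python
def load_nanoaod(path, treepath="Events", mode="virtual"):
    """Hypothetical helper: build NanoEvents from a single ROOT file."""
    # Imported lazily so this module loads even where coffea is absent.
    from coffea.nanoevents import NanoAODSchema, NanoEventsFactory

    factory = NanoEventsFactory.from_root(
        {path: treepath},           # filename -> tree path inside the file
        schemaclass=NanoAODSchema,  # the default, spelled out for clarity
        mode=mode,                  # "eager", "virtual" or "dask"
    )
    # awkward.Array in eager/virtual mode, dask_awkward.Array in dask mode
    return factory.events()
```

For example, events = load_nanoaod("nano_sample.root") followed by events.Muon.pt would give the muon transverse momenta, assuming such a file exists.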

Attributes Summary

access_log

List of accessed branches, populated when columns are lazily loaded.

buffer_cache

The buffer cache used to store loaded buffers, if available.

file_handle

The file handle used to open the source file, if available.

Methods Summary

events()

Build events

from_parquet(file, *[, mode, entry_start, ...])

Quickly build NanoEvents from a parquet file

from_preloaded(array_source, *[, ...])

Quickly build NanoEvents from a pre-loaded array source

from_root(file, *[, mode, treepath, ...])

Quickly build NanoEvents from a ROOT file

Attributes Documentation

access_log

List of accessed branches, populated when columns are lazily loaded.
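As a sketch, assuming a hypothetical file path and a caller-supplied touch function that reads some columns, the list passed as access_log can be inspected afterwards to see which branches were actually read. This page does not specify the exact format of the logged entries, so treat the returned list as diagnostic output rather than a stable interface.

```python
def accessed_branches(path, touch, treepath="Events"):
    """Hypothetical helper: return the access log after running `touch` on the events."""
    # Imported lazily so this module loads even where coffea is absent.
    from coffea.nanoevents import NanoEventsFactory

    log = []  # coffea appends to this list as columns are lazily loaded
    events = NanoEventsFactory.from_root(
        {path: treepath},
        mode="virtual",  # columns are only read when they are needed
        access_log=log,
    ).events()
    touch(events)  # caller-supplied function that reads some columns
    return list(log)
```

With import awkward as ak, a call such as accessed_branches("nano_sample.root", lambda ev: ak.sum(ev.Muon.pt)) would be expected to log only the muon-related branches that the sum needed.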

buffer_cache

The buffer cache used to store loaded buffers, if available.
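Because buffer_cache only needs a dict-like interface, a plain dict works. The sketch below is a bounded alternative built on collections.OrderedDict that evicts the least recently used buffer once maxsize entries are held; the class name and eviction policy are illustrative assumptions, not part of coffea (third-party packages such as cachetools offer similar caches). Note that maxsize counts entries, not bytes.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class LRUBufferCache(MutableMapping):
    """Hypothetical dict-like cache holding at most `maxsize` entries."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]      # raises KeyError on a miss
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __delitem__(self, key):
        del self._data[key]

    def __contains__(self, key):
        return key in self._data  # membership checks do not refresh recency

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

An instance could then be passed as buffer_cache=LRUBufferCache(maxsize=512) to from_root, from_parquet, or from_preloaded.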

file_handle

The file handle used to open the source file, if available.

Methods Documentation

events()

Build events

Returns:

Events materialised according to the configured backend. In "dask" mode a dask_awkward.Array is returned (optionally paired with a report). In "virtual" or "eager" mode an awkward.Array is returned.

Return type:

awkward.Array or dask_awkward.Array or tuple
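Because the return type depends on the backend, code that supports every mode may want to normalise it. A minimal sketch, assuming that when a report is present events() returns a two-element (array, report) tuple as described above; the helper name unpack_events is hypothetical.

```python
def unpack_events(result):
    """Normalise the value returned by NanoEventsFactory.events().

    Returns (events, report); report is None when events() returned a bare array.
    """
    if isinstance(result, tuple):
        # Assumed shape in dask mode when a report was requested: (array, report)
        events, report = result
        return events, report
    # Eager/virtual mode (or dask without a report): a single array
    return result, None
```

Typical use would be events, report = unpack_events(factory.events()), after which report can simply be ignored when it is None.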

classmethod from_parquet(file, *, mode='virtual', entry_start=None, entry_stop=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, parquet_options={}, storage_options=None, access_log=None)

Quickly build NanoEvents from a parquet file

Parameters:
  • file (str or pathlib.Path or pyarrow.NativeFile or io.IOBase) – The filename, or an already opened file such as a pyarrow.NativeFile.

  • mode ({"eager", "virtual", "dask"}, default "virtual") – Backend to use when interpreting parquet data.

  • entry_start (int or None, optional) – Starting entry (only used in eager or virtual mode). Defaults to 0.

  • entry_stop (int or None, optional) – Stopping entry (only used in eager or virtual mode). Defaults to end of dataset.

  • buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.

  • schemaclass (BaseSchema, default NanoAODSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.

  • metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.

  • parquet_options (dict, optional) – Any options to pass to pyarrow.parquet.ParquetFile.

  • storage_options (dict, optional) – Options to pass to fsspec when opening the file. Only used when file is a string path.

  • access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.

Returns:

Factory configured from file that can materialise NanoEvents.

Return type:

NanoEventsFactory
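A sketch of reading only a slice of a Parquet file in virtual mode, assuming a hypothetical file path and helper name. Per the parameter list above, entry_start and entry_stop are honoured only in eager and virtual mode.

```python
def load_parquet_slice(path, entry_start=0, entry_stop=1000):
    """Hypothetical helper: build NanoEvents for rows entry_start up to entry_stop."""
    # Imported lazily so this module loads even where coffea is absent.
    from coffea.nanoevents import NanoAODSchema, NanoEventsFactory

    factory = NanoEventsFactory.from_parquet(
        path,
        mode="virtual",           # slicing is ignored in dask mode
        entry_start=entry_start,
        entry_stop=entry_stop,
        schemaclass=NanoAODSchema,
    )
    return factory.events()
```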

classmethod from_preloaded(array_source, *, entry_start=None, entry_stop=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, access_log=None)

Quickly build NanoEvents from a pre-loaded array source

Parameters:
  • array_source (Mapping[str, awkward.Array]) – A mapping of names to awkward arrays. It must have a metadata attribute with uuid, num_rows, and path sub-items.

  • entry_start (int or None, optional) – Start index for slicing the array source. Defaults to 0.

  • entry_stop (int or None, optional) – Stop index for slicing the array source. Defaults to the full length.

  • buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.

  • schemaclass (BaseSchema, default NanoAODSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.

  • metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.

  • access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.

Returns:

Factory configured from array_source that can materialise NanoEvents.

Return type:

NanoEventsFactory
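A sketch of an array source meeting the requirements stated above: a Mapping of column names to arrays that also carries a metadata attribute with uuid, num_rows, and path items. The class name is hypothetical, the key names follow this page's wording, and the exact keys read by from_preloaded may differ between coffea versions, so check them against the installed source. Values would normally be awkward arrays (for example columns read from a tree with uproot); the bookkeeping here only relies on len().

```python
import uuid
from collections.abc import Mapping


class PreloadedArrays(Mapping):
    """Hypothetical read-only array source for NanoEventsFactory.from_preloaded."""

    def __init__(self, arrays, path, file_uuid=None):
        self._arrays = dict(arrays)
        # Every column describes the same events, so all lengths must agree.
        lengths = {len(column) for column in self._arrays.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.metadata = {
            "uuid": file_uuid if file_uuid is not None else str(uuid.uuid4()),
            "num_rows": lengths.pop() if lengths else 0,
            "path": path,
        }

    def __getitem__(self, name):
        return self._arrays[name]

    def __iter__(self):
        return iter(self._arrays)

    def __len__(self):
        return len(self._arrays)
```

It would then be passed as NanoEventsFactory.from_preloaded(PreloadedArrays(arrays, path="/Events")), with events() called on the result as usual.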

classmethod from_root(file, *, mode='virtual', treepath=uproot._util.unset, entry_start=None, entry_stop=None, steps_per_file=uproot._util.unset, preload=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, uproot_options={}, iteritems_options={}, access_log=None, use_ak_forth=True, known_base_form=None, decompression_executor=None, interpretation_executor=None)

Quickly build NanoEvents from a ROOT file

Parameters:
  • file (str or dict accepted by uproot.open() or uproot.dask(), or uproot.reading.ReadOnlyDirectory) – The filename, or a dict of filenames including the treepath (as it would be passed directly to uproot.open() or uproot.dask()), or an already opened file obtained with e.g. uproot.open().

  • mode ({"eager", "virtual", "dask"}, default "virtual") – Backend NanoEvents will use to read the ROOT file.

  • treepath (str, optional) – Name of the tree to read in the file. Used only if file is a uproot.reading.ReadOnlyDirectory or a string that does not contain tree information that uproot can parse on its own.

  • entry_start (int, optional) – Start at this entry offset in the tree (eager and virtual mode only). Defaults to 0.

  • entry_stop (int, optional) – Stop at this entry offset in the tree (eager and virtual mode only). Defaults to the end of the tree.

  • steps_per_file (int, optional) – Partition files into this many steps (previously “chunks”).

  • preload (None or Callable or Iterable[str], optional) – Specifies which branches/columns to preload in bulk. Only works in eager and virtual mode. Can be a callable passed to tree.arrays as the filter_branch argument, or an iterable of branch name strings to preload.

  • buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.

  • schemaclass (BaseSchema, default NanoAODSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.

  • metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.

  • uproot_options (dict, optional) – Any options to pass to uproot.open or uproot.dask.

  • iteritems_options (dict, optional) – Any options to pass to tree.iteritems when iterating over the tree’s branches to extract the form (eager and virtual mode only).

  • access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.

  • use_ak_forth (bool, default True) – Toggle using awkward_forth to interpret branches in the ROOT file.

  • known_base_form (dict or None, optional) – If the base form of the input file is known ahead of time, passing it here avoids opening a file to parse its metadata.

  • decompression_executor (Any, optional) – Executor with a submit method used for decompression tasks. See https://uproot.readthedocs.io/en/latest/uproot._dask.dask.html.

  • interpretation_executor (Any, optional) – Executor with a submit method used for interpretation tasks. See scikit-hep/uproot5.
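A sketch of the two accepted preload forms, assuming NanoAOD branch names such as nMuon and Muon_pt. The callable form follows uproot's filter_branch convention of receiving a branch object and returning a bool; only the branch's name attribute is used here. All helper names are hypothetical.

```python
# Iterable form: an explicit list of branch names to read in bulk.
MUON_BRANCHES = ["nMuon", "Muon_pt", "Muon_eta", "Muon_phi"]


def preload_muons(branch):
    """Callable form (filter_branch style): True for branches to preload."""
    return branch.name == "nMuon" or branch.name.startswith("Muon_")


def load_with_preload(path, preload=preload_muons, treepath="Events"):
    """Hypothetical helper: build events with some branches loaded up front."""
    # Imported lazily so this module loads even where coffea is absent.
    from coffea.nanoevents import NanoEventsFactory

    return NanoEventsFactory.from_root(
        {path: treepath},
        mode="virtual",   # preload only works in eager and virtual mode
        preload=preload,  # either preload_muons or MUON_BRANCHES
    ).events()
```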

Returns:

Factory configured from file that can materialise NanoEvents.

Return type:

NanoEventsFactory
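The executor parameters accept any object with a submit method, so the standard library's concurrent.futures.ThreadPoolExecutor qualifies. A sketch, assuming a hypothetical file path and helper name: it uses eager mode so that, presumably, all reading completes inside events() before the pools are shut down by the with block. Pool sizes are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def load_with_executors(path, n_workers=4, treepath="Events"):
    """Hypothetical helper: eager read with separate decompression/interpretation pools."""
    # Imported lazily so this module loads even where coffea is absent.
    from coffea.nanoevents import NanoEventsFactory

    with ThreadPoolExecutor(max_workers=n_workers) as decompress, \
            ThreadPoolExecutor(max_workers=n_workers) as interpret:
        events = NanoEventsFactory.from_root(
            {path: treepath},
            mode="eager",  # materialise everything while the pools are alive
            decompression_executor=decompress,
            interpretation_executor=interpret,
        ).events()
    return events
```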