NanoEventsFactory
- class coffea.nanoevents.NanoEventsFactory(schema, mapping, partition_key, mode='eager')
Bases: object
A factory class to build NanoEvents objects.
For most users, it is advisable to construct instances via methods like from_root so that the constructor args are properly set.
Attributes Summary
access_log – List of accessed branches, populated when columns are lazily loaded.
buffer_cache – The buffer cache used to store loaded buffers, if available.
file_handle – The file handle used to open the source file, if available.
Methods Summary
events() – Build events
from_parquet(file, *[, mode, entry_start, ...]) – Quickly build NanoEvents from a parquet file
from_preloaded(array_source, *[, ...]) – Quickly build NanoEvents from a pre-loaded array source
from_root(file, *[, mode, treepath, ...]) – Quickly build NanoEvents from a root file
Attributes Documentation
- access_log
List of accessed branches, populated when columns are lazily loaded.
- buffer_cache
The buffer cache used to store loaded buffers, if available.
- file_handle
The file handle used to open the source file, if available.
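
As a rough illustration of how these attributes might be used together, the sketch below passes a list and a dict into from_root and reads them back after some columns have been touched. The helper names `open_with_bookkeeping` and `unique_branches`, the tree name "Events", and the assumption that log entries are hashable branch-name strings are illustrative choices and are not stated in this reference.

```python
def unique_branches(access_log):
    """Collapse an access log into a sorted list of distinct entries.

    Assumption: log entries are hashable (e.g. branch-name strings).
    """
    return sorted(set(access_log))


def open_with_bookkeeping(path, treepath="Events"):
    """Open a ROOT file in virtual mode while recording reads and buffers.

    `path` and `treepath` are placeholders for a real file and tree name.
    """
    # Imported lazily so the helpers above can be used without coffea installed.
    from coffea.nanoevents import NanoEventsFactory

    access_log = []    # intended to record lazily accessed branches
    buffer_cache = {}  # dict-like cache for loaded numpy buffers
    factory = NanoEventsFactory.from_root(
        {path: treepath},
        mode="virtual",
        access_log=access_log,
        buffer_cache=buffer_cache,
    )
    return factory, access_log, buffer_cache
```

After calling `factory.events()` and reading a few fields, `unique_branches(factory.access_log)` should give a compact view of which branches were actually loaded, which can help when deciding what to preload.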
Methods Documentation
- events()
Build events
- Returns:
Events materialised according to the configured backend. In "dask" mode a dask_awkward.Array is returned (optionally paired with a report). In "virtual" or "eager" mode an awkward.Array is returned.
- Return type:
- classmethod from_parquet(file, *, mode='virtual', entry_start=None, entry_stop=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, parquet_options={}, storage_options=None, access_log=None)
Quickly build NanoEvents from a parquet file
- Parameters:
file (str or pathlib.Path or pyarrow.NativeFile or io.IOBase) – The filename or already opened file using e.g. pyarrow.NativeFile().
mode ({"eager", "virtual", "dask"}, default "virtual") – Backend to use when interpreting parquet data.
entry_start (int or None, optional) – Starting entry (only used in eager or virtual mode). Defaults to 0.
entry_stop (int or None, optional) – Stopping entry (only used in eager or virtual mode). Defaults to end of dataset.
buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.
schemaclass (BaseSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.
metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.
parquet_options (dict, optional) – Any options to pass to pyarrow.parquet.ParquetFile.
storage_options (dict, optional) – Options to pass to fsspec when opening the file. Only used when file is a string path.
access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.
- Returns:
Factory configured from file that can materialise NanoEvents.
- Return type:
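
A minimal sketch of reading a window of entries from a parquet file follows. It uses only keyword arguments from the signature above; the helpers `check_entry_range` and `load_parquet_window`, and the metadata key "dataset", are hypothetical names introduced for illustration. The range check is a convenience written for this example, not behaviour provided by coffea.

```python
def check_entry_range(entry_start=None, entry_stop=None):
    """Validate an entry window; None means 'from 0' or 'to the end'."""
    start = 0 if entry_start is None else int(entry_start)
    if start < 0:
        raise ValueError("entry_start must be non-negative")
    if entry_stop is not None and int(entry_stop) < start:
        raise ValueError("entry_stop must not be smaller than entry_start")
    return start, entry_stop


def load_parquet_window(path, entry_start=None, entry_stop=None):
    """Build events for a slice of a parquet file in eager mode.

    entry_start/entry_stop are only used in eager or virtual mode,
    as stated in the parameter list.
    """
    # Imported lazily so check_entry_range works without coffea installed.
    from coffea.nanoevents import NanoAODSchema, NanoEventsFactory

    start, stop = check_entry_range(entry_start, entry_stop)
    factory = NanoEventsFactory.from_parquet(
        path,
        mode="eager",
        entry_start=start,
        entry_stop=stop,
        schemaclass=NanoAODSchema,
        metadata={"dataset": "example"},  # arbitrary, illustrative metadata
    )
    return factory.events()
```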
- classmethod from_preloaded(array_source, *, entry_start=None, entry_stop=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, access_log=None)
Quickly build NanoEvents from a pre-loaded array source
- Parameters:
array_source (Mapping[str, awkward.Array]) – A mapping of names to awkward arrays. It must have a metadata attribute with uuid, num_rows, and path sub-items.
entry_start (int or None, optional) – Start index for slicing the array source. Defaults to 0.
entry_stop (int or None, optional) – Stop index for slicing the array source. Defaults to the full length.
buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.
schemaclass (BaseSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.
metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.
access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.
- Returns:
Factory configured from array_source that can materialise NanoEvents.
- Return type:
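
The array_source requirement (a mapping that also carries a metadata attribute) can be sketched with a small dict subclass. `PreloadedSource` and `events_from_arrays` are hypothetical names. The "uuid", "num_rows" and "path" keys follow the description above, while the extra "object_path" key is an assumption added in case the installed coffea version looks the path up under that name.

```python
import uuid


class PreloadedSource(dict):
    """Mapping of column name -> array plus the required metadata attribute."""

    def __init__(self, columns, path):
        super().__init__(columns)
        lengths = {len(column) for column in self.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.metadata = {
            "uuid": str(uuid.uuid4()),
            "num_rows": lengths.pop() if lengths else 0,
            "path": path,
            "object_path": path,  # assumption, see the note above
        }


def events_from_arrays(source):
    """Build events from a PreloadedSource using the generic BaseSchema."""
    # Imported lazily so PreloadedSource can be built without coffea installed.
    from coffea.nanoevents import BaseSchema, NanoEventsFactory

    return NanoEventsFactory.from_preloaded(source, schemaclass=BaseSchema).events()
```

In real use the column values would be awkward arrays of equal length; BaseSchema is chosen here only because it places no naming requirements on the columns.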
- classmethod from_root(file, *, mode='virtual', treepath=uproot._util.unset, entry_start=None, entry_stop=None, steps_per_file=uproot._util.unset, preload=None, buffer_cache=None, schemaclass=<class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata=None, uproot_options={}, iteritems_options={}, access_log=None, use_ak_forth=True, known_base_form=None, decompression_executor=None, interpretation_executor=None)
Quickly build NanoEvents from a root file
- Parameters:
file (a string or dict input to uproot.open() or uproot.dask() or a uproot.reading.ReadOnlyDirectory) – The filename or dict of filenames including the treepath (as it would be passed directly to uproot.open() or uproot.dask()), or an already opened file, e.g. from uproot.open().
mode – NanoEvents will use “eager”, “virtual”, or “dask” as a backend.
treepath (str, optional) – Name of the tree to read in the file. Used only if file is a uproot.reading.ReadOnlyDirectory or a string that does not contain tree information that uproot can parse on its own.
entry_start (int, optional (eager and virtual mode only)) – Start at this entry offset in the tree (default 0).
entry_stop (int, optional (eager and virtual mode only)) – Stop at this entry offset in the tree (default end of tree).
steps_per_file (int, optional) – Partition files into this many steps (previously “chunks”).
preload (None or Callable or Iterable[str]) – Specifies which branches/columns to preload in bulk. Only works in eager and virtual mode. Can be a callable passed to tree.arrays as the filter_branch argument, or an iterable of branch name strings to preload.
buffer_cache (dict, optional) – A dict-like interface to a cache object. Only bare numpy arrays will be placed in this cache, using globally-unique keys.
schemaclass (BaseSchema) – A schema class deriving from BaseSchema and implementing the desired view of the file.
metadata (dict, optional) – Arbitrary metadata to add to the base.NanoEvents object.
uproot_options (dict, optional) – Any options to pass to uproot.open or uproot.dask.
iteritems_options (dict, optional (eager and virtual mode only)) – Any options to pass to tree.iteritems when iterating over the tree’s branches to extract the form.
access_log (list, optional) – Pass a list instance to record which branches were lazily accessed by this instance.
use_ak_forth (bool, default True) – Toggle using awkward_forth to interpret branches in the ROOT file.
known_base_form (dict or None, optional) – If the base form of the input file is known ahead of time we can skip opening a single file and parsing metadata.
decompression_executor (Any, optional) – Executor with a submit method used for decompression tasks. See https://uproot.readthedocs.io/en/latest/uproot._dask.dask.html.
interpretation_executor (Any, optional) – Executor with a submit method used for interpretation tasks. See scikit-hep/uproot5.
- Returns:
Factory configured from file that can materialise NanoEvents.
- Return type:
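
The two accepted forms of preload can be sketched as below: an iterable of branch names, or a callable applied to each branch. The branch names in `MUON_KINEMATICS`, the tree name "Events", and the helpers `is_muon_branch` and `open_with_preload` are illustrative assumptions. The mode check is a guard written for this example, reflecting the note that preload only works in eager and virtual mode; it is not coffea's own validation.

```python
# Illustrative branch names; substitute the ones present in your file.
MUON_KINEMATICS = ["Muon_pt", "Muon_eta", "Muon_phi", "Muon_mass"]


def is_muon_branch(branch):
    """Callable form of preload: keep branches whose name starts with 'Muon_'.

    Assumes the callable receives a branch object exposing a `name`
    attribute, as uproot does for filter_branch.
    """
    return branch.name.startswith("Muon_")


def open_with_preload(path, preload, mode="virtual", treepath="Events"):
    """Create a factory that bulk-loads the selected branches up front."""
    if mode not in ("eager", "virtual"):
        # Example-level guard: preload is documented for these modes only.
        raise ValueError("preload only works in 'eager' or 'virtual' mode")
    # Imported lazily so the helpers above work without coffea installed.
    from coffea.nanoevents import NanoEventsFactory

    return NanoEventsFactory.from_root(
        {path: treepath},
        mode=mode,
        preload=preload,
    )
```

Either `open_with_preload(path, MUON_KINEMATICS)` or `open_with_preload(path, is_muon_branch)` could then be followed by `.events()` to obtain the array. The iterable form is easier to audit, while the callable form scales better when many branches share a prefix.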