apply_to_fileset
- coffea.dataset_tools.apply_to_fileset(data_manipulation: ProcessorABC | GenericHEPAnalysis, fileset: DataGroupSpec | dict, schemaclass: BaseSchema = <class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, uproot_options: dict[str, Any] = {}) → dict[str, DaskOutputType] | tuple[dict[str, DaskOutputType], dask_awkward.Array]
Apply the supplied function or processor to the supplied fileset (set of datasets).
- Parameters:
data_manipulation (ProcessorABC or GenericHEPAnalysis) – The user analysis code to run on the input dataset.
fileset (DataGroupSpec | dict) – The data to be acted upon by the data manipulation passed in. Metadata within the fileset should be dask-serializable.
schemaclass (BaseSchema, default NanoAODSchema) – The nanoevents schema to interpret the input dataset with.
uproot_options (dict[str, Any], default {}) – Options to pass to uproot. Pass at least {"allow_read_errors_with_report": True} to turn on file access reports.
- Returns:
out (dict[str, DaskOutputType]) – The output of the analysis workflow applied to the datasets, keyed by dataset name.
report (dask_awkward.Array, optional) – The file access report for running the analysis on the input dataset. It must be computed simultaneously with the analysis output to be accurate.
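The following is a minimal sketch of how the parameters and return values described above might fit together when file access reports are turned on. The names `UPROOT_OPTIONS` and `run_with_report` are hypothetical helpers introduced only for illustration. The `processor` and `fileset` arguments are assumed to be supplied by the caller in a form the function accepts, and the tuple return is assumed to occur because `allow_read_errors_with_report` is set. The coffea and dask imports are done inside the function so the snippet can be loaded without those packages installed.

```python
from typing import Any

# Documented switch that turns on file access reports.
# (UPROOT_OPTIONS is a hypothetical name used only in this sketch.)
UPROOT_OPTIONS: dict[str, Any] = {"allow_read_errors_with_report": True}


def run_with_report(processor, fileset):
    """Hypothetical helper: build the task graph, then compute the
    analysis output and the file access report in a single call."""
    # Imported lazily so the sketch loads even without coffea/dask present.
    import dask
    from coffea.dataset_tools import apply_to_fileset
    from coffea.nanoevents import NanoAODSchema

    # Assumption: with reports enabled, the call returns (out, report).
    out, report = apply_to_fileset(
        processor,
        fileset,
        schemaclass=NanoAODSchema,
        uproot_options=UPROOT_OPTIONS,
    )

    # Compute both together so the report reflects the same file reads
    # as the analysis output.
    computed_out, computed_report = dask.compute(out, report)
    return computed_out, computed_report
```

Computing `out` and `report` in separate `dask.compute` calls would read the files twice, which is presumably why the report is only accurate when the two are computed together. If `uproot_options` is left at its default, only `out` is returned and the tuple unpacking above would not apply.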