apply_to_dataset

coffea.dataset_tools.apply_to_dataset(data_manipulation: ProcessorABC | GenericHEPAnalysis, dataset: DatasetSpec | dict, schemaclass: BaseSchema = <class 'coffea.nanoevents.schemas.nanoaod.NanoAODSchema'>, metadata: dict[Hashable, Any] = {}, uproot_options: dict[str, Any] = {}) → DaskOutputType | tuple[DaskOutputType, dask_awkward.Array]

Apply the supplied function or processor to the supplied dataset.

Parameters:
  • data_manipulation (ProcessorABC or GenericHEPAnalysis) – The user analysis code to run on the input dataset

  • dataset (DatasetSpec | dict) – The data to be acted upon by the data manipulation passed in.

  • schemaclass (BaseSchema, default NanoAODSchema) – The nanoevents schema to interpret the input dataset with.

  • metadata (dict[Hashable, Any], default {}) – Metadata for the dataset that is accessible by the input analysis. Should also be dask-serializable.

  • uproot_options (dict[str, Any], default {}) – Options to pass to uproot. Include {"allow_read_errors_with_report": True} to turn on file access reports.

Returns:

  • out (DaskOutputType) – The output of the analysis workflow applied to the dataset

  • report (dask_awkward.Array, optional) – The file access report for running the analysis on the input dataset. Must be computed simultaneously with the analysis to be accurate.
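
Example:

The following is a minimal sketch, not part of the coffea API, of how a caller might handle the two possible return shapes. The helper names run_with_report and split_result are hypothetical. The sketch assumes that the (out, report) tuple is returned exactly when "allow_read_errors_with_report" is set to a truthy value in uproot_options, which is how the parameter and return descriptions above read.

```python
def split_result(result, uproot_options):
    """Normalize the return value of apply_to_dataset to (out, report).

    Hypothetical helper. Assumes the (out, report) tuple is returned only
    when file access reports were requested through uproot_options.
    The decision is based on the options rather than on the type of
    `result`, because the analysis output may itself be a tuple.
    """
    if uproot_options.get("allow_read_errors_with_report"):
        out, report = result
        return out, report
    return result, None


def run_with_report(data_manipulation, dataset, uproot_options=None):
    """Call apply_to_dataset with file access reports turned on.

    Hypothetical wrapper. coffea is imported lazily so that this module
    can be loaded without coffea installed.
    """
    from coffea.dataset_tools import apply_to_dataset

    # Copy the caller's options and request the file access report.
    options = dict(uproot_options or {})
    options["allow_read_errors_with_report"] = True

    result = apply_to_dataset(data_manipulation, dataset, uproot_options=options)
    return split_result(result, options)
```

Because the report is only accurate when computed together with the analysis, both objects would then be passed to a single call such as dask.compute(out, report) rather than computed separately.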