coffea.dataset_tools

Functions

preprocess(fileset[, step_size, ...])

Given a list of normalized file and object paths (defined in uproot), determine the steps for each file according to the supplied processing options.
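The notion of "steps" can be illustrated without uproot: the entry range of a file is cut into half-open `[start, stop)` intervals of at most `step_size` entries. The helper below is a minimal sketch under that assumption. `compute_steps` is a hypothetical name, not part of coffea, and the real `preprocess` may place step boundaries differently depending on its processing options.

```python
def compute_steps(num_entries: int, step_size: int) -> list[list[int]]:
    """Cut [0, num_entries) into half-open [start, stop) intervals.

    Hypothetical helper for illustration only; not coffea's implementation.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return [
        [start, min(start + step_size, num_entries)]
        for start in range(0, num_entries, step_size)
    ]


steps = compute_steps(num_entries=25, step_size=10)
print(steps)  # [[0, 10], [10, 20], [20, 25]]
```

The last interval is shorter when `num_entries` is not a multiple of `step_size`, and an empty file yields no steps at all.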

split_fileset(fileset[, strategy, datasets, ...])

Split a fileset into partial filesets so that a partial result can still be obtained if one or more of them fail during processing.

hash_fileset(chunk)

Return a stable SHA-256 hash for a fileset chunk.
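A "stable" hash is one that does not depend on incidental details such as dictionary key order. One common way to obtain this, shown here as a sketch and not necessarily how `hash_fileset` does it, is to serialize the chunk to canonical JSON before hashing. `stable_hash` and the example chunk layout are assumptions made for illustration.

```python
import hashlib
import json


def stable_hash(chunk: dict) -> str:
    """Return a SHA-256 hex digest that ignores dictionary key order.

    Hypothetical helper; assumes the chunk is JSON-serializable.
    """
    # sort_keys and fixed separators give one canonical byte representation.
    canonical = json.dumps(chunk, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two chunks with the same content but different key order (assumed layout).
a = {"ZJets": {"files": {"a.root": {"object_path": "Events", "steps": [[0, 10]]}}}}
b = {"ZJets": {"files": {"a.root": {"steps": [[0, 10]], "object_path": "Events"}}}}

print(stable_hash(a) == stable_hash(b))  # True
```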

apply_to_dataset(data_manipulation, dataset)

Apply the supplied function or processor to the supplied dataset.

apply_to_fileset(data_manipulation, fileset)

Apply the supplied function or processor to the supplied fileset (set of datasets).

max_chunks(fileset[, maxchunks])

Modify the input fileset so that only the first "maxchunks" chunks of each dataset will be processed.

max_chunks_per_file(fileset[, maxchunks])

Modify the input fileset so that only the first "maxchunks" chunks of each file will be processed.
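The per-file variant amounts to truncating each file's list of steps. The sketch below assumes a plain-dict fileset of the form `{dataset: {"files": {filename: {"steps": [...]}}}}`; that layout and the name `first_chunks_per_file` are assumptions for illustration. Unlike the description above, the sketch returns a modified copy and leaves its input untouched.

```python
import copy


def first_chunks_per_file(fileset: dict, maxchunks=None) -> dict:
    """Keep only the first `maxchunks` steps of every file.

    Hypothetical helper; returns a deep copy so the input is not changed.
    """
    out = copy.deepcopy(fileset)
    if maxchunks is None:
        return out  # no limit requested
    for dataset in out.values():
        for spec in dataset["files"].values():
            spec["steps"] = spec["steps"][:maxchunks]
    return out


fileset = {
    "ZJets": {
        "files": {
            "a.root": {"object_path": "Events", "steps": [[0, 10], [10, 20], [20, 30]]},
            "b.root": {"object_path": "Events", "steps": [[0, 5]]},
        }
    }
}

limited = first_chunks_per_file(fileset, maxchunks=2)
print(limited["ZJets"]["files"]["a.root"]["steps"])  # [[0, 10], [10, 20]]
print(limited["ZJets"]["files"]["b.root"]["steps"])  # [[0, 5]]
```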

slice_chunks(fileset[, theslice, bydataset])

Modify the input fileset so that only the chunks of each file or each dataset specified by the input slice are processed.

filter_files(fileset[, thefilter])

Modify the input fileset so that only the files of each dataset that pass the filter remain.
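Filtering keeps a file only if a predicate accepts it. The following is a minimal sketch on a plain-dict fileset; the layout, the name `keep_files`, and the `(filename, spec)` predicate signature are all assumptions for illustration and may differ from what `filter_files` expects for `thefilter`.

```python
def keep_files(fileset: dict, predicate) -> dict:
    """Return a new fileset containing only files accepted by `predicate`.

    Hypothetical helper; `predicate(filename, spec)` must return a bool.
    """
    return {
        name: {
            **dataset,
            "files": {
                fname: spec
                for fname, spec in dataset["files"].items()
                if predicate(fname, spec)
            },
        }
        for name, dataset in fileset.items()
    }


fileset = {
    "ZJets": {
        "files": {
            "a.root": {"object_path": "Events", "steps": [[0, 10]]},
            "empty.root": {"object_path": "Events", "steps": []},
        },
        "metadata": {"is_mc": True},
    }
}

# Example predicate: drop files that have no steps to process.
non_empty = keep_files(fileset, lambda fname, spec: len(spec["steps"]) > 0)
print(list(non_empty["ZJets"]["files"]))  # ['a.root']
```

Other dataset keys, such as `metadata` in this example, are carried over unchanged.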

max_files(fileset[, maxfiles])

Modify the input fileset so that only the first "maxfiles" files of each dataset will be processed.

slice_files(fileset[, theslice])

Modify the input fileset so that only the files of each dataset specified by the input slice are processed.
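Selecting files by slice can be pictured as applying a Python `slice` to the ordered list of each dataset's files. The sketch below assumes a plain-dict fileset whose file order is the dict insertion order; `take_files` is a hypothetical name, not a coffea function. Keeping the first N files is the special case `slice(N)`.

```python
def take_files(fileset: dict, theslice: slice = slice(None)) -> dict:
    """Return a new fileset with only the files selected by `theslice`.

    Hypothetical helper; relies on dict insertion order for file order.
    """
    return {
        name: {**dataset, "files": dict(list(dataset["files"].items())[theslice])}
        for name, dataset in fileset.items()
    }


fileset = {
    "ZJets": {
        "files": {
            "a.root": {"object_path": "Events"},
            "b.root": {"object_path": "Events"},
            "c.root": {"object_path": "Events"},
        }
    }
}

print(list(take_files(fileset, slice(1, None))["ZJets"]["files"]))  # ['b.root', 'c.root']
print(list(take_files(fileset, slice(2))["ZJets"]["files"]))  # ['a.root', 'b.root']
```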

get_failed_steps_for_dataset(dataset, report)

Modify the input dataset to only contain the files and row-ranges for failed processing jobs as specified in the supplied report.

get_failed_steps_for_fileset(fileset, ...)

Modify the input fileset to only contain the files and row-ranges for failed processing jobs as specified in the supplied report.

Classes

ROOTFileSpec(*, object_path[, steps, ...])

ParquetFileSpec(*[, object_path, steps, ...])

CoffeaROOTFileSpec(*, object_path, steps, ...)

CoffeaROOTFileSpecOptional(*, object_path[, ...])

CoffeaParquetFileSpec(*[, object_path, ...])

CoffeaParquetFileSpecOptional(*[, ...])

InputFiles([root])

PreprocessedFiles([root])

DatasetSpec(*, files[, metadata, format, ...])

DataGroupSpec([root])

ModelFactory()

Class Inheritance Diagram

[Figure: inheritance diagram of the following classes in coffea.dataset_tools.filespec: ROOTFileSpec, ParquetFileSpec, CoffeaROOTFileSpec, CoffeaROOTFileSpecOptional, CoffeaParquetFileSpec, CoffeaParquetFileSpecOptional, InputFiles, PreprocessedFiles, DatasetSpec, DataGroupSpec, ModelFactory]