DatasetSpec
- class coffea.dataset_tools.DatasetSpec(*, files: InputFiles | PreprocessedFiles, metadata: dict[Hashable, Any] = {}, format: str | None = None, compressed_form: str | None = None, did: str | None = None)
Bases: BaseModel
Attributes Summary
- form
- joinable: Identify DatasetSpec criteria to be pre-joined for typetracing (necessary) and column-joining (sufficient)
- model_config: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- num_entries: Compute the total number of entries across all files, if available.
- num_selected_entries: Compute the total number of selected entries across all files (calculated from steps), if available.
- steps: Get the steps per dataset file, if available.
Methods Summary
- filter_files([filter_name, filter_callable]): Filter files by a regex pattern on the file names (filter_name) or by a callable applied to each file spec (filter_callable).
- limit_files(max_files): Limit the number of files.
- limit_steps(max_steps[, per_file]): Limit the steps.
- preprocess_data(data): Set and/or validate the format if manually specified.
Attributes Documentation
- form
- joinable
Identify DatasetSpec criteria to be pre-joined for typetracing (necessary) and column-joining (sufficient)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- num_entries
Compute the total number of entries across all files, if available.
- num_selected_entries
Compute the total number of selected entries across all files (calculated from steps), if available.
- steps
Get the steps per dataset file, if available.
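The relationship between num_entries, num_selected_entries and steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the coffea implementation. It assumes each step is a [start, stop) entry range and that a total is unavailable (None) as soon as one file lacks the information. The files dictionary and both helper functions are hypothetical stand-ins.

```python
from typing import Optional

# Hypothetical per-file data (not coffea objects): each step is assumed to be
# a [start, stop) entry range, and num_entries is the file's total entry count.
files = {
    "a.root": {"num_entries": 250, "steps": [[0, 100], [100, 200]]},
    "b.root": {"num_entries": 50, "steps": [[0, 50]]},
}


def total_entries(files: dict) -> Optional[int]:
    """Sum num_entries over all files, or None if any count is missing."""
    counts = [spec.get("num_entries") for spec in files.values()]
    if any(count is None for count in counts):
        return None
    return sum(counts)


def total_selected_entries(files: dict) -> Optional[int]:
    """Sum the length of every step range, or None if any file lacks steps."""
    total = 0
    for spec in files.values():
        steps = spec.get("steps")
        if steps is None:
            return None
        total += sum(stop - start for start, stop in steps)
    return total


print(total_entries(files))           # 300
print(total_selected_entries(files))  # 250
```

In this sketch the steps of a.root cover only 200 of its 250 entries, so the selected total is smaller than the overall total.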
Methods Documentation
- filter_files(filter_name: str | None = None, filter_callable: Callable[[CoffeaROOTFileSpec | CoffeaParquetFileSpec | CoffeaROOTFileSpecOptional | CoffeaParquetFileSpecOptional], bool] | None = None) -> Self
Filter files by a regex pattern on the file names (filter_name) or by a callable applied to each file spec (filter_callable).
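The two filtering modes can be illustrated with a minimal stand-alone sketch. It does not call coffea: the files dictionary and filter_files_sketch are hypothetical. The sketch also makes two assumptions the documentation does not confirm, namely that the pattern is matched with re.search and that a file must pass both filters when both are given.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-in for a dataset's files: file name -> per-file spec.
files = {
    "run1_part0.root": {"object_path": "Events", "num_entries": 1000},
    "run1_part1.root": {"object_path": "Events", "num_entries": 200},
    "run2_part0.root": {"object_path": "Events", "num_entries": 800},
}


def filter_files_sketch(
    files: dict,
    filter_name: Optional[str] = None,
    filter_callable: Optional[Callable[[dict], bool]] = None,
) -> dict:
    """Keep files whose name matches the regex and whose spec passes the callable."""
    pattern = re.compile(filter_name) if filter_name is not None else None
    kept = {}
    for name, spec in files.items():
        # Assumption: the pattern may match anywhere in the file name.
        if pattern is not None and not pattern.search(name):
            continue
        # Assumption: the callable receives the per-file spec and returns a bool.
        if filter_callable is not None and not filter_callable(spec):
            continue
        kept[name] = spec
    return kept


by_name = filter_files_sketch(files, filter_name=r"run1_")
by_size = filter_files_sketch(files, filter_callable=lambda spec: spec["num_entries"] >= 500)
print(sorted(by_name))  # ['run1_part0.root', 'run1_part1.root']
print(sorted(by_size))  # ['run1_part0.root', 'run2_part0.root']
```

The sketch returns a new dictionary and leaves its input unchanged. The Self return type in the signature above suggests the real method likewise returns a DatasetSpec rather than a bare mapping.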