DatasetSpec

class coffea.dataset_tools.DatasetSpec(*, files: InputFiles | PreprocessedFiles, metadata: dict[Hashable, Any] = {}, format: str | None = None, compressed_form: str | None = None, did: str | None = None)

Bases: BaseModel

Attributes Summary

form

joinable

Identify DatasetSpec criteria to be pre-joined for typetracing (necessary) and column-joining (sufficient)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_entries

Compute the total number of entries across all files, if available.

num_selected_entries

Compute the total number of selected entries across all files (calculated from steps), if available.

steps

Get the steps per dataset file, if available.

Methods Summary

filter_files([filter_name, filter_callable])

Filter files by a regex pattern on the file names (filter_name) or by a callable applied to each file spec (filter_callable).

limit_files(max_files)

Limit the number of files.

limit_steps(max_steps[, per_file])

Limit the steps.

post_validate()

preprocess_data(data)

set_check_format()

Set and/or validate the format if manually specified

Attributes Documentation

form

joinable

Identify DatasetSpec criteria to be pre-joined for typetracing (necessary) and column-joining (sufficient)

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_entries

Compute the total number of entries across all files, if available.

num_selected_entries

Compute the total number of selected entries across all files (calculated from steps), if available.
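As a rough illustration of "calculated from steps", the sketch below sums the width of each step over all files. It is not coffea's implementation: the function name `selected_entries`, the file names, and the assumption that each step is a `[start, stop]` pair (with `None` marking a file whose steps are not yet known) are all hypothetical and used only for illustration.

```python
def selected_entries(steps_by_file):
    """Sum (stop - start) over every step of every file.

    Returns None when any file has no step information, mirroring the
    "if available" wording above (an assumption about the behavior).
    """
    total = 0
    for steps in steps_by_file.values():
        if steps is None:
            return None
        total += sum(stop - start for start, stop in steps)
    return total


# Hypothetical step layout: file name -> list of [start, stop] pairs.
steps_known = {
    "f1.root": [[0, 10], [10, 20], [20, 30]],
    "f2.root": [[0, 15], [15, 30]],
}
steps_partial = {"f1.root": [[0, 10]], "f2.root": None}

total_known = selected_entries(steps_known)
total_partial = selected_entries(steps_partial)
```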

steps

Get the steps per dataset file, if available.

Methods Documentation

filter_files(filter_name: str | None = None, filter_callable: Callable[[CoffeaROOTFileSpec | CoffeaParquetFileSpec | CoffeaROOTFileSpecOptional | CoffeaParquetFileSpecOptional], bool] | None = None) → Self

Filter files by a regex pattern on the file names (filter_name) or by a callable applied to each file spec (filter_callable).
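The two filtering modes can be pictured with plain dictionaries. This is a minimal sketch of the idea, not the library's code: the helper names `filter_by_name` and `filter_by_callable`, the file names, and the dictionary-shaped file specs are hypothetical, and using `re.search` (rather than a full match) is an assumption about how the pattern is applied.

```python
import re


def filter_by_name(files, pattern):
    """Keep entries whose file name matches the regex pattern."""
    regex = re.compile(pattern)
    return {name: spec for name, spec in files.items() if regex.search(name)}


def filter_by_callable(files, predicate):
    """Keep entries for which predicate(spec) is truthy."""
    return {name: spec for name, spec in files.items() if predicate(spec)}


# Hypothetical file mapping: file name -> simplified file spec.
files = {
    "a_1.root": {"object_path": "Events", "num_entries": 100},
    "a_2.root": {"object_path": "Events", "num_entries": 0},
    "b_1.root": {"object_path": "Events", "num_entries": 50},
}

by_name = filter_by_name(files, r"^a_\d+\.root$")
non_empty = filter_by_callable(files, lambda spec: spec["num_entries"] > 0)
```

Filtering by name is convenient when the selection is encoded in the path; the callable form is the more general one, since it can inspect any field of the file spec.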

limit_files(max_files: int | slice | None) → Self

Limit the number of files.
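The signature accepts an `int`, a `slice`, or `None`. One plausible reading, sketched here on a plain dictionary, is that an integer keeps the first N files, a slice selects from the ordered file list, and `None` leaves the files untouched. The helper `limit_entries` and the file names are hypothetical, and the exact semantics in coffea may differ.

```python
def limit_entries(files, max_files):
    """Select a subset of files by count or slice, preserving order."""
    names = list(files)
    if max_files is None:
        selected = names              # assumed: None means "no limit"
    elif isinstance(max_files, slice):
        selected = names[max_files]   # slice over the ordered file names
    else:
        selected = names[:max_files]  # first N files
    return {name: files[name] for name in selected}


# Hypothetical file mapping: file name -> simplified file spec.
files = {
    "a_1.root": {"object_path": "Events"},
    "a_2.root": {"object_path": "Events"},
    "b_1.root": {"object_path": "Events"},
}

first_two = limit_entries(files, 2)
skip_first = limit_entries(files, slice(1, None))
unlimited = limit_entries(files, None)
```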

limit_steps(max_steps: int | slice, per_file: bool = False) → Self

Limit the steps. Pass per_file=True to limit the steps of each file separately; otherwise the limit applies cumulatively across all files.
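The difference between the per-file and cumulative modes is easiest to see on a small example. The sketch below is an illustration under assumptions, not coffea's implementation: `limit_steps_sketch` and the file names are hypothetical, steps are assumed to be `[start, stop]` pairs, only integer limits are handled (the `slice` form is omitted), and whether files left with no steps are dropped by the real method is not covered here.

```python
def limit_steps_sketch(steps_by_file, max_steps, per_file=False):
    """Keep at most max_steps steps, per file or across all files."""
    if per_file:
        # Each file independently keeps its first max_steps steps.
        return {name: steps[:max_steps] for name, steps in steps_by_file.items()}

    # Cumulative mode: one shared budget consumed in file order.
    remaining = max_steps
    limited = {}
    for name, steps in steps_by_file.items():
        kept = steps[:remaining]
        limited[name] = kept
        remaining -= len(kept)
    return limited


# Hypothetical step layout: file name -> list of [start, stop] pairs.
steps_by_file = {
    "f1.root": [[0, 10], [10, 20], [20, 30]],
    "f2.root": [[0, 15], [15, 30]],
}

per_file_result = limit_steps_sketch(steps_by_file, 2, per_file=True)
cumulative_result = limit_steps_sketch(steps_by_file, 4)
```

With a limit of 2 per file, each file keeps up to two steps; with a cumulative limit of 4, the first file uses three of the four slots and the second file keeps only one.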

post_validate() → Self

classmethod preprocess_data(data: Any) → Any

set_check_format() → bool

Set and/or validate the format if manually specified