hash_fileset

coffea.dataset_tools.hash_fileset(chunk)

Return a stable SHA-256 hash for a fileset chunk.

The hash covers dataset names, file paths, and a fixed set of output-affecting dataset-level fields (treename, preload and metadata), serialized in a canonical sorted form, so chunks that differ in any of those fields produce different hashes. All other dataset-level keys (e.g. preprocessing bookkeeping such as compressed_form) are deliberately excluded from the hash, so they can evolve without invalidating caches.
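The hashing scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not coffea's implementation: the helper name sketch_hash_fileset is hypothetical, and serializing to JSON with sorted keys is only an assumed way of producing the canonical form.

```python
import hashlib
import json

# Assumption: only these dataset-level keys (plus "files") feed the hash,
# mirroring the field list in the description above.
HASHED_FIELDS = ("treename", "preload", "metadata")


def sketch_hash_fileset(chunk: dict) -> str:
    """Hypothetical illustration of hashing a fileset chunk; NOT coffea's code."""
    canonical = {}
    for name, dataset in chunk.items():
        # Keep the file mapping (paths and their per-file values).
        entry = {"files": dataset["files"]}
        # Copy only output-affecting fields; anything else (e.g.
        # compressed_form) is dropped so it cannot change the hash.
        for key in HASHED_FIELDS:
            if key in dataset:
                entry[key] = dataset[key]
        canonical[name] = entry
    # sort_keys=True makes the serialization independent of dict insertion
    # order; default=str is an assumed fallback for non-JSON metadata values.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this sketch, two chunks that differ only in key order or in compressed_form hash identically, while a change to metadata, preload or treename yields a different digest.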

Parameters:

chunk (dict) – A self-contained fileset chunk such as {dataset: {"files": {path: treename, ...}, "treename": ..., "preload": [...], "metadata": {...}}, ...}. A list-format "files" value is accepted only when the dataset also carries a dataset-level "treename" field (use split_fileset() with treename=... to produce such chunks from a bare list fileset).
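To illustrate the two accepted shapes of "files", here is a hypothetical helper (sketch_normalize_files is not part of coffea) that maps a list-format value onto the dictionary form using the dataset-level treename. It assumes each listed path simply takes that treename; coffea's internal handling may differ.

```python
def sketch_normalize_files(dataset: dict) -> dict:
    """Hypothetical: return a {path: treename} mapping for one dataset entry."""
    files = dataset["files"]
    if isinstance(files, dict):
        # Already in {path: treename, ...} form; return a shallow copy.
        return dict(files)
    # A bare list of paths carries no tree name, so one must be supplied
    # at the dataset level, as the parameter description requires.
    if "treename" not in dataset:
        raise ValueError("list-format 'files' requires a dataset-level 'treename'")
    return {path: dataset["treename"] for path in files}


# Example chunk in list format, made valid by the dataset-level treename.
example_chunk = {
    "ZJets": {
        "files": ["zjets_1.root", "zjets_2.root"],
        "treename": "Events",
        "metadata": {"is_mc": True},
    }
}
```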

Returns:

out – Hexadecimal SHA-256 digest identifying this chunk's hash-relevant contents.

Return type:

str