hash_fileset
- coffea.dataset_tools.hash_fileset(chunk)
Return a stable SHA-256 hash for a fileset chunk.
The hash considers dataset names, file paths, and a fixed set of output-affecting dataset-level fields (treename, preload, and metadata) in a canonical sorted form, so chunks that differ in any of those fields produce different hashes. Any other dataset-level keys (e.g. preprocessing bookkeeping such as compressed_form) are ignored by the hash on purpose, so they may evolve without invalidating caches.
- Parameters:
chunk (dict) – A self-contained fileset chunk such as {dataset: {"files": {path: treename, ...}, "treename": ..., "preload": [...], "metadata": {...}}, ...}. List-format files values are accepted only when accompanied by a dataset-level "treename" field (use split_fileset() with treename=... to produce such chunks from a bare list fileset).
- Returns:
out – Hex string uniquely identifying this chunk’s contents.
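
The behaviour described above (hash only the output-affecting fields, in a canonical sorted form) can be illustrated with a small standard-library sketch. This is not coffea's implementation: the helper name sketch_chunk_hash, the JSON-based canonical form, and the dataset name and file path are all assumptions made for illustration, and the digests it produces will not match those returned by hash_fileset.

```python
import hashlib
import json

# Fields the description names as output-affecting; any other key is skipped.
# Treating "files" as one of the hashed keys is an assumption of this sketch.
HASHED_FIELDS = ("files", "treename", "preload", "metadata")


def sketch_chunk_hash(chunk):
    """Illustrative only: NOT coffea's hash_fileset implementation."""
    canonical = {
        dataset: {key: info[key] for key in HASHED_FIELDS if key in info}
        for dataset, info in chunk.items()
    }
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical chunk; dataset name and path are placeholders.
chunk = {
    "ZJets": {
        "files": {"root://host//store/a.root": "Events"},
        "metadata": {"xsec": 1.0},
        "compressed_form": "bookkeeping-v1",
    }
}

# Same content, different bookkeeping and key order: hash should be unchanged.
rebooked = {
    "ZJets": {
        "compressed_form": "bookkeeping-v2",
        "metadata": {"xsec": 1.0},
        "files": {"root://host//store/a.root": "Events"},
    }
}

# A different tree name for the same file: hash should change.
retreed = {
    "ZJets": {
        "files": {"root://host//store/a.root": "Runs"},
        "metadata": {"xsec": 1.0},
    }
}

digest = sketch_chunk_hash(chunk)
same_as_rebooked = digest == sketch_chunk_hash(rebooked)
differs_from_retreed = digest != sketch_chunk_hash(retreed)
```

In this sketch, ignored keys never reach the serializer, which is one simple way to let bookkeeping fields change without affecting a cache key.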
- Return type: