DaskExecutor

class coffea.processor.DaskExecutor(status: bool = True, unit: str = 'items', desc: str = 'Processing', compression: int | None = 1, function_name: str | None = None, client: dask.distributed.Client | None = None, treereduction: int = 20, priority: int = 0, retries: int = 3, heavy_input: bytes | None = None, use_dataframes: bool = False, worker_affinity: bool = False)

Bases: ExecutorBase

Execute using Dask futures.

Parameters:
  • items (list) – List of input arguments

  • function (callable) – A function to be called on each input, which returns an accumulator instance

  • accumulator (Accumulatable) – An accumulator to collect the output of the function

  • client (distributed.client.Client) – A dask distributed client instance

  • treereduction (int, optional) – Tree reduction factor for output accumulators (default: 20)

  • status (bool, optional) – If True (default), enable the progress bar

  • compression (int, optional) – Compress accumulator outputs in flight with LZ4 at the specified level (default: 1). Set to None for no compression.

  • priority (int, optional) – Task priority (default: 0)

  • retries (int, optional) – Number of retries for failed tasks (default: 3)

  • heavy_input (serializable, optional) – Any value placed here will be broadcast to workers and joined to input items in a tuple (item, heavy_input) that is passed to function.

  • function_name (str, optional) – Name of the function being passed

  • use_dataframes (bool, optional) –

    Retrieve output as a distributed Dask DataFrame (default: False). The outputs of individual tasks must be Pandas DataFrames.

    Note

    If heavy_input is set, function is assumed to be pure.
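
The treereduction parameter above can be pictured with a small, dask-free sketch. It assumes the factor means "merge at most this many partial outputs per step"; how coffea actually schedules its merges may differ. The helper `tree_reduce` and the `Counter` outputs are hypothetical and not part of coffea.

```python
from collections import Counter
from functools import reduce
from operator import add


def tree_reduce(outputs, factor=20):
    """Merge outputs in groups of at most `factor` until one remains.

    Hypothetical helper for illustration only; not coffea's implementation.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    layer = list(outputs)
    if not layer:
        raise ValueError("nothing to reduce")
    rounds = 0
    while len(layer) > 1:
        # Each group of up to `factor` partial outputs becomes one merged output.
        layer = [
            reduce(add, layer[i : i + factor])
            for i in range(0, len(layer), factor)
        ]
        rounds += 1
    return layer[0], rounds


# 100 partial outputs, as if returned by 100 tasks.
partials = [Counter(events=1, weight=i) for i in range(1, 101)]
total, rounds = tree_reduce(partials, factor=20)
print(total["events"], total["weight"], rounds)
```

With 100 partial outputs and a factor of 20, the sketch merges 100 outputs into 5 and then 5 into 1, so no single merge step handles more than 20 inputs.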

Returns:

out – The tuple (accumulator, 0) after the tree reduction completes. When use_dataframes=True, the accumulator is {"out": dd.DataFrame}, where dd is dask.dataframe.

Return type:

tuple(Accumulatable, int | BaseException)
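
A caller has to deal with both halves of the returned tuple. The sketch below assumes, based on the return type above, that the second element is 0 on success and an exception instance otherwise. The helper name `unwrap` is hypothetical and not part of coffea.

```python
def unwrap(result):
    """Return the accumulator, re-raising if the status slot holds an exception.

    Assumes `result` has the shape (accumulator, 0) or (accumulator, exception).
    """
    accumulator, status = result
    if isinstance(status, BaseException):
        raise status
    return accumulator


# Success case: the status slot is 0, so the accumulator is handed back.
ok = unwrap(({"n_events": 3}, 0))

# Failure case: the status slot carries the exception, which is re-raised.
try:
    unwrap((None, RuntimeError("worker lost")))
except RuntimeError as err:
    message = str(err)
```

Whether to re-raise or to log the exception and keep the partial accumulator is a design choice for the calling code.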

Attributes Summary

Methods Summary

__call__(items, function, accumulator)

Call self as a function.

Attributes Documentation

client: dask.distributed.Client | None = None
compression: int | None = 1
desc: str = 'Processing'
function_name: str | None = None
heavy_input: bytes | None = None
priority: int = 0
retries: int = 3
status: bool = True
treereduction: int = 20
unit: str = 'items'
use_dataframes: bool = False
worker_affinity: bool = False

Methods Documentation

__call__(items: Iterable, function: Callable, accumulator: Addable | MutableSet | MutableMapping)

Call self as a function.
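
To make the (items, function, accumulator) contract concrete, here is a minimal serial stand-in that uses no dask at all. It assumes only what the parameter descriptions state: function is called once per item and returns something that can be added to the accumulator, and when heavy_input is set each call receives the tuple (item, heavy_input). The names `run_locally` and `count_above_threshold` are hypothetical, and a plain dict stands in for the broadcast value.

```python
from collections import Counter


def count_above_threshold(arg):
    # With heavy_input set, each call receives (item, heavy_input).
    values, thresholds = arg
    cut = thresholds["min_value"]
    # Pure function: the result depends only on the argument.
    return Counter(seen=len(values), passed=sum(v > cut for v in values))


def run_locally(items, function, accumulator, heavy_input=None):
    """Hypothetical serial stand-in for an executor call; not coffea code."""
    for item in items:
        arg = item if heavy_input is None else (item, heavy_input)
        accumulator += function(arg)
    # Mirror the documented return shape: (accumulator, 0) on success.
    return accumulator, 0


chunks = [[1, 5, 9], [2, 8], [7]]
out, status = run_locally(
    chunks, count_above_threshold, Counter(), heavy_input={"min_value": 4}
)
print(out, status)
```

In a real run, the same function and accumulator would be handed to a DaskExecutor instance that was configured with a dask.distributed.Client.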