triton_wrapper#

class coffea.ml_tools.triton_wrapper(model_url: str, client_args: dict | None = None, batch_size=-1)[source]#

Bases: nonserializable_attribute, numpy_call_wrapper

Wrapper for running triton inference.

The goal of this class is to wrap all triton-specific operations and abstract them away from the user. The user then only needs to handle awkward-level operations to mangle the arrays into the input format expected by the model of interest. This must be done by overriding the prepare_awkward method.

Once an instance wrapper of this class is created, it can be called on inputs like wrapper(*args), where args are the inputs to prepare_awkward (see next paragraph).

To actually use the class, the user must override the prepare_awkward method. The input to this method is an arbitrary number of awkward arrays or dask awkward arrays (but never a mix of dask and non-dask arrays). The output is two objects: a tuple a and a dictionary b, such that the underlying tritonclient instance is called like client(*a, **b). The contents of a and b should be numpy-compatible awkward-like arrays: if the inputs are non-dask awkward arrays, the return values should also be non-dask awkward arrays that can be trivially converted to numpy arrays via an ak.to_numpy call; if the inputs are dask awkward arrays, the return values should still be dask awkward arrays that can be trivially converted via a to_awkward().to_numpy() call.
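The calling convention described above can be illustrated without a Triton server. The block below is a minimal stand-in, not the coffea implementation: StandInWrapper, its fake numpy_call, and the names "features" and "score" are all hypothetical, and plain Python lists take the place of awkward arrays. It only shows how the tuple and dictionary returned by prepare_awkward are forwarded to the underlying call.

```python
class StandInWrapper:
    """Hypothetical stand-in that mimics the call flow; NOT coffea's class."""

    def prepare_awkward(self, *args):
        raise NotImplementedError("subclasses must override prepare_awkward")

    def numpy_call(self, output_list, input_dict):
        # Fake "inference": every requested output is the per-row sum
        # of the hypothetical "features" input.
        rows = input_dict["features"]
        return {name: [float(sum(row)) for row in rows] for name in output_list}

    def __call__(self, *args):
        # The wrapper forwards the tuple `a` and the dictionary `b`
        # as positional and keyword arguments respectively.
        a, b = self.prepare_awkward(*args)
        return self.numpy_call(*a, **b)


class RowSumWrapper(StandInWrapper):
    def prepare_awkward(self, output_list, features):
        # Return (tuple of positional args, dict of keyword args).
        return (output_list,), {"input_dict": {"features": features}}


wrapper = RowSumWrapper()
result = wrapper(["score"], [[1, 2], [3, 4]])
print(result)
```

With the real class, the subclass would derive from triton_wrapper and return awkward (or dask awkward) arrays instead of lists; how the positional and keyword parts are split between a and b is a choice left to the user.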

Parameters:
  • model_url (str) – A string in the format of: triton+<protocol>://<address>/<model>/<version>

  • client_args (dict[str, str], optional) – Optional keyword arguments to pass to the underlying InferenceServerClient objects.

  • batch_size (int, default -1) – How the input arrays should be split up for analysis processing. Leave negative to have this automatically resolved.
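To make the model_url format concrete, here is a small sketch that splits such a string into its parts using only the standard library. The helper split_model_url and the example address, model name, and version are hypothetical; the class may parse the URL differently internally.

```python
from urllib.parse import urlparse


def split_model_url(model_url: str) -> dict:
    """Hypothetical helper: split triton+<protocol>://<address>/<model>/<version>."""
    parsed = urlparse(model_url)
    prefix, sep, protocol = parsed.scheme.partition("+")
    if prefix != "triton" or not sep or not protocol:
        raise ValueError(f"expected a 'triton+<protocol>' scheme, got {parsed.scheme!r}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '/<model>/<version>', got {parsed.path!r}")
    model, version = parts
    return {
        "protocol": protocol,
        "address": parsed.netloc,
        "model": model,
        "version": version,
    }


# Example values below are placeholders, not a real server.
pieces = split_model_url("triton+grpc://localhost:8001/my_model/1")
print(pieces)
```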

Attributes Summary

batch_size

Getting the batch size to be used for array splitting.

batch_size_fallback

client_args

Function for adding default arguments to the client constructor kwargs.

http_client_concurrency

max_retry_attempts

pmod

Getting the protocol module based on the url protocol string.

retry_jitter_base_ms

Methods Summary

numpy_call(output_list, input_dict)

run_infer(inputs, outputs[, attempt])

Thin wrapper around tritonclient.infer that automatically retries with backoff and jitter on inference server failures.

validate_numpy_input(output_list, input_dict)

Check that tritonclient can return the expected input array dimensions and available output values.

Attributes Documentation

batch_size#

Getting the batch size to be used for array splitting. If it is explicitly set by the users, use that; otherwise, extract from the model configuration hosted on the server.
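The resolution order described here can be sketched as follows. This is an illustration under assumptions, not the library's code: the function names are hypothetical, the server-side value is passed in as a plain argument rather than read from a model configuration, and using batch_size_fallback when the server reports no usable value is a guess about that attribute's purpose.

```python
BATCH_SIZE_FALLBACK = 10  # mirrors the documented batch_size_fallback value


def resolve_batch_size(requested, server_max_batch_size=None):
    """Hypothetical sketch: explicit value first, then server value, then fallback."""
    if requested > 0:
        # Explicitly set by the user.
        return requested
    if server_max_batch_size:
        # Assumed to come from the model configuration hosted on the server.
        return server_max_batch_size
    # Assumption: the fallback applies when the server gives no usable value.
    return BATCH_SIZE_FALLBACK


def batch_slices(n_entries, batch_size):
    """Split n_entries into consecutive slices of at most batch_size entries."""
    return [
        slice(start, min(start + batch_size, n_entries))
        for start in range(0, n_entries, batch_size)
    ]


print(resolve_batch_size(-1, server_max_batch_size=8))
print(batch_slices(25, 10))
```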

batch_size_fallback = 10#
client_args#

Function for adding default arguments to the client constructor kwargs.

http_client_concurrency = 12#
max_retry_attempts = 5#
pmod#

Getting the protocol module based on the url protocol string.

retry_jitter_base_ms = 100#

Methods Documentation

numpy_call(output_list: list[str], input_dict: dict[str, array]) → dict[str, array][source]#
Parameters:
  • output_list (list[str]) – List of strings corresponding to the names of the outputs of interest. These strings will be automatically translated into the required tritonclient.InferRequestedOutput objects.

  • input_dict (dict[str, numpy.ndarray]) – Dictionary with the model’s input-names as the key and the appropriate numpy array as the dictionary value. This dictionary is automatically translated into a list of tritonclient.InferInput objects.

Returns:

A dictionary of numpy arrays keyed by the names given in output_list.

Return type:

dict[str, numpy.ndarray]

run_infer(inputs, outputs, attempt=0)[source]#

Thin wrapper around tritonclient.infer that automatically retries with backoff and jitter on inference server failures.
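A retry loop of this kind can be sketched as below. The constants mirror the documented max_retry_attempts and retry_jitter_base_ms values, but everything else is an assumption: the function name, the exponential delay formula, and the use of RuntimeError as a placeholder for whatever exception the client raises. The sleep and random sources are injectable so the sketch can be exercised without waiting.

```python
import random
import time

MAX_RETRY_ATTEMPTS = 5      # mirrors the documented max_retry_attempts
RETRY_JITTER_BASE_MS = 100  # mirrors the documented retry_jitter_base_ms


def infer_with_retry(infer, attempt=0, sleep=time.sleep, rng=random.random):
    """Hypothetical sketch: call infer(), retrying with backoff plus jitter."""
    try:
        return infer()
    except RuntimeError:  # placeholder for the client's failure exception
        if attempt >= MAX_RETRY_ATTEMPTS:
            raise
        # Assumed formula: exponential backoff scaled by a random jitter factor.
        delay_ms = RETRY_JITTER_BASE_MS * (2 ** attempt) * (1 + rng())
        sleep(delay_ms / 1000)
        return infer_with_retry(infer, attempt + 1, sleep, rng)


# Usage: a fake request that fails twice before succeeding.
calls = []
sleeps = []


def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("server busy")
    return "ok"


outcome = infer_with_retry(flaky_request, sleep=sleeps.append, rng=lambda: 0.0)
print(outcome, len(calls), sleeps)
```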

validate_numpy_input(output_list: list[str], input_dict: dict[str, array]) → None[source]#

Check that tritonclient can return the expected input array dimensions and available output values. This can be useful for ensuring that data is properly mangled for Triton. This method is called just before the inputs are passed to the Triton client when an inference request is made.

If no errors are raised, the input is considered valid.

Parameters:
  • output_list (list[str]) – List of strings corresponding to the names of the outputs of interest. These strings will be automatically translated into the required tritonclient.InferRequestedOutput objects. This is identical to the first argument the user passes in when calling the triton_wrapper instance.

  • input_dict (dict[str, numpy.ndarray]) – Dictionary with the model’s input-names as the key and the appropriate numpy array as the dictionary value. This dictionary is automatically translated into a list of tritonclient.InferInput objects. This is identical to the second argument the user passes in when calling the triton_wrapper instance.
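The kind of check described here can be sketched as follows. This is not the library's validation logic: the function names, the layout of the metadata dictionary, and the use of -1 to mark an unconstrained dimension are assumptions made for illustration, and nested lists stand in for numpy arrays.

```python
def shape_of(nested):
    """Shape of a rectangular nested list (stand-in for a numpy array's shape)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def validate_sketch(output_list, input_dict, metadata):
    """Hypothetical check of requested outputs and input shapes against metadata."""
    unknown = [name for name in output_list if name not in metadata["outputs"]]
    if unknown:
        raise ValueError(f"outputs not offered by the model: {unknown}")
    for name, expected in metadata["inputs"].items():
        if name not in input_dict:
            raise ValueError(f"missing model input: {name}")
        actual = shape_of(input_dict[name])
        # Assumption: -1 in the expected shape means "any size" for that axis.
        mismatch = len(actual) != len(expected) or any(
            e != -1 and e != a for e, a in zip(expected, actual)
        )
        if mismatch:
            raise ValueError(f"input {name}: expected {expected}, got {actual}")


# Hypothetical metadata for a model with one input and one output.
metadata = {"inputs": {"features": (-1, 2)}, "outputs": ["score"]}
validate_sketch(["score"], {"features": [[1, 2], [3, 4]]}, metadata)
print("inputs accepted")
```

As in the documented method, the sketch signals problems by raising and returns nothing when the inputs pass.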