triton_wrapper
- class coffea.ml_tools.triton_wrapper(model_url: str, client_args: dict | None = None, batch_size=-1)
Bases: nonserializable_attribute, numpy_call_wrapper
Wrapper for running Triton inference.
The goal of this class is to wrap all Triton-specific operations and abstract them away from the user. The user then only needs to handle awkward-level operations to mangle the arrays into the input format expected by the model of interest. This is done by overriding the prepare_awkward method.
Once an instance wrapper of this class is created, it can be called on inputs like wrapper(*args), where args are the inputs to prepare_awkward (see next paragraph).
In order to actually use the class, the user must override the prepare_awkward method. The input to this method is an arbitrary number of awkward arrays or dask awkward arrays (but never a mix of dask and non-dask arrays). The output is two objects: a tuple a and a dictionary b, such that the underlying tritonclient instance is called like client(*a, **b). The contents of a and b should be numpy-compatible awkward-like arrays: if the inputs are non-dask awkward arrays, the return values should also be non-dask awkward arrays that can be trivially converted to numpy arrays via an ak.to_numpy call; if the inputs are dask awkward arrays, the return values should still be dask awkward arrays that can be trivially converted via a to_awkward().to_numpy() call.
- Parameters:
model_url (str) – A string in the format of: triton+<protocol>://<address>/<model>/<version>
client_args (dict[str, str], optional) – Optional keyword arguments to pass to the underlying InferenceServerClient objects.
batch_size (int, default -1) – How the input arrays should be split up for analysis processing. Leave negative to have this resolved automatically.
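As a rough illustration of the model_url format, the sketch below splits such a string into its parts using only the standard library. parse_triton_url is a hypothetical helper, not part of coffea; it assumes that only the grpc and http protocols are supported and that the protocol string selects the matching tritonclient submodule (the role the pmod attribute plays). The model name my_model is likewise made up.

```python
from urllib.parse import urlparse

SUPPORTED_PROTOCOLS = ("grpc", "http")  # assumption: the two tritonclient transports


def parse_triton_url(model_url: str) -> dict:
    """Split 'triton+<protocol>://<address>/<model>/<version>' into its parts.

    Hypothetical helper for illustration; not part of coffea's API.
    """
    parsed = urlparse(model_url)
    prefix, _, protocol = parsed.scheme.partition("+")
    if prefix != "triton" or protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # parsed.path looks like '/<model>/<version>'; drop the slashes and split.
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected /<model>/<version>, got {parsed.path!r}")
    model, version = parts
    return {
        "protocol": protocol,
        "address": parsed.netloc,
        "model": model,
        "version": version,
        # Assumed protocol-to-module mapping (the role pmod plays).
        "client_module": f"tritonclient.{protocol}",
    }


info = parse_triton_url("triton+grpc://localhost:8001/my_model/1")
print(info["protocol"], info["address"], info["model"], info["version"])  # grpc localhost:8001 my_model 1
```

Any string that does not start with triton+grpc:// or triton+http://, or whose path is not exactly /<model>/<version>, is rejected with a ValueError in this sketch.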
Attributes Summary
batch_size – Getting the batch size to be used for array splitting.
client_args – Function for adding default arguments to the client constructor kwargs.
pmod – Getting the protocol module based on the url protocol string.
Methods Summary
numpy_call(output_list, input_dict)
run_infer(inputs, outputs[, attempt]) – Thin wrapper around tritonclient.infer that automatically retries, with backoff and jitter, on inference server failures.
validate_numpy_input(output_list, input_dict) – Check that tritonclient can return the expected input array dimensions and available output values.
Attributes Documentation
- batch_size
Getting the batch size to be used for array splitting. If it is explicitly set by the user, that value is used; otherwise, it is extracted from the model configuration hosted on the server.
- batch_size_fallback = 10
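The batch-size resolution described for batch_size can be sketched as follows. The precedence used here (explicit user value, then the server's max_batch_size, then batch_size_fallback) is an assumption about when the fallback applies, and resolve_batch_size and split_into_batches are hypothetical helpers rather than coffea functions.

```python
from __future__ import annotations

import numpy as np

BATCH_SIZE_FALLBACK = 10  # same default as the documented batch_size_fallback


def resolve_batch_size(user_batch_size: int, server_max_batch_size: int | None) -> int:
    """Hypothetical precedence: user value, then server config, then fallback."""
    if user_batch_size > 0:
        return user_batch_size  # explicitly set by the user
    if server_max_batch_size is not None and server_max_batch_size > 0:
        return server_max_batch_size  # read from the model configuration
    return BATCH_SIZE_FALLBACK  # nothing usable: fall back to the default


def split_into_batches(array: np.ndarray, batch_size: int) -> list[np.ndarray]:
    # Slice along the first axis; the final batch may be shorter.
    return [array[i : i + batch_size] for i in range(0, len(array), batch_size)]


size = resolve_batch_size(-1, None)
print([len(b) for b in split_into_batches(np.arange(25), size)])  # [10, 10, 5]
```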
- client_args
Function for adding default arguments to the client constructor kwargs.
- http_client_concurrency = 12
- max_retry_attempts = 5
- pmod
Getting the protocol module based on the url protocol string.
- retry_jitter_base_ms = 100
Methods Documentation
- numpy_call(output_list: list[str], input_dict: dict[str, array]) → dict[str, array]
- Parameters:
output_list (list[str]) – List of strings corresponding to the names of the outputs of interest. These strings are automatically translated into the required tritonclient.InferRequestedOutput objects.
input_dict (dict[str, numpy.ndarray]) – Dictionary with the model’s input names as keys and the appropriate numpy arrays as values. This dictionary is automatically translated into a list of tritonclient.InferInput objects.
- Returns:
A dictionary of numpy arrays, keyed by the names given in output_list.
- Return type:
dict[str,numpy.ndarray]
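To make the calling convention concrete, here is a self-contained stand-in that mimics the flow wrapper(*args) → prepare_awkward → numpy_call(*a, **b) without a Triton server. FakeTritonWrapper, its prepare method, the input name features and the output name score are all hypothetical; a real subclass would override prepare_awkward on triton_wrapper and return awkward arrays rather than the plain numpy arrays used here. Returning an empty tuple for a and passing output_list and input_dict as keywords in b is one way to satisfy the contract.

```python
import numpy as np


class FakeTritonWrapper:
    """Stand-in mirroring the wrapper's call flow; no Triton server involved."""

    def prepare(self, jet_pt, jet_eta):
        # Plays the role of prepare_awkward: return (a, b) such that
        # numpy_call(*a, **b) receives output_list and input_dict.
        features = np.stack([jet_pt, jet_eta], axis=1).astype(np.float32)
        return (), {"output_list": ["score"], "input_dict": {"features": features}}

    def numpy_call(self, output_list, input_dict):
        # A real wrapper would send input_dict to the server; this fake just
        # returns the row sum of "features" for every requested output name.
        x = input_dict["features"]
        return {name: x.sum(axis=1) for name in output_list}

    def __call__(self, *args):
        a, b = self.prepare(*args)
        return self.numpy_call(*a, **b)


wrapper = FakeTritonWrapper()
out = wrapper(np.array([10.0, 20.0]), np.array([0.5, -1.0]))
print(out["score"].tolist())  # [10.5, 19.0]
```

The returned dictionary is keyed by the requested output names, matching the return type documented for numpy_call.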
- run_infer(inputs, outputs, attempt=0)
Thin wrapper around tritonclient.infer that automatically retries, with backoff and jitter, on inference server failures.
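The retry behaviour of run_infer can be sketched with exponential backoff plus random jitter. This is an illustration under stated assumptions, not coffea's code: call_with_retry is hypothetical, it treats max_retry_attempts as the total number of attempts, catches any Exception rather than a specific tritonclient error, and doubles retry_jitter_base_ms on each attempt. The actual schedule in run_infer may differ.

```python
import random
import time


def call_with_retry(func, max_retry_attempts=5, retry_jitter_base_ms=100, sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus jitter on failure.

    Hypothetical sketch: max_retry_attempts is treated as the total number of
    attempts, and any Exception counts as a retryable server failure.
    """
    for attempt in range(max_retry_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_retry_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Base delay doubles each attempt; random jitter in [0, base] keeps
            # many concurrent clients from retrying in lockstep.
            base_ms = retry_jitter_base_ms * 2**attempt
            sleep((base_ms + random.uniform(0, base_ms)) / 1000.0)
```

Passing sleep as a parameter keeps the sketch testable: a caller can record the delays instead of actually waiting.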
- validate_numpy_input(output_list: list[str], input_dict: dict[str, array]) → None
Check that tritonclient can return the expected input array dimensions and available output values. This can be useful for ensuring that data is being properly mangled for Triton. This method is called just before the inputs are passed to the Triton client when an inference request is made.
If no errors are raised, the input is considered valid.
- Parameters:
output_list (list[str]) – List of strings corresponding to the names of the outputs of interest. These strings are automatically translated into the required tritonclient.InferRequestedOutput objects. This is identical to the first argument the user passes in when calling the triton_wrapper instance.
input_dict (dict[str, numpy.ndarray]) – Dictionary with the model’s input names as keys and the appropriate numpy arrays as values. This dictionary is automatically translated into a list of tritonclient.InferInput objects. This is identical to the second argument the user passes in when calling the triton_wrapper instance.
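A simplified version of the checks described for validate_numpy_input is sketched below, run against a hypothetical metadata dictionary shaped like the inputs/outputs section of a Triton model-metadata response (name, datatype, shape, with -1 marking a variable-size dimension). validate is a hypothetical helper that checks only dimensions and output names; the real method may check more, or check differently.

```python
import numpy as np


def validate(output_list, input_dict, metadata):
    """Raise ValueError if requests don't match the (hypothetical) model metadata."""
    known_outputs = {spec["name"] for spec in metadata["outputs"]}
    unknown = sorted(set(output_list) - known_outputs)
    if unknown:
        raise ValueError(f"outputs not provided by the model: {unknown}")
    for spec in metadata["inputs"]:
        name, expected = spec["name"], spec["shape"]
        if name not in input_dict:
            raise ValueError(f"missing input {name!r}")
        shape = np.shape(input_dict[name])
        if len(shape) != len(expected):
            raise ValueError(f"{name}: expected {len(expected)} dims, got {len(shape)}")
        # -1 marks a variable-size dimension (typically the batch axis).
        if any(want != -1 and got != want for got, want in zip(shape, expected)):
            raise ValueError(f"{name}: shape {shape} incompatible with {expected}")


metadata = {
    "inputs": [{"name": "features", "datatype": "FP32", "shape": [-1, 2]}],
    "outputs": [{"name": "score", "datatype": "FP32", "shape": [-1]}],
}
validate(["score"], {"features": np.zeros((4, 2), dtype=np.float32)}, metadata)  # passes
```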