The Xentara ONNX Engine v2.0
User Manual
JSON Format for ONNX Engine Execution Providers

A JSON object describing the execution providers has the following syntax:

"providers": [
{
"name": "CUDA",
"options": {
"device_id": "0",
"do_copy_in_default_stream": "false",
"cudnn_conv_use_max_workspace": "1"
}
},
{
"name": "TensorRT",
"options": {
"device_id": "0"
}
}
]

Execution Providers

The ONNX Engine plug-in supports a range of execution providers to accelerate computation.

The following execution providers are available:

CUDA
TensorRT
Huawei CANN
MIGraphX
ROCm
OpenVINO
DirectML

More supported execution providers and their options can be found on the ONNX Runtime website.

Execution Provider Options

All option keys and option values must be defined as string values. An option value may represent another type, such as a boolean or an integer, but it must still be written as a string.

For example, the CUDA option "device_id" is of type integer, but it must be written as a string: a device id of 0 must be written as the string "0".
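To illustrate, the following "options" object is invalid because it uses raw JSON number and boolean types:

```json
"options": {
    "device_id": 0,
    "do_copy_in_default_stream": false
}
```

The same options written correctly, with every value quoted as a string:

```json
"options": {
    "device_id": "0",
    "do_copy_in_default_stream": "false"
}
```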

CUDA options

Below is the list of options for CUDA, given as key (type): value options:

device_id (int): Default: 0.
user_compute_stream (string): Set a custom compute stream for GPU operations.
do_copy_in_default_stream (bool): Default: true. Values: true/false.
use_ep_level_unified_stream (bool): Default: false. Values: true/false.
gpu_mem_limit (int): Default: max.
arena_extend_strategy (int): Default: kNextPowerOfTwo = 0. Values: kNextPowerOfTwo = 0, kSameAsRequested = 1.
cudnn_conv_algo_search (int): Default: EXHAUSTIVE = 0. Values: EXHAUSTIVE = 0, HEURISTIC = 1, DEFAULT = 2.
cudnn_conv_use_max_workspace (int): Default: 1. 0 = false, nonzero = true.
cudnn_conv1d_pad_to_nc1d (int): Default: 0. 0 = false, nonzero = true.
enable_cuda_graph (int): Default: 0. 0 = false, nonzero = true.
enable_skip_layer_norm_strict_mode (int): Default: 0. 0 = false, nonzero = true.
use_tf32 (int): Default: 1. 0 = false, nonzero = true.
gpu_external_[alloc|free|empty_cache] (int): Default: 0. 0 = false, nonzero = true.
prefer_nhwc (int): Default: 0. 0 = false, nonzero = true.
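For illustration, a providers entry that selects GPU 0, switches the cuDNN algorithm search to heuristic mode, and caps GPU memory at 2 GB might look like this (the option values are chosen only as an example):

```json
{
    "name": "CUDA",
    "options": {
        "device_id": "0",
        "cudnn_conv_algo_search": "1",
        "gpu_mem_limit": "2147483648"
    }
}
```

Note that the memory limit, although numeric, is still written as a string, as required above.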

TensorRT options

Below is the list of options for TensorRT, given as key (type): value options:

device_id (int): Default: 0.
user_compute_stream (string): Set a custom compute stream for GPU operations.
trt_engine_cache_enable (bool): Default: false. Enable caching of built TensorRT engines. Values: true/false.
trt_engine_cache_path (string): Set the path used to store cached TensorRT engines.
trt_engine_cache_prefix (string): Set the prefix for cached engine files.
trt_engine_hw_compatible (bool): Maximize engine compatibility across Ampere+ GPUs. Values: true/false.
trt_max_workspace_size (int): Default: 1073741824 (1 GB). Maximum workspace size for TensorRT.
trt_fp16_enable (bool): Enable TensorRT FP16 precision. Values: true/false.
trt_int8_enable (bool): Enable TensorRT INT8 precision. Values: true/false.
trt_int8_calibration_table_name (string): Set the TensorRT INT8 calibration table name.
trt_int8_use_native_calibration_table (bool): Use the native TensorRT-generated calibration table. Values: true/false.
trt_build_heuristics_enable (bool): Build the engine using heuristics to reduce build time. Values: true/false.
trt_sparsity_enable (bool): Control whether sparsity can be used by TensorRT. Values: true/false.
trt_dla_enable (bool): Default: false. Enable DLA (Deep Learning Accelerator). Values: true/false.
trt_dla_core (int): Default: 0. Specify the DLA core to execute on.
trt_max_partition_iterations (int): Default: 1000. Maximum number of iterations for the TensorRT parser to get capability.
trt_min_subgraph_size (int): Default: 1. Minimum size of TensorRT subgraphs.
trt_dump_subgraphs (bool): Dump optimized subgraphs for debugging. Values: true/false.
trt_force_sequential_engine_build (bool): Force sequential engine builds in multi-GPU setups. Values: true/false.
trt_context_memory_sharing_enable (bool): Share execution context memory between TensorRT subgraphs. Values: true/false.
trt_layer_norm_fp32_fallback (bool): Force layer normalization calculations to FP32. Values: true/false.
trt_cuda_graph_enable (bool): Capture a CUDA graph for reduced launch overhead. Values: true/false.
trt_builder_optimization_level (int): Default: 3. Valid range [0-5]. Set the optimization level for the TensorRT builder.
trt_auxiliary_streams (int): Default: -1. Set the number of auxiliary streams for computation.
trt_tactic_sources (string): Specify tactic sources for TensorRT. Example: "-CUDNN,+CUBLAS". Available keys: "CUBLAS", "CUBLAS_LT", "CUDNN", "EDGE_MASK_CONVOLUTIONS".
trt_extra_plugin_lib_paths (string): Add additional plug-in library paths for TensorRT.
trt_detailed_build_log (bool): Enable detailed logging of build steps. Values: true/false.
trt_timing_cache_enable (bool): Enable use of the timing cache to speed up builds. Values: true/false.
trt_timing_cache_path (string): Set the path for storing the timing cache.
trt_force_timing_cache (bool): Force use of the timing cache regardless of GPU match. Values: true/false.
trt_profile_min_shapes (string): Specify the minimum shapes of the dynamic-shape profile.
trt_profile_max_shapes (string): Specify the maximum shapes of the dynamic-shape profile.
trt_profile_opt_shapes (string): Specify the optimal shapes of the dynamic-shape profile.
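As an illustration, a TensorRT entry that enables FP16 precision and engine caching could look like this (the cache path is an example value; choose any writable directory on the target machine):

```json
{
    "name": "TensorRT",
    "options": {
        "device_id": "0",
        "trt_fp16_enable": "true",
        "trt_engine_cache_enable": "true",
        "trt_engine_cache_path": "/var/cache/tensorrt"
    }
}
```

Enabling the engine cache avoids rebuilding the TensorRT engine on every start, which can otherwise take several minutes for large models.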

Huawei CANN options

Below is the list of options for CANN, given as key (type): value options:

device_id (int): Default: 0.
npu_mem_limit (int): Default: max.
arena_extend_strategy (int): Default: kNextPowerOfTwo = 0. Values: kNextPowerOfTwo = 0, kSameAsRequested = 1.
enable_cann_graph (bool): Default: true. Whether to use the graph inference engine to speed up performance. Values: true/false.
dump_graphs (bool): Default: false. Whether to dump the subgraphs in ONNX format for analysis of subgraph segmentation. Values: true/false.
dump_om_model (bool): Default: true. Whether to dump the offline model for the Ascend AI Processor to an .om file. Values: true/false.
precision_mode (string): Default: force_fp16. Values: force_fp16, force_fp32/cube_fp16in_fp32out, allow_fp32_to_fp16, must_keep_origin_dtype, allow_mix_precision/allow_mix_precision_fp16.
op_select_impl_mode (string): Default: high_performance. Values: high_performance, high_precision.
optypelist_for_implmode (string): Default: None. Values: Pooling, SoftmaxV2, LRN, ROIAlign.
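For example, a CANN entry that keeps the graph engine enabled and forces FP16 precision might look like this (the provider name "CANN" and the option values are illustrative):

```json
{
    "name": "CANN",
    "options": {
        "device_id": "0",
        "enable_cann_graph": "true",
        "precision_mode": "force_fp16"
    }
}
```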

MIGraphX options

Below is the list of options for MIGraphX, given as key (type): value options:

device_id (int): Default: 0.
migraphx_fp16_enable (int): Default: 0. 0 = false, nonzero = true. Enable MIGraphX FP16 precision.
migraphx_int8_enable (int): Default: 0. 0 = false, nonzero = true. Enable MIGraphX INT8 precision.
migraphx_use_native_calibration_table (int): Default: 0. 0 = false, nonzero = true. Use the native MIGraphX INT8 calibration table.
migraphx_int8_calibration_table_name (string): Set the MIGraphX INT8 calibration table name.
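For example, a MIGraphX entry that enables FP16 precision might look like this (the provider name "MIGraphX" and the option values are illustrative):

```json
{
    "name": "MIGraphX",
    "options": {
        "device_id": "0",
        "migraphx_fp16_enable": "1"
    }
}
```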

ROCm options

Below is the list of options for ROCm, given as key (type): value options:

device_id (int): Default: 0.
arena_extend_strategy (int): Default: kNextPowerOfTwo = 0. Values: kNextPowerOfTwo = 0, kSameAsRequested = 1.
do_copy_in_default_stream (int): Default: 1 (true). 0 = false, nonzero = true.
gpu_mem_limit (int): Default: max.
has_user_compute_stream (int): 0 = false, nonzero = true. Indicates whether a user-defined compute stream is used.
miopen_conv_exhaustive_search (int): Default: 0 (false). 0 = false, nonzero = true. Steer the strategy of the MIOpen convolution algorithm search.
tunable_op_enable (int): Default: 1 (true). 0 = false, nonzero = true. Set to use TunableOp.
enable_hip_graph (int): 0 = false, nonzero = true.
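For example, a ROCm entry that enables TunableOp and caps GPU memory at 2 GB might look like this (the provider name "ROCm" and the option values are illustrative):

```json
{
    "name": "ROCm",
    "options": {
        "device_id": "0",
        "tunable_op_enable": "1",
        "gpu_mem_limit": "2147483648"
    }
}
```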

OpenVINO options

Below is the list of options for OpenVINO, given as key (type): value options:

device_id (int): Default: 0.
cache_dir (string): Any valid path on the hardware target.
device_type (string): CPU, NPU, GPU, GPU.0, GPU.1, and so on, based on the available GPUs and NPUs; any valid Hetero combination; any valid Multi or Auto device combination.
enable_dynamic_shapes (bool): Values: true/false.
enable_opencl_throttling (bool): Values: true/false.
enable_npu_fast_compile (bool): Values: true/false.
num_of_threads (int): Any positive integer other than 0.
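For example, an OpenVINO entry that targets the first GPU and enables the compilation cache might look like this (the provider name "OpenVINO", the cache directory, and the thread count are illustrative):

```json
{
    "name": "OpenVINO",
    "options": {
        "device_type": "GPU.0",
        "cache_dir": "/tmp/openvino_cache",
        "num_of_threads": "8"
    }
}
```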

DirectML options

Below is the list of options for DirectML, given as key (type): value options:

device_id (int): Default: 0.
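For example, a DirectML entry selecting the default device might look like this (the provider name "DirectML" is illustrative):

```json
{
    "name": "DirectML",
    "options": {
        "device_id": "0"
    }
}
```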
Object Members

name
A string value containing the name of the execution provider.

options
An optional JSON object containing pairs of string values. The first string names the option and the second string defines the option value.

Please remember that each provider entry requires two layers of braces ({}) due to the syntax of the JSON format: one pair for the provider object itself and a nested pair for its options object.