The Xentara ONNX Engine v2.0
User Manual
The ONNX Engine plug-in supports a number of execution providers to accelerate the computing process. Each execution provider is described by a JSON object whose members are described at the end of this section. The available execution providers are CUDA, TensorRT, CANN, MIGraphX, ROCm, OpenVINO, and DirectML; the options for each are listed below. More supported execution providers and options can be found on the ONNX Runtime website.
All options and option values must be written as JSON strings. Even when an option value logically has another type, such as a boolean or an integer, it must still be written as a string. For example, the CUDA option "device_id" is an integer, but a device id of 0 must be written as the string "0".
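To illustrate, a hypothetical `options` fragment for the CUDA execution provider might look like the following sketch (the option values are illustrative assumptions, not recommendations). Note that the integer and boolean values are all quoted as strings:

```json
"options": {
    "device_id": "0",
    "cudnn_conv_algo_search": "1",
    "do_copy_in_default_stream": "True"
}
```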
Below is the list of options for CUDA:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| user_compute_stream | string | Set a custom compute stream for GPU operations. |
| do_copy_in_default_stream | bool | Default: True, True/False |
| use_ep_level_unified_stream | bool | Default: False, True/False |
| gpu_mem_limit | int | Default: max |
| arena_extend_strategy | int | Default: kNextPowerOfTwo = 0, kSameAsRequested = 1 |
| cudnn_conv_algo_search | int | Default: EXHAUSTIVE = 0, HEURISTIC = 1, DEFAULT = 2 |
| cudnn_conv_use_max_workspace | int | Default: 1, 0 = false, nonzero = true |
| cudnn_conv1d_pad_to_nc1d | int | Default: 0, 0 = false, nonzero = true |
| enable_cuda_graph | int | Default: 0, 0 = false, nonzero = true |
| enable_skip_layer_norm_strict_mode | int | Default: 0, 0 = false, nonzero = true |
| use_tf32 | int | Default: 1, 0 = false, nonzero = true |
| gpu_external_[alloc\|free\|empty_cache] | int | Default: 0 |
| prefer_nhwc | int | Default: 0, 0 = false, nonzero = true |
Below is the list of options for TensorRT:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| user_compute_stream | string | Set a custom compute stream for GPU operations. |
| trt_engine_cache_enable | bool | Default: False. Enable caching of built TensorRT engines, True/False |
| trt_engine_cache_path | string | Set the path to store cached TensorRT engines. |
| trt_engine_cache_prefix | string | Set the prefix for cached engine files. |
| trt_engine_hw_compatible | bool | Maximize engine compatibility across Ampere+ GPUs, True/False |
| trt_max_workspace_size | int | Default: 1073741824 (1 GB). Maximum workspace size for TensorRT. |
| trt_fp16_enable | bool | Enable TensorRT FP16 precision, True/False |
| trt_int8_enable | bool | Enable TensorRT INT8 precision, True/False |
| trt_int8_calibration_table_name | string | Specify the INT8 calibration table name for TensorRT. |
| trt_int8_use_native_calibration_table | bool | Use the native TensorRT-generated calibration table, True/False |
| trt_build_heuristics_enable | bool | Build the engine using heuristics to reduce build time, True/False |
| trt_sparsity_enable | bool | Control whether sparsity can be used by TensorRT, True/False |
| trt_dla_enable | bool | Default: False. Enable DLA (Deep Learning Accelerator), True/False |
| trt_dla_core | int | Default: 0. Specify the DLA core to execute on. |
| trt_max_partition_iterations | int | Default: 1000. Maximum iterations for the TensorRT parser to get capability. |
| trt_min_subgraph_size | int | Default: 1. Minimum size of TensorRT subgraphs. |
| trt_dump_subgraphs | bool | Dump optimized subgraphs for debugging, True/False |
| trt_force_sequential_engine_build | bool | Force sequential engine builds under multi-GPU, True/False |
| trt_context_memory_sharing_enable | bool | Share execution context memory between TensorRT subgraphs, True/False |
| trt_layer_norm_fp32_fallback | bool | Force layer norm calculations to FP32, True/False |
| trt_cuda_graph_enable | bool | Capture a CUDA graph for reduced launch overhead, True/False |
| trt_builder_optimization_level | int | Default: 3, valid range [0-5]. Set the optimization level for the TensorRT builder. |
| trt_auxiliary_streams | int | Default: -1. Set the number of auxiliary streams for computation. |
| trt_tactic_sources | string | Specify tactic sources for TensorRT. Example: "-CUDNN,+CUBLAS"; available keys: "CUBLAS", "CUBLAS_LT", "CUDNN" or "EDGE_MASK_CONVOLUTIONS". |
| trt_extra_plugin_lib_paths | string | Add additional plug-in library paths for TensorRT. |
| trt_detailed_build_log | bool | Enable detailed logging of build steps, True/False |
| trt_timing_cache_enable | bool | Enable use of the timing cache to speed up builds, True/False |
| trt_timing_cache_path | string | Set the path for storing the timing cache. |
| trt_force_timing_cache | bool | Force use of the timing cache regardless of GPU match, True/False |
| trt_profile_min_shapes | string | Specify the minimum input shapes of the optimization profile for dynamic-shape inputs. |
| trt_profile_max_shapes | string | Specify the maximum input shapes of the optimization profile for dynamic-shape inputs. |
| trt_profile_opt_shapes | string | Specify the optimal input shapes of the optimization profile for dynamic-shape inputs. |
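As an illustrative sketch, the TensorRT engine cache could be enabled with an `options` object like the following (the cache path is a placeholder and the values shown are assumptions, not defaults from this manual):

```json
"options": {
    "trt_fp16_enable": "True",
    "trt_engine_cache_enable": "True",
    "trt_engine_cache_path": "/path/to/engine/cache"
}
```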
Below is the list of options for CANN:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| npu_mem_limit | int | Default: max |
| arena_extend_strategy | int | Default: kNextPowerOfTwo = 0, kSameAsRequested = 1 |
| enable_cann_graph | bool | Default: True. Whether to use the graph inference engine to speed up performance, True/False |
| dump_graphs | bool | Default: False. Whether to dump subgraphs in ONNX format for analysis of subgraph segmentation, True/False |
| dump_om_model | bool | Default: True. Whether to dump the offline model for the Ascend AI Processor to an .om file, True/False |
| precision_mode | string | Default: force_fp16. Options: force_fp16, force_fp32/cube_fp16in_fp32out, allow_fp32_to_fp16, must_keep_origin_dtype, allow_mix_precision/allow_mix_precision_fp16 |
| op_select_impl_mode | string | Default: high_performance. Options: high_performance, high_precision |
| optypelist_for_implmode | string | Default: None. Options: Pooling, SoftmaxV2, LRN, ROIAlign |
Below is the list of options for MIGraphX:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| migraphx_int8_enable | int | Default: 0, 0 = false, nonzero = true. Enable MIGraphX INT8 precision. |
| migraphx_fp16_enable | int | Default: 0, 0 = false, nonzero = true. Enable MIGraphX FP16 precision. |
| migraphx_use_native_calibration_table | int | Default: 0, 0 = false, nonzero = true. Use the native MIGraphX INT8 calibration table. |
| migraphx_int8_calibration_table_name | string | Specify the MIGraphX INT8 calibration table name. |
Below is the list of options for ROCm:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| arena_extend_strategy | int | Default: kNextPowerOfTwo = 0, kSameAsRequested = 1 |
| do_copy_in_default_stream | int | Default: 1, 0 = false, nonzero = true |
| gpu_mem_limit | int | Default: max |
| has_user_compute_stream | int | Default: 0, 0 = false, nonzero = true. Whether a user-provided compute stream is used. |
| miopen_conv_exhaustive_search | int | Default: 0, 0 = false, nonzero = true. Enable exhaustive search for MIOpen convolution algorithms. |
| tunable_op_enable | int | Default: 1, 0 = false, nonzero = true. Set to use TunableOp. |
| enable_hip_graph | int | Default: 0, 0 = false, nonzero = true |
Below is the list of options for OpenVINO:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
| cache_dir | string | Any valid directory path on the hardware target. |
| device_type | string | CPU, NPU, GPU, or GPU.0, GPU.1, etc. based on the available GPUs; any valid HETERO combination; any valid MULTI or AUTO device combination. |
| enable_dynamic_shapes | bool | True/False |
| enable_opencl_throttling | bool | True/False |
| enable_npu_fast_compile | bool | True/False |
| num_of_threads | int | Any positive integer (greater than 0). |
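As a sketch, an OpenVINO `options` object might select the first GPU and limit the thread count (the device choice and thread count here are illustrative assumptions):

```json
"options": {
    "device_type": "GPU.0",
    "num_of_threads": "4",
    "enable_dynamic_shapes": "True"
}
```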
Below is the list of options for DirectML:
| Option | Type | Values |
|---|---|---|
| device_id | int | Default: 0 |
Each execution provider element is a JSON object with the following members:

| Member | Content |
|---|---|
| name | A string containing the name of the execution provider. |
| options | An optional JSON object containing pairs of string values. The first string of each pair names the option and the second string defines the option value. |
Please remember that each element block requires two layers of braces ({}) due to the syntax restrictions of the JSON format.
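Putting this together, a single execution provider element, before the extra wrapping layer of braces described above is applied, might look like the following sketch. The provider name "CUDAExecutionProvider" and the option values are assumptions for illustration, not taken from this manual:

```json
{
    "name": "CUDAExecutionProvider",
    "options": {
        "device_id": "0",
        "gpu_mem_limit": "2147483648"
    }
}
```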