The Xentara ONNX Engine v2.0
User Manual
A JSON object describing the Session Options has the following syntax:
| Option Key | Type | Values |
|---|---|---|
| session.disable_prepacking | int | Default: "0" = prepacking is enabled, "1" = prepacking is disabled. |
| session.use_env_allocators | int | A value of "1" means allocators registered in the env will be used. "0" means the allocators created in the session will be used. Use this to override the usage of env allocators on a per session level. |
| session.load_model_format | string | If unset, the model type will default to ONNX unless it can be inferred from the filename ('.ort' = ORT format) or from the model bytes being in ORT format. |
| session.save_model_format | string | If unset, the format will default to ONNX unless optimized_model_filepath ends in '.ort'. Set to 'ORT' (case sensitive) to save the optimized model in ORT format when SessionOptions.optimized_model_filepath is set. |
| session.set_denormal_as_zero | int | If the value is "1", flush-to-zero and denormal-as-zero are applied. The default is "0". |
| session.disable_quant_qdq | int | Default: "0" unless the DirectML execution provider is registered, in which case it defaults to "1". Controls whether the quantized model is run in QDQ (QuantizeLinear/DequantizeLinear) format. |
| session.disable_double_qdq_remover | int | Default: "0" = not disabled: ORT removes the middle two nodes from Q->(DQ->Q)->DQ pairs. "1" = disabled: ORT does not remove the middle two nodes from Q->(DQ->Q)->DQ pairs. This controls whether the Double QDQ remover and Identical Children Consolidation are enabled. |
| session.enable_quant_qdq_cleanup | int | Default: "0" = Disable, "1" = Enable the removal of QuantizeLinear/DequantizeLinear node pairs once all QDQ handling has been completed. |
| optimization.enable_gelu_approximation | int | Default: "0" = Disable, "1" = Enable gelu approximation in graph optimization. |
| session.disable_aot_function_inlining | int | Default: "0" = AOT function inlining is performed, "1" = AOT function inlining is disabled. Ahead-of-time (AOT) function inlining examines the graph and attempts to inline as many locally defined functions in the model as possible with the help of the enabled execution providers. |
| optimization.disable_specified_optimizers | string | Specifies the config for detecting subgraphs for memory footprint reduction. The value should be a string containing integers separated by commas. The default value is "0:0". |
| session.use_device_allocator_for_initializers | int | Default: "0" = Disable. "1" = Enable using the device allocator for allocating initialized tensor memory. |
| session.inter_op.allow_spinning | int | Default: "1" = threads will spin a number of times before blocking. "0" = threads will block if they find no job to run. Configures whether the inter_op threads are allowed to spin a number of times before blocking. |
| session.intra_op.allow_spinning | int | Default: "1" = threads will spin a number of times before blocking. "0" = threads will block if they find no job to run. Configures whether the intra_op threads are allowed to spin a number of times before blocking. |
| session.use_ort_model_bytes_directly | int | Default: "0" = copy the model bytes at the time of session creation to ensure the model bytes buffer remains valid. "1" = do not copy the model bytes; use the model bytes directly. |
| session.use_ort_model_bytes_for_initializers | int | Key for using the ORT format model flatbuffer bytes directly for initializers. This avoids copying the bytes and reduces peak memory usage during model loading and initialization. Requires session.use_ort_model_bytes_directly to be set to "1". |
| session.qdqisint8allowed | int | If the ORT format model will be used on ARM platforms set to "1". For other platforms set to "0". This should only be specified when exporting an ORT format model for use on a different platform. |
| session.x64quantprecision | string | x64 SSE4.1/AVX2/AVX512 (without VNNI) has an overflow problem with quantized matrix multiplication with U8S8. Only effective on AVX2 or AVX512 platforms. |
| optimization.minimal_build_optimizations | string | "save": Save runtime optimizations when saving an ORT format model. "apply": Only apply optimizations available in a minimal build. |
| ep.nnapi.partitioning_stop_ops | string | Specifies a list of stop op types for the NNAPI execution provider. Nodes of a listed op type, and nodes downstream of them, will not be run by the NNAPI EP. The value is a comma-separated list of op types. |
| session.dynamic_block_base | int | Disabled by default; specify any positive integer to enable dynamic block sizing for multithreading. With a positive value, the thread pool will split a task of N iterations into blocks of a size starting from N/(num_of_threads * dynamic_block_base). |
| session.force_spinning_stop | int | This option allows decreasing CPU usage between infrequent requests: it forces any thread pool threads that are spinning to stop immediately when the last of the concurrent Run() calls returns. Spinning is restarted on the next Run() call. |
| session.strict_shape_type_inference | int | Default: "0": in some cases warnings will be logged but processing will continue. "1": all inconsistencies encountered during shape and type inference will result in failures. |
| session.allow_released_opsets_only | int | "1": every model using a more recent opset than the latest released one will fail. "0": the model may or may not work if onnxruntime cannot find an implementation; this option is intended for development purposes. |
| session.node_partition_config_file | string | Path to a file containing the configuration for partitioning nodes among logic streams. |
| session.intra_op_thread_affinities | string | This option allows setting affinities for intra-op threads. The affinity string has the format logical_processor_id,logical_processor_id;logical_processor_id,logical_processor_id, for example "1,2,3;4,5". Intervals may also be specified, e.g. "1-8;8-16;17-24". |
| session.debug_layout_transformation | int | Default: "0" = Disable, "1" = Enable. This option will dump out the model to assist debugging any issues with layout transformation |
| session.disable_cpu_ep_fallback | int | Default: "0" = Disable, "1" = Enable. If this option is set to "1", session creation will fail if the model cannot be executed entirely by the execution providers other than the CPU EP, i.e. no fallback to the CPU EP takes place. |
| session.optimized_model_external_initializers_file_name | string | Use this config when serializing a large model after optimization to specify an external initializers file. |
| session.optimized_model_external_initializers_min_size_in_bytes | int | Use this config to control the minimum size of an initializer for it to be externalized during serialization. |
| ep.context_enable | int | Default: "0" = Disable, "1" = Enable EP context feature to dump the partitioned graph which includes the EP context into Onnx file. |
| ep.context_file_path | string | Default: the original file name with "_ctx.onnx" appended, if not specified. Specifies the file path for the Onnx model that contains the EP context. |
| ep.context_embed_mode | int | Flag to specify whether to dump the EP context into the Onnx model. Default: "1" = dump the EP context into the Onnx model; "0" = dump the EP context into a separate file and keep that file name in the Onnx model. |
| mlas.enable_gemm_fastmath_arm64_bfloat16 | int | Default: "0" = Disable, "1" = Enable. GEMM fastmath mode provides fp32 GEMM acceleration using bfloat16-based matmul. |
| sessionOptions | object | A JSON object containing pairs of string values: the first string names the option and the second string defines the option value. |
Please remember that each element block requires two layers of {} due to the syntax restrictions of the JSON format, as illustrated in the example below.
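The following sketch is illustrative only: the option keys and values are taken from the table above, but the exact placement and nesting of the sessionOptions element within the enclosing element configuration (including whether individual entries must be wrapped in their own braces, as noted above) is determined by the surrounding configuration schema.

```json
{
    "sessionOptions": {
        "session.intra_op.allow_spinning": "0",
        "session.use_ort_model_bytes_directly": "1",
        "session.intra_op_thread_affinities": "1,2,3;4,5"
    }
}
```

Note that every value is written as a string, even for options listed with type int, since each option value is passed to ONNX Runtime as a string.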