The Xentara ONNX Engine v2.0
User Manual
JSON Format for ONNX Engine Session Options

A JSON object describing the session options has the following syntax:

"sessionOptions": {
"session.dynamic_block_base": "4"
}
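
The example above sets session.dynamic_block_base to "4". As a rough illustration of what that value means, the following Python sketch evaluates the starting block size formula N/(num_of_threads*dynamic_block_base) given in the option's description. The function name, the use of integer division, and the minimum of 1 are assumptions made for this illustration; the engine's actual rounding and its later block sizes may differ.

```python
def starting_block_size(iterations: int, num_threads: int, dynamic_block_base: int) -> int:
    """Estimate the first block size as N / (num_of_threads * dynamic_block_base).

    Assumption: integer division with a minimum of 1; the engine may round differently.
    """
    if iterations < 1 or num_threads < 1 or dynamic_block_base < 1:
        raise ValueError("all arguments must be positive integers")
    return max(1, iterations // (num_threads * dynamic_block_base))


# Hypothetical workload: 1024 iterations on 4 threads with "session.dynamic_block_base": "4".
print(starting_block_size(1024, 4, 4))  # 1024 // (4 * 4) = 64
```

Under these assumptions, a larger dynamic_block_base yields smaller starting blocks, which gives the thread pool more, finer-grained pieces of work to distribute.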

Session Options

reference: options

Key | Option type | Value options
session.disable_prepacking | int | Default: "0" = prepacking is enabled. "1" = prepacking is disabled.
session.use_env_allocators | int | "1" = the allocators registered in the environment are used. "0" = the allocators created in the session are used. Use this option to override the use of environment allocators on a per-session level.
session.load_model_format | string | If unset, the model format defaults to ONNX unless it is inferred to be ORT from the filename ('.ort' means ORT format) or from the model bytes.
session.save_model_format | string | If unset, the format defaults to ONNX unless optimized_model_filepath ends in '.ort'. Set to 'ORT' (case sensitive) to save the optimized model in ORT format when SessionOptions.optimized_model_path is set.
session.set_denormal_as_zero | int | Default: "0". If set to "1", flush-to-zero and denormal-as-zero are applied.
session.disable_quant_qdq | int | Default: "0", unless the DirectML execution provider is registered, in which case the default is "1". Controls whether a quantized model is run in QDQ (QuantizeLinear/DequantizeLinear) format.
session.disable_double_qdq_remover | int | Controls whether the Double QDQ remover and Identical Children Consolidation are enabled. Default: "0" = not disabled; ORT removes the middle two nodes from a Q->(QD->Q)->QD sequence. "1" = disabled; ORT does not remove the middle two nodes from a Q->(QD->Q)->QD sequence.
session.enable_quant_qdq_cleanup | int | Default: "0" = disabled. "1" = QuantizeLinear/DequantizeLinear node pairs are removed once all QDQ handling has been completed.
optimization.enable_gelu_approximation | int | Default: "0" = disabled. "1" = GELU approximation is enabled in graph optimization.
session.disable_aot_function_inlining | int | Default: "0" = ahead-of-time (AOT) function inlining is enabled. "1" = AOT function inlining is disabled. AOT function inlining examines the graph and attempts to inline as many locally defined functions in the model as possible with the help of the enabled execution providers.
optimization.disable_specified_optimizers |  | Specifies the configuration for detecting subgraphs for memory footprint reduction. The value is a string containing integers separated by commas. Default: "0:0".
session.use_device_allocator_for_initializers | int | Default: "0" = disabled. "1" = the device allocator is used to allocate memory for initialized tensors.
session.inter_op.allow_spinning | int | Configures whether the inter-op threads may spin a number of times before blocking. Default: "1" = a thread spins a number of times before blocking. "0" = a thread blocks as soon as it finds no job to run.
session.intra_op.allow_spinning | int | Configures whether the intra-op threads may spin a number of times before blocking. Default: "1" = a thread spins a number of times before blocking. "0" = a thread blocks as soon as it finds no job to run.
session.use_ort_model_bytes_directly | int | Default: "0" = the model bytes are copied when the session is created, which ensures that the model bytes buffer stays valid. "1" = the model bytes are not copied and are used directly.
session.use_ort_model_bytes_for_initializers |  | Uses the ORT format model flatbuffer bytes directly for initializers. This avoids copying the bytes and reduces peak memory usage during model loading and initialization. Requires session.use_ort_model_bytes_directly to be set to "1".
session.qdqisint8allowed | int | Set to "1" if the ORT format model will be used on ARM platforms; set to "0" for other platforms. Specify this option only when exporting an ORT format model for use on a different platform.
session.x64quantprecision | string | x64 SSE4.1/AVX2/AVX512 (without VNNI) has an overflow problem with quantized matrix multiplication using U8S8. Only effective on AVX2 or AVX512 platforms.
optimization.minimal_build_optimizations | string | "save" = save runtime optimizations when saving an ORT format model. "apply" = only apply the optimizations that are available in a minimal build.
ep.nnapi.partitioning_stop_ops |  | Specifies a comma-separated list of stop op types, for example "Add,Sub". Nodes of a stop op type, and all nodes downstream from them, are not run by the NNAPI execution provider.
session.dynamic_block_base | int | Disabled by default; set to any positive integer to enable dynamic block sizing for multithreading. With a positive value, the thread pool splits a task of N iterations into blocks whose size starts from N/(num_of_threads*dynamic_block_base).
session.force_spinning_stop |  | Reduces CPU usage between infrequent requests by forcing any spinning thread-pool threads to stop immediately when the last of the concurrent Run() calls returns. Spinning is restarted on the next Run() call.
session.strict_shape_type_inference | int | Default: "0" = in some cases warnings are logged but processing continues. "1" = every inconsistency encountered during shape and type inference results in a failure.
session.allow_released_opsets_only | int | "1" = every model that uses a more recent opset than the latest released one fails. "0" = such a model may or may not work, because onnxruntime may not find an implementation; this setting is intended for development purposes.
session.node_partition_config_file | string | The file that stores the configuration for partitioning nodes among logic streams.
session.intra_op_thread_affinities | string | Sets the affinities for the intra-op threads. The affinity string has the format logical_processor_id,logical_processor_id;logical_processor_id,logical_processor_id, for example "1,2,3;4,5". Intervals can also be specified, for example "1-8;8-16;17-24".
session.debug_layout_transformation | int | Default: "0" = disabled. "1" = enabled. When enabled, the model is dumped to assist in debugging any issues with layout transformation.
session.disable_cpu_ep_fallback | int | Default: "0" = disabled. "1" = enabled. If set to "1", session creation fails if the execution providers other than the CPU EP cannot fully support all of the nodes in the graph.
session.optimized_model_external_initializers_file_name |  | Use this option when serializing a large model after optimization to specify an external initializers file.
session.optimized_model_external_initializers_min_size_in_bytes |  | Use this option to control the minimum size an initializer must have to be externalized during serialization.
ep.context_enable | int | Default: "0" = disabled. "1" = enables the EP context feature, which dumps the partitioned graph, including the EP context, into an ONNX file.
ep.context_file_path | string | The file path of the ONNX model that contains the EP context. Default: original_file_name_ctx.onnx.
ep.context_embed_mode | int | Specifies whether the EP context is dumped into the ONNX model. Default: "1" = the EP context is dumped into the ONNX model. "0" = the EP context is dumped into a separate file and only the file name is kept in the ONNX model.
mlas.enable_gemm_fastmath_arm64_bfloat16 | int | Default: "0" = disabled. "1" = enabled. GEMM fastmath mode provides fp32 GEMM acceleration with bfloat16-based matmul.
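
The affinity string format of session.intra_op_thread_affinities is easy to get wrong by hand. The following Python sketch shows one way to check such a string before placing it in a configuration. It assumes that semicolons separate the groups for individual threads and that commas separate processor IDs or intervals within a group; the function name is hypothetical, and the parser is an illustration, not the one used by the engine.

```python
from typing import List


def parse_thread_affinities(spec: str) -> List[List[int]]:
    """Expand an affinity string such as "1,2,3;4,5" or "1-8;9-16" into ID lists.

    Assumption: ';' separates per-thread groups, ',' separates items in a group,
    and 'a-b' denotes the inclusive interval from a to b.
    """
    groups: List[List[int]] = []
    for group in spec.split(";"):
        ids: List[int] = []
        for item in group.split(","):
            item = item.strip()
            if "-" in item:
                first, last = (int(part) for part in item.split("-", 1))
                if first > last:
                    raise ValueError(f"descending interval: {item!r}")
                ids.extend(range(first, last + 1))
            else:
                ids.append(int(item))  # raises ValueError for empty or non-numeric items
        groups.append(ids)
    return groups


print(parse_thread_affinities("1,2,3;4,5"))
print(parse_thread_affinities("1-3;4-5"))
```

Both calls produce the same two groups, [1, 2, 3] and [4, 5], so the list form and the interval form can be used interchangeably under the assumptions above.
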
Object Members
sessionOptions: A JSON object containing pairs of string values. The first string of each pair names the option and the second string defines the option's value.

Please remember that each element block requires two layers of {} due to the syntax restrictions of the JSON format.
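
Because every option value is written as a JSON string, even when the option type is int, it can help to generate the block programmatically. The Python sketch below builds such an object and prints it. The helper name is hypothetical, and reading the "two layers of {}" as an outer object that encloses the sessionOptions object is an assumption; where the result is placed within a larger configuration file is not shown here.

```python
import json


def build_session_options(options: dict) -> dict:
    """Return {"sessionOptions": {...}} with every option value converted to a string."""
    members = {}
    for key, value in options.items():
        if not isinstance(key, str):
            raise TypeError(f"option name must be a string: {key!r}")
        if isinstance(value, bool):
            value = int(value)  # write flags as "1" or "0", not "True" or "False"
        members[key] = str(value)
    return {"sessionOptions": members}


config = build_session_options({
    "session.dynamic_block_base": 4,
    "session.intra_op.allow_spinning": False,
})
print(json.dumps(config, indent=2))
```

The printed text has an outer pair of braces and an inner pair for sessionOptions, with "4" and "0" emitted as quoted strings.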