ONNX-LRE
C++ API documentation
Configuration parameters for the inference engine.
#include <onnx_lre.hpp>
Public Attributes

| Type | Name | Description |
| --- | --- | --- |
| `Cryption` | `cryption` | Encryption settings, including password and key path, for encrypted models. |
| `ExecutionProvider` | `executionProvider = ExecutionProvider::UNSET` | Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP. |
| `Precision` | `precision = Precision::UNSET` | Specifies the precision type for model execution. Defaults to the best precision the runtime can run. |
| `std::string` | `tensorRTTimingCachePath = ""` | Optional path for storing TensorRT timing cache files. |
| `std::string` | `tensorRTEngineCachePath = ""` | Optional path for storing TensorRT engine cache files. |
| `int` | `deviceID = 0` | Device ID for GPU execution. Default is 0 (first GPU). |
| `std::optional<bool>` | `enableCudaGraph` | Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models this is automatically disabled. Default is true for static models. |
| `std::optional<bool>` | `enableSparsity` | Flag to enable or disable sparsity. Default is false. |
| `int` | `auxStreams = 0` | Number of auxiliary CUDA streams to use. Default is 0 (optimal number of auxiliary streams). |
| `void*` | `cudaStream = nullptr` | |
Configuration parameters for the inference engine.
Comprehensive set of options controlling model loading behavior, hardware acceleration settings, precision, and caching. These options significantly impact inference performance and memory usage.
Different execution providers have different performance characteristics, and precision settings affect both memory usage and computation speed.
Example usage:
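A minimal configuration sketch. The `OnnxLre::Options` field names and defaults are as documented on this page; the `ExecutionProvider::TENSORRT` and `Precision::FP16` enum values and their enclosing namespace are assumptions, since this page only shows the `UNSET` values.

```cpp
#include <onnx_lre.hpp>  // OnnxLre::Options and related enums

int main() {
    OnnxLre::Options options;

    // Prefer TensorRT with FP16 precision on the second GPU.
    // (Enum value names are assumptions; this page documents only UNSET.)
    options.executionProvider = OnnxLre::ExecutionProvider::TENSORRT;
    options.precision = OnnxLre::Precision::FP16;
    options.deviceID = 1;

    // Persist TensorRT artifacts so later runs can skip engine building
    // and kernel timing.
    options.tensorRTEngineCachePath = "/tmp/trt_engine_cache";
    options.tensorRTTimingCachePath = "/tmp/trt_timing_cache";

    // Explicitly opt in to CUDA Graph capture (only honored for static models;
    // automatically disabled for dynamic models, per the field documentation).
    options.enableCudaGraph = true;
}
```

Leaving `executionProvider` and `precision` at `UNSET` keeps the documented defaults: the best available EP and the best precision the runtime can run.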
`Cryption OnnxLre::Options::cryption`

Encryption settings, including password and key path, for encrypted models.

`ExecutionProvider OnnxLre::Options::executionProvider = ExecutionProvider::UNSET`

Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP.

`Precision OnnxLre::Options::precision = Precision::UNSET`

Specifies the precision type for model execution. Defaults to the best precision the runtime can run.

`std::string OnnxLre::Options::tensorRTTimingCachePath = ""`

Optional path for storing TensorRT timing cache files.

`std::string OnnxLre::Options::tensorRTEngineCachePath = ""`

Optional path for storing TensorRT engine cache files.

`int OnnxLre::Options::deviceID = 0`

Device ID for GPU execution. Default is 0 (first GPU).

`std::optional<bool> OnnxLre::Options::enableCudaGraph`

Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models this is automatically disabled. Default is true for static models.

`std::optional<bool> OnnxLre::Options::enableSparsity`

Flag to enable or disable sparsity. Default is false.

`int OnnxLre::Options::auxStreams = 0`

Number of auxiliary CUDA streams to use. Default is 0 (optimal number of auxiliary streams).
`void* OnnxLre::Options::cudaStream = nullptr`
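This page gives no description for `cudaStream`, but the name and `void*` type suggest an externally created CUDA stream handle can be supplied. A sketch under that assumption (`cudaStreamCreate`/`cudaStreamDestroy` are standard CUDA Runtime API calls, not part of ONNX-LRE):

```cpp
#include <cuda_runtime.h>  // cudaStream_t, cudaStreamCreate, cudaStreamDestroy
#include <onnx_lre.hpp>    // OnnxLre::Options

int main() {
    // A CUDA stream the application already uses for other GPU work.
    cudaStream_t stream = nullptr;
    cudaStreamCreate(&stream);

    OnnxLre::Options options;
    // Hand the raw handle to the engine. The void* type implies ONNX-LRE
    // does not own the stream; whether the engine submits its kernels on
    // this stream instead of creating its own is an assumption here.
    options.cudaStream = static_cast<void*>(stream);

    // ... configure remaining options and run inference ...

    // The application remains responsible for destroying the stream.
    cudaStreamDestroy(stream);
}
```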