ONNX-LRE
C++ API documentation
OnnxLre::Options Struct Reference

Configuration parameters for the inference engine. More...

#include <onnx_lre.hpp>

Public Attributes

Cryption cryption
 Encryption settings, including password and key path, for encrypted models.
ExecutionProvider executionProvider = ExecutionProvider::UNSET
 Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP.
Precision precision = Precision::UNSET
 Specifies the precision type for model execution. Defaults to the best precision runtime can run.
std::string tensorRTTimingCachePath = ""
 Optional path for storing TensorRT timing cache files.
std::string tensorRTEngineCachePath = ""
 Optional path for storing TensorRT engine cache files.
int deviceID = 0
 Device ID for GPU execution. Default is 0 (first GPU).
std::optional< bool > enableCudaGraph
 Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models, this is automatically disabled. Default is true for static models.
std::optional< bool > enableSparsity
 Flag to enable or disable sparsity. Default is false.
int auxStreams = 0
 Number of auxiliary CUDA streams to use. Default is 0 (the runtime selects the optimal number).
void * cudaStream = nullptr

Detailed Description

Configuration parameters for the inference engine.

Comprehensive set of options controlling model loading behavior, hardware acceleration settings, precision, and caching. These options significantly impact inference performance and memory usage.

Different execution providers have different performance characteristics:

  • TensorRT: Best performance for large batch sizes, needs optimization time
  • CUDA: Good balance between performance and compatibility
  • CPU: Widest compatibility, useful for debugging

Precision settings affect both memory usage and computation speed:

  • Float32: Highest precision but uses more memory and may be slower
  • Float16: Good balance for GPU inference, ~2x faster than Float32
  • Int8: Fastest execution, smallest memory footprint, but requires calibration
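The trade-offs above can be sketched as a configuration (a minimal sketch; the qualified enum names `OnnxLre::ExecutionProvider::CUDA` and `OnnxLre::Precision::Float16` are assumed from the member definitions on this page):

```cpp
#include <onnx_lre.hpp>

// Balance performance and compatibility: CUDA provider with FP16,
// trading a little precision for ~2x speed and ~50% less memory.
OnnxLre::Options options;
options.executionProvider = OnnxLre::ExecutionProvider::CUDA;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // run on the first GPU
```

Leaving both fields at their `UNSET` defaults keeps the documented behavior: the best available execution provider and the best precision the runtime can run.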

Example usage:

// Basic configuration with defaults
OnnxLre::Options options;
// TensorRT with FP16 precision
options.executionProvider = OnnxLre::ExecutionProvider::TensorRT;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // First GPU
// Enable caching for faster startup
options.tensorRTEngineCachePath = "/tmp/trt_cache";
options.tensorRTTimingCachePath = "/tmp/timing.cache";
// Create engine with these options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", options);

Member Data Documentation

◆ cryption

Cryption OnnxLre::Options::cryption

Encryption settings, including password and key path, for encrypted models.

◆ executionProvider

ExecutionProvider OnnxLre::Options::executionProvider = ExecutionProvider::UNSET

Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP.

◆ precision

Precision OnnxLre::Options::precision = Precision::UNSET

Specifies the precision type for model execution. Defaults to the best precision runtime can run.

◆ tensorRTTimingCachePath

std::string OnnxLre::Options::tensorRTTimingCachePath = ""

Optional path for storing TensorRT timing cache files.

◆ tensorRTEngineCachePath

std::string OnnxLre::Options::tensorRTEngineCachePath = ""

Optional path for storing TensorRT engine cache files.

◆ deviceID

int OnnxLre::Options::deviceID = 0

Device ID for GPU execution. Default is 0 (first GPU).

◆ enableCudaGraph

std::optional<bool> OnnxLre::Options::enableCudaGraph

Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models, this is automatically disabled. Default is true for static models.
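The opt-in described above can be sketched as follows (a minimal sketch, assuming a model with static input shapes; field names as documented on this page):

```cpp
#include <onnx_lre.hpp>

// Explicitly request CUDA Graph capture. For static-shape models the
// captured graph is replayed on each run; for dynamic models the
// engine disables this automatically, so setting it is harmless.
OnnxLre::Options options;
options.enableCudaGraph = true; // std::optional<bool>; leave unset for the default
```

Because the field is a `std::optional<bool>`, leaving it unset defers the decision to the engine rather than forcing the feature on or off.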

◆ enableSparsity

std::optional<bool> OnnxLre::Options::enableSparsity

Flag to enable or disable sparsity. Default is false.

◆ auxStreams

int OnnxLre::Options::auxStreams = 0

Number of auxiliary CUDA streams to use. Default is 0 (the runtime selects the optimal number).

◆ cudaStream

void* OnnxLre::Options::cudaStream = nullptr

The documentation for this struct was generated from the following file:
  • include/onnx_lre/onnx_lre.hpp