ONNX-LRE
C++ API documentation
OnnxLre::Options Struct Reference

Configuration parameters for the inference engine. More...

#include <onnx_lre.hpp>

Public Attributes

Cryption cryption
 Encryption settings, including password and key path, for encrypted models.
ExecutionProvider executionProvider = ExecutionProvider::UNSET
 Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP.
Precision precision = Precision::UNSET
 Specifies the precision type for model execution. Defaults to the best precision runtime can run.
std::string tensorRTTimingCachePath = ""
 Optional path for storing TensorRT timing cache files.
std::string tensorRTEngineCachePath = ""
 Optional path for storing TensorRT engine cache files.
int deviceID = 0
 Device ID for GPU execution. Default is 0 (first GPU).
std::optional< bool > enableCudaGraph
 Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models, this is automatically disabled. Default is true for static models.
std::optional< bool > enableSparsity
 Flag to enable or disable sparsity. Default is false.
int auxStreams = 0
 Number of auxiliary CUDA streams to use. Default is 0 (the runtime selects the optimal number).
void * cudaStream = nullptr

Detailed Description

Configuration parameters for the inference engine.

Comprehensive set of options controlling model loading behavior, hardware acceleration settings, precision, and caching. These options significantly impact inference performance and memory usage.

Different execution providers have different performance characteristics:

  • TensorRT: Best performance for large batch sizes, needs optimization time
  • CUDA: Good balance between performance and compatibility
  • CPU: Widest compatibility, useful for debugging

Precision settings affect both memory usage and computation speed:

  • Float32: Highest precision but uses more memory and may be slower
  • Float16: Good balance for GPU inference, ~2x faster than Float32
  • Int8: Fastest execution, smallest memory footprint, but requires calibration
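The trade-offs above can be sketched as a configuration (a minimal sketch; the qualified enum names `OnnxLre::ExecutionProvider::CUDA` and `OnnxLre::Precision::Float16` are assumed from the member definitions on this page):

```cpp
#include <onnx_lre.hpp>

// Balance performance and compatibility: CUDA provider with FP16,
// trading a little precision for ~2x speed and ~50% less memory.
OnnxLre::Options options;
options.executionProvider = OnnxLre::ExecutionProvider::CUDA;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // run on the first GPU
```

Leaving both fields at their `UNSET` defaults keeps the documented behavior: the best available execution provider and the best precision the runtime can run.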

Example usage:

// Basic configuration with defaults
OnnxLre::Options options;
// TensorRT with FP16 precision
options.executionProvider = OnnxLre::ExecutionProvider::TensorRT;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // First GPU
// Enable caching for faster startup
options.tensorRTEngineCachePath = "/tmp/trt_cache";
options.tensorRTTimingCachePath = "/tmp/timing.cache";
// Create engine with these options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", options);

Member Data Documentation

◆ cryption

Cryption OnnxLre::Options::cryption

Encryption settings, including password and key path, for encrypted models.

◆ executionProvider

ExecutionProvider OnnxLre::Options::executionProvider = ExecutionProvider::UNSET

Specifies the execution provider (e.g., CPU, CUDA, TensorRT). Defaults to the best available EP.

◆ precision

Precision OnnxLre::Options::precision = Precision::UNSET

Specifies the precision type for model execution. Defaults to the best precision runtime can run.

◆ tensorRTTimingCachePath

std::string OnnxLre::Options::tensorRTTimingCachePath = ""

Optional path for storing TensorRT timing cache files.

◆ tensorRTEngineCachePath

std::string OnnxLre::Options::tensorRTEngineCachePath = ""

Optional path for storing TensorRT engine cache files.

◆ deviceID

int OnnxLre::Options::deviceID = 0

Device ID for GPU execution. Default is 0 (first GPU).

◆ enableCudaGraph

std::optional<bool> OnnxLre::Options::enableCudaGraph

Enables CUDA Graph optimization for inference. When true, static models use CUDA Graphs for faster execution; for dynamic models, this is automatically disabled. Default is true for static models.
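The opt-in described above can be sketched as follows (a minimal sketch, assuming a model with static input shapes; field names as documented on this page):

```cpp
#include <onnx_lre.hpp>

// Explicitly request CUDA Graph capture. For static-shape models the
// captured graph is replayed on each run; for dynamic models the
// engine disables this automatically, so setting it is harmless.
OnnxLre::Options options;
options.enableCudaGraph = true; // std::optional<bool>; leave unset for the default
```

Because the field is a `std::optional<bool>`, leaving it unset defers the decision to the engine rather than forcing the feature on or off.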

◆ enableSparsity

std::optional<bool> OnnxLre::Options::enableSparsity

Flag to enable or disable sparsity. Default is false.

◆ auxStreams

int OnnxLre::Options::auxStreams = 0

Number of auxiliary CUDA streams to use. Default is 0 (the runtime selects the optimal number).

◆ cudaStream

void* OnnxLre::Options::cudaStream = nullptr

The documentation for this struct was generated from the following file:
  • include/onnx_lre/onnx_lre.hpp