ONNX-LRE
C++ API documentation
OnnxLre::Options Struct Reference

Configuration parameters for the inference engine. More...

#include <onnx_lre.hpp>

Public Attributes

Cryption cryption
 Credentials for encrypted models (optional) More...
 
ExecutionProvider executionProvider = ExecutionProvider::CUDA
 Hardware acceleration backend. More...
 
Precision precision = Precision::Float32
 Numerical precision for calculations. More...
 
std::string tensorRTTimingCachePath = ""
 Path to TensorRT timing cache file (persists optimization data) More...
 
std::string tensorRTEngineCachePath = ""
 Path to TensorRT engine cache directory (stores optimized subgraphs) More...
 
int deviceID = 0
 Device index for multi-GPU systems. More...
 

Detailed Description

Configuration parameters for the inference engine.

Comprehensive set of options controlling model loading behavior, hardware acceleration settings, precision, and caching. These options significantly impact inference performance and memory usage.

Different execution providers have different performance characteristics:

  • TensorRT: Best performance for large batch sizes, but requires up-front optimization time on first run
  • CUDA: Good balance between performance and compatibility
  • CPU: Widest compatibility, useful for debugging
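
The trade-offs above can be sketched as a simple selection policy. This is illustrative only: `pickProvider` is not part of the API, and the enum below is a stand-in for the `ExecutionProvider` declared in onnx_lre.hpp.

```cpp
#include <cassert>

// Stand-in mirroring the documented backends (the real enum lives in onnx_lre.hpp).
enum class ExecutionProvider { CPU, CUDA, TensorRT };

// One possible policy reflecting the trade-offs above: TensorRT for
// large-batch throughput, CPU for debugging, CUDA as the balanced default.
ExecutionProvider pickProvider(int batchSize, bool debugging) {
    if (debugging)      return ExecutionProvider::CPU;      // widest compatibility
    if (batchSize >= 8) return ExecutionProvider::TensorRT; // best large-batch performance
    return ExecutionProvider::CUDA;                         // balanced default
}
```

The batch-size threshold is an assumption for illustration; the right cutover point depends on the model and hardware.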

Precision settings affect both memory usage and computation speed:

  • Float32: Highest precision but uses more memory and may be slower
  • Float16: Good balance for GPU inference, ~2x faster than Float32
  • Int8: Fastest execution, smallest memory footprint, but requires calibration
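
The memory claims above follow directly from element width. A minimal sketch (the helpers `bytesPerElement` and `tensorBytes` are illustrative, not part of the API; the enum is a stand-in for the `Precision` declared in onnx_lre.hpp):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the documented precision modes (real enum in onnx_lre.hpp).
enum class Precision { Float32, Float16, Int8 };

// Bytes per tensor element: why Float16 halves memory and Int8 quarters it
// relative to Float32.
std::size_t bytesPerElement(Precision p) {
    switch (p) {
        case Precision::Float32: return 4;
        case Precision::Float16: return 2;
        case Precision::Int8:    return 1;
    }
    return 0; // unreachable for valid enum values
}

// Approximate memory for a tensor with the given element count.
std::size_t tensorBytes(std::size_t elements, Precision p) {
    return elements * bytesPerElement(p);
}
```

For example, a 1x3x224x224 input (150,528 elements) takes 602,112 bytes at Float32 but half that at Float16.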

Example usage:

// Basic configuration with defaults
OnnxLre::Options options;
// TensorRT with FP16 precision
options.executionProvider = OnnxLre::ExecutionProvider::TensorRT;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // First GPU
// Enable caching for faster startup
options.tensorRTEngineCachePath = "/tmp/trt_cache";
options.tensorRTTimingCachePath = "/tmp/timing.cache";
// Create engine with these options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", options);

Member Data Documentation

◆ cryption

Cryption OnnxLre::Options::cryption

Credentials for encrypted models (optional)

◆ executionProvider

ExecutionProvider OnnxLre::Options::executionProvider = ExecutionProvider::CUDA

Hardware acceleration backend.

◆ precision

Precision OnnxLre::Options::precision = Precision::Float32

Numerical precision for calculations.

◆ tensorRTTimingCachePath

std::string OnnxLre::Options::tensorRTTimingCachePath = ""

Path to TensorRT timing cache file (persists optimization data)

◆ tensorRTEngineCachePath

std::string OnnxLre::Options::tensorRTEngineCachePath = ""

Path to TensorRT engine cache directory (stores optimized subgraphs)
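
Since this option points at a directory, making sure it exists before configuring the engine is a cheap safeguard against cold-start surprises. A minimal sketch, assuming C++17; `ensureEngineCacheDir` is an illustrative helper, not part of the API, and the path is an example, not a required location:

```cpp
#include <filesystem>
#include <string>

// Create the engine cache directory if it does not already exist, then
// return the path so it can be assigned to Options::tensorRTEngineCachePath.
std::string ensureEngineCacheDir(const std::string& path) {
    std::filesystem::create_directories(path); // no-op if it already exists
    return path;
}
```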

◆ deviceID

int OnnxLre::Options::deviceID = 0

Device index for multi-GPU systems.
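
One common multi-GPU pattern is round-robin assignment of engine instances across devices via this field. A sketch only: the struct below keeps just the field this example needs (the full `Options` is defined in onnx_lre.hpp), and `spreadAcrossGpus` is a hypothetical helper, not part of the API.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for OnnxLre::Options, reduced to the relevant field.
struct Options {
    int deviceID = 0; // Device index for multi-GPU systems
};

// Assign worker i to GPU (i mod numGpus) so load spreads evenly.
std::vector<Options> spreadAcrossGpus(int numWorkers, int numGpus) {
    std::vector<Options> configs(numWorkers);
    for (int i = 0; i < numWorkers; ++i)
        configs[i].deviceID = i % numGpus;
    return configs;
}
```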


The documentation for this struct was generated from the following file:
onnx_lre.hpp