ONNX-LRE
C++ API documentation
OnnxLre::Options Struct Reference

Configuration parameters for the inference engine. More...

#include <onnx_lre.hpp>

Public Attributes

Cryption cryption
 Credentials for encrypted models (optional) More...
 
ExecutionProvider executionProvider = ExecutionProvider::CUDA
 Hardware acceleration backend. More...
 
Precision precision = Precision::Float32
 Numerical precision for calculations. More...
 
std::string tensorRTTimingCachePath = ""
 Path to TensorRT timing cache file (persists optimization data) More...
 
std::string tensorRTEngineCachePath = ""
 Path to TensorRT engine cache directory (stores optimized subgraphs) More...
 
int deviceID = 0
 Device index for multi-GPU systems. More...
 

Detailed Description

Configuration parameters for the inference engine.

Comprehensive set of options controlling model loading behavior, hardware acceleration settings, precision, and caching. These options significantly impact inference performance and memory usage.

Different execution providers have different performance characteristics:

  • TensorRT: Best performance for large batch sizes, but requires up-front optimization time on first run
  • CUDA: Good balance between performance and compatibility
  • CPU: Widest compatibility, useful for debugging
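
The trade-offs above can be sketched as a simple selection policy. This is illustrative only: `pickProvider` is not part of the API, and the enum below is a stand-in for the `ExecutionProvider` declared in onnx_lre.hpp.

```cpp
#include <cassert>

// Stand-in mirroring the documented backends (the real enum lives in onnx_lre.hpp).
enum class ExecutionProvider { CPU, CUDA, TensorRT };

// One possible policy reflecting the trade-offs above: TensorRT for
// large-batch throughput, CPU for debugging, CUDA as the balanced default.
ExecutionProvider pickProvider(int batchSize, bool debugging) {
    if (debugging)      return ExecutionProvider::CPU;      // widest compatibility
    if (batchSize >= 8) return ExecutionProvider::TensorRT; // best large-batch performance
    return ExecutionProvider::CUDA;                         // balanced default
}
```

The batch-size threshold is an assumption for illustration; the right cutover point depends on the model and hardware.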

Precision settings affect both memory usage and computation speed:

  • Float32: Highest precision but uses more memory and may be slower
  • Float16: Good balance for GPU inference, ~2x faster than Float32
  • Int8: Fastest execution, smallest memory footprint, but requires calibration
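
The memory claims above follow directly from element width. A minimal sketch (the helpers `bytesPerElement` and `tensorBytes` are illustrative, not part of the API; the enum is a stand-in for the `Precision` declared in onnx_lre.hpp):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the documented precision modes (real enum in onnx_lre.hpp).
enum class Precision { Float32, Float16, Int8 };

// Bytes per tensor element: why Float16 halves memory and Int8 quarters it
// relative to Float32.
std::size_t bytesPerElement(Precision p) {
    switch (p) {
        case Precision::Float32: return 4;
        case Precision::Float16: return 2;
        case Precision::Int8:    return 1;
    }
    return 0; // unreachable for valid enum values
}

// Approximate memory for a tensor with the given element count.
std::size_t tensorBytes(std::size_t elements, Precision p) {
    return elements * bytesPerElement(p);
}
```

For example, a 1x3x224x224 input (150,528 elements) takes 602,112 bytes at Float32 but half that at Float16.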

Example usage:

// Basic configuration with defaults
OnnxLre::Options options;
// TensorRT with FP16 precision
options.executionProvider = OnnxLre::ExecutionProvider::TensorRT;
options.precision = OnnxLre::Precision::Float16;
options.deviceID = 0; // First GPU
// Enable caching for faster startup
options.tensorRTEngineCachePath = "/tmp/trt_cache";
options.tensorRTTimingCachePath = "/tmp/timing.cache";
// Create engine with these options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", options);

Member Data Documentation

◆ cryption

Cryption OnnxLre::Options::cryption

Credentials for encrypted models (optional)

◆ executionProvider

ExecutionProvider OnnxLre::Options::executionProvider = ExecutionProvider::CUDA

Hardware acceleration backend.

◆ precision

Precision OnnxLre::Options::precision = Precision::Float32

Numerical precision for calculations.

◆ tensorRTTimingCachePath

std::string OnnxLre::Options::tensorRTTimingCachePath = ""

Path to TensorRT timing cache file (persists optimization data)

◆ tensorRTEngineCachePath

std::string OnnxLre::Options::tensorRTEngineCachePath = ""

Path to TensorRT engine cache directory (stores optimized subgraphs)
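
Since this option points at a directory, making sure it exists before configuring the engine is a cheap safeguard against cold-start surprises. A minimal sketch, assuming C++17; `ensureEngineCacheDir` is an illustrative helper, not part of the API, and the path is an example, not a required location:

```cpp
#include <filesystem>
#include <string>

// Create the engine cache directory if it does not already exist, then
// return the path so it can be assigned to Options::tensorRTEngineCachePath.
std::string ensureEngineCacheDir(const std::string& path) {
    std::filesystem::create_directories(path); // no-op if it already exists
    return path;
}
```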

◆ deviceID

int OnnxLre::Options::deviceID = 0

Device index for multi-GPU systems.
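
One common multi-GPU pattern is round-robin assignment of engine instances across devices via this field. A sketch only: the struct below keeps just the field this example needs (the full `Options` is defined in onnx_lre.hpp), and `spreadAcrossGpus` is a hypothetical helper, not part of the API.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for OnnxLre::Options, reduced to the relevant field.
struct Options {
    int deviceID = 0; // Device index for multi-GPU systems
};

// Assign worker i to GPU (i mod numGpus) so load spreads evenly.
std::vector<Options> spreadAcrossGpus(int numWorkers, int numGpus) {
    std::vector<Options> configs(numWorkers);
    for (int i = 0; i < numWorkers; ++i)
        configs[i].deviceID = i % numGpus;
    return configs;
}
```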


The documentation for this struct was generated from the following file:
onnx_lre.hpp