ONNX-LRE
C++ API documentation
Core LRE Runtime Engine API

Core runtime functions for model loading and inference. More...

Functions

 OnnxLre::LatentRuntimeEngine::LatentRuntimeEngine (const std::string &modelPath, const Options &config=Options())
 High-performance inference engine for ONNX models. More...
 
 OnnxLre::LatentRuntimeEngine::~LatentRuntimeEngine ()
 Releases all allocated resources. More...
 
size_t OnnxLre::LatentRuntimeEngine::getNumberOfInputs () const
 Returns the number of input tensors required by the model. More...
 
size_t OnnxLre::LatentRuntimeEngine::getNumberOfOutputs () const
 Returns the number of output tensors produced by the model. More...
 
const std::vector< const char * > & OnnxLre::LatentRuntimeEngine::getInputNames () const
 Retrieves the names of all model input nodes. More...
 
const std::vector< const char * > & OnnxLre::LatentRuntimeEngine::getOutputNames () const
 Retrieves the names of all model output nodes. More...
 
std::vector< std::string > OnnxLre::LatentRuntimeEngine::getInputDTypes () const
 Gets the data types of all input tensors as strings. More...
 
std::vector< std::string > OnnxLre::LatentRuntimeEngine::getOutputDTypes () const
 Gets the data types of all output tensors as strings. More...
 
const std::vector< std::vector< int64_t > > & OnnxLre::LatentRuntimeEngine::getInputShapes () const
 Retrieves the dimensional shapes of all input tensors. More...
 
const std::vector< std::vector< int64_t > > & OnnxLre::LatentRuntimeEngine::getOutputShapes () const
 Retrieves the dimensional shapes of all output tensors. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< DLManagedTensor * > &t_input_data_vec)
 Performs inference using DLPack tensor inputs. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< Ort::Value > &t_input_data_vec)
 Performs inference using ONNX Runtime tensor inputs. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< void * > &t_input_data_vec, const std::vector< int64_t * > shape, const std::string device)
 Performs inference using raw memory pointers and shapes. More...
 
Ort::Value OnnxLre::LatentRuntimeEngine::makeORTTensor (void *t_input_data_vec, const int64_t *shape, int input_index, const std::string &device)
 Creates an ONNX Runtime tensor from raw memory. More...
 
std::vector< DLManagedTensor * > OnnxLre::LatentRuntimeEngine::getOutput ()
 Retrieves inference results as DLPack tensors. More...
 
std::vector< Ort::Value > OnnxLre::LatentRuntimeEngine::getOutputOrt ()
 Retrieves and transfers ownership of inference results as ONNX Runtime tensors. More...
 
void OnnxLre::LatentRuntimeEngine::setCPUOutput (bool use_cpu)
 Controls output tensor placement between device and host memory. More...
 
bool OnnxLre::LatentRuntimeEngine::isCPUOutput ()
 Checks the current output tensor memory placement policy. More...
 
std::string OnnxLre::LatentRuntimeEngine::getMetaValue (std::string key)
 Retrieves model metadata by key. More...
 

Detailed Description

Core runtime functions for model loading and inference.

Function Documentation

◆ LatentRuntimeEngine()

LatentRuntimeEngine::LatentRuntimeEngine ( const std::string &  modelPath,
const Options &  config = Options() 
)
explicit

High-performance inference engine for ONNX models.

The Latent Runtime Engine (LRE) is designed to provide a seamless and efficient interface for executing machine learning models in ONNX format. It abstracts the complexities of hardware acceleration, memory management, and tensor conversions, allowing developers to focus on model inference without worrying about the underlying implementation details.

Key features:

  • Support for multiple hardware acceleration backends (TensorRT, CUDA, CPU)
  • Flexible precision modes (FP32, FP16, INT8)
  • Tensor interoperability with DLPack and raw memory
  • Configurable memory placement for inputs/outputs
  • Automatic model introspection capabilities

Example usage:

// Configure the engine: CUDA acceleration, FP16 precision, first GPU
OnnxLre::Options opts;
opts.executionProvider = OnnxLre::ExecutionProvider::CUDA;
opts.precision = OnnxLre::Precision::Float16;
opts.deviceID = 0;

// Create engine with model path and options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", opts);

// Get model input requirements
auto inputShapes = engine.getInputShapes();

// Prepare input data (example with raw pointer)
std::vector<float> inputData(calculateSize(inputShapes[0]));
// ... fill input data ...

// Run inference with raw pointers (input buffers live in host memory)
std::vector<void*> inputs = {inputData.data()};
std::vector<int64_t*> shapes = {const_cast<int64_t*>(inputShapes[0].data())};
engine.infer(inputs, shapes, "CPU");

// Take ownership of the output tensors; alternatively, wrap each
// pointer in an RAII type (e.g. a user-defined SafeDLTensor)
std::vector<DLManagedTensor*> outputs = engine.getOutput();
// ... process outputs ...

// Clean up DLPack tensors
for (auto* tensor : outputs) {
    if (tensor->deleter) tensor->deleter(tensor);
}
Parameters
    modelPath  Path to the ONNX model file (.onnx)
    config     Engine configuration options (defaults to CUDA with FP32 precision)
Exceptions
    std::runtime_error  If model loading fails or requested hardware is unavailable

◆ ~LatentRuntimeEngine()

LatentRuntimeEngine::~LatentRuntimeEngine ( )

Releases all allocated resources.

Ensures proper cleanup of ONNX Runtime session, provider options, and any tensor data owned by the engine.

◆ getNumberOfInputs()

size_t LatentRuntimeEngine::getNumberOfInputs ( ) const

Returns the number of input tensors required by the model.

Returns
Number of distinct input tensors expected for inference

◆ getNumberOfOutputs()

size_t LatentRuntimeEngine::getNumberOfOutputs ( ) const

Returns the number of output tensors produced by the model.

Returns
Number of distinct output tensors generated by inference

◆ getInputNames()

const std::vector< const char * > & LatentRuntimeEngine::getInputNames ( ) const

Retrieves the names of all model input nodes.

These names are required to match inputs with their correct positions when using frameworks that rely on named tensor matching.

Returns
Vector of input node names (lifetime tied to engine instance)

◆ getOutputNames()

const std::vector< const char * > & LatentRuntimeEngine::getOutputNames ( ) const

Retrieves the names of all model output nodes.

These names help identify the semantic meaning of output tensors and are necessary for frameworks using named tensor matching.

Returns
Vector of output node names (lifetime tied to engine instance)

◆ getInputDTypes()

std::vector< std::string > LatentRuntimeEngine::getInputDTypes ( ) const

Gets the data types of all input tensors as strings.

Provides the ONNX data type of each input tensor in human-readable format (e.g., "float32", "int64", "float16").

Returns
Vector of input data type strings

◆ getOutputDTypes()

std::vector< std::string > LatentRuntimeEngine::getOutputDTypes ( ) const

Gets the data types of all output tensors as strings.

Provides the ONNX data type of each output tensor in human-readable format (e.g., "float32", "int64", "float16").

Returns
Vector of output data type strings

◆ getInputShapes()

const std::vector< std::vector< int64_t > > & LatentRuntimeEngine::getInputShapes ( ) const

Retrieves the dimensional shapes of all input tensors.

Returns the exact shape requirements for each input tensor. Dynamic dimensions (if supported by the model) will be represented as -1.

Returns
Vector of shape vectors, where each shape vector contains the dimensions of one input tensor

◆ getOutputShapes()

const std::vector< std::vector< int64_t > > & LatentRuntimeEngine::getOutputShapes ( ) const

Retrieves the dimensional shapes of all output tensors.

Provides the shape information for each output tensor. For models with dynamic output shapes, these will reflect the shapes from the most recent inference or the default shapes if no inference has been performed.

Returns
Vector of shape vectors, where each shape vector contains the dimensions of one output tensor
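The introspection calls above are typically consumed together. The helper below is hypothetical (not part of the API); it shows one way to format a tensor's name, dtype string, and shape vector, i.e. the per-index values returned by getInputNames(), getInputDTypes(), and getInputShapes():

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter for one tensor's introspection data.
std::string describeTensor(const std::string& name,
                           const std::string& dtype,
                           const std::vector<int64_t>& shape) {
    std::ostringstream out;
    out << name << " (" << dtype << "): [";
    for (size_t d = 0; d < shape.size(); ++d)
        out << (d ? ", " : "") << shape[d];
    out << "]";
    return out.str();
}
```

In practice one would loop an index i from 0 to getNumberOfInputs() and print describeTensor(names[i], dtypes[i], shapes[i]).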

◆ infer() [1/3]

void LatentRuntimeEngine::infer ( const std::vector< DLManagedTensor * > &  t_input_data_vec)

Performs inference using DLPack tensor inputs.

Executes the model using the provided DLPack tensors as input. This approach enables easy integration with frameworks that support DLPack as a tensor interchange format (PyTorch, MXNet, etc.).

Parameters
    t_input_data_vec  Vector of DLManagedTensor pointers, one for each input
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ infer() [2/3]

void LatentRuntimeEngine::infer ( const std::vector< Ort::Value > &  t_input_data_vec)

Performs inference using ONNX Runtime tensor inputs.

Executes the model using native ONNX Runtime tensors. This is the most efficient approach when working directly with Ort::Value tensors.

Parameters
    t_input_data_vec  Vector of Ort::Value objects, one for each input
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ infer() [3/3]

void LatentRuntimeEngine::infer ( const std::vector< void * > &  t_input_data_vec,
const std::vector< int64_t * >  shape,
const std::string  device 
)

Performs inference using raw memory pointers and shapes.

Executes the model using raw data pointers. This approach allows for direct integration with any memory management system without requiring conversion to specific tensor formats first.

Parameters
    t_input_data_vec  Vector of void pointers to input data buffers
    shape             Vector of pointers to shape arrays, one per input
    device            Target device specifier ("CPU" or "GPU")
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ makeORTTensor()

Ort::Value LatentRuntimeEngine::makeORTTensor ( void *  t_input_data_vec,
const int64_t *  shape,
int  input_index,
const std::string &  device 
)

Creates an ONNX Runtime tensor from raw memory.

Utility method to construct an Ort::Value tensor from raw memory and shape information. Handles proper memory placement on the specified device.

Parameters
    t_input_data_vec  Pointer to raw tensor data
    shape             Array of dimensions defining the tensor shape
    input_index       Index into the model's input array (used to determine data type)
    device            Target device ("CPU" or "GPU")
Returns
    Ort::Value tensor with data placed on the specified device
Exceptions
    Ort::Exception  If tensor creation fails

◆ getOutput()

std::vector< DLManagedTensor * > LatentRuntimeEngine::getOutput ( )

Retrieves inference results as DLPack tensors.

Returns the model outputs from the most recent inference as DLPack tensors. Handles conversion between internal representation and DLPack format, including proper memory placement based on the configured output device.

Returns
Vector of newly created DLManagedTensor pointers (caller takes ownership)
Note
Returns a dummy tensor if no inference has been performed
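Because the caller takes ownership, each returned pointer should be released exactly once via its deleter. The constructor's usage example mentions a SafeDLTensor wrapper; no such type is provided by the library, but a minimal RAII sketch (with the relevant subset of dlpack.h inlined for illustration only) could look like:

```cpp
// Minimal subset of dlpack.h, inlined for illustration only --
// real code should #include <dlpack/dlpack.h> instead.
struct DLTensor { /* data, device, ndim, dtype, shape, strides, byte_offset elided */ };
struct DLManagedTensor {
    DLTensor dl_tensor;
    void* manager_ctx;
    void (*deleter)(DLManagedTensor*);
};

// Hypothetical RAII wrapper: invokes the tensor's deleter exactly once.
class SafeDLTensor {
public:
    explicit SafeDLTensor(DLManagedTensor* t) : t_(t) {}
    ~SafeDLTensor() {
        if (t_ && t_->deleter) t_->deleter(t_);
    }
    SafeDLTensor(const SafeDLTensor&) = delete;
    SafeDLTensor& operator=(const SafeDLTensor&) = delete;
    SafeDLTensor(SafeDLTensor&& other) noexcept : t_(other.t_) { other.t_ = nullptr; }
    DLManagedTensor* get() const { return t_; }
private:
    DLManagedTensor* t_;
};
```

With this wrapper, emplacing each getOutput() pointer into a std::vector<SafeDLTensor> gives exception-safe cleanup without a manual deleter loop.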

◆ getOutputOrt()

std::vector< Ort::Value > LatentRuntimeEngine::getOutputOrt ( )

Retrieves and transfers ownership of inference results as ONNX Runtime tensors.

Returns and transfers ownership of the internal output tensors to the caller. After calling this method, the engine no longer holds references to these tensors.

Returns
Vector of Ort::Value output tensors (ownership transferred to caller)
Note
The engine's internal output tensor storage is cleared after this call

◆ setCPUOutput()

void LatentRuntimeEngine::setCPUOutput ( bool  use_cpu)

Controls output tensor placement between device and host memory.

Configures whether output tensors should be automatically moved to CPU memory after inference. This is particularly useful when inference is performed on GPU but the results need to be accessed by the CPU for post-processing.

Parameters
    use_cpu  True to place outputs in CPU memory, false to keep them on the inference device

◆ isCPUOutput()

bool LatentRuntimeEngine::isCPUOutput ( )

Checks the current output tensor memory placement policy.

Returns
True if outputs are configured to be placed in CPU memory, false if they remain on the inference device

◆ getMetaValue()

std::string LatentRuntimeEngine::getMetaValue ( std::string  key)

Retrieves model metadata by key.

Extracts custom metadata values stored within the ONNX model file. Common keys include "author", "version", "description", etc., but any key-value pairs stored in the model's metadata can be accessed.

Parameters
    key  Metadata key to retrieve
Returns
    String value associated with the key, or an empty string if the key doesn't exist