ONNX-LRE
C++ API documentation
Core LRE Runtime Engine API

Core runtime functions for model loading and inference. More...

Functions

 OnnxLre::LatentRuntimeEngine::LatentRuntimeEngine (const std::string &modelPath, const Options &config=Options())
 High-performance inference engine for ONNX models. More...
 
 OnnxLre::LatentRuntimeEngine::~LatentRuntimeEngine ()
 Releases all allocated resources. More...
 
size_t OnnxLre::LatentRuntimeEngine::getNumberOfInputs () const
 Returns the number of input tensors required by the model. More...
 
size_t OnnxLre::LatentRuntimeEngine::getNumberOfOutputs () const
 Returns the number of output tensors produced by the model. More...
 
const std::vector< const char * > & OnnxLre::LatentRuntimeEngine::getInputNames () const
 Retrieves the names of all model input nodes. More...
 
const std::vector< const char * > & OnnxLre::LatentRuntimeEngine::getOutputNames () const
 Retrieves the names of all model output nodes. More...
 
std::vector< std::string > OnnxLre::LatentRuntimeEngine::getInputDTypes () const
 Gets the data types of all input tensors as strings. More...
 
std::vector< std::string > OnnxLre::LatentRuntimeEngine::getOutputDTypes () const
 Gets the data types of all output tensors as strings. More...
 
const std::vector< std::vector< int64_t > > & OnnxLre::LatentRuntimeEngine::getInputShapes () const
 Retrieves the dimensional shapes of all input tensors. More...
 
const std::vector< std::vector< int64_t > > & OnnxLre::LatentRuntimeEngine::getOutputShapes () const
 Retrieves the dimensional shapes of all output tensors. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< DLManagedTensor * > &t_input_data_vec)
 Performs inference using DLPack tensor inputs. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< Ort::Value > &t_input_data_vec)
 Performs inference using ONNX Runtime tensor inputs. More...
 
void OnnxLre::LatentRuntimeEngine::infer (const std::vector< void * > &t_input_data_vec, const std::vector< int64_t * > shape, const std::string device)
 Performs inference using raw memory pointers and shapes. More...
 
Ort::Value OnnxLre::LatentRuntimeEngine::makeORTTensor (void *t_input_data_vec, const int64_t *shape, int input_index, const std::string &device)
 Creates an ONNX Runtime tensor from raw memory. More...
 
std::vector< DLManagedTensor * > OnnxLre::LatentRuntimeEngine::getOutput ()
 Retrieves inference results as DLPack tensors. More...
 
std::vector< Ort::Value > OnnxLre::LatentRuntimeEngine::getOutputOrt ()
 Retrieves and transfers ownership of inference results as ONNX Runtime tensors. More...
 
void OnnxLre::LatentRuntimeEngine::setCPUOutput (bool use_cpu)
 Controls output tensor placement between device and host memory. More...
 
bool OnnxLre::LatentRuntimeEngine::isCPUOutput ()
 Checks the current output tensor memory placement policy. More...
 
std::string OnnxLre::LatentRuntimeEngine::getMetaValue (std::string key)
 Retrieves model metadata by key. More...
 

Detailed Description

Core runtime functions for model loading and inference.

Function Documentation

◆ LatentRuntimeEngine()

LatentRuntimeEngine::LatentRuntimeEngine ( const std::string &  modelPath,
const Options &  config = Options() 
)
explicit

High-performance inference engine for ONNX models.

The Latent Runtime Engine (LRE) is designed to provide a seamless and efficient interface for executing machine learning models in ONNX format. It abstracts the complexities of hardware acceleration, memory management, and tensor conversions, allowing developers to focus on model inference without worrying about the underlying implementation details.

Key features:

  • Support for multiple hardware acceleration backends (TensorRT, CUDA, CPU)
  • Flexible precision modes (FP32, FP16, INT8)
  • Tensor interoperability with DLPack and raw memory
  • Configurable memory placement for inputs/outputs
  • Automatic model introspection capabilities

Example usage:

// Configure the engine: CUDA acceleration, FP16 precision, first GPU
OnnxLre::Options opts;
opts.executionProvider = OnnxLre::ExecutionProvider::CUDA;
opts.precision = OnnxLre::Precision::Float16;
opts.deviceID = 0;

// Create engine with model path and options
OnnxLre::LatentRuntimeEngine engine("/path/to/model.onnx", opts);

// Get model input requirements
auto inputShapes = engine.getInputShapes();

// Prepare input data (example with raw pointer)
std::vector<float> inputData(calculateSize(inputShapes[0]));
// ... fill input data ...

// Run inference with raw pointers (input buffers live in host memory)
std::vector<void*> inputs = {inputData.data()};
std::vector<int64_t*> shapes = {const_cast<int64_t*>(inputShapes[0].data())};
engine.infer(inputs, shapes, "CPU");

// Take ownership of the output tensors; alternatively, wrap each
// pointer in an RAII type (e.g. a user-defined SafeDLTensor)
std::vector<DLManagedTensor*> outputs = engine.getOutput();
// ... process outputs ...

// Clean up DLPack tensors
for (auto* tensor : outputs) {
    if (tensor->deleter) tensor->deleter(tensor);
}
Parameters
    modelPath  Path to the ONNX model file (.onnx)
    config     Engine configuration options (defaults to CUDA with FP32 precision)
Exceptions
    std::runtime_error  If model loading fails or requested hardware is unavailable

◆ ~LatentRuntimeEngine()

LatentRuntimeEngine::~LatentRuntimeEngine ( )

Releases all allocated resources.

Ensures proper cleanup of ONNX Runtime session, provider options, and any tensor data owned by the engine.

◆ getNumberOfInputs()

size_t LatentRuntimeEngine::getNumberOfInputs ( ) const

Returns the number of input tensors required by the model.

Returns
Number of distinct input tensors expected for inference

◆ getNumberOfOutputs()

size_t LatentRuntimeEngine::getNumberOfOutputs ( ) const

Returns the number of output tensors produced by the model.

Returns
Number of distinct output tensors generated by inference

◆ getInputNames()

const std::vector< const char * > & LatentRuntimeEngine::getInputNames ( ) const

Retrieves the names of all model input nodes.

These names are required to match inputs with their correct positions when using frameworks that rely on named tensor matching.

Returns
Vector of input node names (lifetime tied to engine instance)

◆ getOutputNames()

const std::vector< const char * > & LatentRuntimeEngine::getOutputNames ( ) const

Retrieves the names of all model output nodes.

These names help identify the semantic meaning of output tensors and are necessary for frameworks using named tensor matching.

Returns
Vector of output node names (lifetime tied to engine instance)

◆ getInputDTypes()

std::vector< std::string > LatentRuntimeEngine::getInputDTypes ( ) const

Gets the data types of all input tensors as strings.

Provides the ONNX data type of each input tensor in human-readable format (e.g., "float32", "int64", "float16").

Returns
Vector of input data type strings

◆ getOutputDTypes()

std::vector< std::string > LatentRuntimeEngine::getOutputDTypes ( ) const

Gets the data types of all output tensors as strings.

Provides the ONNX data type of each output tensor in human-readable format (e.g., "float32", "int64", "float16").

Returns
Vector of output data type strings

◆ getInputShapes()

const std::vector< std::vector< int64_t > > & LatentRuntimeEngine::getInputShapes ( ) const

Retrieves the dimensional shapes of all input tensors.

Returns the exact shape requirements for each input tensor. Dynamic dimensions (if supported by the model) will be represented as -1.

Returns
Vector of shape vectors, where each shape vector contains the dimensions of one input tensor

◆ getOutputShapes()

const std::vector< std::vector< int64_t > > & LatentRuntimeEngine::getOutputShapes ( ) const

Retrieves the dimensional shapes of all output tensors.

Provides the shape information for each output tensor. For models with dynamic output shapes, these will reflect the shapes from the most recent inference or the default shapes if no inference has been performed.

Returns
Vector of shape vectors, where each shape vector contains the dimensions of one output tensor
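The introspection calls above are typically consumed together. The helper below is hypothetical (not part of the API); it shows one way to format a tensor's name, dtype string, and shape vector, i.e. the per-index values returned by getInputNames(), getInputDTypes(), and getInputShapes():

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter for one tensor's introspection data.
std::string describeTensor(const std::string& name,
                           const std::string& dtype,
                           const std::vector<int64_t>& shape) {
    std::ostringstream out;
    out << name << " (" << dtype << "): [";
    for (size_t d = 0; d < shape.size(); ++d)
        out << (d ? ", " : "") << shape[d];
    out << "]";
    return out.str();
}
```

In practice one would loop an index i from 0 to getNumberOfInputs() and print describeTensor(names[i], dtypes[i], shapes[i]).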

◆ infer() [1/3]

void LatentRuntimeEngine::infer ( const std::vector< DLManagedTensor * > &  t_input_data_vec)

Performs inference using DLPack tensor inputs.

Executes the model using the provided DLPack tensors as input. This approach enables easy integration with frameworks that support DLPack as a tensor interchange format (PyTorch, MXNet, etc.).

Parameters
    t_input_data_vec  Vector of DLManagedTensor pointers, one for each input
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ infer() [2/3]

void LatentRuntimeEngine::infer ( const std::vector< Ort::Value > &  t_input_data_vec)

Performs inference using ONNX Runtime tensor inputs.

Executes the model using native ONNX Runtime tensors. This is the most efficient approach when working directly with Ort::Value tensors.

Parameters
    t_input_data_vec  Vector of Ort::Value objects, one for each input
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ infer() [3/3]

void LatentRuntimeEngine::infer ( const std::vector< void * > &  t_input_data_vec,
const std::vector< int64_t * >  shape,
const std::string  device 
)

Performs inference using raw memory pointers and shapes.

Executes the model using raw data pointers. This approach allows for direct integration with any memory management system without requiring conversion to specific tensor formats first.

Parameters
    t_input_data_vec  Vector of void pointers to input data buffers
    shape             Vector of pointers to shape arrays, one per input
    device            Target device specifier ("CPU" or "GPU")
Exceptions
    std::runtime_error  If input count/shapes don't match model requirements

◆ makeORTTensor()

Ort::Value LatentRuntimeEngine::makeORTTensor ( void *  t_input_data_vec,
const int64_t *  shape,
int  input_index,
const std::string &  device 
)

Creates an ONNX Runtime tensor from raw memory.

Utility method to construct an Ort::Value tensor from raw memory and shape information. Handles proper memory placement on the specified device.

Parameters
    t_input_data_vec  Pointer to raw tensor data
    shape             Array of dimensions defining the tensor shape
    input_index       Index into the model's input array (used to determine data type)
    device            Target device ("CPU" or "GPU")
Returns
    Ort::Value tensor with data placed on the specified device
Exceptions
    Ort::Exception  If tensor creation fails

◆ getOutput()

std::vector< DLManagedTensor * > LatentRuntimeEngine::getOutput ( )

Retrieves inference results as DLPack tensors.

Returns the model outputs from the most recent inference as DLPack tensors. Handles conversion between internal representation and DLPack format, including proper memory placement based on the configured output device.

Returns
Vector of newly created DLManagedTensor pointers (caller takes ownership)
Note
Returns a dummy tensor if no inference has been performed
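Because the caller takes ownership, each returned pointer should be released exactly once via its deleter. The constructor's usage example mentions a SafeDLTensor wrapper; no such type is provided by the library, but a minimal RAII sketch (with the relevant subset of dlpack.h inlined for illustration only) could look like:

```cpp
// Minimal subset of dlpack.h, inlined for illustration only --
// real code should #include <dlpack/dlpack.h> instead.
struct DLTensor { /* data, device, ndim, dtype, shape, strides, byte_offset elided */ };
struct DLManagedTensor {
    DLTensor dl_tensor;
    void* manager_ctx;
    void (*deleter)(DLManagedTensor*);
};

// Hypothetical RAII wrapper: invokes the tensor's deleter exactly once.
class SafeDLTensor {
public:
    explicit SafeDLTensor(DLManagedTensor* t) : t_(t) {}
    ~SafeDLTensor() {
        if (t_ && t_->deleter) t_->deleter(t_);
    }
    SafeDLTensor(const SafeDLTensor&) = delete;
    SafeDLTensor& operator=(const SafeDLTensor&) = delete;
    SafeDLTensor(SafeDLTensor&& other) noexcept : t_(other.t_) { other.t_ = nullptr; }
    DLManagedTensor* get() const { return t_; }
private:
    DLManagedTensor* t_;
};
```

With this wrapper, emplacing each getOutput() pointer into a std::vector<SafeDLTensor> gives exception-safe cleanup without a manual deleter loop.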

◆ getOutputOrt()

std::vector< Ort::Value > LatentRuntimeEngine::getOutputOrt ( )

Retrieves and transfers ownership of inference results as ONNX Runtime tensors.

Returns and transfers ownership of the internal output tensors to the caller. After calling this method, the engine no longer holds references to these tensors.

Returns
Vector of Ort::Value output tensors (ownership transferred to caller)
Note
The engine's internal output tensor storage is cleared after this call

◆ setCPUOutput()

void LatentRuntimeEngine::setCPUOutput ( bool  use_cpu)

Controls output tensor placement between device and host memory.

Configures whether output tensors should be automatically moved to CPU memory after inference. This is particularly useful when inference is performed on GPU but the results need to be accessed by the CPU for post-processing.

Parameters
    use_cpu  True to place outputs in CPU memory, false to keep them on the inference device

◆ isCPUOutput()

bool LatentRuntimeEngine::isCPUOutput ( )

Checks the current output tensor memory placement policy.

Returns
True if outputs are configured to be placed in CPU memory, false if they remain on the inference device

◆ getMetaValue()

std::string LatentRuntimeEngine::getMetaValue ( std::string  key)

Retrieves model metadata by key.

Extracts custom metadata values stored within the ONNX model file. Common keys include "author", "version", "description", etc., but any key-value pairs stored in the model's metadata can be accessed.

Parameters
    key  Metadata key to retrieve
Returns
    String value associated with the key, or an empty string if the key doesn't exist