ONNX-LRE
C++ API documentation
Core runtime functions for model loading and inference.
Functions

All members below belong to `OnnxLre::LatentRuntimeEngine`.

| Signature | Description |
| --- | --- |
| `LatentRuntimeEngine(const std::string &modelPath, const Options &config = Options())` | High-performance inference engine for ONNX models. |
| `~LatentRuntimeEngine()` | Releases all allocated resources. |
| `size_t getNumberOfInputs() const` | Returns the number of input tensors required by the model. |
| `size_t getNumberOfOutputs() const` | Returns the number of output tensors produced by the model. |
| `const std::vector<const char *> &getInputNames() const` | Retrieves the names of all model input nodes. |
| `const std::vector<const char *> &getOutputNames() const` | Retrieves the names of all model output nodes. |
| `std::vector<std::string> getInputDTypes() const` | Gets the data types of all input tensors as strings. |
| `std::vector<std::string> getOutputDTypes() const` | Gets the data types of all output tensors as strings. |
| `const std::vector<std::vector<int64_t>> &getInputShapes() const` | Retrieves the dimensional shapes of all input tensors. |
| `const std::vector<std::vector<int64_t>> &getOutputShapes() const` | Retrieves the dimensional shapes of all output tensors. |
| `void infer(const std::vector<DLManagedTensor *> &t_input_data_vec)` | Performs inference using DLPack tensor inputs. |
| `void infer(const std::vector<Ort::Value> &t_input_data_vec)` | Performs inference using ONNX Runtime tensor inputs. |
| `void infer(const std::vector<void *> &t_input_data_vec, const std::vector<int64_t *> shape, const std::string device)` | Performs inference using raw memory pointers and shapes. |
| `Ort::Value makeORTTensor(void *t_input_data_vec, const int64_t *shape, int input_index, const std::string &device)` | Creates an ONNX Runtime tensor from raw memory. |
| `std::vector<DLManagedTensor *> getOutput()` | Retrieves inference results as DLPack tensors. |
| `std::vector<Ort::Value> getOutputOrt()` | Retrieves and transfers ownership of inference results as ONNX Runtime tensors. |
| `void setCPUOutput(bool use_cpu)` | Controls output tensor placement between device and host memory. |
| `bool isCPUOutput()` | Checks the current output tensor memory placement policy. |
| `std::string getMetaValue(std::string key)` | Retrieves model metadata by key. |
`explicit LatentRuntimeEngine::LatentRuntimeEngine(const std::string &modelPath, const Options &config = Options())`
High-performance inference engine for ONNX models.
The Latent Runtime Engine (LRE) is designed to provide a seamless and efficient interface for executing machine learning models in ONNX format. It abstracts the complexities of hardware acceleration, memory management, and tensor conversions, allowing developers to focus on model inference without worrying about the underlying implementation details.
Parameters:
- `modelPath`: Path to the ONNX model file (`.onnx`).
- `config`: Engine configuration options (defaults to CUDA with FP32 precision).

Throws:
- `std::runtime_error` if model loading fails or the requested hardware is unavailable.
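A minimal construction sketch. The header path `lre/latent_runtime_engine.hpp` and the model filename are assumptions, not confirmed by this documentation:

```cpp
// Sketch only: header path and model filename are illustrative assumptions.
#include <iostream>
#include <stdexcept>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

int main() {
    try {
        // Default Options: CUDA with FP32 precision, per the constructor docs.
        OnnxLre::LatentRuntimeEngine engine("model.onnx");
        std::cout << "Model loaded with " << engine.getNumberOfInputs()
                  << " input(s)\n";
    } catch (const std::runtime_error &e) {
        // Thrown if the model fails to load or the hardware is unavailable.
        std::cerr << "Engine init failed: " << e.what() << '\n';
        return 1;
    }
    return 0;
}
```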
`LatentRuntimeEngine::~LatentRuntimeEngine()`
Releases all allocated resources.
Ensures proper cleanup of ONNX Runtime session, provider options, and any tensor data owned by the engine.
`size_t LatentRuntimeEngine::getNumberOfInputs() const`
Returns the number of input tensors required by the model.
`size_t LatentRuntimeEngine::getNumberOfOutputs() const`
Returns the number of output tensors produced by the model.
`const std::vector<const char *> &LatentRuntimeEngine::getInputNames() const`
Retrieves the names of all model input nodes.
These names are required to match inputs with their correct positions when using frameworks that rely on named tensor matching.
`const std::vector<const char *> &LatentRuntimeEngine::getOutputNames() const`
Retrieves the names of all model output nodes.
These names help identify the semantic meaning of output tensors and are necessary for frameworks using named tensor matching.
`std::vector<std::string> LatentRuntimeEngine::getInputDTypes() const`
Gets the data types of all input tensors as strings.
Provides the ONNX data type of each input tensor in human-readable format (e.g., "float32", "int64", "float16").
`std::vector<std::string> LatentRuntimeEngine::getOutputDTypes() const`
Gets the data types of all output tensors as strings.
Provides the ONNX data type of each output tensor in human-readable format (e.g., "float32", "int64", "float16").
`const std::vector<std::vector<int64_t>> &LatentRuntimeEngine::getInputShapes() const`
Retrieves the dimensional shapes of all input tensors.
Returns the exact shape requirements for each input tensor. Dynamic dimensions (if supported by the model) will be represented as -1.
`const std::vector<std::vector<int64_t>> &LatentRuntimeEngine::getOutputShapes() const`
Retrieves the dimensional shapes of all output tensors.
Provides the shape information for each output tensor. For models with dynamic output shapes, these will reflect the shapes from the most recent inference or the default shapes if no inference has been performed.
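The introspection getters above can be combined to print a model's input contract. A sketch, assuming the hypothetical header path used throughout these examples:

```cpp
// Sketch: dump each input's name, dtype, and shape. Header path is an
// assumption; -1 entries in a shape mark dynamic dimensions.
#include <iostream>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void printInputSignature(OnnxLre::LatentRuntimeEngine &engine) {
    const auto &names  = engine.getInputNames();
    const auto  dtypes = engine.getInputDTypes();
    const auto &shapes = engine.getInputShapes();
    for (size_t i = 0; i < engine.getNumberOfInputs(); ++i) {
        std::cout << "input " << names[i] << " (" << dtypes[i] << "): [";
        for (size_t d = 0; d < shapes[i].size(); ++d)
            std::cout << (d ? ", " : "") << shapes[i][d];
        std::cout << "]\n";
    }
}
```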
`void LatentRuntimeEngine::infer(const std::vector<DLManagedTensor *> &t_input_data_vec)`
Performs inference using DLPack tensor inputs.
Executes the model using the provided DLPack tensors as input. This approach enables easy integration with frameworks that support DLPack as a tensor interchange format (PyTorch, MXNet, etc.).
Parameters:
- `t_input_data_vec`: Vector of `DLManagedTensor` pointers, one per model input.

Throws:
- `std::runtime_error` if the input count or shapes don't match the model's requirements.
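A sketch of the DLPack input path. Obtaining the `DLManagedTensor` from another framework (e.g. PyTorch's `toDLPack`) is left to the caller; the header path is an assumption:

```cpp
// Sketch: feeding a DLPack tensor produced by another framework.
// The input parameter stands in for e.g. a tensor exported via toDLPack();
// the LRE header path is an assumption.
#include <vector>
#include <dlpack/dlpack.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void runWithDLPack(OnnxLre::LatentRuntimeEngine &engine,
                   DLManagedTensor *input) {
    std::vector<DLManagedTensor *> inputs{input};  // one entry per model input
    engine.infer(inputs);  // throws std::runtime_error on count/shape mismatch
}
```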
`void LatentRuntimeEngine::infer(const std::vector<Ort::Value> &t_input_data_vec)`
Performs inference using ONNX Runtime tensor inputs.
Executes the model using native ONNX Runtime tensors. This is the most efficient approach when working directly with Ort::Value tensors.
Parameters:
- `t_input_data_vec`: Vector of `Ort::Value` objects, one per model input.

Throws:
- `std::runtime_error` if the input count or shapes don't match the model's requirements.
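A sketch of the `Ort::Value` path, wrapping an existing host buffer with the standard ONNX Runtime C++ API. The shape is illustrative; the LRE header path is an assumption:

```cpp
// Sketch: build an Ort::Value over an existing float buffer, then run
// inference. The 1x3x224x224 shape is an example only.
#include <vector>
#include <cstdint>
#include <onnxruntime_cxx_api.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void runWithOrtValues(OnnxLre::LatentRuntimeEngine &engine,
                      float *data, size_t count) {
    std::vector<int64_t> shape{1, 3, 224, 224};  // example input shape
    Ort::MemoryInfo memInfo =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::vector<Ort::Value> inputs;
    // CreateTensor does not copy: `data` must outlive the call.
    inputs.push_back(Ort::Value::CreateTensor<float>(
        memInfo, data, count, shape.data(), shape.size()));
    engine.infer(inputs);
}
```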
`void LatentRuntimeEngine::infer(const std::vector<void *> &t_input_data_vec, const std::vector<int64_t *> shape, const std::string device)`
Performs inference using raw memory pointers and shapes.
Executes the model using raw data pointers. This approach allows for direct integration with any memory management system without requiring conversion to specific tensor formats first.
Parameters:
- `t_input_data_vec`: Vector of `void` pointers to input data buffers.
- `shape`: Vector of pointers to shape arrays, one per input.
- `device`: Target device specifier (`"CPU"` or `"GPU"`).

Throws:
- `std::runtime_error` if the input count or shapes don't match the model's requirements.
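A raw-pointer sketch for a single-input model. The shape values are illustrative, the header path is an assumption, and it is assumed the shape arrays must stay alive for the duration of the call:

```cpp
// Sketch: raw-pointer inference for a model with one input.
// Shape values are examples; header path is an assumption.
#include <vector>
#include <cstdint>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void runWithRawPointers(OnnxLre::LatentRuntimeEngine &engine, float *buffer) {
    int64_t shape0[] = {1, 3, 224, 224};    // example shape for input 0
    std::vector<void *> inputs{buffer};     // one raw buffer per model input
    std::vector<int64_t *> shapes{shape0};  // matching shape array per input
    engine.infer(inputs, shapes, "CPU");    // or "GPU" for device memory
}
```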
`Ort::Value LatentRuntimeEngine::makeORTTensor(void *t_input_data_vec, const int64_t *shape, int input_index, const std::string &device)`
Creates an ONNX Runtime tensor from raw memory.
Utility method to construct an Ort::Value tensor from raw memory and shape information. Handles proper memory placement on the specified device.
Parameters:
- `t_input_data_vec`: Pointer to raw tensor data.
- `shape`: Array of dimensions defining the tensor shape.
- `input_index`: Index into the model's input array (used to determine the data type).
- `device`: Target device (`"CPU"` or `"GPU"`).

Throws:
- `Ort::Exception` if tensor creation fails.
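A sketch combining `makeORTTensor` with the `Ort::Value` overload of `infer`. The shape is illustrative; the header path is an assumption:

```cpp
// Sketch: wrap a raw buffer as an Ort::Value for input 0, then run inference.
#include <vector>
#include <cstdint>
#include <onnxruntime_cxx_api.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void inferFromRawBuffer(OnnxLre::LatentRuntimeEngine &engine, void *buffer) {
    int64_t shape[] = {1, 3, 224, 224};  // example shape
    // input_index 0 lets the engine look up the correct element type.
    Ort::Value tensor = engine.makeORTTensor(buffer, shape, 0, "CPU");
    std::vector<Ort::Value> inputs;
    inputs.push_back(std::move(tensor));  // Ort::Value is move-only
    engine.infer(inputs);
}
```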
`std::vector<DLManagedTensor *> LatentRuntimeEngine::getOutput()`
Retrieves inference results as DLPack tensors.
Returns the model outputs from the most recent inference as DLPack tensors. Handles conversion between internal representation and DLPack format, including proper memory placement based on the configured output device.
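A sketch of consuming DLPack outputs. It assumes a float32 output residing in host memory (e.g. after `setCPUOutput(true)`), and follows the DLPack convention that the consumer releases each tensor through its deleter:

```cpp
// Sketch: read the first output after inference, then release it.
// Assumes float32 data in CPU memory; header path is an assumption.
#include <iostream>
#include <vector>
#include <dlpack/dlpack.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void consumeOutputs(OnnxLre::LatentRuntimeEngine &engine) {
    std::vector<DLManagedTensor *> outputs = engine.getOutput();
    DLManagedTensor *out = outputs[0];
    const auto *data = static_cast<const float *>(out->dl_tensor.data);
    std::cout << "first value: " << data[0] << '\n';
    // DLPack convention: the consumer releases the tensor via its deleter.
    if (out->deleter) out->deleter(out);
}
```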
`std::vector<Ort::Value> LatentRuntimeEngine::getOutputOrt()`
Retrieves and transfers ownership of inference results as ONNX Runtime tensors.
Returns and transfers ownership of the internal output tensors to the caller. After calling this method, the engine no longer holds references to these tensors.
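A sketch of taking ownership of the outputs as `Ort::Value` objects. The float32 element type is an assumption; the header path is hypothetical:

```cpp
// Sketch: take ownership of outputs as Ort::Value objects.
#include <vector>
#include <onnxruntime_cxx_api.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void takeOrtOutputs(OnnxLre::LatentRuntimeEngine &engine) {
    // Ownership moves to the caller; the engine drops its references.
    std::vector<Ort::Value> outputs = engine.getOutputOrt();
    const float *data = outputs[0].GetTensorData<float>();  // assumes float32
    (void)data;  // post-process as needed
}
```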
`void LatentRuntimeEngine::setCPUOutput(bool use_cpu)`
Controls output tensor placement between device and host memory.
Configures whether output tensors should be automatically moved to CPU memory after inference. This is particularly useful when inference is performed on GPU but the results need to be accessed by the CPU for post-processing.
Parameters:
- `use_cpu`: `true` to place outputs in CPU memory, `false` to keep them on the inference device.
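The GPU-inference, CPU-post-processing pattern described above can be sketched as follows (header path is an assumption):

```cpp
// Sketch: run inference on the device, but have results copied to host
// memory so CPU-side post-processing can read them directly.
#include <vector>
#include <onnxruntime_cxx_api.h>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void gpuInferCpuPostprocess(OnnxLre::LatentRuntimeEngine &engine,
                            const std::vector<Ort::Value> &inputs) {
    engine.setCPUOutput(true);  // copy results to host after inference
    engine.infer(inputs);
    // Tensors returned by getOutput()/getOutputOrt() now reside in CPU memory.
}
```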
`bool LatentRuntimeEngine::isCPUOutput()`
Checks the current output tensor memory placement policy. Returns `true` if output tensors are placed in CPU (host) memory after inference, `false` if they remain on the inference device.
`std::string LatentRuntimeEngine::getMetaValue(std::string key)`
Retrieves model metadata by key.
Extracts custom metadata values stored within the ONNX model file. Common keys include "author", "version", "description", etc., but any key-value pairs stored in the model's metadata can be accessed.
Parameters:
- `key`: Metadata key to retrieve.
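A sketch of reading model metadata. The keys shown are the examples named above; which keys exist depends on what the model author stored. The header path is an assumption:

```cpp
// Sketch: print a few common metadata keys embedded in the model file.
#include <iostream>
#include <string>
#include "lre/latent_runtime_engine.hpp"  // hypothetical header name

void printMetadata(OnnxLre::LatentRuntimeEngine &engine) {
    for (const std::string key : {"author", "version", "description"}) {
        std::cout << key << ": " << engine.getMetaValue(key) << '\n';
    }
}
```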