ONNX-LRE
C++ API documentation
The LatentRuntimeEngine class provides a C++ interface to load and run ONNX models using ONNX Runtime. More...
#include <onnx_lre.hpp>
Public Member Functions

LatentRuntimeEngine (const std::string &modelPath, const Options &config=Options())
    Constructs the inference engine, loading the ONNX model from modelPath with the given configuration.
~LatentRuntimeEngine ()
    Releases all allocated resources.
size_t getNumberOfInputs () const
    Returns the number of input tensors required by the model.
size_t getNumberOfOutputs () const
    Returns the number of output tensors produced by the model.
const std::vector< const char * > & getInputNames () const
    Retrieves the names of all model input nodes.
const std::vector< const char * > & getOutputNames () const
    Retrieves the names of all model output nodes.
std::vector< std::string > getInputDTypes () const
    Gets the data types of all input tensors as strings.
std::vector< std::string > getOutputDTypes () const
    Gets the data types of all output tensors as strings.
const std::vector< std::vector< int64_t > > & getInputShapes () const
    Retrieves the dimensional shapes of all input tensors.
const std::vector< std::vector< int64_t > > & getOutputShapes () const
    Retrieves the dimensional shapes of all output tensors.
std::string getExecutionProvider () const
    Returns the currently active execution provider.
void infer (const std::vector< DLManagedTensor * > &t_input_data_vec)
    Performs inference using DLPack tensor inputs.
void infer (const std::vector< Ort::Value > &t_input_data_vec)
    Performs inference using ONNX Runtime tensor inputs.
void infer (const std::vector< void * > &t_input_data_vec, const std::vector< int64_t * > shape, const std::string device)
    Performs inference using raw memory pointers and shapes.
Ort::Value makeORTTensor (void *t_input_data_vec, const int64_t *shape, int input_index, const std::string &device)
    Creates an ONNX Runtime tensor from raw memory.
std::vector< DLManagedTensor * > getOutput ()
    Retrieves inference results as DLPack tensors.
std::vector< Ort::Value > getOutputOrt ()
    Retrieves and transfers ownership of inference results as ONNX Runtime tensors.
void setCPUOutput (bool use_cpu)
    Controls output tensor placement between device and host memory.
bool isCPUOutput ()
    Checks the current output tensor memory placement policy.
std::string getMetaValue (std::string key)
    Retrieves model metadata by key.
std::string getVersion ()
    Retrieves the ONNX LRE library version at runtime.
Private Member Functions

void autoSelectExecutionProvider (ExecutionProvider currentProvider)
    Auto-selects the most appropriate execution provider based on system capabilities.
void initLRE (std::vector< unsigned char > model)
    Initializes the model for inference.
void configureTensorRTProvider ()
    Configures TensorRT provider options.
void configureCUDAProvider ()
    Configures CUDA provider options.
void allocateIO (bool onlyOutput)
    Allocates input and output tensors; when onlyOutput is true, only output tensors are allocated.
void hasDynamicInputsOutputs (Ort::Session &session)
    Detects dynamic input/output shapes and sets the corresponding flags.
void fetchInputNodeInfo ()
    Fetches and stores input node information.
void fetchOutputNodeInfo ()
    Fetches and stores output node information.
double getAverageInferenceTimeMs () const
    Returns the average inference time in milliseconds over the recent inference history.
std::string generateModelInit ()
std::string generateModelInferenceEvent ()
Private Attributes

Options config
Ort::Env env
    ONNX Runtime environment.
Ort::SessionOptions sessionOptions
    Session options for ONNX Runtime.
Ort::Session session {nullptr}
    The ONNX Runtime session for model inference.
Ort::IoBinding io_binding {nullptr}
std::string model_path
    Path to the ONNX model file.
bool isModelLoaded = false
    Flag indicating if the model is successfully loaded.
bool gpuInput = false
    Flag indicating if the input tensors should be on GPU (true for CUDA and TensorRT).
bool gpuOutput = false
    Flag indicating if the output tensors are on GPU (true for CUDA and TensorRT).
bool graphQuantized = false
    Flag indicating if the model is quantized.
bool dynamicGraph = false
    Flag indicating if the model graph is dynamic (contains conditional or loop nodes).
bool dynamicInputs = false
    Flag indicating if the model has dynamic input shapes.
bool dynamicOutputs = false
    Flag indicating if the model has dynamic output shapes.
bool enableProfiling_ = false
    Flag to enable profiling for the engine.
Ort::MemoryInfo cpu_memory_info {nullptr}
Ort::MemoryInfo cuda_memory_info {nullptr}
OrtTensorRTProviderOptionsV2 * tensorrt_options = nullptr
    TensorRT provider options.
OrtCUDAProviderOptionsV2 * cuda_options = nullptr
    CUDA provider options.
Ort::ModelMetadata metadata {nullptr}
size_t number_inputs = 0
size_t number_outputs = 0
    Count of input and output nodes.
std::vector< const char * > input_names
std::vector< const char * > output_names
    Names of input and output nodes.
std::vector< ONNXTensorElementDataType > input_dtypes
std::vector< ONNXTensorElementDataType > output_dtypes
    Data types of input and output nodes.
std::vector< std::vector< int64_t > > input_shapes
std::vector< std::vector< int64_t > > output_shapes
    Shapes of input and output nodes.
std::vector< size_t > input_tensors_dtype_bytes
std::vector< size_t > output_tensors_dtype_bytes
Ort::AllocatorWithDefaultOptions allocator
    Allocator for ONNX Runtime.
OrtAllocator * cpu_device_allocator = nullptr
OrtAllocator * cuda_device_allocator = nullptr
std::vector< Ort::Value > input_tensors
std::vector< Ort::Value > output_tensors
ExecutionProvider executionProvider
Precision precision
std::string sys_info_dump
std::string model_context_uuid
Ort::Value dummy_tensor {nullptr}
std::string tempDirectoryPath
LeipCommClient comm_
LicenseWithKey license_
std::vector< double > infer_durations = std::vector<double>(INFER_HISTORY_SIZE, 0.0)
size_t infer_index = 0
size_t infer_count = 0
size_t total_infer_count = 0
std::chrono::system_clock::time_point start_time
std::chrono::system_clock::time_point end_time

Static Private Attributes

static constexpr size_t INFER_HISTORY_SIZE = 10
Detailed Description

The LatentRuntimeEngine class provides a C++ interface to load and run ONNX models using ONNX Runtime.

This class abstracts the details of model loading, device configuration, and inference execution. It allows users to easily run ONNX models on various hardware with different configurations and precision settings.
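A minimal usage sketch of the public API above. The model path, the single float32 input, and the "cpu" device string are illustrative assumptions (this reference does not list the accepted device names or Options fields, so the defaults are used):

```cpp
#include <onnx_lre.hpp>

#include <iostream>
#include <vector>

int main() {
    // Load the model with default options; the engine auto-selects an
    // execution provider based on system capabilities.
    LatentRuntimeEngine lre("model.onnx");  // hypothetical model path

    // Inspect the model's I/O signature before preparing buffers.
    std::cout << "inputs: " << lre.getNumberOfInputs()
              << ", outputs: " << lre.getNumberOfOutputs()
              << ", provider: " << lre.getExecutionProvider() << "\n";

    // Size one raw input buffer from the first input shape.
    // (Assumes static shapes and a float32 input; check getInputDTypes().)
    std::vector<int64_t> shape = lre.getInputShapes()[0];
    size_t count = 1;
    for (int64_t d : shape) count *= static_cast<size_t>(d);
    std::vector<float> data(count, 0.0f);

    // Keep outputs in host memory, then run the raw-pointer overload.
    lre.setCPUOutput(true);
    std::vector<void*> inputs{data.data()};
    std::vector<int64_t*> shapes{shape.data()};
    lre.infer(inputs, shapes, "cpu");  // device string assumed; see infer()

    // Fetch results as ONNX Runtime tensors (ownership is transferred).
    std::vector<Ort::Value> outputs = lre.getOutputOrt();
    std::cout << "received " << outputs.size() << " output tensor(s)\n";
    return 0;
}
```

For zero-copy interchange with other frameworks, the DLPack overloads (infer with DLManagedTensor inputs, getOutput) can be used instead of the Ort::Value path shown here.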
Member Function Documentation

void autoSelectExecutionProvider (ExecutionProvider currentProvider) [private]

Auto-selects the most appropriate execution provider based on system capabilities.

Parameters
    currentProvider    The currently set execution provider.