Using PyLRE to Deploy Your Optimized Model
You have optimized your trained model using LEIP Optimize, and now you want to deploy it in a target environment. This tutorial provides a quick start guide for loading an optimized artifact, creating an LRE instance, and performing inference.
Runtime Setup
You need two components to execute a model on your target:
- a target-compatible and model-compatible runtime (LRE)
- a target-compatible model or model library (optimized output)
from pylre import LatentRuntimeEngine as LRE
import numpy as np
optimized_artifact_path = "path/to/optimized_model.onnx" # or "path/to/optimized_model/modelLibrary.so"
lre = LRE(optimized_artifact_path)
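Before handing the path to the runtime, it can help to confirm that the artifact actually exists, so a typo surfaces as a clear error. The following is a minimal sketch using only the standard library; `resolve_artifact` is a hypothetical helper for illustration and is not part of PyLRE.

```python
from pathlib import Path


def resolve_artifact(path_str):
    """Return the artifact path as a string, or raise if the file is missing.

    Hypothetical helper for illustration; not part of PyLRE.
    """
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"Optimized artifact not found: {path}")
    return str(path)


# Usage (requires PyLRE and a real artifact on disk):
# lre = LRE(resolve_artifact(optimized_artifact_path))
```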
Create a random tensor to test inference
To verify that the LRE instance works correctly, we can feed it a randomly generated input tensor. This model expects a single input, so we create one tensor that matches the shape and data type of the model's first input.
# Descriptive names avoid shadowing the Python built-ins `type` and `input`
input_shape = tuple(lre.input_shapes[0])
input_dtype = lre.input_dtypes[0]
input_tensor = np.random.random(input_shape).astype(input_dtype)
With this input tensor, we can run inference on the LRE instance we created.
output = lre(input_tensor)
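For a model with more than one input, the same idea extends to one random tensor per input. The sketch below assumes the shape and dtype lists are parallel sequences, as the `[0]` indexing above suggests. `make_random_inputs` is a hypothetical helper, and the shapes and dtypes in the example are stand-in values, not taken from a real model. How several tensors are passed to the LRE call is not covered in this tutorial.

```python
import numpy as np


def make_random_inputs(shapes, dtypes, seed=0):
    """Build one random array per (shape, dtype) pair.

    Hypothetical helper for illustration; not part of PyLRE.
    Values are drawn from [0, 1), so integer dtypes will come out as zeros.
    """
    rng = np.random.default_rng(seed)
    return [
        rng.random(tuple(shape)).astype(dtype)
        for shape, dtype in zip(shapes, dtypes)
    ]


# Stand-in metadata; with a real model, pass lre.input_shapes and lre.input_dtypes.
example_inputs = make_random_inputs(
    [(1, 3, 224, 224), (1, 10)],
    ["float32", "float32"],
)
```

Seeding the generator keeps the smoke test reproducible from run to run.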
This output is in a device-independent format. You may want to convert it to a format that's easier to work with for postprocessing. We'll use NumPy in this example, but depending on your application and hardware, other formats may be more suitable.
numpy_output = np.from_dlpack(output[0])
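To see what `np.from_dlpack` does without a model at hand, the sketch below uses a plain NumPy array as a stand-in for `output[0]`, since NumPy arrays also implement the DLPack protocol (NumPy 1.22 or later). Note that NumPy can only import DLPack data that lives in host memory. Whether PyLRE outputs always meet that condition on your hardware is not stated in this tutorial, so treat that as something to verify.

```python
import numpy as np

# Stand-in for output[0]: any object that implements the DLPack protocol.
source = np.arange(6, dtype=np.float32).reshape(2, 3)

# Import through DLPack; shape and dtype carry over from the producer.
converted = np.from_dlpack(source)

# Check whether the import reused the producer's buffer instead of copying.
same_buffer = np.shares_memory(source, converted)
print(converted.shape, converted.dtype, same_buffer)
```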
Verify expected output shape
# Normalize to a tuple so the comparison holds even if the shape is reported as a list
expected_output_shape = tuple(lre.output_shapes[0])
assert numpy_output.shape == expected_output_shape
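When a model produces several outputs, the same check can be applied to each one. The following sketch also flags non-finite values, which a shape check alone would miss. `check_outputs` is a hypothetical helper, and the example arrays are stand-ins for outputs already converted to NumPy.

```python
import numpy as np


def check_outputs(arrays, expected_shapes):
    """Raise ValueError if any output has the wrong shape or non-finite values.

    Hypothetical helper for illustration; not part of PyLRE.
    """
    if len(arrays) != len(expected_shapes):
        raise ValueError(
            f"expected {len(expected_shapes)} outputs, got {len(arrays)}"
        )
    for index, (array, expected) in enumerate(zip(arrays, expected_shapes)):
        if tuple(array.shape) != tuple(expected):
            raise ValueError(
                f"output {index}: shape {array.shape} != {tuple(expected)}"
            )
        # Only floating-point outputs can hold NaN or infinity.
        if np.issubdtype(array.dtype, np.floating) and not np.isfinite(array).all():
            raise ValueError(f"output {index}: contains NaN or infinity")


# Stand-in data; with a real model, pass the converted outputs and lre.output_shapes.
check_outputs([np.zeros((1, 1000), dtype=np.float32)], [[1, 1000]])
```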