# Using PyLRE to Deploy Your Optimized Model
You have optimized your trained model using LEIP Optimize, and now you want to deploy it in a target environment. This tutorial provides a quick start guide for loading an optimized artifact, creating an LRE instance, and performing inference.
## Runtime Setup
We need two components to execute a model on your target:
- a target-compatible and model-compatible runtime (LRE)
- a target-compatible model or model library (optimized output)
```python
from pylre import LatentRuntimeEngine as LRE
import numpy as np

optimized_artifact_path = "path/to/optimized_model.onnx"  # or "path/to/optimized_model/modelLibrary.so"
lre = LRE(optimized_artifact_path)
```
## Create a random tensor to test inference
To verify that the LRE instance is working correctly, we can feed it a randomly generated input tensor.
```python
def _normalize_shape(s):
    """
    Normalize a shape tuple by replacing dynamic dimensions (None or negative) with 1.

    Args:
        s: Shape tuple that may contain None or negative values for dynamic dimensions

    Returns:
        Tuple with all dynamic dimensions replaced with 1
    """
    return tuple(1 if dim is None or dim < 0 else dim for dim in s)

# Build one random array per model input, matching its expected shape and dtype.
shapes = lre.input_shapes
dtypes = lre.input_dtypes
input = [
    np.random.random(_normalize_shape(shape)).astype(np.dtype(dtype))
    for shape, dtype in zip(shapes, dtypes)
]
```
With this input tensor, we can run inference on the LRE instance we created.
```python
# Keep outputs on the CPU, since we'll hand them to NumPy for
# post-processing via from_dlpack.
lre.set_cpu_output(True)
output = lre(input)
```
This output is in a device-independent format. You may want to convert it to a format that's easier to work with for postprocessing. We'll use NumPy in this example, but depending on your application and hardware, other formats may be more suitable.
```python
numpy_output = [np.from_dlpack(o) for o in output]
```
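As a concrete illustration of what post-processing might look like: if the model is a classifier, a typical next step is to take the argmax over the class axis. This is a sketch only, assuming a single output of shape `(1, num_classes)` (here simulated with random scores); your model's actual output layout may differ, and in practice `scores` would be `numpy_output[0]` from the step above.

```python
import numpy as np

# Simulated classifier output: batch of 1, 1000 hypothetical class scores.
scores = np.random.random((1, 1000)).astype(np.float32)

# Index of the highest-scoring class and its raw score.
top_class = int(np.argmax(scores, axis=-1)[0])
confidence = float(scores[0, top_class])
print(top_class, confidence)
```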