Using PyLRE to Deploy Your Compiled modelLibrary.so¶
You have compiled a model with LEIP Optimize, and now you want to deploy it in a target environment. This tutorial provides step-by-step instructions for loading an optimized artifact, creating an LRE instance, and performing inference.
Runtime Setup¶
We need two components to execute a model on your target:
- a target-compatible, model-compatible runtime (the LRE)
- a target-compatible model library (the optimized output)
import pylre
from pylre import LatentRuntimeEngine as LRE
import numpy as np
Obtain your optimized artifact using LEIP Optimize. This tutorial assumes the model is compiled for float32 on a CPU target. For details on quantizing and compiling a model, consult the LEIP Optimize tutorial.
optimized_artifact_path = "path/to/modelLibrary.so"
pylre_options = pylre.TVMOptions(precision="float32")
lre = LRE(optimized_artifact_path, options=pylre_options)
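If loading fails, a common cause is simply a wrong path. A minimal pre-check (a sketch using only the standard library; no PyLRE required, and the path below is the placeholder from above):

```python
from pathlib import Path

optimized_artifact_path = "path/to/modelLibrary.so"  # placeholder path
artifact = Path(optimized_artifact_path)

# Fail early with a clear message rather than a loader error deep inside the runtime.
if not artifact.is_file():
    print(f"artifact not found: {artifact}")
```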
With this LRE object, we can inspect the model we have optimized:
lre.get_metadata()
Creating a random tensor to do inference¶
Since the model expects a single input, we use the first entry of the runtime's input shapes and dtypes to create a random input tensor:
input_shape = lre.input_shapes[0]
input_dtype = lre.input_dtypes[0]
input_tensor = np.random.random(input_shape).astype(input_dtype)
With this input tensor, we can run inference on the LRE instance we created:
output = lre(input_tensor)
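The cast to the runtime's dtype matters: np.random.random returns float64, while this tutorial's model expects float32. A self-contained check (the shape here is an illustrative example, not necessarily what your model reports):

```python
import numpy as np

shape = (1, 3, 224, 224)       # example shape; your lre.input_shapes[0] may differ
raw = np.random.random(shape)  # NumPy's default floating dtype is float64
input_tensor = raw.astype("float32")

print(raw.dtype)           # float64
print(input_tensor.dtype)  # float32
```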
The output is returned in a device-independent format, but you may want to convert it into something more amenable to postprocessing. We will use NumPy for this; depending on your application and hardware usage, you may want to explore other formats.
numpy_output = np.from_dlpack(output[0])
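np.from_dlpack performs a zero-copy import of any tensor that exposes the DLPack protocol. A self-contained illustration, using a NumPy array as a stand-in for the runtime's output (NumPy arrays support DLPack export from version 1.22 onward):

```python
import numpy as np

# Stand-in for the runtime's device output: any object implementing
# the DLPack protocol works here.
device_output = np.arange(6, dtype=np.float32).reshape(2, 3)

host_view = np.from_dlpack(device_output)

# The import is zero-copy: the view reflects changes to the source buffer.
device_output[0, 0] = 42.0
print(host_view[0, 0])  # 42.0
```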
Verifying expected output shape¶
expected_output_shape = lre.output_shapes[0]
assert numpy_output.shape == expected_output_shape
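From here, postprocessing depends on your model. As one hypothetical example (assuming a classification head that emits raw logits, which this tutorial does not specify), a typical step is a softmax followed by an argmax:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Hypothetical logits from a 4-class classifier with batch size 1.
numpy_output = np.array([[0.5, 2.0, 1.0, 0.1]], dtype=np.float32)

probs = softmax(numpy_output)
top_class = int(np.argmax(probs, axis=-1)[0])
print(top_class)   # 1 (index of the highest logit)
print(probs.sum()) # ≈ 1.0
```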