Using LRE to Deploy your Model¶
You have compiled or exported a model with LEIP Optimize, and now you want to deploy it in a target environment. This tutorial walks you through basic deployment steps for any LRE object.
Runtime Setup¶
We need two components to execute a model on your target:
- a target-compatible and model-compatible runtime (LRE)
- a target-compatible model or model library (the optimized output)
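Before loading anything, you can sanity-check that the optimized output directory actually contains the compiled model library. This is a minimal sketch; only the `modelLibrary.so` filename is confirmed by this tutorial, and the helper name is illustrative:

```python
from pathlib import Path

def artifact_ready(optimized_output_dir: str) -> bool:
    """Check that the compiled model library exists in the optimized output."""
    # modelLibrary.so is the file the LRE constructor loads below
    return (Path(optimized_output_dir) / "modelLibrary.so").is_file()

ready = artifact_ready("optimized_outputs/notebook_llvm_fp32")
```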
from pylre import LatentRuntimeEngine as LRE
import numpy as np
import time
optimized_output_dir = "optimized_outputs/notebook_llvm_fp32"
First, we create an LRE instantiation for the optimized output we have.
lre = LRE(f"{optimized_output_dir}/modelLibrary.so")
With this LRE object, we can introspect the model we optimized in the host environment.
lre.get_metadata()
If this model has just one input, we can create a random tensor to test the model.
shape = lre.input_shapes[0]
dtype = lre.input_dtypes[0]  # avoid shadowing the built-in `type`
input = np.random.random(shape).astype(dtype)
With this input data tensor, we can run an inference on the model LRE instantiation we created.
output = lre(input)
This output is in a device-independent format, but you may want to convert it into something more amenable for postprocessing. We will use NumPy here; depending on your application and hardware usage, you may want to explore other formats.
numpy_output = np.from_dlpack(output[0])
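If your model is a classifier, a common next step is to turn the raw output into a prediction. A minimal sketch, assuming the output is a vector of logits (the stand-in values below are illustrative; in practice you would pass your converted `numpy_output`):

```python
import numpy as np

def top1(logits: np.ndarray) -> tuple[int, float]:
    """Return the top-1 class index and its softmax probability."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    idx = int(probs.argmax())
    return idx, float(probs[idx])

# Stand-in logits; replace with your model's converted output
idx, p = top1(np.array([0.1, 2.5, 0.3]))
```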
Speed measurements¶
def speed_test(lre, sample_input, iterations):
    """Run `iterations` inferences and report throughput and mean latency."""
    print('==== Speed Testing ====')
    t_start = time.time()
    for _ in range(iterations):
        lre.infer(sample_input)
    elapsed_time = time.time() - t_start
    latency = elapsed_time / iterations
    fps = iterations / elapsed_time
    print()
    print(f"FPS: {np.round(fps, 2)}; Latency: {np.round(latency, 4)}s")
speed_test(lre, input, 2)
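For steadier numbers, it helps to discard a few warmup iterations (which absorb first-call overheads such as lazy allocation) and to time with `time.perf_counter()`, which is monotonic and higher resolution than `time.time()`. A sketch of that refinement, using a stand-in callable in place of `lre.infer` so it runs anywhere:

```python
import time

def speed_test_with_warmup(infer, sample_input, iterations=50, warmup=5):
    """Time `infer` over `iterations` calls, excluding `warmup` calls."""
    for _ in range(warmup):
        infer(sample_input)              # not timed: absorbs one-time costs
    t_start = time.perf_counter()
    for _ in range(iterations):
        infer(sample_input)
    elapsed = time.perf_counter() - t_start
    return iterations / elapsed, elapsed / iterations  # (fps, latency)

# Stand-in callable; in practice pass lre.infer and your input tensor
fps, latency = speed_test_with_warmup(lambda x: x * 2, 1.0, iterations=10)
```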