Using LRE to Deploy your Model¶
You have compiled or exported a model with LEIP Optimize, and now you want to deploy it in a target environment. This tutorial walks you through basic deployment steps for any LRE object.
Runtime Setup¶
We need two components to execute a model on your target:
- a target-compatible and model-compatible runtime (LRE)
- a target-compatible model or model library (the optimized output)
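Before loading anything, you can sanity-check that the optimized output directory actually contains the compiled model library. This is a minimal sketch; only the `modelLibrary.so` filename is confirmed by this tutorial, and the helper name is illustrative:

```python
from pathlib import Path

def artifact_ready(optimized_output_dir: str) -> bool:
    """Check that the compiled model library exists in the optimized output."""
    # modelLibrary.so is the file the LRE constructor loads below
    return (Path(optimized_output_dir) / "modelLibrary.so").is_file()

ready = artifact_ready("optimized_outputs/notebook_llvm_fp32")
```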
from pylre import LatentRuntimeEngine as LRE
import numpy as np
import time
optimized_output_dir = "optimized_outputs/notebook_llvm_fp32"
First, we create an LRE instantiation for the optimized output we have.
lre = LRE(f"{optimized_output_dir}/modelLibrary.so")
With this LRE object, we can introspect the model we optimized in the host environment.
lre.get_metadata()
If this model has just one input, we can create a random tensor to test the model.
shape = lre.input_shapes[0]
dtype = lre.input_dtypes[0]  # avoid shadowing the built-in `type`
input = np.random.random(shape).astype(dtype)
With this input data tensor, we can run an inference on the model LRE instantiation we created.
output = lre(input)
This output is in a device-independent format, but you may want to convert it into something more amenable for postprocessing. We will use NumPy here; depending on your application and hardware usage, you may want to explore other formats.
numpy_output = np.from_dlpack(output[0])
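If your model is a classifier, a common next step is to turn the raw output into a prediction. A minimal sketch, assuming the output is a vector of logits (the stand-in values below are illustrative; in practice you would pass your converted `numpy_output`):

```python
import numpy as np

def top1(logits: np.ndarray) -> tuple[int, float]:
    """Return the top-1 class index and its softmax probability."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    idx = int(probs.argmax())
    return idx, float(probs[idx])

# Stand-in logits; replace with your model's converted output
idx, p = top1(np.array([0.1, 2.5, 0.3]))
```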
Speed measurements¶
def speed_test(lre, sample_input, iterations):
    """Run `iterations` inferences and report throughput and mean latency."""
    print('==== Speed Testing ====')
    t_start = time.time()
    for _ in range(iterations):
        lre.infer(sample_input)
    elapsed_time = time.time() - t_start
    latency = elapsed_time / iterations
    fps = iterations / elapsed_time
    print()
    print(f"FPS: {np.round(fps, 2)}; Latency: {np.round(latency, 4)}s")
speed_test(lre, input, 2)
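For steadier numbers, it helps to discard a few warmup iterations (which absorb first-call overheads such as lazy allocation) and to time with `time.perf_counter()`, which is monotonic and higher resolution than `time.time()`. A sketch of that refinement, using a stand-in callable in place of `lre.infer` so it runs anywhere:

```python
import time

def speed_test_with_warmup(infer, sample_input, iterations=50, warmup=5):
    """Time `infer` over `iterations` calls, excluding `warmup` calls."""
    for _ in range(warmup):
        infer(sample_input)              # not timed: absorbs one-time costs
    t_start = time.perf_counter()
    for _ in range(iterations):
        infer(sample_input)
    elapsed = time.perf_counter() - t_start
    return iterations / elapsed, elapsed / iterations  # (fps, latency)

# Stand-in callable; in practice pass lre.infer and your input tensor
fps, latency = speed_test_with_warmup(lambda x: x * 2, 1.0, iterations=10)
```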