LEIP Package
The goal of the LEIP Package is to enable deployment of executables provided by the LEIP compiler. The tool provides multiple ways of deploying the compiled Neural Network (NN) model because end users may target different devices and use cases.
Overview
You do not need the LEIP SDK container to run a model once it has been compiled. The LEIP framework currently supports the following ways to perform inference on a compiled model:
Through LEIP Evaluate or LEIP Run, which have limited support for detection-based models;
Through a series of Python code examples provided in the examples directory of the Docker image that hosts the LEIP framework; and
Through a C++ Wrapper or C Wrapper, which are meant for embedded design and thus produce a set of artifacts along with a Makefile that requires user additions before generating an executable.
The LEIP Package is a Latent AI Runtime Environment (LRE) Object that can be accessed by the end user through an API. The LRE Object encapsulates a number of services, such as authentication, encryption, watermarking, and pre/post processing, that enable the end user to build an application. By default, no services are added to the LRE Object, which results in a small memory footprint.
CLI Usage
The basic command is:
$ leip package --help
Usage: leip package [OPTIONS]
Generates a directory with all the required files needed to generate an
executable on the target device.
Options:
--loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
Log output level [default: WARNING]
--input_path TEXT The directory or file path to the model
[required]
--output_path TEXT The root output directory path for the
compiling artifacts [default:
./package_output]
--input_names LIST The comma-separated names of the input
layers of the model
--preprocessor TEXT The callback method used for preprocessing
input data when running inference.
It has three possible forms:
1) A name from
[bgrtorgb|bgrtorgb2|bgrtorgb3|bgrtorgbcaffe
|imagenet|imagenet_caffe|imagenet_torch_nchw
|mnist|mnist_int|rgbtogray|rgbtogray_int8
|rgbtogray_symm|float32|uint8|symm|norm]
2) A python function as 'package.module.func'
3) A python function as 'path/to/module.py::func'
--postprocessor TEXT The callback method used for postprocessing
output data after running inference.
It has three possible forms:
1) A name from
[top1|top5]
2) A python function as 'package.module.func'
3) A python function as 'path/to/module.py::func'
--metrics LIST from [inferences_count|latency|most_common_class]
Metrics to include in runtime library
--format [python|cc] Library's output format
--config FILE Read configuration from FILE.
--help Show this message and exit.
Format
You can specify the target language of the generated LRE Object using --format. The options are python and cc. The cc option generates a directory structure containing a Makefile, libraries, header files, and the C++ source code.
Metrics
You can specify the metrics to be collected at runtime using --metrics. The valid options are: inferences_count, latency, and most_common_class.
Pre and Post Processing
Pre and post processing services can be added to the LRE Object using --preprocessor and --postprocessor, respectively. By default, no pre or post processing services are added to the LRE Object, which results in a small memory footprint.
For a list of the valid pre and post processors supported by the LEIP SDK, please consult the CLI Reference for LEIP Evaluate.
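If the built-in callbacks do not fit, the help text above shows that a custom Python function can be referenced as 'path/to/module.py::func'. The sketch below is only an illustration of what such a module might contain; the exact argument and return types expected by the SDK are documented in the CLI Reference for LEIP Evaluate, so the signature assumed here (a PIL image in, a NumPy array out) should be verified before use.
# preprocess.py -- hypothetical custom preprocessor module.
# Assumption: the callback receives a PIL image and must return a NumPy
# array shaped for the model input; confirm the expected signature in
# the CLI Reference for LEIP Evaluate.
import numpy as np

def normalize_input(image):
    """Resize to 224x224 and scale pixel values to [0, 1]."""
    resized = image.resize((224, 224))
    array = np.asarray(resized, dtype=np.float32) / 255.0
    # Add a batch dimension: (1, H, W, C).
    return np.expand_dims(array, axis=0)
Such a function would then be passed as --preprocessor path/to/preprocess.py::normalize_input.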
Example: Packaging an LRE Runtime
In this example, we will package a model and the runtime and include some metrics. Then we will run inferences on the target device.
It is assumed we have an optimized model (an output of leip optimize) at path/to/optimized/model.
Run the following command on the SDK container:
$ leip package --input_path path/to/optimized/model \
--format python \
--metrics inferences_count,latency,most_common_class
Latent AI Runtime for Python 3.9 created at package_output
This will create a file called latentai.lre in the directory package_output.
The latentai.lre file is then transferred to the path /opt/latentai.lre on a target device that has Python 3.9 installed.
$ export PYTHONPATH=$PYTHONPATH:/opt/latentai.lre
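Alternatively, if setting PYTHONPATH is not convenient, the same effect can be achieved from within Python by appending the path before the import; this is standard Python behavior rather than an SDK feature:
import sys
# Equivalent to the PYTHONPATH export above: make the package importable.
sys.path.append("/opt/latentai.lre")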
Now the package latentai_runtime is available for import in Python. When importing this package, a bootstrap process will make other required dependencies available.
You can use the packaged model in your Python program as follows:
user@laptop:~# python3
Python 3.9.13 (main, May 18 2022, 02:11:21)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from latentai_runtime import Model
>>> model = Model()
>>> from PIL import Image
>>> model.predict([Image.open("/home/user/penguin.jpg")])
[array([[-128, -109, -127, -128, -128, -128, 44, -127, -66, -128]],
dtype=int8)]
It is important to import latentai_runtime before any other statement in order for the bootstrap process to take place.
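As a minimal sketch of that requirement, a standalone script on the target device might look like the following; the image path is only illustrative:
# latentai_runtime must be imported first so its bootstrap runs before
# any other dependency is loaded.
from latentai_runtime import Model

from PIL import Image

model = Model()
predictions = model.predict([Image.open("/home/user/penguin.jpg")])
print(predictions)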
The next sections build on these steps, showing what else you can do after running inferences on the target device.
Getting Metrics
In the previous example, after running multiple inferences we may want to read the metrics we defined at leip package time. This can be done as follows:
>>> import json
>>> print(json.dumps(model.get_metrics(), indent=4))
{
"inferences_count": 4,
"latency": {
"inferences": 4,
"last_inference_seconds": 0.10162597999988066,
"average_inferences_per_second": 0.0993410665000738
},
"most_common_class": "2"
}
We can see that there is a key for each metric specified at package time, with its corresponding value.
These metrics accumulate on each inference (each model.predict call) on a Model instance.
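For example, a short sketch of how metrics accumulate over several predictions on the same Model instance; the image paths are illustrative:
# latentai_runtime is imported first so its bootstrap runs.
from latentai_runtime import Model

import json
from PIL import Image

model = Model()

# Each predict call updates the metrics selected at package time.
for path in ["/home/user/penguin.jpg", "/home/user/cat.jpg"]:
    model.predict([Image.open(path)])

# The report covers every prediction made on this Model instance.
print(json.dumps(model.get_metrics(), indent=4))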
Getting Metadata
We may obtain the model’s metadata by calling model.get_metadata(); it will output the following:
>>> print(json.dumps(model.get_metadata(), indent=4))
{
"model_schema": {
"inference_context": "cpu",
"input_names": [
"input_1"
],
"output_names": [],
"input_shapes": [],
"remove_nodes": [],
"dataset": {
"public_dataset": "custom",
"type": "leip"
},
"custom_objects": null,
"crc": null,
"metadata": {
"name": "mobilenetv2",
"variant": "keras-open-images-10-classes",
"full_name": "Mobilenet V2",
"description": "Mobilenet V2 is an image classification model that implements depth-wise convolutions within the network in an effort to optimize latency on mobile devices. MobilenetV2 is architecturally similar to V1, but has been further optimized to reduce latency on mobile devices.",
"type": "Image Classification",
"source": "https://github.com/latentai/model-zoo-models/tree/master/mobilenetv2",
"tags": [
"turing"
]
},
"preprocessor": "symm",
"postprocessor": null,
"preprocessor_config": null,
"output_format": "classifier"
},
"runtime_parameters": {
"metrics": [
"inferences_count",
"latency",
"most_common_class"
]
}
}
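Individual fields can also be read from the returned dictionary; the keys below are taken from the output shown above:
meta = model.get_metadata()

# Keys as they appear in the metadata dump above.
schema = meta["model_schema"]
print(schema["input_names"])                    # ['input_1']
print(schema["preprocessor"])                   # 'symm'
print(meta["runtime_parameters"]["metrics"])    # the metrics chosen at package time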
Python Bootstrapping on Import
A file called latentai.lre is created at leip package time. This is a ZIP file that includes:
The model files modelLibrary.so and model_schema.json;
Required Python dependencies;
Required .so files for Python 3.9; and
libtvm_runtime.so for the CPU.
The latentai_runtime package then performs the following when imported into Python:
Creates the directory .latentai/LRE in the user’s home directory;
Extracts the content of the latentai.lre file into the ~/.latentai/LRE directory; and
Adds the newly created directory to the Python module search path.
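A simple way to check this behavior on a target device is to inspect the home directory and sys.path after the import; this sketch assumes the extracted directory itself is what gets added to the path, which may vary between SDK versions:
import latentai_runtime  # triggers the bootstrap on import

import sys
from pathlib import Path

lre_dir = Path.home() / ".latentai" / "LRE"
print(lre_dir.exists())                                   # True once extraction has run
print(any(str(lre_dir) in entry for entry in sys.path))   # directory is on the path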