LEIP Package

The goal of the LEIP Package is to enable deployment of executables produced by the LEIP compiler. The tool provides multiple ways of deploying the compiled Neural Network (NN) model, because end users may target different devices and use cases.

Overview

Once a model has been compiled, you no longer need the LEIP SDK container to run it. The LEIP framework currently supports the following ways to perform inference on a compiled model:

  • Through LEIP Evaluate or LEIP Run, which have limited support for detection-based models;

  • Through a series of Python code examples provided in the examples directory of the Docker image that hosts the LEIP framework; and

  • Through a C++ Wrapper or C Wrapper, which are meant for embedded design and thus produce a set of artifacts along with a Makefile that requires user additions before generating an executable.

The LEIP Package is a Latent AI Runtime Environment (LRE) Object that can be accessed by the end user through an API. The LRE Object encapsulates a number of services, such as authentication, encryption, watermarking, and pre/post processing, that enable the end user to build an application. By default, no services are added to the LRE Object. This results in a small memory footprint.

CLI Usage

The basic command is:

BASH
Usage: leip package [OPTIONS]

  Generates a directory with all the required files needed to generate an
  executable on the target device.

Options:
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Log output level  [default: WARNING]
  --input_path TEXT               The directory or file path to the model
                                  [required]

  --output_path TEXT              The root output directory path for the
                                  compiling artifacts  [default:
                                  ./package_output]

  --input_names LIST              The comma-separated names of the input
                                  layers of the model

  --preprocessor TEXT             The callback method used for preprocessing
                                  input data when running inference.
                                  It has three possible forms:
                                  1) A name from
                                  [bgrtorgb|bgrtorgb2|bgrtorgb3|bgrtorgbcaffe
                                  |imagenet|imagenet_caffe|imagenet_torch_nchw
                                  |mnist|mnist_int|rgbtogray|rgbtogray_int8
                                  |rgbtogray_symm|float32|uint8|symm|norm]
                                  2) A python function as 'package.module.func'
                                  3) A python function as 'path/to/module.py::func'

  --postprocessor TEXT            The callback method used for postprocessing
                                  output data after running inference.
                                  It has three possible forms:
                                  1) A name from
                                  [top1|top5]
                                  2) A python function as 'package.module.func'
                                  3) A python function as 'path/to/module.py::func'

  --metrics LIST from [inferences_count|latency|most_common_class]
                                  Metrics to include in runtime library
  --format [python|cc]            Library's output format
  --config FILE                   Read configuration from FILE.
  --help                          Show this message and exit.

Format

You can specify the target language of the generated LRE Object using --format. The options are python and cc. The cc option generates a directory structure with a Makefile, libraries, include headers, and the C++ source code.
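
For example, to generate the C++ artifacts instead of the default Python package, the same command can be pointed at a compiled model with --format cc (the input path below is a placeholder):

CODE
$ leip package --input_path path/to/optimized/model \
               --format cc

The resulting directory contains the Makefile and sources described above; as noted in the Overview, the Makefile requires user additions before an executable can be generated.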

Metrics

You can specify the metrics to be collected at runtime using --metrics. The valid options are: inferences_count, latency, and most_common_class.

Pre and post processing

Pre and post processing can be added to the LRE Object using --preprocessor and --postprocessor, respectively. By default, neither service is added to the LRE Object. This results in a small memory footprint.

For a list of the valid pre and post processors supported by the LEIP SDK please consult the CLI Reference for LEIP Evaluate.
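
As a minimal sketch, a custom preprocessor can be supplied using the 'path/to/module.py::func' form listed in the CLI options above. The module name, function name, and callback signature here are illustrative assumptions rather than a documented interface:

CODE
# my_preprocessors.py -- hypothetical module; the exact callback signature is an assumption
import numpy as np

def normalize_rgb(image):
    """Scale an RGB image to float32 values in [0, 1] before inference."""
    return np.asarray(image, dtype=np.float32) / 255.0

It could then be referenced at package time alongside one of the built-in postprocessors:

CODE
$ leip package --input_path path/to/optimized/model \
               --preprocessor path/to/my_preprocessors.py::normalize_rgb \
               --postprocessor top1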

Example: Packaging an LRE Runtime

In this example, we will package a model together with the runtime, including some metrics, and then run inferences on the target device.

It is assumed we have an optimized model (an output of leip optimize) at path/to/optimized/model.

Run the following command on the SDK container:

CODE
$ leip package --input_path path/to/optimized/model \
               --format python \
               --metrics inferences_count,latency,most_common_class
Latent AI Runtime for Python 3.9 created at package_output

This will create a file called latentai.lre in the directory package_output.

Transfer the latentai.lre file to the path /opt/latentai.lre on a target device that has Python 3.9 installed.
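
One way to copy the package over is with scp; the device address below is only a placeholder:

CODE
$ scp package_output/latentai.lre user@target-device:/opt/latentai.lre

Then, on the target device, add the package to the Python path: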

CODE
$ export PYTHONPATH=$PYTHONPATH:/opt/latentai.lre

Now the package latentai_runtime is available for import in Python. When importing this package, a bootstrap process will make other required dependencies available.

You can use the packaged model in your Python program as follows:

BASH
user@laptop:~# python3
Python 3.9.13 (main, May 18 2022, 02:11:21) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from latentai_runtime import Model
>>> model = Model()
>>> from PIL import Image
>>> model.predict([Image.open("/home/user/penguin.jpg")])
[array([[-128, -109, -127, -128, -128, -128,   44, -127,  -66, -128]],
      dtype=int8)]

It is important to import latentai_runtime before any other statements so that the bootstrap process can take place.
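
As a minimal sketch, a standalone script on the target device could look like the following (it assumes PYTHONPATH has been set as above, and the image path is a placeholder):

CODE
# classify.py -- import latentai_runtime first so the bootstrap step runs
# before anything else touches the extracted dependencies.
from latentai_runtime import Model

from PIL import Image

model = Model()
predictions = model.predict([Image.open("/home/user/penguin.jpg")])
print(predictions)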

With these steps completed, the user can run inferences on the target device. The next sections describe how to read metrics and metadata from the packaged runtime.

Getting Metrics

In the previous example, after running multiple inferences we may want to read the metrics we defined at leip package time. This can be done as follows:

BASH
>>> import json
>>> print(json.dumps(model.get_metrics(), indent=4))
{
    "inferences_count": 4,
    "latency": {
        "inferences": 4,
        "last_inference_seconds": 0.10162597999988066,
        "average_inferences_per_second": 0.0993410665000738
    },
    "most_common_class": "2"
}

We can see that there is a key for each metric specified at package time, with its corresponding value.

These metrics are accumulated across each inference (model.predict call) on a Model instance.
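
For example, running several predictions in a row and then reading the accumulated counters could look like this (image paths are placeholders):

CODE
>>> from PIL import Image
>>> import json
>>> for path in ["/home/user/penguin.jpg", "/home/user/penguin2.jpg"]:
...     _ = model.predict([Image.open(path)])
...
>>> # inferences_count and latency now reflect every predict call on this instance
>>> print(json.dumps(model.get_metrics(), indent=4))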

Getting Metadata

We may obtain the model’s metadata by calling model.get_metadata(), which outputs the following:

CODE
>>> print(json.dumps(model.get_metadata(), indent=4))
{
    "model_schema": {
        "inference_context": "cpu",
        "input_names": [
            "input_1"
        ],
        "output_names": [],
        "input_shapes": [],
        "remove_nodes": [],
        "dataset": {
            "public_dataset": "custom",
            "type": "leip"
        },
        "custom_objects": null,
        "crc": null,
        "metadata": {
            "name": "mobilenetv2",
            "variant": "keras-open-images-10-classes",
            "full_name": "Mobilenet V2",
            "description": "Mobilenet V2 is an image classification model that implements depth-wise convolutions within the network in an effort to optimize latency on mobile devices. MobilenetV2 is architecturally similar to V1, but has been further optimized to reduce latency on mobile devices.",
            "type": "Image Classification",
            "source": "https://github.com/latentai/model-zoo-models/tree/master/mobilenetv2",
            "tags": [
                "turing"
            ]
        },
        "preprocessor": "symm",
        "postprocessor": null,
        "preprocessor_config": null,
        "output_format": "classifier"
    },
    "runtime_parameters": {
        "metrics": [
            "inferences_count",
            "latency",
            "most_common_class"
        ]
    }
}
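
Individual fields can be read straight from the returned dictionary; for example, based on the structure shown above:

CODE
>>> metadata = model.get_metadata()
>>> metadata["model_schema"]["input_names"]
['input_1']
>>> metadata["runtime_parameters"]["metrics"]
['inferences_count', 'latency', 'most_common_class']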

Python Bootstrapping on Import

A file called latentai.lre is created during leip package time. This is a ZIP file that includes:

  • The model files: modelLibrary.so and model_schema.json;

  • Required Python dependencies;

  • Required .so files for Python 3.9; and

  • libtvm_runtime.so for the CPU.

The latentai_runtime package then performs the following when imported into Python:

  • Creates the directory .latentai/LRE in the user’s home directory.

  • Extracts the contents of the latentai.lre file into the ~/.latentai/LRE directory.

  • Adds the newly created directory to Python’s module search path.
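
A quick way to see the bootstrap in action on the target is to trigger the import once and then list the extracted directory (this assumes PYTHONPATH is set as in the example above):

CODE
$ python3 -c "import latentai_runtime"   # first import extracts latentai.lre
$ ls ~/.latentai/LRE                     # extracted model files and dependencies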
