The Latent AI Efficient Inference Platform (LEIP) is a modular, fully integrated workflow designed to unify the end-to-end work of AI scientists and embedded software engineers. The LEIP software development kit (SDK) enables developers to train, quantize, and deploy efficient deep neural networks. Its modular architecture allows the platform to be extended with new functionality to meet the current and future needs of the evolving edge AI market. LEIP consists of the service modules described below.
LEIP Core Modules
An all-in-one optimizer for models from several supported frameworks, offered as a Python API and a command-line interface (CLI) familiar to AI scientists and software developers. It consists of two internal phases:
A tool that automates Quantization Guided Training (QGT) on a model. It is intended for first-time and intermediate users of the LEIP SDK. More advanced users may want to consult the details on using the Python API for QGT directly.
Executes one or more LEIP tasks sequentially on one or more models, where the input of each task may be the output of a previous task.
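The chaining behavior described above can be illustrated with a small, generic sketch. It assumes a hypothetical task signature (each task receives a dictionary of all earlier results and returns its own result) and toy tasks named `compress` and `compile_for_target`; none of these names come from the LEIP Pipeline API, which is not shown here.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical task type: receives all results so far, returns its own result.
# This is an illustrative stand-in, not the LEIP Pipeline API.
Task = Callable[[Dict[str, Any]], Any]


def run_pipeline(model: Any, tasks: List[Tuple[str, Task]]) -> Dict[str, Any]:
    """Run tasks in order; later tasks can read outputs of earlier ones."""
    results: Dict[str, Any] = {"input": model}
    for name, task in tasks:
        results[name] = task(results)
    return results


def compress(results: Dict[str, Any]) -> List[float]:
    # Hypothetical task: round each weight of the toy input "model".
    return [round(w, 1) for w in results["input"]]


def compile_for_target(results: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical task: consumes the output of the "compress" task.
    return {"weights": results["compress"], "target": "cpu"}


# Apply the same task sequence to each model in a list of toy models.
models = [[0.123, 0.456], [0.98]]
steps = [("compress", compress), ("compile", compile_for_target)]
outputs = [run_pipeline(m, steps) for m in models]
for out in outputs:
    print(out["compile"])
```

Passing the full results dictionary, rather than only the previous output, lets a task depend on any earlier task, which matches the idea that a task's input "may be" the output of a previous one.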
A broad collection of pre-trained models, spanning applications from audio to computer vision, that you can use to test the LEIP SDK. The accompanying documentation and models show how the LEIP SDK optimizes neural networks for size and performance so that they can handle inference workloads on edge devices.
The LEIP SDK supports an end-to-end development workflow. Starting from your pre-trained neural network models, LEIP Optimize generates an optimized model in the form of a Latent AI Runtime Environment (LRE) object. The LRE object is quantized to your desired bit precision and contains executable code native to the target hardware processor.
LEIP Optimize (or LEIP Compile) generates an LRE object that is optimized for the target hardware. The LRE object is either a standalone executable binary or a linkable object in the processor's native binary format. The LEIP SDK can generate several variants of the LRE object, each with a different level of optimization complexity, offering a range of compute and memory efficiencies. The main LRE object variants are: (a) parameters and computation in floating point, normally used as a baseline for evaluation; and (b) parameters and computation entirely in integers.
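To make the difference between the floating-point and integer variants concrete, the following sketch compares a float dot product with its integer counterpart. It uses a generic symmetric, per-tensor quantization scheme chosen for illustration; it is an assumption, not a description of how LEIP actually quantizes models, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: v is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def float_dot(x: List[float], w: List[float]) -> float:
    # Analogue of variant (a): parameters and computation in floating point.
    return sum(a * b for a, b in zip(x, w))


def int_dot(qx: List[int], qw: List[int]) -> int:
    # Analogue of variant (b): integer-only multiply-accumulate.
    return sum(a * b for a, b in zip(qx, qw))


x = [0.5, -1.0, 0.25]   # toy activations
w = [0.8, 0.1, -0.4]    # toy weights
qx, sx = quantize_symmetric(x)
qw, sw = quantize_symmetric(w)

baseline = float_dot(x, w)
# Rescale the integer accumulator only to compare against the float baseline.
approx = int_dot(qx, qw) * sx * sw
print(baseline, approx, abs(baseline - approx))
```

The `bits` parameter loosely mirrors choosing a target bit precision. A fully integer pipeline would usually fold the final rescaling into a fixed-point multiplier; it is done in floating point here only to measure the error against the baseline.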
After a neural network model is compiled, the resulting binary artifacts can be incorporated into an end-user application. Latent AI provides C/C++ and Python API examples that pre-process the inputs before they are fed into the neural network binary artifact and/or post-process the outputs from the binary artifact. End users can extend or modify these examples to suit their particular needs. The C/C++ API examples also include a Makefile that builds an executable for the target device. The Python API example can be transferred to the target device, along with the binary artifacts produced by the compiler, and executed there.
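The pre-processing and post-processing roles described above can be sketched generically. This example assumes an image-classification model, a hypothetical mean/std normalization, and a softmax/top-1 post-processing step; the `model` argument is a placeholder callable standing in for the compiled artifact, since the actual Latent AI example code and its invocation API are not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple


def preprocess(pixels: Sequence[int], mean: float = 0.5, std: float = 0.5) -> List[float]:
    # Scale 8-bit pixel values to [0, 1], then normalize with mean/std.
    # The mean/std values are hypothetical defaults, not LEIP settings.
    return [((p / 255.0) - mean) / std for p in pixels]


def postprocess(logits: Sequence[float]) -> Tuple[int, float]:
    # Numerically stable softmax, then return (top-1 index, its probability).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def run_inference(model: Callable[[List[float]], List[float]],
                  pixels: Sequence[int]) -> Tuple[int, float]:
    # "model" is a hypothetical stand-in for a call into the compiled artifact.
    return postprocess(model(preprocess(pixels)))


def fake_model(x: List[float]) -> List[float]:
    # Toy model producing three logits, used only to exercise the wrappers.
    s = sum(x)
    return [s, -s, 0.0]


label, prob = run_inference(fake_model, [255, 255, 0])
print(label, prob)
```

Keeping pre-processing and post-processing as separate functions mirrors how such examples are typically adapted: you can swap in your own normalization or decoding logic without touching the call into the model.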
LEIP Optimize (or LEIP Compile) also produces a number of ancillary artifacts that support the deployment of the LRE object. These artifacts include metadata (JSON files) with details such as timestamps, tool versions, and security keys, which can optionally be used for model management during deployment.
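As an illustration of how such metadata might be used for model management, the sketch below parses a JSON document and applies a simple deployment check. The keys `timestamp` and `tool_version`, their values, and the version-matching policy are all hypothetical; the actual LEIP metadata file names and schema may differ.

```python
import json
from datetime import datetime
from typing import Any, Dict

# Hypothetical metadata content; real LEIP file names, keys, and values may differ.
EXAMPLE_METADATA = """
{
  "timestamp": "2024-01-01T00:00:00",
  "tool_version": "0.0.0-example"
}
"""

REQUIRED_KEYS = ("timestamp", "tool_version")


def load_metadata(text: str) -> Dict[str, Any]:
    """Parse metadata JSON, check required keys, and convert the timestamp."""
    meta = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    if missing:
        raise ValueError(f"metadata missing keys: {missing}")
    meta["timestamp"] = datetime.fromisoformat(meta["timestamp"])
    return meta


def is_deployable(meta: Dict[str, Any], expected_tool_version: str) -> bool:
    # Example policy: only deploy artifacts built with the expected tool version.
    return meta["tool_version"] == expected_tool_version


meta = load_metadata(EXAMPLE_METADATA)
print(meta["timestamp"], is_deployable(meta, "0.0.0-example"))
```

A real deployment flow could extend the same pattern to other recorded fields, such as verifying a security key before loading the LRE object.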