LEIP Compile
LEIP Compile takes a computational graph as input (for example, a quantized model from the quantization phase of LEIP Optimize) and produces a binary representation (an LRE Object) based on the target specified by the user. The LRE Object is a shared object file that can be loaded into a small runtime for execution.
The runtime is created through a Python script that can perform any pre- or post-processing of the data, as well as include any other components of the application. It is also possible to bundle the runtime together with the neural network model as a single binary.
LEIP Compile can perform several optimizations by manipulating the Compute Graph that represents the neural network. However, exposing the entire search space of optimizations can be computationally expensive, so by default only standard optimizations are performed.
Although LEIP Compile is capable of generating binaries for multiple processors, the ones fully supported at this time are those based on the x86, NVIDIA, and ARM architectures.
The compiler is capable of generating binaries that support 32-bit floating point, 8-bit integer, and mixed data types. It selects the best data type based on the hardware capabilities of the target architecture.
CLI Usage
The basic command is:
$ leip compile --input_path path/to/model \
--output_path path/to/output
LEIP Optimize also compiles models; no additional options are required for that step, as shown in the sketch below. For a detailed explanation of each option, refer to the CLI References for LEIP Compile or LEIP Optimize.
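For instance, assuming leip optimize accepts the same --input_path and --output_path options as leip compile (see the CLI Reference for the exact options), a single call both optimizes and compiles the model:
$ leip optimize --input_path path/to/model \
--output_path path/to/output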
Layout
You can specify the desired layout (NCHW or NHWC) using --layout, but only for non-CUDA targets. The default value is NCHW.
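For example, using only the options shown above, the following compiles for the default llvm CPU target and requests the NHWC layout:
$ leip compile --input_path path/to/model \
--output_path path/to/output \
--layout NHWC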
Target
You can use the --target parameter to specify the desired target hardware CodeGen; it defaults to llvm. For CPU targets, you can specify the architecture through the -mcpu=architecture or -device=architecture flags. You can also pass any other architecture-specific flags supported by llvm, because we use llvm as the low-level optimizing compiler. Although only the x86 and ARM architectures have been tested by the LEIP tool flow, any other architecture supported by llvm can also be targeted.
Optimized compilation is possible for hardware-based accelerators by specifying the parameter as --target family[:model]. Currently, the cuda family is supported.
As an example, this is how a model is compiled and optimized for an NVIDIA 2080 Ti:
$ leip compile --input_path path/to/model \
--output_path path/to/output \
--target cuda:2080ti
The following example shows how to target an x86 Skylake:
$ leip compile --input_path path/to/model \
--output_path path/to/output \
--target llvm -mcpu=skylake
Finally, this example shows how to target the ARM processor on a Raspberry Pi 4 system:
$ leip compile --input_path path/to/model \
--output_path path/to/output \
--target llvm -device=arm_cpu -model=bcm2837 -mtriple=aarch64-linux-gnu -mattr=+neon -mcpu=cortex-a72
Optimization
You can specify different kinds of optimization by using the --optimization parameter (or --compile_optimization in leip optimize) more than once. The following optimizations are supported:
Optimization | Description
---|---
kernel | Specifies a level of kernel optimization between 1 and 4 (the higher the better). The default is 3. Note that a layout conversion is not possible for levels below 3.
cuda | Specifies the CUDA optimization for CUDA targets only. By default, it is enabled for CUDA targets and disabled otherwise.
graph | Specifies how many iterations to use for the graph optimization algorithm, as shown in the full example below.
Full example:
$ leip compile --input_path path/to/model \
--output_path path/to/output \
--optimization category:kernel,level:4 \
--optimization category:graph,iterations:2000
Running Inference
The --inference_context family parameter must be set when running inference on a model that was optimized for a hardware accelerator.
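As a hedged sketch (assuming inference is run through leip evaluate and that it accepts --input_path; consult the CLI Reference for the exact command and options), running inference on a CUDA-optimized model could look like:
$ leip evaluate --input_path path/to/output \
--inference_context cuda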