LEIP Compile takes a computational graph (e.g., a quantized model from the quantization phase of LEIP Optimize) as input and produces a binary representation (e.g., an LRE Object) for the target specified by the user. The binary LRE Object is a shared object file that can be loaded into a small runtime for execution.

The runtime is created through a Python script that can perform any pre- or post-processing of the data and can include any other components of the application. It is also possible to bundle the runtime and the neural network model into a single binary.

LEIP Compile can perform several optimizations by manipulating the computational graph that represents the neural network. However, exploring the entire search space of optimizations can be computationally expensive, so by default only standard optimizations are performed.

Although LEIP Compile is capable of generating binaries for multiple processors, the ones fully supported at this time are those based on the x86, NVIDIA and ARM architectures.

The compiler can generate binaries that use 32-bit floating point, 8-bit integer, and mixed data types. It selects the data type best suited to the hardware capabilities of the target architecture.

CLI Usage

The basic command is:

$ leip compile --input_path path/to/model \
               --output_path path/to/output

LEIP Optimize also compiles models; no additional options are required for that step.

For a detailed explanation of each option, see the CLI Reference for LEIP Compile or LEIP Optimize.


You can specify the desired layout (NCHW or NHWC) using --layout, but only for non-CUDA targets. The default value is NCHW.
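As an illustration, the sketch below requests the NHWC layout using only the flags described so far. The paths are placeholders, and the LEIP variable and the compile_nhwc function are hypothetical helpers (not part of the LEIP CLI) that print the command by default so it can be reviewed before it is run.

```shell
# Prints the command by default; set LEIP=leip in the environment to execute it.
LEIP="${LEIP:-echo leip}"

compile_nhwc() {
    # The default target (llvm) is a non-CUDA target, so --layout applies.
    $LEIP compile --input_path path/to/model \
                  --output_path path/to/output \
                  --layout NHWC
}

compile_nhwc
```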


You can use the --target parameter to specify the code generation target, which defaults to llvm. For CPU targets, you specify the architecture through the -mcpu=architecture or -device=architecture flags. Because LLVM is used as the low-level optimizing compiler, you can also pass any other architecture-specific flags that LLVM supports. Although only the x86 and ARM architectures have been tested with the LEIP tool flow, any other architecture supported by LLVM can also be targeted.

Optimized compilation for hardware-based accelerators is available by passing the parameter as --target family[:model]. Currently, the cuda family is supported.

As an example, this is how a model is compiled and optimized for an NVIDIA 2080 Ti:

$ leip compile --input_path path/to/model \
               --output_path path/to/output \
               --target cuda:2080ti

The following example shows how to target an x86 Skylake processor:

$ leip compile --input_path path/to/model \
               --output_path path/to/output \
               --target llvm -mcpu=skylake

Finally, this example shows how to target the ARM processor on a Raspberry Pi 4 system:

$ leip compile --input_path path/to/model \
               --output_path path/to/output \
               --target llvm -device=arm_cpu -model=bcm2711 -mtriple=aarch64-linux-gnu -mattr=+neon -mcpu=cortex-a72
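Because the target is only a command-line parameter, the same model can be compiled for more than one target by repeating the command with a different output path. The following is a minimal sketch of that idea, not an official workflow: the compile_for helper, the LEIP variable, and the per-target output subfolder names are hypothetical, and the commands are printed rather than executed unless LEIP=leip is set.

```shell
# Prints each command by default; set LEIP=leip in the environment to execute them.
LEIP="${LEIP:-echo leip}"

compile_for() {
    # $1 = output subfolder name (hypothetical); remaining arguments = target spec
    out="path/to/output/$1"
    shift
    $LEIP compile --input_path path/to/model \
                  --output_path "$out" \
                  --target "$@"
}

compile_for cuda_2080ti cuda:2080ti          # hardware accelerator: family[:model]
compile_for x86_skylake llvm -mcpu=skylake   # CPU target with an architecture flag
```

Keeping a separate output path per target prevents one compilation from overwriting the artifacts of another.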


You can request several kinds of optimization by passing the --optimization parameter (--compile_optimization in leip optimize) more than once. The following optimizations are supported:



--optimization category:kernel,level:<1-4>

Specifies the level of kernel optimization, from 1 to 4 (higher is better). The default is 3. Note that layout conversion is not possible at levels below 3.

--optimization category:cuda,enabled:true|false

Enables or disables the cuda optimization, which applies to cuda targets only. By default, it is enabled for cuda targets and disabled otherwise.

--optimization category:graph,iterations:<0-30000>

Specifies how many iterations to use for the graph optimization algorithm. This can be used as follows:

  • Not specifying it: This is the default and has the shortest compilation time, but does not perform extensive code optimizations.

  • Giving a positive number of iterations (first time): This creates a JSON log file in the output_path/optimization_artifacts subfolder, runs that number of iterations, saves the results in that file, and uses the best schedule found.

  • Giving a positive number of iterations (after a previous graph optimization): This loads the JSON log file left in the output_path/optimization_artifacts subfolder by the previous optimization and runs the specified number of additional iterations.

  • Giving 0 as the number of iterations (after a previous graph optimization): This loads the JSON log file left in the output_path/optimization_artifacts subfolder by the previous optimization and uses it to determine the best schedule, without running any additional iterations.

Full example:

$ leip compile --input_path path/to/model \
               --output_path path/to/output \
               --optimization category:kernel,level:4 \
               --optimization category:graph,iterations:2000
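The incremental use of category:graph described above can be sketched as three consecutive runs against the same output path. This is an illustrative sequence rather than a recommended tuning budget: the iteration counts are arbitrary, and the graph_optimize helper and the LEIP variable are hypothetical conveniences that print each command unless LEIP=leip is set.

```shell
# Prints each command by default; set LEIP=leip in the environment to execute them.
LEIP="${LEIP:-echo leip}"

graph_optimize() {
    # $1 = number of graph optimization iterations for this run (0-30000)
    $LEIP compile --input_path path/to/model \
                  --output_path path/to/output \
                  --optimization "category:graph,iterations:$1"
}

graph_optimize 2000   # first run: creates the JSON log and runs 2000 iterations
graph_optimize 1000   # same output path: loads the log and runs 1000 more
graph_optimize 0      # reuses the log to pick the best schedule, no new iterations
```

All three runs must share the same output path, because the log is read from its optimization_artifacts subfolder.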

Running Inference

When running inference on a model optimized for a hardware accelerator, the --inference_context family parameter must be set, where family is the accelerator family used at compile time (for example, cuda).