Detector Recipe Step Two: Compile and Optimize
In Step One, LEIP Recipes were used to export a pre-trained, traced model. Step Two compiles and compresses this model in the same Docker container, producing a binary artifact optimized for the architecture of your target device. The pipeline also packages the model into a Latent AI Runtime Environment (LRE) object. You can then use this artifact for evaluation on your target device.
Step Two of the recipe uses the LEIP Pipeline command with a predefined build configuration provided in /latentai/recipes to ensure the best compilation results.
Locate the model you exported as part of Step One. You will find the model in a sub-directory under /latentai/artifacts/export. If you used the large size as in our example, the model that you exported will be found at:
/latentai/artifacts/export/leip_yolov5l_batch1_640-640/traced_model.pt
This is the full path that will be used as part of the leip pipeline command in the following examples.
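Before running the pipeline, it can help to confirm that the exported model is where you expect it. The following is a minimal sketch for a POSIX shell; check_model is a hypothetical helper name chosen for illustration and is not part of LEIP.

```shell
# check_model is a hypothetical helper, not a LEIP command.
# Usage: check_model <path to traced_model.pt>
# Example:
#   check_model /latentai/artifacts/export/leip_yolov5l_batch1_640-640/traced_model.pt
# Succeeds only if the path is a regular, non-empty file.
check_model() {
    if [ -f "$1" ] && [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}
```

If the check fails, revisit Step One and confirm the export directory name matches the model size you chose.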
The next steps will depend on the hardware target you would like to evaluate on. We recommend first targeting the localhost and evaluating inside the SDK Docker container. The leip pipeline command is used to compile and optimize your model using a pipeline configuration file matching your hardware target. You may need to modify this file, depending on your hardware profile. The default pipeline files are in the /latentai/recipes/yolov5/ directory:
Target architecture | Pipeline Configuration File |
---|---|
x86_64 | pipeline_x86_64.yaml |
x86_64 with NVIDIA GPU | pipeline_x86_64_cuda.yaml |
Arm72 (e.g., Raspberry Pi 4) | pipeline_aarch64.yaml |
Arm72 with NVIDIA GPU (e.g., AGX, NX) | pipeline_aarch64_cuda.yaml |
Pay particular attention to the line target: cuda if you are targeting a device with an NVIDIA GPU. You may need to append a target architecture flag (-arch) for your GPU, and even where the flag is not required, it may yield better optimization. If you omit the flag, the resulting binaries will be less optimized but should work with most NVIDIA GPU hardware.
Some compiled models will not run if the -arch flag is wrong, while others will run but with lower performance. Note that the ARM/GPU pipeline file uses -arch=sm_72 by default, which optimizes for the NVIDIA Xavier AGX and NX. If you are targeting a different GPU, edit the file and replace each instance of -arch=sm_xx with the correct code. You may find this blog post helpful if you are unsure of which code to use.
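Replacing every instance by hand is error-prone, so one option is to script the substitution on a copy of the pipeline file. The sketch below assumes the flag always appears in the form -arch=sm_NN, as described above; set_cuda_arch and the destination path are hypothetical names used for illustration, and the code you pass (sm_87 in the example) should be confirmed against NVIDIA's documentation for your GPU.

```shell
# set_cuda_arch is a hypothetical helper, not part of LEIP.
# Usage: set_cuda_arch <source.yaml> <dest.yaml> <code such as sm_72>
# Example (destination path is an assumption):
#   set_cuda_arch /latentai/recipes/yolov5/pipeline_aarch64_cuda.yaml \
#       workspace/pipeline_aarch64_cuda_custom.yaml sm_87
# Writes a copy with every -arch=sm_NN rewritten; the source is not modified.
set_cuda_arch() {
    src=$1 dst=$2 arch=$3
    case $arch in
        sm_[0-9]*) ;;
        *) echo "expected a code like sm_72, got: $arch" >&2; return 1 ;;
    esac
    sed "s/-arch=sm_[0-9][0-9]*/-arch=$arch/g" "$src" > "$dst" || return 1
    # Print the lines that carry the flag so they can be reviewed.
    # grep exits non-zero if the copy contains no -arch flag at all.
    grep -n -e '-arch=' "$dst"
}
```

Pass the edited copy to --config_path instead of the default file, which keeps the original recipe available for comparison.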
You will find it helpful to set different target directories for each permutation if you plan on evaluating several models on different devices. The convention that we recommend is to add the target in the output path, such as in the following example, where we use x86_64, x86_64_cuda, etc., for the different output paths. We will use yolov5l in our example:
# Store the location of the exported model
export MODEL=/latentai/artifacts/export/leip_yolov5l_batch1_640-640/traced_model.pt
export MNAME=yolov5l
# Compile / Optimize for x86 (no GPU)
leip pipeline \
--input_path $MODEL \
--output_path workspace/output/$MNAME/x86_64 \
--config_path /latentai/recipes/yolov5/pipeline_x86_64.yaml
# Compile / Optimize for x86 with CUDA
leip pipeline \
--input_path $MODEL \
--output_path workspace/output/$MNAME/x86_64_cuda \
--config_path /latentai/recipes/yolov5/pipeline_x86_64_cuda.yaml
# Compile / Optimize for ARM (no GPU)
leip pipeline \
--input_path $MODEL \
--output_path workspace/output/$MNAME/aarch64 \
--config_path /latentai/recipes/yolov5/pipeline_aarch64.yaml
# Compile / Optimize for ARM with CUDA
leip pipeline \
--input_path $MODEL \
--output_path workspace/output/$MNAME/aarch64_cuda \
--config_path /latentai/recipes/yolov5/pipeline_aarch64_cuda.yaml
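Because the four commands differ only in the target name, they can also be driven from a loop. This is a minimal sketch that reuses the same three flags and the same output path convention as the commands above; compile_targets and the LEIP and RECIPES override variables are hypothetical names introduced here, not part of the SDK.

```shell
# compile_targets is a hypothetical wrapper, not part of LEIP.
# Usage: compile_targets <model path> <model name> <target>...
# Example:
#   compile_targets "$MODEL" "$MNAME" x86_64 x86_64_cuda
# Each target must have a matching pipeline_<target>.yaml in RECIPES.
# Set LEIP="echo leip" to print the commands instead of running them.
compile_targets() {
    model=$1 name=$2
    shift 2
    for target in "$@"; do
        ${LEIP:-leip} pipeline \
            --input_path "$model" \
            --output_path "workspace/output/$name/$target" \
            --config_path "${RECIPES:-/latentai/recipes/yolov5}/pipeline_$target.yaml" \
            || return 1
    done
}
```

The loop stops at the first target that fails, so a broken configuration is not hidden by later successful runs.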
Look in the directory specified in the output_path above after you have run the pipeline. You will find several subdirectories have been created:
Float32-compile: The model compiled for Float32
Float32-package: The packaged LRE object for the Float32 compiled model
Int8-optimize: The model optimized and compiled for Int8
Int8-package: The packaged LRE object for the Int8 optimized model
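A quick way to confirm that a pipeline run produced all four subdirectories is to check for them by name. The sketch below assumes the directory names exactly as listed above; check_outputs is a hypothetical helper, not part of LEIP.

```shell
# check_outputs is a hypothetical helper, not part of LEIP.
# Usage: check_outputs <output_path>
# Example:
#   check_outputs workspace/output/yolov5l/x86_64
# Reports each expected subdirectory and fails if any is missing.
check_outputs() {
    status=0
    for sub in Float32-compile Float32-package Int8-optimize Int8-package; do
        if [ -d "$1/$sub" ]; then
            echo "ok       $sub"
        else
            echo "missing  $sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

A missing subdirectory usually means the corresponding pipeline stage did not complete, so the pipeline output is the next place to look.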
Refer to the SDK documentation for LEIP Package to learn more about the LRE object.
Next, you will evaluate your optimized model on the target device in Step Three.