
Detector Recipe Step Two: Compile and Optimize

In Step One, LEIP Recipes were used to export a pre-trained, traced model. Step Two compiles and compresses this model in the same Docker container, producing a binary artifact optimized for the architecture of your target device. The pipeline also packages the model into a Latent AI Runtime Environment (LRE) object. You can then use this artifact for evaluation on your target device.

Step Two of the recipe uses the LEIP Pipeline command with a predefined build configuration provided in /latentai/recipes to ensure the best compilation results.
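In its general form, the command takes the exported model, an output directory, and a pipeline configuration file; the concrete invocations for each target are shown later in this step:

leip pipeline \
  --input_path <path to the exported traced_model.pt> \
  --output_path <output directory> \
  --config_path <pipeline configuration .yaml>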

Locate the model you exported as part of Step One. You will find the model in a sub-directory under /latentai/artifacts/export. If you used the large size as in our example, the model that you exported will be found at:

/latentai/artifacts/export/leip_yolov5l_batch1_640-640/traced_model.pt

This is the full path that will be used as part of the leip pipeline command in the following examples.
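If you exported a different model size or input resolution, the sub-directory name will differ accordingly; listing the export directory shows what is available:

# Each export lands in a sub-directory named after the model,
# batch size, and input resolution
ls /latentai/artifacts/export/
ls /latentai/artifacts/export/leip_yolov5l_batch1_640-640/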

The next steps depend on the hardware target you would like to evaluate on. We recommend first targeting localhost and evaluating inside the SDK Docker container. The leip pipeline command compiles and optimizes your model using a pipeline configuration file matching your hardware target. You may need to modify this file depending on your hardware profile. The default pipeline files are in the /latentai/recipes/yolov5/ directory:

Target architecture                          Pipeline Configuration File
-------------------------------------------  ---------------------------
x86_64                                       pipeline_x86_64.yaml
x86_64 with NVIDIA GPU                       pipeline_x86_64_cuda.yaml
AArch64 (e.g., Raspberry Pi 4)               pipeline_aarch64.yaml
AArch64 with NVIDIA GPU (e.g., AGX, NX)      pipeline_aarch64_cuda.yaml
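You can list that directory to confirm which pipeline files your SDK ships with, and peek at a CUDA pipeline to find the target: cuda line discussed below (exact file contents may vary between SDK versions):

# Pipeline configuration files shipped with the recipe
ls /latentai/recipes/yolov5/

# The CUDA pipelines include a `target: cuda` entry
grep -n 'target' /latentai/recipes/yolov5/pipeline_x86_64_cuda.yaml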

Pay particular attention to the line target: cuda if you are targeting a device with an NVIDIA GPU. Depending on your target GPU, you may need to append a specific -arch target-architecture flag, or you may find that adding one yields better optimization. If you leave the architecture flag off, the resulting binaries will be less optimized but should work on most NVIDIA GPU hardware.

Some models compiled with the wrong -arch flag will not run at all, while others may run but perform suboptimally. Note that the ARM/GPU pipeline file sets -arch=sm_72 by default, which optimizes for the NVIDIA Xavier AGX and NX. If you are targeting a different GPU, edit the file and replace each instance of -arch=sm_xx with the correct code. NVIDIA's published list of CUDA compute capabilities can help if you are unsure which code to use.
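As a sketch, if you were targeting an Orin-class GPU (compute capability sm_87; this code is an illustrative assumption, so substitute the one for your own hardware), you could update the pipeline file in place:

# Illustrative only: sm_87 is an assumed example for an Orin-class GPU;
# replace it with the compute capability code for your device
sed -i 's/-arch=sm_72/-arch=sm_87/g' /latentai/recipes/yolov5/pipeline_aarch64_cuda.yaml

# Verify the substitution
grep -n 'sm_' /latentai/recipes/yolov5/pipeline_aarch64_cuda.yaml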

If you plan to evaluate several models on different devices, you will find it helpful to set a different target directory for each permutation. The convention we recommend is to include the target in the output path, as in the following examples, which use x86_64, x86_64_cuda, and so on for the different output paths. We will use yolov5l as the model name:

# Store the location of the exported model
export MODEL=/latentai/artifacts/export/leip_yolov5l_batch1_640-640/traced_model.pt
export MNAME=yolov5l

# Compile / Optimize for x86 (no GPU)
leip pipeline \
  --input_path $MODEL \
  --output_path workspace/output/$MNAME/x86_64 \
  --config_path /latentai/recipes/yolov5/pipeline_x86_64.yaml

# Compile / Optimize for x86 with CUDA
leip pipeline \
  --input_path $MODEL \
  --output_path workspace/output/$MNAME/x86_64_cuda \
  --config_path /latentai/recipes/yolov5/pipeline_x86_64_cuda.yaml

# Compile / Optimize for ARM (no GPU)
leip pipeline \
  --input_path $MODEL \
  --output_path workspace/output/$MNAME/aarch64 \
  --config_path /latentai/recipes/yolov5/pipeline_aarch64.yaml

# Compile / Optimize for ARM with CUDA
leip pipeline \
  --input_path $MODEL \
  --output_path workspace/output/$MNAME/aarch64_cuda \
  --config_path /latentai/recipes/yolov5/pipeline_aarch64_cuda.yaml

After you have run the pipeline, look in the directory specified by output_path above. You will find that several subdirectories have been created (a quick way to confirm them is sketched after this list):

  • Float32-compile: The model compiled for Float32

  • Float32-package: The packaged LRE object for the Float32 compiled model

  • Int8-optimize: The model optimized and compiled for Int8

  • Int8-package: The packaged LRE object for the Int8 optimized model
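For example, after compiling for the x86_64 target as above, a quick listing confirms the artifacts were produced (the exact contents may differ slightly between SDK versions):

ls workspace/output/yolov5l/x86_64
# Expected subdirectories:
#   Float32-compile  Float32-package  Int8-optimize  Int8-package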

Refer to the SDK documentation for LEIP Package to learn more about the LRE object.

Next you will evaluate your optimized model on the target device in Step Three.
