
Classifier Recipe Step Two: Compile, Optimize, and Package

In Step One, you used a LEIP Classifier Recipe to train and export a traced model. In this second step, the recipe compiles and compresses the exported model to produce a binary artifact optimized for the architecture of your target device. The LEIP pipeline also packages the model into a Latent AI Runtime Environment (LRE) object, which you can then use for evaluation on your target device.

Locate the model you exported as part of Step One. You will find the model in a subdirectory under /latentai/artifacts/export. If you used the default classifier recipe with the timm:gernet_m backbone, the model that you trained and exported will be found at:

/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt

This is the full path that will be used as part of the leip pipeline command in the following examples.
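If you are unsure of the exact subdirectory name, you can list the export directory from inside the SDK container. The paths below assume the default timm:gernet_m recipe from Step One; your subdirectory name will differ if you used another backbone or input size.

# List the export directory and confirm the traced model from Step One is present
ls /latentai/artifacts/export/
ls -lh /latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt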

The next steps depend on the hardware target you will use for evaluation. We recommend first targeting localhost and evaluating inside the SDK Docker container. The leip pipeline command compiles and optimizes your model using a pipeline configuration file that matches your hardware target; you may need to modify this file for your hardware profile. The default pipeline files are in the /latentai/recipes/classifiers/ directory:

Target architecture                          Pipeline Configuration File
x86_64                                       pipeline_x86_64.yaml
x86_64 with NVIDIA GPU                       pipeline_x86_64_cuda.yaml
AArch64 (e.g. Raspberry Pi 4)                pipeline_aarch64.yaml
AArch64 with NVIDIA GPU (e.g. AGX, NX)       pipeline_aarch64_cuda.yaml
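You can confirm which pipeline configuration files are available in your container with a quick listing (directory path as given above):

# List the default classifier pipeline configurations
ls /latentai/recipes/classifiers/pipeline_*.yaml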

Pay particular attention to the line target: cuda if you are targeting a device with an NVIDIA GPU. Depending on your GPU, you may need to append a specific -arch target architecture flag, and a correct architecture flag can yield better optimization. If you leave off the architecture flag, the resulting binaries will be less optimized but should work on most NVIDIA GPU hardware.
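A quick way to inspect the current target setting is to search the CUDA pipeline files for their target lines. This is only an inspection step; the exact lines printed will depend on your SDK version.

# Show the compilation target defined in each CUDA pipeline configuration
grep -n "target" /latentai/recipes/classifiers/pipeline_x86_64_cuda.yaml \
    /latentai/recipes/classifiers/pipeline_aarch64_cuda.yaml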

Some compiled models will not run at all with the wrong -arch flag, while others may run but less optimally. Note that the ARM/GPU pipeline file has -arch=sm_72 by default, which optimizes for the NVIDIA Xavier AGX and NX. If you are targeting a different GPU, you will need to edit the file and replace each instance of -arch=sm_xx with the correct compute capability code for your GPU. You may find this blog post helpful if you are unsure which code to use.
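As a sketch of that edit, the following replaces the default compute capability in place. The value sm_87 is only an illustrative assumption (it corresponds to the Jetson AGX Orin); substitute the code that matches your GPU.

# Example only: retarget the ARM/GPU pipeline from Xavier (sm_72) to another GPU
# Replace sm_87 with the compute capability code for your device
sed -i 's/-arch=sm_72/-arch=sm_87/g' /latentai/recipes/classifiers/pipeline_aarch64_cuda.yaml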

If you plan to evaluate several models on different devices, it helps to set a different output directory for each permutation. We recommend including the target in the output path, as in the following example (where we use x86_64, x86_64_cuda, etc. for the different output paths). We will use timm:gernet_m in our example, as in Step One:

# Store the location of the exported model
export CMODEL=/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt

# Compile / Optimize for x86 (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/x86_64 \
  --config_path recipes/classifiers/pipeline_x86_64.yaml
  
# Compile / Optimize for x86 with CUDA
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/x86_64_cuda \
  --config_path recipes/classifiers/pipeline_x86_64_cuda.yaml
  
# Compile / Optimize for ARM (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/aarch64 \
  --config_path recipes/classifiers/pipeline_aarch64.yaml
  
# Compile / Optimize for ARM with CUDA
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/aarch64_cuda \
  --config_path recipes/classifiers/pipeline_aarch64_cuda.yaml

After you have run the pipeline, look in the directory specified by output_path above. You will find that several subdirectories have been created (see the example listing after this list):

  • Float32-compile: The model compiled for Float32

  • Float32-package: The packaged LRE object for the Float32 compiled model

  • INT8-optimize: The model optimized and compiled for INT8

  • INT8-package: The packaged LRE object for the INT8 optimized model
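For example, after the x86_64 run above you can verify the artifacts with a listing (adjust the path to match the output_path you used):

# Inspect the pipeline output for the x86_64 target
ls workspace/output/timm-gernet_m/x86_64
# Expect: Float32-compile  Float32-package  INT8-optimize  INT8-package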

Refer to the SDK documentation for LEIP Package to learn more about the LRE object.

In Step Three, you will evaluate your optimized model on the target device.
