Classifier Recipe Step Two: Compile, Optimize, and Package
In Step One, you used a LEIP Classifier Recipe to train and export a traced model. The second step of the recipe compiles and compresses that exported model into a binary artifact optimized for the architecture of your target device. The LEIP pipeline also packages the model into a Latent AI Runtime Environment (LRE) object, which you can then use for evaluation on your target device.
Locate the model you exported as part of Step One. You will find it in a subdirectory under `/latentai/artifacts/export`. If you used the default classifier recipe with the `timm:gernet_m` backbone, the trained and exported model will be at:

`/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt`

This is the full path used in the `leip pipeline` commands in the examples below.
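Before running the pipeline, it is worth confirming that the traced model is where you expect it. A quick check from inside the SDK container (assuming the default export location shown above):

```bash
# List the export directory and confirm the traced model exists
ls /latentai/artifacts/export/
ls -lh /latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt
```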
The next steps depend on the hardware target you will be using for evaluation. We recommend first targeting localhost and evaluating inside the SDK Docker container. The `leip pipeline` command compiles and optimizes your model using a pipeline configuration file matching your hardware target. You may need to modify this file depending on your hardware profile. The default pipeline files are in the `/latentai/recipes/classifiers/` directory:
| Target architecture | Pipeline Configuration File |
|---|---|
| x86_64 | pipeline_x86_64.yaml |
| x86_64 with NVIDIA GPU | pipeline_x86_64_cuda.yaml |
| AArch64 (e.g., Raspberry Pi 4) | pipeline_aarch64.yaml |
| AArch64 with NVIDIA GPU (e.g., Xavier AGX, NX) | pipeline_aarch64_cuda.yaml |
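You can list the available pipeline files from inside the SDK container to confirm the names on your installation (a quick check; the filenames shown are those from the table above):

```bash
# List the default classifier pipeline configuration files
ls /latentai/recipes/classifiers/pipeline_*.yaml
```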
Pay particular attention to the line `target: cuda` if you are targeting a device with an NVIDIA GPU. Depending on your GPU, you may need to append a specific `-arch` target architecture flag, or you may find that adding one provides better optimization. If you leave off the architecture flag, the resulting binaries will be less optimized but should work with most NVIDIA GPU hardware. Some compiled models will not run at all with the wrong `-arch` flag, while others may run, but suboptimally. Note that the ARM/GPU pipeline file has `-arch=sm_72` by default, which optimizes for the NVIDIA Xavier AGX and NX. If you are targeting a different GPU, you will need to edit the file and replace each instance of `-arch=sm_xx` with the correct code. You may find this blog post helpful if you are unsure of which code to use.
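As a sketch of that edit: on most discrete GPUs a recent NVIDIA driver can report the compute capability directly, and a simple `sed` substitution can update a copy of the pipeline file. The `sm_87` value below is only an example (Jetson Orin); substitute the code for your own GPU, and note that on Jetson modules `nvidia-smi` may not report this field, so check your module's documentation instead.

```bash
# Query the GPU's compute capability (requires a reasonably recent driver);
# e.g. "8.7" corresponds to -arch=sm_87
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Work on a copy of the default ARM/GPU pipeline file and swap the -arch value
cp /latentai/recipes/classifiers/pipeline_aarch64_cuda.yaml /latentai/workspace/my_pipeline_aarch64_cuda.yaml
sed -i 's/-arch=sm_72/-arch=sm_87/g' /latentai/workspace/my_pipeline_aarch64_cuda.yaml
```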
If you plan to evaluate several models on different devices, you will find it helpful to set a different output directory for each permutation. We recommend including the target in the output path, as in the following examples (where we use x86_64, x86_64_cuda, etc. for the different output paths). We will use `timm:gernet_m` in our example, as we did in Step One:
```bash
# Store the location of the exported model
export CMODEL=/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt

# Compile / Optimize for x86 (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path /latentai/workspace/output/timm-gernet_m/x86_64 \
  --config_path /latentai/recipes/classifiers/pipeline_x86_64.yaml

# Compile / Optimize for x86 with CUDA
leip pipeline \
  --input_path $CMODEL \
  --output_path /latentai/workspace/output/timm-gernet_m/x86_64_cuda \
  --config_path /latentai/recipes/classifiers/pipeline_x86_64_cuda.yaml

# Compile / Optimize for ARM (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path /latentai/workspace/output/timm-gernet_m/aarch64 \
  --config_path /latentai/recipes/classifiers/pipeline_aarch64.yaml

# Compile / Optimize for ARM with CUDA (Xavier, JetPack 4.6)
leip pipeline \
  --input_path $CMODEL \
  --output_path /latentai/workspace/output/timm-gernet_m/aarch64_cuda \
  --config_path /latentai/recipes/classifiers/pipeline_aarch64_cuda_xavier_jp4.yaml

# Compile / Optimize for ARM with CUDA (Orin, JetPack 5.x)
leip pipeline \
  --input_path $CMODEL \
  --output_path /latentai/workspace/output/timm-gernet_m/aarch64_cuda \
  --config_path /latentai/recipes/classifiers/pipeline_aarch64_cuda_orin_jp5.yaml

# If you are using Xavier with JetPack 5, you will need to create your own
# pipeline configuration file from these examples, or contact Latent AI for assistance.
```
After you have run the pipeline, look in the directory specified in the `output_path` shown above. You will find that several subdirectories have been created:

- `Float32-compile`: The model compiled for Float32
- `Float32-package`: The packaged LRE object for the Float32 compiled model
- `INT8-optimize`: The model optimized and compiled for INT8
- `INT8-package`: The packaged LRE object for the INT8 optimized model
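For example, after running the x86_64 pipeline shown above, listing the output directory should show those four subdirectories (the path is the one used in the example commands; exact contents may vary by SDK version):

```bash
ls /latentai/workspace/output/timm-gernet_m/x86_64
# Float32-compile  Float32-package  INT8-optimize  INT8-package
```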
Refer to the SDK documentation for LEIP Package to learn more about the LRE object.
In Step Three, you will evaluate your optimized model on the target device.