Classifier Recipe Step Two: Compile, Optimize, and Package
In Step One, a LEIP Classifier Recipe was used to train and export a traced model. The second step of a recipe compiles and compresses the exported model to produce a binary artifact optimized for the architecture of your target device. The pipeline also packages the model into a Latent AI Runtime Environment (LRE) object. You can then use this artifact for evaluation on your target device.
Locate the model you exported as part of Step One. You will find the model in a sub-directory under /latentai/artifacts/export. If you used the default classifier recipe with the timm:gernet_m backbone, the model that you trained and exported will be found at:
/latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt
This is the full path that will be used as part of the leip pipeline command in the following examples.
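If you used a different backbone or are unsure of the exact file name, a short helper can list the candidates for you. The following is a minimal sketch, assuming the exported traced models carry a .pt extension as in the path above; the function name find_exported_models is hypothetical and is not part of the SDK.

```shell
# List traced models (*.pt) under an export directory.
# The default directory is the one named in the text; pass another
# directory as the first argument if your artifacts live elsewhere.
find_exported_models() {
    export_dir="${1:-/latentai/artifacts/export}"
    if [ ! -d "$export_dir" ]; then
        echo "export directory not found: $export_dir" >&2
        return 1
    fi
    # Sort so the listing is stable between runs.
    find "$export_dir" -type f -name '*.pt' | sort
}

# Example: print the candidates. A missing directory only prints a warning.
find_exported_models /latentai/artifacts/export || true
```

Copy the path you want from the listing and use it as the input path in the commands that follow.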
The next steps depend on the hardware target you would like to evaluate on. We recommend first targeting the localhost and evaluating inside the SDK docker container. The leip pipeline command compiles and optimizes your model using a pipeline configuration file that matches your hardware target. You may need to modify this file, depending on your hardware profile. The default pipeline files are in the recipes/classifiers directory:
Hardware Target: Pipeline Configuration File
x86_64 (no GPU): pipeline_x86_64.yaml
x86_64 with Nvidia GPU: pipeline_x86_64_cuda.yaml
Arm72 (e.g. Raspberry Pi 4): pipeline_aarch64.yaml
Arm72 with Nvidia GPU (e.g. AGX, NX): pipeline_aarch64_cuda.yaml
If you are targeting a device with an Nvidia GPU, pay particular attention to the line: target: cuda. Depending on your target GPU, you may need to append a specific target architecture -arch flag, or you may find that a target architecture flag provides better optimization. If you leave off the architecture flag, the resulting binaries will be less optimized but should work with most Nvidia GPU hardware.
Some compiled models will not run with the wrong -arch flag, while other models may run, but less optimally. Note that the ARM/GPU pipeline file has -arch=sm_72 by default, which optimizes for the Nvidia Xavier AGX and NX. If you are targeting a different GPU, you will need to edit the file and replace each instance of -arch=sm_xx with the correct code. If you are unsure of which code to use, you may find this blog post helpful.
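The replacement described above can be scripted rather than done by hand. The following is a minimal sketch, assuming the flag appears in the pipeline file literally as -arch=sm_ followed by digits; the helper name set_cuda_arch and the destination file name in the usage comment are hypothetical. It writes a modified copy and leaves the default pipeline file untouched.

```shell
# Write a copy of a pipeline file with every -arch=sm_NN replaced.
# Usage: set_cuda_arch <source.yaml> <dest.yaml> <sm digits>
set_cuda_arch() {
    src="$1"
    dest="$2"
    sm="$3"
    # Reject anything that is not a plain number, such as "xx" or "".
    case "$sm" in
        ''|*[!0-9]*)
            echo "sm code must be digits, e.g. 87" >&2
            return 1
            ;;
    esac
    sed -E "s/-arch=sm_[0-9]+/-arch=sm_${sm}/g" "$src" > "$dest"
}

# Example (not run here): target a GPU with compute capability 8.7,
# such as the Jetson Orin modules.
# set_cuda_arch recipes/classifiers/pipeline_aarch64_cuda.yaml \
#     pipeline_aarch64_cuda_sm87.yaml 87
```

Pass the resulting copy to leip pipeline as the configuration file, and check the copy afterwards to confirm that every instance of the flag was updated.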
If you plan on evaluating several models on different devices, you will find it helpful to set a different target directory for each permutation. The convention that we recommend is to include the target in the output path, as in the following example, where we use x86_64, x86_64_cuda, etc. for the different output paths. We will use timm:gernet_m in our example as we did in Step One:
# Store the location of the exported model
export CMODEL=/latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt

# Compile / Optimize for x86 (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/x86_64 \
  --config_path recipes/classifiers/pipeline_x86_64.yaml

# Compile / Optimize for x86 with CUDA
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/x86_64_cuda \
  --config_path recipes/classifiers/pipeline_x86_64_cuda.yaml

# Compile / Optimize for ARM (no GPU)
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/aarch64 \
  --config_path recipes/classifiers/pipeline_aarch64.yaml

# Compile / Optimize for ARM with CUDA
leip pipeline \
  --input_path $CMODEL \
  --output_path workspace/output/timm-gernet_m/aarch64_cuda \
  --config_path recipes/classifiers/pipeline_aarch64_cuda.yaml
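Because the output directories and pipeline files share the same target suffix, the four commands can also be generated in a loop. The following is a sketch only: it uses the same three options shown above, and the RUN and OUT_ROOT variables and the compile_all_targets name are assumptions introduced for illustration. By default it prints the commands instead of running them, so you can review them first.

```shell
# Defaults mirror the example above; override them in the environment.
CMODEL="${CMODEL:-/latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt}"
OUT_ROOT="${OUT_ROOT:-workspace/output/timm-gernet_m}"

# Print (or, with RUN=1, execute) one leip pipeline command per target.
compile_all_targets() {
    for target in x86_64 x86_64_cuda aarch64 aarch64_cuda; do
        # Build the command as positional parameters so paths with
        # spaces stay intact when the command is executed.
        set -- leip pipeline \
            --input_path "$CMODEL" \
            --output_path "$OUT_ROOT/$target" \
            --config_path "recipes/classifiers/pipeline_${target}.yaml"
        if [ "${RUN:-0}" = "1" ]; then
            "$@" || return 1
        else
            echo "$*"
        fi
    done
}

# Dry run: only prints the four commands.
compile_all_targets
```

Set RUN=1 in the environment once the printed commands look right. Trim the target list in the loop to the devices you actually plan to evaluate.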
After you have run the pipeline, look in the directory you specified with output_path above. You will find that several subdirectories have been created:
Float32-compile: The model compiled for Float32
Float32-package: The packaged LRE object for the Float32 compiled model
Int8-optimize: The model optimized and compiled for Int8
Int8-package: The packaged LRE object for the Int8 optimized model
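A quick way to confirm that a pipeline run produced all four subdirectories is to test for each one by name. The following is a minimal sketch; the function name check_pipeline_outputs is hypothetical, and it only checks that the directories exist, not that their contents are valid.

```shell
# Report which of the expected subdirectories exist under an output path.
# Returns 0 if all four are present, 1 otherwise.
check_pipeline_outputs() {
    out_dir="$1"
    missing=0
    for sub in Float32-compile Float32-package Int8-optimize Int8-package; do
        if [ -d "$out_dir/$sub" ]; then
            echo "ok       $sub"
        else
            echo "missing  $sub"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
# check_pipeline_outputs workspace/output/timm-gernet_m/x86_64
```

The non-zero return value makes the check usable in scripts, for example to skip evaluation for a target whose pipeline run did not complete.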
To learn more about the LRE object, see the SDK documentation for LEIP Package.
In Step Three, you will evaluate your optimized model on the target device.