Classifier Recipes: Spicing Up Your Recipe
Now that you have completed an end-to-end workflow for training, exporting, optimizing, compiling, packaging, and evaluating the timm:gernet_m classifier recipe, there are several next steps you can take. These steps are explained in the following sections.
Explore Different Classifier Backbones
One key benefit of adopting LEIP recipes is that you can choose from a selection of models that are guaranteed to compile and optimize for a variety of target devices. You can try out different models on your hardware to see which ones best fit your speed and size requirements. From there, you can train a few of the selected models to see which one gives you the best accuracy. If you have a heterogeneous hardware environment, you can select the best model for your needs knowing that it is easy to compile and optimize that model across many different architectures. We have released 22 backbones that work across 4 flavors of hardware in our initial classifier recipe release. This release provided more than 88 classifier recipe variants.
To try out another model, return to Step One and repeat the steps, replacing timm:gernet_m with another backbone. Please note that while training with Open Images gives you a good sense of a model's speed performance, the small size of the dataset does not lend itself to a good accuracy metric. We have arranged our list of backbones from fastest at the top to slowest at the bottom. Accuracy will generally increase as you go down the list, but be mindful that accuracy results will vary by dataset.
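If you plan to repeat the training step for several backbones, a small loop can save some typing. The following is a minimal sketch, not part of the LEIP tooling: the helper names train_cmd and sweep and the RUN variable are hypothetical, and the backbone list contains only the two names used in this guide. It assumes the af training command takes the same form shown for the classifier recipe.

```shell
# Sketch: repeat the training step for several backbones.
# BACKBONES lists only names that appear in this guide; extend it with
# entries from the supported backbone list.
BACKBONES="timm:gernet_m timm:efficientnet_em"

# Print the af training command for one backbone (hypothetical helper).
train_cmd() {
  printf 'af --config-name classifier-recipe model.module.backbone=%s command=train\n' "$1"
}

# Dry run by default: only print the commands. Set RUN=1 to execute them.
sweep() {
  for b in $BACKBONES; do
    if [ "${RUN:-0}" = "1" ]; then
      af --config-name classifier-recipe "model.module.backbone=$b" command=train \
        || echo "training failed for $b" >&2
    else
      train_cmd "$b"
    fi
  done
}

sweep
```

Printing the commands first lets you review the sweep before committing to a long training run.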
Bring Your Own Data
You only need to set up your dataset once to evaluate any of the models or sweep against all of them. We support a simple, straightforward ImageFolder format for the classifier recipes. We also provide instructions for preparing your data to train and evaluate with LEIP recipes.
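Before pointing a recipe at your own data, it can help to check the folder layout. The sketch below assumes the conventional ImageFolder arrangement of one subdirectory per class directly under a dataset root (for example, root/class_name/image.jpg). The exact layout that LEIP recipes expect is described in the data preparation instructions, so treat this layout, the helper name summarize_imagefolder, and the list of image extensions as assumptions.

```shell
# Sketch: print each class directory under a dataset root with its image count.
# Assumes one subdirectory per class directly under the root (hypothetical layout).
summarize_imagefolder() {
  root="$1"
  for class_dir in "$root"/*/; do
    # Skip the unexpanded glob when the root has no subdirectories.
    [ -d "$class_dir" ] || continue
    count=$(find "$class_dir" -type f \
      \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) | wc -l | tr -d ' ')
    printf '%s %s\n' "$(basename "$class_dir")" "$count"
  done
}

# Usage: summarize_imagefolder /path/to/dataset/root
```

A class that reports zero images, or a class name you did not expect, usually points to a misplaced folder.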
Tweak the Machine Learning Recipe
The machine learning configuration for a recipe is meant to be a starting point. You may find that the default settings allow you to train a model accurately with your data. Or you may find that you want to adjust the learning parameters to suit your purpose. For example, you may want to experiment with different schedulers or learning rates to improve the accuracy of your model. Alternatively, you may wish to change settings to trade off accuracy for faster training. We will provide a quick example here that will allow you to speed up the model training of the classifier recipe. You may find this useful if you want to evaluate a number of backbones for speed on your target hardware before turning your focus to getting the most accuracy out of the chosen model. Refer to the available Advanced AF Options documentation if you want to optimize for accuracy.
We will now alter the classifier-recipe by overriding some default parameters. We will adjust the following parameters for faster training:
PyTorch Lightning Parameter | classifier-recipe | classifier-fast |
---|---|---|
gradient_clip_val | 0.5 | 0.1 |
gradient_clip_algorithm | default (norm) | value |
max_epochs | 30 | 4 |
We override these values to create a fast training run by passing the settings below on the af command line. The default classifier-recipe does not set trainer.gradient_clip_algorithm, so we need to prepend a + to add the parameter. In most cases, you will be overriding settings that already exist, so you should not use +.
af --config-name classifier-recipe \
model.module.backbone=timm:efficientnet_em \
trainer.gradient_clip_val=0.1 \
+trainer.gradient_clip_algorithm="value" \
trainer.max_epochs=4 \
command=train
For your convenience, we have provided a second configuration called classifier-fast for fast training using the above settings. You can perform the same training as above by using this configuration:
af --config-name classifier-fast \
model.module.backbone=timm:efficientnet_em \
command=train
The important takeaway is that the provided recipes are a starting point, and advanced users can modify the recipes to find parameters that better meet their requirements, be it higher accuracy or faster training times. Refer to the Advanced AF Options documentation for more information on modifying recipes.
Tweak the Build Recipe
Quantization Options
We provide default build recipes for common ARM and x86 targets, both with and without Nvidia GPU support. The GPU pipelines target both Float32 and Int8 with per-channel quantization. The default CPU pipelines also provide Float32, but their Int8 default uses per-tensor quantization. The per-tensor default allows for faster optimization and is supported across all of the provided Classifier Recipes. For some models, optimizing with per-channel quantization may significantly improve accuracy on CPU targets. For example, you can add the following to /latentai/recipes/classifier-recipe/pipeline_x86_64.yaml if you want to try symmetric per-channel quantization targeting x86:
  - name: Int8pc
    model:
      path: "$input_path"
      input_shapes: [ [ 1, 3, 224, 224 ] ]
      preprocessor: imagenet_torch_nchw
      postprocessor: top1
    optimize:
      compress:
        rep_dataset: /latentai/recipes/classifiers/rep_dataset.txt
        quantize_input: false
        quantize_output: false
        quantizer: symmetricpc
      compile:
        target: llvm
  - name: Int8pc
    model:
      path: $TASK_OUTPUT{Int8pc:optimize}
    package:
      format: python3.8
Note the name Int8pc in the above example. Because this task is added to the pipeline under a different name, the optimizer writes the per-tensor optimized output to the Int8 subdirectory and the per-channel optimized output to the Int8pc subdirectory. This allows you to easily compare the results. Simply replace Int8 with Int8pc in the leip evaluate instructions detailed in Step Three to evaluate the resulting per-channel output. Refer to the SDK documentation for LEIP Optimize for more information on these options.
Target Hardware Optimizations
Some provided build recipes are optimized for certain hardware targets. You may need to change the compiler flags if you are targeting alternative hardware. In some cases, incorrect compiler flags will cause suboptimal performance. In other cases, incorrect compiler flags will prevent the target system from running the compiled models. The default build recipe targets are listed below:
Build Recipe | CPU | GPU |
---|---|---|
pipeline_x86_64.yaml | Intel Skylake processor | None |
pipeline_x86_64_cuda.yaml | Intel Skylake processor | Generic cuda (no sm_xx flag) |
pipeline_aarch64.yaml | ARM7 Cortex (Raspberry Pi) | None |
pipeline_aarch64_cuda.yaml | ARM8 Carmel (Xavier AGX/NX) | Volta (-arch=sm_72) |
These default pipeline configuration files can be updated to match your hardware target by modifying the target: or target_host: fields. Refer to the LEIP SDK documentation for more information about the compiler settings and the LEIP Pipeline configuration files.
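As a starting point for choosing which default build recipe to edit, the table of default build recipes can be expressed as a small lookup. This is a sketch only: pick_pipeline is a hypothetical helper, and it selects a file name purely from the CPU architecture and whether an Nvidia GPU is present. It does not check that the compiler flags in that file match your exact processor or GPU generation.

```shell
# Sketch: map a CPU architecture and GPU presence to a default build recipe.
# pick_pipeline is a hypothetical helper; the file names come from the table
# of default build recipes.
pick_pipeline() {
  arch="$1"       # for example, the output of `uname -m` on the target device
  has_cuda="$2"   # "yes" if the target has an Nvidia GPU, otherwise "no"
  case "$arch" in
    x86_64|aarch64) ;;
    *) echo "no default build recipe for architecture: $arch" >&2; return 1 ;;
  esac
  if [ "$has_cuda" = "yes" ]; then
    echo "pipeline_${arch}_cuda.yaml"
  else
    echo "pipeline_${arch}.yaml"
  fi
}

# Example: an x86_64 target without a GPU.
pick_pipeline x86_64 no
```

If your target differs from the listed CPU or GPU, the selected file is still the one to modify, since the target: and target_host: fields live there.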
Add Additional Pipeline Steps
Additional tasks can be added to the provided build recipes to further automate your build process. You may wish to add the leip evaluate step to the pipeline file so that a single leip pipeline command completes the optimize, compile, package, and evaluate steps. Refer to the LEIP Pipeline documentation for more information on building your own build recipe.