Now that you have completed an end-to-end workflow for training, exporting, optimizing, compiling, packaging, and evaluating the timm:gernet_m classifier recipe, there are several possible next steps. These steps are explained in the following sections.

Explore Different Classifier Backbones

One key benefit of adopting LEIP recipes is that you can choose from a selection of models that are guaranteed to compile and optimize for a variety of target devices. You can try out different models on your hardware to see which ones best fit your speed and size requirements, and then train a few of those models to see which one gives you the best accuracy. If you have a heterogeneous hardware environment, you can select the best model for your needs knowing that it is easy to compile and optimize that model across many different architectures.

In our initial classifier recipe release, we have released twenty-two backbones that work across four flavors of hardware, providing more than eighty-eight classifier recipe variants. To try out another model, go back to Step One and repeat the steps, replacing timm:gernet_m with another backbone. Note that while training with Open Images will give you a good sense of a model's speed, the dataset is too small to give a reliable measure of accuracy.

We have arranged our list of backbones from fastest at the top to slowest at the bottom. Accuracy will generally increase as you go down the list, but bear in mind that accuracy results will vary by dataset.

Bring Your Own Data

You only need to set up your dataset once to evaluate any of the models, or sweep against all of them. For the classifier recipes, we support a simple, straightforward image-folder format, and provide instructions for preparing your data to train and evaluate with LEIP recipes.
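
Before training on your own data, it can help to check how many images each class contains. The sketch below assumes the common image-folder layout in which each class has its own subdirectory under a dataset root; that layout and the function name count_images_per_class are assumptions made for illustration, and the LEIP data preparation instructions remain the authority on the exact format.

```shell
#!/bin/sh
# Sketch only: assumes <data_root>/<class_name>/<image files>.
# count_images_per_class is a hypothetical helper, not part of LEIP.
count_images_per_class() {
  data_root="$1"
  for class_dir in "$data_root"/*/; do
    # Skip the unexpanded glob when there are no subdirectories.
    [ -d "$class_dir" ] || continue
    class_name="$(basename "$class_dir")"
    # Count common image file types, case-insensitively.
    count="$(find "$class_dir" -type f \
      \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
      | wc -l | tr -d ' ')"
    echo "$class_name $count"
  done
}

# Example: count_images_per_class /path/to/your/dataset
```

A class with very few images, or a class directory that is missing entirely, is easier to spot here than after a training run.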

Tweak the Machine Learning Recipe

The machine learning configuration for a recipe is meant to be a starting point. You may find that the default settings allow you to train a model accurately with your data. Or you may find that you want to adjust the learning parameters to suit your purpose. You may wish, for example, to experiment with different schedulers or learning rates to improve the accuracy of your model. Alternatively, you may wish to change settings to trade off accuracy for faster training. We will provide a quick example here that will allow you to speed up the model training of the classifier recipe. You may find this useful if you want to evaluate a number of backbones for speed on your target hardware before turning your focus to getting the most accuracy out of the chosen model. If you want to optimize for accuracy, see the available advanced AF options.

Let's alter the classifier-recipe by overriding some default parameters. We will adjust the following parameters for faster training:

PyTorch Lightning Parameter

trainer.gradient_clip_val
trainer.gradient_clip_algorithm: default (norm)
trainer.max_epochs

We override these values to create a fast training run by passing the following settings as af command-line arguments:

The default classifier-recipe does not set trainer.gradient_clip_algorithm, so we need to prepend a + to add the parameter. In most cases, you will be overriding settings, so you should not use +.

af --config-name classifier-recipe \
  model.module.backbone=timm:efficientnet_em \
  trainer.gradient_clip_val=0.1 \
  +trainer.gradient_clip_algorithm="value" \
  trainer.max_epochs=4

For your convenience, we have provided a second configuration called classifier-fast for fast training using the above settings. You can perform the same training as above by using this configuration:

af --config-name classifier-fast \
  model.module.backbone=timm:efficientnet_em

The important takeaway is that the provided recipes are a starting point. Advanced users can modify the recipes to find parameters that better meet their requirements, be it higher accuracy or faster training times. For more information on modifying recipes, see the Advanced AF documentation.
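
The backbone comparison described above can be scripted. The following is a minimal sketch that runs the classifier-fast configuration once per backbone. It lists only the two backbones named in this guide, and the DRY_RUN switch and the sweep_backbones function are hypothetical names introduced for this example, not af features. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: BACKBONES contains the two backbones named in this guide.
# Extend the list with other supported backbones as needed.
BACKBONES="timm:gernet_m timm:efficientnet_em"

# DRY_RUN is a hypothetical switch for this sketch, not an af option.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

sweep_backbones() {
  for backbone in $BACKBONES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "af --config-name classifier-fast model.module.backbone=$backbone"
    else
      # Stop the sweep at the first failed training run.
      af --config-name classifier-fast "model.module.backbone=$backbone" || return 1
    fi
  done
}

sweep_backbones
```

Set DRY_RUN=0 to execute the training runs once the printed commands look right.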

Tweak the Build Recipe

Quantization Options

We provide default build recipes for common ARM and x86 targets, both with and without NVIDIA GPU support. The GPU pipelines target both Float32 and Int8 with per-channel quantization. The default CPU pipelines also provide Float32, but their Int8 default is per-tensor quantization. The per-tensor default allows for faster optimization and is supported across all of the provided Classifier Recipes. For some of the models, you may find a significant CPU-only accuracy improvement by optimizing with per-channel quantization. As an example, if you want to try symmetric per-channel quantization targeting x86, you can add this to /latentai/recipes/classifier-recipe/pipeline_x86_64.yaml:

  - name: Int8pc
      path: "$input_path"
      input_shapes: [ [ 1, 3, 224, 224 ] ]
      preprocessor: imagenet_torch_nchw
      postprocessor: top1
        data_type: int8
        rep_dataset: /latentai/recipes/classifiers/rep_dataset.txt
        quantize_input: false
        quantize_output: false
        quantizer: symmetricpc
        target: llvm
  - name: Int8pc
      path: $TASK_OUTPUT{Int8pc:optimize}
      format: python3.6

Note the name Int8pc in the above example. By adding this to the pipeline with a different name, the optimizer will provide the per-tensor optimized output in the Int8 subdirectory, with the per-channel optimized output in the Int8pc subdirectory, allowing you to easily compare the results. In the leip evaluate instructions in Step Three, simply replace Int8 with Int8pc to evaluate the resulting per-channel output. For more information on these options, see the SDK documentation for LEIP Optimize.
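
Because the two variants land in sibling subdirectories, a quick size comparison is straightforward. The sketch below is illustrative only: the function name compare_variant_sizes is hypothetical, and you must supply the pipeline output directory yourself, since its location depends on how you ran the pipeline.

```shell
#!/bin/sh
# Sketch only: compare_variant_sizes is a hypothetical helper.
# Pass the directory that contains the Int8 and Int8pc subdirectories.
compare_variant_sizes() {
  output_root="$1"
  for variant in Int8 Int8pc; do
    if [ -d "$output_root/$variant" ]; then
      # du -sk reports the total size in kilobytes, followed by the path.
      size_kb="$(du -sk "$output_root/$variant" | awk '{print $1}')"
      echo "$variant ${size_kb} KB"
    else
      echo "$variant missing"
    fi
  done
}

# Example: compare_variant_sizes /path/to/pipeline/output
```

Size is only one side of the comparison; accuracy still has to come from the evaluation step.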

Target Hardware Optimizations

Some provided build recipes are optimized for certain hardware targets. If you are targeting alternative hardware, you may need to change the compiler flags. In some cases, incorrect compiler flags will cause sub-optimal performance. In other cases, incorrect compiler flags will prevent the target system from running the compiled models. The default build recipe targets are listed below:

Build Recipe targets (CPU target, then GPU target where the recipe includes one):

Intel Skylake processor
Intel Skylake processor; Generic cuda (no sm_xx flag)
ARM7 Cortex (Raspberry Pi)
ARM8 Carmel (Xavier AGX/NX); Volta (-arch=sm_72)

These default pipeline configuration files can be updated to match your hardware target by modifying the target: or target_host: fields. Refer to the LEIP SDK documentation for more information about the compiler settings and the leip pipeline configuration files.
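
As a first check on whether the defaults are likely to fit, you can compare the architecture reported by the target device against the defaults listed above. The following is a rough sketch, meant to be run on the target device. The function name describe_default_target is hypothetical, and a matching architecture family is only a hint: a non-Skylake x86 CPU, or a Raspberry Pi running a 64-bit OS, would still call for a review of the target: and target_host: fields.

```shell
#!/bin/sh
# Sketch only: maps `uname -m` output to the default CPU target listed
# in this guide. describe_default_target is a hypothetical helper.
describe_default_target() {
  case "$1" in
    x86_64)  echo "Intel Skylake processor" ;;
    armv7l)  echo "ARM7 Cortex (Raspberry Pi)" ;;
    aarch64) echo "ARM8 Carmel (Xavier AGX/NX)" ;;
    *)       echo "no default listed for $1" ;;
  esac
}

# Report the default that corresponds to this machine's architecture.
describe_default_target "$(uname -m)"
```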

Add Additional Pipeline Steps

As you look to further automate your build process, you can add tasks to the provided build recipes. For example, you may wish to add the leip evaluate step to the pipeline file so that a single leip pipeline command completes the optimize, compile, package, and evaluate steps. For information on building your own build recipe, refer to the documentation on the LEIP Pipeline.