Let's start with the most basic example: generating the pretrained YOLOv5 model we used to get the benchmarks in our recent blog post. We will skip training the model with your own data for now. The model provided has been pretrained on the MS COCO dataset and is suitable for many applications. If you would prefer to retrain the model, see the section on adding your own data. While this tutorial demonstrates the YOLOv5 large recipe, the same steps work for the other recipes as well. For example, if you want to try the YOLOv5 medium or small recipe instead, simply replace L with M or S in the recipe names and paths. The small, medium, and large recipes differ in the size of the model.

Please note that these steps can be extremely memory intensive, depending on the model and dataset. For example, the YOLOv5 large model with the MS COCO dataset may need 16GB of memory, and you should ensure that other Docker containers are not competing for available resources. If a command fails, refer to the Troubleshooting section at the bottom of this page.

Using LEIP Recipes to export a pre-trained model is a simple one-step process. In this example, we are using a machine learning model configuration file called yolov5_L_RT.yaml and using the provided YOLOv5 large model that has been pre-trained on the MS COCO dataset.

Make sure you are in the /latentai directory before running the steps below, to be consistent with where the compiler will look for the files later in this tutorial:

cd /latentai

To generate the pretrained YOLOv5 large model:

af --config-name=yolov5_L_RT command=export

Your traced model will be found at /latentai/artifacts/export/recipe_yolov5l_batch1_640-640/traced_model.pt
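
Before moving on, it can help to confirm that the export actually produced the file. The check below is a minimal sketch that uses only the path quoted above; the MODEL and MODEL_STATUS variable names are illustrative and not part of LEIP Recipes.

```shell
# Path taken from the tutorial text above.
MODEL=/latentai/artifacts/export/recipe_yolov5l_batch1_640-640/traced_model.pt

if [ -f "$MODEL" ]; then
  # Show the size and timestamp of the traced model.
  ls -lh "$MODEL"
  MODEL_STATUS=found
else
  echo "Traced model not found at $MODEL; check the export output for errors." >&2
  MODEL_STATUS=missing
fi
```

If the file is missing, insufficient memory is a likely cause; see the Troubleshooting notes at the bottom of this page.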

That is it. You can now use this exported model and proceed to compiling or optimizing the model. You may, however, wish to first evaluate the accuracy of the model in your host environment. You should also become familiar with the following two additional machine learning recipe options:

1. To evaluate the pretrained YOLOv5 large model:

af --config-name=yolov5_L_RT command=evaluate

The evaluate command will print a table of mAP scores and generate a metrics report in:


2. To visualize the bounding box predictions of the pretrained YOLOv5 large model:

af --config-name=yolov5_L_RT command=predict

The images with bounding boxes drawn over them are now under /latentai/artifacts/predictions/coco-detection/validation.
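
The same three commands should apply to the other recipe sizes mentioned at the start of this section. The sketch below only prints the commands for a chosen size so you can review them before running anything. The config name yolov5_M_RT is an assumption derived from the "replace L with M or S" rule; only yolov5_L_RT appears in this tutorial, so check the recipe names in your installation.

```shell
# Choose one of S, M, L.
SIZE=M

# Assumed naming pattern: yolov5_<SIZE>_RT (only the L variant is confirmed above).
CONFIG="yolov5_${SIZE}_RT"

# Print, but do not run, the export, evaluate, and predict commands.
for step in export evaluate predict; do
  echo "af --config-name=${CONFIG} command=${step}"
done
```

Remember that the output paths change with the recipe size as well, so the artifact locations quoted above apply to the large recipe only.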

Next, we will compile and optimize the traced model for evaluation on the target device. If instead you would like to retrain the model, we have provided instructions for adding your own data to the recipe and evaluating your model with your data.


Troubleshooting:

Insufficient memory is the most likely cause if the af commands fail on a preconfigured recipe. If the commands fail:

  • Ensure that you have sufficient memory available in your system.

  • Ensure that other Docker containers are not competing for resources.

  • Ensure that any GPU you are using is not in use by other processes.

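Use the --ipc=host option when launching the Docker container (append it to the docker run command). This gives the container access to the host's shared memory rather than Docker's small default allocation, which makes the maximum amount of memory available to the recipe.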

If your system does not have at least 12GB of RAM, follow the steps listed below to reduce the recipe’s demand on your system.
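
To see how much RAM your system has, a quick check along these lines may help. This is a sketch for Linux only; it reads the total reported by the kernel, so a memory limit placed on the container, if any, is not reflected. The variable names are illustrative.

```shell
# Linux only: read the total system RAM from /proc/meminfo (value is in kB).
MEM_GB=0
if [ -r /proc/meminfo ]; then
  MEM_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Integer division rounds down, so the result is a slight underestimate.
  MEM_GB=$((MEM_KB / 1024 / 1024))
fi
echo "Total memory: ${MEM_GB} GB"
```

Because the reported total is slightly below the installed RAM and the division rounds down, treat a value just under 12 as borderline rather than as a definite shortfall.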

Use the --gpus all option when launching the Docker container on multi-GPU machines to provide access to all the GPUs. Before launching a command, determine which GPUs are free by using the nvidia-smi Linux command. Then set the CUDA_VISIBLE_DEVICES environment variable to expose only the free GPUs to the af command.
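
The GPU selection described above can be sketched as follows. The choice of GPU 1 is purely an assumption for illustration; use whichever index nvidia-smi shows as idle on your machine.

```shell
# List each GPU with its current memory use and utilization
# (skipped if nvidia-smi is not installed).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,memory.used,utilization.gpu --format=csv
fi

# Assumption for illustration: GPU 1 turned out to be idle.
export CUDA_VISIBLE_DEVICES=1

# Any af command started from this shell now sees only that GPU, for example:
#   af --config-name=yolov5_L_RT command=evaluate
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

To limit a single command instead of the whole shell session, prefix the command with the variable assignment, as in CUDA_VISIBLE_DEVICES=1 followed by the af command on the same line.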

Reduce Demands on Your Host Hardware:

The task settings in the recipe determine how much of your hardware the recipe uses. You may tweak these to better fit your available resources.

  • task.batch_sizes: The default batch size is [8,8] (8 samples during training, and 8 samples during evaluation). The optimal value for this parameter depends on the amount of RAM you are able to allocate to the container. A batch size of 8 requires at least 12GB of RAM allocated to the container. If allocating this amount of RAM is not possible, you may reduce the batch size (and therefore the RAM requirement) by appending task.batch_sizes=[4,4] to the commands above.

  • task.num_workers: The default value is 4. The optimal value for this parameter is a bit trickier to determine, but a good place to start is the number of CPU cores in your machine. If your CPU does not have 4 cores, you may override the default; for example, on an 8-core machine, append task.num_workers=8 to the commands above.
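
The two overrides above can be combined on one command line. The sketch below prints such a command for review instead of running it; it assumes that nproc or getconf is available to report the core count, and the CORES and OVERRIDES names are illustrative.

```shell
# Number of CPU cores (nproc is GNU coreutils; getconf is the fallback).
CORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# The batch-size override is wrapped in single quotes so that the shell
# does not treat [4,4] as a filename pattern when the command is pasted.
OVERRIDES="'task.batch_sizes=[4,4]' task.num_workers=${CORES}"

# Print the full command; copy it into the container shell to run it.
echo "af --config-name=yolov5_L_RT command=evaluate ${OVERRIDES}"
```

Starting from the core count follows the rule of thumb given above; if memory remains tight, lowering task.num_workers further is a reasonable next experiment.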