Detector Recipe Step One: Evaluating and Exporting a Model

This tutorial demonstrates using the YOLOv5 large recipe. The same steps will work for other YOLOv5 sizes as well: if you want to try the YOLOv5 medium or small recipes instead, simply replace L with M or S in the recipe names and paths. The difference between the small, medium, and large recipes is the size of the architecture.

Please note that these steps can be extremely memory intensive depending on the model and the dataset. For example, you may need 16GB of memory for the YOLOv5 large model with the MS COCO dataset, and you will need to ensure that other Docker containers are not competing for available resources. Refer to the Troubleshooting section at the bottom of this page.

Using LEIP Recipes to export a pre-trained model is a simple one-step process. For this example, we are using the provided YOLOv5 large model that has been pre-trained on the MS COCO dataset. Internally, that recipe is called yolov5, and we will be selecting the “large” architecture.

Make sure you are in the /latentai directory before running the steps below. This will ensure consistency in where the compiler will look for the files later:

cd /latentai

Export the Pretrained Model

Perform the following to generate the pretrained YOLOv5 large model:

af --config-name=yolov5 model.architecture=yolov5l command=export \
  +export.include_preprocessor=True

Your traced model will be found at /latentai/artifacts/export/leip_yolov5l_batch1_640-640/

The +export.include_preprocessor=True flag bundles a serialized version of the preprocessor needed to run inference with this model alongside the exported model. You will need this preprocessor when evaluating the model in the next steps.
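
As a quick sanity check, you can list the export directory to confirm that the traced model and the serialized preprocessor were written (exact file names may vary between SDK versions):

ls /latentai/artifacts/export/leip_yolov5l_batch1_640-640/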

In a later version of the LEIP SDK, the preprocessor will be exported by default and therefore the use of this flag will be deprecated.

That is it. You can now take this exported model and proceed to compiling or optimizing it. However, you may first wish to evaluate the accuracy of the original model in the machine learning environment. We have provided the following options for testing out the model:

Evaluate the Pretrained Model on Your Host Environment

Perform the following to evaluate the pretrained YOLOv5 large model:

af --config-name=yolov5 model.architecture=yolov5l command=evaluate

The evaluate command will print a table of mAP scores and generate a metrics report located in:


Perform the following to visualize the bounding box predictions of the pretrained YOLOv5 large model:

af --config-name=yolov5 model.architecture=yolov5l command=predict

The images with bounding boxes drawn over them are now located in: /latentai/artifacts/predictions/coco-detection-80class/validation.
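
To spot-check the results, you can list a few of the annotated images with standard shell commands (head simply limits the output):

ls /latentai/artifacts/predictions/coco-detection-80class/validation | head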

Next, we will compile and optimize the traced model for evaluation on the target device. If you would like to retrain the model, we have also provided instructions for adding your own data to the recipe and evaluating your model with that data.


Troubleshooting

If the af commands fail on a preconfigured recipe, the most likely cause is insufficient memory. If a command fails:

  • Ensure that you have sufficient memory available in your system.

  • Ensure that other Docker containers are not competing for resources.

  • Ensure that any GPU card you are using is not in use by other processes.

Use the --ipc=host option when launching the Docker container (append it to the docker run command) to allow the container to use the maximum amount of RAM.
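
As a sketch, a launch command using this option might look like the following. The image name (leip-sdk-image) and the volume mount are hypothetical placeholders; substitute the SDK image and workspace you actually use:

# --ipc=host lets the container share the host IPC namespace rather than
# Docker's small default shared-memory allocation.
docker run --rm -it --ipc=host \
  -v "$HOME/latentai":/latentai \
  leip-sdk-image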

If your system does not have at least 32GB of RAM, follow the steps listed below to reduce the recipe’s demands on your system.

Use the --gpus all option when launching the Docker container on multi-GPU machines to provide access to all of the GPUs. Before launching a command, determine which GPUs are free by using the nvidia-smi Linux command, then use the CUDA_VISIBLE_DEVICES environment variable to expose only the free GPUs to the af command.
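
For example, if nvidia-smi shows that GPU 1 is idle, you might restrict the af command to that device as follows (the GPU index is illustrative; use whichever device is free on your machine):

# Inspect current GPU utilization to find a free device
nvidia-smi

# Expose only GPU 1 to the af command
CUDA_VISIBLE_DEVICES=1 af --config-name=yolov5 model.architecture=yolov5l command=evaluate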

Reduce Demands on Your Host Hardware

The task settings in the recipe determine how heavily it loads your hardware. You may tweak these settings to better match your available resources.

  • task.batch_sizes The default batch size is [8,8] (8 samples during training and 8 samples during evaluation). The optimal value for this parameter depends on the amount of RAM you are able to allocate to the container; a batch size of 8 requires at least 12GB of RAM allocated to the container. If you cannot allocate that much RAM, you may reduce the batch size (and therefore the RAM requirement) by appending task.batch_sizes=[4,4] to the commands above.

  • task.num_workers The default value is 4. The optimal value for this parameter is trickier to determine, but a good starting point is the number of CPU cores in your machine. If your CPU has more or fewer than four cores, you may override the default by appending, for example, task.num_workers=8 to the commands above (see the combined example after this list).
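
For example, on a host with limited RAM and eight CPU cores, both overrides can be appended to any of the af commands shown earlier (the values are illustrative; choose them to match your hardware):

af --config-name=yolov5 model.architecture=yolov5l command=evaluate \
  task.batch_sizes=[4,4] task.num_workers=8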
