Detector Recipe Step One: Exporting and Evaluating a Model
We will start with the most basic example: generating the pretrained YOLOv5 model we used to get the benchmarks in our recent blog post. We also provide several EfficientDet and MobileNet SSD detector recipes. We recommend following the YOLOv5 example first to get familiar with the process.
We will skip training the model with your own data for now. The provided model has been pre-trained on the MS COCO dataset and is suitable for many applications. Refer to the section on adding your own data if you would prefer to retrain the model. While this tutorial demonstrates the YOLOv5 large recipe, the same steps work for the other YOLOv5 sizes as well. For example, if you want to try the YOLOv5 medium or small recipes instead, simply replace L with M or S in the recipe names and paths. The difference between the small, medium, and large recipes is the size of the architecture.
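As a rough sketch of that substitution, the shell helper below maps a size name to a model.architecture value and prints the matching af command without running it. Only yolov5l is confirmed by this tutorial; the yolov5m and yolov5s values, and the helper names yolov5_arch and af_command, are assumptions made for illustration.

```shell
# Map a size name to the value passed as model.architecture.
# ASSUMPTION: only yolov5l appears on this page; yolov5m and yolov5s are
# inferred from the "replace the size letter" rule and are not confirmed.
yolov5_arch() {
  case "$1" in
    large)  echo "yolov5l" ;;
    medium) echo "yolov5m" ;;
    small)  echo "yolov5s" ;;
    *)      echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the af command for a given step and size.
af_command() {
  step="$1"
  arch="$(yolov5_arch "$2")" || return 1
  echo "af --config-name=yolov5 model.architecture=${arch} command=${step}"
}

# Example: show the export command for each size.
for size in small medium large; do
  af_command export "$size"
done
```

Printing the command first makes it easy to check the architecture name before starting a long-running step.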
Please note that these steps can be extremely memory intensive, depending on the model and the dataset. For example, you may need 16GB of memory for the YOLOv5 large model with the MS COCO dataset, and you need to ensure that other Docker containers are not competing for available resources. If you run into problems, refer to the Troubleshooting section at the bottom of this page.
Exporting a pre-trained model with LEIP Recipes is a simple one-step process. For this example, we are using the provided YOLOv5 large model that has been pre-trained on the MS COCO dataset. Internally, that recipe is called yolov5, and we will be selecting the “large” architecture.
Make sure you are in the /latentai directory before running the steps below. This ensures consistency in where the compiler will look for the files later.
Export the Pretrained Model
Perform the following to generate the pretrained YOLOv5 large model:
af --config-name=yolov5 model.architecture=yolov5l command=export
Your traced model will be found at
That is it. You can now use this exported model and proceed to compiling or optimizing the model. However, you may wish to evaluate the accuracy of the original model in the machine learning environment. We have provided the following options for testing out the model:
Evaluate the Pretrained Model on Your Host Environment
Perform the following to evaluate the pretrained YOLOv5 large model:
af --config-name=yolov5 model.architecture=yolov5l command=evaluate
The evaluate command will print a table of mAP scores and generate a metrics report located in:
Perform the following to visualize the bounding box predictions of the pretrained YOLOv5 large model:
af --config-name=yolov5 model.architecture=yolov5l command=predict
The images with bounding boxes drawn over them are now located in:
Next, we will compile and optimize the traced model for evaluation on the target device. We have provided instructions for adding your own data to the recipe and evaluating your model with your data if you would like to retrain the model.
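The export, evaluate, and predict commands on this page differ only in the command= value, so they can be chained. The following is a minimal sketch, not part of LEIP Recipes: run_recipe_steps and the AF_BIN variable are hypothetical names, and the sketch assumes that af exits with a non-zero status when a step fails.

```shell
# AF_BIN is a hypothetical override so the sequence can be dry-run;
# it defaults to the real af command used on this page.
AF_BIN="${AF_BIN:-af}"

# Run export, evaluate, and predict in order for one architecture,
# stopping at the first step that fails.
run_recipe_steps() {
  arch="${1:-yolov5l}"
  for step in export evaluate predict; do
    echo ">>> ${step} (${arch})"
    $AF_BIN --config-name=yolov5 "model.architecture=${arch}" "command=${step}" || {
      echo "step '${step}' failed; see the Troubleshooting section" >&2
      return 1
    }
  done
}

# Dry run: with AF_BIN=echo the arguments are printed instead of invoking af.
#   AF_BIN=echo; run_recipe_steps yolov5l
# Real run, from the /latentai directory:
#   run_recipe_steps yolov5l
```

Stopping at the first failure avoids spending time on evaluation or prediction when the export step has already run out of memory.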
Troubleshooting
If the af commands fail on a preconfigured recipe, the cause is most likely insufficient memory. If the commands fail:
Ensure that you have sufficient memory available in your system.
Ensure that other Docker containers are not competing for resources.
Ensure that any GPU you are using is not in use by other processes.
Use the --ipc=host option when launching the Docker container (append it to the docker run command) to allocate the maximum amount of RAM to the container.
If your system does not have at least 12GB of RAM, follow the steps listed below to reduce the recipe’s demand on your system.
On multi-GPU machines, use the --gpus all option when launching the Docker container to provide access to all the GPUs. Before launching a command, determine which GPUs are free by using the nvidia-smi Linux command. Then use the CUDA_VISIBLE_DEVICES= environment variable to expose only the free GPUs.
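One way to pick free GPUs is to parse the query output of nvidia-smi. The sketch below assumes a GPU counts as free when its used memory is at or below a threshold (100 MiB by default, an arbitrary choice); free_gpus is a hypothetical helper name, not part of LEIP or the NVIDIA tools.

```shell
# Read "index, memory.used" lines (MiB) on stdin and print a comma-separated
# list of GPU indices whose used memory is at or below a threshold.
# ASSUMPTION: the default threshold of 100 MiB is only an illustrative
# definition of "free".
free_gpus() {
  awk -F', *' -v max="${1:-100}" '
    $2 + 0 <= max { printf "%s%s", sep, $1; sep = "," }
    END { print "" }
  '
}

# Usage on a machine with NVIDIA drivers installed:
#   ids="$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | free_gpus)"
#   CUDA_VISIBLE_DEVICES="$ids" af --config-name=yolov5 model.architecture=yolov5l command=evaluate

# Example with canned input instead of real nvidia-smi output:
printf '0, 10240\n1, 3\n2, 0\n' | free_gpus 100
```

If no GPU qualifies, the list is empty, and an empty CUDA_VISIBLE_DEVICES hides every GPU from the process, so check the result before launching a command.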
Reduce Demands on Your Host Hardware:
The task settings in the recipe correlate to your hardware. You may tweak these to better adapt to your hardware resources.
task.batch_sizes: The default batch size is [8,8] (8 samples during training and 8 samples during evaluation). The optimal value for this parameter depends on the amount of RAM you are able to allocate to the container. A batch size of 8 requires at least 12GB of RAM allocated to the container. If allocating this amount of RAM is not possible, you may reduce the batch size (and therefore the RAM requirements) by appending task.batch_sizes=[4,4] to the commands above.
task.num_workers: The default value is 4. The optimal value for this parameter is a bit trickier to determine, but a good place to start is the number of CPU cores in your machine. If your CPU has a different number of cores, you may override the default by appending the matching value, for example task.num_workers=8 on an 8-core machine, to the commands listed above.
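The two settings can be derived from the host with a few lines of shell. This is a sketch under stated assumptions: the helper names are hypothetical, the 12GB threshold and the fallback batch size of 4 are taken from the text above, and passing both overrides in a single af command is assumed to work because each is described as something to append.

```shell
# Pick a batch size from the RAM (in whole GB) available to the container.
# The 12 GB threshold and the values 8 and 4 come from the text above.
pick_batch_size() {
  if [ "$1" -ge 12 ]; then echo 8; else echo 4; fi
}

# Count CPU cores; falls back to the recipe default of 4 if neither tool exists.
cpu_cores() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4
}

# Print the override arguments to append to an af command, one per line.
task_overrides() {
  bs="$(pick_batch_size "$1")"
  printf '%s\n' "task.batch_sizes=[${bs},${bs}]" "task.num_workers=$(cpu_cores)"
}

# Example: overrides for a container with 8 GB of RAM.
task_overrides 8

# Appending the overrides by hand (ASSUMPTION: both can be combined):
#   af --config-name=yolov5 model.architecture=yolov5l command=evaluate \
#      'task.batch_sizes=[4,4]' task.num_workers=8
```

Quoting the batch-size override is a precaution: square brackets are glob characters in most shells, and zsh in particular reports an error when an unquoted pattern matches no file.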