A Git repository with Python and C++ example applications is provided to help you get started deploying the models you have created with LEIP recipes. The repository also includes instructions for running the examples and for installing any needed dependencies.
Collecting the Model
You will need the LEIP Runtime Environment objects you created with the pipeline build process in Step Two in order to deploy your model. These objects are stored in the directory path provided by the --output_path flag when you ran the leip pipeline command.
We will use the example of building a YOLOv5 Large model targeting an NVIDIA AGX device. You may have provided the output path of:
If you now look in that directory and you used the default pipeline_aarch64_cuda.yaml build recipe, you will find the following artifacts have been created:
# The following directory will change based on your earlier pipeline command
ls /latentai/workspace/output/yolov5l/aarch64_cuda
Float32-compile  Float32-package  Int8-optimize  Int8-package  results.json
If you are using a Python application with your model, you will want the packaged latentai.lre object in either the Float32-package or Int8-package directory, depending on whether you are building around the compiled Float32 or optimized Int8 version of the model. The packaged version of the model includes a number of Python dependencies needed to run the model.
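As a quick way to confirm which packaged objects a build produced, the following is a minimal sketch that searches an output directory for artifacts by name. The helper name find_artifacts is hypothetical and is not part of LEIP; the sketch only assumes that an object called latentai.lre sits somewhere beneath the output directory.

```python
from pathlib import Path


def find_artifacts(output_dir, name="latentai.lre"):
    """Return every path called `name` beneath `output_dir`, sorted."""
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"output directory not found: {root}")
    # rglob matches files and directories alike, so this works whichever
    # form the packaged object takes on disk.
    return sorted(root.rglob(name))


if __name__ == "__main__":
    # Path taken from the listing above; replace it with your own --output_path.
    out = Path("/latentai/workspace/output/yolov5l/aarch64_cuda")
    if out.is_dir():
        for path in find_artifacts(out):
            print(path.parent.name, "->", path)
```

Passing a different name, such as modelLibrary.so, searches for that artifact instead.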
If you are using a C++ application with your model, you will want the modelLibrary.so LRE file that you will find in the
Testing Your Models on the Device
Perform the following steps to test your models on the target device:
Copy the latentai.lre or modelLibrary.so files over to the device you want to test. You may skip this step if you are testing within the SDK docker container.
Modify the provided scripts to match your device architecture and installed model location.
Run the test code using the provided instructions and scripts.
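Before running the provided scripts, it can help to confirm that the model is where the scripts expect it and that the device architecture matches the recipe you built for. The following is a minimal sketch of such a check; the function name preflight and the example model path are hypothetical, and the expected architecture string is an assumption based on the aarch64 recipe name used in this example.

```python
import platform
from pathlib import Path


def preflight(model_path, expected_arch, machine=None):
    """Return a list of problems found before running the test scripts."""
    # platform.machine() reports the hardware name, e.g. "aarch64" or "x86_64" on Linux.
    machine = machine or platform.machine()
    problems = []
    if not Path(model_path).exists():
        problems.append(f"model not found at {model_path}")
    if machine != expected_arch:
        problems.append(f"device reports {machine!r}, expected {expected_arch!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical install location; use the path you copied the model to.
    for problem in preflight("/home/user/models/modelLibrary.so", "aarch64"):
        print("WARNING:", problem)
```

An empty list means both checks passed. It does not verify that the model itself loads or runs.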
Evaluating for Speed
The C++ example applications are the best way to evaluate the timing of your model on the device. Note that some of the pre- and post-processing code in the example applications has been written to be general enough to run on different devices. You may get faster total inference times by replacing that code with optimized libraries available on your platform.
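To see how much of the total time is spent in pre- and post-processing rather than in the model itself, each stage can be timed separately. The following is a minimal sketch of that idea and is not the timing code from the example applications; the function time_stages and the placeholder stages are hypothetical, and you would substitute your own pre-processing, inference, and post-processing calls.

```python
import statistics
import time


def time_stages(stages, sample, warmup=3, runs=20):
    """Time each stage of a pipeline separately.

    `stages` is an ordered list of (name, callable) pairs; each callable
    receives the previous stage's output. Returns median seconds per stage.
    """
    timings = {name: [] for name, _ in stages}
    for i in range(warmup + runs):
        data = sample
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            elapsed = time.perf_counter() - start
            if i >= warmup:  # discard warm-up iterations
                timings[name].append(elapsed)
    return {name: statistics.median(vals) for name, vals in timings.items()}


if __name__ == "__main__":
    # Placeholder stages standing in for real pre-processing, inference and post-processing.
    stages = [
        ("preprocess", lambda x: [v / 255.0 for v in x]),
        ("inference", lambda x: sum(x)),
        ("postprocess", lambda x: round(x, 3)),
    ]
    for name, seconds in time_stages(stages, list(range(1000))).items():
        print(f"{name}: {seconds * 1e3:.3f} ms")
```

The median is used so that occasional slow iterations do not skew the result. If an inference call returns before the device has finished its work, the inference figure will be understated, so make sure the stage waits for its result.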
The LRE binaries can now be integrated into your own application. Please contact email@example.com for assistance if you need additional help integrating the recipe models into your application.
Please refer to the instructions for Bring Your Own Data if you would like to train the recipe models with your own dataset.