Introduction¶
This document shows the various ways that offline mode can be set up on a LEIP Design installation.
What is offline mode?¶
LEIP Design is installed locally and runs locally. However, depending on which recipes you run, it may pull various artifacts over the network: our Golden Recipe Database (GRDB) of pre-qualified models, corresponding model code, pre-trained backbones, and so on. Whether or not you use the GRDB, there is a good chance you will need some artifact over the network. During regular use while online, these are pulled automatically on demand and then cached locally on your installation for future use.
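To make the pull-and-cache behavior concrete, here is a minimal, generic sketch of the pattern in plain Python. The names (`fetch_artifact`, `CACHE_DIR`, the artifact name) are illustrative only and are not LEIP Design's actual API; they just show why an artifact fetched once while online remains available afterwards.

```python
import tempfile
from pathlib import Path

# Illustrative cache location (a temp dir for this sketch)
CACHE_DIR = Path(tempfile.mkdtemp())

def fetch_artifact(name, download):
    """Return an artifact's bytes, downloading it on first use only."""
    cached = CACHE_DIR / name
    if not cached.exists():            # first use: requires network
        cached.write_bytes(download(name))
    return cached.read_bytes()         # later uses: served from cache

# Simulate "online" downloads with a stub that records each network hit
downloads = []
def fake_download(name):
    downloads.append(name)
    return b"weights for " + name.encode()

fetch_artifact("yolov8x-backbone", fake_download)  # downloads and caches
fetch_artifact("yolov8x-backbone", fake_download)  # cache hit, no download
print(downloads)  # ['yolov8x-backbone']
```

Offline mode is essentially a way to populate this kind of cache deliberately, before the network goes away.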
When is that not enough? If you intend to run LEIP Design on a machine or installation that is not connected to the network ("offline"), you would not be able to run many of the recipes. This is where offline mode comes in.
Offline mode works as follows:
- You install LEIP Design on the machine.
- While still connected to the network, you use the offline setup tools below to select which parts of the Recipe space you want available offline.
- You disconnect the machine from the network and run the selected Recipes.
The sections below demonstrate different ways to define and prepare Recipes for offline mode:
- Single Recipe
- Families of models
- GRDB: By query
- GRDB: By Pareto optimality
Note: You can change recipes during offline mode. However, if the change requires an online artifact that you have not pulled before, it will fail. For instance, if you pulled the yolov8x backbone but not the yolov8s backbone, and then try to use yolov8s offline, it will fail.
As guidance, these types of components typically require an online artifact:
- model families (e.g. "Yolo v5", "EfficientDet")
- pretrained model backbones ("Yolo v8 nano")
How does it work? The setup phase of offline mode takes your recipes and performs a minimal "pretend" run, including training. This ensures that all components required to run the recipe later are cached locally.
How could it fail? It is certainly possible that you miss a component during the setup phase that is later needed during offline operation (see the example of the Yolo v8 backbones above). The only cure is to put the machine online temporarily and add the proper Recipes for offline setup before going offline again.
Note: Any recipe to set up for offline mode has to be complete in the sense that you could train with it. Incomplete Recipes cannot be set up for offline mode at the moment.
How long does it take to prepare for offline use? This depends on the number of recipes you want to have available offline, but should range from seconds to minutes.
Do I have to make my data available for offline use with this mechanism? You do not, as long as you have the data available to the local installation during offline use.
First, we do some setup for the LEIP Design tooling:
import leip_recipe_designer as lrd
# We need a pantry to build recipes
pantry = lrd.Pantry.build('my_pantry')
The first step in configuring offline mode is to prepare some common components for offline use. This should be called once per installation.
lrd.helpers.offline.setup(pantry)
Single Recipe¶
Individual Recipes can be made available offline via the helper function
helpers.offline.single_recipe(recipe)
# set up an empty object detector recipe
recipe = lrd.create.empty_detection_recipe(pantry = pantry)
# Fill it out with some choices.
# Note that we need some data defined so that the Recipe can be set up for offline mode.
# You can bring your own data into the offline environment and then use it with one of the
# BYOD data generators.
# Here we are using one of the built-in data generators ("smoke detection"). It does not
# matter much which data you use here, as long as the recipe can run.
recipe.fill_empty_recursively({
    'model' : 'Yolo v8',
    'data_generator' : 'Smoke (data/sets/url/detection/smoke-pascal-like)',
})
# set up this recipe for offline use
lrd.helpers.offline.single_recipe(recipe)
Now we can run the recipe offline:
- Take machine offline
- Add your local data to the recipe
- Run:
lrd.tasks.train(recipe)
Families of Models¶
What if we want several backbones available? This can be done by iterating over the choices. For instance, if we want all possible backbones for the Yolo v8 family:
recipe = lrd.create.empty_detection_recipe(pantry = pantry)
# start with a basic recipe (make sure to have some data)
recipe.fill_empty_recursively({
    'model' : 'Yolo v8',
    'data_generator' : 'Smoke (data/sets/url/detection/smoke-pascal-like)',
})
# Now we can iterate over various changes of the recipe, such as the backbone
# This should only take a few minutes to run
for backbone in recipe.options('model.architecture'):
    print(f'Preparing {backbone}')
    recipe['model.architecture'] = backbone
    # set up this recipe for offline use
    lrd.helpers.offline.single_recipe(recipe)
GRDB¶
Another way to select and prepare recipes for offline mode is to use the Latent AI Golden Recipe Database (GRDB). This is meaningful if you want to use and query the GRDB itself offline (as it is an online artifact).
Here we show two common ways to use the GRDB to prepare Recipes for offline use.
By Query¶
goldenvolumes = lrd.GoldenVolumes()
volumes = goldenvolumes.list_volumes_from_zoo()
volume = 'xval_det'
df = volumes[volume].get_golden_df()
# select a random recipe and make available offline
row = df.sample()
recipe_id = row['id'].values[0]
print(recipe_id)
lrd.helpers.offline.GRDB_recipe(recipe_id = recipe_id, grdb_volume = volume, pantry = pantry)
By Pareto Optimality¶
Let's downselect the relevant GRDB entries first, choosing a particular volume and target hardware.
import matplotlib.pyplot as plt
# retrieve the Golden Recipe DB volumes
goldenvolumes = lrd.GoldenVolumes()
volumes = goldenvolumes.list_volumes_from_zoo()
# choose the cross-validated set
volume = 'xval_det'
# get the Pandas DataFrame of all Recipes and their metrics
full_df = volumes[volume].get_golden_df()
# we choose a specific hardware for this example, an NVIDIA RTX A4500
hw = 'cuda:A4500'
# get the subset of the DataFrame for the specific hardware
df = full_df.query(f"hardware in ['{hw}']")
Now that we have downselected the target hardware, we can use helpers.offline.query_pareto_optimal_recipes to query the pareto optimal recipes and make them available for offline use.
We'll show two different methods to choose the set of "best" Recipes: exact and fuzzy. "Exact" means the set of experiments precisely on the empirical Pareto front, i.e. no other experiment is better on both metrics at once (e.g. accuracy and speed). "Fuzzy" means that we assign each Recipe a score for how close it comes to optimally trading off the two metrics. The latter lets you explore a larger set of experiments while still being deliberate about the accuracy/speed tradeoff.
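To clarify what "exact" Pareto membership means, here is a toy sketch in plain Python, independent of LEIP Design and not the implementation behind query_pareto_optimal_recipes. Each experiment is a (latency_ms, accuracy) pair, where lower latency and higher accuracy are better; a point is on the front if no other point dominates it.

```python
def exact_pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (latency_ms, accuracy). A point is dominated if some
    other point is at least as good on both metrics (lower-or-equal
    latency, higher-or-equal accuracy) and is a different point.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

experiments = [
    (10.0, 0.70),  # fast, modest accuracy -> on the front
    (25.0, 0.85),  # slower but more accurate -> on the front
    (30.0, 0.80),  # dominated by (25.0, 0.85)
    (12.0, 0.65),  # dominated by (10.0, 0.70)
]
print(exact_pareto_front(experiments))  # [(10.0, 0.7), (25.0, 0.85)]
```

The "fuzzy" method generalizes this by scoring every experiment on how close it comes to the front, so you can keep the top N near-optimal Recipes instead of only the front itself.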
First, we'll retrieve and plot an example for both methods, and then make them available for offline use.
# let's choose a metric that is important for us
# in this case we select "on device inference rate in milliseconds"
metric = 'od_inf_rate_ms'
N_max_recipes = 50
# method 1: the exact, empirical Pareto front
recipe_ids_exact, pareto_exact = lrd.helpers.offline.query_pareto_optimal_recipes(df, metric_custom=metric, method = 'exact', sort_by = 'paretoness')
# method 2: the fuzzy Pareto front, we have to choose how many recipes we want to consider
recipe_ids_fuzzy, pareto_fuzzy = lrd.helpers.offline.query_pareto_optimal_recipes(df, metric_custom=metric, method = 'fuzzy', sort_by = 'paretoness', top_N_fuzzy = N_max_recipes)
plt.scatter(pareto_fuzzy['metric_custom_all'], pareto_fuzzy['metric_accuracy_all'], alpha=0.5)
plt.scatter(pareto_fuzzy['metric_custom'], pareto_fuzzy['metric_accuracy'], alpha=0.5, color='red')
plt.scatter(pareto_exact['metric_custom'], pareto_exact['metric_accuracy'], alpha=0.5, color='green')
for i in range(len(recipe_ids_fuzzy)):
    plt.annotate(f'{i}', (pareto_fuzzy['metric_custom'][i], pareto_fuzzy['metric_accuracy'][i]))
plt.xlabel(pareto_exact['metric_custom_name'])
plt.ylabel(pareto_exact['metric_accuracy_name'])
plt.xlim(left=0)
plt.ylim(bottom=0)
plt.title(f'Pareto Front for {hw} - {metric}, ordered by paretoness')
plt.legend(['All Recipes', f'Fuzzy Pareto (top {N_max_recipes})', 'Exact Pareto'])
plt.show()
Now that we've selected possible experiments to run offline, we can make them available. This should take a couple of minutes.
chosen_recipes = recipe_ids_fuzzy[:3]
for recipe_id in chosen_recipes:
    lrd.helpers.offline.GRDB_recipe(recipe_id = recipe_id, grdb_volume = volume, pantry = pantry)