Advanced Application Framework Options
The Latent AI Machine Learning Application Framework (AF) is a modular framework for solving machine learning problems: you bring your own data, then use it to quickly train and evaluate different models and select the best-performing one that meets your design requirements. Models exported from AF can be optimized, compiled, and evaluated on target edge hardware to verify that those requirements are met.
When used with LEIP Recipes, AF ships with default configurations designed to give good performance across a broad set of applications. Depending on your dataset, you may need to change the defaults, such as the input shapes. You may also wish to alter parameters to explore different learning rates, or to trade accuracy for faster training, or vice versa.
AF builds on top of many component technologies, including Hydra and PyTorch Lightning, giving users configurable access to many of the underlying components.
We recommend you start with LEIP Recipes to gain experience with AF and a number of available models that not only work with AF out of the box, but are also guaranteed to compile, optimize, and run on many different hardware platforms. If you would like to experiment with different parameters to find better settings for your dataset, the following sections introduce the underlying capabilities. If you have more specific needs, or different models you would like to use in this modular fashion, please contact us at Latent AI.
AF supports a set of commands/modes, each of which is configurable across the various aspects of the ML process:
train: Train a model
evaluate: Evaluate a trained model
predict: Visualize and summarize the predictions of a trained model
vizdata: Visualize the input data to verify that it is correctly ingested
export: Export a trained model for further processing in the LEIP SDK (i.e., compile, optimize, etc.).
Each mode can be set as shown here (evaluate mode as example):
af [...] command=evaluate
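For example, a typical end-to-end workflow chains these modes together. A minimal sketch using the yolov5 detector recipe shown later in this document (the checkpoint path is a placeholder):

# train a model
af --config-name=yolov5 model.architecture=yolov5s command=train
# evaluate the best checkpoint (see "Specify a Trained Checkpoint" below)
af --config-name=yolov5 model.architecture=yolov5s command=evaluate +checkpoint=/absolute/path/to/best.ckpt
# export it for further processing in the LEIP SDK
af --config-name=yolov5 model.architecture=yolov5s command=export +checkpoint=/absolute/path/to/best.ckpt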
Historically, the default mode (i.e., if no command is specified) is train. This behavior is deprecated and will be removed in future versions of AF.
Available Detection Models
See the list of supported detector models for the config and architecture names.
af --config-name=<recipe_name> model.architecture=<architecture_name> command=<command_name>
Example: Export a YOLOv5 Small model
af --config-name=yolov5 model.architecture=yolov5s command=export
How Do I?
Train a Model With My Data: BYOD
BYOD instructions differ depending on the type of recipe. Instructions are available for both Classifier and Detector models.
Specify a Trained Checkpoint for Further Processing (Evaluate, Visualize Predictions, Export)
The training process generates checkpoints of the best models as they are trained. These end up in the artifacts/ folder.
Use the following syntax to specify such an existing checkpoint for continued training, exporting, evaluating, or visualizing:
af [...] +checkpoint=<absolute qualified path of .ckpt file>
The pathname must be absolute, not relative to the current working directory.
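For example, if the checkpoint sits in the artifacts/ folder under your current directory, you can build an absolute path with your shell's $PWD variable (the checkpoint filename here is hypothetical):

af [...] +checkpoint=$PWD/artifacts/best.ckpt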
The exported file name can also be changed, for example to simplify scripting for automated test and integration.
Export a Model
To export a pre-trained model (e.g., YOLOv5 pre-trained on MSCOCO), call it with the same configuration used for training and add:
af [...] command=export
Notice that if you do not provide a +checkpoint=</absolute/path/to.ckpt>, AF pulls in the model's pretrained weights by default.
Export a Trained Model
To export a newly trained model, locate the checkpoint that you would like to export and call it with the same configuration used for training:
af [...] command=export +checkpoint=<absolute qualified path of .ckpt file>
The default location for the exported .pt file will be:
Evaluate with a Trained Model Checkpoint
Locate the checkpoint that you would like to evaluate and call it with the same configuration used for training:
af [...] command=evaluate +checkpoint=<absolute qualified path of .ckpt file>
The AF will predict and run evaluation metrics over the entire validation set. At the end, you will see a metrics report, which will also be exported to a file.
By default, the recipe you select will define what evaluation protocol to use.
The following protocols are available for detection models:
MS COCO protocol: evaluates the mAP over IoU thresholds 0.5:0.95 with step 0.05.
Pascal VOC protocol: computes the AP for each class.
Precision versus Recall curves: generated for each class.
Visualize Predictions with a Trained Model Checkpoint
Locate the checkpoint that you would like to use for prediction and visualization, and call it with the same configuration used for training:
af [...] command=predict +checkpoint=<absolute qualified path of .ckpt file>
Additional Options for command=predict
When visualizing detector model predictions, it can be useful to set a minimum confidence threshold for the bounding boxes to display:
af [...] command=predict +checkpoint=<path> predict.annotation_renderers.bbox.confidence_threshold=0.1
Show Ground Truth
The default behavior is to display the ground truth and the predictions side by side. Users can optionally choose not to include the ground truth next to the predictions:
af [...] command=predict +checkpoint=<path> predict.annotation_renderers.bbox.show_gt=False
When visualizing the ground truth next to the predictions, it can be useful to highlight false positives and false negatives:
af [...] command=predict +checkpoint=<path> predict.annotation_renderers.bbox.show_errors=True
Note that the visualizer treats as an error any prediction whose IoU with every ground truth box is below 0.5.
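These prediction options compose. For example, to hide low-confidence boxes while highlighting errors against the ground truth (the threshold value is illustrative):

af [...] command=predict +checkpoint=<path> predict.annotation_renderers.bbox.confidence_threshold=0.25 predict.annotation_renderers.bbox.show_errors=True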
Change the Learning Rate
Use the following command to change the learning rate:
af [...] model.module.optimizer.lr=0.1
Change the Classifier
Use the following command to change the classifier:
af [...] model.module.backbone=timm:visformer_small
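The timm: prefix selects a backbone from the timm library. If you are unsure which identifiers are available, timm itself can list matching model names; a sketch, assuming the timm Python package is installed in your AF environment:

python -c "import timm; print(timm.list_models('visformer*'))"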
Change the Processing Resolution
Use the following command to change the processing resolution:
af [...] task.width=384 task.height=384
Change the Batch Size
Use the following command to change the batch size:
af [...] task.bs_train=16 task.bs_val=64
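Since these are ordinary Hydra overrides, several of the options above can be combined in a single invocation; a sketch with illustrative values:

af [...] model.module.optimizer.lr=0.01 task.width=384 task.height=384 task.bs_train=16 task.bs_val=64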
Change the Optimizer
Use the following command to change the optimizer:
af [...] model.module.optimizer=timm.adamw
Add ML Metrics Logging -- Tensorboard
Use the following command to add Tensorboard to the ML metrics logging:
af [...] +loggers@loggers=tensorboard
The logs will be stored in the configured experiment output folder, ./outputs by default.
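You can then point TensorBoard at that folder to browse the metrics, assuming the tensorboard package is available on your machine:

tensorboard --logdir ./outputs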
Add ML Metrics Logging -- Neptune.AI
Use the following command to add Neptune.AI to the ML metrics logging:
af [...] +loggers@loggers=neptune loggers.neptune.project="<your_neptune_project_id>"
Note: You must provide your Neptune credentials in NEPTUNE_API_TOKEN; refer to https://docs.neptune.ai/getting-started/installation#authentication-neptune-api-token. The logs will be stored in your Neptune project.
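For example, set the token in your shell before invoking AF (the token and project values are placeholders):

export NEPTUNE_API_TOKEN="<your_api_token>"
af [...] +loggers@loggers=neptune loggers.neptune.project="<your_neptune_project_id>"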
Change the Learning Rate Scheduler
Because the scheduler is a group of values, the command-line syntax differs from single-value changes:
af [...] model/module/scheduler=OneCycle
Increase the Number of Training Epochs
Use the following command to increase the number of training epochs:
af [...] trainer.max_epochs=42
Limit the Training Time
Use the following command to limit to 2 hours and 42 minutes of total training time:
af [...] trainer.max_time="00:02:42:00"
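The epoch and time limits can be combined; training then stops at whichever limit is reached first:

af [...] trainer.max_epochs=42 trainer.max_time="00:02:42:00"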
Change the Display and Log Metrics
For the classifiers:
# add one or more metrics
af [...] +model/module/metrics=[AUROC,AveragePrecision]
# override to one or more metrics
af [...] model/module/metrics=[Accuracy,AUROC,AveragePrecision]
Train with Multiple GPUs on One Machine
Enter the following commands to train with multiple GPUs on one machine.

# use all available gpus
af [...] trainer.devices=-1
# use the first and third available gpus
af [...] trainer.devices=[0,2]
# use two gpus
af [...] trainer.devices=2
Get More Debug Output in the Console
Enter the following command to receive more debugging output:
af [...] hydra.verbose=[af]
When Does Training Stop?
Training generally stops when any of the following conditions is met:
A configured limit such as trainer.max_epochs or trainer.max_time is reached.
An early termination callback is enabled and its conditions are met, for example EarlyStopping based on val_loss_epoch.
The user hits Ctrl-C.
How to Configure the Optimizer
Change on command line:
af [...] model.module.optimizer.moniker=timm.adamw
Change in YAML:

model:
  module:
    optimizer:
      moniker: timm.adamw
Available optimizers:

torch:Adadelta torch:Adagrad torch:Adam torch:AdamW torch:SparseAdam torch:Adamax torch:ASGD torch:LBFGS torch:NAdam torch:RAdam torch:RMSprop torch:Rprop torch:SGD

timm:sgd timm:nesterov timm:momentum timm:sgdp timm:adam timm:adamw timm:adamp timm:nadam timm:radam timm:adamax timm:adabelief timm:radabelief timm:adadelta timm:adagrad timm:adafactor timm:lamb timm:lambc timm:larc timm:lars timm:nlarc timm:nlars timm:madgrad timm:madgradw timm:novograd timm:nvnovograd timm:rmsprop timm:rmsproptf
How to Configure the Scheduler
The schedulers can be configured either via the command line OR by modifying the recipe YAML file directly.
Note: internally schedulers are groups of values, so the syntax for command line and YAML file changes is different than changes to single values.
Change the scheduler on command line:
af [...] model/module/scheduler=OneCycle
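To make the same change in the recipe YAML file, the group selection goes in the Hydra defaults list rather than the config body; a minimal sketch, assuming standard Hydra defaults-list semantics (the override keyword is needed when the recipe already selects a scheduler):

defaults:
  - override model/module/scheduler: OneCycle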
Available schedulers: ExponentialDecay, ReduceOnPlateau, OneCycle, OneCycleAnnealed

ExponentialDecay:
Starting LR (e.g., 0.01)
Decay rate (e.g., 0.95)
Any other parameter of the torch scheduler

CosineAnnealing (sets the learning rate of each parameter group using a cosine annealing schedule):
Max epoch for LR scaling (e.g., 42)
Starting LR (e.g., 0.1)
Ending LR at last epoch (e.g., 0.0001)
Any other parameter of the torch scheduler

OneCycle:
Max epoch for LR scheduling (e.g., 42)
Max LR (e.g., 0.1)
Any parameter of the torch scheduler

OneCycleAnnealed (two-phase LR: (1) OneCycle, then (2) constant LR annealing):
Number of epochs for the initial OneCycle phase
Factor of the initial LR to anneal to
Max LR (e.g., 0.1)
Any parameter of the torch sequential scheduler
Any parameter of the first-phase OneCycle scheduler
Any parameter of the second-phase constant scheduler

ReduceOnPlateau: reduces the learning rate when a metric has stopped improving.
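Once a scheduler group is selected, its individual parameters can be overridden with dotted paths, just as with the optimizer. A sketch, assuming the Max LR parameter is exposed under torch's max_lr name:

af [...] model/module/scheduler=OneCycle model.module.scheduler.max_lr=0.1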