Classifier Recipes: Bring Your Own Data (BYOD)
Bring Your Own Classifier Data
With the steps outlined earlier in this tutorial, you can evaluate candidate classifier backbones on your target hardware to determine which ones meet your speed requirements. Once you are ready to train a model on your own dataset, you will need to make sure your data is in the supported format for training and evaluation by following the steps below.
Formatting Classifier Data
Classifier recipes support a single data format that is simple and straightforward, based on the PyTorch ImageFolder layout.
Understanding the ImageFolder Format
The open-images-10 dataset was used in the previous steps of this tutorial. It may be helpful to look at the file structure of that dataset, which is installed in the SDK at /latentai/workspace/datasets/open-images-10-classes
The directory tree structure that Application Framework (AF) expects is as follows:
path/to/mydataset/
|---train
|   |---class_names.txt
|   |---index.txt
|   |---Class1
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   |---Class2
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   ...
|---eval
|   |---class_names.txt
|   |---index.txt
|   |---Class1
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   |---Class2
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   ...
The training and evaluation datasets are put in separate sub-directories, and both of these directories have the same format. For each class, place a directory named after the class in train and in eval. Using open-images-10 as an example, there are directories for Apple, Carnivore, Chicken, etc. Within train/Apple and eval/Apple are the images of apples for training and evaluation, respectively.
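To sanity-check that your layout matches this structure, you can list the top two levels of your dataset directory with the standard find utility, for example:

find path/to/mydataset -maxdepth 2 | head -n 20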
In addition to the directories containing the images, you will need to create an additional file named index.txt, which has the following format:
Apple/06e47f3aa0036947.jpg 0
Apple/d5e81b98e4627cd9.jpg 0
...
Carnivore/4e66355715e555eb.jpg 1
Carnivore/9108585613f30153.jpg 1
...
Chicken/3a4080f26e7e9a6c.jpg 2
Chicken/bd79d206d5ba197d.jpg 2
...
Each entry in that file gives the class/directory name and the file name of an image in that directory, followed by a class index starting at 0. In the above example, Apple is class 0, Carnivore is class 1, etc. The order of the entries in this file is not important, as long as the class indices are correct and there is a single line for each image in the respective dataset split (train or eval).
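If your images are already arranged in class directories, a small shell script can generate these files for you. The following is a minimal sketch, not part of the SDK: it assumes the layout shown above, and it assumes class_names.txt simply lists one class name per line in class-index order (compare against the open-images-10-classes copy in the SDK to confirm the exact format).

#!/usr/bin/env bash
# Sketch: generate index.txt and (assumed format) class_names.txt for one split.
# Run once inside train/ and once inside eval/.
cd path/to/mydataset/train || exit 1
: > index.txt        # start with empty files
: > class_names.txt  # assumed: one class name per line, in index order
idx=0
for dir in */; do             # class directories, in alphabetical order
    class="${dir%/}"
    echo "$class" >> class_names.txt
    for img in "$class"/*; do
        echo "$img $idx" >> index.txt
    done
    idx=$((idx + 1))
done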
Ingesting your data using the ImageFolder template
A Dataset YAML file provides parameters that tell the AFs how to ingest and interpret your dataset. You will create a YAML file for your data using the template and these instructions.
Once your data has been provided, the modular nature of LEIP Recipes means that your dataset will be compatible for training with existing and future image classifier recipes. This gives you a simple, reproducible path for trying out various models and backbones and identifying the best model for your application.
A template for the imagefolder format is provided at the following location in the SDK:
/latentai/custom-configs/data/imagefolder-like-template.yaml
To use the template, make a copy of that file and rename it to associate it with your dataset.
cp /latentai/custom-configs/data/imagefolder-like-template.yaml \
/latentai/custom-configs/data/<your classification dataset name>.yaml
The template contains a number of fields, but the following fields will need to be changed to match your data:
- nclasses: Number of classes in your dataset.
- root_path: Absolute path to the root directory of the dataset.
- is_split: If you provided separate training and evaluation directories as in the example above, set this to true.
- train_split_subdir: The directory path to the training data, relative to root_path. Provide when is_split is true.
- val_split_subdir: The directory path to the validation data, relative to root_path. Provide when is_split is true.
- trainval_split_ratio: Ratio of the dataset to be used for training (e.g. 0.75). Used only if is_split is false.
- trainval_split_seed: Random number generator seed for dividing the training and validation data. Used only if is_split is false.
- dataset_name: This name will be used to identify the artifacts generated with this data. A name without spaces or slash characters is recommended.
Leave the remaining items at their default values.
Using your BYOD Dataset with AF
Once you have formatted your data and created the dataset file, you are ready to visualize your data, then train and evaluate your model on it. Let's say you followed the above instructions for a dataset of cats and dogs, and you now have a dataset file: /latentai/custom-configs/data/cats-and-dogs.yaml.
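As a rough sketch, the edited fields in cats-and-dogs.yaml might then look something like the following; the root_path value is illustrative, the exact key layout should be checked against the actual template, and all other fields should be left at their defaults:

nclasses: 2
root_path: /latentai/workspace/datasets/cats-and-dogs   # illustrative path
is_split: true
train_split_subdir: train
val_split_subdir: eval
dataset_name: cats-and-dogs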
Visualize the data
af --config-name=classifier-recipe data=cats-and-dogs command=vizdata
Using the information in the YAML file, the AF will load a few samples of the data and overlay the class labels on them. The resulting labeled images will be stored in /latentai/artifacts/vizdata/cats-and-dogs/*.
Train
Just like in Step One of the classifier recipe tutorial, you can train your model on your dataset by specifying which backbone you want to train. This time, pass data=cats-and-dogs so the AF knows to train with that data:
af --config-name classifier-recipe \
data=cats-and-dogs \
model.module.backbone="timm:gernet_m" \
command=train
Understanding the command:
- --config-name classifier-recipe: selects the classifier-recipe recipe, which contains good defaults for training, evaluating, visualizing, and exporting classifier models.
- data=cats-and-dogs: uses the ./data/cats-and-dogs.yaml file.
- model.module.backbone="timm:gernet_m": selects the gernet_m backbone provided by the timm library.
Note where the checkpoint is stored at the end of the training run; it will be in a path of the form /latentai/artifacts/train/{date}_{time}_BYOD_recipe/{epoch}.ckpt (the directory name may vary, as in the example below).
Find that filepath and store it for ease of use in the next steps:
export CHECKPOINT=<absolute path to .ckpt file>
# Example:
# export CHECKPOINT=/latentai/artifacts/train/2022-08-30_20-22-03_task_leip_classifier/epoch-6_step-1421.ckpt
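Alternatively, you can let the shell pick the newest checkpoint for you; this convenience one-liner is a sketch that assumes checkpoints are written under /latentai/artifacts/train/ as shown above:

export CHECKPOINT=$(ls -t /latentai/artifacts/train/*/*.ckpt | head -n 1)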
Evaluate
af --config-name classifier-recipe \
data=cats-and-dogs \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=evaluate
Visualize Predictions
af --config-name classifier-recipe \
data=cats-and-dogs \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=predict
The predicted images will be in /latentai/artifacts/predictions/cats-and-dogs/*.

[Figure: Left: ground truth sample from the open-images-10-classes validation set. Right: prediction generated by a trained timm:gernet_m model.]
Export the Model to Use With LEIP SDK
af --config-name classifier-recipe \
data=cats-and-dogs \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=export
The traced model will be stored as: /latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt
Advanced Settings
Change input shapes
If you wish to change the input shape, you will need to change task.width and task.height. Append the width and height overrides with your desired values to each of the commands above:
af [...] task.width=384 task.height=384
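For example, a full training command at 384x384, built from the same options used earlier, would look like this:

af --config-name classifier-recipe \
   data=cats-and-dogs \
   model.module.backbone="timm:gernet_m" \
   task.width=384 task.height=384 \
   command=train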
If you change the input shape, the name of the exported file will also change. For example:
leip_classifier_timm-gernet_m_1x384x384x10.pt
Adjust your SDK commands accordingly.
You will also need to adjust the input shape in the pipeline configuration files that you used in Step Two. Look for input_shapes under model: and adjust it to your new input shape:
input_shapes: [ [ 1, 3, 384, 384 ] ]
Using your BYOD Dataset with LEIP Evaluate
When you get to Step Three with the model you have trained on your own dataset, you will need to modify the test_path to instruct leip evaluate where to find the index.txt file for your validation data. For example:
--test_path=/latentai/workspace/datasets/cats-and-dogs/eval/index.txt