Classifier Recipes: BYOD
Bring Your Own Classifier Data
Testing the backbones in the previous step will give you an idea of which classifier backbones will meet your speed requirements on the target hardware. Once you are ready to train the models with your own dataset, ensure your data is in the supported format for training and evaluation.
Formatting Classifier Data
Classifier recipes support a single, straightforward data format based on the PyTorch ImageFolder.
Understanding the ImageFolder Format
The open-images-10 dataset was used in the previous steps of the tutorial. It may be helpful to look at the file structure of that dataset, which was installed in the SDK at /latentai/workspace/datasets/open-images-10-classes
The directory tree structure that Application Framework (AF) expects is as follows:
path/to/mydataset/
|---train
|   |---class_names.txt
|   |---index.txt
|   |---Class1
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   |---Class2
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   ...
|---eval
|   |---class_names.txt
|   |---index.txt
|   |---Class1
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   |---Class2
|   |   |---image1.jpeg
|   |   |---image2.jpeg
|   |   ...
|   ...
The training and evaluation datasets are put in separate sub-directories, and both of these directories have the same format. For each class, place a directory in train and eval with the name of the class. Using open-images-10 as an example, there are directories for Apple, Carnivore, Chicken, etc. Within train/Apple and eval/Apple are the images of apples for training and evaluation, respectively.
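Because this layout follows torchvision's ImageFolder convention, you can sanity-check a prepared dataset directly from Python before training. This is an optional check, assuming torchvision is available in your environment; it is not part of the AF workflow:

from torchvision.datasets import ImageFolder

# Point ImageFolder at one split; classes are discovered from directory names.
# Note that ImageFolder assigns class indices alphabetically, which matches
# the Apple=0, Carnivore=1, Chicken=2 example below.
train_set = ImageFolder("/latentai/workspace/datasets/open-images-10-classes/train")
print(train_set.classes)        # e.g. ['Apple', 'Carnivore', 'Chicken', ...]
print(train_set.class_to_idx)   # class name -> integer index mapping
print(len(train_set))           # number of training images found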
In addition to the directories containing the images, you will need to create a file named index.txt in each of the train and eval directories. The file has the following format:
Apple/06e47f3aa0036947.jpg 0
Apple/d5e81b98e4627cd9.jpg 0
...
Carnivore/4e66355715e555eb.jpg 1
Carnivore/9108585613f30153.jpg 1
...
Chicken/3a4080f26e7e9a6c.jpg 2
Chicken/bd79d206d5ba197d.jpg 2
...
Each entry in that file provides the class/directory name and the file name of an image in that directory. A class index is also provided, starting with 0. In the above example, Apple is class 0, Carnivore is class 1, and Chicken is class 2. The order of the entries in this file is not important as long as the class indices are correct and there is a single line entry for each image in the respective dataset (train or eval).
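If your images are already arranged in class directories, you can generate index.txt (and class_names.txt) with a short script. This is a minimal sketch, assuming alphabetical class ordering as in the example above and that class_names.txt lists one class name per line in index order:

import os

def write_index(split_dir):
    # Class directories, sorted so indices are stable (Apple=0, Carnivore=1, ...)
    classes = sorted(d for d in os.listdir(split_dir)
                     if os.path.isdir(os.path.join(split_dir, d)))
    with open(os.path.join(split_dir, "index.txt"), "w") as f:
        for idx, cls in enumerate(classes):
            for image in sorted(os.listdir(os.path.join(split_dir, cls))):
                f.write(f"{cls}/{image} {idx}\n")
    # class_names.txt is assumed to hold one class name per line, in index order
    with open(os.path.join(split_dir, "class_names.txt"), "w") as f:
        f.write("\n".join(classes) + "\n")

for split in ("train", "eval"):
    write_index(os.path.join("/path/to/mydataset", split))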
Ingesting Data Using the ImageFolder Reader
A dataset YAML file provides parameters that tell the AF how to ingest and interpret your dataset. You will need to create a YAML file for your data using the template and these instructions.
Once your data has been provided, the modular nature of LEIP Recipes means that your dataset will be compatible for training with existing and future image classifier recipes. This gives you a simple, reproducible path for trying out various models and backbones to identify the best one for your application.
A template for the imagefolder format is provided at the following location in the SDK:
/latentai/custom-configs/data/imagefolder-reader-template.yaml
To use the template, make a copy of that file and rename it to associate it with your dataset.
cp /latentai/custom-configs/data/imagefolder-reader-template.yaml \
/latentai/custom-configs/data/penguin-and-sheep.yaml
The template contains a number of fields, but the following fields will need to be changed to match your data:

nclasses: This is the number of classes in your dataset.
root_path: This is the absolute path to the root directory of the dataset.
is_split: This should be set to true if you provided separate training and evaluation directories as shown in the above example.
train_split_subdir: This is the directory path to the training data relative to the root_path. This is provided when is_split is true.
val_split_subdir: This is the directory path to the validation data relative to the root_path. This is provided when is_split is true.
trainval_split_ratio: This is the ratio of the dataset to be used for training (e.g., 0.75). It is used only if is_split is false.
trainval_split_seed: This is a random number generator seed for dividing training and validation data. It is used only if is_split is false (see the sketch after this list).
dataset_name: This name will be used to identify the artifacts generated with this data. Using a name without spaces or slash characters is recommended.
The remaining items should be left at their default values.
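To see what the two split-related fields mean in practice, consider the following seeded ratio split. This is only an illustration of the semantics of trainval_split_ratio and trainval_split_seed, not the AF's actual implementation:

import random

def split_dataset(items, ratio=0.75, seed=42):
    # Shuffle a copy with a fixed seed so the same split is reproduced every run
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]   # (train, validation)

train_items, val_items = split_dataset(range(1000), ratio=0.75, seed=42)
print(len(train_items), len(val_items))     # 750 250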
A Concrete Classifier BYOD Example
Penguin and Sheep Dataset Overview
Download and extract the dataset.
mkdir -p /latentai/workspace/datasets
cd /latentai/workspace/datasets
wget https://s3.us-west-1.amazonaws.com/leip-showcase.latentai.io/recipes/penguin-and-sheep.zip
unzip penguin-and-sheep.zip
# Verify data location matches the directories in this example:
ls /latentai/workspace/datasets/penguin-and-sheep
eval train
# Get back to /latentai
cd /latentai
Let's copy the imagefolder reader template to use with the penguin-and-sheep sample dataset:
cp /latentai/custom-configs/data/imagefolder-reader-template.yaml \
/latentai/custom-configs/data/penguin-and-sheep.yaml
Now edit the /latentai/custom-configs/data/penguin-and-sheep.yaml file to fill in the blanks with the information specific to your dataset. For this particular example, you will need to edit the YAML file to provide the correct:
root_path: /latentai/workspace/datasets/penguin-and-sheep
nclasses: 2
train_split_subdir: train
val_split_subdir: eval
dataset_name: penguin-and-sheep
The rest of the values won't need to be changed.
Visualize the Data
af --config-name=classifier-recipe data=penguin-and-sheep command=vizdata
The AF will load a few samples of the data and write the class label on each using the information in the YAML file. The resulting labeled images will be stored in /latentai/artifacts/vizdata/penguin-and-sheep/*.
Train
You can train your model on your dataset by specifying which backbone you want to train, just as in Step One of the classifier recipe tutorial. This time, pass data=penguin-and-sheep so the AF knows to train with that data:
af --config-name classifier-recipe \
data=penguin-and-sheep \
model.module.backbone="timm:gernet_m" \
command=train
Understanding the command:

--config-name classifier-recipe: selects the classifier-recipe recipe, which contains good defaults for training, evaluating, visualizing, and exporting classifier models.
data=penguin-and-sheep: uses the /latentai/custom-configs/data/penguin-and-sheep.yaml file.
model.module.backbone="timm:gernet_m": selects the gernet_m backbone provided by the timm library.
Note where the checkpoint is stored at the end of the training run. It will be in a path of the form /latentai/artifacts/train/{date}_{time}_leip_classifier_{backbone}/{epoch}.ckpt. Find that file path and store it for ease of use in the next steps:
export CHECKPOINT=<absolute path to .ckpt file>
# Example:
# export CHECKPOINT=/latentai/artifacts/train/2022-08-30_20-22-03_task_leip_classifier/epoch-6_step-1421.ckpt
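If you prefer not to copy the path by hand, a small script can locate the most recent checkpoint. This helper is hypothetical and simply assumes the artifact layout shown above:

import glob, os

# Pick the newest .ckpt file under the training artifacts directory
ckpts = glob.glob("/latentai/artifacts/train/*leip_classifier*/*.ckpt")
latest = max(ckpts, key=os.path.getmtime)   # raises ValueError if none found
print(latest)   # use this value for: export CHECKPOINT=<path>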
Evaluate
Run the following command to evaluate the trained model on your dataset:
af --config-name classifier-recipe \
data=penguin-and-sheep \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=evaluate
Visualize Predictions
Run the following command to visualize the predictions:
af --config-name classifier-recipe \
data=penguin-and-sheep \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=predict
The predicted images will be in /latentai/artifacts/predictions/penguin-and-sheep/*.
Export the Model to Use With LEIP SDK
Run the following command to export the model for use with the LEIP SDK:
af --config-name classifier-recipe \
data=penguin-and-sheep \
model.module.backbone="timm:gernet_m" \
+checkpoint=$CHECKPOINT \
command=export \
+export.include_preprocessor=True
The traced model will be stored as: /latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt
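Since the exported artifact is a TorchScript trace, you can load it back and run a dummy input as a quick sanity check. A minimal sketch, assuming the default 224x224 input and NCHW layout; if you exported with the preprocessor included, the expected input may differ:

import torch

model = torch.jit.load(
    "/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt")
model.eval()

dummy = torch.rand(1, 3, 224, 224)   # batch of one 224x224 RGB image
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)                  # expect (1, nclasses), i.e. (1, 2) here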
Advanced Settings
Change Input Shapes
If you wish to change the input shape, you will need to change task.width and task.height. Append the width and height overrides, with your desired width and height, to each of the commands above:
af [...] task.width=384 task.height=384
If you change the input shape, the name of the exported file will also change. For example:
/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_384-384/traced_model.pt
Adjust your SDK commands accordingly.
You will also need to adjust the input shape in the LEIP Pipeline configuration files that you used in Step Two. Look for input_shapes under model: and adjust it to your new input shape:
input_shapes: [ [ 1, 3, 384, 384 ] ]
Using Your BYOD Dataset with LEIP Evaluate
You will need to create a dataset schema file for your BYOD dataset. For your own classification dataset, create a file at /latentai/workspace/datasets/penguin-and-sheep/dataset_schema.json. In that file, put the following information:
{
"leip": {
"data_dir": "/latentai/workspace/datasets/penguin-and-sheep/eval/images",
"annotations": "/latentai/workspace/datasets/penguin-and-sheep/annotations/index.txt"
}
}
When you get to Step Three, the test_path will have to be modified to instruct leip evaluate where to find the index.txt file for your validation data. For example:
--test_path=/latentai/workspace/datasets/penguin-and-sheep/dataset_schema.json