Classifier Recipes: Bring Your Own Data (BYOD)

Bring Your Own Classifier Data

Testing the backbones in the previous step will give you an idea of which classifier backbones meet your speed requirements on the target hardware. Once you are ready to train those models with your own dataset, make sure your data is in the supported format for training and evaluation.

Formatting Classifier Data

Classifier recipes support a single, straightforward data format based on the PyTorch ImageFolder layout.

Understanding the ImageFolder Format

The open-images-10 dataset was used in the previous steps of this tutorial. It may be helpful to look at the file structure of that dataset, which is installed in the SDK at:
/latentai/workspace/datasets/open-images-10-classes

The directory tree structure that Application Framework (AF) expects is as follows:

CODE
   path/to/mydataset/                
                    |---train
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...
                    |---eval
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...

The training and evaluation datasets are put in separate sub-directories, and both of these directories have the same format. For each class, place a directory in train and eval with the name of the class. Using open-images-10 as an example, there are directories for Apple, Carnivore, Chicken, etc. Within train/Apple and eval/Apple are the images of apples for training and evaluating respectively.

In addition to the directories containing the images, you will need to create a file named index.txt in each split directory. The file has the following format:

CODE
Apple/06e47f3aa0036947.jpg 0
Apple/d5e81b98e4627cd9.jpg 0
...
Carnivore/4e66355715e555eb.jpg 1
Carnivore/9108585613f30153.jpg 1
...
Chicken/3a4080f26e7e9a6c.jpg 2
Chicken/bd79d206d5ba197d.jpg 2
...

Each entry in the file gives the class (directory) name and the file name of an image in that directory, followed by a class index starting at 0. In the above example, Apple is class 0, Carnivore is class 1, and Chicken is class 2. The order of the entries is not important as long as the class indices are correct and there is exactly one line for each image in the respective split (train or eval).
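
If your images are already sorted into one directory per class as shown above, a short script can generate index.txt for you. The following is a minimal sketch rather than part of the SDK: the path is hypothetical, and class indices are assigned in alphabetical order of the class directory names.

CODE
import os

# Hypothetical path; point this at the split you are indexing (train or eval).
split_dir = "/path/to/mydataset/train"

# Each subdirectory of the split is a class; sort for a stable index assignment.
classes = sorted(
    d for d in os.listdir(split_dir)
    if os.path.isdir(os.path.join(split_dir, d))
)

with open(os.path.join(split_dir, "index.txt"), "w") as index:
    for class_idx, class_name in enumerate(classes):
        class_dir = os.path.join(split_dir, class_name)
        for image_name in sorted(os.listdir(class_dir)):
            # One line per image: <class>/<file> <class index>
            index.write(f"{class_name}/{image_name} {class_idx}\n")

Run it once for train and once for eval; alphabetical ordering keeps the class indices consistent across splits as long as both contain the same classes.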

Ingesting Your Data Using the ImageFolder Template

A dataset YAML file provides parameters that tell the AF how to ingest and interpret your dataset. You will need to create a YAML file for your data using the template and the instructions below.

Once your dataset has been described this way, the modular nature of LEIP Recipes means it will be compatible with existing and future image classifier recipes. This gives you a simple, reproducible path for trying out various models and backbones and identifying the best one for your application.

A template for the imagefolder format is provided at the following location in the SDK:

/latentai/custom-configs/data/imagefolder-like-template.yaml

To use the template, make a copy of that file and rename it to associate it with your dataset.

CODE
cp /latentai/custom-configs/data/imagefolder-like-template.yaml \
 /latentai/custom-configs/data/<your classification dataset name>.yaml

The template contains a number of fields, but the following fields will need to be changed to match your data:

  • nclasses: This is the number of classes in your dataset.

  • root_path: This is the absolute path to the root directory of the dataset.

  • is_split: This should be set to true if you provided separate training and evaluation directories as shown in the above example.

  • train_split_subdir: This is the directory path to the training data relative to the root_path. This is provided when is_split is true.

  • val_split_subdir: This is the directory path to the validation data relative to the root_path. This is provided when is_split is true.

  • trainval_split_ratio: This is the ratio of the dataset to be used for training (e.g., 0.75). It is used only if is_split is false.

  • trainval_split_seed: This is a random number generator seed for dividing training and validation data. It is used only if is_split is false.

  • dataset_name: This name will be used to identify the artifacts generated with this data. Using a name without spaces or slash characters is recommended.

The remaining items should be left at their default values.
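
For illustration only, a cats-and-dogs dataset laid out like the tree above might use values along these lines. The field names come from the list above; the values (and the flat layout shown here) are hypothetical, so edit your copy of the template rather than writing a file from scratch.

CODE
nclasses: 2
root_path: /latentai/workspace/datasets/cats-and-dogs
is_split: true
train_split_subdir: train
val_split_subdir: eval
dataset_name: cats-and-dogs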

Using Your BYOD Dataset with AF

Once you have formatted your data and created the dataset file, you are ready to visualize your data and train and evaluate your model on it. For example, if you followed the above instructions to format and create a dataset of cats and dogs, you should have the following dataset file: /latentai/custom-configs/data/cats-and-dogs.yaml.

Visualize the Data

CODE
af --config-name=classifier-recipe data=cats-and-dogs command=vizdata

The AF will load a few samples of the data and write the class labels on them using the information in the YAML file. The resulting labeled images will be stored in /latentai/artifacts/vizdata/cats-and-dogs/*.

Train

You can train your model on your dataset by specifying what backbone you want to train just like in Step One of the classifier recipe tutorial. This time, pass data=cats-and-dogs so the AF knows to train with that data:

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  command=train

Understanding the command:

  • --config-name classifier-recipe: selects the classifier-recipe recipe, which contains good defaults for training, evaluating, visualizing, and exporting classifier models.

  • data=cats-and-dogs: uses the ./data/cats-and-dogs.yaml file you created above.

  • model.module.backbone="timm:gernet_m": selects the gernet_m backbone provided by the timm library.

Note where the checkpoint is stored at the end of the training run; it will be in a path of the form /latentai/artifacts/train/{date}_{time}_BYOD_recipe/{epoch}.ckpt. Find that file path and store it for ease of use in the next steps:

CODE
export CHECKPOINT=<absolute path to .ckpt file>

# Example:
# export CHECKPOINT=/latentai/artifacts/train/2022-08-30_20-22-03_task_leip_classifier/epoch-6_step-1421.ckpt

Evaluate

Run the following command to evaluate the trained model on your dataset:

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=evaluate

Visualize Predictions

Run the following command to visualize the predictions:

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=predict

The predicted images will be in /latentai/artifacts/predictions/cats-and-dogs/*.

Figure: left, a ground truth sample from the open-images-10-classes validation set; right, the prediction generated by a trained timm:gernet_m model.

Export the Model to Use With LEIP SDK

Run the following command to export the model for use with the LEIP SDK:

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=export

The traced model will be stored as: /latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt.
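
If you would like a quick sanity check before handing the exported file to the SDK, the sketch below is one way to load and run it. This is illustrative only: it assumes the traced .pt file is a TorchScript module that accepts an NCHW input of shape [1, 3, 224, 224] (consistent with the batch1_224-224 export path above); adjust the path and shape if your export differs.

CODE
import torch

# Load the traced model exported above (path shown earlier on this page).
model = torch.jit.load(
    "/latentai/artifacts/export/leip_classifier_batch1_224-224/"
    "leip_classifier_timm-gernet_m_1x224x224x10.pt"
)
model.eval()

# Run a dummy batch of one 3x224x224 image through the model.
dummy = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    scores = model(dummy)

print(scores.shape)  # expect one score per class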

Advanced Settings

Change Input Shapes

If you wish to change the input shape, you will need to change task.width and task.height. Append these overrides, set to your desired width and height, to each of the commands above:

CODE
af [...] task.width=384 task.height=384

If you change the input shape, the name of the exported file will also change. For example:

leip_classifier_timm-gernet_m_1x384x384x10.pt

Adjust your SDK commands accordingly.

You will also need to adjust the input shape in the LEIP Pipeline configuration files that you used in Step Two. Look for input_shapes under model: and adjust it to your new input shape:

input_shapes: [ [ 1, 3, 384, 384 ] ]
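
For reference, the relevant portion of such a configuration file might look like the following; other fields under model: are omitted here and will vary by pipeline.

CODE
model:
  input_shapes: [ [ 1, 3, 384, 384 ] ]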

Using Your BYOD Dataset with LEIP Evaluate

When you get to Step Three, you will need to modify test_path to tell leip evaluate where to find the index.txt file for your validation data. For example:

--test_path=/latentai/workspace/datasets/cats-and-dogs/eval/index.txt

 
