
Classifier Recipes: Bring Your Own Data (BYOD)

Bring Your Own Classifier Data

The steps outlined earlier in this tutorial let you evaluate classifier backbones on your target hardware to see which ones meet your speed requirements. Once you are ready to train a model on your own dataset, you will need to make sure your data is in the supported format for training and evaluation by following the steps below.

Formatting Classifier Data

Classifier recipes support a single, straightforward data format based on the PyTorch ImageFolder layout.

Understanding the ImageFolder Format

The open-images-10 dataset was used in the previous steps of this tutorial. It may be helpful to look at the file structure of that dataset, which is installed in the SDK at:
/latentai/workspace/datasets/open-images-10-classes

The directory tree structure that Application Framework (AF) expects is as follows:

CODE
   path/to/mydataset/                
                    |---train
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...
                    |---eval
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...

The training and evaluation datasets are put in separate sub-directories, and both of these directories have the same format. For each class, place a directory in train and eval with the name of the class. Using open-images-10 as an example, there are directories for Apple, Carnivore, Chicken, etc. Within train/Apple and eval/Apple are the images of apples for training and evaluating respectively.

In addition to the directories containing the images, you will need to create a file named index.txt in each split directory, with the following format:

CODE
Apple/06e47f3aa0036947.jpg 0
Apple/d5e81b98e4627cd9.jpg 0
...
Carnivore/4e66355715e555eb.jpg 1
Carnivore/9108585613f30153.jpg 1
...
Chicken/3a4080f26e7e9a6c.jpg 2
Chicken/bd79d206d5ba197d.jpg 2
...

Each entry in that file provides the class/directory name and the file name of an image in that directory, followed by a class index starting at 0. In the above example, Apple is class 0, Carnivore is class 1, and so on. The order of the entries in this file is not important, as long as the class indices are correct and there is exactly one line for each image in the respective dataset (train or eval).
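If your images are already organized into per-class directories, you can generate index.txt with a short shell script. The following is a minimal sketch, assuming a bash shell, class directories that contain only image files and no spaces in their names, and class indices assigned in alphabetical order (as in the open-images-10 example above); adapt it to your own layout as needed.

CODE
#!/usr/bin/env bash
# Minimal sketch: build index.txt for one split directory (train or eval).
# Assumes each class has its own subdirectory containing only image files,
# and assigns class indices in alphabetical order of the directory names.
cd /path/to/mydataset/train      # run again from the eval directory

> index.txt
idx=0
for cls in $(ls -d */ | sed 's#/$##' | sort); do   # class directories, e.g. Apple, Carnivore
  for img in "$cls"/*; do
    echo "$img $idx" >> index.txt                   # e.g. "Apple/06e47f3aa0036947.jpg 0"
  done
  idx=$((idx + 1))
done

Run the same script from the eval directory so that both splits have their own index.txt.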

Ingesting your data using the ImageFolder template

A Dataset YAML file provides parameters that tell the AFs how to ingest and interpret your dataset. You will create a YAML file for your data using the template and these instructions.

Once your data has been ingested, the modular nature of LEIP Recipes means that your dataset will be compatible with all existing and future image classifier recipes. This gives you a simple, reproducible path for trying out various models and backbones and identifying the best model for your application.

A template for the imagefolder format is provided at the following location in the SDK:

/latentai/custom-configs/data/imagefolder-like-template.yaml

To use the template, make a copy of that file and rename it to associate it with your dataset:

CODE
cp /latentai/custom-configs/data/imagefolder-like-template.yaml \
 /latentai/custom-configs/data/<your classification dataset name>.yaml

The template contains a number of fields, but the following fields will need to be changed to match your data:

  • nclasses: Number of classes in your dataset

  • root_path: Absolute path to the root directory of the dataset

  • is_split: If you provided separate training and evaluation directories as in the example above, this should be set to true

  • train_split_subdir: The directory path to the training data, relative to root_path. Provide this when is_split is true.

  • val_split_subdir: The directory path to the validation data, relative to root_path. Provide this when is_split is true.

  • trainval_split_ratio: Ratio of dataset to be used for training (e.g. 0.75). Used only if is_split is false.

  • trainval_split_seed: Random number generator seed for dividing training and validation data. Used only if is_split is false.

  • dataset_name: This name will be used to identify the artifacts generated with this data. Using a name without spaces or slash characters is recommended.

Leave the remaining items at their default values.
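As an illustration, for a two-class cats-and-dogs dataset laid out like the tree above, the edited fields in your copy of the template might look like the sketch below. The values are illustrative assumptions for that example (including the dataset path); keep the rest of the template's structure and keys as they are.

CODE
nclasses: 2                                          # cats and dogs
root_path: /latentai/workspace/datasets/cats-and-dogs
is_split: true                                       # separate train/ and eval/ directories exist
train_split_subdir: train
val_split_subdir: eval
# trainval_split_ratio / trainval_split_seed are only used when is_split is false
dataset_name: cats-and-dogs                          # no spaces or slash characters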

Using your BYOD Dataset with AF


Once you have formatted your data and created the dataset file, you are ready to visualize your data and to train and evaluate your model on it. Let's say you followed the above instructions for a dataset of cats and dogs, and you now have a dataset file: /latentai/custom-configs/data/cats-and-dogs.yaml.

Visualize the data

CODE
af --config-name=classifier-recipe data=cats-and-dogs command=vizdata

Using the information in the YAML file, the AF will load a few samples of the data and write the class label on each. The resulting labeled images will be stored in /latentai/artifacts/vizdata/cats-and-dogs/*.

Train

Just like in Step One of the classifier recipe tutorial, you can train your model on your dataset by specifying what backbone you want to train. This time, pass data=cats-and-dogs so the AF knows to train with that data:

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  command=train

Understanding the command:

  • --config-name classifier-recipe: selects the classifier-recipe, which contains good defaults for training, evaluating, visualizing, and exporting classifier models.

  • data=cats-and-dogs: use the ./data/cats-and-dogs.yaml file

  • model.module.backbone="timm:gernet_m": selects the gernet_m backbone provided by the timm library.

Note where the checkpoint is stored at the end of the training run. The checkpoint will be stored in a path of the form /latentai/artifacts/train/{date}_{time}_BYOD_recipe/{epoch}.ckpt. Find that filepath and store it for ease of use in the next steps:

CODE
export CHECKPOINT=<absolute path to .ckpt file>

# Example:
# export CHECKPOINT=/latentai/artifacts/train/2022-08-30_20-22-03_task_leip_classifier/epoch-6_step-1421.ckpt

Evaluate

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=evaluate

Visualize Predictions

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=predict

The predicted images will be in /latentai/artifacts/predictions/cats-and-dogs/*.

Example output image. Left: ground truth sample from the open-images-10-classes validation set. Right: prediction generated by a trained timm:gernet_m model.

Export the Model to Use With LEIP SDK

CODE
af --config-name classifier-recipe \
  data=cats-and-dogs \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=export

The traced model will be stored as: /latentai/artifacts/export/leip_classifier_batch1_224-224/leip_classifier_timm-gernet_m_1x224x224x10.pt

Advanced Settings

Change input shapes

If you wish to change the input shape, you will need to override task.width and task.height. Append these overrides, with your desired width and height, to each of the commands above:

CODE
af [...] task.width=384 task.height=384

If you change the input shape, the name of the exported file will also change. For example:

leip_classifier_timm-gernet_m_1x384x384x10.pt

Adjust your SDK commands accordingly.

You will also need to adjust the input shape in the pipeline configuration files that you used in Step Two. Look for input_shapes under model: and adjust it to your new input shape:

input_shapes: [ [ 1, 3, 384, 384 ] ]
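For context, the relevant portion of the pipeline configuration might look roughly like the sketch below; other keys under model: are omitted here, so refer to the actual pipeline file from Step Two for its full structure:

CODE
model:
  # ... other model settings from your pipeline configuration ...
  input_shapes: [ [ 1, 3, 384, 384 ] ]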

Using your BYOD Dataset with LEIP Evaluate

When you get to Step Three with the model you have trained on your own dataset, you will need to modify the --test_path argument to point leip evaluate to the index.txt file for your validation data. For example:

--test_path=/latentai/workspace/datasets/cats-and-dogs/eval/index.txt
