
Classifier Recipes: BYOD

Bring Your Own Classifier Data

Testing the backbones in the previous step will give you an idea of which classifier backbones will meet your speed requirements on the target hardware. Once you are ready to train the models with your own dataset, ensure your data is in the supported format for training and evaluation.

Formatting Classifier Data

Classifier recipes support a single, straightforward data format based on the PyTorch ImageFolder layout.

Understanding the ImageFolder Format

The open-images-10 dataset was used in the previous steps of this tutorial. It may be helpful to look at the file structure of that dataset, which is installed in the SDK at:
/latentai/workspace/datasets/open-images-10-classes

The directory tree structure that Application Framework (AF) expects is as follows:

CODE
   path/to/mydataset/                
                    |---train
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...
                    |---eval
                        |---class_names.txt
                        |---index.txt
                        |---Class1
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                        |---Class2
                            |---image1.jpeg
                            |---image2.jpeg
                                ...
                            ...

The training and evaluation datasets are put in separate sub-directories, and both of these directories have the same format. For each class, place a directory in train and eval with the name of the class. Using open-images-10 as an example, there are directories for Apple, Carnivore, Chicken, etc. Within train/Apple and eval/Apple are the images of apples for training and evaluating respectively.
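Before training, it can be worth sanity-checking each split against this layout. The following is a hypothetical helper, not part of the AF, that verifies the required files exist and lists the class directories found:

```python
from pathlib import Path

def check_split(split_dir: str) -> list:
    """Return the class directory names found in one split (train or eval).

    Sketch of a quick sanity check against the layout above: each split
    must contain class_names.txt, index.txt, and one directory per class.
    """
    split = Path(split_dir)
    for required in ("class_names.txt", "index.txt"):
        if not (split / required).is_file():
            raise FileNotFoundError(f"{split / required} is missing")
    # Every remaining directory is treated as one class.
    return sorted(d.name for d in split.iterdir() if d.is_dir())
```

Running it on both train and eval and comparing the two returned lists is a quick way to confirm that every class is present in both splits.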

In addition to the directories containing the images, you will need to create a file named index.txt in each split directory. The file has the following format:

CODE
Apple/06e47f3aa0036947.jpg 0
Apple/d5e81b98e4627cd9.jpg 0
...
Carnivore/4e66355715e555eb.jpg 1
Carnivore/9108585613f30153.jpg 1
...
Chicken/3a4080f26e7e9a6c.jpg 2
Chicken/bd79d206d5ba197d.jpg 2
...

Each entry in that file provides the class directory name and the file name of an image in that directory, followed by a class index starting at 0. In the above example, Apple is class 0, Carnivore is class 1, and Chicken is class 2. The order of the entries in this file is not important, as long as the class indices are correct and there is a single line entry for each image in the respective dataset (train or eval).
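If your images are already arranged in class directories, a short script can generate index.txt for a split. The sketch below is an illustration, not an AF utility: it assigns class indices in sorted directory order (any consistent order works), and it also writes class_names.txt with one class name per line, which is an assumption about that file's format.

```python
from pathlib import Path

def write_index(split_dir: str) -> None:
    """Write index.txt (and class_names.txt) for one split directory.

    Assumes each immediate subdirectory of split_dir is a class folder
    containing image files; class indices follow sorted directory order.
    """
    split = Path(split_dir)
    classes = sorted(d.name for d in split.iterdir() if d.is_dir())
    # Assumption: class_names.txt holds one class name per line.
    (split / "class_names.txt").write_text("\n".join(classes) + "\n")
    lines = []
    for idx, cls in enumerate(classes):
        for img in sorted((split / cls).iterdir()):
            if img.is_file():
                # One "<class>/<image> <index>" entry per image.
                lines.append(f"{cls}/{img.name} {idx}")
    (split / "index.txt").write_text("\n".join(lines) + "\n")
```

Run it once on the train directory and once on eval so both splits get their own index.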

Ingesting Data Using the ImageFolder Reader

A dataset YAML file provides parameters that tell the AFs how to ingest and interpret your dataset. You will need to create a YAML file for your data using the template and these instructions.

Once your data has been prepared, the modular nature of LEIP Recipes means that your dataset will be compatible with existing and future recipes for all other image classifiers. This gives you a simple, reproducible path for trying out various models and backbones to identify the best model for your needs.

A template for the imagefolder format is provided at the following location in the SDK:

/latentai/custom-configs/data/imagefolder-reader-template.yaml

To use the template, make a copy of that file and rename it to associate it with your dataset:

CODE
cp /latentai/custom-configs/data/imagefolder-reader-template.yaml \
 /latentai/custom-configs/data/penguin-and-sheep.yaml

The template contains a number of fields, but the following fields will need to be changed to match your data:

  • nclasses: This is the number of classes in your dataset.

  • root_path: This is the absolute path to the root directory of the dataset.

  • is_split: This should be set to true if you provided separate training and evaluation directories as shown in the above example.

  • train_split_subdir: This is the directory path to the training data relative to the root_path. This is provided when is_split is true.

  • val_split_subdir: This is the directory path to the validation data relative to the root_path. This is provided when is_split is true.

  • trainval_split_ratio: This is the ratio of dataset to be used for training (e.g., 0.75). It is used only if is_split is false.

  • trainval_split_seed: This is a random number generator seed for dividing training and validation data. It is used only if is_split is false.

  • dataset_name: This name will be used to identify the artifacts generated with this data. Using a name without spaces or slash characters is recommended.

The remaining items should be left at their default values.
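When is_split is false, the AF divides a single directory into training and validation data using trainval_split_ratio and trainval_split_seed. The AF's internal implementation is not shown here, but the semantics can be sketched as a seeded random split, where the same seed always reproduces the same partition:

```python
import random

def split_indices(n_items: int, ratio: float, seed: int):
    """Sketch of a seeded train/val split: `ratio` of items go to train.

    Hypothetical helper -- the AF's internal split logic may differ,
    but the idea is the same: a fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(n_items * ratio)
    return indices[:cut], indices[cut:]
```

This is why trainval_split_seed matters: rerunning training with the same seed evaluates against the same held-out images.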

A Concrete Classifier BYOD Example

Penguin and Sheep Dataset Overview

Download and extract the dataset.

CODE
mkdir -p /latentai/workspace/datasets
cd /latentai/workspace/datasets
wget https://s3.us-west-1.amazonaws.com/leip-showcase.latentai.io/recipes/penguin-and-sheep.zip
unzip penguin-and-sheep.zip

# Verify data location matches the directories in this example:
ls /latentai/workspace/datasets/penguin-and-sheep
eval train


# Get back to /latentai
cd /latentai

Let's copy the imagefolder reader template to use with the penguin-and-sheep sample dataset:

CODE
cp /latentai/custom-configs/data/imagefolder-reader-template.yaml \
 /latentai/custom-configs/data/penguin-and-sheep.yaml

Now edit the /latentai/custom-configs/data/penguin-and-sheep.yaml file to fill in the blanks with the information specific to your dataset.

For this particular example, you will need to edit the YAML file to provide the correct:

  • root_path: /latentai/workspace/datasets/penguin-and-sheep

  • nclasses: 2

  • train_split_subdir: train

  • val_split_subdir: eval

  • dataset_name: penguin-and-sheep

The rest of the values won't need to be changed.
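For reference, the edited fields might look like the following (a sketch showing only the fields discussed above; the field names come from the template, and the split subdirectories are assumed to be the top-level train and eval folders listed by ls earlier):

```yaml
# /latentai/custom-configs/data/penguin-and-sheep.yaml (excerpt; keep the
# template's defaults for all other fields)
nclasses: 2
root_path: /latentai/workspace/datasets/penguin-and-sheep
is_split: true
train_split_subdir: train
val_split_subdir: eval
dataset_name: penguin-and-sheep
```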

Visualize the Data

CODE
af --config-name=classifier-recipe data=penguin-and-sheep command=vizdata

The AF will load a few samples of the data and write the class labels on them using the information in the YAML file. The resulting labeled images will be stored in /latentai/artifacts/vizdata/penguin-and-sheep/*.

Train

You can train your model on your dataset by specifying what backbone you want to train just like in Step One of the classifier recipe tutorial. This time, pass data=penguin-and-sheep so the AF knows to train with that data:

CODE
af --config-name classifier-recipe \
  data=penguin-and-sheep \
  model.module.backbone="timm:gernet_m" \
  command=train

Understanding the command:

  • --config-name classifier-recipe: selects the classifier-recipe recipe, which contains good defaults for training, evaluating, visualizing, and exporting classifier models.

  • data=penguin-and-sheep: use the /latentai/custom-configs/data/penguin-and-sheep.yaml file

  • model.module.backbone="timm:gernet_m": selects the gernet_m backbone provided by the timm library.

Note where the checkpoint is stored at the end of the training run. The checkpoint will be stored in a path of the form: /latentai/artifacts/train/{date}_{time}_leip_classifier_{backbone}/{epoch}.ckpt. Find that filepath and store it for ease of use in the next steps:

CODE
export CHECKPOINT=<absolute path to .ckpt file>

# Example:
# export CHECKPOINT=/latentai/artifacts/train/2022-08-30_20-22-03_task_leip_classifier/epoch-6_step-1421.ckpt
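If you prefer not to copy the path by hand, a small helper can locate the newest checkpoint. This is a convenience sketch, not part of the AF; it assumes the /latentai/artifacts/train/{run}/{epoch}.ckpt layout shown above:

```python
from pathlib import Path

def latest_checkpoint(train_root: str) -> str:
    """Return the most recently modified .ckpt file under train_root.

    Sketch only: assumes checkpoints live in per-run subdirectories,
    as in /latentai/artifacts/train/{date}_{time}_.../{epoch}.ckpt.
    """
    ckpts = sorted(Path(train_root).glob("**/*.ckpt"),
                   key=lambda p: p.stat().st_mtime)
    if not ckpts:
        raise FileNotFoundError(f"no .ckpt files under {train_root}")
    return str(ckpts[-1])
```

Note this picks the newest file by modification time, which is usually, but not necessarily, the checkpoint from your latest run.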

Evaluate

Run the following command to evaluate the trained model:

CODE
af --config-name classifier-recipe \
  data=penguin-and-sheep \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=evaluate

Visualize Predictions

Run the following command to visualize the predictions:

CODE
af --config-name classifier-recipe \
  data=penguin-and-sheep \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=predict

The predicted images will be in /latentai/artifacts/predictions/penguin-and-sheep/*.

A sample prediction image: index_000022.jpeg

Export the Model to Use With LEIP SDK

Run the following command to export the model for use with the LEIP SDK:

CODE
af --config-name classifier-recipe \
  data=penguin-and-sheep \
  model.module.backbone="timm:gernet_m" \
  +checkpoint=$CHECKPOINT \
  command=export \
  +export.include_preprocessor=True

The traced model will be stored as: /latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_224-224/traced_model.pt

Advanced Settings

Change Input Shapes

If you wish to change the input shape, you will need to change task.width and task.height. Append these overrides with your desired width and height to each of the commands above:

CODE
af [...] task.width=384 task.height=384

If you change the input shape, the path of the exported artifact will also change, following the naming pattern shown above. For example:

/latentai/artifacts/export/leip_classifier_timm-gernet_m_batch1_384-384/traced_model.pt

Adjust your SDK commands accordingly.

You will also need to adjust the input shape in the LEIP Pipeline configuration files that you used in Step Two. Look for input_shapes under model: and adjust it to your new input shape:

input_shapes: [ [ 1, 3, 384, 384 ] ]

Using Your BYOD Dataset with LEIP Evaluate

You will need to create a dataset schema file for your BYOD dataset. For this classification dataset, create a file at /latentai/workspace/datasets/penguin-and-sheep/dataset_schema.json. In that file, put the following information:

CODE
{
  "leip": {
    "data_dir": "/latentai/workspace/datasets/penguin-and-sheep/eval/images",
    "annotations": "/latentai/workspace/datasets/penguin-and-sheep/annotations/index.txt"
  }
}
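If you script your BYOD setup, the schema file can also be written programmatically. The helper below is a hypothetical convenience, not a LEIP tool; it simply emits the two keys shown above with whatever paths you pass in:

```python
import json
from pathlib import Path

def write_dataset_schema(out_path: str, data_dir: str, annotations: str) -> None:
    """Write a dataset schema file for leip evaluate.

    data_dir should point at the directory of evaluation images, and
    annotations at the matching index.txt for those images.
    """
    schema = {"leip": {"data_dir": data_dir, "annotations": annotations}}
    Path(out_path).write_text(json.dumps(schema, indent=2) + "\n")
```

Substitute the evaluation image directory and index.txt paths for your own dataset when calling it.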


The --test_path argument will have to be modified to point leip evaluate at this schema file when you get to Step Three. For example:

--test_path=/latentai/workspace/datasets/penguin-and-sheep/dataset_schema.json

 
