
Preparing datasets

You can use Latent Agent to explore recipes and, with a labeled dataset, train machine learning models for computer vision tasks.

Once you've organized your data in one of our supported formats (YOLO, COCO, Pascal VOC, or KITTI), you can provide Latent Agent with the dataset in one of two ways:

  • Place the dataset in a directory on your local machine and provide the full path to that directory.
  • Upload your dataset to an Amazon S3 bucket and provide a presigned URL.

Accepted dataset formats

Latent Agent can ingest datasets in the following formats: YOLO, COCO, Pascal VOC, and KITTI.

YOLO dataset format

Directory structure

my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt
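Before uploading, it can help to confirm that every image has a label file and vice versa. The following is a minimal sketch, assuming the layout above (`<split>/images/` and `<split>/labels/`, paired by file stem). The function name `find_unpaired` and the set of image extensions are illustrative assumptions, not part of Latent Agent.

```python
from pathlib import Path

# Image extensions to look for; this set is an assumption, not a documented list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unpaired(split_dir: str) -> tuple[list[str], list[str]]:
    """Return (images without labels, labels without images) for one split.

    Hypothetical helper: assumes <split>/images/* and <split>/labels/*.txt,
    paired by file stem (image1.jpg <-> image1.txt).
    """
    split = Path(split_dir)
    image_stems = {
        p.stem for p in (split / "images").iterdir() if p.suffix.lower() in IMAGE_EXTS
    }
    label_stems = {p.stem for p in (split / "labels").glob("*.txt")}
    # Set differences give the stems that exist on only one side.
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For a fully paired split, `find_unpaired("my_dataset/train")` should return two empty lists. The KITTI layout uses the same tree, so the same check applies there.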

Annotation file format

Each .txt file in labels/ corresponds to the image in images/ with the same base filename (for example, image1.jpg and image1.txt) and contains one line per object.

# train/labels/image1.txt
class_name1 bbox1_x_center bbox1_y_center bbox1_width bbox1_height
class_name1 bbox2_x_center bbox2_y_center bbox2_width bbox2_height

x_center, y_center, width, and height are normalized to the range 0–1: x_center and width are divided by the image width, and y_center and height by the image height.
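If your source annotations use absolute pixel corners, the normalization can be sketched as below. This is an illustrative conversion, not a Latent Agent utility: `to_yolo_line` and `from_yolo_line` are hypothetical names, the class token is written as-is, and it is assumed to contain no whitespace.

```python
def to_yolo_line(label: str, xmin: float, ymin: float, xmax: float, ymax: float,
                 img_w: int, img_h: int) -> str:
    """Convert an absolute-pixel corner box into one YOLO label line."""
    # x values are divided by the image width, y values by the image height.
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{label} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line: str, img_w: int, img_h: int) -> tuple[str, float, float, float, float]:
    """Invert to_yolo_line: return (label, xmin, ymin, xmax, ymax) in pixels."""
    label, xc, yc, w, h = line.split()
    xc_px, w_px = float(xc) * img_w, float(w) * img_w
    yc_px, h_px = float(yc) * img_h, float(h) * img_h
    return label, xc_px - w_px / 2, yc_px - h_px / 2, xc_px + w_px / 2, yc_px + h_px / 2
```

For a hypothetical 640×480 image with a box from (100, 120) to (300, 360), `to_yolo_line("car", 100, 120, 300, 360, 640, 480)` yields `car 0.312500 0.500000 0.312500 0.500000`.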

COCO dataset format

Directory structure

my_dataset
├── annotations
│   ├── instances_train.json
│   └── instances_val.json
├── train
│   ├── image1.jpg
│   └── image2.jpg
└── val
    ├── image3.jpg
    └── image4.jpg

Annotation file format

Each JSON file in annotations/ defines the dataset's images, annotations, and categories.

# annotations/instances_train.json
{
  "images": [
    {
      "id": 1,
      "width": <image1_width>,
      "height": <image1_height>,
      "file_name": "image1.jpg"
    },
    ...
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [<xmin>, <ymin>, <width>, <height>], 
      "area": <bbox_width * bbox_height>,
      "iscrowd": 0
    },
    ...
  ],
  "categories": [
    {
      "id": 1,
      "name": "<class_name1>"
    },
    ...
  ]
}

Bounding boxes use absolute pixel values in the form [xmin, ymin, width, height], where (xmin, ymin) is the top-left corner of the box.
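The sketch below shows one way to build an annotation entry matching the template above from corner coordinates, with `area` computed as bbox width times height. The helper name `coco_annotation`, the category name `car`, and the 640×480 image size are hypothetical values chosen for illustration.

```python
import json


def coco_annotation(ann_id: int, image_id: int, category_id: int,
                    xmin: float, ymin: float, xmax: float, ymax: float) -> dict:
    """Build one annotation entry (hypothetical helper) from pixel corners."""
    width = xmax - xmin
    height = ymax - ymin
    return {
        "id": ann_id,
        "image_id": image_id,        # must match an "id" in "images"
        "category_id": category_id,  # must match an "id" in "categories"
        "bbox": [xmin, ymin, width, height],  # top-left corner plus size, in pixels
        "area": width * height,               # bbox_width * bbox_height, as in the template
        "iscrowd": 0,
    }


dataset = {
    "images": [{"id": 1, "width": 640, "height": 480, "file_name": "image1.jpg"}],
    "annotations": [coco_annotation(1, 1, 1, 100, 120, 300, 360)],
    "categories": [{"id": 1, "name": "car"}],
}
serialized = json.dumps(dataset, indent=2)  # contents of, e.g., instances_train.json
```

Writing `serialized` to annotations/instances_train.json would produce a file with the three top-level keys shown in the template.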

Pascal VOC dataset format

Directory structure

my_dataset
├── Annotations
│   ├── image1.xml
│   └── image2.xml
├── JPEGImages
│   ├── image1.jpg
│   └── image2.jpg
└── ImageSets
    ├── train.txt
    └── val.txt

Annotation file format

Each .xml file in Annotations/ corresponds to an image in JPEGImages/ with the same filename.

# Annotations/image1.xml
<annotation>
    <folder></folder>
    <filename>image1.jpg</filename>
    <size>
        <width>image1_width</width>
        <height>image1_height</height>
        <depth>image1_num_channels</depth>
    </size>
    <object>
        <name>class_name1</name>
        ...
        <bndbox>
            <xmin>bounding_box_1_xmin</xmin>
            <ymin>bounding_box_1_ymin</ymin>
            <xmax>bounding_box_1_xmax</xmax>
            <ymax>bounding_box_1_ymax</ymax>
        </bndbox>
        ...
    </object>
</annotation>

# ImageSets/ is optional. Each file lists the names (without file extension) of the images in that split.
# ImageSets/train.txt
image1
image2

Bounding boxes are absolute pixel coordinates: (xmin, ymin) is the top-left corner and (xmax, ymax) is the bottom-right corner.
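To check what a Pascal VOC file contains, the `<object>` entries can be read with Python's standard `xml.etree.ElementTree`. This is a minimal sketch; `read_voc_boxes` is a hypothetical helper, and it reads only the class name and `<bndbox>` fields shown in the template.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text: str) -> list[tuple]:
    """Return (class_name, xmin, ymin, xmax, ymax) for every <object> in a VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):  # direct children of <annotation>
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name, *coords))
    return boxes
```

For a file on disk, pass its text, for example `read_voc_boxes(Path("my_dataset/Annotations/image1.xml").read_text())` with `pathlib.Path`.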

KITTI dataset format

Directory structure

my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt

Annotation file format

Each .txt file in labels/ corresponds to the image in images/ with the same base filename. Each object is described on its own line by 15 space-separated values. 2D tasks typically use only the first eight fields (class, truncated, occluded, alpha, and the 2D bounding box).

# train/labels/image1.txt
<class> <truncated> <occluded> <alpha> <xmin> <ymin> <xmax> <ymax> <height> <width> <length> <x> <y> <z> <rotation_y>
  • Bounding boxes (xmin, ymin, xmax, ymax) are absolute pixel coordinates.
  • 3D dimensions and positions (height, width, length, x, y, z) are in meters.
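  • truncated is the fraction (0–1) of the object that lies outside the image boundaries; occluded is an integer: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown.
  • alpha (observation angle) and rotation_y (rotation around the camera's Y axis) are in radians, in the range [-π, π].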
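A KITTI label line can be split into its named fields with a small parser like the one below. It is a sketch under the 15-field layout described above; `KittiObject` and `parse_kitti_line` are hypothetical names, and the sample line in the usage note is made up for illustration.

```python
from dataclasses import dataclass

KITTI_FIELDS = 15  # class name + 14 numeric values per object line


@dataclass
class KittiObject:
    cls: str
    truncated: float
    occluded: int
    alpha: float
    bbox: tuple[float, ...]        # xmin, ymin, xmax, ymax in pixels
    dimensions: tuple[float, ...]  # height, width, length in meters
    location: tuple[float, ...]    # x, y, z in meters
    rotation_y: float


def parse_kitti_line(line: str) -> KittiObject:
    """Parse one object line; raise ValueError if the field count is wrong."""
    parts = line.split()
    if len(parts) != KITTI_FIELDS:
        raise ValueError(f"expected {KITTI_FIELDS} fields, got {len(parts)}")
    values = [float(v) for v in parts[1:]]  # the 14 numeric fields
    return KittiObject(
        cls=parts[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )
```

For a 2D-only workflow, the `bbox` field of `parse_kitti_line("Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59")` gives the corner box, which converts directly to the COCO or YOLO conventions described earlier.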