Preparing datasets 
You can use Latent Agent to explore recipes and, with a labeled dataset, train machine learning models for computer vision tasks.
Once you've organized your data in one of our supported formats (YOLO, COCO, Pascal VOC, or KITTI), you can provide Latent Agent with the dataset in one of two ways:
- Place the dataset in a directory on your local machine and provide the full path to that directory.
- Upload your dataset to an Amazon S3 bucket and provide a presigned URL.
 
Object detection datasets 
Latent Agent can ingest object detection datasets in the following formats: YOLO, COCO, Pascal VOC, and KITTI.
YOLO dataset format
Directory structure
my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt
Annotation file format
Each .txt file in labels/ corresponds to an image in images/ with the same filename.
# train/labels/image1.txt
class_name1 bbox1_x_center bbox1_y_center bbox1_width bbox1_height
class_name1 bbox2_x_center bbox2_y_center bbox2_width bbox2_height
x_center, y_center, width, and height are normalized (0–1) relative to image dimensions.
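If your boxes start out as absolute pixel corners, the normalization described above can be sketched as follows. This is a minimal illustration, not part of Latent Agent: to_yolo_box and format_yolo_line are hypothetical helper names, and the six-decimal output precision is an assumption.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute-pixel corners to normalized (x_center, y_center, width, height)."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x_center = (xmin + xmax) / 2.0 / img_w  # x values are divided by image width
    y_center = (ymin + ymax) / 2.0 / img_h  # y values are divided by image height
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def format_yolo_line(class_name, box):
    """Render one label line; six decimals is an assumed precision."""
    return f"{class_name} " + " ".join(f"{value:.6f}" for value in box)


# Example: a 200x200 px box inside a 400x500 px image.
line = format_yolo_line("class_name1", to_yolo_box(100, 50, 300, 250, 400, 500))
```

Every value in the resulting line should fall between 0 and 1; a value outside that range usually means the width and height were swapped or the box was not in pixel units.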
COCO dataset format
Directory structure
my_dataset
├── annotations
│   ├── instances_train.json
│   └── instances_val.json
├── train
│   ├── image1.jpg
│   └── image2.jpg
└── val
    ├── image3.jpg
    └── image4.jpg
Annotation file format
Each JSON file in annotations/ defines the dataset's images, annotations, and categories.
# annotations/instances_train.json
{
  "images": [
    {
      "id": 1,
      "width": <image1_width>,
      "height": <image1_height>,
      "file_name": "image1.jpg"
    },
    ...
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [<xmin>, <ymin>, <width>, <height>], 
      "area": <bbox_width * bbox_height>,
      "iscrowd": 0
    },
    ...
  ],
  "categories": [
    {
      "id": 1,
      "name": "<class_name1>"
    },
    ...
  ]
}
Bounding boxes are in absolute pixel coordinates, given as [xmin, ymin, width, height].
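Before handing a COCO file over, it can help to confirm that the three sections reference each other consistently. The sketch below is one way to do that under the structure shown above; check_coco is a hypothetical helper, and the specific checks are assumptions about what is worth verifying, not a description of what Latent Agent validates.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}

    for ann in coco.get("annotations", []):
        label = f"annotation {ann['id']}"
        if ann["category_id"] not in category_ids:
            problems.append(f"{label}: unknown category_id {ann['category_id']}")

        # bbox is [xmin, ymin, width, height] in absolute pixels
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"{label}: bbox width and height must be positive")

        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"{label}: unknown image_id {ann['image_id']}")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"{label}: bbox extends outside {img['file_name']}")
    return problems
```

To use it, load a file with json.load and pass the resulting dict to check_coco; an empty list means none of these particular checks failed.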
Pascal VOC dataset format
Directory structure
my_dataset
├── Annotations
│   ├── image1.xml
│   └── image2.xml
├── JPEGImages
│   ├── image1.jpg
│   └── image2.jpg
└── ImageSets
    ├── train.txt
    └── val.txt
Annotation file format
Each .xml file in Annotations/ corresponds to an image in JPEGImages/ with the same filename.
# Annotations/image1.xml
<annotation>
    <folder></folder>
    <filename>image1.jpg</filename>
    <size>
        <width>image1_width</width>
        <height>image1_height</height>
        <depth>image1_num_channels</depth>
    </size>
    <object>
        <name>class_name1</name>
        ...
        <bndbox>
            <xmin>bounding_box_1_xmin</xmin>
            <ymin>bounding_box_1_ymin</ymin>
            <xmax>bounding_box_1_xmax</xmax>
            <ymax>bounding_box_1_ymax</ymax>
        </bndbox>
        ...
    </object>
</annotation>
# ImageSets is an optional folder that lists the image names belonging to each split.
# ImageSets/train.txt
image1
image2
Bounding boxes are absolute pixel coordinates.
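To spot-check an annotation file, the boxes can be read back with Python's standard xml.etree.ElementTree module. The sketch below assumes one bndbox per object element, as in the usual Pascal VOC layout; read_voc_boxes is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no box
        # Coordinates are absolute pixels; tolerate values written as "10.0"
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, *coords))
    return boxes
```

For a file on disk, read its contents first, for example with pathlib.Path("Annotations/image1.xml").read_text(), and pass the string to read_voc_boxes.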
KITTI dataset format
Directory structure
my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt
Annotation file format
Each .txt file in labels/ corresponds to an image in images/ with the same filename. Each object is described on its own line with 15 space-separated values; 2D tasks typically use only the first eight fields.
# labels/image1.txt
<class> <truncated> <occluded> <alpha> <xmin> <ymin> <xmax> <ymax> <height> <width> <length> <x> <y> <z> <rotation_y>
- Bounding boxes (xmin, ymin, xmax, ymax) are absolute pixel coordinates.
- 3D dimensions and positions (height, width, length, x, y, z) are in meters.
- truncated is the fraction (0–1) of the object outside the image; occluded is {0=visible, 1=partly, 2=mostly, 3=unknown}.
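As an illustration of the field order, the sketch below parses one label line and keeps only the first eight fields used for 2D tasks. Kitti2D and parse_kitti_line are hypothetical names, and rejecting lines that do not have exactly 15 values is an assumption made here for strictness.

```python
from typing import NamedTuple

KITTI_FIELD_COUNT = 15


class Kitti2D(NamedTuple):
    label: str
    truncated: float  # fraction (0-1) of the object outside the image
    occluded: int     # 0=visible, 1=partly, 2=mostly, 3=unknown
    alpha: float
    xmin: float       # box corners in absolute pixels
    ymin: float
    xmax: float
    ymax: float


def parse_kitti_line(line):
    """Parse one KITTI label line and return its first eight (2D) fields."""
    parts = line.split()
    if len(parts) != KITTI_FIELD_COUNT:
        raise ValueError(
            f"expected {KITTI_FIELD_COUNT} values, got {len(parts)}"
        )
    return Kitti2D(
        parts[0],
        float(parts[1]),
        int(parts[2]),
        float(parts[3]),
        *map(float, parts[4:8]),
    )
```

The remaining seven values (3D dimensions, position, and rotation_y) are ignored by this helper but must still be present for the line to parse.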
Image classification datasets 
Latent Agent can ingest image classification datasets in the following format:
Image classification dataset format
Directory structure
my_dataset
├── train
│   ├── class1
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── class2
│       ├── image3.jpg
│       └── image4.jpg
└── val
    ├── class1
    │   ├── image5.jpg
    │   └── image6.jpg
    └── class2
        ├── image7.jpg
        └── image8.jpg
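In this layout the label for each image appears to come from the name of its parent folder. The sketch below walks one split and pairs every image with its class name; list_classification_samples is a hypothetical helper, and the set of accepted file extensions is an assumption rather than a documented list.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_classification_samples(split_dir):
    """Return (image_path, class_name) pairs for one split such as my_dataset/train."""
    split_dir = Path(split_dir)
    samples = []
    # Each subdirectory of the split is treated as one class
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTS:
                samples.append((image_path, class_dir.name))
    return samples
```

Running it on both train and val and comparing the sets of class names is a quick way to catch a class folder that is missing or misspelled in one split.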