Preparing datasets
You can use Latent Agent to explore recipes and, with a labeled dataset, train machine learning models for computer vision tasks.
Once you've organized your data in one of our supported formats (YOLO, COCO, Pascal VOC, or KITTI), you can provide Latent Agent with the dataset in one of two ways:
- Place the dataset in a directory on your local machine and provide the full path to that directory.
- Upload your dataset to an Amazon S3 bucket and provide a presigned URL.
Accepted dataset formats
Latent Agent can ingest datasets in the following formats: YOLO, COCO, Pascal VOC, and KITTI.
YOLO dataset format
Directory structure
my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt
Annotation file format
Each .txt file in labels/ corresponds to the image in images/ with the same base filename (for example, image1.txt labels image1.jpg).
# train/labels/image1.txt
class_name1 bbox1_x_center bbox1_y_center bbox1_width bbox1_height
class_name1 bbox2_x_center bbox2_y_center bbox2_width bbox2_height
x_center, y_center, width, and height are normalized to the range 0–1 relative to the image dimensions: x values are divided by the image width and y values by the image height.
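The normalization described above can be sketched in Python. This is a minimal illustration, not part of Latent Agent: the function names `pixel_box_to_yolo` and `format_yolo_line`, the sample box, and the 640x480 image size are all hypothetical.

```python
def pixel_box_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert absolute pixel corners to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return x_center, y_center, width, height


def format_yolo_line(class_name, box):
    """Render one annotation line in the layout shown above."""
    return f"{class_name} " + " ".join(f"{value:.6f}" for value in box)


# Hypothetical example: a 200x200 pixel box in a 640x480 image.
box = pixel_box_to_yolo(100, 50, 300, 250, img_width=640, img_height=480)
line = format_yolo_line("class_name1", box)
print(line)
```

Every value in the resulting line should fall between 0 and 1; a value outside that range usually means the box extends past the image or the width and height were swapped.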
COCO dataset format
Directory structure
my_dataset
├── annotations
│   ├── instances_train.json
│   └── instances_val.json
├── train
│   ├── image1.jpg
│   └── image2.jpg
└── val
    ├── image3.jpg
    └── image4.jpg
Annotation file format
Each JSON file in annotations/ defines the dataset's images, annotations, and categories.
# annotations/instances_train.json
{
  "images": [
    {
      "id": 1,
      "width": <image1_width>,
      "height": <image1_height>,
      "file_name": "image1.jpg"
    },
    ...
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [<xmin>, <ymin>, <width>, <height>],
      "area": <bbox_width * bbox_height>,
      "iscrowd": 0
    },
    ...
  ],
  "categories": [
    {
      "id": 1,
      "name": "<class_name1>"
    },
    ...
  ]
}
Bounding boxes are given in absolute pixel coordinates as [xmin, ymin, width, height].
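Because COCO stores a corner plus a size rather than two corners, converting from corner-style boxes is a common source of mistakes. The sketch below shows one way to build an annotation entry with the fields listed above; the helper names `make_coco_annotation` and `coco_bbox_to_corners` and the sample numbers are hypothetical.

```python
import json


def make_coco_annotation(ann_id, image_id, category_id, xmin, ymin, xmax, ymax):
    """Build one entry for the "annotations" list from pixel corner coordinates."""
    width = xmax - xmin
    height = ymax - ymin
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [xmin, ymin, width, height],  # corner plus size, not two corners
        "area": width * height,
        "iscrowd": 0,
    }


def coco_bbox_to_corners(bbox):
    """Convert [xmin, ymin, width, height] back to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, width, height = bbox
    return xmin, ymin, xmin + width, ymin + height


# Hypothetical example box with corners (100, 50) and (300, 250).
annotation = make_coco_annotation(1, 1, 1, 100, 50, 300, 250)
print(json.dumps(annotation, indent=2))
```

Each `image_id` should match an `id` in the images list, and each `category_id` should match an `id` in the categories list.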
Pascal VOC dataset format
Directory structure
my_dataset
├── Annotations
│   ├── image1.xml
│   └── image2.xml
├── JPEGImages
│   ├── image1.jpg
│   └── image2.jpg
└── ImageSets
    ├── train.txt
    └── val.txt
Annotation file format
Each .xml file in Annotations/ corresponds to the image in JPEGImages/ with the same base filename (for example, image1.xml labels image1.jpg).
# Annotations/image1.xml
<annotation>
  <folder></folder>
  <filename>image1.jpg</filename>
  <size>
    <width>image1_width</width>
    <height>image1_height</height>
    <depth>image1_num_channels</depth>
  </size>
  <object>
    <name>class_name1</name>
    ...
    <bndbox>
      <xmin>bounding_box_1_xmin</xmin>
      <ymin>bounding_box_1_ymin</ymin>
      <xmax>bounding_box_1_xmax</xmax>
      <ymax>bounding_box_1_ymax</ymax>
    </bndbox>
    ...
  </object>
</annotation>
# ImageSets is an optional folder. Each file in it lists the image names (without file extension) that belong to one split.
# ImageSets/train.txt
image1
image2
Bounding boxes are absolute pixel coordinates.
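To check that an annotation file matches the structure above, it can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch; the function names `parse_voc_annotation` and `read_split_names`, and all sample values in `SAMPLE_XML`, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample values; the structure mirrors Annotations/image1.xml above.
SAMPLE_XML = """
<annotation>
  <folder></folder>
  <filename>image1.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>class_name1</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>50</ymin>
      <xmax>300</xmax>
      <ymax>250</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return the filename, image size, and object boxes from one annotation."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        entry = {"name": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # int(float(...)) tolerates values written as "100.0"
            entry[key] = int(float(bndbox.findtext(key)))
        objects.append(entry)
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


def read_split_names(split_text):
    """Return the image names listed in an ImageSets split file, skipping blank lines."""
    return [line.strip() for line in split_text.splitlines() if line.strip()]


parsed = parse_voc_annotation(SAMPLE_XML)
print(parsed)
```

Unlike COCO, each box here is stored as two corners, so `xmax` and `ymax` should be greater than `xmin` and `ymin` and no larger than the image width and height.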
KITTI dataset format
Directory structure
my_dataset
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── labels
│       ├── image1.txt
│       └── image2.txt
└── val
    ├── images
    │   ├── image3.jpg
    │   └── image4.jpg
    └── labels
        ├── image3.txt
        └── image4.txt
Annotation file format
Each .txt file in labels/ corresponds to the image in images/ with the same base filename. Each object is described on a separate line with 15 space-separated values. 2D tasks typically use only the first eight fields.
# train/labels/image1.txt
<class> <truncated> <occluded> <alpha> <xmin> <ymin> <xmax> <ymax> <height> <width> <length> <x> <y> <z> <rotation_y>
- Bounding boxes (xmin, ymin, xmax, ymax) are absolute pixel coordinates.
- 3D dimensions and positions (height, width, length, x, y, z) are in meters.
- truncated is the fraction (0–1) of the object outside the image; occluded is one of {0=visible, 1=partly, 2=mostly, 3=unknown}.
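The 15-field line layout can be sanity-checked with a small parser. This is a sketch only: `KITTI_FIELDS`, `parse_kitti_line`, and `bbox_2d` are hypothetical names, the sample values are invented, and the parser assumes every line carries all 15 fields, as stated above.

```python
# Field order taken from the label line layout shown above.
KITTI_FIELDS = (
    "class", "truncated", "occluded", "alpha",
    "xmin", "ymin", "xmax", "ymax",
    "height", "width", "length",
    "x", "y", "z", "rotation_y",
)


def parse_kitti_line(line):
    """Parse one label line into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(KITTI_FIELDS):
        raise ValueError(f"expected {len(KITTI_FIELDS)} fields, got {len(parts)}")
    record = {"class": parts[0]}
    for name, raw in zip(KITTI_FIELDS[1:], parts[1:]):
        # occluded is an integer code; every other numeric field is a float
        record[name] = int(raw) if name == "occluded" else float(raw)
    return record


def bbox_2d(record):
    """Return the 2D box (xmin, ymin, xmax, ymax) in absolute pixels."""
    return record["xmin"], record["ymin"], record["xmax"], record["ymax"]


# Hypothetical label line with invented values.
sample = "class_name1 0.00 0 -1.57 100.0 50.0 300.0 250.0 1.50 1.60 3.90 1.00 1.50 20.00 -1.57"
record = parse_kitti_line(sample)
print(record["class"], bbox_2d(record))
```

For 2D tasks only the class, truncated, occluded, alpha, and box fields are typically read, so `bbox_2d` together with `record["class"]` covers the usual case.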