Bring Your Own Data (BYOD)
In this guide, we’ll walk you through how to integrate your own dataset into a LEIP Design recipe. We will start by reviewing the datasets we offer, followed by a step-by-step demonstration using the Road Sign Detection dataset from Kaggle. These steps can be applied to any dataset you choose to work with.
First, we generate our pantry and create a recipe to work with, as shown in the Getting Started tutorial.
from pathlib import Path
import leip_recipe_designer as rd
# Define the workspace path
workspace = Path('./workspace')
# Build the pantry (do not rebuild if it already exists)
pantry = rd.Pantry.build(workspace / "./my_combined_pantry/", force_rebuild=False)
# Create a recipe from golden recipe ID 44702
recipe = rd.create.from_recipe_id('44702', pantry=pantry, allow_upgrade=True)
Downloading goldenrecipedb with name "xval_det" and variant "Xval0.3" (0)... Download completed: workspace/goldenrecipedbs/xval_det/Xval0.3 This is the Cross-validation volume. Available methods are- get_golden_df describe_table
Before bringing your own dataset, you might want to explore the datasets we provide. These can be a good starting point if you’re looking for a quick test setup.
recipe.options("data_generator")
> Help: Dataset.
> Ingredients that fit:
Index | Parameter | Type | Version | UUID
0 | BYOD from Url - PASCAL format | data_generator.vision.detection.2d | 1.0.0 | da9ec6daa3a287173c17307eb727a033c356677976662fca84c1d0697c5960ef
1 | BYOD from Url - COCO format | data_generator.vision.detection.2d | 1.0.0 | 0b9bca30d0fe77ee7734e56dc7291272c7e69894d3419ac64f70fe32add19f32
2 | BYOD from Url - YOLO format | data_generator.vision.detection.2d | 1.0.0 | 104f9721404c4653f2eddbe04023b4858a72aff3cec2937e05c714ac1d2d91c8
3 | BYOD from Url - KITTI format | data_generator.vision.detection.2d | 1.0.0 | 6305d7f53f4ae0bc7b669cb105172621f25daaed3934a96063a02dd489ebbf1a
4 | BYOD - PASCAL format | data_generator.vision.detection.2d | 1.0.0 | 26c7b7a439669ce524a8d1128c51df0385bb09460eb26f174e61f5e0dd43ba8a
5 | BYOD - COCO format | data_generator.vision.detection.2d | 1.0.0 | f1b19f142448b17b02883bcc1663032ad8240928816f4f7db18ff86ba5651238
6 | BYOD - YOLO format | data_generator.vision.detection.2d | 1.0.0 | cac615b7fb149b5ea1a270c4782eb59202d40e20f77f3caa45e6ce40b221394c
7 | BYOD - KITTI format | data_generator.vision.detection.2d | 1.0.0 | 2eeaaf51d0243838e482dc8f84d8ca3f9c180121c4e281f1758bca5e586b10a5
8 | VAID (Overhead Car Detection) | data_generator.vision.detection.2d | 1.0.0 | 7a04de11614146936318448b40446d2dde40b8ddf2bd695e968778ab35ff36b2
9 | Pothole Detection (data/sets/kaggle/detection/andrewmvd-pothole-detection) | data_generator.vision.detection.2d | 1.0.0 | 7965dc0775eedd4191928e5a9a376ab9617bc72fa0c032459fe2dfef5c90738c
10 | COCO | data_generator.vision.detection.2d | 1.0.0 | 6e5bcaa5ddb35c0dd91cead96615c5859ea6f207cd0c6a121cc49cf9e85852db
11 | Smoke (data/sets/url/detection/smoke-pascal-like) | data_generator.vision.detection.2d | 1.0.0 | 64b984853aefcba0f0bad3315f83bdd034215d6c8f12e833d8242a0ea3fb0d36
12 | Fire and smoke (data/sets/url/detection/fire-and-smoke-coco-like) | data_generator.vision.detection.2d | 1.0.0 | f95a890cc9c2102d0ae2268e2dd771ee6da23608650f0627d2ec459f1b23fc44
13 | PASCAL VOC | data_generator.vision.detection.2d | 1.0.0 | d91692760a3fa454bc5aad0ae247fc320c97ad5d0f57fa3d334bac23298bd953
14 | COCO Car Detection (data/sets/kaggle/detection/coco-car-dataset) | data_generator.vision.detection.2d | 1.0.0 | a7c5c41ec12a9c70f6b8264226968b7498bc9b5d2df8914b0aa5518fd3b661c1
15 | Dials and gauges (data/sets/url/detection/dials-and-gauges-pascal-like) | data_generator.vision.detection.2d | 1.0.0 | f4271f5edda5ead883767c87bd6564b1eccf4b9021ce36633ecf3d156956923e
16 | Dice, Car, Battery Detection (data/sets/kaggle/detection/kitti-dice-car-battery-detection-new) | data_generator.vision.detection.2d | 1.0.0 | 0a0fdb2af87d607288594bc486b767480ac88fab9c04323faa5d74bb15ad9593
17 | Ship Detection (data/sets/kaggle/detection/pascal-ship-dataset) | data_generator.vision.detection.2d | 1.0.0 | 6ab1c630fc6f71b72489354a10ca8fafe65570334a71f0a7734bf56325d0cc58
18 | Zenodo wheat (data/sets/url/detection/zenodo-wheat) | data_generator.vision.detection.2d | 1.0.0 | d9ab9d814df4321828ec0336c3f9ae926362eb977fa07d74871f1b054179d860
19 | Attach FiftyOne Dataset | data_generator.vision.detection.2d | 1.0.0 | 7a12bd14e3ffe82142f62ddae46c9f8bdb0d92c6c0a9fb4c3ae5d50a49fd6908
20 | Car Detection (data/sets/kaggle/detection/sshikamaru-car-object-detection) | data_generator.vision.detection.2d | 1.0.0 | 3ea6d7e169e173e42987169b929a6296175d18f891523463f8695ceabb1d3763
21 | Face Mask Detection (data/sets/kaggle/detection/andrewmvd-face-mask-detection) | data_generator.vision.detection.2d | 1.0.0 | f5737272f8595201f098a40490631b89416f20a3a9c6112ec3421405a698ea2b
22 | Dice Detection (data/sets/kaggle/detection/pascal-dice-dataset) | data_generator.vision.detection.2d | 1.0.0 | ee99df1f35e58cf28bf78049eda3da6a42df44a507a86d23cf66890d05b934ef
23 | Fruits Detection (data/sets/kaggle/detection/mbkinaci-fruit-images-for-object-detection) | data_generator.vision.detection.2d | 1.0.0 | 8ecee62ebce897e400bb6d78e57e8724e83e943ba41b5687b6600e0a191cbe99
24 | Smoke Detection (data/sets/kaggle/detection/pascal-smoke-dataset) | data_generator.vision.detection.2d | 1.0.0 | f0fe4e825584c227287d20e0b3b870aa238e31af4e269cbb28183402d2d6245a
25 | Composite - Mosaic | data_generator.vision.detection.2d | 1.3.0 | f2ac92f31f08c8219590ea89a37d9b9b1bc21160af22a70b524b70c5ef80e140
26 | Composite - Random subset | data_generator.vision.detection.2d | 1.1.0 | 5c1fb1649a698eeaa89b459ed07646065b56b59f7e2c932ee722a9cc0039a9af
27 | Composite - Data joiner | data_generator.vision.detection.2d | 1.1.0 | 413dbe101b5a31107994d52fe45ad2ed7fd55623df05c4cff640227392512ff8
28 | Composite - Matting | data_generator.vision.detection.2d | 1.2.0 | 495590fd21e17c66705e69820f4416bf1c46bf08e01c6c47d3713f054b4f0b58
29 | Composite - Class selector | data_generator.vision.detection.2d | 1.1.0 | d11aa4d1d87054674999de3bafb22370582bb73e179f0474ec8b9d25b201793f
30 | Unlabeled Dataset | data_generator.vision.detection.2d | 1.0.2 | 8f26f69bbdbdd5c8bd8d8c960d9449c5a415507383dc1b27a545f37c97da49f9
> Use recipe.assign_ingredients('data_generator', ingredient_name) to add it to the recipe.
> Or alternatively, use recipe['data_generator'] = ingredient_id.
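The BYOD options in this listing come in PASCAL, COCO, YOLO, and KITTI variants. If you are unsure which one matches your annotations, a quick heuristic check on a single annotation file can help you choose. The helper below, guess_annotation_format, is hypothetical and not part of leip_recipe_designer; it is a minimal sketch that relies only on general conventions of these formats (PASCAL VOC uses one XML file per image, COCO a JSON file with top-level "images" and "annotations" lists, YOLO five values per object line, KITTI fifteen) and may misclassify unusual files.

```python
import json
from pathlib import Path


def guess_annotation_format(annotation_file):
    """Heuristically guess PASCAL, COCO, YOLO, or KITTI from a single annotation file."""
    path = Path(annotation_file)
    suffix = path.suffix.lower()
    if suffix == ".xml":
        # PASCAL VOC: one XML file per image
        return "PASCAL"
    if suffix == ".json":
        data = json.loads(path.read_text())
        # COCO: one JSON file with top-level "images" and "annotations" lists
        if isinstance(data, dict) and "images" in data and "annotations" in data:
            return "COCO"
        return "unknown"
    if suffix == ".txt":
        rows = [line.split() for line in path.read_text().splitlines() if line.strip()]
        if rows and all(len(r) == 5 for r in rows):
            # YOLO: class x_center y_center width height
            return "YOLO"
        if rows and all(len(r) in (15, 16) for r in rows):
            # KITTI: 15 fields per object, 16 when a score column is present
            return "KITTI"
    return "unknown"


# Hypothetical usage: point this at one of your own annotation files.
# print(guess_annotation_format("/path/to/dataset/annotations/example.xml"))
```

YOLO and KITTI labels are both plain .txt files, so the sketch tells them apart by the number of fields per line rather than by extension.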
Assigning Ingredients to Your Recipe:
To assign a data generator to your recipe, use the assign_ingredients method. This approach is recommended when building a recipe from scratch with one of our provided datasets.
recipe.assign_ingredients('data_generator', "COCO Car Detection")
[{'choice_id': 'a7c5c41ec12a9c70f6b8264226968b7498bc9b5d2df8914b0aa5518fd3b661c1', 'choice_name': 'COCO Car Detection (data/sets/kaggle/detection/coco-car-dataset)', 'synonym': 'data_generator', 'parent': 'Basic Adaptor', 'slot': 'slot:module.dataset_generator', 'path': ['slot:data', 'slot:module.dataset_generator']}]
For more details on initializing an empty recipe for your tasks, refer to the Recipe Creators documentation.
Note: The assign_ingredients method is best used when creating a new recipe, because it clears preprocessing steps, such as augmentations, and initializes them from scratch.
If you are modifying one of our pre-validated "golden recipes" and want to keep advanced augmentations, such as mosaicing, that contribute to their performance, use the replace_data_generator helper instead:
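# Reload the golden recipe so its augmentations are intact
recipe = rd.create.from_recipe_id('44702', pantry=pantry, allow_upgrade=True)
# Look up the data generator ingredient by (regex) name
data = rd.helpers.data.get_data_generator_by_name(pantry=pantry, regex_ingredient_name="COCO Car Detection")
# Swap in the new data generator while keeping the recipe's augmentations
rd.helpers.data.replace_data_generator(recipe, data)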
Skipped downloading goldenrecipedb with name "xval_det" and variant "Xval0.3" (0), as it already exists. This is the Cross-validation volume. Available methods are- get_golden_df describe_table
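As its name suggests, regex_ingredient_name takes a regular expression, so a loose pattern can match more than one option name. The sketch below uses only Python's re module and a few names from the options listing to show the effect; matching_names is a hypothetical helper, and the assumption that get_data_generator_by_name uses re.search-style matching (and how it handles several matches) should be checked against the API reference.

```python
import re

# A few option names copied from the recipe.options("data_generator") listing
names = [
    "BYOD - COCO format",
    "COCO",
    "COCO Car Detection (data/sets/kaggle/detection/coco-car-dataset)",
    "Car Detection (data/sets/kaggle/detection/sshikamaru-car-object-detection)",
]


def matching_names(pattern, candidates):
    """Return the candidates in which re.search() finds the pattern."""
    return [name for name in candidates if re.search(pattern, name)]


print(matching_names("COCO", names))             # three names contain "COCO"
print(matching_names("Car Detection", names))    # matches both car datasets
print(matching_names(r"^Car Detection", names))  # anchored: only the sshikamaru dataset
# Names contain parentheses, which are regex groups; escape a full name to match it literally
print(matching_names(re.escape(names[2]), names))
```

Anchoring with ^ or escaping the full name is a simple way to make sure a pattern selects exactly the dataset you intend.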
Example Dataset: Road Sign Detection (Kaggle)
The steps below will help you retrieve and set up the Road Sign Detection dataset. You can download it directly from Kaggle.
Steps to Use Your Own Dataset:
Download your dataset:
- If using the Road Sign Detection dataset from Kaggle, navigate to the dataset page, log in with your Kaggle credentials, and click "Download."
- Unzip the downloaded file and place the dataset in a local directory.
Set the root_path for your dataset:
- After unzipping, set root_path in your code to point to the folder containing your dataset.
- Example for the Road Sign Detection dataset:
root_path = "/path/to/road_sign_detection_dataset/"
Supported Dataset Formats:
- LEIP Design supports formats such as YOLO, COCO, PASCAL, and KITTI.
- You can also integrate datasets from FiftyOne.
If your dataset is in any of these formats, you can ingest it into LEIP Design using the provided data helpers.
Ingest the Dataset into the Recipe:
- Once the dataset is prepared, attach it to the recipe using our data ingestion helpers:
# Fill in the arguments for your dataset (see the Road Sign Detection example below)
data = rd.helpers.data.new_pascal_data_generator()
rd.helpers.data.replace_data_generator(recipe, data)
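Before pointing root_path at a PASCAL-format folder, it can help to confirm that every image has a matching annotation and to count the object classes you will pass as nclasses. The sketch below uses only the standard library; check_pascal_layout and pascal_class_names are hypothetical helpers, not part of leip_recipe_designer, and they assume the usual PASCAL VOC layout: an images folder and an annotations folder (named "images" and "annotations" in the Road Sign Detection example), with each XML file sharing its image's file stem and listing objects as <object><name> elements.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_pascal_layout(root_path, images_dir="images", annotations_dir="annotations"):
    """Return (images without an XML file, XML files without an image), matched by file stem."""
    root = Path(root_path).expanduser()
    for sub in (images_dir, annotations_dir):
        if not (root / sub).is_dir():
            raise FileNotFoundError(f"expected a folder at {root / sub}")
    image_stems = {p.stem for p in (root / images_dir).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES}
    xml_stems = {p.stem for p in (root / annotations_dir).glob("*.xml")}
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)


def pascal_class_names(root_path, annotations_dir="annotations"):
    """Collect the distinct <object><name> labels across all PASCAL VOC XML files."""
    names = set()
    for xml_file in (Path(root_path).expanduser() / annotations_dir).glob("*.xml"):
        for obj in ET.parse(xml_file).getroot().iter("object"):
            name = obj.findtext("name")
            if name:
                names.add(name.strip())
    return sorted(names)


# Hypothetical usage: replace the path with your own root_path.
# root_path = "/path/to/road_sign_detection_dataset/"
# print(check_pascal_layout(root_path))   # ([], []) when every image has an annotation
# classes = pascal_class_names(root_path)
# print(len(classes), classes)            # len(classes) is a candidate value for nclasses
```

Matching by file stem rather than by the <filename> field inside each XML keeps the check fast, but it will report false mismatches if your dataset names its files differently.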
BYOD Example: Road Sign Detection Dataset
For convenience, if you are using the Road Sign Detection dataset, you can download it from our mirror by running the following code:
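# Create a new data generator for a PASCAL VOC-format dataset (ensure root_path is set)
data = rd.helpers.data.new_pascal_data_generator(
    pantry=pantry,
    root_path="${paths.cache_dir}/road-sign-data",  # folder that holds (or receives) the dataset
    images_dir="images",            # image folder inside root_path
    annotations_dir="annotations",  # PASCAL XML folder inside root_path
    nclasses=4,                     # number of object classes in the annotations
    is_split=False,                 # no predefined train/val split...
    trainval_split_ratio=0.80,      # ...so use 80% for training and 20% for validation
    trainval_split_seed=42,         # fixed seed for a reproducible split
    dataset_name="road-sign-data",
    download_url="https://s3.us-west-1.amazonaws.com/leip-showcase.latentai.io/recipes/andrewmvd_road-sign-detection.zip"  # skip if pre-downloaded
)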
rd.helpers.data.replace_data_generator(recipe, data)
recipe["data_generator"]
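Because is_split=False in the Road Sign Detection call, the dataset is presumably divided using trainval_split_ratio=0.80 and trainval_split_seed=42. Conceptually, a seeded split works like the sketch below: order the items deterministically, shuffle them with a fixed seed, and cut at the ratio. This only illustrates the idea; trainval_split is a hypothetical helper, not LEIP's implementation, and LEIP may order, shuffle, or round differently, so the exact files in each subset will not match.

```python
import random


def trainval_split(items, ratio=0.80, seed=42):
    """Split items into (train, val) reproducibly: sort, shuffle with a fixed seed, cut at ratio."""
    ordered = sorted(items)  # sort first: directory listing order is not guaranteed
    random.Random(seed).shuffle(ordered)  # private RNG, so global random state is untouched
    cut = round(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]


files = [f"road{i}.png" for i in range(10)]
train, val = trainval_split(files)
print(len(train), len(val))  # 8 2
```

Sorting before shuffling means the same seed yields the same split regardless of the order in which the filesystem lists the files.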
Additional Resources:
Once your dataset is loaded, you can train the recipe just as you would with any other dataset supported in LEIP Design.