Upload dataset images and labels and handle the split in HUB #120

mehlkelm · 2022-11-23T13:59:32Z

Search before asking

I have searched the HUB issues and found no similar feature requests.

Description

It would be helpful if I could upload datasets without managing a fixed train/valid/test split myself. The HUB could distribute the images randomly according to a fixed or desired ratio.

Use case

As a dataset manager, I don't want to manually distribute JPEG images and annotation files into train / validation / test subfolders.

Additional

No response

github-actions · 2022-11-23T14:00:07Z

👋 Hello @mehlkelm, thank you for raising an issue about Ultralytics HUB 🚀! Please visit https://ultralytics.com/hub to learn more, and see our ⭐️ HUB Guidelines to quickly get started uploading datasets and training YOLOv5 models.

If this is a 🐛 Bug Report, please provide screenshots and steps to recreate your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

glenn-jocher · 2022-11-26T16:38:18Z

@mehlkelm got it. This is a good feature request, I'll add it to our product roadmap.

@kalenmike @AyushExel @sergiossm see the idea above: allow users to upload a single, unsplit dataset and have HUB split it automatically according to preset ratios, probably 90/10 train/val.

@mehlkelm we have an autosplit function in YOLOv5 you can use as a quick fix in the interim:
https://github.com/ultralytics/yolov5/blob/350e8eb69e01bb162ec0b22d1d13a1d1c2752853/utils/dataloaders.py#L961-L968

def autosplit(path=DATASETS_DIR / 'coco128/images', weights=(0.9, 0.1, 0.0), annotated_only=False):
    """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
    Usage: from utils.dataloaders import *; autosplit()
    Arguments
        path:            Path to images directory
        weights:         Train, val, test weights (list, tuple)
        annotated_only:  Only use images with an annotated txt file
    """

mehlkelm · 2022-11-27T10:52:52Z

Awesome, thanks

> @mehlkelm we have an autosplit function in YOLOv5 you can use as a quick fix in the interim: https://github.com/ultralytics/yolov5/blob/350e8eb69e01bb162ec0b22d1d13a1d1c2752853/utils/dataloaders.py#L961-L968

I saw that, but assumed it is not applicable to training done from the HUB?

glenn-jocher · 2022-11-27T13:07:23Z

@mehlkelm autosplit() isn't connected to anything right now, it's mainly there as a function that users can run manually on an existing dataset.
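
For readers who want to see what running such a split manually might look like, here is a minimal standalone sketch. It is not the YOLOv5 `autosplit()` implementation and does not touch HUB. The function name `split_dataset`, the `IMG_SUFFIXES` set, and the `split_*.txt` output file names are hypothetical, chosen only for illustration. It assumes all images live under one directory and assigns each image to train/val/test at random using the given weights.

```python
import random
from pathlib import Path

# Assumption: which file suffixes count as images (not taken from YOLOv5).
IMG_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def split_dataset(images_dir, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign each image to train/val/test and write one list file per split.

    Hypothetical helper: writes split_train.txt, split_val.txt and split_test.txt
    next to the images directory, skipping splits that received no images.
    """
    images_dir = Path(images_dir)
    # Sort so the result is reproducible for a given seed.
    files = sorted(p for p in images_dir.rglob("*") if p.suffix.lower() in IMG_SUFFIXES)

    rng = random.Random(seed)  # local RNG, does not disturb global random state
    names = ("train", "val", "test")
    splits = {name: [] for name in names}
    for f in files:
        # Each image is drawn independently, so the ratios are approximate.
        name = rng.choices(names, weights=weights, k=1)[0]
        splits[name].append(f)

    for name, paths in splits.items():
        if paths:
            out = images_dir.parent / f"split_{name}.txt"
            lines = (f"./{p.relative_to(images_dir.parent).as_posix()}\n" for p in paths)
            out.write_text("".join(lines))
    return splits
```

Calling `split_dataset("path/to/images")` would write the list files beside the `images` folder. Because each image is drawn independently, the resulting ratio only approximates 90/10 on small datasets; shuffling once and slicing by index would give exact counts instead.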

kalenmike · 2023-08-31T21:59:58Z

This feature request is being scheduled for work.

mehlkelm added the enhancement New feature or request label Nov 23, 2022

glenn-jocher added the todo Further action is needed by Ultralytics label Nov 26, 2022

kalenmike self-assigned this Dec 8, 2022

kalenmike changed the title ~~Autosplit dataset~~ Upload dataset images and labels and handle the split in HUB Aug 31, 2023
