Upload dataset images and labels and handle the split in HUB #120
Comments
👋 Hello @mehlkelm, thank you for raising an issue about Ultralytics HUB 🚀! Please visit https://ultralytics.com/hub to learn more, and see our ⭐️ HUB Guidelines to quickly get started uploading datasets and training YOLOv5 models. If this is a 🐛 Bug Report, please provide screenshots and steps to recreate your problem to help us get started working on a fix. If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response. We try to respond to all issues as promptly as possible. Thank you for your patience!
@mehlkelm got it. This is a good feature request, I'll add it to our product roadmap.
@kalenmike @AyushExel @sergiossm see the idea above: allow users to upload a single dataset split and have HUB automatically split it according to pre-set ratios, e.g. 90/10 train/val probably.
@mehlkelm we have an autosplit function in YOLOv5 you can use as a quick fix in the interim:

    def autosplit(path=DATASETS_DIR / 'coco128/images', weights=(0.9, 0.1, 0.0), annotated_only=False):
        """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
        Usage: from utils.dataloaders import *; autosplit()
        Arguments
            path:            Path to images directory
            weights:         Train, val, test weights (list, tuple)
            annotated_only:  Only use images with an annotated txt file
        """
Awesome, thanks
I saw that, but assumed it is not applicable to training done from the HUB?
@mehlkelm autosplit() isn't connected to anything right now, it's mainly there as a function that users can run manually on an existing dataset.
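For anyone who wants to see what a manual split of this kind involves, here is a minimal standalone sketch of a weighted random train/val/test assignment. It is not the YOLOv5 `autosplit()` implementation; the names `split_images`, `write_split_files`, `IMG_EXTS`, the `seed` parameter, the `split_*.txt` file names, and the assumption that each label `.txt` sits next to its image are all made up for illustration.

```python
import random
from pathlib import Path

# Assumed set of image suffixes; adjust to your dataset.
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def split_images(image_dir, weights=(0.9, 0.1, 0.0), annotated_only=False, seed=0):
    """Randomly assign every image under image_dir to train, val or test.

    weights are relative sampling weights, so split sizes only approximate
    the requested ratios. Returns a dict mapping split name to a list of paths.
    """
    image_dir = Path(image_dir)
    files = sorted(p for p in image_dir.rglob("*") if p.suffix.lower() in IMG_EXTS)
    if annotated_only:
        # Assumption: the label file sits next to the image, same stem, .txt suffix.
        files = [p for p in files if p.with_suffix(".txt").exists()]

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    names = ("train", "val", "test")
    splits = {name: [] for name in names}
    for p in files:
        chosen = rng.choices(names, weights=weights, k=1)[0]
        splits[chosen].append(p)
    return splits


def write_split_files(splits, out_dir):
    """Write one text file per split, one image path per line."""
    out_dir = Path(out_dir)
    for name, paths in splits.items():
        text = "".join(f"{p.as_posix()}\n" for p in paths)
        (out_dir / f"split_{name}.txt").write_text(text)
```

Calling `write_split_files(split_images("my_dataset/images"), "my_dataset")` would leave three list files next to the images, which can then be referenced from a dataset config. Because each image is drawn independently, small datasets can end up noticeably off the target ratio; shuffling and slicing by index is an alternative if exact counts matter.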
This feature request is being scheduled for work.
Search before asking
Description
It would be helpful if I could upload datasets without managing a fixed train/valid/test split myself. The HUB could distribute the images randomly according to a default or user-specified ratio.
Use case
As a dataset manager, I don't want to distribute JPEG images and annotation files manually into train / validation / test subfolders.
Additional
No response