-
-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset not shuffled (when not in DDP mode) #5622
Comments
👋 Hello @werner-duvaud, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com. RequirementsPython>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started: $ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt EnvironmentsYOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
StatusIf this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit. |
@werner-duvaud thanks for the detailed analysis. Yes I agree we probably want to shuffle by default (we already have enough command line arguments, no reason to add more here). The complication stems from the infinite dataloader in place and the nested shuffle arguments as you mentioned. The impact on default training is unclear as the follow-on mosaic images are not just shuffled by randomly selected from the dataset. Thank you for the PR! Since this impacts training we should train a model before and after the PR to quantify the effects before we can merge. |
Search before asking
YOLOv5 Component
Training
Bug
Hi,
Thanks for the great implementation.
I think there is a bug, the shuffling of the DataLoader during training (when not in DDP mode) is not working. The dataset is sampled in index order from 0 to dataset length.
See Minimal Reproducible Example, when printing the index in the
__getitem__
method in theLoadImagesAndLabels(Dataset)
class during training, the index goes from 0 to dataset length at each epoch.Because of that, samples are presented to the network in the sorted order of their filename.
Expected behaviour:
Index is sampled randomly.
Nb: Using the mosaic hyperparameter adds some add hoc randomness to the image returned by the dataloader but someone might disable it.
Environment
Minimal Reproducible Example
It will print index during epoch from 0 to dataset length.
Additional
Attempt to explain the bug:
It happens here:
yolov5/utils/datasets.py
Lines 111 to 120 in def7a0f
We are using
torch.utils.data.DataLoader
which according to the documentation has ashuffle
argument that defaults toFalse
. (Not a great PyTorch choice)So to have our dataset shuffled, we want to set
shuffle
toTrue
. But in distributed case, we are passing a custom sampler to thesampler
argument and from the documentation of the sampler argument:If specified, shuffle must not be specified.
.So a solutions, similar to what the pytorch documentation is doing at the end of here:
is to add the following argument to the datalaoder
shuffle=(sampler is None)
.Adding a shuffle argument to
create_dataloader
might be useful if we want to keep the current behaviour for the val task.Nb: It is also confusing that
DistributedSampler
has shuffle defaulting to True but not DataLoader.I have submitted the one line PR:
I know shuffling or not the training dataset can alter the results, I don't know if the yolov5v6 results used the shuffled dataset or not.
Maybe not shuffling the dataset is the desired feature but then DistributedSampler should not be shuffled too and adding a CLI option could be useful.
Are you willing to submit a PR?
The text was updated successfully, but these errors were encountered: