
Why use drop_last=True in test (and val) dataloader? #7

Closed
oguiza opened this issue Jan 10, 2023 · 4 comments


oguiza commented Jan 10, 2023

Hi,
First, thanks for the excellent paper and for sharing this repo. Great work!

I want to ask why you set drop_last=True in the test dataloader. With that setting, performance is not reported for all samples in the test dataset (some samples are dropped, which is not what we want). In addition, changing the batch size changes the number of samples over which performance is reported.
I've tested the difference by setting drop_last=False on the ILI dataset, and the result is worse than the published one, although it is still the best published result I've seen so far.
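
To make the effect concrete, here is a minimal sketch (the dataset size and shapes are made up for illustration, not taken from the actual ILI loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical test set: 170 samples, 7 features (sizes chosen only for illustration)
test_set = TensorDataset(torch.randn(170, 7))

loader = DataLoader(test_set, batch_size=32, shuffle=False, drop_last=True)
evaluated = sum(x.shape[0] for (x,) in loader)
print(len(test_set), evaluated)  # 170 vs 160: the last 10 samples are never scored
```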

I saw the same issue in the Autoformer repo and logged an issue (thuml/Autoformer#104). As a result they've now updated the code.

BTW, this likely also affects other papers that seem to use a code base similar to Autoformer's.


namctin commented Jan 11, 2023

Hi @oguiza, that is a great catch. Thank you very much! Our thought was to use the same dataloader settings they had for a fair comparison. But you are correct that setting drop_last=True means the number of test samples is not consistent across different choices of batch size, although this difference may not cause a significant performance drop. We will change that setting in the code.

For self-supervised learning, we accidentally used the default drop_last=False, which turns out to be the right choice, and the self-supervised results reported in the paper are for this setting.


oguiza commented Jan 11, 2023

Thanks for your quick response @namctin.
I just ran the script on the ILI dataset, and the difference was significant in my opinion. If the self-supervised performance is measured on all test samples as you say, this might indicate that the gap between supervised and self-supervised is larger than what is reflected in your paper. I'm sharing this just in case you want to investigate it further.


namctin commented Jan 12, 2023

Thank you for confirming the result @oguiza! Since ILI is a fairly small dataset, the drop_last setting can affect the result; I would expect the effect to diminish for larger datasets. But in any case, we should set drop_last=False, and we will fix that in the code.
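
For reference, a sketch of what the fixed test loader would look like (variable names and sizes here are illustrative, not the actual repo code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

test_set = TensorDataset(torch.randn(170, 7))  # illustrative sizes only

# drop_last=False (the PyTorch default) keeps the final partial batch,
# so metrics are computed over every test sample regardless of batch size
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, drop_last=False)
assert sum(x.shape[0] for (x,) in test_loader) == len(test_set)
```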


ts-kim commented Mar 31, 2023

I discovered that part of the performance gain of supervised PatchTST comes from using drop_last=True for the test dataset.

With drop_last=False, there was a drop in performance for the ETTh1, Traffic, and Illness datasets, while there was no drop for the Weather dataset.
I have not yet tested on the other datasets.

I think this could be a significant problem, as it would require most of the tables in the paper to be redone, and it could be considered cheating.
