
Why use drop_last=True in test (and val) dataloader? #7

Closed
oguiza opened this issue Jan 10, 2023 · 4 comments


oguiza commented Jan 10, 2023

Hi,
First, thanks for the excellent paper and for sharing this repo. Great work!

I want to ask why you set drop_last=True in the test dataloader. With that setting, performance is not reported for all samples in the test dataset (some samples are dropped, which is not what we want). In addition, changing the batch size changes the number of samples over which performance is reported.
I've tested the difference by setting drop_last=False on the ILI dataset, and the result is worse than the published one, although it is still the best published result I've seen so far.
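
To make the effect concrete, here is a minimal sketch (the dataset size and shapes are made up for illustration, not taken from the actual ILI loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical test set: 170 samples, 7 features (sizes chosen only for illustration)
test_set = TensorDataset(torch.randn(170, 7))

loader = DataLoader(test_set, batch_size=32, shuffle=False, drop_last=True)
evaluated = sum(x.shape[0] for (x,) in loader)
print(len(test_set), evaluated)  # 170 vs 160: the last 10 samples are never scored
```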

I saw the same issue in the Autoformer repo and logged an issue (thuml/Autoformer#104). As a result they've now updated the code.

BTW, this likely also affects other papers that seem to use a code base similar to Autoformer's.


namctin commented Jan 11, 2023

Hi @oguiza, that is a great catch. Thank you very much! Our thought was to use the same dataloader settings they had for a fair comparison. But you are correct that setting drop_last=True means the number of test samples is not consistent across different choices of batch size, although this difference may not cause a significant performance drop. We will change that setting in the code.

For self-supervised learning, we accidentally used the default drop_last=False, which turns out to be the right choice, and the self-supervised results reported in the paper are for this setting.


oguiza commented Jan 11, 2023

Thanks for your quick response @namctin.
I just ran the script on the ILI dataset, and the difference was significant in my opinion. If the self-supervised performance is measured on all test samples as you say, this might indicate that the gap between supervised and self-supervised is larger than what is reflected in your paper. I'm sharing this just in case you want to investigate it further.


namctin commented Jan 12, 2023

Thank you for confirming the result @oguiza! Since ILI is a fairly small dataset, the drop_last setting can affect the result; I would expect the effect to diminish for larger datasets. But in any case, we should set drop_last=False, and we will fix that in the code.
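
For reference, a sketch of what the fixed test loader would look like (variable names and sizes here are illustrative, not the actual repo code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

test_set = TensorDataset(torch.randn(170, 7))  # illustrative sizes only

# drop_last=False (the PyTorch default) keeps the final partial batch,
# so metrics are computed over every test sample regardless of batch size
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, drop_last=False)
assert sum(x.shape[0] for (x,) in test_loader) == len(test_set)
```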


ts-kim commented Mar 31, 2023

I discovered that part of the performance gain of supervised PatchTST comes from using drop_last=True for the test dataset.

With drop_last=False, there was a drop in performance for the ETTh1, Traffic, and Illness datasets, while there was no drop for the Weather dataset.
I have not yet tested on the other datasets.

I think this could be a significant problem, as it would require most of the tables in the paper to be redone, and it could be considered cheating.
