
Comparisons might not be consistent across batch sizes #25

Closed
rajatsen91 opened this issue Mar 14, 2023 · 3 comments
@rajatsen91 commented Mar 14, 2023

The prediction and ground-truth sizes here are not the same across batch sizes. This means the computed metrics are not exactly comparable across models trained and evaluated with different batch sizes.

Therefore, if my understanding is correct, all models might need to be re-evaluated.

I think the culprit is here: https://github.com/cure-lab/LTSF-Linear/blob/main/data_provider/data_factory.py#L20

drop_last should be False during test evaluation.
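A minimal, self-contained sketch (not the repo's code) of why drop_last=True makes the evaluated test set depend on the batch size; the dataset size of 100 is arbitrary:

```python
# Illustration only: drop_last=True changes how many test samples get
# evaluated depending on batch_size (100-sample dataset is arbitrary).
import torch
from torch.utils.data import DataLoader, TensorDataset

test_set = TensorDataset(torch.arange(100, dtype=torch.float32))

for batch_size in (32, 64):
    loader = DataLoader(test_set, batch_size=batch_size,
                        shuffle=False, drop_last=True)
    n_evaluated = sum(x[0].shape[0] for x in loader)
    print(f"batch_size={batch_size}: evaluated {n_evaluated} of {len(test_set)} samples")

# batch_size=32 evaluates 96 samples; batch_size=64 evaluates only 64.
# With drop_last=False, every batch size evaluates all 100 samples,
# so test metrics become comparable across runs.
```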

@ikvision

Indeed, for different batch sizes the test set will be different, since we drop the last batch.
I think this issue is similar to #7

@rajatsen91 (Author)

Yes, indeed. Sorry, I missed the previous issue. I also found that this changes the results somewhat significantly on the ETTh2 dataset as well.

@namctin (Collaborator) commented Mar 15, 2023

Yes, that is a valid question. We noticed it when @oguiza raised the issue. With the self-supervised model, the results are reported with drop_last=False, and as shown in the paper, the performance was not affected much.
