
Wrongly valid on test_loader. Unfair evaluation. #27

Closed

dqgdqg opened this issue Sep 13, 2022 · 4 comments

Comments


dqgdqg commented Sep 13, 2022

It seems that validation is performed on test_loader rather than vali_loader, which is unfair to some extent and may make the results somewhat different.

```python
vali_loss1, vali_loss2 = self.vali(self.test_loader)
```

Moreover, directly using thre_loader to select the threshold causes test-set leakage, since thre_loader is built on test_data rather than valid_data.

```python
for i, (input_data, labels) in enumerate(self.thre_loader):
```

```python
# Excerpt from the dataset's __getitem__: windows are sliced from
# self.test, so thre_loader draws directly from the test data.
else:
    return np.float32(self.test[
        index // self.step * self.win_size:index // self.step * self.win_size + self.win_size]), np.float32(
        self.test_labels[index // self.step * self.win_size:index // self.step * self.win_size + self.win_size])
```
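
If the dataset class does not already provide one, a val-mode branch avoiding this leakage could look like the following minimal sketch; the attribute names self.val and self.val_labels, and the mode check, are assumptions rather than the repository's actual code:

```python
# Hypothetical 'val' branch mirroring the test branch above, reading from a
# held-out validation split (self.val / self.val_labels are assumed names).
elif self.mode == 'val':
    start = index // self.step * self.win_size
    return (np.float32(self.val[start:start + self.win_size]),
            np.float32(self.val_labels[start:start + self.win_size]))
```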

wuhaixu2016 (Collaborator) commented

Thanks for your question!

Yes, you can split the dataset into three subsets: train, val, and test, and then change this dataloader to a valid_loader, as in the sketch below.
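
A minimal sketch of that change, assuming the data factory exposes a 'val' mode; the get_loader_segment call and its arguments are assumptions about the repository's API, not verified code:

```python
# Hypothetical: build a loader over the held-out validation split.
self.vali_loader = get_loader_segment(self.data_path, batch_size=self.batch_size,
                                      win_size=self.win_size, mode='val',
                                      dataset=self.dataset)

# Early stopping then sees validation loss rather than test loss:
vali_loss1, vali_loss2 = self.vali(self.vali_loader)  # was: self.vali(self.test_loader)
```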

In our code, the validation loss only affects early stopping.
(1) All the baselines use the same early-stopping strategy, so the comparison is fair.
(2) None of the experiments actually trigger early stopping; every training run continues to the last epoch. You can therefore view our code as using the same hyper-parameters without drawing any extra information from the validation set. Fair evaluation.

dqgdqg (Author) commented Sep 14, 2022

Thanks for your prompt reply!

In your code, the dataset is already split into three subsets: train, val, and test. However, only train and test are actually used; the valid set is not used at all.

My concern is that validation, testing, and threshold selection are all performed on the test set in your code. It would be better if validation and threshold selection were performed on the valid set, or on any set that does not overlap with the test set.

Please correct me if I misunderstood. Thanks.

wuhaixu2016 (Collaborator) commented

Yes, you are right. It would be better if validation and threshold selection were performed on the validation set. You can re-split the train set that we used into train and valid subsets.

(1) Since our paper focuses on the unsupervised setting, we merged train and valid to enlarge the training set.
(2) You can also select the threshold on the train set (see the sketch below). In practice, the final results are the same regardless of which subset is used for threshold selection.
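
For concreteness, a minimal sketch of train-set threshold selection by percentile; model.anomaly_score and the anomaly_ratio default are placeholders for whatever per-window score the model exposes, not the repository's exact code:

```python
import numpy as np
import torch

def select_threshold(model, train_loader, anomaly_ratio=1.0, device='cpu'):
    """Pick the threshold as a high percentile of anomaly scores computed on
    the train set, so the test set is never touched during selection.
    `model.anomaly_score` is a hypothetical per-window scorer."""
    model.eval()
    scores = []
    with torch.no_grad():
        for input_data, _ in train_loader:
            score = model.anomaly_score(input_data.float().to(device))
            scores.append(score.cpu().numpy().reshape(-1))
    scores = np.concatenate(scores)
    # Flag the top `anomaly_ratio` percent of train scores as anomalous.
    return np.percentile(scores, 100 - anomaly_ratio)
```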

dqgdqg (Author) commented Sep 15, 2022

Thanks for your explanation.
