
Wrongly valid on test_loader. Unfair evaluation. #27

Closed

dqgdqg opened this issue Sep 13, 2022 · 4 comments

Comments


dqgdqg commented Sep 13, 2022

It seems that validation is performed on test_loader rather than vali_loader, which is unfair to some extent and may make the results somewhat different.

```python
vali_loss1, vali_loss2 = self.vali(self.test_loader)
```

Moreover, directly using thre_loader to select the threshold causes test-set leakage, since thre_loader is built on test_data rather than valid_data.

```python
for i, (input_data, labels) in enumerate(self.thre_loader):
```

```python
# Excerpt from the dataset's __getitem__: windows are sliced from
# self.test, so thre_loader draws directly from the test data.
else:
    return np.float32(self.test[
        index // self.step * self.win_size:index // self.step * self.win_size + self.win_size]), np.float32(
        self.test_labels[index // self.step * self.win_size:index // self.step * self.win_size + self.win_size])
```
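
If the dataset class does not already provide one, a val-mode branch avoiding this leakage could look like the following minimal sketch; the attribute names self.val and self.val_labels, and the mode check, are assumptions rather than the repository's actual code:

```python
# Hypothetical 'val' branch mirroring the test branch above, reading from a
# held-out validation split (self.val / self.val_labels are assumed names).
elif self.mode == 'val':
    start = index // self.step * self.win_size
    return (np.float32(self.val[start:start + self.win_size]),
            np.float32(self.val_labels[start:start + self.win_size]))
```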

wuhaixu2016 (Collaborator) commented

Thanks for your question!

Yes, you can split the dataset into three subsets: train, val, and test, and then change this dataloader to a valid_loader, as in the sketch below.
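
A minimal sketch of that change, assuming the data factory exposes a 'val' mode; the get_loader_segment call and its arguments are assumptions about the repository's API, not verified code:

```python
# Hypothetical: build a loader over the held-out validation split.
self.vali_loader = get_loader_segment(self.data_path, batch_size=self.batch_size,
                                      win_size=self.win_size, mode='val',
                                      dataset=self.dataset)

# Early stopping then sees validation loss rather than test loss:
vali_loss1, vali_loss2 = self.vali(self.vali_loader)  # was: self.vali(self.test_loader)
```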

In our code, the validation loss only affects early stopping.
(1) All the baselines use the same early-stopping strategy, so the comparison is fair.
(2) None of the experiments actually trigger early stopping; every training run continues to the last epoch. You can therefore view our code as using the same hyper-parameters without drawing any extra information from the validation set. Fair evaluation.

dqgdqg (Author) commented Sep 14, 2022

Thanks for your prompt reply!

In your code, the dataset is already split into three subsets: train, val, and test. However, only train and test are actually used; the valid set is not used at all.

My concern is that validation, testing, and threshold selection are all performed on the test set in your code. It would be better if validation and threshold selection were performed on the valid set, or on any set that does not overlap with the test set.

Please correct me if I misunderstood. Thanks.

wuhaixu2016 (Collaborator) commented

Yes, you are right. It would be better if validation and threshold selection were performed on the validation set. You can re-split the train set that we used into train and valid subsets.

(1) Since our paper focuses on the unsupervised setting, we merged train and valid to enlarge the training set.
(2) You can also select the threshold on the train set (see the sketch below). In practice, the final results are the same regardless of which subset is used for threshold selection.
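
For concreteness, a minimal sketch of train-set threshold selection by percentile; model.anomaly_score and the anomaly_ratio default are placeholders for whatever per-window score the model exposes, not the repository's exact code:

```python
import numpy as np
import torch

def select_threshold(model, train_loader, anomaly_ratio=1.0, device='cpu'):
    """Pick the threshold as a high percentile of anomaly scores computed on
    the train set, so the test set is never touched during selection.
    `model.anomaly_score` is a hypothetical per-window scorer."""
    model.eval()
    scores = []
    with torch.no_grad():
        for input_data, _ in train_loader:
            score = model.anomaly_score(input_data.float().to(device))
            scores.append(score.cpu().numpy().reshape(-1))
    scores = np.concatenate(scores)
    # Flag the top `anomaly_ratio` percent of train scores as anomalous.
    return np.percentile(scores, 100 - anomaly_ratio)
```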

dqgdqg (Author) commented Sep 15, 2022

Thanks for your explanation.
