
Reproduce model results #17

Closed · JasAva opened this issue Jul 21, 2019 · 15 comments
Labels: bug (Something isn't working)

JasAva commented Jul 21, 2019

Thanks for the inspiring work and code :)

I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some insight on the results (perhaps this is caused by the hyper-parameters), and would it be possible for you to provide the trained models for both stage 1 and stage 2?

The results I reproduced are as follows:

  1. ImageNet-LT

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.204
Averaged F-measure: 0.160
Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006

Stage1(open-setting):
Open-set Accuracy: 0.178
Evaluation_accuracy_micro_top1: 0.199
Averaged F-measure: 0.291
Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.339
Averaged F-measure: 0.322
Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167

Stage2(open-setting):
Open-set Accuracy: 0.245
Evaluation_accuracy_micro_top1: 0.327
Averaged F-measure: 0.455
Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159

  2. Places-LT

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.268
Averaged F-measure: 0.248
Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058

Stage1(open-setting):
Open-set Accuracy: 0.018
Evaluation_accuracy_micro_top1: 0.267
Averaged F-measure: 0.373
Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.349
Averaged F-measure: 0.338
Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263

Stage2(open-setting):
Open-set Accuracy: 0.120
Evaluation_accuracy_micro_top1: 0.342
Averaged F-measure: 0.477
Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254

JasAva commented Jul 21, 2019

Also, can you give some insight into how the evaluation metrics in the code correspond to the ones reported in the paper? I'm also a little confused about why overall accuracy is reported for the closed setting while the F-measure is reported for the open setting.

zhmiao commented Jul 23, 2019

Hello @JasAva, besides the randomness of each training session, I think the version of PyTorch might also cause trouble sometimes. In addition, the learning rate we published may be slightly different from the one we used for the experiments; sometimes the numbers get mixed up. We are very sorry about this. Regarding the F-measure, we follow this paper: https://arxiv.org/abs/1511.06233 , please check it out. Thank you very much.
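For readers following along, here is a minimal sketch of how an open-set F-measure of this kind is often computed: samples whose maximum confidence falls below a threshold are assigned to an extra "unknown" class, and a macro-averaged F1 is then taken over all classes. The threshold value and the use of `sklearn.metrics.f1_score` are illustrative assumptions, not the exact evaluation code from this repository.

```python
import numpy as np
from sklearn.metrics import f1_score

def open_set_f_measure(probs, labels, threshold=0.1, unknown_label=-1):
    """Macro F-measure with rejection: low-confidence predictions
    are mapped to an 'unknown' class before scoring.

    probs:  (N, C) softmax scores over the known classes
    labels: (N,) ground-truth labels; open-set samples carry unknown_label
    """
    preds = probs.argmax(axis=1)
    # Reject samples whose top score falls below the threshold.
    preds[probs.max(axis=1) < threshold] = unknown_label
    return f1_score(labels, preds, average='macro')
```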

JasAva commented Jul 23, 2019

Hi @zhmiao, thanks for answering. I also think this might be caused by the learning rate.
Moreover, could you provide the trained models for stage 1 and stage 2? I'd like to benchmark against the reported results.

@jchhuang

Hi @zhmiao, I also have the same problem: I can't reproduce your results with this version of the code. Could you provide the learning rates you used for the feature network and the classifier network, respectively? Thanks a lot.

zhmiao commented Jul 26, 2019

@JasAva @jchhuang Yes, we will publish our pretrained models, maybe later this weekend or early next week. We will notify you as soon as they are published. Thanks.

@jchhuang

@zhmiao thanks for your quick reply. Could you also provide the detailed hyper-parameters? I think many researchers would also like to repeat the experiments themselves. Thanks.

@jchhuang

@JasAva have you reproduced the results claimed in the paper? Could you share some insights with me?

zhmiao commented Jul 31, 2019

@JasAva @jchhuang We found a bug in the currently published code, somewhere in the MetaEmbeddingClassifier. It was caused by renaming variables to be consistent with the paper during the code release process. We will fix the bug ASAP. Thanks.

zhmiao mentioned this issue Jul 31, 2019
zhmiao added the bug (Something isn't working) label Aug 2, 2019
zhmiao commented Aug 5, 2019

@JasAva @jchhuang We posted reimplemented ImageNet-LT weights trained with the current config; the numbers are very close to what we reported. We are reimplementing Places right now and will keep you updated. Thanks.

JasAva commented Aug 6, 2019

@zhmiao Thanks for updating the models. Just curious: there seem to be no changes in the code itself (you mentioned there is a bug somewhere in the MetaEmbeddingClassifier?). Were the reimplemented models obtained using the current release?

jchhuang commented Aug 6, 2019

@zhmiao Thanks for the update. However, the method for producing the results claimed in your paper would be more appreciated than just posting re-trained model weights, because people may suspect that the performance of the re-trained weights benefits from a larger dataset rather than from the algorithm itself.

zhmiao commented Aug 6, 2019

@JasAva Yes, we have gone through the code, and it seems there was a bug in the evaluation functions rather than in the classifier.

@jchhuang

@zhmiao Hi, is the bug you mentioned in the evaluation functions the replacement of `>` and `<` with `>=` and `<=` in the shot_acc() function?
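For context, a shot_acc-style breakdown splits classes by their training-set frequency and averages per-class accuracy within each group, which is exactly where the strict-versus-inclusive comparison above matters. The sketch below is illustrative only: the thresholds (100 and 20 training images) and the comparison operators are assumptions, not a copy of the repository's evaluation code.

```python
import numpy as np

def shot_acc_sketch(preds, labels, train_class_counts,
                    many_shot_thr=100, low_shot_thr=20):
    """Per-group accuracy, grouping classes by training frequency.

    preds, labels:      (N,) predicted and ground-truth class ids
    train_class_counts: dict mapping class id -> number of training images
    """
    per_class_acc = {}
    for c in np.unique(labels):
        mask = labels == c
        per_class_acc[c] = (preds[mask] == c).mean()

    # Whether the comparisons are strict (> / <) or inclusive (>= / <=)
    # decides which group the boundary classes fall into.
    many = [a for c, a in per_class_acc.items()
            if train_class_counts[c] > many_shot_thr]
    low = [a for c, a in per_class_acc.items()
           if train_class_counts[c] < low_shot_thr]
    median = [a for c, a in per_class_acc.items()
              if low_shot_thr <= train_class_counts[c] <= many_shot_thr]
    return np.mean(many), np.mean(median), np.mean(low)
```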

JasAva closed this as completed Sep 14, 2019
zhmiao commented Dec 19, 2019

@JasAva @jchhuang @drcege Hello! Sorry for the late reply! As described in #50 (comment), we finally debugged the published code, and the current open-set performance is:

============
Phase: test

Evaluation_accuracy_micro_top1: 0.361
Averaged F-measure: 0.501
Many_shot_accuracy_top1: 0.442 Median_shot_accuracy_top1: 0.352 Low_shot_accuracy_top1: 0.175

==========

This is higher than what we reported in the paper. We updated some of the modules with the clone() method and set use_fc in the first stage to False. These changes lead to the proper results. Please give it a try. Thank you very much again.
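For anyone applying the same fix locally, the clone() change is the standard PyTorch pattern of copying a tensor before modifying it in place. The snippet below is a generic sketch of that pattern, not the repository's actual module code, and the config path shown for use_fc is a hypothetical example.

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
feat = x * 2

# clone() gives an explicit copy, so the in-place edit below cannot
# corrupt a tensor that autograd (or a later branch of the forward
# pass) still depends on.
feat_safe = feat.clone()
feat_safe[:, :4] = 0.0  # only the copy is modified

# Hypothetical stage-1 config change corresponding to "use_fc = False":
# config['networks']['classifier']['params']['use_fc'] = False
```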

For Places, the current config won't work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only freeze the weights in the second stage. We will update the corresponding code as soon as possible.
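A minimal sketch of the stage-dependent freezing described above, assuming a generic backbone/classifier split (module shapes and names are illustrative, not the repository's code):

```python
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
classifier = nn.Linear(256, 365)  # 365 Places categories

def freeze(module, frozen=True):
    """Toggle requires_grad on all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = not frozen

# Stage 1 (per the comment above): backbone weights are NOT frozen.
freeze(backbone, frozen=False)

# Stage 2: freeze the backbone and train only the classifier/memory modules.
freeze(backbone, frozen=True)
```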

zhmiao commented Feb 11, 2020

@JasAva @jchhuang @drcege Hello, we have updated configuration files for Places. Currently, the reproduced results are a little better than reported. Please check out the updates. Thanks!
