
Reproduce model results #17

Closed · JasAva opened this issue Jul 21, 2019 · 15 comments
Labels: bug (Something isn't working)

JasAva commented Jul 21, 2019

Thanks for the inspiring work and code :)

I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some insight on the results (perhaps this is caused by the hyper-parameters), and would it be possible for you to provide the trained models for both stage 1 and stage 2?

The results I reproduced are as follows:

  1. ImageNet-LT

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.204
Averaged F-measure: 0.160
Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006

Stage1(open-setting):
Open-set Accuracy: 0.178
Evaluation_accuracy_micro_top1: 0.199
Averaged F-measure: 0.291
Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.339
Averaged F-measure: 0.322
Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167

Stage2(open-setting):
Open-set Accuracy: 0.245
Evaluation_accuracy_micro_top1: 0.327
Averaged F-measure: 0.455
Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159

  2. Places-LT

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.268
Averaged F-measure: 0.248
Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058

Stage1(open-setting):
Open-set Accuracy: 0.018
Evaluation_accuracy_micro_top1: 0.267
Averaged F-measure: 0.373
Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.349
Averaged F-measure: 0.338
Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263

Stage2(open-setting):
Open-set Accuracy: 0.120
Evaluation_accuracy_micro_top1: 0.342
Averaged F-measure: 0.477
Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254

JasAva commented Jul 21, 2019

Also, can you give some insight into how the evaluation metrics in the code correspond to the ones reported in the paper? I'm also a little confused about why overall accuracy is reported for the closed setting while the F-measure is reported for the open setting.

zhmiao commented Jul 23, 2019

Hello @JasAva, besides the randomness of each training session, I think the version of PyTorch might also cause trouble sometimes. In addition, the learning rate we published may be slightly different from the one we used for the experiments; sometimes the numbers get mixed up. We are very sorry about this. Regarding the F-measure, we follow this paper: https://arxiv.org/abs/1511.06233 , please check it out. Thank you very much.
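For readers following along, here is a minimal sketch of how an open-set F-measure of this kind is often computed: samples whose maximum confidence falls below a threshold are assigned to an extra "unknown" class, and a macro-averaged F1 is then taken over all classes. The threshold value and the use of `sklearn.metrics.f1_score` are illustrative assumptions, not the exact evaluation code from this repository.

```python
import numpy as np
from sklearn.metrics import f1_score

def open_set_f_measure(probs, labels, threshold=0.1, unknown_label=-1):
    """Macro F-measure with rejection: low-confidence predictions
    are mapped to an 'unknown' class before scoring.

    probs:  (N, C) softmax scores over the known classes
    labels: (N,) ground-truth labels; open-set samples carry unknown_label
    """
    preds = probs.argmax(axis=1)
    # Reject samples whose top score falls below the threshold.
    preds[probs.max(axis=1) < threshold] = unknown_label
    return f1_score(labels, preds, average='macro')
```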

JasAva commented Jul 23, 2019

Hi @zhmiao, thanks for answering. I also think this might be caused by the learning rate.
Moreover, could you provide the trained models for stage 1 and stage 2? I'd like to benchmark against the reported results.

@jchhuang

Hi @zhmiao, I also have the same problem: I can't reproduce your results with this version of the code. Could you provide the learning rates you used for the feature network and the classifier network, respectively? Thanks a lot.

zhmiao commented Jul 26, 2019

@JasAva @jchhuang Yes, we will publish our pretrained models, maybe later this weekend or early next week. We will notify you as soon as they are published. Thanks.

@jchhuang

@zhmiao thanks for your quick reply. Could you also provide the detailed hyper-parameters? I think many researchers would also like to repeat the experiments themselves. Thanks.

@jchhuang

@JasAva have you reproduced the results claimed in the paper? Could you share some insights with me?

zhmiao commented Jul 31, 2019

@JasAva @jchhuang We found a bug in the currently published code, somewhere in the MetaEmbeddingClassifier. It was caused by renaming variables to be consistent with the paper during the code release process. We will fix the bug ASAP. Thanks.

zhmiao mentioned this issue Jul 31, 2019
zhmiao added the bug (Something isn't working) label Aug 2, 2019
zhmiao commented Aug 5, 2019

@JasAva @jchhuang We posted reimplemented ImageNet-LT weights trained with the current config; the numbers are very close to what we reported. We are reimplementing Places right now and will keep you updated. Thanks.

JasAva commented Aug 6, 2019

@zhmiao Thanks for updating the models. Just curious: there seem to be no changes in the code itself (you mentioned there is a bug somewhere in the MetaEmbeddingClassifier?). Were the reimplemented models obtained using the current release?

jchhuang commented Aug 6, 2019

@zhmiao Thanks for the update. However, the method for producing the results claimed in your paper would be more appreciated than just posting re-trained model weights, because people may suspect that the performance of the re-trained weights benefits from a larger dataset rather than from the algorithm itself.

zhmiao commented Aug 6, 2019

@JasAva Yes, we have gone through the code, and it seems there was a bug in the evaluation functions rather than in the classifier.

@jchhuang

@zhmiao Hi, is the bug you mentioned in the evaluation functions the replacement of `>` and `<` with `>=` and `<=` in the shot_acc() function?
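For context, a shot_acc-style breakdown splits classes by their training-set frequency and averages per-class accuracy within each group, which is exactly where the strict-versus-inclusive comparison above matters. The sketch below is illustrative only: the thresholds (100 and 20 training images) and the comparison operators are assumptions, not a copy of the repository's evaluation code.

```python
import numpy as np

def shot_acc_sketch(preds, labels, train_class_counts,
                    many_shot_thr=100, low_shot_thr=20):
    """Per-group accuracy, grouping classes by training frequency.

    preds, labels:      (N,) predicted and ground-truth class ids
    train_class_counts: dict mapping class id -> number of training images
    """
    per_class_acc = {}
    for c in np.unique(labels):
        mask = labels == c
        per_class_acc[c] = (preds[mask] == c).mean()

    # Whether the comparisons are strict (> / <) or inclusive (>= / <=)
    # decides which group the boundary classes fall into.
    many = [a for c, a in per_class_acc.items()
            if train_class_counts[c] > many_shot_thr]
    low = [a for c, a in per_class_acc.items()
           if train_class_counts[c] < low_shot_thr]
    median = [a for c, a in per_class_acc.items()
              if low_shot_thr <= train_class_counts[c] <= many_shot_thr]
    return np.mean(many), np.mean(median), np.mean(low)
```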

JasAva closed this as completed Sep 14, 2019
zhmiao commented Dec 19, 2019

@JasAva @jchhuang @drcege Hello! Sorry for the late reply! As described in #50 (comment), we finally debugged the published code, and the current open-set performance is:

============
Phase: test

Evaluation_accuracy_micro_top1: 0.361
Averaged F-measure: 0.501
Many_shot_accuracy_top1: 0.442 Median_shot_accuracy_top1: 0.352 Low_shot_accuracy_top1: 0.175

==========

This is higher than what we reported in the paper. We updated some of the modules with the clone() method and set use_fc in the first stage to False. These changes lead to the proper results. Please give it a try. Thank you very much again.
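For anyone applying the same fix locally, the clone() change is the standard PyTorch pattern of copying a tensor before modifying it in place. The snippet below is a generic sketch of that pattern, not the repository's actual module code, and the config path shown for use_fc is a hypothetical example.

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
feat = x * 2

# clone() gives an explicit copy, so the in-place edit below cannot
# corrupt a tensor that autograd (or a later branch of the forward
# pass) still depends on.
feat_safe = feat.clone()
feat_safe[:, :4] = 0.0  # only the copy is modified

# Hypothetical stage-1 config change corresponding to "use_fc = False":
# config['networks']['classifier']['params']['use_fc'] = False
```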

For Places, the current config won't work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only freeze the weights in the second stage. We will update the corresponding code as soon as possible.
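A minimal sketch of the stage-dependent freezing described above, assuming a generic backbone/classifier split (module shapes and names are illustrative, not the repository's code):

```python
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
classifier = nn.Linear(256, 365)  # 365 Places categories

def freeze(module, frozen=True):
    """Toggle requires_grad on all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = not frozen

# Stage 1 (per the comment above): backbone weights are NOT frozen.
freeze(backbone, frozen=False)

# Stage 2: freeze the backbone and train only the classifier/memory modules.
freeze(backbone, frozen=True)
```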

zhmiao commented Feb 11, 2020

@JasAva @jchhuang @drcege Hello, we have updated configuration files for Places. Currently, the reproduced results are a little better than reported. Please check out the updates. Thanks!
