Reproduce model results #17
Comments
Also, can you give some insight into how the evaluation metrics in the code correspond to the ones reported in the paper? I am also a little confused about why overall accuracy is reported for the closed setting while F-measure is reported for the open setting.
Hello @JasAva, besides the randomness of each training session, I think the version of PyTorch might also be causing trouble sometimes. In addition, the learning rate we published may be slightly different from the one we actually used for the experiments; sometimes the numbers can get mixed up. We are very sorry about this. As for the F-measure, we follow this paper: https://arxiv.org/abs/1511.06233 , please check it out. Thank you very much.
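For reference, the open-set F-measure in that paper scores only the known classes, with open-set samples that get predicted as some known class counting as false positives for that class. A minimal pure-Python sketch of this idea (the function name and the `unknown=-1` convention are my own, not taken from the repo):

```python
def open_set_f_measure(preds, labels, unknown=-1):
    """Macro-averaged F-measure over the known classes.

    Open-set samples (label == unknown) that are predicted as a known
    class count as false positives for that class, so over-confident
    predictions on unknowns drag precision (and hence F) down.
    """
    known = sorted({l for l in labels if l != unknown})
    f_scores = []
    for c in known:
        tp = sum(1 for p, l in zip(preds, labels) if p == c and l == c)
        fp = sum(1 for p, l in zip(preds, labels) if p == c and l != c)
        fn = sum(1 for p, l in zip(preds, labels) if p != c and l == c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f_scores) / len(f_scores)
```

With `preds=[0, 0, 1, 1]` and `labels=[0, -1, 1, 1]`, the unknown sample predicted as class 0 lowers class 0's precision to 0.5, so the macro F drops below 1 even though every known sample is classified correctly.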
Hi @zhmiao, thanks for answering. I also think this might be caused by the learning rate.
Hi @zhmiao, I have the same problem: I cannot reproduce your results with this version of the code. Could you provide the learning rates you used for the feature network and the classifier network, respectively? Thanks a lot.
@zhmiao Thanks for your quick reply. Could you also provide the detailed hyper-parameters? I think many researchers would like to repeat the experiments themselves. Thanks.
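For anyone experimenting while waiting for the exact values: PyTorch lets you give the feature network and the classifier different learning rates through optimizer parameter groups, which is the mechanism being asked about here. A hedged sketch (the `nn.Linear` stand-ins and the 0.01/0.1 values are placeholders, not the authors' settings):

```python
import torch
import torch.nn as nn

# Stand-in modules for the feature extractor and the classifier.
feat_model = nn.Linear(4, 8)
classifier = nn.Linear(8, 3)

# One optimizer, two parameter groups with independent learning rates.
optimizer = torch.optim.SGD(
    [
        {'params': feat_model.parameters(), 'lr': 0.01},  # placeholder lr
        {'params': classifier.parameters(), 'lr': 0.1},   # placeholder lr
    ],
    momentum=0.9,
)

print([g['lr'] for g in optimizer.param_groups])
```

Settings not given in a group (here, `momentum`) fall back to the optimizer-level defaults, so only the learning rates differ between the two networks.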
@JasAva Have you managed to reproduce the results claimed in the paper? Could you share some insights with me?
@zhmiao Thanks for updating the models. Just curious: there seem to be no changes in the code itself (you mentioned there is a bug somewhere in the MetaEmbeddingClassifier?). Were the reimplemented models obtained with the current release?
@zhmiao Thanks for updating. However, a way to reproduce the results claimed in your paper would be more helpful than just posting re-pretrained model weights, because people may suspect that the performance of the re-pretrained weights comes from a larger dataset rather than from the algorithm itself.
@JasAva Yes, we have gone through the code, and it seems the bug was in the evaluation functions rather than in the classifier.
@zhmiao Hi, is the bug you mentioned in the evaluation functions the comparison operators in `shot_acc()` (i.e., `>`/`<` versus `>=`/`<=`)?
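For context, the many/median/low-shot split in this line of work is decided by per-class training counts (more than 100 training images = many-shot, fewer than 20 = low-shot). A pure-Python sketch of such a `shot_acc`-style function, just to illustrate how the boundary operators move classes between splits; this is my own reconstruction, not the repo's code:

```python
def shot_acc(preds, labels, train_counts, many_thr=100, low_thr=20):
    """Mean per-class test accuracy within many/median/low-shot splits.

    Boundary handling matters: whether a class with exactly `many_thr`
    training images is "many" or "median" depends on using > vs >= --
    exactly the kind of off-by-one that can shift the reported numbers.
    """
    # Per-class accuracy on the test set.
    per_class = {}
    for c in train_counts:
        idx = [i for i, l in enumerate(labels) if l == c]
        if idx:
            per_class[c] = sum(preds[i] == c for i in idx) / len(idx)
    many = [a for c, a in per_class.items() if train_counts[c] > many_thr]
    low = [a for c, a in per_class.items() if train_counts[c] < low_thr]
    median = [a for c, a in per_class.items()
              if low_thr <= train_counts[c] <= many_thr]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(many), mean(median), mean(low)
```

For example, with training counts `{0: 200, 1: 50, 2: 5}`, class 0 lands in the many-shot split, class 1 in median-shot, and class 2 in low-shot, and each split reports the mean of its classes' accuracies.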
@JasAva @jchhuang @drcege Hello! Sorry for the late reply! As described in #50 (comment), we finally debugged the published code, and the current open-set performance is: ============ Evaluation_accuracy_micro_top1: 0.361 ========== This is higher than what we reported in the paper. We updated some of the modules with the clone() method and set use_fc to False in the first stage. These changes lead to the proper results; please have a try, and thank you very much again. For Places, the current config will not work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only froze them in the second stage. We will update the corresponding code as soon as possible.
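The stage-wise freezing described above can be sketched in PyTorch as follows; this is a minimal illustration under the assumption that `feat_model` is the backbone (the helper name and the `nn.Linear` stand-in are mine, not the repo's actual training code):

```python
import torch.nn as nn

def configure_stage(feat_model, stage):
    """Stage 1: backbone weights stay trainable (not frozen).
    Stage 2: freeze the backbone so only the classifier trains."""
    for p in feat_model.parameters():
        p.requires_grad = (stage == 1)

backbone = nn.Linear(4, 8)  # stand-in for the feature extractor
configure_stage(backbone, stage=2)
print(all(not p.requires_grad for p in backbone.parameters()))
```

When the backbone is frozen this way, only the still-trainable parameters should be handed to the second-stage optimizer, e.g. via `filter(lambda p: p.requires_grad, model.parameters())`.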
Thanks for the inspiring work and code :)
I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets) using the default settings without any alterations. Can you shed some light on the results (perhaps this is caused by the hyper-parameters), and would it be possible for you to provide the trained models for both stage 1 and stage 2?
The results I have reproduced are as follows:
First dataset:

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.204
Averaged F-measure: 0.160
Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006

Stage1(open-setting):
Open-set Accuracy: 0.178
Evaluation_accuracy_micro_top1: 0.199
Averaged F-measure: 0.291
Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.339
Averaged F-measure: 0.322
Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167

Stage2(open-setting):
Open-set Accuracy: 0.245
Evaluation_accuracy_micro_top1: 0.327
Averaged F-measure: 0.455
Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159

Second dataset:

Stage1(close-setting):
Evaluation_accuracy_micro_top1: 0.268
Averaged F-measure: 0.248
Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058

Stage1(open-setting):
Open-set Accuracy: 0.018
Evaluation_accuracy_micro_top1: 0.267
Averaged F-measure: 0.373
Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057

Stage2(close-setting):
Evaluation_accuracy_micro_top1: 0.349
Averaged F-measure: 0.338
Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263

Stage2(open-setting):
Open-set Accuracy: 0.120
Evaluation_accuracy_micro_top1: 0.342
Averaged F-measure: 0.477
Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254