
comparisons #28

Closed
matttrd opened this issue Aug 6, 2019 · 4 comments
matttrd commented Aug 6, 2019

Hello, reading your code carefully (for ImageNet-LT), it seems that the plain model was trained for 30 epochs while your own model was trained for 90 epochs (30 in stage 1 + 60 in stage 2). Could you please confirm this? Moreover, were all the comparisons (focal loss, etc.) performed with 30 or 90 epochs?

Thanks!

zhmiao (Owner) commented Aug 6, 2019

@matttrd Thanks for asking. The plain model was only trained for 30 epochs, and the rest of the methods were trained for 90 epochs. As discussed here: #4 (comment), more epochs would not help the performance of the plain model since it had already converged around 30 epochs.

matttrd (Author) commented Aug 7, 2019

@zhmiao thanks for answering. Probably I'm missing something, but I trained (using your code) the plain model (stage 1) for 90 epochs with the usual learning-rate drop every 30 epochs, getting 32.7% overall top-1 test accuracy and:

  • Few-shot: 4%
  • Median-shot: 23.9%
  • Many-shot: 53.6%

I understand that the loss converges around 30 epochs, but this is true only if you let the LR drop every 10 epochs, as you did. I think this would be the fair comparison. Am I wrong?
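The disagreement above comes down to the step-decay schedule. As a minimal sketch of the two schedules being contrasted (assuming a base LR of 0.1 and a decay factor of 0.1, common ImageNet defaults; the repo's actual config values may differ), using the same semantics as PyTorch's `StepLR`:

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Step-decay schedule: multiply the LR by `gamma` every `step_size`
    epochs (same rule as torch.optim.lr_scheduler.StepLR)."""
    return base_lr * gamma ** (epoch // step_size)

# 90-epoch run with drops every 30 epochs (the schedule reported above):
print([step_lr(e, step_size=30) for e in (0, 29, 30, 60, 89)])

# 30-epoch run with drops every 10 epochs (the repo's plain-model schedule):
print([step_lr(e, step_size=10) for e in (0, 9, 10, 20, 29)])
```

Both runs end at the same final LR (two drops from the base), which is why the loss can look "converged" at epoch 30 under the faster schedule while the slower 90-epoch schedule is still in its first LR stage at that point.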

liuziwei7 (Collaborator) commented
Thanks for reporting these results! Actually, we have obtained similar observations in our follow-up project.

A model trained on a long-tailed dataset shows some interesting behavior changes under different initializations and epoch budgets:

  1. A large learning rate tends to bias the model toward many-shot classes;
  2. Intermediate and late epochs are critical for the performance of few-shot classes.

We speculate that the root of these phenomena lies in the learning dynamics of long-tail-trained models. Therefore, we are considering updating our manuscript to report a sequence of snapshot accuracies instead of a single final accuracy. It is still an open question, and we believe it is definitely an interesting direction to investigate further.
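Such per-snapshot accuracies could be gathered with a helper along these lines (a hypothetical sketch, not the authors' code; the many/few thresholds of 100 and 20 training images follow a common ImageNet-LT convention and are an assumption here):

```python
from collections import Counter

def shot_group(train_count, many_thresh=100, few_thresh=20):
    # Assumed ImageNet-LT-style split: many-shot > 100 training images,
    # few-shot < 20, median-shot in between.
    if train_count > many_thresh:
        return "many"
    if train_count < few_thresh:
        return "few"
    return "median"

def snapshot_accuracies(preds, labels, train_counts):
    """Per-group top-1 accuracy for one model snapshot.
    preds/labels: per-sample predicted and true class ids;
    train_counts: {class_id: number of training images}."""
    correct, total = Counter(), Counter()
    for p, y in zip(preds, labels):
        g = shot_group(train_counts[y])
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}
```

Running this on every saved checkpoint, rather than only the final one, would yield the per-epoch many/median/few-shot curves being discussed.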

Our aim in the open long-tailed recognition paper is to formally define and clarify this important real-world problem, rather than to provide a silver-bullet solution. We welcome everyone to work on this topic and further improve the underlying approaches :)

matttrd (Author) commented Aug 7, 2019

@liuziwei7 thanks for the clarifications! We are actually working on a similar project and obtained the same observations. I really like (and agree with) the approach of tackling open long-tailed recognition with a single framework. Ours is still an ongoing project, but we were able to obtain good results with theory-driven input sampling (without attention, a hallucinator, etc.). We will try our algorithm in your framework. Thanks again!

matttrd closed this as completed Aug 7, 2019