comparisons #28
Comments
@matttrd Thanks for asking. The plain model was only trained for 30 epochs, while the rest of the methods were trained for 90 epochs. As discussed here: #4 (comment), more epochs would not help the performance of the plain model since it had already converged around 30 epochs.
@zhmiao thanks for answering. Probably I'm missing something, but I trained (using your code) the plain model (stage 1) for 90 epochs with the usual learning-rate drop every 30 epochs, getting 32.7% overall top-1 test accuracy and:
I understand that the loss converges around 30 epochs, but this is true only if you let the LR drop every 10 epochs as you did. I think this would be the fair comparison. Am I wrong?
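The "LR drop every N epochs" schedule both commenters refer to is a standard step decay. A minimal sketch of the idea follows; the specific values (`base_lr=0.1`, `gamma=0.1`, `step_size=30`) are common defaults assumed for illustration, not taken from the repository:

```python
def stepped_lr(epoch, base_lr=0.1, gamma=0.1, step_size=30):
    """Return the learning rate used at a given epoch under step decay:
    the LR is multiplied by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Over a 90-epoch run with step_size=30, the LR drops twice:
# epochs 0-29 use 0.1, epochs 30-59 use 0.01, epochs 60-89 use 0.001.
schedule = {e: stepped_lr(e) for e in (0, 29, 30, 59, 60, 89)}
```

With `step_size=10` instead, the same 30-epoch budget already contains two LR drops, which is why the loss appears converged by epoch 30 under that schedule.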
Thanks for reporting these results! Actually, we have made some similar observations in our follow-up project. A model trained on a long-tailed dataset shows some interesting behavior changes with different initializations and numbers of epochs:
We speculate that the root of these phenomena lies in the learning dynamics of long-tail-trained models. Therefore, we are considering updating our manuscript to report a sequence of snapshot accuracies instead of a single final accuracy. It is still an open question, and we believe it is definitely an interesting direction to investigate further. The aim of our open long-tailed recognition paper is to formally define and clarify this important real-world problem, rather than to provide a silver-bullet solution. We welcome everyone to work on this topic and further improve the underlying approaches :)
@liuziwei7 thanks for your clarifications! We are actually working on a similar project and have made the same observations. I really like (and agree with) the approach of tackling open long-tailed recognition with a single framework. It is still an ongoing project, but we were able to obtain good results with a theory-driven input sampling (without attention, a hallucinator, etc.). We will try our algorithm within your framework. Thanks again!
Hello, reading your code (for ImageNet-LT) carefully, it seems that the plain model was trained for 30 epochs while your own model was trained for 90 epochs (30 stage 1 + 60 stage 2). Could you please confirm this? And moreover, were all the comparisons (focal loss, etc.) performed with 30 or 90 epochs?
Thanks!