Add DCN #73
Hello, thank you for your insightful work! I have a question: when I add a DCN structure to the teacher or the student, e.g. faster_rcnn_r50_fpn_dconv_c3-c5 or cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5, loss_fgd_fpn and the total loss become too large to train. What is the reason behind this?
Here is the log output:
It seems the gradient explodes. Maybe the gap between the teacher and the student is too large. If the loss stays too large, you can try to decrease the grad norm to avoid the explosion.
Thanks for your reply! But the gradient still explodes when I make the teacher and the student the same model, i.e. the model distills itself, so I think this case may not be due to the gap between the teacher and the student. I also tried distilling the model by itself without adding DCN, and it trains well.
I also tried your further fantastic work MGD; it shows the same problem when I add DCN.
Fine, I don't know the reason either. However, RepPoints can be trained with DCN in the config.
OK, thanks a lot! I will do more experiments later.