Offset mean is larger than 100 in PCD Align module's DCNv2. Could you give me some advice to minimize it? #16
Comments
Same for me.
Yes, indeed we also found that training with DCN is unstable. During the competition, we trained the large model from smaller ones and used a smaller learning rate for the DCN. Even with these tricks, over-large offsets still occur occasionally, and when they did we simply resumed training from a normal checkpoint.
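The resume-on-blowup trick above can be sketched as a simple monitoring check run every few iterations. This is only an illustration: the helper name `offset_is_abnormal` and the threshold of 100 are assumptions for this sketch, not part of the EDVR code, and a nested list stands in for the offset tensor.

```python
def offset_is_abnormal(offsets, threshold=100.0):
    """Return True if the mean absolute DCN offset exceeds `threshold`.

    `offsets` is a nested list standing in for the offset tensor
    predicted by the PCD align module's DCNv2 layer.
    """
    flat = [abs(v) for row in offsets for v in row]
    return sum(flat) / len(flat) > threshold


# During training, if this returns True one would stop and reload the
# last checkpoint whose offsets were still normal.
```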
What do you mean by "trained the large model from smaller ones"? Is it this sentence from your paper: "We initialize deeper networks by parameters from shallower ones for faster convergence"? For instance: we use kaiming_normal to initialize all parameters, then freeze the TSA and Reconstruction modules, and set requires_grad only in the PCD align and PreDeblur modules. Thanks for your attention.
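The partial-freezing scheme described in the comment above can be sketched with a name-based filter. The module-name prefixes (`pcd_align.`, `predeblur.`) are hypothetical; in PyTorch one would apply this to `model.named_parameters()` and set each parameter's `requires_grad` from it.

```python
def trainable(param_name):
    """True for parameters in the PCD align and PreDeblur modules;
    TSA and Reconstruction parameters stay frozen."""
    return param_name.startswith(("pcd_align.", "predeblur."))


# In PyTorch this would drive requires_grad, e.g.:
#   for name, param in model.named_parameters():
#       param.requires_grad = trainable(name)
```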
Thanks for the reply; this is indeed impressive work and research. We are trying to first replace the deformable convolutions with normal convolutions, train that to obtain an initial model, and then use this model to train the full network.
Actually, DCN is relatively important. So you can first train a small network with DCN (w/o TSA).
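Initializing the larger model from a small pretrained one amounts to copying every parameter whose name and shape match between the two state dicts. A minimal sketch, where shape tuples stand in for tensors and the parameter names are illustrative:

```python
def transferable_keys(small_shapes, large_shapes):
    """Parameter names that can be copied from the small model's state
    dict into the large one: present in both, with identical shapes."""
    return sorted(k for k, shape in small_shapes.items()
                  if large_shapes.get(k) == shape)
```

Layers the large model widened or added simply keep their fresh initialization, which matches the "initialize deeper networks by parameters from shallower ones" strategy.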
1. "We trained the large model from smaller ones and used a smaller learning rate for dcn."
2. "You can first train a small network with DCN (w/o TSA)."
3. This pretrained-DCN trick can't give the final model D a deeper or wider DCN module (I mean, changing the feature extraction layers before DCN) compared with model S, because the DCN parameters need to be copied. Is that right?
4. For the second step, there are two choices for the DCN: use a smaller lr for DCN, or freeze the DCN module. The second choice saves a lot of training time and GPU memory. Is it suitable?
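The first choice (a smaller learning rate for the DCN) is typically implemented with optimizer parameter groups. A sketch under assumptions: matching `"dcn"` in the parameter name and the 0.1 multiplier are illustrative choices, not values from the EDVR configs.

```python
def build_param_groups(named_params, base_lr=4e-4, dcn_lr_mult=0.1):
    """Split parameters into a normal group and a DCN group with a
    reduced learning rate, in the format torch.optim optimizers accept."""
    dcn_params, other_params = [], []
    for name, param in named_params:
        (dcn_params if "dcn" in name else other_params).append(param)
    return [
        {"params": other_params, "lr": base_lr},
        {"params": dcn_params, "lr": base_lr * dcn_lr_mult},
    ]
```

The resulting list would be passed directly to an optimizer, e.g. `torch.optim.Adam(build_param_groups(model.named_parameters()))`.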
We have updated the training code and configs. We provide training scripts for the model with Channel=128, Back RB=10.
You can try this.
Have you succeeded? How was the result?