How to properly deal with 'Offset mean more than 100' Warning #91
Also, one of the suggestions seems to be using a lower learning rate for the PCD module -- can you please share the code to do that? And what is an appropriate learning rate for the PCD module?
First of all, we find that: 1) we can train 'M' models from scratch without such offset warnings, i.e., the training is stable; 2) however, if we train 'L' models, training is very fragile and the offset warnings appear occasionally. If the offset mean is larger than 100, it means the offsets in the DCN are wrongly predicted (too-large offsets are meaningless), and the performance of these models is also poor. We think the reason is that when we train a large model with DCN, the offsets in the DCN are more fragile.

In the competition, we trained such large models from smaller models (from C=64 models to C=128 models, and then to B=40 models). Even with such a training scheme, we still encountered wrong/too-large offsets. We do not have a nice solution; we just stop training and resume from the nearest normal model, where a "normal" model means its offsets are not too large. The training procedures in the competition were complex, and we do not remember the concrete steps. We now provide the training schemes for the 'M' models.

We are developing more stable and efficient models, but the work is still in progress.
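Since the warning is triggered by the mean magnitude of the predicted DCN offsets, one way to catch instability early is to watch that statistic during training. Below is a minimal PyTorch sketch using forward hooks; the module-name pattern `offset_conv` and the threshold of 100 are assumptions for illustration, not the repository's exact code.

```python
# Minimal sketch: log a warning when DCN offset magnitudes blow up.
# Assumption: offset-predicting convs have 'offset_conv' in their module name.
import torch

OFFSET_WARN_THRESHOLD = 100.0  # the threshold mentioned in the warning

def make_offset_hook(name):
    def hook(module, inputs, output):
        # `output` is the predicted offset tensor of an offset conv
        offset_absmean = torch.mean(torch.abs(output)).item()
        if offset_absmean > OFFSET_WARN_THRESHOLD:
            print(f'[warn] {name}: offset abs-mean {offset_absmean:.1f} > '
                  f'{OFFSET_WARN_THRESHOLD}; consider resuming from the last '
                  f'normal checkpoint with a smaller learning rate.')
    return hook

def register_offset_hooks(model):
    handles = []
    for name, module in model.named_modules():
        if 'offset_conv' in name:
            handles.append(module.register_forward_hook(make_offset_hook(name)))
    return handles
```

Registering the hooks once after building the model is enough; they fire on every forward pass, so the first bad iteration is visible immediately rather than only at validation time.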
Thank you! This clears my doubts.
For me it is becoming more and more frequent as the training progresses -- the training seems to be very unstable.
For large models, the unstable offset phenomenon is indeed very frequent. 1) Start from the most recent normal model. 2) Try a smaller learning rate. (Sometimes a too-large restart learning rate can also lead to this problem; you can use a smaller learning rate for restarts, e.g. by setting smaller restart weights in the training config.)
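As a concrete illustration of the smaller-learning-rate suggestion (and of the per-module learning rate asked about earlier in this thread), here is a minimal PyTorch sketch that puts the PCD alignment parameters into their own optimizer group with a reduced learning rate. The attribute name `pcd_align`, the base learning rate, and the multiplier are assumptions; adapt them to your model and config.

```python
# Minimal sketch: a lower learning rate for the PCD alignment module via
# optimizer parameter groups. Names and values are illustrative assumptions.
import torch

def build_optimizer(model, base_lr=4e-4, pcd_lr_mult=0.1):
    pcd_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # assumption: PCD alignment parameters contain 'pcd_align' in their name
        (pcd_params if 'pcd_align' in name else other_params).append(param)
    return torch.optim.Adam([
        {'params': other_params, 'lr': base_lr},
        {'params': pcd_params, 'lr': base_lr * pcd_lr_mult},  # smaller LR for PCD
    ], betas=(0.9, 0.99))
```

Keeping the two groups separate also lets a cosine-restart scheduler scale them together while preserving the ratio between them.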
Hello, I am trying to train this chain: C64B10woTSA -> C128B10woTSA -> C128B40woTSA -> C128B40wTSA. I encounter this kind of error at the second stage.
I am trying to change nf from 64 to 128, with strict_load set to false. Am I doing everything right? From searching online, it seems it is not possible to use load_state_dict() to load an nf=64 model into an nf=128 model. Here is the config file that produces the error:
The pretrained model G is the same, but with nf=64.
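For what it's worth, plain load_state_dict() indeed cannot reshape tensors, even with strict=False, since shape-mismatched keys must be skipped entirely. One workaround is to copy the overlapping slice of each tensor by hand. The sketch below is an assumption about how such a transfer could be done, not the authors' actual script; `transfer_partial_weights` is a hypothetical helper.

```python
# Minimal sketch: initialize an nf=128 model from an nf=64 checkpoint by
# copying the overlapping slice of each matching tensor.
import torch

def transfer_partial_weights(small_ckpt_path, large_model):
    # assumption: the checkpoint stores a raw state_dict
    small_sd = torch.load(small_ckpt_path, map_location='cpu')
    large_sd = large_model.state_dict()
    for key, small_t in small_sd.items():
        if key not in large_sd:
            continue  # e.g. layers that only exist in one of the two models
        large_t = large_sd[key]
        if small_t.shape == large_t.shape:
            large_sd[key] = small_t
        else:
            # copy the top-left sub-block (the first 64 channels of each dim)
            slices = tuple(slice(0, min(s, l))
                           for s, l in zip(small_t.shape, large_t.shape))
            large_sd[key][slices] = small_t[slices]
    large_model.load_state_dict(large_sd)
```

The remaining (uncopied) channels keep their random initialization, so a warm-up phase with a reduced learning rate may still be needed before the larger model stabilizes.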
Hi Xinntao,
Can you please comment on how to specifically deal with the offset warning in the PCD alignment module? I am trying to train an 'L' model, and I still have ambiguities after piecing together the details from issues #16 and #22. I have put together the workflow required to train an 'L' model below; can you please comment on its correctness?