NAN in training process #7

Nikumata opened this issue Jan 21, 2021 · 5 comments

Comments

@Nikumata

image
Hi, when I train the network on the NWPU dataset, the results show NaN in all of the following cases. I set the training batch size to 6 to avoid running out of memory.

@taohan10200
Owner

You can lower the learning rate of the threshold encoder in config.py, for example to 1e-7.

if __C.OPT == 'Adam':
    __C.LR_BASE_NET = 1e-5  # learning rate
    __C.LR_BM_NET = 1e-7    # learning rate (previously 1e-6)

Thanks for your attention!
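
For readers wondering how two separate learning rates like the ones above usually reach an optimizer, here is a minimal sketch. It assumes that LR_BASE_NET and LR_BM_NET apply to two disjoint sets of parameters; the function name `build_param_groups` and the placeholder parameter names are hypothetical, and the repository's own optimizer setup is not shown in this thread, so it may differ.

```python
# Hypothetical helper, not taken from the repository.
def build_param_groups(base_params, bm_params, lr_base=1e-5, lr_bm=1e-7):
    """Return per-parameter-group options: a list of dicts, each with its own
    "params" and "lr", which is the format torch.optim optimizers accept."""
    return [
        {"params": list(base_params), "lr": lr_base},  # base network
        {"params": list(bm_params), "lr": lr_bm},      # threshold encoder (assumed)
    ]


# Placeholder strings stand in for real parameter tensors.
groups = build_param_groups(["w_base"], ["w_threshold"])

# With PyTorch installed and real parameters, this could become:
# optimizer = torch.optim.Adam(groups)
```

Keeping the rates in separate groups means the threshold encoder's rate can be lowered on its own without slowing down the rest of the network.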

@Nikumata
Author

Adjusting the learning rate does work! Thanks for your reply.

@Nikumata
Author

Hi @taohan10200, after lowering the learning rate, NaN still appeared after 87 iterations. I saved the model and weights every 20 iterations, and was surprised to find that, when resuming from the checkpoint saved at iteration 80, the model trains normally without NaN. Do you have any suggestions?

By the way, there is no read_pred_and_gt module in misc.utils.py, which prevents vis4val.py from working properly. Could you please commit that part of the code? Thanks.
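
The save-and-resume workaround described in this comment can be automated. The sketch below is a framework-free illustration, assuming the training state fits in a dict and that a step returns a scalar loss; `train_with_rollback`, `toy_step`, and the `max_rollbacks` guard are hypothetical names introduced here, not part of the repository.

```python
import copy
import math


def train_with_rollback(state, step_fn, num_iters, save_every=20, max_rollbacks=3):
    """Run step_fn for num_iters iterations. Snapshot the state every
    save_every iterations and restore the last snapshot if the loss is
    NaN or infinite. Returns the iterations at which a rollback happened."""
    snapshot, snapshot_iter = copy.deepcopy(state), 0
    rollbacks = []
    it = 0
    while it < num_iters:
        it += 1
        loss = step_fn(state, it)
        if not math.isfinite(loss):
            rollbacks.append(it)
            if len(rollbacks) > max_rollbacks:
                raise RuntimeError("loss keeps diverging; giving up")
            # Restore the last good state in place and resume from there.
            state.clear()
            state.update(copy.deepcopy(snapshot))
            it = snapshot_iter
            continue
        if it % save_every == 0:
            snapshot, snapshot_iter = copy.deepcopy(state), it
    return rollbacks


# Toy step that diverges once at iteration 87, mimicking the report above.
failed_once = {"done": False}


def toy_step(state, it):
    if it == 87 and not failed_once["done"]:
        failed_once["done"] = True
        return float("nan")
    state["weight"] += 1.0
    return 1.0 / it


state = {"weight": 0.0}
rollbacks = train_with_rollback(state, toy_step, num_iters=100)
```

In a real PyTorch run the snapshot would more likely be written with `torch.save` on the model and optimizer state dicts, and the rollback is only useful when the divergence is not deterministic, as seemed to be the case here.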

@taohan10200
Owner

In our training, NaN sometimes still appears even after we have lowered the threshold. When that happens, we usually lower the threshold again to avoid the problem. We recommend using the experimental configuration we provide under the saved_exp_results folder. In general, the inverse gradients may be what makes the module's training unstable. We have tried to solve this problem by optimizing the threshold learner, but that work is still being tested, and we will release the new solution in the future.

We have updated the read_pred_and_gt module in misc.utils.py.

Thanks~
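
The "lower it again when NaN reappears" advice can be written as a small retry loop. This is only a sketch under a stated assumption: that the value being lowered is the threshold encoder's learning rate from the earlier reply. `find_stable_lr` and `toy_run` are hypothetical names, and `toy_run` merely stands in for a short trial training run.

```python
import math


def find_stable_lr(run_fn, lr, factor=0.1, max_tries=4):
    """Call run_fn(lr) and shrink lr by factor until the returned loss is finite."""
    for _ in range(max_tries):
        loss = run_fn(lr)
        if math.isfinite(loss):
            return lr
        lr *= factor  # NaN or inf: lower the rate and try again
    raise RuntimeError("no stable learning rate found within max_tries")


def toy_run(lr):
    # Stand-in for a trial run: pretend anything above 5e-8 diverges.
    return float("nan") if lr > 5e-8 else 0.5


stable_lr = find_stable_lr(toy_run, lr=1e-6)
```

Combining this with the configuration shipped under saved_exp_results, as recommended above, is probably a better starting point than searching from scratch.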

@henbucuoshanghai

Where is the path of the saved model?

3 participants