I am training DeFlow on my own generated data.
During training I sometimes get an exception raised because the maximum number of NaNs of the NLL (100) is reached.
Also, on some of my validation images I get NaNs during training, and likewise when running translate.py (and the results are of course bad).
Do you have an idea of the root cause of these NaNs, and perhaps how to resolve them?
Thanks,
Mani
We also ran into the problem of NaNs arising in some training runs.
This happens when a learned scaling in the network becomes too large or too close to 0, which then leads to an overflow/underflow in the computation.
The easiest thing you can try to avoid this is to decrease the learning rate, as we found that this stabilises training.
Another thing you can experiment with is bounding the learned scalings; however, we did not find a good way to do so without decreasing performance (see the sketch below for one possible approach).
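For reference, one way to bound the learned scalings is to smoothly clamp the log-scale before it is exponentiated. The snippet below is a minimal sketch assuming a PyTorch flow layer that predicts a `log_scale` tensor; the `bounded_scale` helper, the tanh parameterisation, and the bound of 5.0 are illustrative assumptions, not DeFlow's actual code.

```python
import torch

def bounded_scale(log_scale: torch.Tensor, bound: float = 5.0) -> torch.Tensor:
    """Soft-bound a learned log-scale before exponentiating it.

    tanh keeps the effective log-scale in (-bound, bound), so the returned
    scale lies in (exp(-bound), exp(bound)) and cannot overflow to inf or
    underflow to 0, which is what produces NaNs in the NLL.
    (Hypothetical helper; the bound value is an illustrative choice.)
    """
    return torch.exp(bound * torch.tanh(log_scale / bound))
```

A hard `torch.clamp(log_scale, -bound, bound)` is simpler, but it has zero gradient outside the bounds, which may contribute to the performance drop mentioned above; the smooth tanh variant keeps gradients everywhere at the cost of slightly compressing large scalings.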