conv2d gives NaN gradients with float16 input #7226
Comments
It is normal to see that float16 doesn't have enough range, especially at the beginning of training, so this is intended behavior rather than a bug... If you want to ask around about how to train with float16, please go to Stack Overflow... Thanks.
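As a concrete illustration of the range limits the maintainer is referring to (not part of the original thread), a few lines of NumPy show how easily float16 overflows to inf, which then turns into NaN under further arithmetic:

```python
import numpy as np

# float16 covers a far narrower range than float32
print(np.finfo(np.float16).max)           # 65504.0 -- anything larger overflows to inf
print(np.finfo(np.float16).tiny)          # ~6.1e-05 -- smallest normal positive value
print(np.float16(1e5))                    # inf (overflow)
print(np.float16(1e5) - np.float16(1e5))  # nan (inf - inf): one way NaNs enter the math
```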
Are you kidding me or what? How can it not have enough capacity when we start with just one convolution? Fine, let's modify the example: take a conv layer with all-zero weights and an all-zero batch (see the sketch below).
It still fails. Do you imply that float16 does not have enough capacity to backpropagate a zero batch through an all-zero convolution? It is clearly a bug somewhere in the native code. Please reopen this issue; it can't be intended behaviour.
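The code for this modified example is not preserved in the thread. The sketch below is a hypothetical reconstruction of the setup described (all-zero float16 conv weights, an all-zero input batch), written against the TF 1.x graph API of that era; the shapes and the squared-error loss are assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical reconstruction: a float16 conv layer with all-zero weights,
# fed an all-zero batch, as described in the comment above.
x = tf.placeholder(tf.float16, shape=[1, 8, 8, 1])
w = tf.Variable(tf.zeros([3, 3, 1, 1], dtype=tf.float16))
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
loss = tf.reduce_mean(tf.square(y))
grads = tf.gradients(loss, [w])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g = sess.run(grads, feed_dict={x: np.zeros([1, 8, 8, 1], dtype=np.float16)})
    print(g)  # mathematically this should be all zeros; the comment reports NaNs
```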
For others with this issue, see here: setting the Adam epsilon to 1e-4 works for me.
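A hedged sketch of that workaround, assuming a TF 1.x-style optimizer; `loss` stands in for whatever float16 loss the model produces. Adam's default epsilon of 1e-8 is easily swamped in half precision, so raising it keeps the update denominator away from zero:

```python
import tensorflow as tf

# Default epsilon for AdamOptimizer is 1e-8, which is negligible at float16
# precision; a larger epsilon avoids near-zero denominators in the update.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3, epsilon=1e-4)
train_op = optimizer.minimize(loss)  # `loss` is the model's float16 loss tensor
```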
Environment info
Operating System: Ubuntu 16 LTS
breaks already on CPU
If installed from binary pip package, provide:
python -c "import tensorflow; print(tensorflow.__version__)"
If possible, provide a minimal reproducible example. (We usually don't have time to read hundreds of lines of your code.)
So basically it breaks on the second step of SGD because the loss is NaN. If I change the dtype to float32, it works. It should have nothing to do with CUDA; I tested it on the CPU version as well as on GPU with CUDA 8 and cuDNN 5.1.
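The reporter's script itself is not included above. The following is a minimal sketch of the kind of setup described (a single conv2d trained with plain SGD in float16), written against the TF 1.x API current at the time of the issue; the shapes, learning rate, and synthetic data are assumptions:

```python
import numpy as np
import tensorflow as tf

dtype = tf.float16  # switching this to tf.float32 reportedly makes training stable

x = tf.placeholder(dtype, shape=[None, 28, 28, 1])
y_true = tf.placeholder(dtype, shape=[None, 28, 28, 1])

w = tf.Variable(tf.truncated_normal([3, 3, 1, 1], stddev=0.1, dtype=dtype))
y_pred = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
loss = tf.reduce_mean(tf.square(y_pred - y_true))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(4, 28, 28, 1).astype(np.float16)
    for step in range(3):
        _, l = sess.run([train_op, loss], feed_dict={x: batch, y_true: batch})
        print(step, l)  # per the report, the loss turns NaN by the second step
```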
What other attempted solutions have you tried?
I have no idea what to try here. For now I am continuing with float32.
Logs or other output that would be helpful
(If logs are large, please upload as attachment or provide link).