TF 2.0 - Gradient of 'tf.keras.layers.Dense with bias' produces non-deterministic result #32133
Comments
Please find the gist of the Colab notebook from executing the given code. Thanks!
@allenlavoie Any idea on how this would happen (for gradients)?
I don't think this has anything to do with gradient infrastructure, which conceptually is just queuing up some ops. Sounds like some op used in a gradient does not give the same result every time. We don't generally guarantee exact results; if you're using deterministic CuDNN, possibly we're not using CuDNN in some case? @iganichev (who works on GPUs) could you decide whether this is a problem, or if epsilon differences are expected here?
There can be many reasons for non-determinism. As Allen pointed out, TF uses many libraries and hand-written kernels on GPU besides cuDNN, including Eigen and cuBLAS. For example, certain convolutions are faster to execute using a GEMM function in cuBLAS. In general, getting TF to behave deterministically is pretty hard. This is a known issue. Does this non-determinism cause a serious issue?
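To make the comment above concrete, here is a framework-free sketch (plain Python, no TensorFlow) of why reduction order alone can change a result: GPU kernels may accumulate partial sums in a non-fixed order, and floating-point addition is not associative, so bitwise-identical inputs can yield different outputs.

```python
# Floating-point addition is not associative, so the order in which a
# GPU kernel accumulates partial sums can change the final bits.
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c   # the large terms cancel first, so 1.0 survives
right_to_left = a + (b + c)   # 1.0 is absorbed into -1e16 and lost

print(left_to_right)   # 1.0
print(right_to_left)   # 0.0
```

The same effect applies to the dot products inside a dense layer's gradient: if two runs reduce the partial products in different orders, the results can differ by an epsilon even with identical inputs and weights.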
Closing this based on the above comments. Thanks all!
System information
Describe the current behavior
(1) The following code produces the same 'numpy_data0.pkl', 'initial_params0.pkl', and 'loss0.pkl' every time (i.e. same data, same parameters, same loss), but 'grad0.pkl' changes. I verified this with the 'diff' command on the generated files.
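As a hedged alternative to diff for checking whether two runs produced bitwise-identical pickle files, one can hash the raw bytes (the filenames in the comment are the ones from this report; the comparison itself is generic):

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Two runs are bitwise identical iff the digests match, e.g.:
# file_digest("grad0.pkl") == file_digest("grad0_from_second_run.pkl")
```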
(2) This seems to happen only with the TensorFlow 2.0 GPU version. I checked the code with tf-nightly-2.0-preview==2.0.0.dev20190830 (CPU version) and it was OK (= shows deterministic result).
(3) Using a custom dense layer + tf.keras.layers.ReLU() was also OK (= shows deterministic result). The custom dense layer was:
And a net with:
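The reporter's custom-layer code is not preserved in this copy of the thread. As a purely hypothetical illustration of what "dense with bias + ReLU" computes, here is a NumPy sketch (the names `dense_relu`, `W`, and `b` are made up for this example and are not from the original report):

```python
import numpy as np

def dense_relu(x, W, b):
    """Hypothetical dense-with-bias followed by ReLU: max(x @ W + b, 0)."""
    return np.maximum(x @ W + b, 0.0)

x = np.array([[1.0, -2.0]])   # one input row with two features
W = np.eye(2)                 # identity weights, chosen for clarity
b = np.array([0.5, 0.5])      # the bias term (the suspect in this issue)
print(dense_relu(x, W, b))    # [[1.5 0. ]]
```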
(+) When the 'use_bias=False' option was applied to the hidden layers, it was OK (= shows deterministic result).
Describe the expected behavior
Since cuDNN is forced to behave deterministically (os.environ['TF_CUDNN_DETERMINISTIC'] = 'true'), and all of the data/parameters/losses are the same, the gradients are expected to be the same.
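For reference, determinism-related environment variables generally need to be set before TensorFlow initializes. A minimal sketch follows; note that TF_CUDNN_DETERMINISTIC is the flag used in this report, while TF_DETERMINISTIC_OPS is an assumption drawn from later TF 2.x releases and was not part of the original code:

```python
import os

# Both flags must be set before TensorFlow is imported/initialized.
os.environ["TF_CUDNN_DETERMINISTIC"] = "true"  # flag used in this report
os.environ["TF_DETERMINISTIC_OPS"] = "1"       # broader flag in later TF 2.x (assumption)

# import tensorflow as tf  # import only after the flags are in place
```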
Code to reproduce the issue