
TF 2.0 - Gradient of 'tf.keras.layers.Dense with bias' produces non-deterministic result #32133

Closed
movinghoon opened this issue Aug 31, 2019 · 6 comments
Assignees
Labels
comp:keras Keras related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug

Comments

@movinghoon

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip install tf-nightly-gpu-2.0-preview==2.0.0.dev20190826
  • TensorFlow version (use command below): v1.12.1-9705-g0fbc138 2.0.0-dev20190826
  • Python version: 3.6.9
  • CUDA/cuDNN version: 10.0.0/7.3.1
  • GPU model and memory: Titan Xp 11Gb

Describe the current behavior
(1) The following code produces identical 'numpy_data0.pkl', 'initial_params0.pkl', and 'loss0.pkl' files every time (i.e., same data, same parameters, same loss), but 'grad0.pkl' changes between runs. I checked this with the 'diff' command on the generated files.
(2) This seems to happen only with the TensorFlow 2.0 GPU build. I checked the code with tf-nightly-2.0-preview==2.0.0.dev20190830 (CPU version), and it was fine (i.e., the result was deterministic).
(3) Using a custom dense layer + tf.keras.layers.ReLU() was also fine (i.e., deterministic). The custom dense layer was:

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs
    def build(self, input_shape):
        self.kernel = self.add_variable("kernel", initializer=tf.keras.initializers.GlorotUniform(),
                                        shape=[int(input_shape[-1]),
                                               self.num_outputs])
        self.bias = self.add_variable("bias", initializer=tf.zeros_initializer,
                                        shape=[self.num_outputs])
    def call(self, input):
        return tf.matmul(input, self.kernel) + self.bias

And net with

net = tf.keras.Sequential()
net.add(MyDenseLayer(100))
net.add(tf.keras.layers.ReLU())
net.add(MyDenseLayer(100))
net.add(tf.keras.layers.ReLU())
net.add(MyDenseLayer(1))
net.build((None, input_dim))

(+) When the 'use_bias=False' option is applied to the hidden layers, the result is also deterministic.
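A plausible reading of the use_bias observation (my own sketch, not from the original report): the bias gradient is a reduction of the upstream gradient over the batch axis, and a parallel GPU reduction may accumulate terms in a varying order. The NumPy snippet below only illustrates what that reduction computes; it does not reproduce the non-determinism itself:

```python
import numpy as np

# For y = x @ W + b, dL/db is the sum of the upstream gradient over the
# batch axis. On GPU this reduction can be parallelized, and the
# accumulation order (hence the float rounding) may vary between runs.
batch, units = 1000, 100
upstream = np.random.default_rng(0).standard_normal((batch, units)).astype(np.float32)
grad_bias = upstream.sum(axis=0)  # shape: (units,)
```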

Describe the expected behavior
Since cuDNN is forced to behave deterministically (os.environ['TF_CUDNN_DETERMINISTIC'] = 'true') and the data, parameters, and loss are all identical, the gradients are expected to be identical as well.
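For reference, a minimal sketch of the determinism-related environment variables. Note that TF_DETERMINISTIC_OPS is an assumption here: it covers ops beyond cuDNN but was only introduced in TensorFlow releases after this report (around 2.1):

```python
import os

# These must be set before TensorFlow is imported.
os.environ['TF_CUDNN_DETERMINISTIC'] = 'true'  # deterministic cuDNN kernel selection
os.environ['TF_DETERMINISTIC_OPS'] = '1'       # broader op-level determinism (later TF versions)
```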

Code to reproduce the issue

import os
import pickle
import random
import numpy as np
import tensorflow as tf

os.environ['TF_CUDNN_DETERMINISTIC'] = 'true'

seed = 1234
np.random.seed(seed)
tf.random.set_seed(seed)
random.seed(seed)

# NN Model
input_dim = 5
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(100, activation=tf.nn.relu, kernel_initializer=None))
net.add(tf.keras.layers.Dense(100, activation=tf.nn.relu, kernel_initializer=None))
net.add(tf.keras.layers.Dense(1, activation=None, kernel_initializer=None))
net.build((None, input_dim))

# Initial v_params
initial_v_params = net.variables

# Update NN Model one-step
x = np.random.normal(loc=0, scale=1., size=[1000, input_dim])
y = np.random.normal(loc=0, scale=1., size=[1000])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(y - net(x)))
grad = tape.gradient(loss, net.trainable_variables)

# Tag for comparing files
tag = 1

with open('./numpy_data{}.pkl'.format(tag), 'wb') as f:
    pickle.dump([x, y], f)

with open('./initial_params{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(initial_v_params, f)

with open('./loss{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(loss, f)

with open('./grad{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(grad, f)
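A byte-level diff of pickle files can flag incidental serialization differences, so a numeric comparison of the gradient values is more robust. A small helper sketch (grads_match is a hypothetical name; it assumes the gradients were pickled as NumPy arrays, e.g. via .numpy()):

```python
import pickle
import numpy as np

def grads_match(path_a, path_b, rtol=0.0, atol=0.0):
    """Compare two pickled lists of gradient arrays elementwise.

    With rtol=atol=0, np.allclose demands exactly equal values,
    matching the determinism expectation above.
    """
    with open(path_a, 'rb') as f:
        grads_a = pickle.load(f)
    with open(path_b, 'rb') as f:
        grads_b = pickle.load(f)
    if len(grads_a) != len(grads_b):
        return False
    return all(
        np.allclose(np.asarray(a), np.asarray(b), rtol=rtol, atol=atol)
        for a, b in zip(grads_a, grads_b))
```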
@oanush oanush self-assigned this Sep 3, 2019
@oanush oanush added comp:keras Keras related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug labels Sep 3, 2019
@oanush

oanush commented Sep 3, 2019

Please find the gist of the Colab created when executing the given code. Thanks!

@oanush oanush assigned jvishnuvardhan and unassigned oanush Sep 6, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 6, 2019
@tanzhenyu
Contributor

@allenlavoie Any idea on how this would happen (for gradients)?

@allenlavoie
Member

I don't think this has anything to do with the gradient infrastructure, which conceptually just queues up some ops. It sounds like some op used in a gradient does not give the same result every time. We don't generally guarantee exact results; if you're using deterministic cuDNN, possibly we're not using cuDNN in some case?

@iganichev (who works on GPUs) could you decide whether this is a problem, or if epsilon differences are expected here?

@allenlavoie allenlavoie assigned iganichev and unassigned allenlavoie Mar 31, 2020
@iganichev
Contributor

There can be many reasons for non-determinism. As Allen pointed out, on GPU TF uses many libraries and hand-written kernels besides cuDNN, including Eigen and cuBLAS. For example, certain convolutions are faster to execute via a GEMM function in cuBLAS. In general, getting TF to behave deterministically is quite hard. This is a known issue.
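To illustrate why accumulation order matters, here is a minimal float32 example (my own illustration, not from the thread) showing that floating-point addition is not associative, so any change in reduction order can change the result:

```python
import numpy as np

# In float32, the ulp near 1e8 is 8, so adding 1.0 to 1e8 rounds away.
x = np.float32(1e8)
one = np.float32(1.0)

left = (x + one) + (-x)   # 1e8 + 1 rounds back to 1e8 -> result 0.0
right = (x + (-x)) + one  # exact cancellation first -> result 1.0
```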

Does this non-determinism cause a serious issue?

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 2, 2020
@tanzhenyu
Contributor

Closing this based on above comments. Thanks all!

@google-ml-butler

Are you satisfied with the resolution of your issue?
