
TF 2.0 - Gradient of 'tf.keras.layers.Dense with bias' produces non-deterministic result #32133

Closed
movinghoon opened this issue Aug 31, 2019 · 6 comments
Assignees
Labels
comp:keras Keras related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug

Comments

@movinghoon

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip install tf-nightly-gpu-2.0-preview==2.0.0.dev20190826
  • TensorFlow version (use command below): v1.12.1-9705-g0fbc138 2.0.0-dev20190826
  • Python version: 3.6.9
  • CUDA/cuDNN version: 10.0.0/7.3.1
  • GPU model and memory: Titan Xp 11Gb

Describe the current behavior
(1) The following code produces identical 'numpy_data0.pkl', 'initial_params0.pkl', and 'loss0.pkl' files every time (i.e., same data, same parameters, same loss), but 'grad0.pkl' changes between runs. I checked this with the 'diff' command on the generated files.
(2) This seems to happen only with the TensorFlow 2.0 GPU build. I checked the code with tf-nightly-2.0-preview==2.0.0.dev20190830 (CPU version), and it was fine (i.e., the result was deterministic).
(3) Using a custom dense layer + tf.keras.layers.ReLU() was also fine (i.e., deterministic). The custom dense layer was:

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs
    def build(self, input_shape):
        self.kernel = self.add_variable("kernel", initializer=tf.keras.initializers.GlorotUniform(),
                                        shape=[int(input_shape[-1]),
                                               self.num_outputs])
        self.bias = self.add_variable("bias", initializer=tf.zeros_initializer,
                                        shape=[self.num_outputs])
    def call(self, input):
        return tf.matmul(input, self.kernel) + self.bias

And net with

net = tf.keras.Sequential()
net.add(MyDenseLayer(100))
net.add(tf.keras.layers.ReLU())
net.add(MyDenseLayer(100))
net.add(tf.keras.layers.ReLU())
net.add(MyDenseLayer(1))
net.build((None, input_dim))

(+) When the 'use_bias=False' option is applied to the hidden layers, the result is also deterministic.
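A plausible reading of the use_bias observation (my own sketch, not from the original report): the bias gradient is a reduction of the upstream gradient over the batch axis, and a parallel GPU reduction may accumulate terms in a varying order. The NumPy snippet below only illustrates what that reduction computes; it does not reproduce the non-determinism itself:

```python
import numpy as np

# For y = x @ W + b, dL/db is the sum of the upstream gradient over the
# batch axis. On GPU this reduction can be parallelized, and the
# accumulation order (hence the float rounding) may vary between runs.
batch, units = 1000, 100
upstream = np.random.default_rng(0).standard_normal((batch, units)).astype(np.float32)
grad_bias = upstream.sum(axis=0)  # shape: (units,)
```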

Describe the expected behavior
Since cuDNN is forced to behave deterministically (os.environ['TF_CUDNN_DETERMINISTIC'] = 'true') and the data, parameters, and loss are all identical, the gradients are expected to be identical as well.
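For reference, a minimal sketch of the determinism-related environment variables. Note that TF_DETERMINISTIC_OPS is an assumption here: it covers ops beyond cuDNN but was only introduced in TensorFlow releases after this report (around 2.1):

```python
import os

# These must be set before TensorFlow is imported.
os.environ['TF_CUDNN_DETERMINISTIC'] = 'true'  # deterministic cuDNN kernel selection
os.environ['TF_DETERMINISTIC_OPS'] = '1'       # broader op-level determinism (later TF versions)
```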

Code to reproduce the issue

import os
import pickle
import random
import numpy as np
import tensorflow as tf

os.environ['TF_CUDNN_DETERMINISTIC'] = 'true'

seed = 1234
np.random.seed(seed)
tf.random.set_seed(seed)
random.seed(seed)

# NN Model
input_dim = 5
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(100, activation=tf.nn.relu, kernel_initializer=None))
net.add(tf.keras.layers.Dense(100, activation=tf.nn.relu, kernel_initializer=None))
net.add(tf.keras.layers.Dense(1, activation=None, kernel_initializer=None))
net.build((None, input_dim))

# Initial v_params
initial_v_params = net.variables

# Update NN Model one-step
x = np.random.normal(loc=0, scale=1., size=[1000, input_dim])
y = np.random.normal(loc=0, scale=1., size=[1000])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(y - net(x)))
grad = tape.gradient(loss, net.trainable_variables)

# Tag for comparing files
tag = 1

with open('./numpy_data{}.pkl'.format(tag), 'wb') as f:
    pickle.dump([x, y], f)

with open('./initial_params{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(initial_v_params, f)

with open('./loss{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(loss, f)

with open('./grad{}.pkl'.format(tag), 'wb') as f:
    pickle.dump(grad, f)
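A byte-level diff of pickle files can flag incidental serialization differences, so a numeric comparison of the gradient values is more robust. A small helper sketch (grads_match is a hypothetical name; it assumes the gradients were pickled as NumPy arrays, e.g. via .numpy()):

```python
import pickle
import numpy as np

def grads_match(path_a, path_b, rtol=0.0, atol=0.0):
    """Compare two pickled lists of gradient arrays elementwise.

    With rtol=atol=0, np.allclose demands exactly equal values,
    matching the determinism expectation above.
    """
    with open(path_a, 'rb') as f:
        grads_a = pickle.load(f)
    with open(path_b, 'rb') as f:
        grads_b = pickle.load(f)
    if len(grads_a) != len(grads_b):
        return False
    return all(
        np.allclose(np.asarray(a), np.asarray(b), rtol=rtol, atol=atol)
        for a, b in zip(grads_a, grads_b))
```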
@oanush oanush self-assigned this Sep 3, 2019
@oanush oanush added comp:keras Keras related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug labels Sep 3, 2019
@oanush

oanush commented Sep 3, 2019

Please find the gist of the Colab created when executing the given code. Thanks!

@oanush oanush assigned jvishnuvardhan and unassigned oanush Sep 6, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 6, 2019
@tanzhenyu
Contributor

@allenlavoie Any idea on how this would happen (for gradients)?

@allenlavoie
Member

I don't think this has anything to do with the gradient infrastructure, which conceptually just queues up some ops. It sounds like some op used in a gradient does not give the same result every time. We don't generally guarantee exact results; if you're using deterministic cuDNN, possibly we're not using cuDNN in some case?

@iganichev (who works on GPUs) could you decide whether this is a problem, or if epsilon differences are expected here?

@allenlavoie allenlavoie assigned iganichev and unassigned allenlavoie Mar 31, 2020
@iganichev
Contributor

There can be many reasons for non-determinism. As Allen pointed out, on GPU TF uses many libraries and hand-written kernels besides cuDNN, including Eigen and cuBLAS. For example, certain convolutions are faster to execute via a GEMM function in cuBLAS. In general, getting TF to behave deterministically is quite hard. This is a known issue.
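To illustrate why accumulation order matters, here is a minimal float32 example (my own illustration, not from the thread) showing that floating-point addition is not associative, so any change in reduction order can change the result:

```python
import numpy as np

# In float32, the ulp near 1e8 is 8, so adding 1.0 to 1e8 rounds away.
x = np.float32(1e8)
one = np.float32(1.0)

left = (x + one) + (-x)   # 1e8 + 1 rounds back to 1e8 -> result 0.0
right = (x + (-x)) + one  # exact cancellation first -> result 1.0
```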

Does this non-determinism cause a serious issue?

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 2, 2020
@tanzhenyu
Contributor

Closing this based on above comments. Thanks all!

@google-ml-butler

Are you satisfied with the resolution of your issue?
