
tf.sparse_tensor_dense_matmul makes small errors with tf.float32 matrices on GPU #18037

Closed

Palazor opened this issue Mar 28, 2018 · 8 comments

Labels: stale, stat:awaiting response

Palazor commented Mar 28, 2018


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, simple short code
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): both Ubuntu 14.04 and CentOS 7
  • TensorFlow installed from (source or binary): pip binary on Ubuntu, from source on CentOS
  • TensorFlow version (use command below): 1.4.1
  • Python version: 3.5.2
  • Bazel version (if compiling from source): release 0.8.1
  • GCC/Compiler version (if compiling from source): 4.8.5
  • CUDA/cuDNN version: 6.0.21
  • GPU model and memory: GTX 750 / GTX 1080
  • Exact command to reproduce: tf.sparse_tensor_dense_matmul

Describe the problem

  1. Given a sparse tensor sp and a dense tensor mat, both of dtype tf.float32,
  2. compute their product with tf.sparse_tensor_dense_matmul(sp, mat);
  3. the resulting product varies slightly from run to run.

Source code / logs

import tensorflow as tf
import numpy as np

s = tf.Session()

num = 10
dim = 10
total_out = 100

indices = [
    [1, 0],
    [2, 0],
    [3, 0],
    [5, 0], [5, 1], [5, 2],
    [6, 0], [6, 1], [6, 2], [6, 3], [6, 4], [6, 7],
    [7, 0], [7, 1], [7, 2], [7, 7], [7, 8],
    [8, 0],
    [9, 0], [9, 1], [9, 2], [9, 7]
]
values = np.array([1.0] * len(indices), np.float32)
feature = tf.SparseTensor(indices, values, [tf.cast(num, tf.int64), tf.cast(dim, tf.int64)])

dense = tf.sparse_tensor_to_dense(feature, validate_indices=False)
mat = tf.contrib.stateless.stateless_random_uniform([dim, total_out], seed=[1, 2], dtype=tf.float32)
prod = tf.sparse_tensor_dense_matmul(feature, mat)
# prod2 = tf.sparse_matmul(dense, mat, False, True, True, False, name='cross_sum')

T = ['dense', 'mat', 'prod']
results = s.run([dense, mat, prod])

comp0 = []  # difference of the summed result vs. the previous run
comp1 = []  # element-wise differences vs. the previous run

# Compare against the results saved by a previous run; on the first run the
# .npy files do not exist yet, so save the current results instead.
for i, r in enumerate(results):
    try:
        comp0.append(np.sum(np.load('npy_{}.npy'.format(T[i]))) - np.sum(r))
        comp1.append(np.load('npy_{}.npy'.format(T[i])) - r)
    except IOError:
        np.save('npy_{}.npy'.format(T[i]), r)
for i in range(len(comp0)):
    print(T[i])
    print(comp0[i])
    print(comp1[i])
    print('\n')

Run the code several times and you will see that the product varies slightly, like this:

dense
0.0
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


mat
0.0
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]
...
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]]


prod
0.0
[[ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   2.3841858e-07 -4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -4.7683716e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  4.7683716e-07  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  2.3841858e-07  2.3841858e-07  0.0000000e+00
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
  -2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00 -2.3841858e-07  0.0000000e+00
  -2.3841858e-07  4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  2.3841858e-07  0.0000000e+00
   2.3841858e-07  0.0000000e+00  4.7683716e-07  2.3841858e-07
   0.0000000e+00  4.7683716e-07  2.3841858e-07  4.7683716e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
   2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  2.3841858e-07  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  2.3841858e-07  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  4.7683716e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
...
]

This only happens on GPU with float32; I believe it is a bug.
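
A quick cross-check (a sketch only, reusing feature and mat from the script above) is to pin the matmul to the CPU; there the element-wise differences between runs stay exactly zero:

with tf.device('/cpu:0'):
    # Same op, forced onto the CPU kernel, which accumulates in a fixed order.
    prod_cpu = tf.sparse_tensor_dense_matmul(feature, mat)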


Palazor commented Mar 30, 2018

After further testing, I found that float64 has the same problem if the dense shape of the sparse matrix is large enough.
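
A sketch of how one might scale the repro up to float64 (the sizes below are only a guess at what "large enough" means; more nonzeros per row means more additions whose order can vary on the GPU):

import numpy as np
import tensorflow as tf

num, dim, total_out, nnz_per_row = 10000, 5000, 100, 64
# Put nnz_per_row ones at the start of every row so each output element is a
# sum of many terms.
indices = [[i, j] for i in range(num) for j in range(nnz_per_row)]
values = np.ones(len(indices), np.float64)
sp = tf.SparseTensor(indices, values, [num, dim])
mat = tf.constant(np.random.RandomState(0).rand(dim, total_out), dtype=tf.float64)
prod = tf.sparse_tensor_dense_matmul(sp, mat)

s = tf.Session()
a, b = s.run(prod), s.run(prod)
print(np.abs(a - b).max())  # a nonzero maximum means the result drifts between runs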

@fabregaszy

Any updates regarding this issue?

tatatodd assigned asimshankar and unassigned tatatodd on May 17, 2018
@tatatodd
Contributor

Assigning to @asimshankar, who might be able to find someone to take a look.

@asimshankar
Contributor

@zheng-xq for triage

@duncanriach
Contributor

duncanriach commented Aug 27, 2020

Hi, @wenscarl and I have reproduced this nondeterminism for fp32. We were not able to repro for fp64, and there does not seem to be any code above showing how to do that. We're reasonably confident that the source of nondeterminism is the use of CUDA atomicAdd in sparse_tensor_dense_matmul_op_gpu.cu.cc. I just wanted to let it be known that this item is on our radar and we plan to resolve it at some point.

Also, this source of nondeterminism has been documented in github/NVIDIA/framework-determinism.
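
For context, a self-contained illustration (plain NumPy, nothing TensorFlow-specific) of why the accumulation order chosen by atomicAdd matters for float32:

import numpy as np

# float32 addition is not associative: summing the same terms in a different
# order can change the low-order bits of the result.
terms = np.random.RandomState(0).rand(1000).astype(np.float32)

forward = np.float32(0.0)
for t in terms:
    forward += t

backward = np.float32(0.0)
for t in terms[::-1]:
    backward += t

# Usually a few ULPs of the ~500.0 total (on the order of 1e-5);
# occasionally exactly zero.
print(forward - backward)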

@sushreebarsa
Contributor

@Palazor
We see that you are using an old version of TensorFlow (1.x), which is no longer actively supported. We recommend that you upgrade to 2.4 or a later version. Attaching the migration guide for reference. Thanks!
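
For reference, a sketch of what the reproduction might look like against the TF 2.x API (the names below are as of 2.4; tf.config.experimental.enable_op_determinism is a 2.8+ addition and is mentioned only as an option):

import numpy as np
import tensorflow as tf

# From TF 2.8 onwards this switch makes supported ops run deterministically
# (or raise if no deterministic implementation exists):
# tf.config.experimental.enable_op_determinism()

indices = [[1, 0], [5, 0], [5, 1], [5, 2], [6, 0], [6, 1], [6, 2], [6, 3]]
values = np.ones(len(indices), np.float32)
sp = tf.sparse.SparseTensor(indices, values, dense_shape=[10, 10])
mat = tf.random.stateless_uniform([10, 100], seed=[1, 2], dtype=tf.float32)

# Run the sparse-dense matmul twice and compare; on GPU the maximum absolute
# difference can be a few ULPs instead of exactly zero.
a = tf.sparse.sparse_dense_matmul(sp, mat)
b = tf.sparse.sparse_dense_matmul(sp, mat)
print(np.abs(a.numpy() - b.numpy()).max())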

sushreebarsa added the stat:awaiting response label on Jan 4, 2022
sushreebarsa self-assigned this on Jan 4, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler bot added the stale label on Jan 11, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
