
tf.sparse_tensor_dense_matmul makes small errors with tf.float32 matrices on GPU #18037

Closed

Palazor opened this issue Mar 28, 2018 · 8 comments

Labels: stale, stat:awaiting response

Palazor commented Mar 28, 2018


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, simple short code
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): both Ubuntu 14.04 and CentOS 7
  • TensorFlow installed from (source or binary): pip binary on Ubuntu, from source on CentOS
  • TensorFlow version (use command below): 1.4.1
  • Python version: 3.5.2
  • Bazel version (if compiling from source): release 0.8.1
  • GCC/Compiler version (if compiling from source): 4.8.5
  • CUDA/cuDNN version: 6.0.21
  • GPU model and memory: GTX 750 / GTX 1080
  • Exact command to reproduce: tf.sparse_tensor_dense_matmul

Describe the problem

  1. Given a sparse tensor sp and a dense tensor mat, both of dtype tf.float32,
  2. compute their product with tf.sparse_tensor_dense_matmul(sp, mat);
  3. the resulting product varies slightly from run to run.

Source code / logs

import tensorflow as tf
import numpy as np

s = tf.Session()

num = 10
dim = 10
total_out = 100

indices = [
    [1, 0],
    [2, 0],
    [3, 0],
    [5, 0], [5, 1], [5, 2],
    [6, 0], [6, 1], [6, 2], [6, 3], [6, 4], [6, 7],
    [7, 0], [7, 1], [7, 2], [7, 7], [7, 8],
    [8, 0],
    [9, 0], [9, 1], [9, 2], [9, 7]
]
values = np.array([1.0] * len(indices), np.float32)
feature = tf.SparseTensor(indices, values, [tf.cast(num, tf.int64), tf.cast(dim, tf.int64)])

dense = tf.sparse_tensor_to_dense(feature, validate_indices=False)
mat = tf.contrib.stateless.stateless_random_uniform([dim, total_out], seed=[1, 2], dtype=tf.float32)
prod = tf.sparse_tensor_dense_matmul(feature, mat)
# prod2 = tf.sparse_matmul(dense, mat, False, True, True, False, name='cross_sum')

T = ['dense', 'mat', 'prod']
results = s.run([dense, mat, prod])

comp0 = []  # difference of the summed result vs. the previous run
comp1 = []  # element-wise differences vs. the previous run

# Compare against the results saved by a previous run; on the first run the
# .npy files do not exist yet, so save the current results instead.
for i, r in enumerate(results):
    try:
        comp0.append(np.sum(np.load('npy_{}.npy'.format(T[i]))) - np.sum(r))
        comp1.append(np.load('npy_{}.npy'.format(T[i])) - r)
    except IOError:
        np.save('npy_{}.npy'.format(T[i]), r)
for i in range(len(comp0)):
    print(T[i])
    print(comp0[i])
    print(comp1[i])
    print('\n')

Run the code several times and you will see that the product varies slightly, like this:

dense
0.0
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


mat
0.0
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]
...
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]]


prod
0.0
[[ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   2.3841858e-07 -4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -4.7683716e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  4.7683716e-07  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  2.3841858e-07  2.3841858e-07  0.0000000e+00
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
  -2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00 -2.3841858e-07  0.0000000e+00
  -2.3841858e-07  4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  2.3841858e-07  0.0000000e+00
   2.3841858e-07  0.0000000e+00  4.7683716e-07  2.3841858e-07
   0.0000000e+00  4.7683716e-07  2.3841858e-07  4.7683716e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07
   2.3841858e-07  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  2.3841858e-07  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  2.3841858e-07  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00 -2.3841858e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  4.7683716e-07
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -2.3841858e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00 -4.7683716e-07  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  2.3841858e-07]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
...
]

This only happens on GPU with float32; I believe it is a bug.
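
A quick cross-check (a sketch only, reusing feature and mat from the script above) is to pin the matmul to the CPU; there the element-wise differences between runs stay exactly zero:

with tf.device('/cpu:0'):
    # Same op, forced onto the CPU kernel, which accumulates in a fixed order.
    prod_cpu = tf.sparse_tensor_dense_matmul(feature, mat)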


Palazor commented Mar 30, 2018

After further testing, I found that float64 has the same problem if the dense shape of the sparse matrix is large enough.
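
A sketch of how one might scale the repro up to float64 (the sizes below are only a guess at what "large enough" means; more nonzeros per row means more additions whose order can vary on the GPU):

import numpy as np
import tensorflow as tf

num, dim, total_out, nnz_per_row = 10000, 5000, 100, 64
# Put nnz_per_row ones at the start of every row so each output element is a
# sum of many terms.
indices = [[i, j] for i in range(num) for j in range(nnz_per_row)]
values = np.ones(len(indices), np.float64)
sp = tf.SparseTensor(indices, values, [num, dim])
mat = tf.constant(np.random.RandomState(0).rand(dim, total_out), dtype=tf.float64)
prod = tf.sparse_tensor_dense_matmul(sp, mat)

s = tf.Session()
a, b = s.run(prod), s.run(prod)
print(np.abs(a - b).max())  # a nonzero maximum means the result drifts between runs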

@fabregaszy

Any updates regarding this issue?

tatatodd assigned asimshankar and unassigned tatatodd on May 17, 2018
@tatatodd
Contributor

Assigning to @asimshankar, who might be able to find someone to take a look.

@asimshankar
Contributor

@zheng-xq for triage

@duncanriach
Contributor

duncanriach commented Aug 27, 2020

Hi, @wenscarl and I have reproduced this nondeterminism for fp32. We were not able to repro for fp64, and there does not seem to be any code above showing how to do that. We're reasonably confident that the source of nondeterminism is the use of CUDA atomicAdd in sparse_tensor_dense_matmul_op_gpu.cu.cc. I just wanted to let it be known that this item is on our radar and we plan to resolve it at some point.

Also, this source of nondeterminism has been documented in github/NVIDIA/framework-determinism.
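
For context, a self-contained illustration (plain NumPy, nothing TensorFlow-specific) of why the accumulation order chosen by atomicAdd matters for float32:

import numpy as np

# float32 addition is not associative: summing the same terms in a different
# order can change the low-order bits of the result.
terms = np.random.RandomState(0).rand(1000).astype(np.float32)

forward = np.float32(0.0)
for t in terms:
    forward += t

backward = np.float32(0.0)
for t in terms[::-1]:
    backward += t

# Usually a few ULPs of the ~500.0 total (on the order of 1e-5);
# occasionally exactly zero.
print(forward - backward)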

@sushreebarsa
Contributor

@Palazor
We see that you are using an old version of TensorFlow (1.x), which is no longer actively supported. We recommend that you upgrade to 2.4 or a later version. Attaching the migration guide for reference. Thanks!
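
For reference, a sketch of what the reproduction might look like against the TF 2.x API (the names below are as of 2.4; tf.config.experimental.enable_op_determinism is a 2.8+ addition and is mentioned only as an option):

import numpy as np
import tensorflow as tf

# From TF 2.8 onwards this switch makes supported ops run deterministically
# (or raise if no deterministic implementation exists):
# tf.config.experimental.enable_op_determinism()

indices = [[1, 0], [5, 0], [5, 1], [5, 2], [6, 0], [6, 1], [6, 2], [6, 3]]
values = np.ones(len(indices), np.float32)
sp = tf.sparse.SparseTensor(indices, values, dense_shape=[10, 10])
mat = tf.random.stateless_uniform([10, 100], seed=[1, 2], dtype=tf.float32)

# Run the sparse-dense matmul twice and compare; on GPU the maximum absolute
# difference can be a few ULPs instead of exactly zero.
a = tf.sparse.sparse_dense_matmul(sp, mat)
b = tf.sparse.sparse_dense_matmul(sp, mat)
print(np.abs(a.numpy() - b.numpy()).max())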

sushreebarsa added the stat:awaiting response label on Jan 4, 2022
sushreebarsa self-assigned this on Jan 4, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler bot added the stale label on Jan 11, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
