Inaccuracies in tf.reduce_sum #5527
Comments
Met a similar problem recently. The following code:

```python
import numpy as np
import tensorflow as tf

data = np.random.rand(10000, 10000) + np.random.rand(10000, 1)
data = data.astype('float32')
vec = tf.placeholder(tf.float32, data.shape)
s = tf.reduce_sum(vec)
with tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=1)) as sess:
    np_sum = np.sum(data)
    tf_sum = sess.run(s, feed_dict={vec: data})
    diff = abs(tf_sum - np_sum)
    print(np_sum, diff)
```

sometimes gives an error of 2e5, which is about 2e-3 error per element. This seems a bit large.
NumPy uses pairwise summation in some cases; I'm not sure what TensorFlow does on the CPU.
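For illustration (a sketch, not TensorFlow's code path): NumPy's pairwise summation typically keeps rounding error far smaller than a single sequential float32 running total, which `np.cumsum` models here since it accumulates element by element.

```python
import numpy as np

# Contrast NumPy's reduction (pairwise summation on contiguous float
# arrays) with a single sequential float32 accumulator. The last entry
# of np.cumsum models the naive running-total approach.
rng = np.random.RandomState(0)
data = rng.rand(10**6).astype('float32')

pairwise = np.sum(data)                  # pairwise summation, float32
naive = np.cumsum(data)[-1]              # sequential float32 accumulation
exact = np.sum(data, dtype=np.float64)   # float64 reference

# The naive accumulator's error is typically much larger than pairwise's.
print(abs(pairwise - exact), abs(naive - exact))
```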
@asimshankar could you please have someone look into this? Thanks!
@ppwwyyxx Thanks for the quick responses! Just wanted to mention that the inaccuracy is still much higher than NumPy's when running on the GPU, even though it sounds like it's less severe than on the TF CPU path.
@zergylord @ppwwyyxx I cannot reproduce either of these snippets myself. In the first snippet, both NumPy and TF return 1.0. In the second, the error seems to be of magnitude O(100), not as drastic as reported. Could you give some indication of how you built your program, along with other environment/compiler info?
@zergylord : To add to what @concretevitamin said - if you're seeing this on GPU a lot, do let us know about the CUDA version and the driver version. Mismatches between them often lead to messed up computations. |
@concretevitamin Running on GPU, my snippet got an error of O(100), but on CPU I get O(1e5). I'm using the binary at https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0rc2-cp27-none-linux_x86_64.whl, tested on both an i7-5930K + Ubuntu 16.04 and an i7-6700HQ + Arch Linux. My
One weird thing is that this seems to be reproducible only non-deterministically: sometimes I see negligible error, sometimes I see 1%.

It seems this is working as intended. When ~1e8 numbers are being accumulated, at some point the accumulator value becomes >> the next value being accumulated (which lies in [0, 1]). On GPU the Eigen impl has
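The stalled-accumulator effect described above is easy to demonstrate directly, independent of TensorFlow; a minimal sketch:

```python
import numpy as np

# Once a float32 running total reaches 2**24 (= 16777216), the spacing
# between representable values is 2.0, so adding values in [0, 1) is
# rounded away entirely and the accumulator stops growing.
acc = np.float32(2**24)
for _ in range(1000):
    acc = acc + np.float32(0.5)   # 0.5 is below half an ulp: no effect
print(acc)                         # still 16777216.0
```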
@benoitsteiner, are you planning to improve the numerical accuracy of reduce_sum, or should we mark it contributions welcome? The naive implementation is certainly not "incorrect" or a "bug", but it would be better for it to use multiple accumulators, order partial sums by magnitude, or the like.
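As a sketch of one such remedy (compensated summation; this is not what TensorFlow's kernel does):

```python
import numpy as np

# Kahan compensated summation: a second variable carries the low-order
# bits that each float32 addition would otherwise discard.
def kahan_sum_f32(values):
    total = np.float32(0.0)
    comp = np.float32(0.0)        # running compensation term
    for v in values:
        y = np.float32(v) - comp
        t = total + y
        comp = (t - total) - y    # what the addition just lost
        total = t
    return total

rng = np.random.RandomState(0)
data = rng.rand(10**5).astype('float32')
exact = np.sum(data, dtype=np.float64)
print(abs(kahan_sum_f32(data) - exact))  # within a few float32 ulps of exact
```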
Duplicate of #2625. |
Met the same problem. Any solution?
I've been trying to normalize some vectors by dividing by their sum, but have noticed that this doesn't always result in vectors that sum to one.
For example:
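(The original snippet did not survive in this copy of the thread. A minimal sketch of the pattern being described, written in plain NumPy float32 with hypothetical names; the report ran the TensorFlow equivalent of the division:)

```python
import numpy as np

# Hypothetical reconstruction of the kind of snippet meant here: normalize
# a float32 vector by its sum, then check whether the result sums to 1.
rng = np.random.RandomState(0)
v = rng.rand(1000).astype('float32')
w = v / v.sum()
print(w.sum())  # can differ from exactly 1.0 by a few float32 ulps
```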
While I realize that this is possible with finite-precision floating-point operations, I find it odd how readily it occurs, especially when the same operation in NumPy doesn't encounter this issue. Is this a known issue? I've also noticed that tf.nn.softmax doesn't appear to suffer from this problem; is there a different op I need to use to ensure proper normalization?