bug in rsqrt #51043

Closed
BlueSkyyyyyy opened this issue Jul 30, 2021 · 16 comments
Labels: comp:ops, stat:awaiting response, TF 2.4, type:bug

Comments

@BlueSkyyyyyy

BlueSkyyyyyy commented Jul 30, 2021

---------------------------------------------------------------------------part 1-----------------------------------------------------------------
my code:

import tensorflow as tf
import os
import numpy as np
np.set_printoptions(threshold=np.inf)

def test():
    with tf.device("/device:CPU:0"):
    # with tf.device('/gpu:1'):
        a=tf.constant([1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38])
        b = tf.rsqrt(a)
    return b

if __name__=='__main__':
    with tf.Session() as sess:
        print(sess.run(test()))

Expected (correct) result:
[9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18, 9.223371843921341e+18]

in the env: tensorflow== 1.14.0 python=3.6.8
get the wrong result:
[1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19
1.383168e+19 1.383168e+19 1.383168e+19 9.223372e+18]

in the env: tensorflow== 1.14.0 python=2.7.18
get the wrong result:
[ inf inf inf inf inf
inf inf inf 9.223372e+18]

Note: if I use the GPU device, I also get the wrong result.
----------------------------------------------------------------------part 2-----------------------------------------------------------------
my code2:

import tensorflow as tf
import os
import numpy as np
np.set_printoptions(threshold=np.inf)
import math

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
def test():
    a=tf.constant([1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38])
    b = tf.math.rsqrt(a)
    print(b)
    return
    
if __name__=='__main__':
    test()

in the env: tensorflow== 2.4.0 python=3.8.5
get the wrong result:
[1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19
1.383168e+19 1.383168e+19 1.383168e+19 9.223372e+18]

In the same env (tensorflow==2.4.0, python=3.8.5), if I use the GPU, i.e. os.environ["CUDA_VISIBLE_DEVICES"] = "1",
I get the right result:
[9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18
9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18]

----------------------------------------------------------------------part 3-----------------------------------------------------------------
Notes:
1. Small shapes tend to give the right result.
2. A similar bug also exists in tf.floor.

@tilakrayal
Contributor

tilakrayal commented Jul 30, 2021

@BlueSkyyyyyy ,

We see that the issue template has not been filled in. Could you please do so, as it helps us analyse the issue? Thanks!

@tilakrayal tilakrayal added TF 2.4 for issues related to TF 2.4 comp:ops OPs related issues stat:awaiting response Status - Awaiting response from author labels Jul 30, 2021
@BlueSkyyyyyy
Author

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: cpu
  • TensorFlow installed from (source or binary):binary
  • TensorFlow version (use command below):2.4.0
  • Python version:3.8.5
  • Bazel version (if compiling from source):NA
  • GCC/Compiler version (if compiling from source):NA
  • CUDA/cuDNN version:NA
  • GPU model and memory:NA

Describe the current behavior
Use tf.math.rsqrt to calculate:
a=tf.constant([1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38])
b = tf.math.rsqrt(a)

I get the wrong results:
[1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19
1.383168e+19 1.383168e+19 1.383168e+19 9.223372e+18]

Describe the expected behavior
[9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18
9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18]

Contributing

  • Do you want to contribute a PR? (yes/no):no
  • Briefly describe your candidate solution(if contributing):no

Standalone code to reproduce the issue
my code

import tensorflow as tf
import os
import numpy as np
np.set_printoptions(threshold=np.inf)
import math

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
def test():
    a=tf.constant([1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38])
    b = tf.math.rsqrt(a)
    print(b)
    return
    
if __name__=='__main__':
    test()

Other info / logs Include any logs or source code that would be helpful to
no

@BlueSkyyyyyy
Author

BlueSkyyyyyy commented Jul 31, 2021

@tilakrayal
I have used the template to describe the problem above. Not only TF 2.4 but also other versions have a similar problem (I have tried tensorflow==1.14.0 + python=3.6.8 and tensorflow==1.14.0 + python=2.7.18).

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 2, 2021
@tilakrayal tilakrayal added the type:bug Bug label Aug 2, 2021
@tilakrayal
Contributor

@Saduf2019 ,
I was able to reproduce the issue in TF v2.4, v2.5 and nightly. Please find the gist of it here.

@tilakrayal tilakrayal assigned Saduf2019 and unassigned tilakrayal Aug 2, 2021
@BlueSkyyyyyy
Author

@Saduf2019, @tensorflowbutler
Do you know why TensorFlow has this bug and how to solve it?

@Saduf2019
Contributor

@BlueSkyyyyyy
1) Can you try setting the dtype of the tf.constant to tf.float64 (64-bit floating point tensor)?
2) Avoid using main; check whether you get a better result by removing main.
If you really have to use main, then try putting your code in a .py file in Colab and executing it as filename.py. [The loss of precision, due to the large number you have chosen, is causing the issue.]
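
The float64 suggestion can be tried with a change like the one below. This is a minimal sketch assuming TF 2.x eager execution; the names `values`, `a64` and `b64` are made up for illustration. In float64 the input sits far inside the normal range, so it should not touch any float32 edge case.

```python
import tensorflow as tf

# Same nine inputs as in the report, but stored as 64-bit floats.
values = [1.1754944e-38] * 9
a64 = tf.constant(values, dtype=tf.float64)

# rsqrt computed in double precision.
b64 = tf.math.rsqrt(a64)
print(b64.dtype, b64.numpy())
```

Note that this changes the dtype of the whole computation, so it sidesteps the float32 behaviour rather than explaining it.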

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Aug 6, 2021
@BlueSkyyyyyy
Author

BlueSkyyyyyy commented Aug 9, 2021

@Saduf2019
1) I removed main, but I have the same problem.
2) If I use tf.float64, I get the right answer, but that is not the same question; float32 should support this precision.

When I do the same computation in NumPy (float32) or plain Python, I get the right answer:
code:

import numpy as np
np.set_printoptions(threshold=np.inf)
import math

if __name__=='__main__':
    a = [1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38,1.1754944e-38]

    ans_py = []
    for aa in a:
        ans_py.append(float(1.0/math.sqrt(float(aa))))
    print("ans_py:{}".format(ans_py))

    a_np = np.array(a,dtype="float32")
    ans_np = 1.0/np.sqrt(a_np)
    print("ans_np:{}".format(ans_np))

Running it gives:

ans_py:[9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18, 9.22337184392134e+18]
ans_np:[9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18
 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18
 9.223372e+18 9.223372e+18 9.223372e+18 9.223372e+18]

So I think float32 is enough for this case.
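
As a cross-check of the reference value, the input 1.1754944e-38 rounds to the smallest normal float32 (2**-126), so its reciprocal square root in float32 is exactly 2**63, which prints as 9.223372e+18. The sketch below verifies this with NumPy only; the variable names are illustrative. The plain-Python value 9.22337184392134e+18 differs slightly because there the literal is kept as a 64-bit float, which is not exactly 2**-126.

```python
import numpy as np

# The literal from the report, rounded to float32.
x = np.float32(1.1754944e-38)

# It coincides with the smallest normal float32 value, 2**-126.
is_smallest_normal = x == np.finfo(np.float32).tiny

# Reference reciprocal square root, computed entirely in float32.
rsqrt = np.float32(1.0) / np.sqrt(x)

print(is_smallest_normal, rsqrt, rsqrt == np.float32(2.0 ** 63))
```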

@Saduf2019
Contributor

@BlueSkyyyyyy
This is not a bug or performance issue; please move this to closed status, and for any further queries open a thread at the TF discussion forum.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 11, 2021
@BlueSkyyyyyy
Author

BlueSkyyyyyy commented Aug 13, 2021

@Saduf2019
Why is it not a bug? I get an obviously wrong result from TF. Is there something I have overlooked that leads to incorrect usage? Thanks for your patience.

@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Aug 16, 2021
@ymodak
Contributor

ymodak commented Aug 24, 2021

I tested in Colab; the CPU and GPU implementation results match, and I get

tf.Tensor(
[1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19
 1.383168e+19 1.383168e+19 1.383168e+19 9.223372e+18], shape=(9,), dtype=float32)

I wonder why you are seeing different results. You may want to test with Google Colab.

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Aug 24, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Aug 31, 2021
@BlueSkyyyyyy
Author

BlueSkyyyyyy commented Sep 7, 2021

@ymodak

tf.Tensor(
[1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19 1.383168e+19
1.383168e+19 1.383168e+19 1.383168e+19 9.223372e+18], shape=(9,), dtype=float32)

This result is not right; that is the key point.
1.383168e+19 is wrong; 9.223372e+18 is right.

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Sep 7, 2021
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Sep 7, 2021
@cantonios
Contributor

cantonios commented Sep 16, 2021

This is likely due to an Eigen approximation. If EIGEN_FAST_MATH = 1 (the default), then Eigen uses the fast reciprocal sqrt approximation, which is much faster, but less accurate (particularly as you deviate away from zero). See here.
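
To illustrate how a fast reciprocal sqrt of this kind could go wrong for this particular input, here is a rough NumPy model: a coarse initial estimate refined by one Newton-Raphson step. Everything in it is an assumption made for illustration and is not confirmed in this thread: the size of the initial error (2**-12), the flush-to-zero of subnormal intermediates, and the helper names `flush_to_zero` and `newton_rsqrt`. It is not Eigen's code.

```python
import numpy as np

F32 = np.float32
TINY = np.finfo(np.float32).tiny  # smallest normal float32, 2**-126


def flush_to_zero(v):
    """Model flush-to-zero: a subnormal float32 value becomes 0 (assumption)."""
    return F32(0.0) if abs(v) < TINY else v


def newton_rsqrt(x, y0, flush_subnormals):
    """One Newton-Raphson step refining an estimate y0 of 1/sqrt(x)."""
    half_x = F32(0.5) * x  # subnormal when x is the smallest normal float32
    if flush_subnormals:
        half_x = flush_to_zero(half_x)
    return y0 * (F32(1.5) - half_x * y0 * y0)


x = F32(1.1754944e-38)
exact = F32(2.0 ** 63)
# Hypothetical coarse estimate with a relative error of 2**-12.
y0 = exact * F32(1.0 - 2.0 ** -12)

refined = newton_rsqrt(x, y0, flush_subnormals=False)
flushed = newton_rsqrt(x, y0, flush_subnormals=True)

# Ratios to the exact answer: near 1.0 without flushing, near 1.5 with it.
print(refined / exact, flushed / exact)
```

In this model the correction term vanishes once `0.5 * x` is flushed to zero, so the step returns 1.5 times the estimate instead of refining it. That is of the same order as the 1.383168e+19 reported above, but whether this is what actually happens inside TensorFlow is not established here.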

If compiling TF from source, you can try explicitly disabling this (adding -DEIGEN_FAST_MATH=0 to the set of compile flags).

If not building TF from source, the work-around is to use tf.sqrt() instead, and do divisions as necessary.
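
The tf.sqrt() work-around could look like the sketch below, assuming TF 2.x eager execution. The helper name `rsqrt_via_sqrt` is made up for illustration, and whether the result matches the NumPy reference on a given build has not been verified here.

```python
import tensorflow as tf


def rsqrt_via_sqrt(x):
    """Reciprocal square root as 1 / sqrt(x), avoiding tf.math.rsqrt."""
    return 1.0 / tf.sqrt(x)


# Same float32 inputs as in the report.
a = tf.constant([1.1754944e-38] * 9)
b = rsqrt_via_sqrt(a)
print(b)
```

This trades the single fused op for a sqrt followed by a division, so it may be somewhat slower on large tensors.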

@BlueSkyyyyyy
Author

@cantonios
thank you so much

@sachinprasadhs sachinprasadhs removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 4, 2021
@sachinprasadhs
Contributor

@cantonios thank you so much

If your issue is resolved, could you please close the issue. Thanks!

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Dec 4, 2021