TensorFlow unit test failure python/kernel_tests:self_adjoint_eig_op_test #52544

Closed
TheodoreRTG opened this issue Oct 18, 2021 · 15 comments
Labels
subtype:bazel Bazel related Build_Installation issues subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues

Comments

@TheodoreRTG

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Aarch64 Centos-8
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): master
  • Python version: 3.6.8
  • Bazel version (if compiling from source): 3.7.2
  • GCC/Compiler version (if compiling from source): 10.3.0
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior

Running the Bazel target python/kernel_tests:self_adjoint_eig_op_test fails.
The test was built with the following command:
bazel build --flaky_test_attempts=3 --test_output=all --cache_test_results=no --noremote_accept_cached --config=noaws --config=nogcp --config=nonccl --verbose_failures -- //tensorflow/python/kernel_tests:self_adjoint_eig_op_test

All tests for dtypes float32 and complex64 with a matrix size of 10 fail, while the rest pass:

[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_3_10_10_True
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_3_10_10_False
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_10_10_True
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_10_10_False
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_3_10_10_True
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_3_10_10_False
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_10_10_True
[  FAILED  ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_10_10_False

[       OK ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex128_3_10_10_False
[       OK ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex128_3_10_10_True
[       OK ] SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float64_3_10_10_True

This error occurs on AArch64 after discarding a random input sample (_ = RandomInput()) in the _GetSelfAdjointEigGradTest function. Without this discarded sample, or when a second sample is discarded before running the tests, they pass as expected.
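To illustrate why a discarded sample matters, here is a minimal sketch, assuming the test seeds NumPy's global-style generator once and then draws its input from it. The helper names `random_symmetric` and `make_test_input` are hypothetical and are not taken from the test file; the point is only that each discarded draw advances the generator state, so the matrix actually under test differs.

```python
import numpy as np


def random_symmetric(n, rng):
    """Draw a random symmetric n x n float32 matrix (hypothetical helper)."""
    a = rng.uniform(-1.0, 1.0, size=(n, n)).astype(np.float32)
    return a + a.T


def make_test_input(n, discarded_samples, seed=1):
    """Seed the generator, throw away some samples, then return the next one."""
    rng = np.random.RandomState(seed)
    for _ in range(discarded_samples):
        random_symmetric(n, rng)  # result intentionally discarded
    return random_symmetric(n, rng)


# Inputs seen by the test when 0, 1, or 2 samples are discarded first.
inputs = {k: make_test_input(10, k) for k in (0, 1, 2)}
same_seed_repeat = make_test_input(10, 1)

for k, matrix in inputs.items():
    print(k, matrix[0, :3])
```

With a fixed seed the input is reproducible, but the three variants are different matrices, so a failure that appears with exactly one discarded sample suggests sensitivity to that particular input rather than a systematic error.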

Describe the expected behavior

For python/kernel_tests/self_adjoint_eig_op_test to pass.

@TheodoreRTG TheodoreRTG added the type:bug Bug label Oct 18, 2021
@mohantym mohantym added subtype:bazel Bazel related Build_Installation issues subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues labels Oct 19, 2021
@mohantym
Contributor

mohantym commented Oct 19, 2021

Hi @Saduf2019 ! Could you please look at this issue!

@mohantym mohantym assigned Saduf2019 and unassigned mohantym Oct 19, 2021
@mohantym mohantym added type:build/install Build and install issues and removed type:bug Bug labels Oct 20, 2021
@Saduf2019 Saduf2019 assigned chunduriv and unassigned Saduf2019 Oct 20, 2021
@elfringham
Contributor

@cfRod @nSircombe

@TheodoreRTG
Author

Hi all, here are the full traceback logs from running python/kernel_tests:self_adjoint_eig_op_test :

FAIL: test_SelfAdjointEigGrad_complex64_10_10_False (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_10_10_False
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0]), array([0, 0]), array([132, 169]))
not close lhs = [ 0.05875404 -0.01270969]
not close rhs = [0.06975884 0.00339105]
not close dif = [0.0110048  0.01610075]
not close tol = [0.01069759 0.01003391]
dtype = float32, shape = (1, 20, 200)
Mismatched elements: 2 / 4000 (0.05%)
Max absolute difference: 0.01610075
Max relative difference: 5.205287
x: array([[[ 0.000389,  0.      ,  0.      , ..., -0.152314,  0.173274,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...
y: array([[[-0.000969,  0.      ,  0.      , ..., -0.154051,  0.17149 ,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...

======================================================================
FAIL: test_SelfAdjointEigGrad_complex64_10_10_True (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_10_10_True
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0]), array([0, 0]), array([132, 169]))
not close lhs = [ 0.05875404 -0.01270969]
not close rhs = [0.06975884 0.00339105]
not close dif = [0.0110048  0.01610075]
not close tol = [0.01069759 0.01003391]
dtype = float32, shape = (1, 20, 200)
Mismatched elements: 2 / 4000 (0.05%)
Max absolute difference: 0.01610075
Max relative difference: 5.205287
x: array([[[ 0.000389,  0.      ,  0.      , ..., -0.152314,  0.173274,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...
y: array([[[-0.000969,  0.      ,  0.      , ..., -0.154051,  0.17149 ,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...

======================================================================
FAIL: test_SelfAdjointEigGrad_complex64_3_10_10_False (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_3_10_10_False
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0, 0, 0, 0, 0]), array([ 0,  0, 20, 20, 40, 40]), array([132, 169, 332, 369, 532, 569]))
not close lhs = [ 0.05875404 -0.01270969  0.05875404 -0.01270969  0.05875404 -0.01270969]
not close rhs = [0.06975884 0.00339105 0.06975884 0.00339105 0.06975884 0.00339105]
not close dif = [0.0110048  0.01610075 0.0110048  0.01610075 0.0110048  0.01610075]
not close tol = [0.01069759 0.01003391 0.01069759 0.01003391 0.01069759 0.01003391]
dtype = float32, shape = (1, 60, 600)
Mismatched elements: 6 / 36000 (0.0167%)
Max absolute difference: 0.01610075
Max relative difference: 5.205287
x: array([[[0.000389, 0.      , 0.      , ..., 0.      , 0.      ,
0.      ],
[0.      , 0.      , 0.      , ..., 0.      , 0.      ,...
y: array([[[-0.000969,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...

======================================================================
FAIL: test_SelfAdjointEigGrad_complex64_3_10_10_True (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_complex64_3_10_10_True
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0, 0, 0, 0, 0]), array([ 0,  0, 20, 20, 40, 40]), array([132, 169, 332, 369, 532, 569]))
not close lhs = [ 0.05875404 -0.01270969  0.05875404 -0.01270969  0.05875404 -0.01270969]
not close rhs = [0.06975884 0.00339105 0.06975884 0.00339105 0.06975884 0.00339105]
not close dif = [0.0110048  0.01610075 0.0110048  0.01610075 0.0110048  0.01610075]
not close tol = [0.01069759 0.01003391 0.01069759 0.01003391 0.01069759 0.01003391]
dtype = float32, shape = (1, 60, 600)
Mismatched elements: 6 / 36000 (0.0167%)
Max absolute difference: 0.01610075
Max relative difference: 5.205287
x: array([[[0.000389, 0.      , 0.      , ..., 0.      , 0.      ,
0.      ],
[0.      , 0.      , 0.      , ..., 0.      , 0.      ,...
y: array([[[-0.000969,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,...

======================================================================
FAIL: test_SelfAdjointEigGrad_float32_10_10_False (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_10_10_False
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0]), array([0]), array([98]))
not close lhs = [-0.00760599]
not close rhs = [-0.01889302]
not close dif = [0.01128702]
not close tol = [0.01018893]
dtype = float32, shape = (1, 10, 100)
Mismatched elements: 1 / 1000 (0.1%)
Max absolute difference: 0.01128702
Max relative difference: 5.878891
x: array([[[ 8.763281e-03,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  1.114071e-01,  3.540782e-01,...
y: array([[[ 1.307978e-02,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  1.119048e-01,  3.604206e-01,...

======================================================================
FAIL: test_SelfAdjointEigGrad_float32_10_10_True (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_10_10_True
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0]), array([0]), array([98]))
not close lhs = [-0.00760599]
not close rhs = [-0.01889302]
not close dif = [0.01128702]
not close tol = [0.01018893]
dtype = float32, shape = (1, 10, 100)
Mismatched elements: 1 / 1000 (0.1%)
Max absolute difference: 0.01128702
Max relative difference: 5.878891
x: array([[[ 8.763281e-03,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  1.114071e-01,  3.540782e-01,...
y: array([[[ 1.307978e-02,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  0.000000e+00,  0.000000e+00,
0.000000e+00,  0.000000e+00,  1.119048e-01,  3.604206e-01,...

======================================================================
FAIL: test_SelfAdjointEigGrad_float32_3_10_10_False (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_3_10_10_False
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0, 0]), array([ 0, 10, 20]), array([ 98, 198, 298]))
not close lhs = [-0.00760599 -0.00760599 -0.00760599]
not close rhs = [-0.01889302 -0.01889302 -0.01889302]
not close dif = [0.01128702 0.01128702 0.01128702]
not close tol = [0.01018893 0.01018893 0.01018893]
dtype = float32, shape = (1, 30, 300)
Mismatched elements: 3 / 9000 (0.0333%)
Max absolute difference: 0.01128702
Max relative difference: 5.878891
x: array([[[ 0.008763,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.087142,  0.      ,  0.      , ...,  0.      ,  0.      ,...
y: array([[[ 0.01308 ,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.091558,  0.      ,  0.      , ...,  0.      ,  0.      ,...

======================================================================
FAIL: test_SelfAdjointEigGrad_float32_3_10_10_True (main.SelfAdjointEigGradTest)
SelfAdjointEigGradTest.test_SelfAdjointEigGrad_float32_3_10_10_True
Traceback (most recent call last):
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py", line 241, in Test
self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 1390, in decorated
return f(*args, **kwds)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2968, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2906, in _assertAllCloseRecursive
(path_str, path_str, msg))
File "/home/builder/testing-ci/tst/tensorflow/bazel-bin/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.runfiles/org_tensorflow/tensorflow/python/framework/test_util.py", line 2860, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 1533, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/usr/local/lib64/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01
Mismatched value: a is different from b.
not close where = (array([0, 0, 0]), array([ 0, 10, 20]), array([ 98, 198, 298]))
not close lhs = [-0.00760599 -0.00760599 -0.00760599]
not close rhs = [-0.01889302 -0.01889302 -0.01889302]
not close dif = [0.01128702 0.01128702 0.01128702]
not close tol = [0.01018893 0.01018893 0.01018893]
dtype = float32, shape = (1, 30, 300)
Mismatched elements: 3 / 9000 (0.0333%)
Max absolute difference: 0.01128702
Max relative difference: 5.878891
x: array([[[ 0.008763,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.087142,  0.      ,  0.      , ...,  0.      ,  0.      ,...
y: array([[[ 0.01308 ,  0.      ,  0.      , ...,  0.      ,  0.      ,
0.      ],
[ 0.091558,  0.      ,  0.      , ...,  0.      ,  0.      ,...

Ran 89 tests in 26.818s

FAILED (failures=8, skipped=1)

@everton1984
Contributor

@cantonios Could you please take a look? From the logs this sounds like a tolerance issue imho.

@cantonios cantonios self-assigned this Oct 22, 2021
@cantonios
Contributor

I checked the errors on Intel. For those failing tests, we have:

test_SelfAdjointEigGrad_complex64_10_10_False
  Max absolute difference: 0.00879528
  Max relative difference: 4.538925

test_SelfAdjointEigGrad_complex64_10_10_True
  Max absolute difference: 0.00879528
  Max relative difference: 4.538925

test_SelfAdjointEigGrad_float32_10_10_False
   Max absolute difference: 0.00921449
   Max relative difference: 6.229456

test_SelfAdjointEigGrad_float32_3_10_10_True
   Max absolute difference: 0.00921449
   Max relative difference: 6.229456

And the comment above the tolerance in the test itself specifies:

# tolerance obtained by looking at actual differences using
# np.linalg.norm(theoretical-numerical, np.inf) on -mavx build
# after discarding one random input sample

so it's possible it's just a tolerance issue - an absolute error of 0.01 does seem a bit tight when the actual values are > 0.009. The errors you're seeing are almost double though.
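As a rough check of how close these values sit to the limit, the sketch below recomputes the comparison for the two mismatched elements from the complex64_10_10 log earlier in this thread. It assumes the usual NumPy closeness rule, |a - b| <= atol + rtol * |b|; the variable names are illustrative only.

```python
import numpy as np

# Values copied from the "not close lhs/rhs" lines of the AArch64 log.
theoretical = np.array([0.05875404, -0.01270969])  # lhs
numerical = np.array([0.06975884, 0.00339105])     # rhs
rtol = atol = 1e-2

# Closeness rule used by numpy.testing.assert_allclose.
diff = np.abs(theoretical - numerical)
allowed = atol + rtol * np.abs(numerical)
passes = diff <= allowed

print("dif    :", diff)
print("allowed:", allowed)
print("passes :", passes)
```

Because the values being compared are small, the rtol term contributes very little and the check is effectively an absolute one against roughly 0.01.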

Maybe try decreasing the delta parameter here first to see if that lowers your errors.

@TheodoreRTG
Author

Hello @cantonios, if the delta is changed from .1 to .1001 the tests pass on aarch64. Is this an acceptable value?

@cantonios
Contributor

It's a step size, so I'd prefer decreasing it. With the change from .1 to .1001, we're still likely in potential flaky territory.

Try something like .05, then check what the errors are (e.g. you can set atol for the test to 0 and let it report the error to you when the test fails). Hopefully that would bring us lower (with some buffer) than the original atol.
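For readers unfamiliar with the role of the step size, here is a small sketch of a central-difference derivative on a scalar function. It assumes the step takes the form scale * eps**(1/3), a common choice for central differences; that formula, the function `np.sin`, and the helper names are assumptions for illustration and not the test's actual code.

```python
import numpy as np


def step_size(dtype, scale=0.1):
    """Assumed step-size rule: scale * eps**(1/3) for the given dtype."""
    eps = float(np.finfo(dtype).eps)
    return dtype(scale * eps ** (1.0 / 3.0))


def central_difference(f, x, delta):
    """Second-order accurate numerical derivative of f at x."""
    return (f(x + delta) - f(x - delta)) / (2 * delta)


def derivative_error(dtype, scale):
    """Absolute error of the numerical derivative of sin at x = 1."""
    x = dtype(1.0)
    delta = step_size(dtype, scale)
    approx = central_difference(np.sin, x, delta)
    return abs(float(approx) - float(np.cos(1.0)))


for scale in (0.1, 0.08, 0.05):
    print(scale,
          derivative_error(np.float32, scale),
          derivative_error(np.float64, scale))
```

A smaller step reduces truncation error but amplifies rounding error, since the rounding noise in the numerator is divided by a smaller 2 * delta. In single precision that trade-off is much tighter than in double precision, which is consistent with only the float32 and complex64 cases failing.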

@TheodoreRTG
Author

Okay, got it. A delta of .05 doesn't work; however, .09 does pass successfully. I'll keep testing to find the lowest value that still passes.

@TheodoreRTG
Author

I've tested several times at a delta of .08 and it also passes; however, any lower and the tests start to fail again, so I think .08 would be the lowest possible. Does this seem reasonable, @cantonios?

@TheodoreRTG
Author

Andrew Goodbody has also tested this and said that the tests pass when fused multiply-add (FMA) instructions are disabled via --copt=-ffp-contract=off --cxxopt=-ffp-contract=off
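To show why FMA contraction can change float32 results at all, here is a minimal sketch. It does not call a hardware FMA; it emulates one by doing the multiply and add in float64 and rounding once to float32, which is exact for the hand-picked operands below. The operands are chosen for illustration and have nothing to do with the test's data.

```python
import numpy as np

# Operands chosen so that a * b falls exactly halfway between two float32 values.
a = np.float32(1 + 2**-12)
b = np.float32(1 + 2**-12)
c = np.float32(-(1 + 2**-11))

# Unfused: the product is rounded to float32 before the addition.
unfused = (a * b) + c

# Emulated fused multiply-add: exact product and sum, then a single rounding.
fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

print("unfused:", unfused)
print("fused  :", fused)
```

The unfused result loses the low-order bit of the product, while the fused result keeps it. Differences of this size are tiny on their own, but a numerical gradient divides them by a small step, so they can plausibly move an error from just under a tolerance to just over it.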

@cfRod
Contributor

cfRod commented Nov 8, 2021

Are these tests built using -O3? For the purpose of unit testing, I guess it is OK to disable these FMA instructions (?).
Tagging @penpornk and @nSircombe here!

@elfringham
Contributor

By default it builds with -O2.

@chunduriv chunduriv removed their assignment Nov 26, 2021
@penpornk
Member

Sorry for the late response! @cantonios has been on a long vacation. I think we can move forward with a 0.08 delta in the meantime. (We can revisit this later when @cantonios is back and if he thinks this needs more work.)

Would you mind sending a PR for this and tagging me? Thank you very much!

@cantonios
Contributor

My preference is small delta, leave FMA on.


penpornk pushed a commit to penpornk/tensorflow that referenced this issue Apr 25, 2022
The tolerance on self_adjoint_eig_op_test seems a bit tight. The test is currently failing
on aarch64 (tensorflow#52544).

Playing around with small perturbations of the inputs and step size `delta` on x86_64,
the max error seems to be in the range 0.008-0.016. Increasing the test tolerance therefore
seems reasonable to account for this error range.

Fixes tensorflow#52544.

PiperOrigin-RevId: 439758034