TensorFlow unit test failure python/kernel_tests:self_adjoint_eig_op_test #52544
Comments
Hi @Saduf2019 ! Could you please look at this issue! |
Hi all, here are the full traceback logs from running python/kernel_tests:self_adjoint_eig_op_test :
@cantonios Could you please take a look? From the logs this sounds like a tolerance issue imho. |
I checked the errors for intel. For those failing tests, we have:
And the comment above the tolerance in the test itself specifies:
so it's possible it's just a tolerance issue - an absolute error of 0.01 does seem a bit tight when the actual values are > 0.009. The errors you're seeing are almost double though. Maybe try decreasing the |
Hello @cantonios, if the delta is changed from .1 to .1001 the tests pass on aarch64. Is this an acceptable value? |
It's a step size, so I'd prefer decreasing it. With the change from .1 to .1001, we're still likely in potential flaky territory. Try something like .05, then check what the errors are (e.g. you can set atol for the test to 0 and let it report the error to you when the test fails). Hopefully that would bring us lower (with some buffer) than the original atol. |
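To make the step-size suggestion above concrete, here is a minimal sketch of how a finite-difference step affects the reported gradient error. It is a generic central-difference check on `math.exp`, not the gradient checker that the test uses, and the helper names `central_difference` and `max_abs_error` are hypothetical. It mirrors the idea of setting the tolerance to 0 and simply reading off the error.

```python
import math


def central_difference(f, x, delta):
    """Numerical derivative of f at x using a symmetric step of size delta."""
    return (f(x + delta) - f(x - delta)) / (2.0 * delta)


def max_abs_error(delta, points=(0.5, 1.0, 1.5)):
    """Largest gap between the numerical and the analytic derivative of exp."""
    return max(
        abs(central_difference(math.exp, x, delta) - math.exp(x))
        for x in points
    )


# Report the raw error for a few step sizes instead of asserting a tolerance.
for delta in (0.1, 0.08, 0.05):
    print(f"delta={delta}: max error {max_abs_error(delta):.6f}")
```

In this double-precision toy case the truncation error shrinks roughly with the square of `delta`. In float32 the rounding noise grows as `delta` shrinks, which may be part of why a smaller step does not automatically help in the real test.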
Okay, got it. .05 doesn't work, but .09 does pass. I'll keep testing to find the lowest value that doesn't fail. |
I've tested several times at a delta of .08 and it also passes; however, any lower and the tests start to fail again, so I think .08 is the lowest possible value. Does this seem reasonable, @cantonios? |
Andrew Goodbody has also tested this and said that the tests pass by disabling Fused Multiply Add instructions via |
Are these tests built using -O3? For the purpose of unit testing, I guess it is OK to disable these FMA instructions (?). |
By default it builds with -O2. |
Sorry for the late response! @cantonios has been on a long vacation. I think we can move forward with a 0.08 delta in the meanwhile. (We can revisit this later when @cantonios is back and if he thinks this needs more work.) Would you mind sending a PR for this and tag me? Thank you very much! |
My preference is small delta, leave FMA on. |
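For readers wondering why FMA can change a result at all, here is a small sketch. It emulates a fused multiply-add with exact rational arithmetic (one rounding at the end) and compares it with the usual multiply-then-add (two roundings). The helper name `fused_multiply_add` and the chosen inputs are illustrative assumptions, not values taken from the failing test.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to a float."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = a * b + c                   # a*b is rounded to 1.0 before the add
fused = fused_multiply_add(a, b, c)   # the exact product survives until the end

print("unfused:", unfused)
print("fused:  ", fused)
```

Both answers are valid floating-point results; they just round at different points. Differences like this are tiny per operation, but a numerical gradient divides by a small step, so they can be amplified enough to cross a tight tolerance.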
The tolerance on self_adjoint_eig_op_test seems a bit tight. The test is currently failing on aarch64 (tensorflow#52544). Playing around with small perturbations of the inputs and step size `delta` on x86_64, the max error seems to be in the range 0.008-0.016. Increasing the test tolerance therefore seems reasonable to account for this error range. Fixes tensorflow#52544. PiperOrigin-RevId: 439758034
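As a rough illustration of the tolerance reasoning above, the sketch below applies the usual closeness criterion `|actual - expected| <= atol + rtol * |expected|` to the two ends of the reported error range. The helper `is_close` is hypothetical, and the relaxed `atol` of 0.02 is only a placeholder, since the value chosen in the actual change is not stated here.

```python
def is_close(actual, expected, rtol=0.0, atol=0.01):
    """Closeness check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Ends of the error range reported above (0.008 to 0.016).
for error in (0.008, 0.016):
    tight = is_close(error, 0.0, atol=0.01)
    relaxed = is_close(error, 0.0, atol=0.02)  # placeholder value, an assumption
    print(f"error={error}: atol=0.01 -> {tight}, atol=0.02 -> {relaxed}")
```

With an absolute tolerance of 0.01, the low end of the range passes and the high end fails, which matches a test that is flaky across platforms and inputs.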
System information
Describe the current behavior
Running the bazel test python/kernel_tests/self_adjoint_eig_op_test fails.
Building the self_adjoint_eig_op_test with the following command:
bazel build --flaky_test_attempts=3 --test_output=all --cache_test_results=no --noremote_accept_cached --config=noaws --config=nogcp --config=nonccl --verbose_failures -- //tensorflow/python/kernel_tests:self_adjoint_eig_op_test
The float32 and complex64 cases with a size of 10 all fail, while the rest pass:
This error occurs on aarch64 after discarding a random input:
_ = RandomInput()
in the _GetSelfAdjointEigGradTest function. Without this discarded input, or when a second input is discarded before running the tests, they pass as expected.
Describe the expected behavior
For python/kernel_tests/self_adjoint_eig_op_test to pass.