Undefined behaviour in Range #52676

Closed
elfringham opened this issue Oct 26, 2021 · 3 comments · Fixed by #52707
Assignees
Labels
2.6.0 comp:ops OPs related issues type:bug Bug

@elfringham
Contributor

Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): all
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): git HEAD
  • Python version: 3.6.8
  • Bazel version (if compiling from source): 3.7.2
  • GCC/Compiler version (if compiling from source): 10.3.0
  • CUDA/cuDNN version: n/a
  • GPU model and memory: n/a

You can collect some of this information using our environment capture
script
You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

c->set_output(0, c->Vector(static_cast<int64_t>(size)));
has undefined behaviour when size is greater than std::numeric_limits<int64_t>::max().
This causes the unit test RangeTest.testLargeStarts to fail on AARCH64, where g++ produces a different result from x86. On x86 the result of the cast is large and negative; on AARCH64 it is large and positive. Neither is incorrect, because converting a floating-point value (which size is in this test) to an integer type that cannot represent it is undefined behaviour.
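
To make the magnitude concrete, here is a minimal sketch of the arithmetic for the failing test's arguments (start=-1e+38, limit=1). It assumes the element count is computed as ceil(|limit - start| / delta) and that delta defaults to 1; both are assumptions for illustration, not quoted from the kernel. RequestedRangeSize and kInt64Overflow are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not TensorFlow code): the element count a
// floating-point range would request, under the assumed formula
// ceil(|limit - start| / delta).
inline float RequestedRangeSize(float start, float limit, float delta) {
  return std::ceil(std::abs((limit - start) / delta));
}

// 2^63 as a float. This is the smallest value that no longer fits in
// int64_t, since std::numeric_limits<int64_t>::max() is 2^63 - 1.
constexpr float kInt64Overflow = 9223372036854775808.0f;
```

With the test's arguments, RequestedRangeSize(-1e38f, 1.0f, 1.0f) is about 1e38, far above kInt64Overflow (about 9.22e18), so a static_cast<int64_t> of that value falls into the undefined case described above.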

Describe the expected behavior

The code should not rely on undefined behaviour.

Contributing

  • Do you want to contribute a PR? (yes/no): yes
  • Briefly describe your candidate solution(if contributing):

Check whether 'size' exceeds the greatest value that can be safely cast to int64_t, and report an error if it does.
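
A rough sketch of what such a check could look like, written as a standalone helper rather than as the actual change to the shape function. CheckedSizeToInt64 is a hypothetical name, and rejecting negative and NaN inputs is an extra assumption beyond what is described above. A real fix would presumably report the failure through the framework's own error mechanism instead of returning a bool.

```cpp
#include <cstdint>

// Hypothetical helper (not TensorFlow code): converts `size` to int64_t only
// when the conversion is well defined. Returns false and leaves *out
// untouched otherwise.
inline bool CheckedSizeToInt64(double size, int64_t* out) {
  // 2^63 is exactly representable as a double and is the first value that
  // no longer fits in int64_t.
  constexpr double kTwoPow63 = 9223372036854775808.0;
  // Written so that NaN fails the test too: every comparison with NaN is false.
  if (!(size >= 0.0 && size < kTwoPow63)) {
    return false;
  }
  *out = static_cast<int64_t>(size);
  return true;
}
```

The comparison is made against 2^63 rather than against static_cast<double>(std::numeric_limits<int64_t>::max()), because the latter rounds up to 2^63 and a "greater than" test would then let exactly 2^63 through.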

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

$ bazel test --flaky_test_attempts=3 --test_output=all --cache_test_results=no --remote_http_cache="" --remote_cache_proxy="" --noremote_accept_cached --config=nonccl --verbose_failures -- //tensorflow/python/kernel_tests:init_ops_test

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

======================================================================
ERROR: testLargeStarts (__main__.RangeTest)
RangeTest.testLargeStarts

Traceback (most recent call last):
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/kernel_tests/init_ops_test.py", line 553, in testLargeStarts
v = math_ops.range(start=-1e+38, limit=1)
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/util/traceback_utils.py", line 141, in error_handler
return fn(*args, **kwargs)
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/util/dispatch.py", line 1092, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/ops/math_ops.py", line 2113, in range
return gen_math_ops._range(start, limit, delta, name=name)
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/ops/gen_math_ops.py", line 7737, in _range
_ops.raise_from_not_ok_status(e, name)
File "/home/builder/.cache/bazel/_bazel_builder/9dc2dbd69dc3512cedb530e1521082e7/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/python/kernel_tests/init_ops_test.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 7131, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[9223372036854775807] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [Op:Range]

@elfringham
Contributor Author

@cfRod @nSircombe

@sushreebarsa
Contributor

@elfringham This issue will be closed once the PR is merged. Thank you!

