
tf.cast does not preserve requested precision for Python types of float64/int64/complex128 etc #64548

Closed
jonas-eschle opened this issue Mar 26, 2024 · 10 comments
Assignees
Labels
comp:ops OPs related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.15 For issues related to 2.15.x type:bug Bug

Comments

@jonas-eschle
Contributor

jonas-eschle commented Mar 26, 2024

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

any new version, up to develop

Custom code

No

Current behavior?

tf.cast truncates the precision of non-TF inputs (Python, NumPy, ...) when the requested dtype is wider than the default inferred by convert_to_tensor, i.e. float64, int64, or complex128.

Example:
Casting a float64 NumPy array (or a Python float) with tf.cast returns a float64 tf.Tensor with only effective float32 precision (see example below).

The reason is in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/math_ops.py#L1006:
if the input is not a TF type, it is converted via convert_to_tensor with the default dtype, which for floats (as an example) is float32. It is up-cast later on (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/math_ops.py#L1019), but the precision is already lost at that point.
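The narrow-then-widen path described above can be sketched with NumPy (an analogy to the code path in question, not TF's actual internals):

```python
import numpy as np

# Sketch of the lossy path: the Python float is first converted at the
# default float32 precision, then widened to the requested float64.
value = 1 + 1e-15                         # > 1.0 in float64, rounds to 1.0 in float32

narrowed = np.float64(np.float32(value))  # narrow first, then widen (lossy)
direct = np.float64(value)                # convert directly at the requested dtype

assert narrowed == 1.0                    # precision was lost before the up-cast
assert direct > 1.0                       # direct conversion keeps the 1e-15 offset
```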

Proposed solution

The comment suggests that providing dtype could convert things that are inconvertible. However, isn't this happening already? And wouldn't that fail later on anyway, in the cast at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/math_ops.py#L1019?
I do not see any downside to using dtype=dtype.
If you agree, I can make the change and check whether the tests still pass.

Standalone code to reproduce the issue

import tensorflow as tf
print(tf.__version__)
oneplus = 1 + 1e-15  # this is 1.0 in float32 representation, but correctly represented in float64
oneplus_cast32 = tf.cast(oneplus, dtype=tf.float32)  # 1.0, as expected
oneplus_cast64 = tf.cast(oneplus, dtype=tf.float64)  # 1.0, but should be larger
oneplus_converted64 = tf.convert_to_tensor(oneplus, dtype=tf.float64)

oneplus_cast32_as64 = tf.cast(oneplus_cast32, dtype=tf.float64)  # casting just for the equal below to work
# note that it is truncated due to being a float32 that is represented in a float64


# we can check that it's wrong
assert oneplus_cast32_as64 == oneplus_cast64, "This must fail!"  # works, but should not
assert oneplus_converted64 == oneplus_cast64, "This should work"  # fails, but should work
@Venkat6871

Hi @jonas-eschle ,
I tried to run your code on Colab with TF v2.16.1 and the nightly build and faced the same issue. Please find the gist here for reference.

Thank you!

@SuryanarayanaY SuryanarayanaY added the comp:ops OPs related issues label Apr 2, 2024
@SuryanarayanaY
Collaborator

SuryanarayanaY commented Apr 2, 2024

Hi @jonas-eschle ,

I have tested the given code snippet with TF v2.15.

For the 1st case (casting float64 to float32 and comparing the two), the assertion fails with an InvalidArgumentError, contrary to your reported behaviour. This is expected behaviour for this case. Could you please verify this again?

InvalidArgumentError: cannot compute Equal as input # 1 (zero-based) was expected to be a float tensor but is a double tensor [Op:Equal] name:

For the 2nd case (casting a float64 value to float64 and comparing the two), the assertion fails, but IMO it should not. Attached gist for reference.

@SuryanarayanaY SuryanarayanaY added the TF 2.15 For issues related to 2.15.x label Apr 2, 2024
@jonas-eschle
Contributor Author

True, the equal fails. The equal was meant to show that the numbers are actually the same; I adjusted the script to include the conversion so the equality check works. Note that cast32_as64 is truncated (Python float64 -> TF float32 -> TF float64), while the other value (Python float64 -> TF float64) should not be truncated, so these should not be equal.

@jonas-eschle
Contributor Author

jonas-eschle commented Apr 2, 2024

Maybe, just to be clear @SuryanarayanaY: this isn't a 2.15-only issue. It appears in all versions up to the nightlies and has probably existed for years.

@SuryanarayanaY
Collaborator

Hi @jonas-eschle ,

I have checked the documentation and found the note below: casting native Python floats to higher precision can lead to a loss of precision. This is expected behaviour, which is also documented.

Note this operation can lead to a loss of precision when converting native Python float and complex variables to tf.float64 or tf.complex128 tensors, since the input is first converted to the float32 data type and then widened. It is recommended to use tf.convert_to_tensor instead of tf.cast for any non-tensor inputs.
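The documented "first converted to float32 and then widened" behaviour can be reproduced in pure Python with the stdlib struct module (an illustration of the mechanism, not TF code):

```python
import struct

def round_trip_float32(x: float) -> float:
    """Narrow a Python float to IEEE-754 single precision, then widen it back."""
    return struct.unpack('f', struct.pack('f', x))[0]

oneplus = 1 + 1e-15          # distinguishable from 1.0 in float64, not in float32

assert round_trip_float32(oneplus) == 1.0  # the 1e-15 offset is lost in float32
assert oneplus > 1.0                       # the original float64 value keeps it
```

Once the value has passed through single precision, widening it afterwards cannot recover the lost bits, which is why the note recommends tf.convert_to_tensor (which converts at the requested dtype directly) for non-tensor inputs.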

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 3, 2024
@jonas-eschle
Contributor Author

I see, this is true, although highly unexpected behavior IMHO. I still do not quite understand why https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/math_ops.py?rgh-link-date=2024-03-26T22%3A03%3A17Z#L1019 doesn't just take the dtype argument; the comment there is not clear to me.

I think it's unexpected and should either a) fail if a Python type is given, or b) convert to the tensor type first (the comment suggests that things could go wrong, but what else would a user expect? The input will obviously be converted to a tensor). Doing things silently "wrong" seems far from ideal, especially since the narrowing to float32 is a bit arbitrary (i.e. why 32, and not 64, or 16?).
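For comparison, NumPy's default inference for a Python float is float64, so its default conversion does not lose this precision (shown here as a point of reference for the "why float32" question):

```python
import numpy as np

# NumPy infers float64 for a native Python float by default,
# so the full double-precision value survives the conversion.
a = np.asarray(1 + 1e-15)

assert a.dtype == np.float64
assert a > 1.0    # the 1e-15 offset survives NumPy's default conversion
```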

Feel free to close this or keep it open for a fix. I would regard it as a design issue, and there is a "fix" comment in the code already, so why not do it right?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 3, 2024
@SuryanarayanaY
Collaborator

Hi @jonas-eschle ,

If you want to modify the comment in the docs, please feel free to create a PR. The source is here.

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 4, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Apr 12, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

3 participants