
Different Behavior of tf.raw_ops.Cos+tf.raw_ops.Erfc with jit_compile=True #62287

Closed
zoux1a opened this issue Oct 30, 2023 · 4 comments
Labels: comp:ops (OPs related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF2.14 (For issues related to Tensorflow 2.14.x), type:bug (Bug)

Comments

@zoux1a

zoux1a commented Oct 30, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.14.0

Custom code

Yes

OS platform and distribution

Ubuntu 22.04.3 LTS (x86_64)

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

11.8

GPU model and memory

GPU 0: NVIDIA GeForce RTX 2070; GPU 1: NVIDIA GeForce RTX 2070; GPU 2: NVIDIA GeForce RTX 2070; GPU 3: NVIDIA GeForce RTX 2070

Current behavior?

When the tf.raw_ops.Cos + tf.raw_ops.Erfc combination is invoked within a tf.function with JIT compilation enabled (jit_compile=True), it produces different results than the same computation run without JIT compilation. This inconsistency is observed when the code is executed on a CPU device.
The problem occurs when input tensors pass through tf.raw_ops.Cos + tf.raw_ops.Erfc, and similarly with tf.raw_ops.Sin; with the individual ops alone there is no issue.

Standalone code to reproduce the issue

import tensorflow as tf

class Network(tf.Module):
    def __init__(self):
        super().__init__()

    @tf.function(jit_compile=True)
    def __call__(self, x):
        x = tf.raw_ops.Cos(x=x)
        x = tf.raw_ops.Erfc(x=x)
        return x

m = Network()
inp = {
    "x": tf.random.normal([10, 9, 8], dtype=tf.bfloat16),
}

with tf.device('/CPU:0'):
    # Eager (op-by-op) execution: run_functions_eagerly bypasses the tf.function
    # compilation, so each raw op executes and rounds to bfloat16 separately.
    tf.config.run_functions_eagerly(True)
    no_op_res = m(**inp)

    # Compiled execution: the tf.function runs with jit_compile=True (XLA).
    tf.config.run_functions_eagerly(False)
    op_res = m(**inp)

    tf.debugging.assert_near(tf.cast(no_op_res, tf.float64), tf.cast(op_res, tf.float64), atol=0.001, rtol=0.001)

Relevant log output

File "/home/guihuan/LLM/results/tf-2/2023-10-22-20-21/test.py", line 27, in <module>
    tf.debugging.assert_near(tf.cast(no_op_res, tf.float64), tf.cast(op_res, tf.float64), atol=0.001, rtol=0.001)
  File "/home/guihuan/.conda/envs/night/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/guihuan/.conda/envs/night/lib/python3.9/site-packages/tensorflow/python/ops/control_flow_assert.py", line 102, in Assert
    raise errors.InvalidArgumentError(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b''
b'x and y not equal to tolerance rtol = tf.Tensor(0.001, shape=(), dtype=float64), atol = tf.Tensor(0.001, shape=(), dtype=float64)'
b'x (shape=(10, 9, 8) dtype=float64) = '
0.17578125, 1.1484375, 0.267578125, ...
b'y (shape=(10, 9, 8) dtype=float64) = '
0.1767578125, 1.1484375, 0.267578125, ...
@google-ml-butler google-ml-butler bot added the type:bug Bug label Oct 30, 2023
@tilakrayal tilakrayal added TF2.14 For issues related to Tensorflow 2.14.x comp:ops OPs related issues labels Oct 31, 2023
@tilakrayal
Contributor

@sachinprasadhs,
I was able to reproduce the issue on tensorflow v2.14 and tf-nightly. Kindly find the gist of it here.

@sachinprasadhs
Contributor

I was able to replicate the reported behavior: the results differ between jit_compile=True and jit_compile=False.

When jit_compile is set to True, I see an error close to 0.3, which is larger than the atol and rtol values of 0.001.

Here is the Gist for reference https://gist.github.com/sachinprasadhs/64ec59dc673f360c840d19b3f6b8e7a5
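
For reference, here is a minimal sketch (reusing the `m` and `inp` objects defined in the repro script above) that prints the maximum absolute difference between the eager and compiled results instead of asserting on it:

import tensorflow as tf

# Assumes `m` (the Network instance) and `inp` (the input dict) from the repro above.
with tf.device('/CPU:0'):
    tf.config.run_functions_eagerly(True)    # op-by-op (uncompiled) execution
    eager_res = m(**inp)
    tf.config.run_functions_eagerly(False)   # XLA-compiled execution
    xla_res = m(**inp)

# Compare in float64 so the comparison itself introduces no extra rounding.
diff = tf.abs(tf.cast(eager_res, tf.float64) - tf.cast(xla_res, tf.float64))
print("max abs difference:", tf.reduce_max(diff).numpy())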

@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 13, 2023
@cantonios
Contributor

A "difference" is not an "error". This is likely due to fusion, where the intermediate result may be computed and kept in float32 in the case of jit-compilation, whereas without fusion it would cast to bfloat16 between the ops and produce a less precise answer. Still, both are correct.
