Division Precision Problem in Graph Mode on Intel CPU #102771

@grow1n

Description

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.9

Custom code

Yes

OS platform and distribution

Linux Debian 11

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

The result of tf.ones([10], dtype=tf.float32) / tf.ones([10], dtype=tf.float32) should be all 1.0, but in graph mode it is [0.99999994 0.99999994 0.99999994 0.99999994 0.99999994 0.99999994 0.99999994 0.99999994 1. 1. ].

TensorFlow versions before 2.9 do not have this problem; starting from 2.9, it occurs whether oneDNN is enabled or disabled.

Regarding the CPU, the problem occurs on Intel 8336C and AMD 9Y24, but not on Apple M4.

Also, computing the same division with Eigen's cwiseQuotient gives an accurate result, whether AVX2 or AVX512F is used.
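For comparison (my own check, not from the report): NumPy's elementwise division on the same inputs, which, like Eigen's cwiseQuotient, appears to perform a true IEEE float32 division rather than a reciprocal approximation, returns exactly 1.0:

```python
import numpy as np

ones = np.ones(10, dtype=np.float32)
result = ones / ones  # elementwise float32 division

# Every element is exactly 1.0, unlike the TF graph-mode result.
print(result)
assert (result == np.float32(1.0)).all()
```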

Standalone code to reproduce the issue

import os

# Must be set before importing TensorFlow; oneDNN is configured at import time.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

import tensorflow as tf


@tf.function
def reverse(x):
    # Elementwise reciprocal; tf.function compiles this into a graph.
    return 1.0 / x


ones = tf.ones([10], dtype=tf.float32)
print("graph:", reverse(ones))
tf.config.run_functions_eagerly(True)
print("eager:", reverse(ones))
print("version:", tf.version.GIT_VERSION, tf.version.VERSION)

Relevant log output

graph: tf.Tensor(
[0.99999994 0.99999994 0.99999994 0.99999994 0.99999994 0.99999994
 0.99999994 0.99999994 1.         1.        ], shape=(10,), dtype=float32)
eager: tf.Tensor([1. 1. 1. 1. 1. 1. 1. 1. 1. 1.], shape=(10,), dtype=float32)
version: v1.12.1-132116-gf67cb87691d 2.21.0-dev20251017

Labels

TF 2.9 (issues found in the TF 2.9 release or RCs) · comp:ops (ops-related issues) · type:bug
