How to build tensorflow from source, so that it works similar across INTEL and AMD CPUs (in terms of floating point math precision)

<details><summary>Click to expand!</summary> 
 
 ### Issue Type

Build/Install

### Source

source

### Tensorflow Version

1.15

### Custom Code

No

### OS Platform and Distribution

Linux Ubuntu 18.04

### Mobile device

_No response_

### Python version

3.6.9

### Bazel version

0.26.1

### GCC/Compiler version

7.5.0

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current Behaviour?


I am trying to build tensorflow from source(v1.15) as I was getting a difference in floating point math precision across INTEL and AMD CPUs(refer to issue #56529). What are the possible options to provide for the build command to fix this? I am looking for a solution for either TF v1.15 or TF v2.3. 

I tried the following build commands, 
```shell
1)  bazel --output_base=/local/mnt/workspace/gsaichai build --config=v1 --config=mkl --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mfpmath=387 --copt=-mtune=generic --copt=-march=x86-64 --host_copt=-march=x86-64 --verbose_failures //tensorflow/tools/pip_package:build_pip_package

2)  bazel --output_base=/local/mnt/workspace/gsaichai build --config=v1 --config=mkl --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mfpmath=387 --copt=-march=x86-64 --host_copt=-march=x86-64 --verbose_failures //tensorflow/tools/pip_package:build_pip_package

```
But the difference was still present.

<s>These commands are resulting in the following error,
```shell
ERROR: /local/mnt/workspace/gsaichai/qnn_src/tensorflow/tensorflow/python/BUILD:329:1: C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
tensorflow/python/lib/core/bfloat16.cc: In function 'bool tensorflow::{anonymous}::Initialize()':
tensorflow/python/lib/core/bfloat16.cc:634:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [6], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:638:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [10], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:641:77: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [5], <unresolved overloaded function type>, const std::array<int, 3>&)'
   if (!register_ufunc("less", CompareUFunc<Bfloat16LtFunctor>, compare_types)) {
                                                                             ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:645:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [8], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:649:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [11], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
tensorflow/python/lib/core/bfloat16.cc:653:36: error: no match for call to '(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [14], <unresolved overloaded function type>, const std::array<int, 3>&)'
                       compare_types)) {
                                    ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note: candidate: tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>
                             const std::array<int, 3>& types) {
                                                            ^
tensorflow/python/lib/core/bfloat16.cc:608:60: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'PyUFuncGenericFunction {aka void (*)(char**, const long int*, const long int*, void*)}'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 232.568s, Critical Path: 63.39s
INFO: 1208 processes: 1208 local.
FAILED: Build did NOT complete successfully

``` 
</s>

<b>The above error got resolved after making some changes to my ./configure file. But the resulting tensorflow packages are still resulting in the difference between AMD and INTEL. Please let me know what are the options that I can provide while building Tensorflow from source(for v1.15 or v2.3), so that the floating point math is consistent across AMD and INTEL</b> 
### Standalone code to reproduce the issue

```shell
None
```


### Relevant log output

_No response_</details>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to build tensorflow from source, so that it works similar across INTEL and AMD CPUs (in terms of floating point math precision) #56586

Issue Type

Source

Tensorflow Version

Custom Code

OS Platform and Distribution

Mobile device

Python version

Bazel version

GCC/Compiler version

CUDA/cuDNN version

GPU model and memory

Current Behaviour?

Standalone code to reproduce the issue

Relevant log output

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

How to build tensorflow from source, so that it works similar across INTEL and AMD CPUs (in terms of floating point math precision) #56586

Description

Issue Type

Source

Tensorflow Version

Custom Code

OS Platform and Distribution

Mobile device

Python version

Bazel version

GCC/Compiler version

CUDA/cuDNN version

GPU model and memory

Current Behaviour?

Standalone code to reproduce the issue

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions