TensorFlow gives different results on INTEL and AMD CPUs #56529
Comments
Hi @tilakrayal, I had to reopen this issue because I wanted to refer back to the code for a clarification. I tried converting the input tensors (which are in float32) in the above code (test.py) to float64 before running the session, executed sess.run with the float64 tensors, and finally converted the float64 result back to float32. I used TF v2.3 to do this. For this modified test.py, the final re-converted answer from TF v2.3 matches the TF v2.9 answer for the original test.py, where only float32 tensors are used. And with this modified test.py, TF v2.3 no longer shows any difference across INTEL and AMD. Does that mean TF v2.9 implicitly uses 64-bit math even when the tensors are float32 while using tf.Session? Please let me know your views on this.

Note: please find below the code for this experiment.

test.py (modified)

import numpy as np
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0' #debug mode
np.set_printoptions(precision=7, floatmode="fixed")
print ("We are using Tensorflow version", tf.__version__)
################ Executing in non-eager mode
print(tf.executing_eagerly())
tf.compat.v1.disable_v2_behavior()
print(tf.executing_eagerly())
################ float32 inputs
# float32 NumPy array
a = np.arange(100, dtype=np.float32)
# The same array with the same dtype in TensorFlow
a_tf = tf.constant(a, dtype=tf.float32)
############### Square root with NumPy
sqrt = np.sqrt(a)
print(sqrt)
print(type(sqrt))
sqrt.tofile('../np_exp/sqrt.raw')
a.tofile('../np_exp/a_np.raw')
############### Square root with TensorFlow
with tf.compat.v1.Session() as sess:
    print(a_tf.dtype)
    # Cast the float32 input up to float64 before taking the square root
    a_tf = tf.cast(a_tf, tf.float64)
    print(a_tf.dtype)
    sqrt_tf = sess.run(tf.sqrt(a_tf))
    print(sqrt_tf)
    print(type(sqrt_tf), sqrt_tf.dtype)
    # Convert the float64 result back down to float32
    sqrt_tf = sqrt_tf.astype('float32')
    print(sqrt_tf)
    print(type(sqrt_tf), sqrt_tf.dtype)
    sqrt_tf.tofile('./sqrt.raw')

Thank you!
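For reference, the float64 round trip can be checked with NumPy alone. The following is a minimal sketch (not part of the original test.py) that counts how many elements, if any, differ between a direct float32 sqrt and a float64 sqrt truncated back to float32:

import numpy as np

a = np.arange(100, dtype=np.float32)
# sqrt computed directly in float32
direct = np.sqrt(a)
# sqrt computed in float64, then truncated back to float32
roundtrip = np.sqrt(a.astype(np.float64)).astype(np.float32)
# Compare the two paths element by element
diff = np.abs(direct.astype(np.float64) - roundtrip.astype(np.float64))
print(np.count_nonzero(diff), "elements differ; max abs diff:", diff.max())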
Hi @tilakrayal, I found an example where even TF v2.9 doesn't give the same result on AMD and INTEL CPUs. Please use the same compare.py file to compare the output dumps. For the example below, I see a difference in the TF v2.9 dumps but not in the NumPy dumps. Please let me know how to resolve this. Thank you!

test_1.py

import numpy as np
import tensorflow as tf
print(tf.__version__)
print(tf.executing_eagerly())
tf.compat.v1.disable_v2_behavior()
print(tf.executing_eagerly())
x = np.array([
24.538005828857422,
18.491443634033203,
87.84298706054688,
14.998174667358398,
19.956972122192383,
8.451898574829102,
13.764676094055176,
8.09897518157959,
7.404653549194336,
27.5488338470459,
24.450223922729492,
30.647415161132812,
141.6356658935547,
12.053257942199707,
312.13714599609375,
3.22926926612854,
4.638835906982422,
5.29040002822876,
17.960458755493164,
24.698667526245117,
15.1753511428833,
63.419395446777344,
19.004623413085938,
1.5544555187225342,
83.3165054321289,
126.32454681396484,
88.05239868164062,
21.876977920532227,
0.2955206036567688,
7.340531349182129,
52.52980422973633,
9.621030807495117,
12.051604270935059,
783.888427734375,
34.49824523925781,
5.958560466766357,
11.733049392700195,
330.6155700683594,
17.118879318237305,
15.840741157531738,
4.088183879852295,
24.647945404052734,
0.08141398429870605,
17.852745056152344,
2.8441035747528076,
1.9566246271133423,
27.920806884765625,
15.26904010772705,
0.43852341175079346,
4.032613754272461,
5.299862384796143,
4.817346096038818,
23.03229331970215,
41.89274978637695,
21.65822410583496,
2.5178189277648926,
5.134634017944336,
11.193047523498535,
2.8467650413513184,
63.14961242675781,
26.825477600097656,
106.63838958740234,
14.791544914245605,
13.02008056640625,
0.5024715065956116,
9.236507415771484,
4.46705961227417,
12.719013214111328,
1.9253795146942139,
23.949853897094727,
1.091653823852539,
11.166877746582031,
90.74140167236328,
3.9178109169006348,
171.3546142578125,
18.018465042114258,
112.39656066894531,
24.57540512084961,
11.636059761047363,
18.60203742980957,
1.9045045375823975,
37.4972038269043,
87.38375854492188,
21.526424407958984,
19.116044998168945,
3.452946186065674,
8.576736450195312,
28.705116271972656,
23.794410705566406,
9.662251472473145,
4.736318111419678,
23.988988876342773,
6.401980400085449,
216.71377563476562,
27.839820861816406,
4.771946430206299
], np.float32)
y = np.array([0.0010000000474974513], np.float32)
x1 = np.array([
1.0001689195632935,
0.8780577182769775,
0.8868989944458008,
0.894711434841156,
0.944614827632904,
1.033501148223877,
2.1012930870056152,
1.6228097677230835,
1.1083400249481201,
1.2399554252624512,
1.5854648351669312,
0.7947320342063904,
0.9162285923957825,
1.3240278959274292,
0.8412530422210693,
1.233343482017517,
1.02128005027771,
0.9676418900489807,
1.2707772254943848,
1.1182206869125366,
1.1589807271957397,
1.3174852132797241,
0.8395540118217468,
0.5670318603515625,
1.9759927988052368,
1.3380725383758545,
2.000969886779785,
0.7145503759384155,
0.6299505233764648,
1.2315846681594849,
1.009231448173523,
0.6337988972663879,
0.6951360702514648,
0.9353156685829163,
0.9055959582328796,
0.9804213643074036,
0.609941303730011,
1.3276393413543701,
0.9258073568344116,
1.2211651802062988,
1.1829502582550049,
1.3320400714874268,
0.5720718502998352,
1.000986099243164,
1.21437668800354,
1.0850956439971924,
0.8027282357215881,
1.002602458000183,
2.269151449203491,
0.9027953743934631,
0.9045624136924744,
1.8146674633026123,
0.9488409757614136,
1.5350325107574463,
0.9284224510192871,
1.146897792816162,
1.0752787590026855,
0.7175253629684448,
1.0148614645004272,
0.9238283038139343,
1.018907904624939,
1.0761818885803223,
3.391054153442383,
1.348015546798706,
0.5962985754013062,
1.0098881721496582,
1.26979398727417,
1.4007360935211182,
1.1175923347473145,
2.063180685043335,
0.6979321837425232,
1.17123281955719,
0.9045684933662415,
1.1967644691467285,
1.0601277351379395,
1.3861143589019775,
1.0138226747512817,
0.8322133421897888,
1.0374521017074585,
1.0076591968536377,
0.6770474910736084,
1.9207227230072021,
1.059730052947998,
0.7887228727340698,
0.9072004556655884,
1.187535285949707,
1.1741851568222046,
1.766964316368103,
1.5409915447235107,
1.1530601978302002,
0.9437809586524963,
0.9482491612434387,
1.4560399055480957,
1.234002947807312,
1.2194430828094482,
1.246254324913025
], np.float32)
x2 = np.array([
-5.180166244506836,
3.7005016803741455,
-10.90788745880127,
4.012871265411377,
4.429512977600098,
0.3497477173805237,
-3.9566190242767334,
-1.8509852886199951,
-0.21952253580093384,
-0.06212245300412178,
-5.093024253845215,
-6.36647367477417,
10.501260757446289,
-0.23397648334503174,
26.599699020385742,
-0.043404266238212585,
-0.039516013115644455,
-0.23252145946025848,
-0.3733890950679779,
-8.798702239990234,
-1.458726167678833,
-10.43361759185791,
2.4384193420410156,
-0.4310355484485626,
-11.485552787780762,
-35.93326187133789,
-11.777127265930176,
2.0529074668884277,
0.17680668830871582,
0.3521166443824768,
-16.207395553588867,
3.020942211151123,
-2.683389902114868,
-31.045930862426758,
6.103168487548828,
0.9636617302894592,
-1.7291356325149536,
28.085447311401367,
-4.286806583404541,
-0.5121002793312073,
-0.05453529953956604,
-0.004477453883737326,
0.07280711829662323,
4.284749507904053,
-0.012386074289679527,
-0.1143072172999382,
4.45491361618042,
-0.6973512172698975,
0.061323754489421844,
0.31690242886543274,
0.02550177276134491,
-0.9771645665168762,
5.0788984298706055,
-5.232454299926758,
-5.046350479125977,
-0.2617790400981903,
-0.0006600793567486107,
-2.8525986671447754,
-0.12362352758646011,
8.20291519165039,
4.711668491363525,
10.515141487121582,
-4.041153430938721,
-0.16640359163284302,
-0.33183470368385315,
4.820765972137451,
0.13425064086914062,
0.009547995403409004,
-0.1804608851671219,
-5.927944660186768,
0.13916325569152832,
-0.5243402123451233,
-16.569602966308594,
-0.6729446649551392,
11.393829345703125,
-0.26097437739372253,
-21.679431915283203,
4.195556163787842,
0.19142165780067444,
3.6716952323913574,
0.3988632559776306,
-7.73115348815918,
-8.86845874786377,
3.6867213249206543,
-4.281424522399902,
-0.7083483338356018,
-17.665390014648438,
-6.130855560302734,
-7.333799839019775,
-0.12031694501638412,
-0.02339952066540718,
-5.354321479797363,
0.047780346125364304,
19.933006286621094,
0.3705201745033264,
-0.4028393626213074
], np.float32)
x3 = np.array([
1.2550855875015259,
0.8654487729072571,
0.6308754682540894,
0.9816827178001404,
1.0548738241195679,
0.012524792924523354,
-0.839571475982666,
-0.0006424338207580149,
-0.04649466648697853,
-0.037287574261426926,
0.06486305594444275,
0.9137144684791565,
0.8428932428359985,
-0.01175174955278635,
0.8376722931861877,
-0.030319122597575188,
-0.04582984372973442,
-0.018306005746126175,
0.09186185151338577,
0.17019498348236084,
-0.020119082182645798,
0.22337554395198822,
0.4282861351966858,
3.9934964179992676,
-0.4508689045906067,
0.4686657786369324,
-0.1897992491722107,
0.29383572936058044,
0.2193099409341812,
-0.1484094113111496,
0.31675228476524353,
0.5295975804328918,
1.8150707483291626,
2.0527231693267822,
1.086689829826355,
-0.017538614571094513,
2.948042869567871,
0.4877329468727112,
4.95603084564209,
-0.029376821592450142,
-0.07688342779874802,
-0.037382662296295166,
0.17505115270614624,
1.08339524269104,
-0.07510468363761902,
-0.024961132556200027,
0.540581464767456,
0.7190515398979187,
-3.110440492630005,
-0.057315193116664886,
-0.01705225743353367,
0.6796309947967529,
1.0900826454162598,
0.39591458439826965,
4.2813401222229,
-0.07372663915157318,
-0.04945394769310951,
2.074993133544922,
-0.020864002406597137,
0.39534062147140503,
1.0341440439224243,
1.0517014265060425,
-1.7383136749267578,
-0.033969223499298096,
3.984095335006714,
-0.300537109375,
-0.054953258484601974,
-0.05139319971203804,
-0.0513877235352993,
-0.6203671097755432,
-1.0038613080978394,
-0.060277827084064484,
-0.0579550676047802,
-0.03466716781258583,
1.1362496614456177,
-0.08225323259830475,
1.2022327184677124,
0.7995136976242065,
0.01857675611972809,
1.041009783744812,
0.25889095664024353,
-0.6596677303314209,
1.6385159492492676,
0.739734947681427,
4.812783718109131,
-0.05927223339676857,
-0.12917836010456085,
-0.684905469417572,
-0.426142156124115,
-0.0003853600355796516,
-0.020870279520750046,
4.774664878845215,
-0.11399747431278229,
0.8400925397872925,
0.018678396940231323,
-0.10171803832054138
], np.float32)
graph = tf.Graph()
with graph.as_default():
    xt = tf.constant(x, dtype=tf.float32)
    yt = tf.constant(y, dtype=tf.float32)
    x1t = tf.constant(x1, dtype=tf.float32)
    x2t = tf.constant(x2, dtype=tf.float32)
    x3t = tf.constant(x3, dtype=tf.float32)
    # t6 = x3 - x2 * (x1 * rsqrt(x + y))
    t1 = tf.add(xt, yt)
    t2 = tf.math.rsqrt(t1)
    t4 = tf.math.multiply(x1t, t2)
    t5 = tf.math.multiply(x2t, t4)
    t6 = tf.math.subtract(x3t, t5)
print(tf.__version__)
with tf.compat.v1.Session(graph=graph) as session:
    ans = session.run(t6)
    ans.tofile('./AMD/ans.raw')
# The same computation in NumPy for comparison
ans = np.subtract(x3, np.multiply(np.multiply(np.divide(1, np.sqrt(np.add(x, y))), x1), x2))
print(ans, ans.dtype)
ans.tofile('../np_test/AMD/ans.raw')
Hi @tilakrayal, the folder layout I used is:

tf_exp
  -- AMD
  -- INTEL
np_exp
  -- AMD
  -- INTEL

Run the script inside the tf_exp folder; it will create the relevant raw dumps in the respective folders. Change './AMD/ans.raw' to './INTEL/ans.raw' inside test_1.py when running on the INTEL CPU. You can change the relevant paths accordingly. Thank you!
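The exact commands are not given in the thread; a plausible sequence is sketched below. The compare.py arguments are purely hypothetical, since its interface is never shown above:

cd tf_exp
python test_1.py    # writes ./AMD/ans.raw or ./INTEL/ans.raw, depending on the path edited above
cd ..
python compare.py tf_exp/AMD/ans.raw tf_exp/INTEL/ans.raw    # hypothetical invocation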
Hi @tilakrayal, @gadagashwini, I explored the TensorFlow source code and finally got a TF build that doesn't show any difference across INTEL and AMD CPUs. I disabled the EIGEN_FAST_MATH macro (which is enabled by default) while building TF from source, and that worked (tried this on v1.15). Thank you!
@GChaitanya2001 Hey, I came across the same inconsistency issue when building TF from source. Do you mind sharing where you made the change to disable the macro? Thanks.
@xinario Hi, you can disable the macro by setting it to 0. You can use a --copt option to do that while building TensorFlow from source.
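For concreteness, a sketch of what such a build invocation could look like. The pip-package target shown is the standard TF v1.x build target; only the -DEIGEN_FAST_MATH=0 define is the essential part, and any other flags from your usual build (e.g. --config options) would be added alongside it:

bazel build --copt=-DEIGEN_FAST_MATH=0 //tensorflow/tools/pip_package:build_pip_package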
@GChaitanya2001 Thank you very much for the reply. Do you mind explaining a bit about the cause of the difference, i.e., why would EIGEN_FAST_MATH result in this difference on different CPUs? I also experienced a tiny difference with a customized build across Mac and Windows. I'm wondering if there are other types of optimization going on there.
Issue Type: Bug
Source: source
Tensorflow Version: 2.3
Custom Code: Yes
OS Platform and Distribution: Linux Ubuntu 18.04
Mobile device: No response
Python version:
Bazel version: 0.26.1
GCC/Compiler version: No response
CUDA/cuDNN version: No response
GPU model and memory: No response
Current Behaviour?
I tried running a sample sqrt computation on INTEL and AMD CPUs. I used a tolerance of 0.0000001, i.e., values are printed in the log output only if the difference between corresponding values is >= 0.0000001.
The values obtained on the INTEL and AMD CPUs do not match.
Note:
Steps to reproduce building TF from source: provided “-march=x86-64” for the --copt and --host_copt flags while doing the bazel build, and set --config=v1.

Standalone code to reproduce the issue
test.py
compare.py - script to compare raw files
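The actual compare.py is not included in this thread. Below is a minimal sketch of what such a raw-dump comparison could look like, assuming both files are flat float32 dumps written with ndarray.tofile; the 1e-7 tolerance is taken from the description above, everything else is an assumption:

import sys
import numpy as np

# Hypothetical reconstruction of compare.py: load two flat float32 dumps
# and print every element pair whose absolute difference is at least the
# tolerance mentioned in this issue (1e-7).
a = np.fromfile(sys.argv[1], dtype=np.float32)
b = np.fromfile(sys.argv[2], dtype=np.float32)
diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
for i in np.nonzero(diff >= 1e-7)[0]:
    print(i, a[i], b[i], diff[i])
print("mismatches:", np.count_nonzero(diff >= 1e-7), "of", a.size)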
Sample commands to run
Relevant log output