Why is my CPU Performance of Float64 tf.matmul in TensorFlow2 significantly lower than the NumPy matmul, even in the graph mode? #53798
Labels: comp:ops (OPs related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.7 (Issues related to TF 2.7.0), type:performance (Performance Issue)
I'm comparing the single-thread performance of matrix-matrix products in TensorFlow 2 and NumPy, separately for single precision (float32) and double precision (float64). I find that NumPy's performance is nearly equivalent to the Intel MKL C++ implementation (used as the matrix-multiplication benchmark) for both single and double precision (SGEMM and DGEMM). But in TensorFlow, only the single precision (float32) performance matches MKL; the double precision (float64) performance is significantly slower. Why is TensorFlow slower with double precision data?
Sample Scripts:
The following instance reproduces my observation. Consider the matrix multiplication:
The TensorFlow2 and NumPy code are given below:
TensorFlow 2 code
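The original snippet was not captured here. A minimal sketch of the kind of graph-mode benchmark described, assuming a square float64 matmul (the matrix size `N = 1000` is my assumption, not from the original report):

```python
import time

import numpy as np
import tensorflow as tf

# Restrict TensorFlow to a single thread before any ops run,
# matching the single-thread comparison described above.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)

N = 1000  # assumed size; the original value was not captured
A = tf.constant(np.random.rand(N, N), dtype=tf.float64)
B = tf.constant(np.random.rand(N, N), dtype=tf.float64)

@tf.function  # run the matmul in graph mode
def matmul_fn(x, y):
    return tf.matmul(x, y)

matmul_fn(A, B)  # warm-up call so tracing is excluded from the timing

start = time.perf_counter()
C = matmul_fn(A, B)
_ = C.numpy()  # materialize the result so the op has finished
elapsed = time.perf_counter() - start
print(f"float64 tf.matmul: {elapsed:.4f} s, "
      f"{2 * N**3 / elapsed / 1e9:.2f} GFLOP/s")
```

For the float32 run, the same sketch applies with `dtype=tf.float32`.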
NumPy code
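The NumPy counterpart was also not captured. A minimal sketch under the same assumptions (`N = 1000` is assumed; for the single-thread comparison, run with `OMP_NUM_THREADS=1` and `MKL_NUM_THREADS=1` set in the environment):

```python
import time

import numpy as np

N = 1000  # assumed size; the original value was not captured
A = np.random.rand(N, N)  # float64 by default
B = np.random.rand(N, N)

A @ B  # warm-up call

start = time.perf_counter()
C = A @ B  # dispatches to the linked BLAS (e.g. MKL DGEMM)
elapsed = time.perf_counter() - start
print(f"float64 numpy matmul: {elapsed:.4f} s, "
      f"{2 * N**3 / elapsed / 1e9:.2f} GFLOP/s")
```

For the float32 run, cast the operands with `A.astype(np.float32)` so the BLAS SGEMM path is used.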
System and Installation settings:
The performance was compared on an Intel Xeon Skylake 2.1 GHz machine running CentOS 7, and also on a MacBook Pro 2018 running Big Sur. Both TensorFlow 2.7 and 2.8 were tested, each built with Intel MKL, under Python 3.9.7 and 3.7.4. I compare single-thread performance so that the results are reliably reproducible, and I observe similar numbers in all settings:
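One way to confirm the build and threading configuration described above (these exact checks are my addition, not from the original report):

```python
import os

# Pin the BLAS thread pools before importing NumPy/TensorFlow so the
# single-thread measurement is enforced (env var names assume an MKL/OpenMP build).
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
import tensorflow as tf

np.show_config()        # which BLAS/LAPACK NumPy was built against (e.g. MKL)
print(tf.__version__)

# TensorFlow's own threading knobs, for the graph-mode runs
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
```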
Single precision performance is as expected:
But the double precision performance is significantly slower: