
Why is the CPU performance of float64 tf.matmul in TensorFlow 2 significantly lower than NumPy's matmul, even in graph mode? #53798

Open
as641651 opened this issue Jan 17, 2022 · 1 comment
Labels: comp:ops (OPs related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.7 (Issues related to TF 2.7.0) · type:performance (Performance Issue)

Comments

@as641651

I'm comparing the single-thread performance of matrix-matrix products in TensorFlow 2 and NumPy, separately for single precision (float32) and double precision (float64). I find that NumPy's performance is almost equivalent to the Intel MKL C++ implementation (used as the benchmark for matrix multiplication) in both precisions (SGEMM and DGEMM). In TensorFlow, however, only the single-precision (float32) performance matches MKL; the double-precision (float64) performance is significantly slower. Why is TensorFlow slower with double-precision data?

Sample Scripts:

To reproduce the observation, consider the matrix multiplication:

C = AB, where A and B are of size 3000x3000

The TensorFlow 2 and NumPy scripts are given below:

TensorFlow 2 code

import tensorflow as tf
import os
import time


# Check whether TF was built with MKL support
from tensorflow.python.framework import test_util
print("MKL Enabled : ", test_util.IsMklEnabled())


# Set threads
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

# Problem size
N = 3000
REPS = 20
DTYPE = tf.float64
#DTYPE = tf.float32


@tf.function
def gemm_implicit_noup(A, B):
    #C = A @ B
    start = tf.timestamp()
    with tf.control_dependencies([start]):
        C = tf.matmul(A,B)
    with tf.control_dependencies([C]):
        end = tf.timestamp()
    tf.print(end-start)
    return C

tf.config.run_functions_eagerly(False)

A = tf.random.normal([N, N], dtype=DTYPE)
B = tf.random.normal([N, N], dtype=DTYPE)


# First call builds the tf.function trace; tracing cost is excluded from the timed repetitions
C = gemm_implicit_noup(A,B)

for i in range(REPS):
    C = gemm_implicit_noup(A,B)
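
As a cross-check on the in-graph tf.timestamp measurement, the same op can also be timed from the host side in eager mode. This is a minimal sketch of my own (assuming the same tf, A, B, and REPS as above), not part of the original comparison:

import time

# Warm-up: the first eager call can include one-time setup cost
_ = tf.matmul(A, B)

for _ in range(REPS):
    start = time.perf_counter()
    C = tf.matmul(A, B)
    _ = C.numpy()  # block until the result is materialized before stopping the clock
    end = time.perf_counter()
    print(end - start)

If the slowdown is real, it should show up with either timing method.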

NumPy code

import os
os.environ["OMP_NUM_THREADS"] = "1"  # must be set before NumPy is imported
import numpy as np
import time

N = 3000
REPS = 20
DTYPE = np.float64
#DTYPE = np.float32

def gemm_implicit_noup(A, B):
    #C = A @ B
    C = np.matmul(A,B)
    return C



A = np.random.randn(N,N).astype(DTYPE)
B = np.random.randn(N,N).astype(DTYPE)

for i in range(REPS):
    start = time.perf_counter()
    C = gemm_implicit_noup(A,B)
    end = time.perf_counter()
    print(end-start)
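
One caveat worth noting (my addition, not in the original script): with an MKL-backed NumPy, MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS when it is set, so pinning both before the import is the safer way to force a single-thread run:

import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"  # MKL-specific control; overrides OMP_NUM_THREADS when set
import numpy as np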

System and installation settings:

The performance was compared on an Intel Xeon Skylake at 2.1 GHz running CentOS 7, and also on a 2018 MacBook Pro running macOS Big Sur. Both TensorFlow 2.7 and 2.8 were tested, each built with Intel MKL, under Python 3.9.7 and 3.7.4. I compare single-thread performance so that the results can be reliably reproduced. I observe similar performance numbers in all of these settings:
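
To confirm that both stacks actually dispatch to MKL, the linked BLAS can be inspected directly; np.show_config() is the standard NumPy call, and IsMklEnabled() is the same internal TF utility used in the script above:

import numpy as np
np.show_config()  # lists the BLAS/LAPACK libraries NumPy was built against

from tensorflow.python.framework import test_util
print("TF MKL build:", test_util.IsMklEnabled())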

Single-precision performance is as expected:

  • Intel MKL C++ SGEMM ~ 0.5s
  • NumPy float32 ~ 0.5s
  • TensorFlow float32 ~ 0.5s

But double-precision performance is not:

  • Intel MKL C++ DGEMM ~ 0.9s
  • NumPy float64 ~ 1s
  • TensorFlow float64 > 2.5s (Much Slower!!)
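
For context, a single 3000x3000 GEMM costs about 2*N^3 ≈ 5.4e10 floating-point operations, so the rough timings above translate into throughput as follows (a back-of-the-envelope sketch using the numbers reported here):

N = 3000
flops = 2 * N**3  # ~5.4e10 floating-point operations per 3000x3000 product

# Approximate timings from the measurements above
for label, seconds in [("MKL / NumPy float32", 0.5),
                       ("MKL / NumPy float64", 1.0),
                       ("TensorFlow float64", 2.5)]:
    print(f"{label}: {flops / seconds / 1e9:.0f} GFLOP/s")

A 2x drop from float32 to float64 (roughly 108 vs 54 GFLOP/s) is the expected SIMD-width effect for a well-tuned GEMM; the TensorFlow float64 number (about 22 GFLOP/s) is well below even that.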
@as641651 as641651 added the type:performance Performance Issue label Jan 17, 2022
@mohantym mohantym added TF 2.7 Issues related to TF 2.7.0 comp:ops OPs related issues labels Jan 18, 2022
@mohantym
Contributor

Hi @Saduf2019! Could you please look at this issue? Attaching a gist for reference.

@mohantym mohantym assigned Saduf2019 and unassigned mohantym Jan 19, 2022
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 31, 2022