
model.run_eagerly=False is much slower than model.run_eagerly=True #35052

Closed
doldre opened this issue Dec 12, 2019 · 5 comments
Assignees
gadagashwini-zz
Labels
comp:eager (Eager related issues) · comp:keras (Keras related issues) · TF 2.0 (Issues relating to TensorFlow 2.0) · type:performance (Performance Issue)

Comments


doldre commented Dec 12, 2019


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.6.1810
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: 2.0
  • Python version: Python 3.7.5

Describe the current behavior
I tried to implement a simple FM (factorization machine) algorithm in TensorFlow 2.0 and found that keras.fit is very slow with the default parameters.
If I set model.run_eagerly to True, performance improves.
When I instead turn off eager execution entirely with tf.compat.v1.disable_eager_execution(), performance matches TF 1.14 with Estimator.
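For reference, the three configurations below differ only in these toggles (they match the commented-out lines in the reproduction script further down):

import tensorflow as tf

# Configuration 1 (default): no changes; Keras traces the train step itself.

# Configuration 2: make the Keras training loop run eagerly, per model
# (set after model.compile(...)):
# model.run_eagerly = True

# Configuration 3: disable eager execution globally (TF1-style graph mode);
# must be called before any model is built:
# tf.compat.v1.disable_eager_execution()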

  1. Default
2019-12-12 15:37:04.152742: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:37:04.174603: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:37:04.187437: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556f783dc5b0 executing computations on platform Host. Devices:
2019-12-12 15:37:04.187474: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Epoch 1/1000
    121/Unknown - 31s 256ms/step - loss: 0.6940 - AUC: 0.4324   
  2. model.run_eagerly=True
2019-12-12 15:38:36.014767: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:38:36.038416: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:38:36.051835: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b826158d40 executing computations on platform Host. Devices:
2019-12-12 15:38:36.051874: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Epoch 1/1000
     96/Unknown - 7s 72ms/step - loss: 0.6902 - AUC: 0.3739    
  3. tf.compat.v1.disable_eager_execution()
WARNING:tensorflow:OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
2019-12-12 15:39:18.986171: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:39:19.008642: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:39:19.020877: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561448d34bb0 executing computations on platform Host. Devices:
2019-12-12 15:39:19.020914: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/luoxinchen/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 1s 1ms/step - loss: 0.6942 - AUC: 0.4701  
Epoch 2/1000
1000/1000 [==============================] - 1s 633us/step - loss: 0.6939 - AUC: 0.4797
Epoch 3/1000
1000/1000 [==============================] - 1s 640us/step - loss: 0.6936 - AUC: 0.4908
Epoch 4/1000

Describe the expected behavior
I expected keras.fit with model.run_eagerly=False to wrap the training loop in tf.function, so its performance should be close to that of disabling eager execution. Instead it performs awfully: it is even slower than model.run_eagerly=True.
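Conceptually, the non-eager path should behave like this simplified sketch (not Keras's actual implementation, just the technique it is expected to use): the per-batch train step is traced once by tf.function, so Python dispatch overhead is paid only at trace time.

import tensorflow as tf

@tf.function  # compiles the step into a graph on the first call
def train_step(model, optimizer, loss_fn, x, y):
  with tf.GradientTape() as tape:
    pred = model(x, training=True)
    loss = loss_fn(y, pred)
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss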

Code to reproduce the issue

import os
import sys
import timeit

import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras

# Toggle for configuration 3: disable eager execution globally.
# tf.compat.v1.disable_eager_execution()
tf.config.threading.set_inter_op_parallelism_threads(8)
os.environ['OMP_NUM_THREADS'] = '1'

# Number of rows in the user/item embedding tables.
bucket = int(1e7)

class MyModel(keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()

  def build(self, input_shape):
    self.user_emb = self.add_weight(
        shape=(bucket + 1, 32),
        dtype=tf.float32,
        initializer=tf.keras.initializers.TruncatedNormal(),
        name="user_emb")
    self.item_emb = self.add_weight(
        shape=(bucket + 1, 32),
        dtype=tf.float32,
        initializer=tf.keras.initializers.TruncatedNormal(),
        name="item_emb")
    self.bias = tf.Variable(0.0)

  def call(self, inputs):
    user_id, item_id = inputs
    user_id = tf.reshape(user_id, [-1])
    item_id = tf.reshape(item_id, [-1])
    # FM-style interaction: elementwise product of user and item embeddings,
    # summed over the embedding dimension, plus a global bias.
    out = tf.gather(self.user_emb, user_id) * tf.gather(self.item_emb, item_id)
    out = tf.reduce_sum(out, axis=1, keepdims=True) + self.bias
    out = tf.sigmoid(out)
    return out


def main():

  def py_func(feats):
    # Split each feature dict into an ((inputs), label) pair for fit().
    label = feats['labels']
    return (feats['user_id'], feats['item_id']), label

  model = MyModel()

  dataset = tf.data.Dataset.from_tensor_slices({
      "user_id": np.random.randint(bucket, size=[1000, 1]),
      "item_id": np.random.randint(bucket, size=[1000, 1]),
      "labels": np.random.randint(2, size=[1000, 1])
  }).map(py_func)

  model.compile(
      keras.optimizers.SGD(0.01), 'binary_crossentropy', metrics=['AUC'])

  # Toggle for configuration 2: run the Keras training loop eagerly.
  # model.run_eagerly = True
  model.fit(
      dataset,
      shuffle=False,
      workers=1,
      epochs=1000)

if __name__ == '__main__':
  main()


@doldre doldre changed the title model.run_eagerly=False is much slower than model.run_eagerly=True model.run_eagerly=False is much slower than model.run_eagerly=True #tf2.0 Dec 13, 2019
@doldre doldre changed the title model.run_eagerly=False is much slower than model.run_eagerly=True #tf2.0 model.run_eagerly=False is much slower than model.run_eagerly=True Dec 13, 2019
@gadagashwini-zz gadagashwini-zz self-assigned this Dec 13, 2019
@gadagashwini-zz gadagashwini-zz added TF 2.0 Issues relating to TensorFlow 2.0 comp:eager Eager related issues type:performance Performance Issue comp:keras Keras related issues labels Dec 13, 2019
gadagashwini-zz (Contributor) commented

The issue replicates with TF 2.0.
Please find the gist here. Thanks!

jvishnuvardhan (Contributor) commented

@doldre I was able to reproduce the issue with TF 2.0. However, I could not reproduce it with tf-nightly. There were a couple of important updates related to Keras model performance, and I think those updates resolved the issue you are noticing with TF 2.0. Here is the gist with default settings on tf-nightly. Here is the gist with model.run_eagerly=True on tf-nightly.

1. Default

Train for 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 2s 2ms/step - loss: 0.6939 - AUC: 0.4710
Epoch 2/1000
1000/1000 [==============================] - 1s 1ms/step - loss: 0.6935 - AUC: 0.4834
2. model.run_eagerly=True
Train for 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 11s 11ms/step - loss: 0.6941 - AUC: 0.4770
Epoch 2/1000
1000/1000 [==============================] - 11s 11ms/step - loss: 0.6938 - AUC: 0.4881

Please close this issue if it was resolved by tf-nightly. Thanks!
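For anyone verifying: after installing the nightly build (pip install tf-nightly), a quick sanity check that the nightly is actually the version being imported:

import tensorflow as tf

# Should print a dev version string, e.g. 2.1.0-dev20191125, when the
# nightly build is installed.
print(tf.__version__)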

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Dec 19, 2019
doldre (Author) commented Dec 20, 2019

@jvishnuvardhan thanks for your help. I reproduced your result with tf-nightly (2.1.0-dev20191125). It looks like the problem has been fixed.

@doldre doldre closed this as completed Dec 20, 2019
doldre (Author) commented Jan 8, 2020

I found that this problem still exists when using the RMSprop optimizer.
Here is the gist with RMSprop.
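For reference, the remaining slowdown should be reproducible with a one-line change to the script above, swapping SGD for RMSprop in the compile call (a sketch; the rest of the script is unchanged):

from tensorflow import keras

# Same reproduction script as above, but compiled with RMSprop instead of SGD.
model.compile(
    keras.optimizers.RMSprop(0.01), 'binary_crossentropy', metrics=['AUC'])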

@doldre doldre reopened this Jan 8, 2020
@jvishnuvardhan jvishnuvardhan added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Jan 8, 2020
@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 8, 2020
tanzhenyu (Contributor) commented

This issue has been fixed. @doldre, you can verify it with tf-nightly. If that is not doable, you can also wait for 2.2.
