
model.run_eagerly=False is much slower than model.run_eagerly=True #35052

Closed
doldre opened this issue Dec 12, 2019 · 5 comments
Assignees
gadagashwini-zz
Labels
comp:eager (Eager related issues) · comp:keras (Keras related issues) · TF 2.0 (Issues relating to TensorFlow 2.0) · type:performance (Performance Issue)

Comments


doldre commented Dec 12, 2019


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.6.1810
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: 2.0
  • Python version: Python 3.7.5

Describe the current behavior
I tried to implement a simple FM (factorization machine) algorithm in TensorFlow 2.0 and found that keras.fit is very slow with the default parameters.
If I set model.run_eagerly to True, performance improves.
When I instead turn off eager execution entirely with tf.compat.v1.disable_eager_execution(), performance matches TF 1.14 with Estimator.
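For reference, the three configurations below differ only in these toggles (they match the commented-out lines in the reproduction script further down):

import tensorflow as tf

# Configuration 1 (default): no changes; Keras traces the train step itself.

# Configuration 2: make the Keras training loop run eagerly, per model
# (set after model.compile(...)):
# model.run_eagerly = True

# Configuration 3: disable eager execution globally (TF1-style graph mode);
# must be called before any model is built:
# tf.compat.v1.disable_eager_execution()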

  1. Default
2019-12-12 15:37:04.152742: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:37:04.174603: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:37:04.187437: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556f783dc5b0 executing computations on platform Host. Devices:
2019-12-12 15:37:04.187474: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Epoch 1/1000
    121/Unknown - 31s 256ms/step - loss: 0.6940 - AUC: 0.4324   
  2. model.run_eagerly=True
2019-12-12 15:38:36.014767: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:38:36.038416: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:38:36.051835: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b826158d40 executing computations on platform Host. Devices:
2019-12-12 15:38:36.051874: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Epoch 1/1000
     96/Unknown - 7s 72ms/step - loss: 0.6902 - AUC: 0.3739    
  3. tf.compat.v1.disable_eager_execution()
WARNING:tensorflow:OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.
2019-12-12 15:39:18.986171: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-12 15:39:19.008642: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-12-12 15:39:19.020877: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561448d34bb0 executing computations on platform Host. Devices:
2019-12-12 15:39:19.020914: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/luoxinchen/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 1s 1ms/step - loss: 0.6942 - AUC: 0.4701  
Epoch 2/1000
1000/1000 [==============================] - 1s 633us/step - loss: 0.6939 - AUC: 0.4797
Epoch 3/1000
1000/1000 [==============================] - 1s 640us/step - loss: 0.6936 - AUC: 0.4908
Epoch 4/1000

Describe the expected behavior
I expected keras.fit with model.run_eagerly=False to wrap the training loop in tf.function, so its performance should be close to that of disabling eager execution. Instead it performs awfully: it is even slower than model.run_eagerly=True.
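Conceptually, the non-eager path should behave like this simplified sketch (not Keras's actual implementation, just the technique it is expected to use): the per-batch train step is traced once by tf.function, so Python dispatch overhead is paid only at trace time.

import tensorflow as tf

@tf.function  # compiles the step into a graph on the first call
def train_step(model, optimizer, loss_fn, x, y):
  with tf.GradientTape() as tape:
    pred = model(x, training=True)
    loss = loss_fn(y, pred)
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss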

Code to reproduce the issue

import os
import sys
import timeit

import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras

# Toggle for configuration 3: disable eager execution globally.
# tf.compat.v1.disable_eager_execution()
tf.config.threading.set_inter_op_parallelism_threads(8)
os.environ['OMP_NUM_THREADS'] = '1'

# Number of rows in the user/item embedding tables.
bucket = int(1e7)

class MyModel(keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()

  def build(self, input_shape):
    self.user_emb = self.add_weight(
        shape=(bucket + 1, 32),
        dtype=tf.float32,
        initializer=tf.keras.initializers.TruncatedNormal(),
        name="user_emb")
    self.item_emb = self.add_weight(
        shape=(bucket + 1, 32),
        dtype=tf.float32,
        initializer=tf.keras.initializers.TruncatedNormal(),
        name="item_emb")
    self.bias = tf.Variable(0.0)

  def call(self, inputs):
    user_id, item_id = inputs
    user_id = tf.reshape(user_id, [-1])
    item_id = tf.reshape(item_id, [-1])
    # FM-style interaction: elementwise product of user and item embeddings,
    # summed over the embedding dimension, plus a global bias.
    out = tf.gather(self.user_emb, user_id) * tf.gather(self.item_emb, item_id)
    out = tf.reduce_sum(out, axis=1, keepdims=True) + self.bias
    out = tf.sigmoid(out)
    return out


def main():

  def py_func(feats):
    # Split each feature dict into an ((inputs), label) pair for fit().
    label = feats['labels']
    return (feats['user_id'], feats['item_id']), label

  model = MyModel()

  dataset = tf.data.Dataset.from_tensor_slices({
      "user_id": np.random.randint(bucket, size=[1000, 1]),
      "item_id": np.random.randint(bucket, size=[1000, 1]),
      "labels": np.random.randint(2, size=[1000, 1])
  }).map(py_func)

  model.compile(
      keras.optimizers.SGD(0.01), 'binary_crossentropy', metrics=['AUC'])

  # Toggle for configuration 2: run the Keras training loop eagerly.
  # model.run_eagerly = True
  model.fit(
      dataset,
      shuffle=False,
      workers=1,
      epochs=1000)

if __name__ == '__main__':
  main()


@doldre doldre changed the title model.run_eagerly=False is much slower than model.run_eagerly=True model.run_eagerly=False is much slower than model.run_eagerly=True #tf2.0 Dec 13, 2019
@doldre doldre changed the title model.run_eagerly=False is much slower than model.run_eagerly=True #tf2.0 model.run_eagerly=False is much slower than model.run_eagerly=True Dec 13, 2019
@gadagashwini-zz gadagashwini-zz self-assigned this Dec 13, 2019
@gadagashwini-zz gadagashwini-zz added TF 2.0 Issues relating to TensorFlow 2.0 comp:eager Eager related issues type:performance Performance Issue comp:keras Keras related issues labels Dec 13, 2019
gadagashwini-zz (Contributor) commented

The issue replicates with TF 2.0.
Please find the gist here. Thanks!

jvishnuvardhan (Contributor) commented

@doldre I was able to reproduce the issue with TF 2.0. However, I could not reproduce it with tf-nightly. There were a couple of important updates related to Keras model performance, and I think those updates resolved the issue you are noticing with TF 2.0. Here is the gist with default settings on tf-nightly. Here is the gist with model.run_eagerly=True on tf-nightly.

1. Default

Train for 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 2s 2ms/step - loss: 0.6939 - AUC: 0.4710
Epoch 2/1000
1000/1000 [==============================] - 1s 1ms/step - loss: 0.6935 - AUC: 0.4834
2. model.run_eagerly=True
Train for 1000 steps
Epoch 1/1000
1000/1000 [==============================] - 11s 11ms/step - loss: 0.6941 - AUC: 0.4770
Epoch 2/1000
1000/1000 [==============================] - 11s 11ms/step - loss: 0.6938 - AUC: 0.4881

Please close this issue if it was resolved by tf-nightly. Thanks!
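For anyone verifying: after installing the nightly build (pip install tf-nightly), a quick sanity check that the nightly is actually the version being imported:

import tensorflow as tf

# Should print a dev version string, e.g. 2.1.0-dev20191125, when the
# nightly build is installed.
print(tf.__version__)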

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Dec 19, 2019
doldre (Author) commented Dec 20, 2019

@jvishnuvardhan thanks for your help. I reproduced your result with tf-nightly (2.1.0-dev20191125). It looks like the problem has been fixed.

@doldre doldre closed this as completed Dec 20, 2019
doldre (Author) commented Jan 8, 2020

I found that this problem still exists when using the RMSprop optimizer.
Here is the gist with RMSprop.
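For reference, the remaining slowdown should be reproducible with a one-line change to the script above, swapping SGD for RMSprop in the compile call (a sketch; the rest of the script is unchanged):

from tensorflow import keras

# Same reproduction script as above, but compiled with RMSprop instead of SGD.
model.compile(
    keras.optimizers.RMSprop(0.01), 'binary_crossentropy', metrics=['AUC'])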

@doldre doldre reopened this Jan 8, 2020
@jvishnuvardhan jvishnuvardhan added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Jan 8, 2020
@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 8, 2020
tanzhenyu (Contributor) commented

This issue has been fixed. @doldre, you can verify it with tf-nightly. If that is not doable, you can also wait for 2.2.
