
BUG: symbolic layer triggers device creation #25946

Closed
ppwwyyxx opened this issue Feb 20, 2019 · 12 comments
Assignees
Labels
comp:keras (Keras related issues) · stale (to be closed automatically if no activity) · stat:awaiting response (awaiting response from author) · TF 1.13 · type:bug

Comments

@ppwwyyxx
Contributor

ppwwyyxx commented Feb 20, 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): b'v1.13.0-rc2-0-gc865ec5621' 1.13.0-rc2
  • Python version: 3.7
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: 10.0 / 7.4.2
  • GPU model and memory: GTX 960M

Describe the current behavior
The following code:

import tensorflow as tf
a = tf.placeholder(tf.float32, [100, 100, 100, 100])
b = tf.layers.Conv2DTranspose(3, 3, data_format='channels_first')
output = b.apply(a)

prints:

2019-02-20 10:20:05.505595: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-20 10:20:05.578782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-20 10:20:05.579477: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fd579f65d0 executing computations on platform CUDA. Devices:
2019-02-20 10:20:05.579513: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
2019-02-20 10:20:05.606095: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz                                
2019-02-20 10:20:05.606746: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fd57b39b00 executing computations on platform Host. Devices:
2019-02-20 10:20:05.606785: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>               
2019-02-20 10:20:05.607093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:                              
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.0975
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2019-02-20 10:20:05.607118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0                                
2019-02-20 10:20:05.608205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-20 10:20:05.608229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0                                                        
2019-02-20 10:20:05.608240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N                                                       
2019-02-20 10:20:05.608504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1742 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)        

As the log shows, constructing this purely symbolic graph initializes the GPU devices. This should not happen in symbolic functions.

Initializing the GPU devices has many side effects.
It can lead to different kinds of failures, such as #8136 (comment). The largest side effect is that any GPU-related flags given to a tf.Session created after device initialization will not take effect. It also makes it much harder to use Horovod, because Horovod requires initializing the GPU in a specific way (with visible_device_list). If a graph containing Conv2DTranspose is created before the session (which is the standard way of using TF 1.0), Horovod will fail to initialize the session. (cc @alsrgv)

The bug exists for Conv2DTranspose, but not for Conv2D.
It exists in 1.13.0rc0 and does not exist in 1.12.0.
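For concreteness, a minimal sketch of the failure mode described above (assuming TF 1.13 on a GPU build; the specific config options are only illustrative):

import tensorflow as tf

# Building the graph already initializes the GPU with default options...
a = tf.placeholder(tf.float32, [100, 100, 100, 100])
output = tf.layers.Conv2DTranspose(3, 3, data_format='channels_first').apply(a)

# ...so GPU-related options passed to a session created afterwards do not
# take effect (and in the Horovod case, session creation fails outright).
config = tf.ConfigProto()
config.gpu_options.visible_device_list = '0'
config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.Session(config=config)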

@ppwwyyxx
Contributor Author

ppwwyyxx commented Feb 20, 2019

This bug comes from keras/backend.py, where conv2d_transpose lists the available devices in order to check data_format.

In fact, the entire keras/backend.py file relies heavily on looking at the available devices.
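For reference, the device query itself is what triggers initialization. A minimal illustration (assuming TF 1.13; device_lib is an internal module, used here only to make the point explicit):

from tensorflow.python.client import device_lib

# Merely listing the local devices initializes the GPU runtime for the
# process; the Keras backend performs an equivalent query when it checks
# whether NCHW is supported.
print([d.name for d in device_lib.list_local_devices()])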

@alsrgv
Contributor

alsrgv commented Feb 21, 2019

I'm guessing we'll have to stick with https://github.com/horovod/horovod/blob/master/examples/keras_imagenet_resnet50.py#L59 in a preamble for any Keras API usage.
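For reference, a sketch of that preamble (paraphrasing the linked example; assumes horovod.keras imported as hvd and tf.keras.backend as K):

import tensorflow as tf
from tensorflow.keras import backend as K
import horovod.keras as hvd

# Initialize Horovod and pin each process to its own GPU *before* any Keras
# call that might list devices.
hvd.init()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))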

@facaiy facaiy added type:bug Bug comp:keras Keras related issues TF 1.13 Issues related to TF 1.13 labels Feb 22, 2019
@ppwwyyxx
Contributor Author

3 weeks with no response?

@facaiy facaiy added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 15, 2019
@facaiy
Member

facaiy commented Mar 15, 2019

@qlzh727 Are you a good person to look at this?

@qlzh727
Member

qlzh727 commented Mar 15, 2019

I am quite occupied right now with some RNN work, but I will reroute this to the correct owner.

@robieta

robieta commented Mar 15, 2019

It's not obvious to me how one would get around this, given that checking devices triggers initialization code if the devices are not already initialized. NHWC vs. NCHW device compatibility issues are among the more common difficulties users encounter, which is why we check for them. Ultimately, I think @alsrgv's solution is probably correct: if you need to set specific process-level config, it will have to be done at the very start.

That said, if you can think of a better solution feel free to suggest it or open a PR.

@ppwwyyxx
Contributor Author

Device initialization is not the only issue here.
A summary of the cause:
Certain Keras layers call the following function:

def _has_nchw_support():
  explicitly_on_cpu = _is_current_explicit_device('CPU')
  gpus_available = bool(_get_available_gpus())
  return not explicitly_on_cpu and gpus_available

in keras/backend.py. When the function returns False but the layer is called with the NCHW format, the layer applies format conversions such as transposes.
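A simplified sketch of that fallback pattern (not the actual keras/backend.py code; it reuses the _has_nchw_support function quoted above):

import tensorflow as tf

def conv2d_nchw_sketch(x, kernel):
  # If NCHW appears unsupported, transpose to NHWC, run the op, and
  # transpose the result back to NCHW.
  if _has_nchw_support():
    return tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME',
                        data_format='NCHW')
  x = tf.transpose(x, [0, 2, 3, 1])     # NCHW -> NHWC
  y = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME',
                   data_format='NHWC')
  return tf.transpose(y, [0, 3, 1, 2])  # NHWC -> NCHW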

There are at least three issues with this approach:

  1. The function _has_nchw_support is clearly wrong.
    Many of the involved ops support NCHW on CPUs with an MKL build, and on TPUs.

    Consequence: these Keras layers do not behave properly (transposes may be added) on CPUs with an MKL build or on TPUs.

  2. Graph construction should be conceptually independent of execution.
    -- This, IMHO, is the core beauty of a graph computation framework.

    By looking at the available devices during graph construction, it makes the implicit assumption that the graph will be executed on the same devices, which is often not valid.

    Consequence: these Keras layers do not behave properly if the graph is not executed on the same devices. Examples include:
    (1) Creating a graph for deployment (on different machines)
    (2) Architecture search (where some workers generate graphs and other workers run them)
    (3) Distributed graphs with heterogeneous workers, where the whole graph may be constructed on a single worker.

    The automatic format conversion, if needed, should be done at the execution level instead.

  3. Looking at GPU devices has side effects. This is an unfortunate fact.

    Consequence: after constructing a graph with these Keras layers, users cannot create sessions with custom configs, and as a result cannot use Horovod, set a GPU memory fraction, and so on.

    Workaround: create the session before building the graph (sketched right after this list). But this breaks the standard define-and-run paradigm of TF 1.0, and most code using TF is not written like this.
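A sketch of that workaround (TF 1.x assumed): create the session, with the desired config, before any graph construction, so that the devices are initialized with the options you actually want.

import tensorflow as tf

# Devices are initialized here, using our config.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.Session(config=config)

# Only now build the graph containing Keras-backed layers.
a = tf.placeholder(tf.float32, [100, 100, 100, 100])
output = tf.layers.Conv2DTranspose(3, 3, data_format='channels_first').apply(a)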

My recommendations:

  • The first issue obviously needs to be addressed.
  • For backward compatibility with previous versions, add a switch so that these layers do not look at devices when called through tf.layers, but may look at devices when called through tf.keras.layers.
    I personally prefer to see the code crash (rather than silently transpose many times) when there are no appropriate kernels registered for the devices.
  • In the long run, it is best not to look at devices at all and to transform the graph at execution time instead.

@ppwwyyxx
Contributor Author

ppwwyyxx commented Mar 31, 2019

The implementation of

def _has_nchw_support():
  explicitly_on_cpu = _is_current_explicit_device('CPU')
  gpus_available = bool(_get_available_gpus())
  return not explicitly_on_cpu and gpus_available

appears to have more bugs than the ones I pointed out above: it does not handle DeviceSpec correctly. This makes valid code crash, as reported in #27259 and #23197.

These issues do not exist in TF 1.12, where the implementation of Conv2DTranspose is not backed by Keras.
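For illustration only (the exact repros are in the linked issues; this is an assumed approximation): the crashes involve valid code that builds a Keras-backed layer inside an explicit device scope, which exercises _is_current_explicit_device.

import tensorflow as tf

# Building a layer under an explicit device scope; per the linked reports,
# the device handling in keras/backend.py does not cope with this correctly.
with tf.device('/cpu:0'):
    a = tf.placeholder(tf.float32, [1, 3, 32, 32])
    out = tf.keras.layers.Conv2DTranspose(3, 3, data_format='channels_first')(a)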

@rchao rchao self-assigned this Aug 19, 2019
@rmothukuru rmothukuru added stat:awaiting tensorflower Status - Awaiting response from tensorflower stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Mar 12, 2021
@rmothukuru rmothukuru self-assigned this Mar 17, 2021
@rmothukuru
Contributor

@ppwwyyxx,
Sorry for the delayed response. When we execute the code,

import tensorflow as tf
a = tf.placeholder(tf.float32, [100, 100, 100, 100])
b = tf.layers.Conv2DTranspose(3, 3, data_format='channels_first')
output = b.apply(a)

using the latest version of TensorFlow, with slight modifications for compatibility, we see that the GPUs are no longer initialized.

Please find the gist of the working code. Thanks!
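The compatibility-adjusted repro is presumably along these lines (an assumption; the exact code is in the linked gist):

import tensorflow as tf

# Run the original TF 1.x repro under TF 2.x compatibility mode.
tf.compat.v1.disable_eager_execution()
a = tf.compat.v1.placeholder(tf.float32, [100, 100, 100, 100])
b = tf.compat.v1.layers.Conv2DTranspose(3, 3, data_format='channels_first')
output = b.apply(a)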

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Mar 24, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

