BUG: symbolic layer triggers device creation #25946
Comments
This bug comes from … In fact, the entire …
I'm guessing we'll have to stick with https://github.com/horovod/horovod/blob/master/examples/keras_imagenet_resnet50.py#L59 in a preamble for any Keras API usage.
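For reference, the pinning idea behind such a preamble can be sketched without TensorFlow at all, using the `CUDA_VISIBLE_DEVICES` environment variable. This is a hedged illustration of my own: the linked Horovod example instead sets `gpu_options.visible_device_list` on a `tf.ConfigProto`, and the helper name `horovod_gpu_preamble` is hypothetical.

```python
import os

def horovod_gpu_preamble(local_rank):
    """Hypothetical helper: restrict this process to a single GPU *before*
    TensorFlow/Keras is imported, so that no later symbolic code can
    initialize the other devices behind our back."""
    os.environ['CUDA_VISIBLE_DEVICES'] = str(local_rank)

# Each worker pins itself to the GPU matching its local rank:
horovod_gpu_preamble(2)
print(os.environ['CUDA_VISIBLE_DEVICES'])  # prints "2"
```

The key property either way is ordering: the process-level restriction happens before any TF call, so device initialization can only ever see the one allowed GPU.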
3 weeks with no response?
@qlzh727 Are you a good person to look at this?
I am quite occupied right now with some RNN work, but I will reroute this to the correct owner.
It's not obvious to me how one would get around this, given that checking devices triggers initialization code if the device is not already initialized. NHWC vs. NCHW device-compatibility issues are among the more common difficulties encountered, which is why we check for them. Ultimately, I think @alsrgv's solution is probably correct: if you need to set specific process-level config, it will have to be done at the very start. That said, if you can think of a better solution, feel free to suggest it or open a PR.
Device initialization is not the only issue here. The check is implemented as:

```python
def _has_nchw_support():
    explicitly_on_cpu = _is_current_explicit_device('CPU')
    gpus_available = bool(_get_available_gpus())
    return not explicitly_on_cpu and gpus_available
```

in … There are at least three issues with this approach: …

My recommendations: …
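To make the flaw concrete, here is a pure-Python model of that check (my own simplification, not the Keras source): the result depends only on process-wide GPU visibility and an explicit `'CPU'` device scope, not on where the op will actually be placed, and in the real backend merely calling `_get_available_gpus()` initializes every device as a side effect.

```python
def has_nchw_support(explicit_device, available_gpus):
    # Simplified model of Keras' _has_nchw_support: NCHW is reported as
    # supported whenever any GPU is visible in the process and we are not
    # explicitly inside a CPU device scope.
    explicitly_on_cpu = explicit_device == 'CPU'
    return (not explicitly_on_cpu) and bool(available_gpus)

# A GPU exists somewhere in the process, so NCHW is claimed to be supported,
# even though this particular op may still end up placed on the CPU:
print(has_nchw_support(None, ['/gpu:0']))   # True
print(has_nchw_support('CPU', ['/gpu:0']))  # False
print(has_nchw_support(None, []))           # False
```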
The implementation of … appears to have more bugs than what I pointed out above: it does not handle … These issues do not exist in TF 1.12, when the implementation of …
@ppwwyyxx, the snippet

```python
import tensorflow as tf

a = tf.placeholder(tf.float32, [100, 100, 100, 100])
b = tf.layers.Conv2DTranspose(3, 3, data_format='channels_first')
output = b.apply(a)
```

works using the latest version of … Please find the Gist of the working code. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
System information
Describe the current behavior
The following code:

…

prints:

…
It can be seen that it initializes the GPU devices. However, this should not happen in symbolic functions.
Initializing the GPU devices has many side effects.
It can lead to different types of failures, such as #8136 (comment). The largest side effect is that any GPU-related flags given to a `tf.Session` created after device initialization will not take effect. It will also make it much harder to use Horovod, because Horovod requires initializing the GPU in specific ways (with `visible_device_list`). If a graph with `Conv2DTranspose` was created before creating the session (which is the standard way of using TF 1.x), Horovod will fail to initialize the session. (cc @alsrgv)

This bug exists for `Conv2DTranspose`, but not for `Conv2D`. It exists in 1.13.0rc0 and does not exist in 1.12.0.
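The "flags given after initialization are ignored" failure mode can be modeled abstractly. The following toy `DeviceRuntime` class is my own illustration, not TensorFlow's actual implementation: once a process-wide singleton is created, configuration passed by later callers is silently dropped.

```python
class DeviceRuntime:
    """Toy model of a process-wide device runtime singleton."""
    _instance = None

    def __init__(self, visible_devices):
        self.visible_devices = visible_devices

    @classmethod
    def get(cls, visible_devices):
        # The first caller initializes the runtime; configuration passed by
        # any later caller is silently ignored -- the failure mode described
        # in the issue.
        if cls._instance is None:
            cls._instance = cls(visible_devices)
        return cls._instance

# Building the graph (e.g. a Conv2DTranspose layer) implicitly initializes
# all devices first:
DeviceRuntime.get(visible_devices='all')
# Horovod later asks for a single GPU, but the config no longer takes effect:
rt = DeviceRuntime.get(visible_devices='0')
print(rt.visible_devices)  # prints "all"
```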