Using tf.device when creating Keras layers does not move layer computation to that device #53671
Comments
@Saduf2019, Updated
Hi @Saduf2019! The Colab notebook shared above was run with a single GPU, but the code provided moves layers onto `/GPU:1`. I've just modified the Colab notebook to add the following code to create two virtual GPUs, allowing you to see the issue in that environment:

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_physical_devices('GPU')
if not gpus:
    raise ValueError("At least one GPU required for this test!")
if len(gpus) == 1:
    # Create two virtual GPUs for this test:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
logical_gpus = tf.config.list_logical_devices('GPU')
print(f"{len(gpus)} physical GPUs, split into {len(logical_gpus)} logical GPUs")
print(logical_gpus)
```
I tried the code snippet above but couldn't reproduce your log. Could you provide a Colab with the log as a reproduction? My attempt is similar to @tilakrayal's gist above; there is no "MatMul" anywhere in the prints.
Hi @wangpengmit! Please see this Colab, which combines the two code snippets above. The logs I pasted above appear to be written by TensorFlow directly to the C++ stderr stream, which doesn't show up inline when executing code in Colab; I've included instructions in the Colab on how to find those logs.
Thanks for the reproduction! The two lines that create the layer under `tf.device` only determine where the variables are placed. The four lines (or some subset of them) afterwards, which build and run the model, determine where real computations like `MatMul` happen.
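To make that distinction concrete, here is a minimal sketch (assuming the two logical GPUs configured above; the layer name is illustrative, not from the original comment):

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

with tf.device("/GPU:1"):
    # Building the layer here pins its variables (kernel/bias) to /GPU:1...
    layer = tf.keras.layers.Dense(10, name="illustrative_dense")
    layer.build(input_shape=(1, 1))

# ...but calling it outside the scope runs MatMul/BiasAdd wherever the
# placer decides (typically /GPU:0), as the placement logs show.
out = layer(tf.constant([[2.0]]))
print(layer.kernel.device)  # ends in GPU:1
print(out.device)           # not GPU:1
```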
Thanks @wangpengmit - unfortunately, I don't think that's a useful solution here, especially in more complex models that spread their variables (and/or layers) across multiple GPUs. Consider an example like this:

```python
_input = tf.keras.layers.Input(shape=(1,), dtype=tf.float32)

with tf.device("/GPU:0"):
    x = _input
    x = tf.keras.layers.Dense(10, name="should_be_on_first_gpu")(x)
    x = tf.keras.layers.Dense(10, name="should_also_be_on_first_gpu")(x)
    gpu0 = x

with tf.device("/GPU:1"):
    x = _input
    x = tf.keras.layers.Dense(10, name="should_be_on_second_gpu")(x)
    x = tf.keras.layers.Dense(10, name="should_also_be_on_second_gpu")(x)
    gpu1 = x

model = tf.keras.models.Model(inputs=[_input], outputs=[gpu0, gpu1])
model.compile('adam', 'mse')
model.summary()
model.fit([2], [4])
```

In this case, all of the computation in this model will occur on GPU 0, and the current API has no way to move that computation onto each GPU independently. (This could be possible by manually writing a training loop, manually calling the required layers in a `tf.device` scope, but that means giving up Keras's built-in training machinery.) Is there any way with the current TensorFlow and Keras APIs to force computations to be colocated on the GPU where their inputs are?
This touches on an unfortunate TF design flaw: a TF op's placement is statically determined by the `tf.device` scope that is active when the op runs, not by where its inputs live. In your case, you can use custom layers, something like the sketch below.
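A minimal sketch of that idea, assuming the target device is passed in at construction time (the class name, argument names, and device strings are illustrative, not from the original comment):

```python
import tensorflow as tf

class DeviceDense(tf.keras.layers.Layer):
    """A Dense-like layer that pins its forward computation to a device."""

    def __init__(self, units, device, **kwargs):
        super().__init__(**kwargs)
        self._device = device
        self._dense = tf.keras.layers.Dense(units)

    def build(self, input_shape):
        # Create the variables on the target device as well.
        with tf.device(self._device):
            self._dense.build(input_shape)
        super().build(input_shape)

    def call(self, inputs):
        # Wrap the forward pass so MatMul/BiasAdd run on the target device.
        with tf.device(self._device):
            return self._dense(inputs)

_input = tf.keras.layers.Input(shape=(1,), dtype=tf.float32)
gpu0 = DeviceDense(10, "/GPU:0")(_input)
gpu1 = DeviceDense(10, "/GPU:1")(_input)
model = tf.keras.models.Model(inputs=[_input], outputs=[gpu0, gpu1])
```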
But I understand this is less ideal than a "follow-inputs" placement system. We are investigating supporting "follow-inputs" placement semantics, but it'll take a while. |
That's great detail, thanks! Using custom layers like that will indeed work just fine for computing the forward pass, but unfortunately not the backward pass. I can't find anything in the documentation that allows layers to customize how/where to run their backward passes in TF2, and the best solution I've been able to come up with involves creating a …
Sorry for the delayed reply! Yes, this is another unfortunate problem with TF's device placement: the backward pass basically ignores all `tf.device` scopes.
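A small probe of that behavior (a sketch, assuming two logical GPUs as configured earlier; where each op actually ran must be read from the device-placement logs):

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

with tf.device("/GPU:1"):
    w = tf.Variable(tf.random.normal([4, 4]))

x = tf.ones([2, 4])
with tf.GradientTape() as tape:
    with tf.device("/GPU:1"):
        y = tf.matmul(x, w)  # forward MatMul is pinned to /GPU:1
    loss = tf.reduce_sum(y)

# The gradient MatMul below is placed by TF itself; the tf.device scope
# around the forward op does not constrain it, per the discussion above.
grads = tape.gradient(loss, [w])
print(grads[0].device)
```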
System information
Describe the current behavior
When creating a Keras model using the `tf.device` context manager, the resources used by a layer are placed on the requested device, but the TensorFlow ops involved in that layer do not execute on that device.

Describe the expected behavior
Using the `tf.device` context manager when constructing a Keras layer should perform that layer's computation on the specified device.

Standalone code to reproduce the issue
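A minimal sketch consistent with the behavior described in this thread (not necessarily the reporter's exact code; assumes a `/GPU:1` device exists):

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

_input = tf.keras.layers.Input(shape=(1,), dtype=tf.float32)
with tf.device("/GPU:1"):
    output = tf.keras.layers.Dense(10)(_input)

model = tf.keras.models.Model(inputs=[_input], outputs=[output])
model.compile('adam', 'mse')
model.summary()
model.fit([[2.0]], [[4.0] * 10])
```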
Hundreds of log lines are printed showing the placement of each op on a device, but crucially, the `/GPU:1` device stores the `Dense` layer's variables (i.e., `ReadVariableOp` runs there) but is not where the computation (`MatMul`, in this case) happens.