tf.device scope not working correctly #44510
Comments
Will it be possible to share a complete code snippet? Please also share the output of tf.config.list_physical_devices().
NOTE: uses input data file mnist.npz

import tensorflow as tf
from tensorflow import keras
import time
import socket
import os

tf.debugging.set_log_device_placement(True)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
gpus = tf.config.list_physical_devices('GPU')
print(' gpus = ', gpus)

def get_compiled_model():
    # Make a densely connected network, placing each block of six
    # Dense layers under a different GPU device scope.
    inputs = keras.Input(shape=(784,))
    print("On GPU:0")
    with tf.device("/device:GPU:0"):
        x = keras.layers.Dense(256, activation="relu")(inputs)
        # assert x.device.endswith("/GPU:0")
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
    print("On GPU:1")
    with tf.device("/device:GPU:1"):  # Or GPU:2 for the 3rd GPU, etc.
        x = keras.layers.Dense(256, activation="relu")(x)
        # assert x.device.endswith("/GPU:1")
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
    print("On GPU:2")
    with tf.device("/device:GPU:2"):
        x = keras.layers.Dense(256, activation="relu")(x)
        # assert x.device.endswith("/GPU:2")
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
    print("On GPU:3")
    with tf.device("/device:GPU:3"):
        x = keras.layers.Dense(256, activation="relu")(x)
        # assert x.device.endswith("GPU:3")
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        x = keras.layers.Dense(256, activation="relu")(x)
        outputs = keras.layers.Dense(10)(x)
    model = keras.Model(inputs, outputs)
    opt = keras.optimizers.Adam()
    model.compile(
        optimizer=opt,
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
        experimental_run_tf_function=False,  # accepted by compile() in TF 2.1
    )
    return model

def get_dataset():
    batch_size = 32
    num_val_samples = 10000
    # Return the MNIST dataset in the form of a `tf.data.Dataset`.
    path = '/g/g92/jtaylor/workspace/TFnew/mnist.npz'
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data(path)
    # Preprocess the data (these are NumPy arrays).
    x_train = x_train.reshape(-1, 784).astype("float32") / 255
    x_test = x_test.reshape(-1, 784).astype("float32") / 255
    y_train = y_train.astype("float32")
    y_test = y_test.astype("float32")
    # Reserve num_val_samples samples for validation.
    x_val = x_train[-num_val_samples:]
    y_val = y_train[-num_val_samples:]
    x_train = x_train[:-num_val_samples]
    y_train = y_train[:-num_val_samples]
    return (
        tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size),
        tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(batch_size),
        tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size),
    )

model = get_compiled_model()

# Train the model on all available devices.
train_dataset, val_dataset, test_dataset = get_dataset()
model.fit(train_dataset, epochs=2, validation_data=val_dataset)

# Test the model on all available devices.
model.evaluate(test_dataset)
Output of tf.config.list_physical_devices() as requested:

Num GPUs Available: 4
gpus = [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'),
        PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'),
        PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'),
        PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Please let us know which TF version you are using.
PowerAI TensorFlow version 2.1.0.
@JohnTaylor2000, Thanks!
The actual model that I am working on is too large to fit into GPU memory. I have a data-parallel code using Horovod that runs on hundreds of GPUs, but I now need to use a larger model. To do this, I need to spread the layers across multiple GPUs.
Hi all, just wondering if you have been able to run the test code I provided, and whether you need any further help from me?
Can you confirm that this is an issue, based on my supplied bug report, and whether you have a solution? This is a roadblock to my research, so I am keen to have it resolved; any help is greatly appreciated!
Any developments on this issue, please?
Is any help available? I have had no response since 5 November.
I think TF currently does not support model parallelism with a Keras model written the way you have written it. However, I believe it does support model parallelism with primitive operations. Also, there is effectively no documentation about model parallelism, which is an issue in itself.
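For reference, here is a minimal sketch of what model parallelism with primitive operations can look like. This is an illustrative example rather than code from this thread; the layer sizes and device strings are assumptions. With plain tf.Variable and tf.matmul calls, tf.device placement does take effect:

import tensorflow as tf

# Put the first block's parameters on GPU:0 and the second block's on GPU:1.
with tf.device("/device:GPU:0"):
    w0 = tf.Variable(tf.random.normal([784, 256]))
    b0 = tf.Variable(tf.zeros([256]))
with tf.device("/device:GPU:1"):
    w1 = tf.Variable(tf.random.normal([256, 10]))
    b1 = tf.Variable(tf.zeros([10]))

@tf.function
def forward(x):
    with tf.device("/device:GPU:0"):
        h = tf.nn.relu(tf.matmul(x, w0) + b0)  # computed on GPU:0
    with tf.device("/device:GPU:1"):
        return tf.matmul(h, w1) + b1           # computed on GPU:1

print(forward(tf.random.normal([32, 784])))

With tf.debugging.set_log_device_placement(True) enabled, the MatMul ops should be logged on the devices requested above.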
Thank you for the interesting solution to this problem. I am investigating whether it will work on the model that I have developed. I noticed that the latest versions of tf.keras.layers no longer have the attribute 'device', which means, as you say, that Keras no longer supports model parallelism where we assign layers to a device. Interestingly, the development of GPUs with much larger memory (e.g. 80 GB on the A100) and the proposed new Grace architecture, which allows high-bandwidth access to CPU memory, will reduce the need for model parallelism. However, this will likely be offset by the desire to build bigger, more complex model systems using the Keras functional API.
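To illustrate the point about the missing 'device' attribute, here is a hypothetical snippet (the CPU device is chosen just for portability): eager tensors report where they were actually produced, while symbolic functional-API tensors have no concrete placement at construction time.

import tensorflow as tf
from tensorflow import keras

# Eager tensor: .device reflects the device the op actually ran on.
with tf.device("/device:CPU:0"):
    t = tf.zeros([2, 2])
print(t.device)

# Symbolic tensor from the functional API: built while tracing the model,
# so there is no concrete device to report yet.
inputs = keras.Input(shape=(4,))
x = keras.layers.Dense(2)(inputs)
print(type(x))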
@JohnTaylor2000 |
I have been able to test this with TensorFlow 2.6.0 using a test code; I do not have access to 2.7.0 at the moment. I am still seeing the same problem when using a dense layer (tf.keras.layers.Dense): the simple test below generates the error described above.
Hi @JohnTaylor2000! You can use tf.config.set_visible_devices to disable and enable specific GPUs during operation. Have you tried the same in version 2.8 yet?
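In case it helps, a minimal sketch of the suggestion above (assuming a multi-GPU machine; the index 1 is arbitrary). Note that visibility must be set before any GPU has been initialized:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if len(gpus) > 1:
    try:
        # Expose only the second physical GPU; it becomes /GPU:0 logically.
        tf.config.set_visible_devices(gpus[1], 'GPU')
        print(tf.config.list_logical_devices('GPU'))
    except RuntimeError as e:
        # Raised if GPUs were already initialized when this was called.
        print(e)

Note that set_visible_devices only controls which GPUs exist for the process; it does not by itself place different layers on different GPUs.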
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you. |
Closing as stale. Please reopen if you'd like to work on this further. |
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests, and build/installation issues on GitHub. tag:bug_template
System information
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"  (TF 1.x)
python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"  (TF 2.x)
Describe the current behavior
The tf.device command does not correctly assign a GPU device to tf.keras layers on a node with 4 GPUs, so model parallelism cannot be implemented. Based on the output of tf.debugging.set_log_device_placement(True), all layers appear on device GPU:0, with the exception of some I/O.
Describe the expected behavior
tf.keras layers are correctly assigned to a device.
Standalone code to reproduce the issue
import tensorflow as tf
from tensorflow import keras

tf.debugging.set_log_device_placement(True)

print("On GPU:1")
inputs = keras.Input(shape=(784,))
with tf.device("/device:GPU:1"):  # Or GPU:2 for the 3rd GPU, etc.
    x = keras.layers.Dense(256, activation="relu")(inputs)
    print(x)
    assert x.device.endswith("/GPU:1")
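A related check that can be run (a supplementary sketch, not part of the original report; names follow the snippet above): even though the symbolic output x has no concrete device, the layer's variables are created eagerly and do report one, which shows whether the tf.device scope was honored for the weights.

import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(784,))
with tf.device("/device:GPU:1"):
    layer = keras.layers.Dense(256, activation="relu")
    x = layer(inputs)

# The variables have a concrete placement even when x is symbolic.
for v in layer.weights:
    print(v.name, '->', v.device)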
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.
I have a larger test problem that will run on a 4-GPU node. If you remove the assert statement, then using nvidia-smi you can see that all memory and computational work happens on GPU:0 and almost none is assigned to the other GPUs. Happy to supply this code if needed.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
Tensor("dense/Identity:0", shape=(None, 256), dtype=float32)
Traceback (most recent call last):
File "py_test.py", line 11, in
assert x.device.endswith("/GPU:1")
AssertionError