DLPack with Int32 tensor on the GPU: inconsistent eager mode / graph mode / XLA #78091
Description
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
v1.12.1-117097-gecf05620570 2.19.0-dev20241016
Custom code
No
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
3.12
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Hello,
I realize that int32 is a special dtype in TensorFlow for historical reasons. It seems that the handling of GPU int32-typed tensors has evolved over time.
Currently, the `device` field of a tensor created with:

```python
with tf.device('gpu'):
    x = tf.constant([0, 1, 2], tf.int32)
```

does indicate it's a GPU tensor: `/job:localhost/replica:0/task:0/device:GPU:0`.
However, when exporting and re-importing it via DLPack, it comes back as a CPU tensor.
There even seems to be a unit test validating this:
`tensorflow/tensorflow/python/dlpack/dlpack_test.py`, lines 75 to 78 at d3de971:

```python
if tf_tensor_dtype == dtypes.int32:
  # int32 tensor is always on cpu for now
  self.assertEqual(tf_tensor2.device,
                   "/job:localhost/replica:0/task:0/device:CPU:0")
```
However, @jhoydis found that this is not consistent between modes. In particular, if the tensor goes through an XLA-compiled function, it will correctly live on the GPU even after a round-trip through DLPack. (See reproducer below).
Would it be possible to revisit this behavior, so that exporting an int32 GPU tensor via DLPack yields a GPU DLPack capsule in all modes, not just under XLA?
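In the meantime, a workaround sketch consistent with the observation above: since a tensor that has passed through an XLA-compiled function survives the DLPack round-trip on the GPU, one can export after an XLA-compiled identity. The helper name `_xla_identity` is my own illustration, not an official API, and this relies on behavior that may change.

```python
import tensorflow as tf

# Hypothetical workaround, not a guaranteed API contract: route the tensor
# through an XLA-compiled identity before export, since (per the reproducer)
# the XLA path keeps int32 data on the GPU across the DLPack round-trip.
@tf.function(jit_compile=True)
def _xla_identity(x):
    return tf.identity(x)

with tf.device('gpu'):
    x = tf.constant([0, 1, 2], tf.int32)

capsule = tf.experimental.dlpack.to_dlpack(_xla_identity(x))
x_roundtrip = tf.experimental.dlpack.from_dlpack(capsule)
# On a GPU machine this reports .../device:GPU:0 instead of CPU:0.
print(x_roundtrip.device)
```

The tensor values are unchanged either way; only the reported device of the re-imported tensor differs between the plain and XLA paths.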
Standalone code to reproduce the issue
```python
import tensorflow as tf

def f_eager(x):
    return x

f_graph = tf.function()(f_eager)
f_xla = tf.function(jit_compile=True)(f_eager)

with tf.device('gpu'):
    x = tf.constant([0, 1, 2], tf.int32)
print("Original tensor:", x.device)

dlcapsule = tf.experimental.dlpack.to_dlpack(x)
x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
print("Default:", x_.device)

dlcapsule = tf.experimental.dlpack.to_dlpack(f_eager(x))
x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
print("Eager:", x_.device)

dlcapsule = tf.experimental.dlpack.to_dlpack(f_graph(x))
x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
print("Graph:", x_.device)

dlcapsule = tf.experimental.dlpack.to_dlpack(f_xla(x))
x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
print("XLA:", x_.device)
```

Relevant log output
```
Original tensor: /job:localhost/replica:0/task:0/device:GPU:0
Default: /job:localhost/replica:0/task:0/device:CPU:0
Eager: /job:localhost/replica:0/task:0/device:CPU:0
Graph: /job:localhost/replica:0/task:0/device:CPU:0
XLA: /job:localhost/replica:0/task:0/device:GPU:0
```