DLPack with Int32 tensor on the GPU: inconsistent eager mode / graph mode / XLA #78091

@merlinND

Description

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

v1.12.1-117097-gecf05620570 2.19.0-dev20241016

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

No response

Python version

3.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hello,

I realize that int32 is a special dtype in TensorFlow for historical reasons. It seems that the handling of GPU int32-typed tensors has evolved over time.

Currently, the device field of a tensor created with:

with tf.device('gpu'):
    x = tf.constant([0,1,2], tf.int32)

does indicate it's a GPU tensor: /job:localhost/replica:0/task:0/device:GPU:0.

However, when exporting and re-importing it via DLPack, it comes back as a CPU tensor.
There even seems to be a unit test validating this:

if tf_tensor_dtype == dtypes.int32:
  # int32 tensor is always on cpu for now
  self.assertEqual(tf_tensor2.device,
                   "/job:localhost/replica:0/task:0/device:CPU:0")

However, @jhoydis found that this behavior is not consistent across modes. In particular, if the tensor goes through an XLA-compiled function, it correctly lives on the GPU even after a round-trip through DLPack (see the reproducer below).

Would it be possible to revisit this behavior, so that exporting an int32 GPU tensor via DLPack yields a GPU DLPack capsule in all modes, not just under XLA?

Standalone code to reproduce the issue

import tensorflow as tf


def f_eager(x):
    return x
f_graph = tf.function()(f_eager)
f_xla = tf.function(jit_compile=True)(f_eager)


with tf.device('gpu'):
    x = tf.constant([0,1,2], tf.int32)
    print("Original tensor:", x.device)

    dlcapsule = tf.experimental.dlpack.to_dlpack(x)
    x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
    print("Default:", x_.device)

    dlcapsule = tf.experimental.dlpack.to_dlpack(f_eager(x))
    x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
    print("Eager:", x_.device)

    dlcapsule = tf.experimental.dlpack.to_dlpack(f_graph(x))
    x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
    print("Graph:", x_.device)

    dlcapsule = tf.experimental.dlpack.to_dlpack(f_xla(x))
    x_ = tf.experimental.dlpack.from_dlpack(dlcapsule)
    print("XLA:", x_.device)

Relevant log output

Original tensor: /job:localhost/replica:0/task:0/device:GPU:0
Default: /job:localhost/replica:0/task:0/device:CPU:0
Eager: /job:localhost/replica:0/task:0/device:CPU:0
Graph: /job:localhost/replica:0/task:0/device:CPU:0
XLA: /job:localhost/replica:0/task:0/device:GPU:0
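In the meantime, the log above suggests a possible workaround: routing the tensor through a jit-compiled identity function before export, since the XLA path is the only one that preserves GPU placement. A minimal sketch (the helper name `to_dlpack_on_gpu` is mine, and this relies on the XLA behavior observed above, which may not be guaranteed):

```python
import tensorflow as tf

# Jit-compiled identity: forces the tensor through the XLA path,
# which (per the log above) keeps int32 data on the GPU.
_identity_xla = tf.function(lambda t: t, jit_compile=True)

def to_dlpack_on_gpu(x):
    """Export `x` via DLPack after a jit-compiled identity,
    so that int32 GPU tensors keep their GPU placement."""
    return tf.experimental.dlpack.to_dlpack(_identity_xla(x))
```

The round-trip `from_dlpack(to_dlpack_on_gpu(x))` should then report a GPU device for int32 tensors, matching the "XLA:" line in the log.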
