[Failed to get device properties, error code: 3 | tfds.numpy()] #5423

Open
WesleyHsieh0806 opened this issue May 22, 2024 · 3 comments
WesleyHsieh0806 commented May 22, 2024

Short description
I tried to create a PyTorch dataset from TensorFlow Datasets using tfds.as_numpy(). It works well when running without GPUs. However, when I launch multi-GPU training, my process gets stuck and I get the error message 2024-05-22 01:37:48.269871: E tensorflow/core/grappler/clusters/utils.cc:80] Failed to get device properties, error code: 3.

Environment information

  • Operating System: Ubuntu 20.04.6 LTS

  • Python version: 3.8.10

  • tensorflow-datasets/tfds-nightly version: tensorflow==2.13.1, tensorflow-datasets==4.9.2

  • tensorflow/tf-nightly version: Could not find a version that satisfies the requirement tf-nightly (from versions: none)

  • Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly) ?
    yes

Reproduction instructions

I used the following code to create a PyTorch-style dataset:

import numpy as np
import tensorflow_datasets as tfds
from torch.utils.data import Dataset

# dataset_dir is the local directory that holds the TFDS data
dataset = tfds.as_numpy(tfds.load("droid_100", data_dir=dataset_dir, split="train"))
# Materialize every episode in memory
dataset = [data for data in dataset]

# PyTorch-style dataset
class PytorchStyleDataset(Dataset):
  def __len__(self):
    return len(dataset)

  def __getitem__(self, index):
    episode = dataset[index]
    img_array = []
    for i, step in enumerate(episode["steps"]):
      img = step["observation"]["exterior_image_1_left"]
      img_array.append(img)
    # The process hangs at the last iteration of the for loop
    img_array = np.stack(img_array)
    return img_array

Link to logs
2024-05-22 08:19:45.984980: E tensorflow/core/grappler/clusters/utils.cc:80] Failed to get device properties, error code: 3
2024-05-22 08:19:45.986367: E tensorflow/core/grappler/clusters/utils.cc:80] Failed to get device properties, error code: 3
2024-05-22 08:19:45.989617: E tensorflow/core/grappler/clusters/utils.cc:80] Failed to get device properties, error code: 3
2024-05-22 08:19:46.003328: E tensorflow/core/grappler/clusters/utils.cc:80] Failed to get device properties, error code: 3

Expected behavior
I can successfully use tfds.as_numpy() to create a PyTorch-style dataset, and __getitem__() should return the image array as a NumPy array.
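To make the expected return value concrete, here is a minimal sketch that uses synthetic frames in place of a real droid_100 episode. The step count (5) and the frame shape (180x320 RGB) are assumptions chosen for illustration, not values taken from the dataset.

```python
import numpy as np

# Hypothetical stand-in for one episode: 5 steps, each with one RGB frame.
# The 180x320 frame size is an assumption for illustration only.
steps = [
    {"observation": {"exterior_image_1_left": np.zeros((180, 320, 3), dtype=np.uint8)}}
    for _ in range(5)
]

# Collect one frame per step, as the loop in __getitem__ does.
frames = [step["observation"]["exterior_image_1_left"] for step in steps]

# np.stack adds a new leading axis that indexes the steps.
img_array = np.stack(frames)
print(img_array.shape, img_array.dtype)
```

With real data, the leading dimension would be the number of steps in the episode.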

Real behavior
The process hangs at the last iteration of the for loop in __getitem__(), and the error message appears.

@WesleyHsieh0806 WesleyHsieh0806 added the bug Something isn't working label May 22, 2024
@fineguy fineguy self-assigned this May 22, 2024
Collaborator

fineguy commented May 22, 2024

Hi @WesleyHsieh0806 !

This seems like a problem with TensorFlow rather than TFDS. Could you try updating both to the latest versions?

Author

WesleyHsieh0806 commented May 23, 2024

I updated tensorflow to 2.16.1 and tfds to 4.9.4. The process no longer hangs, but the error message still appears.

@WesleyHsieh0806
Author

My guess is that TensorFlow Datasets is somehow occupying my GPU memory.
Disabling GPUs with tf.config.set_visible_devices([], 'GPU') fixed the issue, but it feels like a hack. I'd prefer a more permanent solution that doesn't involve manually hiding GPUs.
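For anyone who wants to try the same workaround, here is a minimal sketch of how it might be wrapped. The helper name hide_gpus_from_tensorflow is hypothetical. The sketch assumes the call runs before TensorFlow initializes its devices (that is, before tfds.load), and it falls back gracefully if TensorFlow is not installed or the devices are already initialized.

```python
def hide_gpus_from_tensorflow():
    """Ask TensorFlow to ignore all GPUs. Returns True if the setting was applied."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure.
        return False
    try:
        # An empty list means no GPU is visible to TensorFlow.
        # This only changes TensorFlow's own device configuration.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError:
        # Raised if TensorFlow has already initialized its devices.
        return False
    return True


# Call this once, before tfds.load(...) or any other TensorFlow op.
hidden = hide_gpus_from_tensorflow()
print("GPUs hidden from TensorFlow:", hidden)
```

Because the setting applies only to TensorFlow, the GPUs should stay available to PyTorch in the same process.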
