
Call tf.Session() twice causes fatal error: failed to get device attribute 13 for device 0 #31795

Closed
ammar-nizami opened this issue Aug 20, 2019 · 14 comments
Labels: comp:gpu (GPU related issues) · stat:awaiting response (Status - Awaiting response from author) · TF 1.14 (for issues seen with TF 1.14) · type:bug (Bug)

Comments

@ammar-nizami

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Enterprise 64-bit
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.7.4
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: cuda_10.0.130_411.31_win10 / cudnn-10.0-windows10-x64-v7.6.2.24
  • GPU model and memory: Nvidia GeForce 940MX 2GB

Describe the current behavior
Python stopped working
2019-08-20 18:38:59.811455: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

Describe the expected behavior
should print 'Hello, TensorFlow-GPU!'

Code to reproduce the issue
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()  # first Session: works
print(sess.run(hello))

hello_gpu = tf.constant('Hello, TensorFlow-GPU!')
sess_gpu = tf.Session()  # second Session: crashes the process
print(sess_gpu.run(hello_gpu))

Other info / logs
The first print statement outputs b'Hello, TensorFlow!', but the second tf.Session() in the same Jupyter notebook crashes Python.

2019-08-20 18:44:31.855812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
2019-08-20 18:44:31.863667: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-20 18:44:31.868460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-20 18:44:31.870987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-20 18:44:31.875292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-20 18:44:31.877960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-20 18:44:31.881525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1391 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-08-20 18:45:07.339418: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error


@nickyfoster

Same issue for me here!

@gadagashwini-zz gadagashwini-zz self-assigned this Aug 21, 2019
@gadagashwini-zz gadagashwini-zz added TF 1.14 for issues seen with TF 1.14 comp:gpu GPU related issues type:bug Bug labels Aug 21, 2019
@gadagashwini-zz
Contributor

Related: #28582

@gadagashwini-zz
Contributor

I tried on Colab but I didn't see any error.

@nickyfoster

I tried running my script today, and it ran without any errors.
Then I paused the script and restarted it from a checkpoint, and the error appeared again.

@akademi4eg

This code snippet works fine when run from the Python console. Maybe the issue is in the drivers?
TF 1.14.0, Python 3.6, CUDA 10.0

@ammar-nizami
Author

ammar-nizami commented Aug 21, 2019

The error is now coming up intermittently. Sometimes on the second call, sometimes on the 7th or 8th call. I am not able to recreate the error consistently.

@ymodak
Contributor

ymodak commented Aug 21, 2019

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES) to that process and its particular TF Session.
Therefore you have to kill the process, not just the TF session.
If you are using the Python interpreter, exit the interpreter.
If you are using a Jupyter notebook, restart the kernel.
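
A minimal sketch of that advice (an assumption on my part: isolating each graph in a short-lived child process, so the GPU is initialized and released once per run rather than once per Session):

import multiprocessing as mp

def run_graph():
    # Import inside the child so TensorFlow touches the GPU only in that process.
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    with tf.Session() as sess:
        print(sess.run(hello))

if __name__ == '__main__':
    # Each iteration gets a fresh process, which exits (and frees the GPU) cleanly.
    for _ in range(2):
        p = mp.Process(target=run_graph)
        p.start()
        p.join()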

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Aug 21, 2019
@ammar-nizami
Author

ammar-nizami commented Aug 22, 2019

Restarting the kernel ensures that the first call always succeeds. I guess everything is working as intended.
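
For reference, a minimal single-Session variant of the original repro (one Session runs both constants, so the GPU is initialized only once per process):

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
hello_gpu = tf.constant('Hello, TensorFlow-GPU!')

# One Session for the whole graph; no second tf.Session() is ever created.
with tf.Session() as sess:
    print(sess.run(hello))
    print(sess.run(hello_gpu))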


@hpahkala

hpahkala commented Oct 2, 2019

I have the same issue.
Win10, Python 3.6.8, TF 1.14, CUDA 10.1, cuDNN 7.6.4

import tensorflow as tf

# Works fine and reports 1 GPU
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  # Crash: "failed to get device attribute 13 for device 0"

After restarting the kernel, the system gets stuck.

@yongliu93

I have the same error! Have you fixed it already? I am also curious why the GPU memory information doesn't appear in the log messages. Here are my logs:
2019-10-30 16:35:31.338022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-30 16:35:31.341495: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-30 16:35:31.349476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:01:00.0
2019-10-30 16:35:31.357431: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-30 16:35:31.363190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-30 16:35:31.991901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-30 16:35:31.997356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-30 16:35:32.000553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-30 16:35:32.004830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)

2019-10-30 16:40:04.156937: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

@BlueStragglers

After many experiments, I found that the problem lies in graphics card allocation. Opening the terminal again and re-running the experiment worked for me.

@afaq-ahmad

After removing these lines, which I had added earlier to work around a CUBLAS_STATUS_ALLOC_FAILED error, it runs now:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
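
If you still need to bound GPU memory after dropping allow_growth, a sketch of one alternative (assumption: capping the per-process memory fraction, a standard TF 1.x option, avoids the same allocator trouble on some setups):

import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow claim at most ~50% of GPU memory up front instead of growing on demand.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
session = tf.Session(config=config)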
