Call tf.Session() twice causes fatal error: failed to get device attribute 13 for device 0 #31795
Comments
Same issue for me here!
I tried on Colab but I didn't see any error.
I tried to run my script today, and it ran without any errors.
This code snippet works well when run from the Python console. Maybe the issue is in the drivers?
The error is now coming up intermittently: sometimes on the second call, sometimes on the 7th or 8th call. I am not able to reproduce the error consistently.
By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process and to the particular TF Session (subject to CUDA_VISIBLE_DEVICES).
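To make the default above concrete, here is a minimal sketch of the two usual knobs in the TF 1.x API used in this issue: limiting which GPUs the process sees through CUDA_VISIBLE_DEVICES, and asking a session to claim less memory through tf.ConfigProto. The helper names restrict_visible_gpus and make_limited_session are hypothetical, and the 0.5 memory fraction is an arbitrary assumption. Nothing in this thread establishes that either setting avoids the crash reported here.

```python
import os


def restrict_visible_gpus(indices):
    """Hypothetical helper: expose only the given GPU indices to CUDA.

    This has to run before TensorFlow initializes CUDA, so call it
    before the first `import tensorflow`.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def make_limited_session(memory_fraction=0.5, allow_growth=True):
    """Hypothetical helper: TF 1.x session that does not map nearly all GPU memory.

    The default fraction of 0.5 is an assumption for illustration only.
    """
    # Imported lazily so that restrict_visible_gpus() can run first.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Start with a small allocation and grow it as needed.
    config.gpu_options.allow_growth = allow_growth
    # Upper bound on the share of each visible GPU's memory for this process.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf.Session(config=config)
```

A typical use would be to call restrict_visible_gpus([0]) at the top of the notebook and then create one session with make_limited_session(), reusing it instead of opening a second one.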
Restarting the kernel ensures that the first call always succeeds. I guess everything is working as intended.
I have the same issue.
After restarting the kernel, the system gets stuck.
I have the same error! Have you fixed it already? I am also curious why the log does not show the GPU memory information. Here are my log messages:
2019-10-30 16:40:04.156937: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
After many experiments, I found that the problem is with the graphics card allocation. Just reopen the terminal and run the experiment again, several times if needed.
After removing these lines:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
which I had added to try to solve the CUBLAS_STATUS_ALLOC_FAILED error,
System information
Describe the current behavior
Python stopped working
2019-08-20 18:38:59.811455: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
Describe the expected behavior
It should print 'Hello, TensorFlow-GPU!'.
Code to reproduce the issue
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
hello_gpu = tf.constant('Hello, TensorFlow-GPU!')
sess_gpu = tf.Session()
print(sess_gpu.run(hello_gpu))
Other info / logs
The first print statement outputs b'Hello, TensorFlow!', but the second tf.Session() call in the same Jupyter notebook crashes Python.
2019-08-20 18:44:31.855812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
2019-08-20 18:44:31.863667: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-20 18:44:31.868460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-20 18:44:31.870987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-20 18:44:31.875292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-20 18:44:31.877960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-20 18:44:31.881525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1391 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-08-20 18:45:07.339418: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
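When comparing logs like the ones above across runs, it can help to pull out only the fatal entries. The sketch below assumes the line layout seen in this thread (timestamp, a one-letter severity where I is info, W is warning, E is error and F is fatal, then source file, line number and message). The names LOG_LINE, parse_log_line and fatal_lines are hypothetical helpers, not part of TensorFlow.

```python
import re

# Layout assumed from the log lines in this issue:
# "<date> <time>: <severity> <source file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) "
    r"(?P<source>[^:\]]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    return record


def fatal_lines(lines):
    """Keep only the parsed records whose severity is F (fatal)."""
    records = (parse_log_line(line) for line in lines)
    return [r for r in records if r is not None and r["severity"] == "F"]
```

Continuation lines such as the device name and pciBusID do not match the pattern and are skipped.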