
Call tf.Session() twice causes fatal error: failed to get device attribute 13 for device 0 #31795

Closed
ammar-nizami opened this issue Aug 20, 2019 · 14 comments
Labels: comp:gpu (GPU related issues) · stat:awaiting response (Status - Awaiting response from author) · TF 1.14 (for issues seen with TF 1.14) · type:bug (Bug)

Comments

@ammar-nizami

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Enterprise 64-bit
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.7.4
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: cuda_10.0.130_411.31_win10 / cudnn-10.0-windows10-x64-v7.6.2.24
  • GPU model and memory: Nvidia GeForce 940MX 2GB

Describe the current behavior
Python stopped working
2019-08-20 18:38:59.811455: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

Describe the expected behavior
should print 'Hello, TensorFlow-GPU!'

Code to reproduce the issue
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()  # first Session: works
print(sess.run(hello))

hello_gpu = tf.constant('Hello, TensorFlow-GPU!')
sess_gpu = tf.Session()  # second Session: crashes the process
print(sess_gpu.run(hello_gpu))

Other info / logs
The first print statement outputs b'Hello, TensorFlow!', but the second tf.Session() in the same Jupyter notebook crashes Python.

2019-08-20 18:44:31.855812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
2019-08-20 18:44:31.863667: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-20 18:44:31.868460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-20 18:44:31.870987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-20 18:44:31.875292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-20 18:44:31.877960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-20 18:44:31.881525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1391 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-08-20 18:45:07.339418: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error


@nickyfoster

Same issue for me here!

@gadagashwini-zz gadagashwini-zz self-assigned this Aug 21, 2019
@gadagashwini-zz gadagashwini-zz added TF 1.14 for issues seen with TF 1.14 comp:gpu GPU related issues type:bug Bug labels Aug 21, 2019
@gadagashwini-zz
Contributor

Related: #28582

@gadagashwini-zz
Contributor

I tried on Colab but I didn't see any error.

@nickyfoster

I tried running my script today, and it ran without any errors.
Then I paused the script and restarted it from a checkpoint, and the error appeared again.

@akademi4eg

This code snippet works fine when run from the Python console. Maybe the issue is in the drivers?
TF 1.14.0, Python 3.6, CUDA 10.0

@ammar-nizami
Author

ammar-nizami commented Aug 21, 2019

The error is now coming up intermittently. Sometimes on the second call, sometimes on the 7th or 8th call. I am not able to recreate the error consistently.

@ymodak
Contributor

ymodak commented Aug 21, 2019

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES) to that process and its particular TF Session.
Therefore you have to kill the process, not just the TF session.
If you are using the Python interpreter, exit the interpreter.
If you are using a Jupyter notebook, restart the kernel.
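
A minimal sketch of that advice (an assumption on my part: isolating each graph in a short-lived child process, so the GPU is initialized and released once per run rather than once per Session):

import multiprocessing as mp

def run_graph():
    # Import inside the child so TensorFlow touches the GPU only in that process.
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    with tf.Session() as sess:
        print(sess.run(hello))

if __name__ == '__main__':
    # Each iteration gets a fresh process, which exits (and frees the GPU) cleanly.
    for _ in range(2):
        p = mp.Process(target=run_graph)
        p.start()
        p.join()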

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Aug 21, 2019
@ammar-nizami
Author

ammar-nizami commented Aug 22, 2019

Restarting the kernel ensures that the first call always succeeds. I guess everything is working as intended.
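
For reference, a minimal single-Session variant of the original repro (one Session runs both constants, so the GPU is initialized only once per process):

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
hello_gpu = tf.constant('Hello, TensorFlow-GPU!')

# One Session for the whole graph; no second tf.Session() is ever created.
with tf.Session() as sess:
    print(sess.run(hello))
    print(sess.run(hello_gpu))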


@hpahkala

hpahkala commented Oct 2, 2019

I have the same issue.
Win10, Python 3.6.8, TF 1.14, CUDA 10.1, cuDNN 7.6.4

import tensorflow as tf

# Works fine and reports 1 GPU
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  # Crash: "failed to get device attribute 13 for device 0"

After restarting the kernel, the system gets stuck.

@yongliu93

I have the same error! Have you fixed it already? I am also curious why the GPU memory information doesn't appear in the log messages. Here are my logs:
2019-10-30 16:35:31.338022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-30 16:35:31.341495: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-30 16:35:31.349476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:01:00.0
2019-10-30 16:35:31.357431: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-30 16:35:31.363190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-30 16:35:31.991901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-30 16:35:31.997356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-30 16:35:32.000553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-30 16:35:32.004830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)

2019-10-30 16:40:04.156937: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

@BlueStragglers

After many experiments, I found that the problem lies in graphics card allocation. Opening the terminal again and re-running the experiment worked for me.

@afaq-ahmad

After removing these lines, which I had added earlier to work around a CUBLAS_STATUS_ALLOC_FAILED error, it runs now:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
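
If you still need to bound GPU memory after dropping allow_growth, a sketch of one alternative (assumption: capping the per-process memory fraction, a standard TF 1.x option, avoids the same allocator trouble on some setups):

import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow claim at most ~50% of GPU memory up front instead of growing on demand.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
session = tf.Session(config=config)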
