Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION #17747

Closed
fsaxen opened this issue Mar 15, 2018 · 17 comments

fsaxen commented Mar 15, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 1.6.0
  • Python version: 3.5.2
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 9.0.176.2 / cudnn7.0.5
  • GPU model and memory: GTX 1080 (8GB)
  • Exact command to reproduce:
import tensorflow as tf

# Build a batch of random 32x32 RGB images and one random angle (radians) per image.
batch_size = 64
images = tf.random_normal(shape=[batch_size, 32, 32, 3], dtype=tf.float32)
angles = tf.random_uniform([batch_size], -0.5, 0.5)
images = tf.contrib.image.rotate(images, angles)

# Evaluating the rotated batch on the GPU triggers the crash.
with tf.Session() as sess:
    _ = sess.run(images)

Any idea why this small example produces the following error?

ERROR:
2018-03-16 18:27:24.292665: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-03-16 18:27:24.292700: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_driver.cc:1110] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_INSTRUCTION ::
2018-03-16 18:27:24.296409: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1

UPDATE:
I think issue #17485 is very similar

@tensorflowbutler tensorflowbutler added the stat:awaiting response Status - Awaiting response from author label Mar 16, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Bazel version

@gzzzzzzz

tf.contrib.image.rotate(images, angles) crashed on Win10, Python 3.6, TensorFlow 1.6, CUDA 9, cuDNN 7.0, 1080 Ti.
Error code: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-03-17 22:26:12.688690: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1

@xinlinli170

xinlinli170 commented Mar 30, 2018

Same, tf.contrib.image.rotate(images, angles) crashed on Win10, python 3.6, tensorflow 1.4, CUDA 8.0, cuDNN 6, 1080ti.

Error message:
2018-03-30 11:31:04.028220: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2018-03-30 11:31:04.042984: F C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1

After I upgraded to TensorFlow 1.7.0, CUDA 9.0, cuDNN 7, the error message became:
2018-03-30 11:55:22.974188: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-03-30 11:55:22.979694: F T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1

It always works on CPU and only crashes when using the GPU.
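
Since the op reportedly runs fine on CPU, one possible stopgap is to pin only the rotation to the CPU and leave the rest of the graph where it is. The following is a minimal sketch, assuming TensorFlow 1.x with tf.contrib available; `rotate_device`, `build_rotated_batch` and `run_once` are hypothetical helper names, and whether this avoids the crash on a given setup has not been verified here.

```python
def rotate_device(force_cpu=True):
    """Return the device string for the rotation op (hypothetical helper)."""
    return "/cpu:0" if force_cpu else "/gpu:0"


def build_rotated_batch(batch_size=64, force_cpu=True):
    """Build the repro graph, placing only the rotate op on the chosen device."""
    # Imported lazily so rotate_device() is usable without TensorFlow installed.
    import tensorflow as tf  # assumes TensorFlow 1.x (tf.contrib is gone in 2.x)

    images = tf.random_normal(shape=[batch_size, 32, 32, 3], dtype=tf.float32)
    angles = tf.random_uniform([batch_size], -0.5, 0.5)
    # Only the rotation is pinned; the random inputs keep their default placement.
    with tf.device(rotate_device(force_cpu)):
        rotated = tf.contrib.image.rotate(images, angles)
    return rotated


def run_once():
    """Evaluate the rotated batch once and return it as a NumPy array."""
    import tensorflow as tf

    rotated = build_rotated_batch()
    with tf.Session() as sess:
        return sess.run(rotated)
```

Calling `run_once()` evaluates the graph; passing `force_cpu=False` to `build_rotated_batch` restores the GPU placement for comparison.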

I also tried tf.contrib.image.angles_to_projective_transforms() and tf.contrib.image.transform(); same error.
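
For context on what these two functions compute together: a rotation about the image center can be written as an 8-parameter projective transform and then applied per pixel. Below is a pure-Python sketch of that idea, not the TensorFlow implementation. The function names are hypothetical, and the parameter layout [a0, a1, a2, b0, b1, b2, c0, c1] and the sign convention are assumptions based on my reading of the tf.contrib.image.transform documentation.

```python
import math


def angle_to_projective_transform(angle, height, width):
    """Return 8 parameters for a rotation by `angle` radians about the image center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Offsets chosen so that the center pixel maps to itself.
    x_offset = ((width - 1) - (cos_a * (width - 1) - sin_a * (height - 1))) / 2.0
    y_offset = ((height - 1) - (sin_a * (width - 1) + cos_a * (height - 1))) / 2.0
    return [cos_a, -sin_a, x_offset, sin_a, cos_a, y_offset, 0.0, 0.0]


def apply_transform(params, x, y):
    """Map an output pixel (x, y) to the input location it samples from."""
    a0, a1, a2, b0, b1, b2, c0, c1 = params
    k = c0 * x + c1 * y + 1.0  # equals 1 for a pure rotation
    return ((a0 * x + a1 * y + a2) / k, (b0 * x + b1 * y + b2) / k)
```

For the 32x32 images in the repro, the center (15.5, 15.5) stays fixed under any angle, which is a quick sanity check on the offsets.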

@aligokalppeker

aligokalppeker commented Apr 29, 2018

The same issue occurs in my environment: Windows 10 x64, TensorFlow 1.7.0, CUDA 9, cuDNN 7, Python 3.6.

2018-04-29 21:11:23.422100: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-04-29 21:11:23.422100: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-04-29 21:11:23.422361: F T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1
2018-04-29 21:11:23.422608: F T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:203] Unexpected Event status: 1

@lixiang-ucas

lixiang-ucas commented May 25, 2018

I got the same issue using TF 1.4.0, CUDA 8.0, cuDNN 6.0.21. The same code sometimes works fine and sometimes reports the error.

@AloshkaD

I recreated a fresh conda env with cuDNN, TensorFlow GPU 1.8, and Keras GPU 2.1 installed from Anaconda. Before that I downgraded CUDA from 9.2 to 9.0.

@manikaaggarwal

I also get this issue, partway through training.

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Jul 20, 2018
@aselle aselle assigned zheng-xq and unassigned aselle Jul 20, 2018
@aselle
Contributor

aselle commented Jul 20, 2018

@zheng-xq, do you know the state of Windows CUDA support? Otherwise, assign it back and we should assign it to whoever implemented image rotation.

@WhiteCipher

same error!
Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS

@Henrystarstar

same error~ Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION

@tensorflowbutler
Member

Nagging Assignee @zheng-xq: It has been 30 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@maidaly

maidaly commented Sep 22, 2018

I also have the same error, CUDA_ERROR_ILLEGAL_INSTRUCTION. We just need to know which instruction caused it, since the GPU otherwise works fine and the code works fine if we change the device to CPU.
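
One thing that may help narrow this down: CUDA reports kernel failures asynchronously, so the error usually surfaces at a later synchronization point rather than at the launch that caused it. Setting the CUDA environment variable CUDA_LAUNCH_BLOCKING=1 before the CUDA context is created makes launches synchronous. This will not name the exact machine instruction, but it can point closer to the failing kernel. A minimal sketch follows; `enable_blocking_launches` is a hypothetical helper name.

```python
import os


def enable_blocking_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (os.environ by default)."""
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Must run before TensorFlow initializes CUDA, so call it before the import.
enable_blocking_launches()
# import tensorflow as tf  # import only after the variable is set
```

Expect the script to run more slowly with this set, so it is best kept to debugging runs.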

@gogobd

gogobd commented Oct 6, 2018

I've got the same issue running the code above with Windows 10, TensorFlow 1.8.0, cudatoolkit 9.0 (Anaconda), cuDNN 7.1.4 (Anaconda) on a GTX 970 (MSI). Any help would be much appreciated.

>python test_tf_CUDA_ERROR_ILLEGAL_INSTRUCTION.py
2018-10-06 21:06:28.880502: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.253
pciBusID: 0000:02:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2018-10-06 21:06:28.892851: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-06 21:06:29.561979: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-06 21:06:29.567773: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-10-06 21:06:29.574945: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-10-06 21:06:29.579561: I C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2048 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:02:00.0, compute capability: 5.2)
2018-10-06 21:06:29.887384: E C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:1110] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_INSTRUCTION ::
2018-10-06 21:06:29.887400: E C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2018-10-06 21:06:29.898608: F C:\users\nwani\_bazel_nwani\mmtm6wb6\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:208] Unexpected Event status: 1

The code to reproduce this is:

import tensorflow as tf

batch_size = 64
images = tf.random_normal(shape=[batch_size, 32, 32, 3], dtype=tf.float32)
angles = tf.random_uniform([batch_size], -0.5, 0.5)
images = tf.contrib.image.rotate(images, angles)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)

with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    _ = sess.run(images)

@ymodak
Contributor

ymodak commented Oct 17, 2018

@fsaxen Is this still an issue? @gogobd Can you please test it against the latest TensorFlow version and post your findings?

@ymodak ymodak added the stat:awaiting response Status - Awaiting response from author label Oct 17, 2018
@ymodak
Contributor

ymodak commented Oct 22, 2018

Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@ymodak ymodak closed this as completed Oct 22, 2018
@gogobd

gogobd commented Oct 29, 2018

I updated to 1.11.0 and the problem is gone. Thank you!

@zhangjunhust

same error!
Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS

Have you figured it out?
