could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM #5772

Closed
yetionyo opened this issue Nov 22, 2016 · 18 comments
Labels: stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:bug (Bug)

Comments

@yetionyo

The installed versions of CUDA and cuDNN meet the requirements, but cuDNN still cannot be used properly.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

Environment info

Operating System:
Linux version 3.16.0-30-generic (buildd@kissel) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #40~14.04.1-Ubuntu

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
-rw-r--r-- 1 root root 558720 Sep 15 07:02 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root 16 Sep 15 07:05 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.8.0
lrwxrwxrwx 1 root root 19 Sep 15 07:05 /usr/local/cuda/lib64/libcudart.so.8.0 -> libcudart.so.8.0.44
-rw-r--r-- 1 root root 415432 Sep 15 07:02 /usr/local/cuda/lib64/libcudart.so.8.0.44
-rw-r--r-- 1 root root 775162 Sep 15 07:02 /usr/local/cuda/lib64/libcudart_static.a
lrwxrwxrwx 1 root root 13 Nov 22 10:55 /usr/local/cuda/lib64/libcudnn.so -> libcudnn.so.5
lrwxrwxrwx 1 root root 17 Nov 22 10:55 /usr/local/cuda/lib64/libcudnn.so.5 -> libcudnn.so.5.1.5
-rw-r--r-- 1 root root 78065952 Nov 22 10:09 /usr/local/cuda/lib64/libcudnn.so.5.0.5
-rw-r--r-- 1 root root 79337624 Nov 22 10:17 /usr/local/cuda/lib64/libcudnn.so.5.1.5
-rw-r--r-- 1 root root 69756172 Nov 22 10:17 /usr/local/cuda/lib64/libcudnn_static.a
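
For anyone comparing their own setup against a listing like this one, here is a minimal sketch that pulls version numbers out of the library file names. It is not part of TensorFlow or CUDA; the helper names `parse_lib_version` and `versions_by_library` are made up for illustration, and the regular expression assumes the `lib<name>.so.<version>` naming seen above.

```python
import re

# Matches names such as "libcudnn.so.5.1.5" or "libcudart.so.8.0.44".
# Assumption: versioned libraries follow the lib<name>.so.<version> pattern.
_LIB_RE = re.compile(r"(lib\w+)\.so\.(\d+(?:\.\d+)*)$")


def parse_lib_version(path):
    """Return (library_name, version_tuple), or None if there is no version suffix."""
    match = _LIB_RE.search(path)
    if match is None:
        return None
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


def versions_by_library(paths):
    """Group every versioned file name in `paths` by library name."""
    found = {}
    for path in paths:
        parsed = parse_lib_version(path)
        if parsed is not None:
            name, version = parsed
            found.setdefault(name, set()).add(version)
    return found


if __name__ == "__main__":
    # Regular files (not symlinks) taken from the listing in this report.
    files = [
        "/usr/local/cuda/lib64/libcudart.so.8.0.44",
        "/usr/local/cuda/lib64/libcudnn.so.5.0.5",
        "/usr/local/cuda/lib64/libcudnn.so.5.1.5",
    ]
    for name, versions in sorted(versions_by_library(files).items()):
        print(name, sorted(versions))
```

Run against the files above, this makes it easy to see that two cuDNN builds (5.0.5 and 5.1.5) are installed side by side, while the `libcudnn.so` symlink resolves to 5.1.5.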

If installed from binary pip package, provide:

  1. A link to the pip package you installed:
    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
    0.11.0

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

The error occurs when calling a function that is only supported by cuDNN, for example conv2d.

@prb12
Member

prb12 commented Nov 22, 2016

@yetionyo Could you please supply a minimal repro example?
@zheng-xq Can you think of any reason why this might happen?

@prb12 prb12 added stat:awaiting tensorflower Status - Awaiting response from tensorflower stat:awaiting response Status - Awaiting response from author labels Nov 22, 2016
@yetionyo
Author

It turns out to be unrelated to conv2d itself. It may be related to the way I used conv2d, because I can run this demo without the problem:
import tensorflow as tf

my_data = tf.random_normal([20,20,20,3])
my_filter = tf.random_normal([3,3,3,10])
conv_result = tf.nn.conv2d(my_data, my_filter, strides=[1, 1, 1, 1], padding="VALID")
sess = tf.Session()
result = sess.run(conv_result)
print(result)

It is still unclear what kind of operation leads to this problem (it looks more like a failure in calling cuDNN).

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 23, 2016
@prb12
Member

prb12 commented Nov 23, 2016

Similar problem to #5476, #4909 and #4111 ?

All of these seem to mention passing an empty numpy array into TF. @zheng-xq Is there perhaps some input validation missing on cuDNN ops?

@yetionyo
Author

Yeah, these problems are similar to mine. An empty numpy array may not be the main cause in my case, but some improper ops do exist. Thanks :)

@prb12 prb12 reopened this Nov 24, 2016
@prb12
Member

prb12 commented Nov 24, 2016

I'd like to leave this open until we understand why an empty array causes a CUDA error, rather than a TensorFlow runtime InvalidArgument error status.
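
To illustrate the kind of up-front check being asked for, here is a minimal sketch in plain Python. It is not TensorFlow's actual validation code; `validate_conv2d_shapes` is a hypothetical helper, and it assumes NHWC input and a `[height, width, in_channels, out_channels]` filter, as in the demo earlier in this thread. The idea is only that a shape check before the cuDNN call could turn the opaque CUDNN_STATUS_BAD_PARAM into a readable error.

```python
def validate_conv2d_shapes(input_shape, filter_shape):
    """Raise ValueError with a readable message for shapes conv2d cannot use.

    Hypothetical helper for illustration only.
    Assumes input is [batch, height, width, in_channels] and
    filter is [filter_height, filter_width, in_channels, out_channels].
    """
    if len(input_shape) != 4 or len(filter_shape) != 4:
        raise ValueError("conv2d expects 4-D input and filter shapes")
    if 0 in input_shape:
        raise ValueError("input tensor is empty: shape=%s" % list(input_shape))
    if 0 in filter_shape:
        raise ValueError("filter tensor is empty: shape=%s" % list(filter_shape))
    if input_shape[3] != filter_shape[2]:
        raise ValueError(
            "channel mismatch: input has %d channels, filter expects %d"
            % (input_shape[3], filter_shape[2])
        )


if __name__ == "__main__":
    # Shapes from the working demo in this thread: passes silently.
    validate_conv2d_shapes([20, 20, 20, 3], [3, 3, 3, 10])

    # An empty batch is reported before any GPU call would be made.
    try:
        validate_conv2d_shapes([0, 20, 20, 3], [3, 3, 3, 10])
    except ValueError as err:
        print("rejected:", err)
```

Calling a check like this before `sess.run` would at least tell you which tensor is empty, instead of leaving you to decode the cuDNN status.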

@gibiansky
Contributor

Looks like this is still an issue on current master. It would be nice to get this fixed! The CUDA error is quite mysterious when you run into it.

@aselle aselle added type:bug Bug and removed bug labels Feb 9, 2017
@ronghanghu

This issue seems to affect TensorFlow Fold, which uses dynamic network structures and can often generate an empty tensor if a path is not used in a dynamic batch.

@tensorflowbutler
Member

It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@zheng-xq zheng-xq assigned yzhwang and unassigned zheng-xq Dec 22, 2017
@yzhwang

yzhwang commented Dec 22, 2017

There is a pull request that should handle the issue. Please check after the pull request has been approved: #15264

@tensorflowbutler
Member

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@yzhwang

yzhwang commented Jan 25, 2018

#15264 has been merged, so I believe this issue should now be fixed. Please reopen if it still occurs.

@yzhwang yzhwang closed this as completed Jan 25, 2018
@drscotthawley

drscotthawley commented Feb 2, 2018

Just found this page. I'm seeing this error with a fresh nightly tensorflow-gpu on Ubuntu. So, despite the merge, this doesn't look resolved.

@kirk86

kirk86 commented Feb 4, 2018

Same here. I get this error as well on Ubuntu with TF 1.4.1, not the nightly build.

@ppwwyyxx
Contributor

ppwwyyxx commented Feb 5, 2018

@drscotthawley you need to provide more details (logs, small repro code, etc.) for people to tell whether it's the same problem (empty tensors passed into cuDNN) or not. The fix above only adds support for empty tensors on certain ops, and very likely there are ops it does not cover.
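
As a rough picture of what "supporting empty tensors" can mean for a single op, here is a minimal sketch in plain Python. It does not reproduce what #15264 does; `conv2d_valid_output_shape`, `num_elements`, `run_conv_or_skip` and the `kernel` callable are all hypothetical names. It assumes NHWC layout and VALID padding, where each spatial output size is ceil((in - filter + 1) / stride). If the input or output has zero elements, the expensive kernel is simply never called.

```python
def conv2d_valid_output_shape(input_shape, filter_shape, strides):
    """Output shape of a VALID-padded NHWC conv2d (illustrative helper)."""
    batch, in_h, in_w, _ = input_shape
    f_h, f_w, _, out_c = filter_shape
    _, s_h, s_w, _ = strides
    # -(-a // b) is ceiling division; clamp at 0 when the filter is larger than the input.
    out_h = max(0, -(-(in_h - f_h + 1) // s_h))
    out_w = max(0, -(-(in_w - f_w + 1) // s_w))
    return [batch, out_h, out_w, out_c]


def num_elements(shape):
    """Product of all dimensions; 0 if any dimension is 0."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def run_conv_or_skip(input_shape, filter_shape, strides, kernel):
    """Call `kernel` only when there is something to compute.

    `kernel` stands in for the real GPU call and is an assumption of this sketch.
    Returns (output_shape, kernel_result_or_None).
    """
    out_shape = conv2d_valid_output_shape(input_shape, filter_shape, strides)
    if num_elements(input_shape) == 0 or num_elements(out_shape) == 0:
        return out_shape, None  # empty result, kernel never invoked
    return out_shape, kernel(input_shape, filter_shape, strides)


if __name__ == "__main__":
    fake_kernel = lambda i, f, s: "computed"
    print(run_conv_or_skip([20, 20, 20, 3], [3, 3, 3, 10], [1, 1, 1, 1], fake_kernel))
    print(run_conv_or_skip([0, 20, 20, 3], [3, 3, 3, 10], [1, 1, 1, 1], fake_kernel))
```

Any op that lacks a short-circuit of this kind would still hand the empty shape to cuDNN, which would be consistent with the error showing up only for some ops.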

@yzhwang

yzhwang commented Feb 6, 2018

@ppwwyyxx Thanks for the comment! @drscotthawley and @kirk86 , could you provide more info so that I can take a closer look?

@drscotthawley

drscotthawley commented Feb 6, 2018

@ppwwyyxx @yzhwang I had just downloaded a fresh CUDA from NVIDIA, which defaults to version 9.1, not realizing that TF didn't support that yet. I resolved this problem by downgrading to CUDA 9.0. You can close this issue again.
@kirk86, try using CUDA 9.0 instead. Also, I'm using CUDNN 7.0.5 and it's working.

Might be worth noting: I've built TF from source before, but couldn't manage to do so using CUDA 9.1. I don't recall the errors, just that downgrading to 9.0 finally enabled me to "get back to work."

@kirk86

kirk86 commented Feb 6, 2018

@drscotthawley Thanks for your answer, but in my case I can't do that. It's a shared system and I'm not an admin.
