error using vl_nnconv, cuDNN error, bug with Turing GPU #1206

Closed
duancaohui opened this issue Mar 10, 2019 · 12 comments

Comments

@duancaohui

duancaohui commented Mar 10, 2019

Recently I got a new computer with a Turing GPU (RTX 2080 Ti), so I set up CUDA, cuDNN, and MatConvNet on it. I followed the installation guide at http://www.vlfeat.org/matconvnet/install/, and everything seemed to go well:

System
Windows 10
CUDA 10
cudnn-10.0-windows10-x64-v7.4.2.24
MATLAB R2018a
matconvnet-1.0-beta25

However, when I train my model using trainFn, an error occurs that seems to come from cuDNN:

vl_nnconv
vl::impl::dispatch_cudnn<C, CU>::operator(): ConvolutionForwardCudnn<dataType>::operator(): cuDNN error [cudnn:"\\matconvnet-1.0-beta25\\matlab\\src\\bits\\nnconv_cudnn.cu":209
(CUDNN_STATUS_EXECUTION_FAILED)]

This seems to be because MATLAB does not natively support Turing yet, so issues like this may occur. Some answers describe how to resolve it:
[1]https://ww2.mathworks.cn/matlabcentral/answers/439616-does-matlab-2018b-support-nvidia-geforce-2080-ti-rtx-for-creating-training-implementing-deep-learnin
[2]https://ww2.mathworks.cn/matlabcentral/answers/432027-matlab-cuda-10

This is a known bug with Turing GPUs and MatConvNet that can be worked around by running a simple function and ignoring the error:

try
    nnet.internal.cnngpu.reluForward(1);
catch ME
end

However, this workaround only resolves the error in my test code, not in my training. I added this simple function to my trainFn, and the error still occurs!

@duancaohui duancaohui changed the title from "error using vl_nnconv, cuDNN error" to "error using vl_nnconv, cuDNN error, bug with Turing GPU" on Mar 10, 2019
@whisperrrr

Hey, the same error occurs when I use vl_nnconv with the GPU. But the URL you posted for solving this error isn't available right now.

@duancaohui
Author

duancaohui commented Mar 22, 2019

The URL is available; you can copy it and open it in your browser:

@Free-Cloud

My GPU is an RTX 2070, and I fixed this error by using CUDA 9.0 and updating it to Patch 4.

@MumuChenGunGun

Has anyone fixed this error?

@whisperrrr

The URL is available; you can copy it and open it in your browser:

Thanks. I replaced CUDA 10.1 with CUDA 9.2, and it works well.

@yuanlong-o

Hi, I have an RTX 2080 with CUDA 9.2, but I still get the vl_nnconv error. Could you please share your driver information?
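
For anyone comparing setups, the driver and toolkit information requested here can be read from within MATLAB. The following is a minimal sketch, assuming the Parallel Computing Toolbox is installed and a CUDA-capable GPU is selected; the exact set of properties reported by `gpuDevice` may differ between MATLAB releases.

```matlab
% Query the currently selected GPU (requires Parallel Computing Toolbox).
g = gpuDevice;

% Print the fields most relevant to this issue.
% Turing cards such as the RTX 2070/2080/2080 Ti report compute capability 7.5.
fprintf('MATLAB release:     %s\n', version('-release'));
fprintf('GPU name:           %s\n', g.Name);
fprintf('Compute capability: %s\n', g.ComputeCapability);
fprintf('Driver version:     %s\n', num2str(g.DriverVersion));
fprintf('Toolkit version:    %s\n', num2str(g.ToolkitVersion));

% Show the full device record for completeness.
disp(g);
```

Posting this output alongside the error message makes it easier to tell which CUDA and driver combinations work.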

@whisperrrr

whisperrrr commented Oct 28, 2019 via email

@duancaohui
Author

Hi, I have fixed this error with CUDA 10.0 and an RTX 2080 Ti. Please add the following code before your training or testing:

try
    test1 = vl_nnconv(gpuArray(zeros(6,6,1,2)), gpuArray(ones(3,3,1,2)), gpuArray(ones(2,1)), 'CuDNN');
catch ME
end

try
    test1 = vl_nnbnorm(gpuArray(zeros(6,6,2,2)), gpuArray(ones(2,1)), gpuArray(ones(2,1)));
catch ME
end

try
    test1 = vl_nnpool(gpuArray(zeros(6,6,1,2)), 2, 'pad', 0, 'stride', 2, 'method', 'max');
catch ME
end

try
    test1 = vl_nnconvt(gpuArray(zeros(6,6,16,2)), gpuArray(ones(3,3,16,16)), gpuArray(ones(16,1)), 'crop', [0,1,0,1], 'upsample', 2, 'numGroups', 1, 'CuDNN');
catch ME
end

The code just ignores all the errors, and after that everything is OK!
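
The same warm-up idea can be wrapped in a loop so that each ignored error is at least logged. This is only a sketch, assuming MatConvNet is compiled with GPU and cuDNN support and is on the MATLAB path. The function name `warmUpMatConvNet` is hypothetical and not part of MatConvNet; the four calls inside are the ones from the comment above.

```matlab
function warmUpMatConvNet()
% warmUpMatConvNet  Hypothetical helper, not part of MatConvNet.
% Runs each GPU layer function once and ignores (but logs) any error,
% mirroring the try/catch workaround described in this thread.

% Warm-up calls copied from the comment above, wrapped as function handles.
calls = { ...
    @() vl_nnconv(gpuArray(zeros(6,6,1,2)), gpuArray(ones(3,3,1,2)), ...
                  gpuArray(ones(2,1)), 'CuDNN'), ...
    @() vl_nnbnorm(gpuArray(zeros(6,6,2,2)), gpuArray(ones(2,1)), ...
                   gpuArray(ones(2,1))), ...
    @() vl_nnpool(gpuArray(zeros(6,6,1,2)), 2, ...
                  'pad', 0, 'stride', 2, 'method', 'max'), ...
    @() vl_nnconvt(gpuArray(zeros(6,6,16,2)), gpuArray(ones(3,3,16,16)), ...
                   gpuArray(ones(16,1)), 'crop', [0,1,0,1], ...
                   'upsample', 2, 'numGroups', 1, 'CuDNN') ...
    };

for k = 1:numel(calls)
    try
        f = calls{k};
        f();   % the result is discarded; only the side effect matters
    catch ME
        % The first call of each kind may fail on a Turing GPU; log and continue.
        fprintf('Warm-up call %d failed (ignored): %s\n', k, ME.message);
    end
end
end
```

Calling `warmUpMatConvNet()` once before training or testing would reproduce the workaround, and the logged messages show which calls actually failed.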

@AileenSengupta

My GPU is an RTX 2070, and I fixed this error by using CUDA 9.0 and updating it to Patch 4.

Can you please help me with the MATLAB code to get rid of this error? I am using CUDA 10.

@AileenSengupta

Hi, I have fixed this error with CUDA 10.0 and an RTX 2080 Ti. Please add the following code before your training or testing:

try
    test1 = vl_nnconv(gpuArray(zeros(6,6,1,2)), gpuArray(ones(3,3,1,2)), gpuArray(ones(2,1)), 'CuDNN');
catch ME
end

try
    test1 = vl_nnbnorm(gpuArray(zeros(6,6,2,2)), gpuArray(ones(2,1)), gpuArray(ones(2,1)));
catch ME
end

try
    test1 = vl_nnpool(gpuArray(zeros(6,6,1,2)), 2, 'pad', 0, 'stride', 2, 'method', 'max');
catch ME
end

try
    test1 = vl_nnconvt(gpuArray(zeros(6,6,16,2)), gpuArray(ones(3,3,16,16)), gpuArray(ones(16,1)), 'crop', [0,1,0,1], 'upsample', 2, 'numGroups', 1, 'CuDNN');
catch ME
end

The code just ignores all the errors, and after that everything is OK!

Hi, I am still struggling with this error in MATLAB:

Error using DAGNetwork/classify (line 193)
Failed to initialize the cuDNN handle. Return code was CUDNN_STATUS_NOT_INITIALIZED.

I am using a GeForce GTX 1080 Ti and CUDA 10.0, but after I tried the code above to suppress the exceptions, I still get the same error. Any help is appreciated.
