error using vl_nnconv, cuDNN error, bug with Turing GPU #1206

duancaohui · 2019-03-10T09:19:15Z

Recently, I got a new computer with a Turing GPU (RTX 2080 Ti), so I set up CUDA, cuDNN, and MatConvNet on it. I followed the installation guide at http://www.vlfeat.org/matconvnet/install/, and everything seemed to go well:

System
Windows 10
CUDA 10
cudnn-10.0-windows10-x64-v7.4.2.24
MATLAB R2018a
matconvnet-1.0-beta25

However, when I train my model using trainFn, an error occurs that seems to come from cuDNN:

vl_nnconv
vl::impl::dispatch_cudnn<C, CU>::operator(): ConvolutionForwardCudnn<dataType>::operator(): cuDNN error [cudnn:"\\matconvnet-1.0-beta25\\matlab\\src\\bits\\nnconv_cudnn.cu":209
(CUDNN_STATUS_EXECUTION_FAILED)]

This is because MATLAB does not natively support Turing yet, so issues like this can occur. Some answers that address it:
[1]https://ww2.mathworks.cn/matlabcentral/answers/439616-does-matlab-2018b-support-nvidia-geforce-2080-ti-rtx-for-creating-training-implementing-deep-learnin
[2]https://ww2.mathworks.cn/matlabcentral/answers/432027-matlab-cuda-10

This is a known bug with Turing GPUs and MatConvNet, which can be worked around by running a simple function and ignoring the error:

try
    nnet.internal.cnngpu.reluForward(1);
catch ME
end

However, this method only resolves the error in my tests; it does not resolve it during training. I added this simple function to my trainFn, and the error still occurred!

whisperrrr · 2019-03-22T02:49:59Z

Hey, the same error occurred when I used vl_nnconv with the GPU. But the URL you posted to solve this error isn't available right now.

duancaohui · 2019-03-22T02:54:42Z

The URL is available; you can copy it and open it in your browser:

Free-Cloud · 2019-03-23T18:56:18Z

My GPU is an RTX 2070, and I fixed this error by using CUDA 9.0 and updating it to Patch 4.

MumuChenGunGun · 2019-04-02T06:28:28Z

Has anyone fixed this error?

whisperrrr · 2019-04-15T08:22:24Z

> The URL is available; you can copy it and open it in your browser:

Thanks. I replaced CUDA 10.1 with CUDA 9.2, and it works well.

yuanlong-o · 2019-10-26T23:21:17Z

Hi, I have an RTX 2080 with CUDA 9.2, but I still get the vl_nnconv error. Could you please share your driver information?
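
For anyone comparing setups, something like the following prints the GPU details that MATLAB itself sees. This is only a minimal sketch, assuming the Parallel Computing Toolbox is installed; the output format is just illustrative.

```matlab
% Print the GPU and CUDA versions as MATLAB reports them.
g = gpuDevice();   % the currently selected GPU
fprintf('GPU: %s (compute capability %s)\n', g.Name, g.ComputeCapability);
fprintf('CUDA driver version:  %g\n', g.DriverVersion);
fprintf('CUDA toolkit version: %g\n', g.ToolkitVersion);
```

These are the CUDA versions as MATLAB reports them; the NVIDIA driver release number itself is shown by `nvidia-smi`.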

whisperrrr · 2019-10-28T11:37:22Z

CPU: Intel(R) Core(TM) i9-7920 CPU @ 2.90 GHz
GPU: GeForce RTX 2080 Ti

duancaohui · 2019-10-28T11:44:33Z

Hi, I have fixed this error with CUDA 10.0 and an RTX 2080 Ti. Please add the following code before your training or testing:

```matlab
% Call each MatConvNet GPU layer once and discard any error it raises.
try
    test1 = vl_nnconv(gpuArray(zeros(6,6,1,2)), gpuArray(ones(3,3,1,2)), gpuArray(ones(2,1)), 'CuDNN');
catch ME
end

try
    test1 = vl_nnbnorm(gpuArray(zeros(6,6,2,2)), gpuArray(ones(2,1)), gpuArray(ones(2,1)));
catch ME
end

try
    test1 = vl_nnpool(gpuArray(zeros(6,6,1,2)), 2, 'pad', 0, 'stride', 2, 'method', 'max');
catch ME
end

try
    test1 = vl_nnconvt(gpuArray(zeros(6,6,16,2)), gpuArray(ones(3,3,16,16)), gpuArray(ones(16,1)), 'crop', [0,1,0,1], 'upsample', 2, 'numGroups', 1, 'CuDNN');
catch ME
end
```

The code just ignores any errors raised by these first calls, and after that everything is OK!
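
A variant of the snippet above, as a rough sketch: it makes the same four calls but prints each error it ignores, so you can see what was actually thrown. `warmUp` is a made-up helper name, not part of MatConvNet, and a local function inside a script needs MATLAB R2016b or later (otherwise put it in its own file).

```matlab
% Same warm-up calls as above, but every ignored error is printed.
x1  = gpuArray(zeros(6,6,1,2));
x2  = gpuArray(zeros(6,6,2,2));
x16 = gpuArray(zeros(6,6,16,2));

warmUp('vl_nnconv',  @() vl_nnconv(x1, gpuArray(ones(3,3,1,2)), gpuArray(ones(2,1)), 'CuDNN'));
warmUp('vl_nnbnorm', @() vl_nnbnorm(x2, gpuArray(ones(2,1)), gpuArray(ones(2,1))));
warmUp('vl_nnpool',  @() vl_nnpool(x1, 2, 'pad', 0, 'stride', 2, 'method', 'max'));
warmUp('vl_nnconvt', @() vl_nnconvt(x16, gpuArray(ones(3,3,16,16)), gpuArray(ones(16,1)), ...
    'crop', [0,1,0,1], 'upsample', 2, 'numGroups', 1, 'CuDNN'));

function warmUp(name, fn)
% warmUp  (hypothetical helper) Run fn once; report but do not rethrow errors.
    try
        fn();
    catch ME
        fprintf('%s warm-up raised (ignored): %s\n', name, ME.message);
    end
end
```

If a call keeps failing on later runs too, the printed message should at least show whether it is still the same cuDNN error.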

AileenSengupta · 2022-04-03T06:53:15Z

> My GPU is an RTX 2070, and I fixed this error by using CUDA 9.0 and updating it to Patch 4.

Can you please help me with the MATLAB code to get rid of this error? I am using CUDA 10.

AileenSengupta · 2022-04-03T06:59:14Z

> Hi, I have fixed this error in the condition of CUDA 10.0 and RTX 2080 Ti, please add the following code before your training or testing:
>
> ```matlab
> try
>     test1 = vl_nnconv(gpuArray(zeros(6,6,1,2)), gpuArray(ones(3,3,1,2)), gpuArray(ones(2,1)),'CuDNN');
> catch ME
> end
>
> try
>     test1 = vl_nnbnorm(gpuArray(zeros(6,6,2,2)),gpuArray(ones(2,1)),gpuArray(ones(2,1)));
> catch ME
> end
>
> try
>     test1 = vl_nnpool(gpuArray(zeros(6,6,1,2)),2,'pad',0,'stride',2,'method','max');
> catch ME
> end
>
> try
>     test1 = vl_nnconvt(gpuArray(zeros(6,6,16,2)), gpuArray(ones(3,3,16,16)),gpuArray(ones(16,1)), 'crop', [0,1,0,1], 'upsample',2, 'numGroups', 1, 'CuDNN');
> catch ME
> end
> ```
>
> the code is mean that just ignore all the errors, and then all is ok!

Hi, I am still struggling with this error in MATLAB:

Error using DAGNetwork/classify (line 193)
Failed to initialize the cuDNN handle. Return code was CUDNN_STATUS_NOT_INITIALIZED.

I am using a GeForce GTX 1080 Ti and CUDA 10.0, but after I tried to remove the exceptions, I still get the same error. Any help is appreciated.

duancaohui changed the title ~~error using vl_nnconv, cuDNN error~~ error using vl_nnconv, cuDNN error, bug with Turing GPU Mar 10, 2019

duancaohui closed this as completed Oct 28, 2019
