failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED #9489
Comments
Check out #5354, #3600, #4196.
I have the same problem, but the advice did not solve it for me. BTW, CUDA 8.0 works fine on my computer.
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
This seems to be an out-of-memory condition being masked by the CUBLAS_STATUS_NOT_INITIALIZED error. When GPU memory is low and I ask for a new session for detection, I hit this error; when I clear the GPU of other processes and free memory, I do not get the cublas error. Since TensorFlow by default grabs nearly all available GPU memory, I imagine this could happen a lot.
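If the out-of-memory explanation above is right, one common workaround is to keep TensorFlow from pre-allocating the whole GPU. A minimal TF 1.x-era sketch along the lines of the allow_growth snippet quoted further down this thread (the 0.5 fraction is just an illustrative value):

```python
# Sketch: stop TensorFlow 1.x from grabbing all GPU memory up front,
# which can otherwise surface as CUBLAS_STATUS_NOT_INITIALIZED.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate only as needed
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # illustrative hard cap

sess = tf.Session(config=config)
```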
This didn't work for me. I have checked my whole installation, made sure my LD_LIBRARY_PATH is what it should be, and kept my memory usage to a minimum. Nothing seems to be working.
python tf_p11_RNN.py
2017-12-30 21:18:15.192881: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
I am getting the same error, but when I run as sudo there is no error. Can someone tell me why?
Full log:
Caused by op u'rnn/rnn/basic_lstm_cell/MatMul', defined at:
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(128, 156), b.shape=(156, 512), m=128, n=512, k=156
I fixed this by running as root.
As @jeremy-rutman said, this is likely an out-of-memory issue. When I get this error, I can fix it every time by unplugging my external monitor. This frees up GPU memory immediately, and I'm then able to run predictions with TF. Another fix that seems to work is closing my million Chrome tabs. Note that running as root did not fix it for me.
@DhashS It worked for me. Thanks so much!
In my case it was caused by running it under Windows with
I solved this problem by cleaning my device memory.
I 👍 the out-of-memory explanation! Same for me, cleaning up worked!
In my case it was not a memory error; I fixed it by removing the cache folder.
Thanks @stbnps!
Thanks @stbnps! This has been frustrating me for a while and I could not find a way to run as a user. Removing the .nv folder fixed this.
@DhashS Thank you so much! Running as root solved my problem too!
I have tried 'rm ~/.nv' and 'config = tf.ConfigProto(); config.gpu_options.allow_growth = True', but neither worked.
+1 - worked on Linux
Thanks @stbnps. Your solution worked like a charm.
Heads up, if you're facing this problem in a conda env, make sure to pin the conda package
@ankitvgupta Didn't work for me with
Got it, thanks for the heads up. Yeah, I tried it on a K80 (I can never remember if that's considered an RTX or not) on AWS (built off their deep learning base AMI on a p2), in case that's useful for repro. I suspect it's also pretty specific to whatever random other packages are being installed in my dockerfile and conda env, but figured I'd share just in case others were facing it.
@ankitvgupta I'm trying to recompile from source with CUDA 10.1.243 right now for my RTX card, will let you know. K80 is not RTX, so maybe that's why it's working for you.
Ok somehow I can't install
Nothing changes when compiling TF from source on CUDA
Which works if you compile TF 2.2 from source with CUDA 10.2 (it doesn't work with CUDA 10.1.x).
@EKami I'm a bit confused. Which part of your code block relies on CUDA 10.2? Is it specifically the memory growth that is bugged on older versions of CUDA, or is it something else?
This method also works for me. My training env:
TF 2.3 resolves the issue indeed, and there's no need for the
No it didn't.
When you compile TF from source, you can pick whatever version of CUDA you want.
Just an FYI: solved this issue on TF 2.2.0 / CUDA 10.1 by removing the existing libcublas and installing libcublas 10.2.1.243-1 instead.
In my case, I limit the GPU memory by adding these lines at the top of my code (tensorflow==2.4.0): config = tf.compat.v1.ConfigProto()
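The snippet above is cut off. For reference, TF 2.x also exposes a native way to cap GPU memory without the compat-v1 config; a minimal sketch, assuming TF 2.4 and an arbitrary 2048 MB limit:

```python
# Sketch: cap TensorFlow 2.x GPU memory with a logical-device limit.
# Must run before the GPU is initialized; 2048 MB is an example value.
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```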
Thanks @ngotra2710. I was able to bypass the problem with your solution (I'm testing this model). Anyhow it is strange, because it complains that it cannot allocate enough memory.
But the card should have enough.
@sanjoy This OOM could be the same as the other issue Scott mentioned.
I'm on cuda-11-0 and tensorflow 2.4 and got the same error.
Got my code running; the actual error was that I had
It's a version-mismatch problem; I hit it too. Now solved as follows:
libcudnn:
pytorch: # do not install torch 1.8.0
In my case with rtx3070 & tensorflow 2.4.1:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
Got the same error on TF 2.5 / CUDA 11.3.
@AxelBohm I got the same error and your code works for me, though I have no idea what it means.
Another way to set auto-growth is:
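The comment above is truncated; assuming it refers to the environment-variable route, auto-growth can also be enabled without code changes via TF_FORCE_GPU_ALLOW_GROWTH:

```python
# Sketch: enable GPU memory auto-growth through an environment variable.
# It must be set before TensorFlow initializes the GPU.
import os
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # imported after the variable is set
```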
Maybe this helps someone:
It's always best to try switching from CUDA to CPU to examine the error message in more detail.
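One simple way to do that is to hide the GPU for a debugging run so TensorFlow falls back to the CPU; a minimal sketch:

```python
# Sketch: hide the GPU so TensorFlow runs on CPU and reports the underlying
# error instead of a failed cuBLAS handle.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # set before TensorFlow touches the GPU

import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))  # should now print []
```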
System information
CUDA/cuDNN version: 8.0
GPU model and memory:
Describe the problem
If I change the order of device usage, it reports an error.
Source code / logs