failed call to cuInit: CUDA_ERROR_UNKNOWN in python programs using Ubuntu bumblebee #394
vrv changed the title from "failed call to cuInit: CUDA_ERROR_UNKNOWN in python programs" to "failed call to cuInit: CUDA_ERROR_UNKNOWN in python programs using Ubuntu bumblebee" on Dec 2, 2015

vrv commented Dec 2, 2015
Updated the subject to reflect the environment you're trying to run in. Hopefully someone in the community who knows more about Bumblebee/Optimus laptops will be able to help!
jpmerc commented Dec 2, 2015
Just to be clear, I am able to run any program built with Bazel. However, when using simple Python programs that import the tensorflow library, the GPU will not work and I am stuck using only the CPUs.
girving assigned vrv on Dec 7, 2015
@vrv: Assigning you since it doesn't let me assign zheng-xq. Do you know why?
girving added the cuda label on Dec 7, 2015

@jpmerc, could you run your command line through sudo, similar to your C++ example? I wonder whether it is the root access that is making the difference. The initialization logic should be the same between the C++ and Python clients.
jpmerc commented Dec 11, 2015
It doesn't seem to find the CUDA library when run under sudo:

Interesting. Could you add the path to your CUDA 7.0 runtime to LD_LIBRARY_PATH?
jpmerc commented Dec 11, 2015
It is already in LD_LIBRARY_PATH, and the library the program is looking for is there:
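As a quick way to cross-check this kind of "library is there but not found" situation, a small Python probe can show what the system loader can actually resolve. This is a generic sketch, not from the thread; note that on Linux `ctypes.util.find_library` consults the ldconfig cache rather than `LD_LIBRARY_PATH`, which is exactly why a library visible in your shell can still be invisible system-wide:

```python
# Probe whether the system loader can resolve the CUDA libraries that
# TensorFlow needs. find_library consults the ldconfig cache (not
# LD_LIBRARY_PATH), so a None here can explain "library not found"
# failures even when LD_LIBRARY_PATH looks correct in your shell.
import os
from ctypes.util import find_library

def probe_cuda_libs(names=("cudart", "cuda", "cublas", "cudnn")):
    """Map each library name to the file the loader resolves it to,
    or None if it cannot be found."""
    return {name: find_library(name) for name in names}

if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(unset)"))
    for name, path in probe_cuda_libs().items():
        print("lib%s -> %s" % (name, path or "NOT FOUND"))
```

If the runtime shows as NOT FOUND here despite being on `LD_LIBRARY_PATH`, adding its directory to `/etc/ld.so.conf.d/` and running `sudo ldconfig` is a common remedy.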
PeterBeukelman commented Dec 15, 2015
Did someone find an answer to this? It runs, but probably without the GPU, except at the end when it complains:

Could you run
PeterBeukelman commented Dec 15, 2015
Found 1 NVIDIA devices
Tue Dec 15 23:56:17 2015
+-----------------------------------------------------------------------------+
Well, it was worth a shot.

@jpmerc, could you try to set LD_LIBRARY_PATH inside your sudo? That should make sudo preserve the environment variable:
sudo LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.0/lib64 optirun python convolutional.py
@PeterBeukelman, to make sure you have the same problem, could you run the C++ tutorial?
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
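The inline assignment matters because sudo's default policy resets most environment variables, including LD_LIBRARY_PATH, before running the command. A small sketch (no sudo or GPU required) demonstrating the same effect with an explicitly scrubbed child environment; the CUDA path used is just the one from this thread:

```python
# Demonstrate why "sudo python ..." can lose LD_LIBRARY_PATH: sudo starts
# the command with a sanitized environment, much like passing an explicit
# minimal env dict to a child process does here.
import os
import subprocess
import sys

CHILD = 'import os; print(os.environ.get("LD_LIBRARY_PATH", "unset"))'

def child_sees_ld_library_path(scrub):
    """Run a Python child process and report what it sees for LD_LIBRARY_PATH."""
    if scrub:
        env = {"PATH": "/usr/bin:/bin"}  # minimal env, like sudo's env_reset
    else:
        env = dict(os.environ, LD_LIBRARY_PATH="/usr/local/cuda-7.0/lib64")
    out = subprocess.run([sys.executable, "-c", CHILD],
                         env=env, capture_output=True, text=True)
    return out.stdout.strip()

if __name__ == "__main__":
    print("inherited env:", child_sees_ld_library_path(scrub=False))
    print("scrubbed env :", child_sees_ld_library_path(scrub=True))
```

Putting the assignment on the sudo command line itself (as suggested above) sidesteps the reset, because the variable is set after sudo has already sanitized the environment.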
PeterBeukelman commented Dec 16, 2015
I think the basic MNIST example worked with the GPU early on, but I ran into a problem with the word2vec visualization. While searching for a fix, I noticed two different driver versions being mentioned, 346 and 352. This made me think I had mistakenly updated the driver, so I tried to revert to 346. I purged everything, but after reinstalling I still ended up with 352. I had this before as well and dug up an old version of gcc (3.4); I recollect that using that version made the build pass before, though it ended with copying the binary.
PeterBeukelman commented Dec 16, 2015
I changed from bash to csh and tried again to build Bazel with gcc 3.4.6.
jpmerc commented Dec 17, 2015
@zheng-xq It seems to work fine now, but it does not use my configuration (I configured for compute capability 3.0 and it tries to use 3.5).
PeterBeukelman commented Dec 17, 2015
That is an old CUDA version. Am I correct in thinking that my issues with Bazel are independent of the issues with importing tensorflow in Python resulting in the cuInit error?

@jpmerc, could you confirm that you ran TF_UNOFFICIAL_SETTING=1 ./configure before starting the build? @PeterBeukelman, I think they are most likely separate issues. Note that Bazel is best supported on Ubuntu 14.04 at the moment; the default gcc there is 4.8.
dcunited001 referenced this issue on Dec 23, 2015: "failed call to cuInit: CUDA_ERROR_UNKNOWN after Docker build on Macbook Pro (Late 2013) with Linux" #601 (closed)
Cvikli commented Jan 25, 2016
I had the same problem as @jpmerc, with an Nvidia GTX 960M. The problem was connected with this: https://devtalk.nvidia.com/default/topic/907350/installing-cuda-7-0-but-get-cuda-7-5-/

@martinwicke, @zheng-xq: Is this obsolete now that we support 7.5?
recurse-id commented Mar 9, 2016
I had the same problem; this fixed it:

It should be fixed. I'll close this for now -- we can reopen if it's still a problem.
martinwicke closed this on Mar 10, 2016
ajwimmers commented Jun 11, 2016 (edited)
It's worth adding that
liusiye commented Jun 28, 2016
sudo apt-get install nvidia-modprobe -- this is magic.
hyy1111 commented Jul 30, 2016
sudo apt-get install nvidia-modprobe fixed it for me too.
kastnerkyle commented Aug 1, 2016
Same for me. Does anyone know why this fixes it?
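One plausible explanation (my understanding, not confirmed in this thread): cuInit needs the /dev/nvidia* device nodes, which are normally created on demand by a root process. nvidia-modprobe is a setuid helper that loads the kernel module and creates those nodes, so an unprivileged process such as a plain `python convolutional.py` can initialize CUDA without sudo. A small sketch that just reports which nodes exist on the current machine:

```python
# Report the /dev/nvidia* device nodes. Missing nodes are a common cause
# of cuInit failing with CUDA_ERROR_UNKNOWN for non-root users; the
# nvidia-modprobe helper exists to create them.
import glob

def nvidia_device_nodes():
    """Return the sorted list of /dev/nvidia* device nodes, possibly empty."""
    return sorted(glob.glob("/dev/nvidia*"))

if __name__ == "__main__":
    nodes = nvidia_device_nodes()
    if nodes:
        for node in nodes:
            print(node)
    else:
        print("No /dev/nvidia* nodes found; cuInit would likely fail here.")
```

On a working setup you would expect at least /dev/nvidia0, /dev/nvidiactl, and (for newer drivers) /dev/nvidia-uvm.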
immartian commented Aug 6, 2016
No, I still can't fix it with the nvidia-modprobe method.
chrisranderson commented Oct 27, 2016
@immartian that looks like a different issue. Are the symlinks in /usr/local/cuda/lib64 set up correctly? It seems I've hit that when I've had duplicated files instead of a symlink chain.
mattjj commented Nov 1, 2016
Installing nvidia-modprobe worked for me too.
yjliang123 commented Nov 8, 2016
@immartian I have the same problem as you. Have you fixed it?
robinschucker commented Nov 16, 2016
sudo apt-get install nvidia-modprobe fixed it for me too.
Sadrpour commented Nov 22, 2016 (edited)
This is not working for me. The GPU is available and works until I put the computer to sleep/suspend; after waking the computer I always get the message below, and the GPU is unavailable when I run code (only the CPU is available). I am using nvidia-docker, and none of the solutions above work. nvidia-smi and nvidia-debugdump -l both show that the GPU is installed and the driver is up to date.
Sadrpour referenced this issue on Nov 22, 2016: "GPU becomes unavailable after computer wakes up" #5777 (open)
nbatfai commented Dec 11, 2016
I have also met this error when the computer wakes up after hibernation. Run the 1_Utilities/deviceQuery example, which will tell you whether the CUDA card is available or not.
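If you don't have the CUDA samples built, a rough Python analogue of deviceQuery is to call cuInit through the driver API with ctypes — the exact call that fails in this issue. This is a hedged sketch, assuming the driver library is discoverable as libcuda; it degrades gracefully on machines with no NVIDIA driver:

```python
# Call cuInit() directly through the CUDA driver API to reproduce the
# check TensorFlow performs at startup. Returns a status string instead
# of raising, so it also runs on machines without a GPU.
import ctypes
import ctypes.util

def probe_cuinit():
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        libcuda = ctypes.CDLL(name)
    except OSError:
        return "driver library not found (no NVIDIA driver installed?)"
    status = libcuda.cuInit(0)  # CUresult; 0 == CUDA_SUCCESS
    if status == 0:
        count = ctypes.c_int(0)
        libcuda.cuDeviceGetCount(ctypes.byref(count))
        return "cuInit OK, %d device(s)" % count.value
    return "cuInit failed with CUresult %d" % status

if __name__ == "__main__":
    print(probe_cuinit())
```

A nonzero CUresult here after suspend/hibernate, on a machine where it previously returned 0, points at the driver/kernel-module state rather than at TensorFlow.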
patyork referenced this issue in keras-team/keras on Dec 25, 2016: "Issue when using TensorFlow backend with GPU." #4830 (closed)
juanprietob commented Feb 7, 2017
I ran into this issue recently. I upgraded my nvidia driver to version 375.26 and Docker to version 1.13.0. The problem here is that CUDA fails to initiate the 'shared GPU context'. However, try running
WeitaoVan commented Mar 2, 2017
I had the same issue.
LeeKyungMoon commented Mar 27, 2017
Is it necessary to reboot after running sudo apt-get install nvidia-modprobe?
jamesdanged commented May 9, 2017
sudo apt-get install nvidia-modprobe works for me, with a restart.
pyk commented May 14, 2017
@LeeKyungMoon A reboot alone worked for me, without installing nvidia-modprobe.
dreamsuifeng commented Jul 11, 2017
Using sudo works for me with the same issue.
liutongxuan commented Aug 9, 2017
nvidia-cuda-mps-server works for me.
poppingtonic commented Aug 30, 2017
In Ubuntu 17.04,
Multiple reboots later, I still get this issue.
poppingtonic referenced this issue on Aug 30, 2017: "failed call to cuInit: CUDA_ERROR_UNKNOWN" #7653 (closed)
brayan07 commented Oct 6, 2017 (edited)
Are you still having trouble with this issue? This is what worked for me. I had previously tried an installation of CUDA using a .run file. That installation had configured the nvidia-384 driver, and that was precisely what I saw when I ran nvidia-smi. I ran /usr/bin/nvidia-uninstall and the CUDA_ERROR_UNKNOWN went away. Furthermore, when I run nvidia-smi I now see the expected driver version (375). In summary, if the suggestions in this thread don't work, make sure that previous installations of drivers/CUDA are not the source of the error. This is likely the case if one installation was done with a .run file while another was done via a .deb package.
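To spot the mixed .run/.deb situation described above, it helps to compare the driver version the kernel module actually reports against what the package manager thinks is installed. A hedged sketch, assuming a Debian/Ubuntu system (it returns empty results elsewhere):

```python
# Compare the kernel-side driver version with dpkg's view. A version in
# /proc that no installed package accounts for suggests a leftover
# .run-file installation conflicting with a packaged driver.
import os
import subprocess

def kernel_driver_version(path="/proc/driver/nvidia/version"):
    """First line of the nvidia kernel module's version file, or None."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.readline().strip()

def dpkg_nvidia_packages():
    """Installed package names matching 'nvidia', or [] if dpkg is absent."""
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "-f", "${Package}\n", "*nvidia*"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return []
    return [line for line in out.stdout.splitlines() if line]

if __name__ == "__main__":
    print("kernel driver:", kernel_driver_version() or "not loaded")
    print("dpkg packages:", ", ".join(dpkg_nvidia_packages()) or "none")
```

If the two disagree (e.g. the kernel reports 384.x but only nvidia-375 packages are installed), cleaning up the .run installation as brayan07 did is a reasonable next step.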
Inoryy commented Oct 26, 2017 (edited)
Everything worked fine under
sheshappanavar commented Nov 17, 2017
I am facing the "CUDA_ERROR_UNKNOWN" issue on a Windows Server 2012 R2 machine. Has anybody tried this on Windows Server? Please help.
GiggleLiu commented Nov 21, 2017
@LeeKyungMoon After installing nvidia-modprobe, a reboot worked for me.
prasad3130 commented Nov 21, 2017
In my case, nvidia-modprobe was installed and the paths were correct. What solved it was running the commands from https://devtalk.nvidia.com/default/topic/760872/ubuntu-12-04-error-cudagetdevicecount-returned-30/. Hope this helps.
jpmerc commented Dec 2, 2015
I have a Quadro K1100M integrated GPU with compute capability 3.0. I had to install Bumblebee to make CUDA work. I am now able to run the tutorials_example_trainer with the command sudo optirun bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu. I have been able to do that with TF_UNOFFICIAL_SETTING=1 ./configure. However, I am not able to run the examples in Python directly. For example, if I run convolutional.py in tensorflow/models/image/mnist with the command optirun python convolutional.py, I get the following error: It is as if my GPU is not recognized in Python programs because of its 3.0 compute capability. Is there a way to avoid this problem?
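For anyone debugging this class of problem from inside Python, slightly later TensorFlow versions than the one in this thread expose a device listing that shows directly whether a GPU was registered after cuInit. A hedged sketch using that 1.x-era internal API (it degrades gracefully when tensorflow is missing or incompatible):

```python
# List the devices TensorFlow registered. If cuInit failed with
# CUDA_ERROR_UNKNOWN, only CPU devices will appear here even though
# nvidia-smi sees the GPU.
def list_tf_devices():
    try:
        from tensorflow.python.client import device_lib
    except Exception as e:  # tensorflow missing or API changed
        return ["tensorflow unavailable: %s" % e]
    return [d.name for d in device_lib.list_local_devices()]

if __name__ == "__main__":
    for name in list_tf_devices():
        print(name)
```

On a working Bumblebee setup you would run this under optirun and expect a GPU device alongside the CPU device; seeing only the CPU reproduces jpmerc's symptom.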