failed call to cuInit: CUDA_ERROR_UNKNOWN in python programs using Ubuntu bumblebee #394
Comments
Updated the subject to reflect the environment you're trying to run in. Hopefully someone in the community who knows more about bumblebee/optimus laptops might be able to help!
Just to be clear, I am able to run programs built with Bazel. However, when running simple Python programs that import the tensorflow library, the GPU will not work and I am stuck using only the CPUs.
@vrv: Assigning you since it doesn't let me assign zheng-xq. Do you know why?
@jpmerc, could you run your command line through sudo, similar to your C++ examples? I wonder whether it is the root access that is making the difference. The initialization logic should be the same between the C++ and Python clients.
It doesn't seem to find the CUDA library when running under sudo:
Interesting. Could you add the path to your Cuda 7.0 runtime to LD_LIBRARY_PATH?
It is already in LD_LIBRARY_PATH. In LD_LIBRARY_PATH I have: … The library the program is looking for is there: …
Did someone find an answer to this?
It runs, but probably without GPU, except for the end when it complains:
Could you run …
Found 1 NVIDIA devices — Tue Dec 15 23:56:17 2015
Well, it was worth a shot.
@jpmerc, could you try to set LD_LIBRARY_PATH inside your sudo? That should make sudo preserve the environment variable: sudo LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.0/lib64 optirun python convolutional.py
@PeterBeukelman, to make sure you have the same problem, could you run the C++ tutorial? bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
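Why setting LD_LIBRARY_PATH inline can matter: sudo resets the environment by default, so a variable exported in your shell never reaches the sudo'd process. The same mechanism can be demonstrated without sudo or a GPU using env -i (the cuda-7.0 path is the one from this thread, used here only as an example):

```shell
# Sketch: sudo strips most environment variables (env_reset in /etc/sudoers),
# so LD_LIBRARY_PATH set in your shell does not reach the sudo'd process;
# passing it inline on the sudo command line puts it back.
# env -i shows the same stripping/passing behavior without needing sudo:
export LD_LIBRARY_PATH="/usr/local/cuda-7.0/lib64"
env -i sh -c 'echo "stripped: [$LD_LIBRARY_PATH]"'                                  # prints: stripped: []
env -i LD_LIBRARY_PATH="$LD_LIBRARY_PATH" sh -c 'echo "kept: [$LD_LIBRARY_PATH]"'   # prints: kept: [/usr/local/cuda-7.0/lib64]
```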
I think the basic mnist example worked with GPU early on, but I ran into a problem with the word2vec visualization. Somewhere down the search to fix that, I noticed two different driver versions being mentioned, 346 and 352. This made me think I had mistakenly updated the driver, so I tried to revert to 346. I purged everything, but after reinstalling I still ended up with 352. I had this before as well and dug up an old version of gcc (3.4); I recollect that using that version got the build to pass before, ending with copying the binary.
I changed to csh from bash and tried again to build bazel with gcc 3.4.6.
@zheng-xq It seems to work fine now, but it does not use my configuration (it is configured for Cuda 3.0 but tries to use 3.5).
That is an old Cuda version. Am I correct in thinking that my issues with Bazel are independent of the issues with importing tensorflow in Python resulting in the cuInit / CUDA_ERROR_UNKNOWN error?
@jpmerc, could you confirm that you ran TF_UNOFFICIAL_SETTING=1 ./configure before starting the build?
@PeterBeukelman, I think they are most likely separate issues. Note that Bazel is best supported on Ubuntu 14.04 at the moment; the default gcc there is 4.8.
Had the same problem as @jpmerc, with an Nvidia GTX 960m. And the problem was something connected with this: https://devtalk.nvidia.com/default/topic/907350/installing-cuda-7-0-but-get-cuda-7-5-/
@martinwicke, @zheng-xq: Is this obsolete now that we support 7.5?
I had the same problem, this fixed it: |
It should be fixed. I'll close this for now -- we can reopen if it's still a problem.
It's worth adding that …
sudo apt-get install nvidia-modprobe, this is magic |
sudo apt-get install nvidia-modprobe, this fixed it for me too
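A note on why this helps: nvidia-modprobe is a small setuid helper that loads the NVIDIA kernel module and creates the /dev/nvidia* device nodes on behalf of non-root processes. When those nodes are missing, cuInit can fail with CUDA_ERROR_UNKNOWN for an ordinary user even though sudo works. A quick check (a sketch; assumes a Debian/Ubuntu system):

```shell
# Sketch: check for the device nodes CUDA needs; their absence for a
# non-root user is a common cause of CUDA_ERROR_UNKNOWN while sudo works.
if ls /dev/nvidia* >/dev/null 2>&1; then
  echo "nvidia device nodes present"
else
  echo "no /dev/nvidia* nodes; try: sudo apt-get install nvidia-modprobe (then reboot)"
fi
```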
I ran into this issue recently. I upgraded my nvidia driver to version 375.26 and Docker to version 1.13.0.
The problem here is that CUDA fails to initialize the 'shared GPU context'. However, try running …
I had the same issue. |
Is it necessary to reboot after installing (executing) 'sudo apt-get install nvidia-modprobe'?
sudo apt-get install nvidia-modprobe works for me, with a restart. |
@leekyungmoon Only rebooting works for me, without installing it.
sudo is fine for me with the same issue. |
nvidia-cuda-mps-server works for me |
In Ubuntu 17.04, …
Multiple reboots later, I still get this issue.
Are you still having trouble with this issue? This is what worked for me. I had previously tried an installation of CUDA using a .run file. That installation had configured the nvidia-384 driver, and this was precisely what I saw when I ran nvidia-smi. I ran /usr/bin/nvidia-uninstall and the CUDA_ERROR_UNKNOWN went away. Further, when I run nvidia-smi I now see the expected driver version (375). In summary, if the suggestions in this thread don't work, make sure that previous installations of drivers/CUDA are not the source of the error. This is likely the case if one installation was done with a .run file while another was done via a .deb package.
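A quick way to detect the mixed-install situation described above: a driver installed from a .run file leaves /usr/bin/nvidia-uninstall behind, so its presence alongside a packaged (.deb) driver is a hint. A sketch:

```shell
# Sketch: detect a leftover .run-file driver install alongside a packaged one.
# /usr/bin/nvidia-uninstall is created only by the .run installer.
if [ -x /usr/bin/nvidia-uninstall ]; then
  echo "found a .run-file driver install (consider running /usr/bin/nvidia-uninstall)"
else
  echo "no .run-file driver install detected"
fi
```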
Everything worked fine under …
I am facing the "CUDA_ERROR_UNKNOWN" issue on a Windows Server 2012 R2 machine. Has anybody tried it on Windows Server? Please help.
@leekyungmoon after installing nvidia-modprobe, reboot works for me. |
In my case, nvidia-modprobe was installed and the paths were correct. What solved it was running the commands here: https://devtalk.nvidia.com/default/topic/760872/ubuntu-12-04-error-cudagetdevicecount-returned-30/ In particular, running the following: … Hope this helps.
Interestingly, none of these worked for me. Adding the following to .bashrc worked like a charm! export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:/usr/local/cuda-8.0/targets/x86_64-linux/lib/"
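After adding an export like the one above, it is worth confirming the directories really ended up on the loader path. A sketch, reusing the cuda-8.0 paths from that comment (adjust to your install):

```shell
# Sketch: append the cuda-8.0 directories from the comment above, then list
# the entries of LD_LIBRARY_PATH and count the ones that mention cuda-8.0.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:/usr/local/cuda-8.0/targets/x86_64-linux/lib/"
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -c 'cuda-8.0'   # counts the cuda-8.0 entries
```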
Same here, @tharuniitk. None of these worked.
@prasad3130 Thanks a lot, that worked like a charm. Although it is worth noting that the command should be the following: …
If you work on Linux/Ubuntu, check your kernel by …
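The kernel check matters because after a kernel upgrade the NVIDIA module may not yet be built for the running kernel, which can surface as CUDA_ERROR_UNKNOWN until the module is rebuilt. A sketch (dkms is only present when the driver uses DKMS):

```shell
# Sketch: compare the running kernel with the kernels the NVIDIA module is
# built for; a mismatch after a kernel upgrade is a common cause of cuInit
# failures until DKMS (or a driver reinstall) rebuilds the module.
uname -r                          # the kernel currently running
dkms status 2>/dev/null || true   # kernels the nvidia module is built for, if DKMS is present
```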
try …
I've had a problem running the darknet detector on Ubuntu in a multi-user environment. To solve the issue, I exported the CUDA_CACHE_PATH variable as: …
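The idea behind that fix: on multi-user machines a CUDA JIT compute cache owned by another user can break initialization, and pointing CUDA_CACHE_PATH at a per-user directory avoids the conflict. A sketch (the directory name is an example, not a requirement):

```shell
# Sketch: give each user a private CUDA JIT compute cache so a cache
# directory owned by another user cannot interfere with cuInit.
export CUDA_CACHE_PATH="$HOME/.cuda_cache_$(id -un)"
mkdir -p "$CUDA_CACHE_PATH"
echo "CUDA JIT cache: $CUDA_CACHE_PATH"
```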
System information
/lib/modules/4.2.0-27-generic/kernel/drivers/net/ethernet/nvidia
Describe the problem
I upgraded the nvidia driver through the command: …
ii nvidia-384 384.130-0ubuntu0.14.04.1 amd64 NVIDIA binary driver - version 384.130
The error occurred when I ran TensorFlow code as follows: …
I have tried many solutions, such as installing … Maybe it is noteworthy that … Hopefully you can help me with this issue.
In my case, the NVIDIA graphics driver had accidentally switched to another version that was not suitable. So I just reinstalled the suitable version and it works again.
What's the equivalent solution on Windows? |
I have a Quadro K1100M integrated GPU with compute capability 3.0. I had to install bumblebee to make CUDA work. I am now able to run the tutorials_example_trainer with the command sudo optirun bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu. I have been able to do that with TF_UNOFFICIAL_SETTING=1 ./configure. However, I am not able to run examples in Python directly. For example, if I run convolutional.py in tensorflow/models/image/mnist with the command optirun python convolutional.py, I get the following error: … It is like my GPU is not recognized in Python programs because of the 3.0 compute capability. Is there a way to avoid this problem?
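The steps above, gathered in one place. This is only a sketch of the recipe quoted in this thread; the flag, targets, and commands are as given in the comments and apply to the TensorFlow version discussed here, not to current releases:

```shell
# Sketch of the build/run recipe from this thread for a compute-capability-3.0
# GPU behind bumblebee. TF_UNOFFICIAL_SETTING=1 lets ./configure accept a
# compute capability below the officially supported list (enter 3.0 when asked).
TF_UNOFFICIAL_SETTING=1 ./configure
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
sudo optirun bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
```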