Tensorflow 2.2 and 2.3 not detecting GPU with CUDA 10.1 #43236
Comments
TensorFlow v2.3 is compatible with CUDA 10.1 and cuDNN 7.6. For more information, please take a look at the tested build configurations. The CUDA version mismatch question is explained in this StackOverflow comment. Can you paste the output of nvidia-smi? |
Output of NVIDIA-SMI: Driver Version: 450.1.06 nvcc --version I followed all the steps in the tensorflow gpu guide; the only extra thing I did was install cuda-toolkit with 'sudo apt-get install cuda-toolkit'. What am I doing wrong? |
@javedsha The following is a procedure I use for Ubuntu 18.04, confirmed to work with the Ubuntu-shipped python 3.6. Hope it helps to pinpoint your issue. In your case, the trouble possibly started with the Btw, CUDA version that is reported by the nvidia-smi is not necessarily the CUDA version that Tensorflow picks up (longer story), but with my installation procedure it should report 10.1.
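To see which toolkit version is actually installed (as opposed to the version nvidia-smi prints, which reflects the driver), one option is to read the release number from `nvcc --version`. This is only a rough sketch: the helper names `parse_nvcc_release` and `toolkit_version` are made up for illustration, and it assumes nvcc prints a line of the form "release 10.1, V10.1.243".

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkit_version() -> Optional[str]:
    """Return the toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit (nvcc):", toolkit_version())
```

If this prints a version other than 10.1, or None, the toolkit that TensorFlow 2.3 expects is probably not the one on your PATH.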
|
@ahtik is the local thing same as the stable version? I will give this a try today and will post here. Thank you. |
Yes, it's the latest 10.1 cuda, just makes the deployment a bit easier for
my use case.
|
I have the same problem with libcublas.so.10. Same OS, same python version, tf 2.3 etc. The only differences are that I didn't use venv and that I have a different GPU. My solution was to install cuda 10.2, even though it contradicts the guide. The GPU is working in tensorflow now. I took cuda-10.2 from the nvidia website as a deb package. Also, the guide itself (https://www.tensorflow.org/install/gpu) does not seem to be well written. It tells you to install CUPTI when there is no way to install it separately. It reads: "Install CUPTI which ships with the CUDA® Toolkit. Append its installation directory to the $LD_LIBRARY_PATH environmental variable:" when IMHO it should read: "Install CUDA Toolkit. You will have CUPTI library installed. Append its installation directory to the $LD_LIBRARY_PATH environmental variable:" And I still don't get why the section on CUPTI comes before the section on cuda installation on Ubuntu. I hope my feedback will be useful. |
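For anyone unsure whether the CUPTI directory actually ended up on `LD_LIBRARY_PATH`, here is a small check. It is a sketch under assumptions: `CUPTI_DIR` is the usual location for a `/usr/local/cuda` install and may differ on your machine, and `on_library_path` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

# Assumed default CUPTI location for a /usr/local/cuda install; adjust to your setup.
CUPTI_DIR = "/usr/local/cuda/extras/CUPTI/lib64"


def on_library_path(directory: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if `directory` is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    target = os.path.normpath(directory)
    return any(os.path.normpath(e) == target for e in entries)


if __name__ == "__main__":
    print(CUPTI_DIR, "on LD_LIBRARY_PATH:", on_library_path(CUPTI_DIR))
```

This only inspects the variable. The dynamic loader reads `LD_LIBRARY_PATH` when the process starts, so the export has to happen in the shell before launching Python.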
@Zapunidi Indeed, the official guide for Ubuntu doesn't seem to work for me either (all other libs load fine, getting one Warning):
Almost like something in the NVIDIA machine-learning repo still manages to force an upgrade from 10.1.. I do not have this warning when using the local installation method I posted previously. For cudnn and tensorrt/libnvinfer I have a separate tensorrt-cuda10.1 setup. |
So the issue affects everyone. It would make sense to update the tensorflow documentation, since its steps do not work. |
I can't see the difference between CUDA and CUDA Toolkit. Even https://www.tensorflow.org/install/gpu juggles these two terms, like "The following NVIDIA® software must be installed on your system: ... CUDA® Toolkit —TensorFlow supports CUDA® 10.1 (TensorFlow >= 2.1.0)..." Then for the Linux setup the manual just mentions "CUDA" without "Toolkit".
So it was not a clean install. I do not have a spare machine with a supported GPU to run a clean test for you guys. I also don't think that GPU virtualization is mature enough to use a virtual machine on my primary PC. |
@Zapunidi Yes, CUDA and CUDA Toolkit are 100% the same. This reboot in the middle does not matter, as long as you still reboot after the last step. One thing that might work is to run the "local" installation method on top of everything you already have, like this, and see what happens after a reboot (taken from my comment above):
If this fails and you are still curious, you can try my instructions in #43236 (comment) using the "local" repo installation method; this way the CUDA version remains 10.1 and TF works fine. Just make sure not to reboot before the end, and ensure that the most recent nvidia-driver-418 suitable for your GPU is used (the same one that you currently have). This does involve some risk on a primary PC; I just don't know a better way to quickly clean up everything cuda-related without removing the nvidia driver at the same time. |
The problem is that libcublas seems to be missing when installing cuda-10.1 via apt. A workaround seems to be installing cuda-10.1 via the runfile. I encountered another error during the installation process, but maybe it works for you. Check this thread for more details: https://forums.developer.nvidia.com/t/cublas-for-10-1-is-missing/71015/18 |
@bnsblue If using the local installer method that I detailed in my comment above then [1] It has evolved a bit for our use and does not include the libcudnn7 and tensorrt bits, but this should still work as well. For the nvidia drivers I'm using v455. If you're interested, I can provide the full instructions that I'm using. |
@ravikyram why is this waiting for author response? The steps in the documentation don't work. |
@ahtik Thanks for the response! It would be awesome if you could share the full instructions :) |
@ahtik I also tried your "local install" and |
Must have gotten something wrong the first time, the libraries show up now. But
Then I will continue to try and get this working. |
@johntmyers Did you make sure to restart the machine after all the driver and CUDA installation steps? This error is usually from not restarting. |
@ahtik Yes, same issue. FWIW I'm using 18.04 on Google Compute Engine, so I'm not sure if something there is not working properly. |
Any updates on the issue please. Thanks! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you. |
@ahtik thanks for your setup above, I would also be grateful if you shared your TensorRT instructions! |
Closing as stale. Please reopen if you'd like to work on this further. |
System information
I have exactly followed the instruction dated Sep1-16. I am encountering the following error: ioz@ioz-B250M-DS3H:
ioz@ioz-B250M-DS3H:~$ python3.6
Can you please help me with your thoughts. Thanks |
I thank @ahtik for his post. I would like to contribute my own recipe, which I derived from @ahtik's. Some comments down below.
My server GPU model: GeForce GTX 1070
Check if the GPU is working with:
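The exact command from this comment did not survive, so what follows is not necessarily what the poster ran. A common way to check from Python, assuming TensorFlow 2.1 or later, is `tf.config.list_physical_devices("GPU")`; the wrapper name `visible_gpus` is just for illustration.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow loaded, but no GPU was detected")
```

An empty list together with "Could not load dynamic library" warnings in the log usually points to a missing or mismatched CUDA library rather than a driver problem.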
|
System information
Describe the problem
After installing tensorflow, the GPU is not detected and I get the error: 'Cannot open dynamic library libcublas.so.10'.
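To narrow down which libraries the loader cannot find, it can help to try loading them directly, outside of tensorflow. A minimal sketch, with assumptions stated: the names in `CUDA_LIBS` are the ones that typically show up in TF 2.3 startup logs (not an official list), and `can_load` and `missing_libraries` are hypothetical helper names.

```python
import ctypes
from typing import Iterable, List

# Library names as they typically appear in TensorFlow 2.3 startup logs (assumed list).
CUDA_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def can_load(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def missing_libraries(names: Iterable[str]) -> List[str]:
    """Return the subset of `names` that the dynamic loader cannot open."""
    return [name for name in names if not can_load(name)]


if __name__ == "__main__":
    print("Not loadable:", missing_libraries(CUDA_LIBS))
```

Anything printed here is not reachable through the loader's search path (`LD_LIBRARY_PATH`, the ldconfig cache, or the default directories) for the current process.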
Provide the exact sequence of commands / steps that you executed before running into the problem
How I fixed the problem:
I started with a clean VM on Azure with nothing installed. Then followed the tensorflow guides (above) to install NVIDIA-Driver, CUDA 10.1, cuDNN, cuda-toolkit and tensorflow.
After all these steps, my local folder had two cuda folders (don't know why):
/usr/local/cuda-10.1/lib64/
/usr/local/cuda-10.2/lib64/
The error I was getting was for the dynamic library 'libcublas.so.10'. This file was not present in the 'cuda-10.1' folder; instead it was present in 'cuda-10.2' (note that I have installed everything in a venv).
I had to manually copy all the files (including the files inside the 'stubs' folder) from 'cuda-10.2' into 'cuda-10.1'. Then it worked.
This site also mentions this issue, where they say that with CUDA 10.1, some of the libraries are installed differently - https://forums.developer.nvidia.com/t/cublas-for-10-1-is-missing/71015/4 (the steps there apply when you install the libraries at system level and not in a venv).
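Before copying anything, it may be worth checking where the file actually landed. The sketch below searches every `cuda*` folder under `/usr/local`; the helper name `find_library` and the default root are assumptions for illustration, not part of any official tooling.

```python
import glob
import os
from typing import List


def find_library(pattern: str, root: str = "/usr/local") -> List[str]:
    """List files matching `pattern` anywhere under <root>/cuda*/ (searched recursively)."""
    hits = glob.glob(os.path.join(root, "cuda*", "**", pattern), recursive=True)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_library("libcublas.so.10*"):
        print(path)
```

If the library only shows up under a different `cuda-*` folder, adding that folder's `lib64` to `LD_LIBRARY_PATH` might be a less invasive alternative to copying files between toolkit directories, though I have not verified that on this exact setup.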
Expected Behaviour:
Either tensorflow should automatically refer to the missing dynamic libraries or mention how to fix this in Install Set up.
Note: The errors are similar when you install CUDA 10.2; only the dynamic library versions are different.