Using tensorflow gpu 2.1 with Cuda 10.2 #34759
As printed in the stack trace |
Thanks for your answer. |
You are able to import the tf CPU version in tf 2.0. Check with:
import tensorflow as tf
tf.test.is_gpu_available() |
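For a slightly more detailed check than the one-liner above, here is a minimal sketch. It assumes TensorFlow 2.x may or may not be installed in the current environment, and `gpu_report` is a hypothetical helper name, not part of TensorFlow. It reports whether the installed build has CUDA support and which GPUs TensorFlow can actually see.

```python
def gpu_report():
    """Return a small dict describing TensorFlow's view of the GPU.

    `gpu_report` is a hypothetical helper, not part of TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed (or cannot be imported) here.
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}

    # TF 2.1+ exposes list_physical_devices directly under tf.config;
    # TF 2.0 only has it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in list_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty `gpus` list together with `built_with_cuda` being true would suggest that the build supports CUDA but the runtime libraries could not be loaded, which is the situation discussed in this thread.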
No, I specifically installed tensorflow-gpu and didn't get the warning at import. Beyond simply using this function to test, I monitored the GPU usage during training (and my training was way faster than when I was masking the GPU). |
When I look at the logs for tf 2.0.0, I see that it's loading packages from the 10.0 version of Cuda. These packages are certainly legacy packages that I still have, and so that's why it's working even though my main Cuda is 10.2. |
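The observation above can be checked without TensorFlow. The sketch below assumes a Linux system where the CUDA runtime is installed as `libcudart.so.<version>`, and tries to load each versioned library through the dynamic loader, which is the same lookup the prebuilt wheels rely on. `loadable_cudart` is a hypothetical helper name and the version list is only an illustration.

```python
import ctypes


def loadable_cudart(versions=("10.0", "10.1", "10.2")):
    """Map each CUDA runtime version to whether libcudart can be loaded.

    Hypothetical helper: it only asks the dynamic loader, so the result
    depends on LD_LIBRARY_PATH and the ldconfig cache of this machine.
    """
    result = {}
    for version in versions:
        soname = "libcudart.so." + version
        try:
            ctypes.CDLL(soname)
            result[version] = True
        except OSError:
            # The loader could not find or open this version.
            result[version] = False
    return result


if __name__ == "__main__":
    for version, found in loadable_cudart().items():
        print("libcudart.so." + version, "loadable" if found else "missing")
```

If `10.0` loads but `10.1` does not, that would be consistent with tf 2.0.0 working while tf 2.1 fails on the same machine.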
@zaccharieramzi, were you able to install tensorflow-gpu 2.1 with CUDA 10.1? |
Hi @gadagashwini , I didn't have time to try. I can't really try right now, cos I am too afraid to mess up my conda environment atm(haha). But I can let you know in a few days. |
Thanks @zaccharieramzi. Please try and let us know how it progresses. Thanks! |
@zaccharieramzi They got Cuda 10.1 working in another thread: #34429 (comment). Could you verify? |
I managed to build master with --config=v1 and CUDA 10.2 (gcc), numpy 1.17.4. However, I get reproducible segfaults on code that runs fine with tf 1.x/CUDA 10.1. The OS is a fresh Ubuntu 18.04, with CUDA/cuDNN installed from debs. Hardware is a Xeon 2695v2, 128GB RAM, 1060ti:
The weirdest thing I'm doing in the code is having tensors with a zero-sized dimension (i.e. shape (40,4,0,50)). This happens because I'm running a network architecture search. Meanwhile, I'm rolling back to 10.1. Regards. |
@EwoutH I think so, I did a pull on master yesterday and this was merged 6 days ago. Edit: I have just crashed with a prebuilt conda tf-gpu 1.15 package. It seems that my code may crash with a segfault or hit numerical overflows depending on the driver, Python version, tf version, packages, etc. Conda has its own CUDA 10.0 runtime for this package. I'm still on the 440.33.01 (CUDA 10.2) driver, by the way. I didn't notice before because my previous setup (py3.6, tf 1.15, CUDA 10.1) simply threw exceptions. It turned some vars into infs on a train call and I raised the exceptions myself. I supposed all of these were regular instability/overflows, but now I think some may be related to zero-sized dimensions in some tensors. I will roll back the driver to 10.1; however, I guess this is worth checking out. I will also try to reproduce the error with minimal code. |
@zaccharieramzi, any update on this issue? |
Sorry, I didn't get a chance to check on this with the holiday season and all. |
I have similar issues here. What helped me was building tensorflow from source. It seems the prebuilt tensorflow is not fully compatible with CUDA 10.2. |
I rolled back to cuda 10.1, and everything seems to work fine. I am going to close this since tensorflow 2.1 is not supposed to be directly usable with cuda 10.2, and from @lijiaying 's comment, I understand there is no issue with building from source. |
Did you find a solution to your problem? |
@aloksingh3110 like I said in the comment just above, I rolled back to cuda 10.1. This person has run into problems when building from source with cuda 10.2, so I advise you to roll back to 10.1. |
I have the same problem, but using
|
I'm relatively new to this, so I might be wrong, but @SalahAdDin it seems like you're missing the libnvinfer library found in TensorRT (see here). CUDA 10.1 seems to have loaded fine. |
@hd1090. I tried to install tensorRT in cuda 10.1. My gpu is working, but tensorRT installation is erroring out on me with the following message. Do you know what could be wrong here? The following packages have unmet dependencies: |
@maximuslee1226
Using |
@zaccharieramzi did you find any solution other than rolling back to 10.1? |
@alihamid996 I didn't try anything else, so I couldn't tell you myself. @lijiaying 's comment suggests that it's possible to build tf 2.1 from source with cuda 10.2 though. |
My head hurts just seeing what a pain in the rear it is to get all the moving pieces exactly right to have GPU support. NVIDIA obviously wants you to install 10.2... and yet TensorFlow only works with 10.1 out of the box. Now I have to recompile OpenCV from scratch. Just wonderful. |
Could you please clarify: are there any plans to support 10.2 in the next release of TensorFlow? Unfortunately, there are no CUDA 10.1 packages for modern RHEL-based repos like CentOS 8 or Fedora 30+. Only CUDA 10.2 is available for these distributions. |
The new SD card image for Jetson Nano comes with CUDA 10.2 preinstalled, and older images can no longer be downloaded. I'm not sure it is possible to downgrade, because it seems NVIDIA blacklists older packages. |
Is there a TensorFlow installation solution for people who have CUDA 10.2, gcc 7.5.0 and Ubuntu 18.04 installed? If so, does it only work with TensorFlow 2.1? |
#38194 (comment) seems to have a solution. |
I had a similar problem with 10.2 using
The error when trying
This error This resolved it:
|
UPDATE: WARNING below #34759 (comment) FYI, we have a tested work-around (symlink fake I am using this for days now, mainly with CPU and I never got trouble with it (so libcudart 10.1 and 10.2 really seem compatible, as was promised higher up in that thread NOT SURE !!). |
I don't think libcudart 10.1 and 10.2 are ABI compatible (doc). Symlinking 10.2 to 10.1 may seem to work, but there is no guarantee that e.g. this will not fry your GPU. |
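One way to see which runtime a file such as a symlinked `libcudart.so.10.1` really contains is to ask the library itself. The sketch below is a rough illustration, assuming a Linux system; it calls the CUDA Runtime API function `cudaRuntimeGetVersion` through `ctypes` and decodes the result, which CUDA encodes as `1000 * major + 10 * minor`. The helper names `decode_cuda_version` and `runtime_version` are hypothetical. It only reports the version and says nothing about whether mixing versions is safe.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer encoding (1000 * major + 10 * minor)."""
    return value // 1000, (value % 1000) // 10


def runtime_version(soname="libcudart.so.10.1"):
    """Return (major, minor) reported by the library, or None.

    Hypothetical helper; None means the library could not be loaded
    or the call did not succeed.
    """
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None
    version = ctypes.c_int(0)
    # cudaRuntimeGetVersion(int *) returns 0 (cudaSuccess) on success.
    status = lib.cudaRuntimeGetVersion(ctypes.byref(version))
    if status != 0:
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    print(decode_cuda_version(10020))  # (10, 2)
    print(runtime_version())
```

If a file named `libcudart.so.10.1` reports `(10, 2)`, it is a 10.2 runtime under a 10.1 name, which is exactly the setup the warning above is about.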
Wow, that would be bad ... To be clear, I am doing most of the actual work on the CPU (this smallish |
Can anyone tell me when tensorflow-gpu will be able to run with CUDA 10.2? |
At least after the TF 2.4 release |
Our current plan is to move TF 2.4 to CUDA 11. |
Managed to install tensorflow-gpu 2.3 with cudatoolkit 10.1 on my CUDA 10.2 driver (Jan 19th, 2021)
|
pip
pip
Describe the problem
I want to use tensorflow-gpu==2.1.0rc0 with CUDA 10.2, and it seems that it can't work right now. When I use tensorflow-gpu=2.0.0 it works perfectly fine.
Provide the exact sequence of commands / steps that you executed before running into the problem
Which gives the following warnings:
Any other info / logs
When I do locate libcudart.so, I get the following:
locate libnvinfer_plugin.so is empty.
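On systems without `locate`, a similar search can be sketched in Python. This is only an approximation: it matches file names against a glob pattern rather than searching a prebuilt index, and both the helper name `find_libraries` and the default search roots are assumptions that should be adapted to the machine in question.

```python
import fnmatch
import os


def find_libraries(pattern, roots=("/usr/local", "/usr/lib", "/opt")):
    """Return sorted paths of files whose name matches `pattern`.

    Hypothetical helper; the default roots are guesses at where CUDA
    and TensorRT libraries are commonly installed.
    """
    matches = []
    for root in roots:
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    for pattern in ("libcudart.so*", "libnvinfer_plugin.so*"):
        print(pattern, find_libraries(pattern))
```

An empty list for `libnvinfer_plugin.so*` would match the empty `locate` result reported here.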