tensorflow-gpu pip package is not compatible with cuda9 docker image #17566
@gunan do you remember why you have a check on major/minor cuDNN version instead of just major version?
@zheng-xq @martinwicke do you know why we check for cuDNN minor version?
Hello, I had to rebuild my computer and am now experiencing one of the errors described in the original post (see below). Is there a recommended workaround? Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
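For anyone puzzled by the numbers in that message: cuDNN packs its version as major*1000 + minor*100 + patch, and the "compatibility version" is the same number with the patch dropped. A small sketch of the decoding (helper names are mine, not TensorFlow's):

```python
def decode_cudnn_version(packed):
    """Split cuDNN's packed version number (major*1000 + minor*100 + patch)."""
    major, rem = divmod(packed, 1000)
    minor, patch = divmod(rem, 100)
    return major, minor, patch

def compatibility_version(packed):
    """The packed version with the patch level dropped, as TF reports it."""
    major, minor, _ = decode_cudnn_version(packed)
    return major * 1000 + minor * 100

# The values from the error above: runtime 7101 is 7.1.1, build-time 7004 is 7.0.4.
print(decode_cudnn_version(7101))   # (7, 1, 1)
print(compatibility_version(7101))  # 7100
print(compatibility_version(7004))  # 7000
```

So the check that fails here is comparing 7100 against 7000, i.e. the minor versions differ.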
If you use docker, you should pin the version of cuDNN you are installing. For instance: If you are not using docker, you can still downgrade cuDNN with a similar command.
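One way the pinning idea could look in a Dockerfile. The exact libcudnn7 version string below is an assumption for illustration; list what your repo actually offers with `apt-cache madison libcudnn7`:

```shell
# Sketch of a Dockerfile RUN step pinning cuDNN 7.0 so apt does not pull 7.1.
# The version string 7.0.5.15-1+cuda9.0 is assumed; verify it first with:
#   apt-cache madison libcudnn7
apt-get update && \
    apt-get install -y --allow-downgrades --no-install-recommends \
        libcudnn7=7.0.5.15-1+cuda9.0 && \
    apt-mark hold libcudnn7
```

The `apt-mark hold` keeps a later `apt-get upgrade` from silently moving you back to 7.1.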
Thank you for your reply. I had just solved it by updating TensorFlow: pip install --upgrade tensorflow-gpu
@wdma How could you solve it by upgrading TF? I'm getting this error on the latest (1.6) TF. |
+1 |
@adampl Installing TensorFlow per these instructions (https://www.tensorflow.org/install/) generates the above error. Running pip install --upgrade tensorflow-gpu fixes it. I hope this helps!
I ended up doing what @flx42 advised (#17566 (comment)) though it's not a perfect solution.
If you use docker, I think you have 3 options:
If you don't use docker, just make sure your machine has cuDNN 7.0, not 7.1. |
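To confirm which cuDNN a machine actually has, the usual trick is to read the version defines from cudnn.h (commonly /usr/include/cudnn.h, though the path varies by install). A sketch of parsing them, with a sample header inlined so it runs anywhere:

```python
import re

# Sample of the #defines found in cudnn.h (contents shown here for illustration;
# on a real machine, read the file from your CUDA include directory instead).
header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 4
"""

def parse_cudnn_header(text):
    """Extract (major, minor, patch) from cudnn.h-style #defines."""
    fields = dict(re.findall(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) (\d+)", text))
    return int(fields["MAJOR"]), int(fields["MINOR"]), int(fields["PATCHLEVEL"])

print(parse_cudnn_header(header))  # (7, 0, 4)
```

A (7, 0, x) result means you are on cuDNN 7.0 and the TF 1.6 binary wheel should load.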
Add downgrade option for cuDNN, workaround for tensorflow/tensorflow#17566
@rongou I implemented your second suggestion in my Dockerfile and I've been able to run TF 1.6 along with Keras 2.1.5 within the base image nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04.
@zheng-xq @martinwicke let me know if there is a problem with cuDNN that warrants this strict check!
System information
Hello, here is my experience installing CUDA, relating to the error messages below.
I installed the cuDNN v7.0.4 Library for Linux (the oldest version for CUDA 9.0) (link) as below. No errors, works well.
@flx42 I am tracking this down tomorrow. The nightly docker is also broken due to the same issue, which means I cannot run nightly tests, so I have some direct motivation. In summary of this thread, I see two issues, maybe more:
My suggestion is to move cuDNN forward for the next release if possible (e.g. 1.7, unless it is too late), and at a minimum to switch the nightly as well as follow up on the version check.
I think the biggest problem is that the "latest" nvidia-docker images are CUDA 9.1 with cuDNN 7.1. Also, it is too late to change anything for 1.7; RC0 is almost out.
The nightly docker we release is broken. I ran into the problem trying to figure out why my perf tests stopped running. We can at least pin the nightly docker image until the 1.8 nightlies. I do not think we need to move to 7.1 per se, as that also breaks people, but I will ask XQ tomorrow about major/minor versions to see what he thinks, and Martin as well. I know you can, but this is just to help track stuff down. That unblocks me and anyone using the nightly docker. I cannot believe you were awake to see my message. :-)
Thanks for following up on this. |
What do you mean? Which dockerfile is that? |
Sorry, you are right. |
If there is a good reason to have this check on the minor version (e.g. an incompatibility despite the SONAME), I might split future images with the cuDNN minor version. |
I believe that we have run into incompatibilities between some minor versions of CUDA, but I'm not sure whether we've ever seen that in cuDNN (@jlebar are there details?) |
@martinwicke, yeah, e.g. CUDA 9.0 and CUDA 9.1 are quite different in the respects we care about. For cuDNN, I have not seen a statement specifying their level of backwards compatibility. I would naively expect that if you build against cuDNN x.y and run with cuDNN x.z for z >= y, it probably will work. But to be comfortable with blessing that I'd want a statement in writing from NVIDIA. (Perhaps such a statement already exists.) Whether it should be a fatal error in TF vs. a "good luck, you're on your own" warning (like we do for known-broken ptxas versions), I don't have an opinion on.
CUDA toolkit libraries and cuDNN have different SONAME, so that's actually expected.
Statement from NVIDIA: Beginning in cuDNN 7, binary compatibility of patch and minor releases is maintained as follows:
(Note that this compatibility was not necessarily guaranteed in prior cuDNN major releases.) --Cliff Woolley |
Update. I caught up with the build people on @gunan's team and the nightly Docker will be fixed. I need to talk to a few more people, but I think we should consider a warning for a minor version difference and a fatal error for a major version difference. I have some concerns that there might be feature differences in point releases. I am doing research and talking with people; it will likely take a few days. I or someone will have a final statement. b/74600152
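The policy being proposed here (fatal on a major-version mismatch, warning on a minor mismatch) could be sketched roughly as follows; the function name and messages are mine, not TensorFlow's actual implementation:

```python
import warnings

def check_cudnn(compiled, loaded):
    """Sketch of the proposed policy, using cuDNN's packed version numbers
    (major*1000 + minor*100 + patch): raise on a major-version mismatch,
    warn on a minor mismatch, and ignore patch-level differences."""
    c_major, c_minor = compiled // 1000, (compiled % 1000) // 100
    l_major, l_minor = loaded // 1000, (loaded % 1000) // 100
    if c_major != l_major:
        raise RuntimeError(
            "cuDNN major version mismatch: compiled against %d, loaded %d"
            % (compiled, loaded))
    if c_minor != l_minor:
        warnings.warn(
            "cuDNN minor version mismatch: compiled against %d, loaded %d"
            % (compiled, loaded))

check_cudnn(7004, 7101)  # the case from this issue: warns instead of crashing
```

Under this sketch the cuDNN 7.0 vs 7.1 situation that started this thread becomes a warning rather than a hard failure, while a 7.x vs 8.x mismatch would still abort.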
We will also update the cuDNN docs to say the same as what I posted above. Thanks for pointing out the omission. |
@cliffwoolley Thank you. I am opening an internal issue and looking for someone to update the code to match your statement in cuDNN. |
Last update until this is done or there is a change in progress. I found someone to make the changes to match Cliff's cuDNN version-policy update. Will post when complete; I would not expect it to take very long. Team effort, I just get the honor of updating the GitHub issue. :-)
@gunan Should we modify Dockerfile.gpu to make it more similar to Dockerfile.gpu-devel? That is to say, |
How much space will we save by doing that? |
400 MB from a quick test, 300 MB if I re-add CUPTI. So it seems worth it, and it's always better to pin the version of a key dependency like cuDNN. It's better for reproducibility. |
Then this looks like it is worthwhile.
Thanks for the analysis! I can review, if you'd like to send the change.
I'm having the same error with pip. I guess I'll try the Docker then.
Related: tensorflow#17566 Fixes: tensorflow#17431 Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
Fixed in nightly builds. We are now checking according to Cliff's update, and I would guess this lands in TF 1.8 (definitely not 1.7, since 1.8 has not been branched yet):
System information
- Have I written custom code: No
- OS Platform and Distribution: Linux Ubuntu 16.04
- TensorFlow installed from: binary (pip install tensorflow-gpu)
- TensorFlow version: 1.6.0
- Python version: 2.7
- CUDA/cuDNN version: CUDA 9, cuDNN 7

I was trying to build a horovod image, but this would affect anyone using the nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 base image.

Describe the problem
When building a docker image based on nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 and doing a pip install tensorflow-gpu==1.6.0, the resulting image causes a crash because the base image contains cuDNN 7.1, while the tensorflow-gpu pip package was built against cuDNN 7.0.

Source code / logs
Error messages:
@flx42