cuDNN error when running TF 2.0 Docker image #300
Comments
The official TensorFlow 2.0 containers also have this problem.
I use the following container each night to run all of our OSS accuracy and benchmark tests. It runs on V100s. I do not have an RTX setup; I am not sure why that would matter, but it might. I use this one, and it ran last night. It was a long time ago, but maybe you built from the Dockerfile in the folder instead of using this specific Docker image. That is a wild guess; this may be an RTX issue, which seems odd, but these are all guesses. I will do a quick check with your code snippet as a data point.
I created the Docker image with this command from the PerfZero documentation:
I entered the container with:
I then ran the script from the original post and did not see any errors:
I use CUDA_VISIBLE_DEVICES because one of my GPUs is just for display.
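For anyone who wants to reproduce the CUDA_VISIBLE_DEVICES part, here is a minimal sketch of one way to restrict a test script to specific GPUs from Python. The helper names `env_with_visible_gpus` and `run_on_gpus` are hypothetical, and the GPU index in the usage note is an assumption about which device is not driving the display; neither comes from the commands used in this thread.

```python
import os
import subprocess
import sys


def env_with_visible_gpus(gpu_indices, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs to CUDA.

    CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    The caller's environment (or base_env) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


def run_on_gpus(script_path, gpu_indices):
    """Run a Python script in a child process limited to the given GPUs.

    Hypothetical helper: the variable is set before the child starts, so the
    framework inside the script only sees the selected devices.
    """
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_visible_gpus(gpu_indices),
        check=False,
    )
```

For example, `run_on_gpus("test.py", [0])` would run the test script with only device 0 visible, assuming device 0 is the compute GPU on that machine.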
I just ran the same test with the TF 2.0 beta, using the following to start the container, and then ran the same python3 test.py with the original post's code. This could be an issue with the version of the nvidia-docker runtime, but that seems unlikely because the GPUs are detected, and that is really all the runtime does.
Closing for now.
I used a GTX-1080 for the tests above. My PerfZero Docker images are tested on V100 (nightly) and on GTX-1080 whenever I am "doing stuff".
I tried building the Docker container from https://github.com/tfboyd/benchmarks/blob/4f6f785dda66fa27119a88d35e192469c3bbe894/perfzero/docker/Dockerfile_ubuntu_1804_tf_v2, which was created in #298. When I try to run TensorFlow 2.0 in it, I get a cuDNN error.
I build the container with:
And run it with
After which I run the following code in a Python shell:
And then I get
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
Full output here
I'm running nvidia-docker. My Docker version is
Docker version 18.09.1, build 4c52b9
I have 2 GeForce RTX 2080 cards installed and available in my machine. My current driver version is 415.27.
Am I doing something wrong, or is there an issue with the Docker build?