cuDNN error when running TF 2.0 Docker image #300

Closed
peterroelants opened this issue Feb 5, 2019 · 4 comments

peterroelants commented Feb 5, 2019

I built the Docker container from https://github.com/tfboyd/benchmarks/blob/4f6f785dda66fa27119a88d35e192469c3bbe894/perfzero/docker/Dockerfile_ubuntu_1804_tf_v2 (created in #298). When I run TF 2.0 inside it, I get a cuDNN error.

I build the container with:

docker build -t tf_workflow .

And run it with:

docker run --runtime=nvidia --name tf_workflow_container -it tf_workflow /bin/bash

After which I run the following code in a Python shell:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

print(tf.__version__)

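# One random 28x28 float32 "image" (batch size 1)
data = np.random.random((1, 28, 28)).astype(np.float32)
# Reshape to height x width x channels, then apply one 5x5 convolution with 2 filters
model = tf.keras.Sequential([layers.Reshape(target_shape=[28, 28, 1],
                                            input_shape=(28, 28)),
                             layers.Conv2D(2, 5, padding='same', activation=tf.nn.relu)])

# Calling the model runs the Conv2D op, which is where the error is raised
model(data)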

Running this produces the following error: UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
Full output here
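
A workaround often reported for this error message is to enable GPU memory growth before any op runs, so that TensorFlow does not reserve all GPU memory up front. Nothing in this thread confirms that it fixes this particular setup, so the sketch below is only something to try. The helper name `enable_gpu_memory_growth` is hypothetical; the `tf.config.experimental` calls are the ones available in TF 2.0.

```python
def enable_gpu_memory_growth():
    """Enable memory growth on every visible GPU and return how many were configured.

    Returns 0 when TensorFlow cannot be imported or no GPU is visible.
    Hypothetical helper, not part of TensorFlow or PerfZero.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0

    gpus = tf.config.experimental.list_physical_devices('GPU')
    configured = 0
    for gpu in gpus:
        try:
            # Must be called before the GPU has been initialized by any op
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except (RuntimeError, ValueError) as err:
            # RuntimeError is raised if the device was already initialized
            print("Could not set memory growth for", gpu, ":", err)
    return configured
```

Call `enable_gpu_memory_growth()` right after the imports and before building the model; if the error persists, memory allocation is probably not the cause.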

I'm running nvidia-docker with Docker version 18.09.1, build 4c52b9. I have two GeForce RTX 2080 GPUs installed and available in my machine. My current driver version is 415.27.
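
For reports like this, it can help to collect the environment details programmatically so they are easy to compare across machines. The following is a minimal sketch that assumes `docker` and `nvidia-smi` are on the `PATH`; the function names `run`, `parse_gpu_csv`, and `collect_environment` are made up for illustration.

```python
import subprocess


def run(cmd):
    """Run a command and return its stripped stdout, or None if it is missing or fails."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def parse_gpu_csv(text):
    """Parse 'name, driver_version' lines into a list of (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        rows.append((name, driver))
    return rows


def collect_environment():
    """Return the Docker version string and the list of (GPU name, driver version)."""
    gpu_text = run(["nvidia-smi", "--query-gpu=name,driver_version",
                    "--format=csv,noheader"])
    return {
        "docker": run(["docker", "--version"]),
        "gpus": parse_gpu_csv(gpu_text) if gpu_text else [],
    }
```

Running `print(collect_environment())` on both the host and inside the container shows whether the two see the same GPUs and driver.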

Am I doing something wrong, or is there an issue with the Docker build?

leimao commented Jun 27, 2019

The official TensorFlow 2.0 containers also have this problem.

tfboyd (Member) commented Jun 27, 2019

I use the following container each night to run all of our OSS accuracy and benchmark tests. It runs on V100s. I do not have an RTX setup. I am not sure why that would matter, but it might.

I use this one, and it ran last night. Your report is from a while ago, so maybe you built a different Dockerfile from that folder rather than this specific one. As a totally wild guess, this may be an RTX-specific issue. That seems odd, but these are all guesses.
https://github.com/tensorflow/benchmarks/blob/master/perfzero/docker/Dockerfile_ubuntu_1804_tf_v2

I will try to do a quick check with your code snippet as a data point.

I built the Docker image with this command from the PerfZero documentation:
python3 benchmarks/perfzero/lib/setup.py --dockerfile_path=docker/Dockerfile_ubuntu_1804_tf_v2

I entered the container with:
docker run --runtime=nvidia -it --rm -v $(pwd):/workspace -v /data:/data perfzero/tensorflow bash

I ran the script from the original post with the following command and did not see any errors:
CUDA_VISIBLE_DEVICES=0 python3 test.py

I use CUDA_VISIBLE_DEVICES because one of my GPUs is just for display.
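
The same GPU restriction can be applied from inside a Python script instead of on the command line. This is a minimal sketch; `restrict_visible_gpus` is a hypothetical helper, and the only real interface it relies on is the `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before TensorFlow initializes CUDA.

```python
import os


def restrict_visible_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value set."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only GPU 0 to this process, as in the command above
restrict_visible_gpus([0])

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Setting the variable after TensorFlow has already initialized its GPUs has no effect on that process, which is why the call comes before the import.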

tfboyd self-assigned this Jun 27, 2019

tfboyd (Member) commented Jun 27, 2019

I just ran the same test with the TF 2.0 beta. I started the container with the command below and then ran the same python3 test.py with the original post's code. This could be an issue with the version of the nvidia-docker runtime, but that seems unlikely because the GPUs are seen, and that is really all the runtime does.

docker run --runtime=nvidia -it --rm -v $(pwd):/workspace -v /data:/data tensorflow/tensorflow:2.0.0b1-gpu-py3 bash

Closing for now.

tfboyd closed this as completed Jun 27, 2019

tfboyd (Member) commented Jun 27, 2019

I used a GTX-1080 for the tests above. My PerfZero dockers are tested on V100s (nightly) and on a GTX-1080 whenever I am "doing stuff".
