cuDNN error when running TF 2.0 Docker image #300

peterroelants · 2019-02-05T18:20:46Z

I tried building the Docker container from https://github.com/tfboyd/benchmarks/blob/4f6f785dda66fa27119a88d35e192469c3bbe894/perfzero/docker/Dockerfile_ubuntu_1804_tf_v2 (created in #298). When I run TF 2.0 in it, I get a cuDNN error.

I build the container with:

docker build -t tf_workflow .

And run it with:

docker run --runtime=nvidia --name tf_workflow_container -it tf_workflow /bin/bash

After which I run the following code in a Python shell:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

print(tf.__version__)

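# One random 28x28 float32 "image" (batch size 1)
data = np.random.random((1, 28, 28)).astype(np.float32)
# Add a channel axis, then apply a single 5x5 convolution with 2 filters
model = tf.keras.Sequential([layers.Reshape(target_shape=[28, 28, 1],
                                            input_shape=(28, 28,)),
                             layers.Conv2D(2, 5, padding='same', activation=tf.nn.relu)])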

model(data)

And then I get UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
Full output here

I'm running nvidia-docker with Docker version 18.09.1, build 4c52b9. I have 2 GeForce RTX 2080 cards installed and available in my machine. My current driver version is 415.27.

I am wondering if I'm doing something wrong? Or is there an issue with the Docker build?
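
A commonly reported workaround for this error message is to let TensorFlow allocate GPU memory on demand instead of reserving it all at startup. Nobody in this thread confirms that it applies here, so treat the following as a diagnostic sketch only. The helper name enable_gpu_memory_growth is hypothetical; the tf.config.experimental calls are the TF 2.0 API.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand (hypothetical helper).

    Returns the number of GPUs configured, or None if TensorFlow is not
    installed. Must be called before any op has touched the GPU.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Assumption: cuDNN fails to initialize because memory is exhausted.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_gpu_memory_growth()
print("GPUs configured for memory growth:", configured)
```

If the Conv2D call succeeds after this, memory allocation is the likely cause; if it still fails, the problem lies elsewhere.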


leimao · 2019-06-27T17:53:41Z

The official TensorFlow 2.0 containers also have this problem.

tfboyd · 2019-06-27T18:57:00Z

I use the following container each night to run all of our OSS Accuracy and Benchmark tests. It runs with V100s. I do not have an RTX setup; I am not sure why that would matter, but it might.

I use this one and it ran last night. It was a long time ago, so maybe you built the Dockerfile that was in the folder at the time rather than this specific one. That is a totally wild guess, and maybe this is an RTX thing; that seems odd, but these are all guesses.
https://github.com/tensorflow/benchmarks/blob/master/perfzero/docker/Dockerfile_ubuntu_1804_tf_v2

I will try to do a quick check with your code snippet as a data point.

I created the Docker with this command from the PerfZero documents:
python3 benchmarks/perfzero/lib/setup.py --dockerfile_path=docker/Dockerfile_ubuntu_1804_tf_v2

I entered the docker with:
docker run --runtime=nvidia -it --rm -v $(pwd):/workspace -v /data:/data perfzero/tensorflow bash

I ran the script from the original post and did not see any errors:
CUDA_VISIBLE_DEVICES=0 python3 test.py

I use CUDA_VISIBLE_DEVICES because one of my GPUs is just for display.
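
The same restriction can be applied from inside the script rather than on the command line. This is a minimal sketch; restrict_visible_gpus is a hypothetical helper name, and the only real mechanism used is the CUDA_VISIBLE_DEVICES environment variable.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Has to run before TensorFlow, or any other CUDA library, initializes
    the GPU; changing the variable afterwards has no effect.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_visible_gpus([0])
# import tensorflow as tf  # import only after the variable is set
```

Setting the variable in the shell, as in the command above, avoids any dependence on import order.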

tfboyd · 2019-06-27T19:26:55Z

I just did the same test with the TF 2.0 beta, using the following to start the container, and then ran the same python3 test.py with the original post's code. This could be an issue with the version of the nvidia docker runtime, but that seems unlikely because the GPUs are seen, and that is really all the runtime does.

docker run --runtime=nvidia -it --rm -v $(pwd):/workspace -v /data:/data tensorflow/tensorflow:2.0.0b1-gpu-py3 bash
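
Whether the GPUs are seen inside the container can be checked with a few lines of Python. This is a sketch assuming the TF 2.0 tf.config.experimental API; visible_gpu_names is a hypothetical helper name.

```python
def visible_gpu_names():
    """Return the names of the GPUs TensorFlow can see (hypothetical helper).

    Returns None if TensorFlow is not installed, and an empty list if the
    container exposes no GPU to TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [d.name for d in tf.config.experimental.list_physical_devices("GPU")]


print(visible_gpu_names())
```

An empty list would point at the runtime or driver setup, while a populated list together with the cuDNN error would point at something after device discovery.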

Closing for now.

tfboyd · 2019-06-27T19:33:32Z

I used a GTX-1080 for the tests above. My PerfZero dockers are tested on V100 (nightly) and on GTX-1080 whenever I am "doing stuff".

peterroelants mentioned this issue Feb 5, 2019

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR tensorflow/tensorflow#24496

Closed

edwardcho mentioned this issue Jun 21, 2019

How to test your code?? kshitizrimal/Fast-SCNN#4

Open

tfboyd self-assigned this Jun 27, 2019

tfboyd closed this as completed Jun 27, 2019
