2.4.0rc1 not supporting RTX 3090? #44969

Closed
ysyyork opened this issue Nov 18, 2020 · 4 comments
Labels: stale (to be closed automatically if no activity), stat:awaiting response (awaiting response from author), TF 2.4 (for issues related to TF 2.4), type:bug (Bug)

Comments


ysyyork commented Nov 18, 2020

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Docker image tensorflow/tensorflow:2.4.0rc1-gpu-jupyter

Describe the current behavior
I checked the CUDA version in the container, which is CUDA 11.1. I also have NVIDIA driver 455.38 installed on the host machine. I have a custom build of TF 2.3.0 working with this setup, but when I pull the Docker image tensorflow/tensorflow:2.4.0rc1-gpu-jupyter it raises cuDNN errors like the following:

2020-11-18 07:54:42.398846: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-18 07:54:42.417594: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3693205000 Hz
2020-11-18 07:54:43.206492: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-18 07:54:44.092906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-18 07:54:44.097015: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-11-18 07:54:44.422337: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-18 07:54:44.429661: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-6-976a0a33b022> in <module>
----> 1 m.predict(np.zeros((1, 256, 256, 3), dtype=np.float32))

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
   1627           for step in data_handler.steps():
   1628             callbacks.on_predict_batch_begin(step)
-> 1629             tmp_batch_outputs = self.predict_function(iterator)
   1630             if data_handler.should_sync:
   1631               context.async_wait()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    826     tracing_count = self.experimental_get_tracing_count()
    827     with trace.Trace(self._name) as tm:
--> 828       result = self._call(*args, **kwds)
    829       compiler = "xla" if self._experimental_compile else "nonXla"
    830       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    893       # If we did not create any variables the trace we have is good enough.
    894       return self._concrete_stateful_fn._call_flat(
--> 895           filtered_flat_args, self._concrete_stateful_fn.captured_inputs)  # pylint: disable=protected-access
    896 
    897     def fn_with_cond(inner_args, inner_kwds, inner_filtered_flat_args):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1917       # No tape is watching; skip to running the function.
   1918       return self._build_call_outputs(self._inference_function.call(
-> 1919           ctx, args, cancellation_manager=cancellation_manager))
   1920     forward_backward = self._select_forward_and_backward_functions(
   1921         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    558               inputs=args,
    559               attrs=attrs,
--> 560               ctx=ctx)
    561         else:
    562           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node efficientnetb0/stem_conv/Conv2D (defined at <ipython-input-6-976a0a33b022>:1) ]] [Op:__inference_predict_function_5650]

Function call stack:
predict_function

Code to replicate:

import numpy as np
from tensorflow.python.keras.applications.efficientnet import EfficientNetB0

# Build an untrained EfficientNetB0 and run a single dummy prediction;
# the first Conv2D op triggers the cuDNN failure shown above.
m = EfficientNetB0(weights=None, input_shape=(256, 256, 3))
m.predict(np.zeros((1, 256, 256, 3), dtype=np.float32))

Describe the expected behavior
Should be able to run on an RTX 3090 without any issue.
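
As a quick sanity check (a minimal sketch; tf.sysconfig.get_build_info() should be available in recent TF 2.x releases, but treat the exact dictionary keys as an assumption), it can help to confirm that the GPU is visible at all and which CUDA/cuDNN versions the wheel was built against:

import tensorflow as tf

# GPUs TensorFlow can actually see; an empty list means the driver/CUDA stack is unusable.
print(tf.config.list_physical_devices('GPU'))

# CUDA and cuDNN versions this TensorFlow build was compiled against.
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))

In this report the device clearly is visible (the log shows libcudnn.so.8 being loaded successfully), so the failure happens later, when the cuDNN handle is created.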

@ysyyork ysyyork added the type:bug Bug label Nov 18, 2020
@Saduf2019 Saduf2019 added the TF 2.4 for issues related to TF 2.4 label Nov 18, 2020
@Saduf2019
Contributor

@ysyyork
This issue has already been addressed here; please verify and let us know.
You may also check: #44750, #43718
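
For reference, reports of this kind (CUDNN_STATUS_INTERNAL_ERROR during handle creation on Ampere cards) usually point to the standard workaround of letting TensorFlow allocate GPU memory on demand instead of grabbing it all up front. A minimal sketch of that workaround, not verified against this exact driver/image combination:

import tensorflow as tf

# Enable memory growth on every visible GPU before any op runs;
# this avoids the large up-front allocation that commonly triggers
# CUDNN_STATUS_INTERNAL_ERROR / "Failed to get convolution algorithm".
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

Note that this must run before the first model or op is created; otherwise TensorFlow raises a RuntimeError because the physical devices are already initialized.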

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Nov 18, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Nov 25, 2020
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

@google-ml-butler

Are you satisfied with the resolution of your issue?
