2.4.0rc1 not supporting RTX 3090? #44969

Closed
ysyyork opened this issue Nov 18, 2020 · 4 comments
Labels: stale (to be closed automatically if no activity), stat:awaiting response (awaiting response from author), TF 2.4 (for issues related to TF 2.4), type:bug (Bug)

Comments


ysyyork commented Nov 18, 2020

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Docker image tensorflow/tensorflow:2.4.0rc1-gpu-jupyter

Describe the current behavior
I checked the CUDA version in the container, which is CUDA 11.1. I also have NVIDIA driver 455.38 installed on the host machine. I have a custom build of TF 2.3.0 working with this setup, but when I pull the Docker image tensorflow/tensorflow:2.4.0rc1-gpu-jupyter it raises cuDNN errors like the following:

2020-11-18 07:54:42.398846: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-18 07:54:42.417594: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3693205000 Hz
2020-11-18 07:54:43.206492: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-18 07:54:44.092906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-18 07:54:44.097015: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-11-18 07:54:44.422337: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-18 07:54:44.429661: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-6-976a0a33b022> in <module>
----> 1 m.predict(np.zeros((1, 256, 256, 3), dtype=np.float32))

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
   1627           for step in data_handler.steps():
   1628             callbacks.on_predict_batch_begin(step)
-> 1629             tmp_batch_outputs = self.predict_function(iterator)
   1630             if data_handler.should_sync:
   1631               context.async_wait()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    826     tracing_count = self.experimental_get_tracing_count()
    827     with trace.Trace(self._name) as tm:
--> 828       result = self._call(*args, **kwds)
    829       compiler = "xla" if self._experimental_compile else "nonXla"
    830       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    893       # If we did not create any variables the trace we have is good enough.
    894       return self._concrete_stateful_fn._call_flat(
--> 895           filtered_flat_args, self._concrete_stateful_fn.captured_inputs)  # pylint: disable=protected-access
    896 
    897     def fn_with_cond(inner_args, inner_kwds, inner_filtered_flat_args):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1917       # No tape is watching; skip to running the function.
   1918       return self._build_call_outputs(self._inference_function.call(
-> 1919           ctx, args, cancellation_manager=cancellation_manager))
   1920     forward_backward = self._select_forward_and_backward_functions(
   1921         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    558               inputs=args,
    559               attrs=attrs,
--> 560               ctx=ctx)
    561         else:
    562           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node efficientnetb0/stem_conv/Conv2D (defined at <ipython-input-6-976a0a33b022>:1) ]] [Op:__inference_predict_function_5650]

Function call stack:
predict_function

Code to replicate:

import numpy as np
from tensorflow.python.keras.applications.efficientnet import EfficientNetB0

# Build an untrained EfficientNetB0 and run a single dummy prediction;
# the first Conv2D op triggers the cuDNN failure shown above.
m = EfficientNetB0(weights=None, input_shape=(256, 256, 3))
m.predict(np.zeros((1, 256, 256, 3), dtype=np.float32))

Describe the expected behavior
Should be able to run on an RTX 3090 without any issue.
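
As a quick sanity check (a minimal sketch; tf.sysconfig.get_build_info() should be available in recent TF 2.x releases, but treat the exact dictionary keys as an assumption), it can help to confirm that the GPU is visible at all and which CUDA/cuDNN versions the wheel was built against:

import tensorflow as tf

# GPUs TensorFlow can actually see; an empty list means the driver/CUDA stack is unusable.
print(tf.config.list_physical_devices('GPU'))

# CUDA and cuDNN versions this TensorFlow build was compiled against.
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))

In this report the device clearly is visible (the log shows libcudnn.so.8 being loaded successfully), so the failure happens later, when the cuDNN handle is created.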

@ysyyork ysyyork added the type:bug Bug label Nov 18, 2020
@Saduf2019 Saduf2019 added the TF 2.4 for issues related to TF 2.4 label Nov 18, 2020
@Saduf2019
Contributor

@ysyyork
This issue has already been addressed here; please verify and let us know.
You may also check: #44750, #43718
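
For reference, reports of this kind (CUDNN_STATUS_INTERNAL_ERROR during handle creation on Ampere cards) usually point to the standard workaround of letting TensorFlow allocate GPU memory on demand instead of grabbing it all up front. A minimal sketch of that workaround, not verified against this exact driver/image combination:

import tensorflow as tf

# Enable memory growth on every visible GPU before any op runs;
# this avoids the large up-front allocation that commonly triggers
# CUDNN_STATUS_INTERNAL_ERROR / "Failed to get convolution algorithm".
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

Note that this must run before the first model or op is created; otherwise TensorFlow raises a RuntimeError because the physical devices are already initialized.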

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Nov 18, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Nov 25, 2020
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

@google-ml-butler

Are you satisfied with the resolution of your issue?
