
OP_REQUIRES failed at conv_ops.cc:1276 : Not found: No algorithm worked! #52223

Open
albertz opened this issue Oct 1, 2021 · 1 comment
Labels: 2.6.0 · comp:ops (OPs related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · type:bug (Bug)

Comments


albertz commented Oct 1, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip binary
  • TensorFlow version (use command below): v2.6.0-rc2-32-g919f693420e 2.6.0
  • Python version: 3.8.10
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.4 / 8.2.4.15
  • GPU model and memory: NVIDIA GeForce RTX 2070

Describe the current behavior

I get the following error on GPU:

2021-10-01 23:05:27.951528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6173 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070, pci bus id:0000:09:00.0, compute capability: 7.5
2021-10-01 23:05:28.331213: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8204
2021-10-01 23:05:28.866860: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at conv_ops.cc:1276 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: No algorithm worked!
         [[{{node convolution}}]]
  (1) Not found: No algorithm worked!
         [[{{node convolution}}]]
         [[convolution/_5]]
0 successful operations.
0 derived errors ignored.

Describe the expected behavior

I do expect an error here, but it should be about the wrong filter shape. The exception I actually get is very misleading.

On CPU, the exception looks quite different and is closer to what I would expect.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy

tf.compat.v1.disable_eager_execution()


with tf.Graph().as_default() as graph:
    with tf.compat.v1.Session(graph=graph) as session:
        x = tf.compat.v1.placeholder(tf.float32, (None, None, 1, 40))  # [B,T,1,40]
        filters = tf.compat.v1.placeholder(tf.float32, (3, 3, None, 32))  # [H,W,in_channels,out_channels]
        y = tf.compat.v1.nn.convolution(x, filter=filters, padding="SAME")

        session.run(
            y,
            feed_dict={
                x: numpy.zeros((3, 4, 1, 40)),
                filters: numpy.zeros((3, 3, 1, 32)),  # in_channels is 1 here, but x has 40 channels: the intentional shape mismatch
                })
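
For comparison, here is a minimal sketch of the same snippet with the convolution pinned to the CPU (the only change is the tf.device("/CPU:0") scope); this should surface the shape-related exception mentioned above instead of the cuDNN "No algorithm worked!" error:

import tensorflow as tf
import numpy

tf.compat.v1.disable_eager_execution()


with tf.Graph().as_default() as graph:
    with tf.compat.v1.Session(graph=graph) as session:
        with tf.device("/CPU:0"):  # force the op onto the CPU kernel
            x = tf.compat.v1.placeholder(tf.float32, (None, None, 1, 40))  # [B,T,1,40]
            filters = tf.compat.v1.placeholder(tf.float32, (3, 3, None, 32))
            y = tf.compat.v1.nn.convolution(x, filter=filters, padding="SAME")

        session.run(
            y,
            feed_dict={
                x: numpy.zeros((3, 4, 1, 40)),
                filters: numpy.zeros((3, 3, 1, 32)),  # same intentional mismatch as above
                })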

Other info / logs

This problem was originally reported here: rwth-i6/returnn#703

The same error has also been reported in a couple of other issues.

In many of those cases, it seems to be caused by too little GPU memory. However, that is not the case here, so these are probably not duplicates, although it's not totally clear.

I stumbled upon this problem due to a wrong model checkpoint: the checkpoint loading ignored the different shape of the filter, which is another bug (#52220).
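
For reference, a sketch of the repro with the fed filter depth matching the input channels (assuming the intended filter shape was (3, 3, 40, 32)); given enough GPU memory, this should not hit the cuDNN error at all:

import tensorflow as tf
import numpy

tf.compat.v1.disable_eager_execution()


with tf.Graph().as_default() as graph:
    with tf.compat.v1.Session(graph=graph) as session:
        x = tf.compat.v1.placeholder(tf.float32, (None, None, 1, 40))  # [B,T,1,40]
        filters = tf.compat.v1.placeholder(tf.float32, (3, 3, None, 32))
        y = tf.compat.v1.nn.convolution(x, filter=filters, padding="SAME")

        session.run(
            y,
            feed_dict={
                x: numpy.zeros((3, 4, 1, 40)),
                filters: numpy.zeros((3, 3, 40, 32)),  # in_channels now matches the 40 input channels (assumed intended shape)
                })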


mohantym commented Oct 4, 2021

Hi @sanatmpa1! Could you please take a look at this issue? It is not replicating when the filters shape is set to (3, 3, None, 40) (i.e. even numbers, as the errors indicate). Attaching gists in TF 2.5, 2.6 and 2.7 for reference.

@mohantym mohantym assigned sanatmpa1 and unassigned mohantym Oct 4, 2021
@sanatmpa1 sanatmpa1 assigned sachinprasadhs and unassigned sanatmpa1 Oct 8, 2021
@sachinprasadhs sachinprasadhs added the stat:awaiting response (Status - Awaiting response from author) and stat:awaiting tensorflower (Status - Awaiting response from tensorflower) labels, and removed the stat:awaiting response (Status - Awaiting response from author) label Oct 13, 2021