GPU MaxPool gradient ops do not yet have a deterministic XLA implementation #69417

Open · wx0608 opened this issue Jun 8, 2024 · 13 comments

Labels: comp:ops (OPs related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.16 · type:bug (Bug)

Comments

@wx0608

wx0608 commented Jun 8, 2024

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

tf 2.16

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

No response

Python version

3.9.19

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4/8.9.7.29

GPU model and memory

NVIDIA GeForce RTX 3090

Current behavior?

When TF op determinism was enabled, a runtime UnimplementedError was thrown at the MaxPooling2D() layer during training.

Standalone code to reproduce the issue

When TF op determinism was enabled, a runtime exception was thrown at MaxPooling2D(); a hypothetical minimal sketch is given below.
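
A minimal sketch (not the reporter's original code; it assumes XLA is forced via jit_compile=True on a GPU build, consistent with the tf2xla failure in the log below) that should hit the same UnimplementedError:

import numpy as np
import tensorflow as tf

# With op determinism enabled, TF raises UnimplementedError for ops that lack a deterministic kernel.
tf.keras.utils.set_random_seed(123)
tf.config.experimental.enable_op_determinism()

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),   # its gradient (MaxPoolGrad) has no deterministic XLA kernel
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# jit_compile=True forces the XLA (tf2xla) path for the train step.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", jit_compile=True)

x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(64,))
model.fit(x, y, batch_size=16, epochs=1)   # expected on GPU: UnimplementedError from MaxPoolGrad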

Relevant log output

Traceback (most recent call last):
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3dda39ff370e>", line 1, in <module>
    runfile('/mnt/projects/Projects/Test_Classification/train_model.py', wdir='/mnt/projects/Projects/Test_Classification')
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mnt/projects/Projects/Test_Classification/train_model.py", line 956, in <module>
    history = model.fit(x_train, y_train,
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad defined at (most recent call last):
<stack traces unavailable>
GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
	 [[{{node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad}}]]
	tf2xla conversion failed while converting __inference_one_step_on_data_13588[]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions.
	 [[StatefulPartitionedCall]] [Op:__inference_one_step_on_iterator_14045]
@google-ml-butler google-ml-butler bot added the type:feature Feature requests label Jun 8, 2024
@sushreebarsa sushreebarsa added comp:ops OPs related issues TF 2.16 labels Jun 10, 2024
@tilakrayal
Contributor

@wx0608,
Could you please share reproducible code or a Colab gist that supports your statement, so that the issue can be easily understood? Thank you!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jun 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 18, 2024
@Masumekeshavarzi

I am using DeepLabV3+ with a ResNet backbone and I am getting the same error when I try to get reproducible results. I also set these lines:
SEED=123
tf.keras.utils.set_random_seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
tf.random.set_seed(SEED)
np.random.seed(SEED)

os.environ['TF_DETERMINISTIC_OPS'] = '1'
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
tf.config.experimental.enable_op_determinism()

From the TF_CUDNN_DETERMINISTIC documentation I realized that, when set to 'true' or '1', it selects deterministic gradient algorithms for tf.nn.max_pool*d and tf.keras.layers.MaxPool*D,
but I still get this error: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation

Earlier I had the same problem with UpSampling layers, where the fix was to use bilinear interpolation instead of nearest-neighbor, but with MaxPooling layers I don't know what should be changed. Please leave any message that might be helpful.

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jun 20, 2024
@Masumekeshavarzi

@wx0608 any findings?

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jun 20, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 28, 2024
@deeepwin

deeepwin commented Jul 3, 2024

Same here. I guess this is expected, as the enable_op_determinism() documentation states:

Certain ops will raise an `UnimplementedError` because they do not yet have a
  deterministic implementation. Additionally, due to bugs, some ops might be
  nondeterministic and not raise an `UnimplementedError`. If you encounter such
  ops, please [file an issue](https://github.com/tensorflow/tensorflow/issues).

Can you please implement a deterministic MaxPool2D gradient?

The AveragePooling2D layer does not have this problem. My setting:

tf.config.experimental.enable_op_determinism()

which I use to make the code deterministic, causes:

tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
         [[{{node gradient_tape/test-Actor/enc_max_2/MaxPool/MaxPoolGrad}}]] [Op:__inference__train_10443]

using this TensorFlow code:

        # conv_block(), KL (Keras layers alias) and self._unique_name() are defined elsewhere in the model code.
        def encoder_block(x, filters, pool_size):
            x = conv_block(x, filters)
            p = KL.MaxPooling2D(pool_size, name=self._unique_name('enc_max'))(x)
            return x, p

tensorflow: 2.11.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__9_20:32:38_PDT_2021
Cuda compilation tools, release 11.3, V11.3.122
Build cuda_11.3.r11.3/compiler.30059648_0

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jul 3, 2024
@deeepwin

deeepwin commented Jul 3, 2024

It seems that when XLA is disabled, the error disappears for MaxPooling2D:

@tf.function(jit_compile=False)  # do not XLA-compile the train step
def _train():
    ...
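
For reference, a Keras Model.fit equivalent of this workaround would be a sketch along these lines (assuming `model` is an already-built Keras model; note the error message above only refers to the XLA implementation of the MaxPool gradient):

import tensorflow as tf

tf.config.experimental.enable_op_determinism()

# jit_compile=False keeps the train step off the XLA (tf2xla) path,
# so the regular GPU MaxPoolGrad kernel is used instead.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    jit_compile=False,
)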

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jul 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jul 18, 2024
@tilakrayal tilakrayal added type:bug Bug and removed type:feature Feature requests labels Jul 24, 2024
@github-actions github-actions bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jul 25, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 8, 2024
@flacle

flacle commented Aug 15, 2024

Hi. If possible, please also consider a deterministic XLA implementation for MaxPool1D. Let me know in case a separate issue is needed, here or in another repo.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 15, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@tilakrayal
Contributor

@wx0608,
Could you please check whether this issue is resolved with the latest TensorFlow v2.17, which contains Keras 3? As Keras 3 is now built to support multiple backends, there are some changes in its design.

Thank you!

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@deeepwin

Tested with Keras 3.0 and TensorFlow 2.17. Still the same error:

GPU MaxPool gradient ops do not yet have a deterministic XLA implementation. [[{{node gradient_tape/test-5321-Actor_1/enc_max_26_1/MaxPool2d/MaxPoolGrad}}]]

For me MaxPool1D is not an option.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 18, 2024
@summa-code

I am having the same issue with MaxPool1D

@steveepreston

steveepreston commented Aug 28, 2024

This issue still exists.
How can it be solved?

@tilakrayal tilakrayal added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 28, 2024