GPU MaxPool gradient ops do not yet have a deterministic XLA implementation #69417

Open · wx0608 opened this issue Jun 8, 2024 · 13 comments

Labels: comp:ops (OPs related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.16 · type:bug (Bug)

Comments

@wx0608

wx0608 commented Jun 8, 2024

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

tf 2.16

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

No response

Python version

3.9.19

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4/8.9.7.29

GPU model and memory

NVIDIA GeForce RTX 3090

Current behavior?

When TF op determinism was enabled, a runtime UnimplementedError was thrown at the MaxPooling2D() layer during training.

Standalone code to reproduce the issue

When TF op determinism was enabled, a runtime exception was thrown at MaxPooling2D(); a hypothetical minimal sketch is given below.
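
A minimal sketch (not the reporter's original code; it assumes XLA is forced via jit_compile=True on a GPU build, consistent with the tf2xla failure in the log below) that should hit the same UnimplementedError:

import numpy as np
import tensorflow as tf

# With op determinism enabled, TF raises UnimplementedError for ops that lack a deterministic kernel.
tf.keras.utils.set_random_seed(123)
tf.config.experimental.enable_op_determinism()

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),   # its gradient (MaxPoolGrad) has no deterministic XLA kernel
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# jit_compile=True forces the XLA (tf2xla) path for the train step.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", jit_compile=True)

x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(64,))
model.fit(x, y, batch_size=16, epochs=1)   # expected on GPU: UnimplementedError from MaxPoolGrad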

Relevant log output

Traceback (most recent call last):
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3dda39ff370e>", line 1, in <module>
    runfile('/mnt/projects/Projects/Test_Classification/train_model.py', wdir='/mnt/projects/Projects/Test_Classification')
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mnt/projects/Projects/Test_Classification/train_model.py", line 956, in <module>
    history = model.fit(x_train, y_train,
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad defined at (most recent call last):
<stack traces unavailable>
GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
	 [[{{node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad}}]]
	tf2xla conversion failed while converting __inference_one_step_on_data_13588[]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions.
	 [[StatefulPartitionedCall]] [Op:__inference_one_step_on_iterator_14045]
@google-ml-butler google-ml-butler bot added the type:feature Feature requests label Jun 8, 2024
@sushreebarsa sushreebarsa added comp:ops OPs related issues TF 2.16 labels Jun 10, 2024
@tilakrayal
Contributor

@wx0608,
Could you please share reproducible code or a Colab gist that supports your statement, so that the issue can be easily understood? Thank you!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jun 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 18, 2024
@Masumekeshavarzi

I am using DeepLabV3+ with a ResNet backbone and I am getting the same error when I try to get reproducible results. I also set these lines:
SEED=123
tf.keras.utils.set_random_seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
tf.random.set_seed(SEED)
np.random.seed(SEED)

os.environ['TF_DETERMINISTIC_OPS'] = '1'
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
tf.config.experimental.enable_op_determinism()

From the TF_CUDNN_DETERMINISTIC documentation I realized that, when set to 'true' or '1', it selects deterministic gradient algorithms for tf.nn.max_pool*d and tf.keras.layers.MaxPool*D,
but I still get this error: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation

Earlier I had the same problem with UpSampling layers, where the fix was to use bilinear interpolation instead of nearest-neighbor, but with MaxPooling layers I don't know what should be changed. Please leave any message that might be helpful.

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jun 20, 2024
@Masumekeshavarzi

@wx0608 any findings?

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jun 20, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 28, 2024
@deeepwin

deeepwin commented Jul 3, 2024

Same here. I guess this is expected, as the enable_op_determinism() documentation states:

Certain ops will raise an `UnimplementedError` because they do not yet have a
  deterministic implementation. Additionally, due to bugs, some ops might be
  nondeterministic and not raise an `UnimplementedError`. If you encounter such
  ops, please [file an issue](https://github.com/tensorflow/tensorflow/issues).

Can you please implement a deterministic MaxPool2D gradient?

The AveragePooling2D layer does not have this problem. My setting:

tf.config.experimental.enable_op_determinism()

which I use to make the code deterministic, causes:

tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
         [[{{node gradient_tape/test-Actor/enc_max_2/MaxPool/MaxPoolGrad}}]] [Op:__inference__train_10443]

using this TensorFlow code:

        # conv_block(), KL (Keras layers alias) and self._unique_name() are defined elsewhere in the model code.
        def encoder_block(x, filters, pool_size):
            x = conv_block(x, filters)
            p = KL.MaxPooling2D(pool_size, name=self._unique_name('enc_max'))(x)
            return x, p

tensorflow: 2.11.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__9_20:32:38_PDT_2021
Cuda compilation tools, release 11.3, V11.3.122
Build cuda_11.3.r11.3/compiler.30059648_0

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jul 3, 2024
@deeepwin

deeepwin commented Jul 3, 2024

It seems that when XLA is disabled, the error disappears for MaxPooling2D:

@tf.function(jit_compile=False)  # do not XLA-compile the train step
def _train():
    ...
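
For reference, a Keras Model.fit equivalent of this workaround would be a sketch along these lines (assuming `model` is an already-built Keras model; note the error message above only refers to the XLA implementation of the MaxPool gradient):

import tensorflow as tf

tf.config.experimental.enable_op_determinism()

# jit_compile=False keeps the train step off the XLA (tf2xla) path,
# so the regular GPU MaxPoolGrad kernel is used instead.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    jit_compile=False,
)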

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Jul 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jul 18, 2024
@tilakrayal tilakrayal added type:bug Bug and removed type:feature Feature requests labels Jul 24, 2024
@github-actions github-actions bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Jul 25, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 8, 2024
@flacle

flacle commented Aug 15, 2024

Hi. If possible, please also consider a deterministic XLA implementation for MaxPool1D. Let me know in case a separate issue is needed, here or in another repo.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 15, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@tilakrayal
Contributor

@wx0608,
Could you please check whether this issue is resolved with the latest TensorFlow v2.17, which contains Keras 3? As Keras 3 is now built to support multiple backends, there are some changes in its design.

Thank you!

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Aug 16, 2024
@deeepwin

Tested with Keras 3.0 and TensorFlow 2.17. Still the same error:

GPU MaxPool gradient ops do not yet have a deterministic XLA implementation. [[{{node gradient_tape/test-5321-Actor_1/enc_max_26_1/MaxPool2d/MaxPoolGrad}}]]

For me MaxPool1D is not an option.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 18, 2024
@summa-code

I am having the same issue with MaxPool1D

@steveepreston

steveepreston commented Aug 28, 2024

This issue still exists.
How can it be solved?

@tilakrayal tilakrayal added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 28, 2024