Stride-1 tf.nn.conv2d with XLA is 1.5x slower than without XLA, as is stride-2 tf.nn.depthwise_conv2d #60312

shkarupa-alex · 2023-04-13T10:01:27Z

Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

2.12.0, 2.13.0-dev20230412

Custom Code

Yes

OS Platform and Distribution

Google Colab

Mobile device

No response

Python version

Google Colab

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

11.8

GPU model and memory

No response

Current Behaviour?

See the example below to reproduce. Here are the speed test results (lower is faster):

stride-1 conv 40.66
stride-1 conv_jit 64.11 // conv2d is slower with JIT but only if stride=1
stride-2 conv 40.18
stride-2 conv_jit 28.05 // when stride=2 it is FASTER with JIT

stride-1 dwconv 9.82
stride-1 dwconv_jit 5.72 // dwconv is faster with JIT but only if stride=1
stride-2 dwconv 2.59
stride-2 dwconv_jit 4.2 // when stride=2 it is SLOWER with JIT
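
To make the "1.5x" figure in the title concrete, the JIT/no-JIT ratios can be computed directly from the numbers above. This is only a small arithmetic sketch: the dictionary layout and the helper name `jit_slowdown` are illustrative, and the timings are copied as reported, without assuming a unit.

```python
# Timings copied from the results above as (without JIT, with JIT).
# Lower is faster; the unit is whatever the original benchmark reported.
timings = {
    ("conv", 1): (40.66, 64.11),
    ("conv", 2): (40.18, 28.05),
    ("dwconv", 1): (9.82, 5.72),
    ("dwconv", 2): (2.59, 4.20),
}


def jit_slowdown(no_jit, jit):
    """Return jit / no_jit: above 1 means XLA is slower, below 1 means faster."""
    return jit / no_jit


ratios = {key: round(jit_slowdown(*pair), 2) for key, pair in timings.items()}

for (op, stride), ratio in ratios.items():
    verdict = "slower" if ratio > 1 else "faster"
    print(f"stride-{stride} {op}: JIT/no-JIT = {ratio:.2f} ({verdict} with JIT)")
```

With these inputs, the stride-1 conv and stride-2 dwconv cases come out at roughly 1.6x slower with JIT, while the other two cases are faster with JIT.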

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1zqqPVVKt4ILRA1rCoWjB1uOtB3D0hDc-?usp=sharing
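
The notebook itself is not reproduced in this thread. For readers who cannot open it, a benchmark of this kind might look roughly like the sketch below. Everything specific here is an assumption and not taken from the notebook: the tensor shapes, the 3x3 kernels, "SAME" padding, the repeat count, and the helper names `benchmark` and `run`. Only the comparison itself (the same op wrapped in `tf.function` with and without `jit_compile=True`, at stride 1 and stride 2) follows the report.

```python
import time

import tensorflow as tf


def benchmark(fn, *args, repeats=20):
    """Return mean wall-clock milliseconds per call, after one warm-up call."""
    fn(*args).numpy()  # warm-up: triggers tracing and, with JIT, XLA compilation
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    out.numpy()  # wait until the last queued result is ready before stopping the clock
    return (time.perf_counter() - start) / repeats * 1e3


def run(batch=8, size=64, channels=32, repeats=20):
    """Time conv2d and depthwise_conv2d with and without XLA (assumed shapes)."""
    x = tf.random.normal([batch, size, size, channels])
    conv_kernel = tf.random.normal([3, 3, channels, channels])
    dw_kernel = tf.random.normal([3, 3, channels, 1])
    results = {}
    for stride in (1, 2):
        strides = [1, stride, stride, 1]
        ops = {
            "conv": lambda t, s=strides: tf.nn.conv2d(t, conv_kernel, s, "SAME"),
            "dwconv": lambda t, s=strides: tf.nn.depthwise_conv2d(
                t, dw_kernel, s, "SAME"
            ),
        }
        for name, op in ops.items():
            for jit in (False, True):
                fn = tf.function(op, jit_compile=jit)
                label = f"stride-{stride} {name}{'_jit' if jit else ''}"
                results[label] = benchmark(fn, x, repeats=repeats)
    return results


if __name__ == "__main__":
    for label, ms in run().items():
        print(label, round(ms, 2))
```

Absolute numbers from such a sketch will differ from the ones reported here, since they depend on the GPU, the shapes, and the cuDNN version; only the relative JIT versus no-JIT ordering is of interest.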

Relevant log output

No response

SuryanarayanaY · 2023-04-27T09:54:18Z

Hi @shkarupa-alex ,

I am not sure whether this is a good way to compare XLA performance, but I increased the number of repeats to 1000 and found similar results. Please refer to the attached gist.

stride-1 conv 44.51
stride-1 conv_jit 64.96
stride-1 sepconv 19.95
stride-1 sepconv_jit 17.47
stride-1 dwconv 9.0
stride-1 dwconv_jit 5.89

stride-2 conv 32.42
stride-2 conv_jit 28.98
stride-2 sepconv 5.18
stride-2 sepconv_jit 6.97
stride-2 dwconv 2.58
stride-2 dwconv_jit 4.29

cheshire · 2023-05-11T10:07:14Z

@shkarupa-alex @SuryanarayanaY Thanks for the bug report! Would you be interested in pursuing this further? For example, when you look at the runs under the nsys systems profiler, are the kernel names different? A next step would be to look into conv_algorithm_picker, enable its logs, and check why XLA:GPU chooses a different cuDNN kernel than TF does.

google-ml-butler bot added the type:bug Bug label Apr 13, 2023

google-ml-butler bot assigned sushreebarsa Apr 13, 2023

sushreebarsa added TF 2.12 For issues related to Tensorflow 2.12 comp:xla XLA labels Apr 13, 2023

sushreebarsa assigned SuryanarayanaY and unassigned sushreebarsa Apr 19, 2023

SuryanarayanaY added the type:performance Performance Issue label Apr 20, 2023

SuryanarayanaY assigned sachinprasadhs and unassigned SuryanarayanaY May 5, 2023

sachinprasadhs assigned cheshire May 10, 2023

sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 10, 2023
