Stride-1 tf.nn.conv2d with XLA is 1.5x slower than without XLA, as is stride-2 tf.nn.depthwise_conv2d #60312

Open
shkarupa-alex opened this issue Apr 13, 2023 · 2 comments
Labels
comp:xla XLA stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.12 For issues related to Tensorflow 2.12 type:bug Bug type:performance Performance Issue

shkarupa-alex commented Apr 13, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

2.12.0, 2.13.0-dev20230412

Custom Code

Yes

OS Platform and Distribution

Google Colab

Mobile device

No response

Python version

Google Colab

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

11.8

GPU model and memory

No response

Current Behaviour?

See the example below to reproduce. Here are the speed test results:

stride-1 conv 40.66
stride-1 conv_jit 64.11 // conv2d is slower with JIT but only if stride=1
stride-2 conv 40.18
stride-2 conv_jit 28.05 // when stride=2 it is FASTER with JIT

stride-1 dwconv 9.82
stride-1 dwconv_jit 5.72 // dwconv is faster with JIT but only if stride=1
stride-2 dwconv 2.59
stride-2 dwconv_jit 4.2 // when stride=2 it is SLOWER with JIT
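To make the size of each slowdown or speedup explicit, the timings above can be turned into jit/plain ratios. This is only a small arithmetic sketch over the numbers already posted; the dictionary layout and the helper name `jit_ratio` are illustrative, not part of the original benchmark.

```python
# Timings copied from the results above (units as reported by the benchmark).
# Each entry maps (op, stride) to (time without JIT, time with JIT).
timings = {
    ("conv", 1): (40.66, 64.11),
    ("conv", 2): (40.18, 28.05),
    ("dwconv", 1): (9.82, 5.72),
    ("dwconv", 2): (2.59, 4.2),
}


def jit_ratio(plain, jit):
    """Return jit time divided by plain time; values above 1 mean XLA is slower."""
    return jit / plain


ratios = {key: round(jit_ratio(*pair), 2) for key, pair in timings.items()}

for (op, stride), ratio in ratios.items():
    verdict = "slower" if ratio > 1 else "faster"
    print(f"stride-{stride} {op}: jit/plain = {ratio:.2f} ({verdict} with XLA)")
```

With these numbers, stride-1 conv comes out at roughly 1.58x and stride-2 dwconv at roughly 1.62x, which is where the "1.5x slower" in the title comes from.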

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1zqqPVVKt4ILRA1rCoWjB1uOtB3D0hDc-?usp=sharing
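For readers who cannot open the notebook, a benchmark of this kind might look roughly like the sketch below. It is not the code from the Colab: the tensor shapes, the repeat count, and the helper names `time_ms`, `build_cases`, and `run_all` are all assumptions made for illustration. It compares `tf.function` with and without `jit_compile=True` for `tf.nn.conv2d` and `tf.nn.depthwise_conv2d`.

```python
import time


def time_ms(fn, repeats=100, warmup=3):
    """Average wall-clock milliseconds per call of fn over `repeats` runs."""
    for _ in range(warmup):  # warm-up calls also absorb tracing / XLA compilation
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats


def build_cases(stride, channels=64, size=128, batch=8):
    """Build plain and jit-compiled callables (all shapes here are hypothetical)."""
    import tensorflow as tf  # imported lazily so time_ms stays usable without TF

    x = tf.random.normal([batch, size, size, channels])
    conv_kernel = tf.random.normal([3, 3, channels, channels])
    dw_kernel = tf.random.normal([3, 3, channels, 1])
    strides = [1, stride, stride, 1]

    def conv(inputs):
        return tf.nn.conv2d(inputs, conv_kernel, strides=strides, padding="SAME")

    def dwconv(inputs):
        return tf.nn.depthwise_conv2d(inputs, dw_kernel, strides=strides, padding="SAME")

    cases = {}
    for name, op in (("conv", conv), ("dwconv", dwconv)):
        plain = tf.function(op)
        jitted = tf.function(op, jit_compile=True)
        # .numpy() copies the result to the host, so the GPU work has finished
        # before the timer stops (the copy itself is included in the timing).
        cases[name] = lambda f=plain: f(x).numpy()
        cases[name + "_jit"] = lambda f=jitted: f(x).numpy()
    return cases


def run_all(repeats=100):
    """Print one line per case in the same layout as the results above."""
    for stride in (1, 2):
        for name, fn in build_cases(stride).items():
            print(f"stride-{stride} {name} {time_ms(fn, repeats):.2f}")
```

Calling `run_all()` on a machine with TensorFlow and a GPU prints one timing per case. Absolute numbers will differ from those reported in this issue, since the shapes and timing method here are guesses.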

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:bug Bug label Apr 13, 2023
@sushreebarsa sushreebarsa added TF 2.12 For issues related to Tensorflow 2.12 comp:xla XLA labels Apr 13, 2023
@SuryanarayanaY SuryanarayanaY added the type:performance Performance Issue label Apr 20, 2023
@SuryanarayanaY
Collaborator

Hi @shkarupa-alex ,

I am not sure whether this is a good way to compare XLA performance. I increased the repeats to 1000 and found similar results. Please refer to the attached gist.

stride-1 conv 44.51
stride-1 conv_jit 64.96
stride-1 sepconv 19.95
stride-1 sepconv_jit 17.47
stride-1 dwconv 9.0
stride-1 dwconv_jit 5.89

stride-2 conv 32.42
stride-2 conv_jit 28.98
stride-2 sepconv 5.18
stride-2 sepconv_jit 6.97
stride-2 dwconv 2.58
stride-2 dwconv_jit 4.29

@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 10, 2023
@cheshire
Member

@shkarupa-alex @SuryanarayanaY Thanks for the bug report! Would you be interested in pursuing this further? For example, when profiled with nsys (Nsight Systems), are the kernel names different? A next step would be to look into conv_algorithm_picker, enable logging, and check why XLA:GPU picks a different cuDNN kernel than TF does.
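One possible way to follow up on the suggestion above is sketched here: run the benchmark script under `nsys profile` with per-module verbose logging switched on through `TF_CPP_VMODULE`. The module name `conv_algorithm_picker`, the log level 2, the output name, and the helper names are assumptions; the source file that `TF_CPP_VMODULE` must match may be named differently depending on the TensorFlow/XLA version.

```python
import os
import subprocess
import sys


def profiling_env(base=None, vmodule="conv_algorithm_picker=2"):
    """Return a copy of the environment with verbose logging for one module."""
    env = dict(os.environ if base is None else base)
    # TF_CPP_VMODULE sets per-source-file VLOG levels ("file=level").
    # The module name used here is an assumption and may need adjusting.
    env["TF_CPP_VMODULE"] = vmodule
    # VLOG output is emitted at INFO severity, so keep INFO messages enabled.
    env["TF_CPP_MIN_LOG_LEVEL"] = "0"
    return env


def nsys_command(script, output="conv_profile"):
    """Build an Nsight Systems command line that profiles a Python script."""
    return ["nsys", "profile", "-o", output, sys.executable, script]


def run_profile(script):
    """Run the script under nsys with the logging environment applied."""
    return subprocess.run(nsys_command(script), env=profiling_env(), check=True)
```

Running `run_profile("bench.py")` (with a hypothetical `bench.py`) once for the jit-compiled case and once for the plain case would give two reports whose kernel names can be compared, alongside whatever the algorithm picker logs to stderr.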
