
[webgpu]: optimize pool operators #24598


Merged (1 commit, May 6, 2025)

Conversation

@xhcao (Contributor) commented Apr 30, 2025

Description

The patch optimizes pool operators when the output size is small and the kernel size is big.

Motivation and Context

The patch addresses a case found in a user's model where pooling has a small output and a large kernel.
@xhcao (Contributor, Author) commented Apr 30, 2025

The issue comes from a user's model. If the input data shape is [1, 64, 128, 128] (NCHW), the original code uses one work group for all 64 output elements. The kernel size is 128 × 128, so each output element requires a 128 × 128 = 16384-iteration loop.
With this patch, when the output size is small, one work group is responsible for a single output element, so each invocation only loops about 128 times.
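For illustration, the workgroup-per-output strategy described above can be sketched in Python (this is a hypothetical model of the reduction, not the actual WGSL shader; the workgroup size of 128 threads is an assumption):

```python
# Hypothetical model of the shader strategy: one "workgroup" of 128
# threads handles a single output element. Each thread sums a strided
# slice of the kernel window, then a tree reduction combines the
# partial sums, mimicking a shared-memory reduction.

WORKGROUP_SIZE = 128  # assumed thread count per workgroup


def pooled_sum(window):
    """Sum `window` (a flat list of kernel-window values) the way a
    workgroup-per-output reduction would."""
    # Phase 1: thread t accumulates elements t, t+128, t+256, ...
    # For a 128*128 window, each thread loops only 128 times.
    partial = [sum(window[t::WORKGROUP_SIZE]) for t in range(WORKGROUP_SIZE)]

    # Phase 2: log2(128) = 7 tree-reduction steps over the partial sums.
    stride = WORKGROUP_SIZE // 2
    while stride > 0:
        for t in range(stride):
            partial[t] += partial[t + stride]
        stride //= 2
    return partial[0]


window = list(range(128 * 128))  # one 128x128 kernel window
assert pooled_sum(window) == sum(window)
```

Dividing the result by the window size would give the mean, which is why the same reduction serves AveragePool-style operators as well.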
@jchen10 PTAL

@guschmue added the ep:WebGPU ort-web webgpu provider label Apr 30, 2025
@guschmue (Contributor) commented May 6, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@xhcao (Contributor, Author) commented May 6, 2025

#23614

@guschmue (Contributor) commented May 6, 2025

This CI error is unrelated to this PR; a fix is incoming.

@guschmue merged commit cdff2c1 into microsoft:main May 6, 2025
81 checks passed
@vadimkantorov commented May 9, 2025

Does this PR also introduce a shortcut when the input size is equal to the kernel size, i.e. the same code path as simply taking a mean?

@xhcao (Contributor, Author) commented May 12, 2025

> does this PR also introduce a shortcut when the input size is equal to kernel size - same code path as simply taking a mean?

Hi @vadimkantorov, the path is taken when

`bool are_small_output_big_kernel = output_size <= 128 && kernel_size >= 128;`

holds, i.e. when the kernel size is large and the output size is small.

> same code path as simply taking a mean?

Do you mean AveragePool and GlobalAveragePool? The path also applies to these operators.
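To make the condition concrete, here is a hypothetical Python helper mirroring the check quoted above (the real check lives in the C++ WebGPU pool implementation), applied to the [1, 64, 128, 128] example from earlier in the thread:

```python
# Hypothetical helper mirroring the quoted condition; the 128/128
# thresholds are taken directly from the PR's check.
def are_small_output_big_kernel(output_size: int, kernel_size: int) -> bool:
    return output_size <= 128 and kernel_size >= 128


# Input [1, 64, 128, 128] (NCHW) under a global pool:
# output has 1 * 64 * 1 * 1 = 64 elements, kernel window is 128 * 128.
assert are_small_output_big_kernel(64, 128 * 128)
```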

@vadimkantorov commented May 12, 2025

Yes, I mean global pooling, where the output size is strictly 1x1, while the input size (matching kernel_size, as in #23614) can be 100 or anything else.

I guess output_size = 1, input_size = kernel_size = 100 would not pass this check, right?

@xhcao (Contributor, Author) commented May 12, 2025

> I guess output_size = 1, input_size = kernel_size = 100 would not pass this check, right?

In #23614, the input shape is [1, 56, 80, 128] and the kernel shape is [80, 128], so the output size is 56 (≤ 128) and the kernel size is 80 * 128 = 10240 (≥ 128), which passes this check.
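The shape arithmetic in this exchange can be checked directly (a small illustrative snippet; the variable names are ours, not from the PR):

```python
# Shapes from #23614: input [1, 56, 80, 128], kernel [80, 128].
output_size = 1 * 56 * 1 * 1   # 56 output elements
kernel_size = 80 * 128         # 10240 elements per kernel window
assert output_size <= 128 and kernel_size >= 128  # passes the check

# The hypothetical global-pool case raised above: output_size = 1
# with a kernel of only 100 elements fails on the kernel-size side.
assert not (1 <= 128 and 100 >= 128)
```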
