webgpu: Optimize depthwise conv2d #5209

qjia7 · 2021-06-11T07:04:46Z

PERF

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


qjia7 · 2021-06-11T10:44:45Z

@kainino0x @lina128 @jinjingforever Please take a look, thanks.

In our last meeting, I mentioned that depthwiseConv2d on webgpu was almost 2x slower than on webgl on Intel GPUs. With this change, webgpu and webgl perf are close.
hand_detector:

DepthwiseConv2dNative	13.88

becomes

DepthwiseConv2dNative	8.28

This optimization mainly targets a 3x3 filter with a 1x1 stride, which is widely used in all of the models we tested.
For this kind of depthwise conv2d, because the stride is only 1x1, much of the data in each channel tile is accessed repeatedly. If we calculate only one output value per invocation, we need to access 3x3 values in x and 3x3 values in the filter. So to get 4x4 output values, we need (3x3 + 3x3) * (4x4) = 288 memory accesses. However, if we calculate 4x4 values in one invocation, we only need to access (3x6)*4 values in x and (3x3)*4 values in the filter. That takes (3x6 + 3x3) * 4 = 108 accesses, which is less than half of the previous count (108/288 = 0.375).
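
The access counts above can be reproduced with a small sketch. It is only an illustration of the arithmetic, not code from this PR: the function names are hypothetical, and it assumes that the "4x4" tile means 4 horizontally adjacent outputs for each of 4 channels, which is one reading consistent with the (3x6)*4 term.

```typescript
// Hypothetical helpers that count memory reads for a depthwise conv2d.
// They model only the arithmetic from the comment above, not the shader.

/** Reads when every invocation computes a single output value. */
function readsPerOutputStrategy(
    filterH: number, filterW: number, outputs: number): number {
  const inputReads = filterH * filterW;   // 3x3 window of x
  const filterReads = filterH * filterW;  // 3x3 filter taps
  return (inputReads + filterReads) * outputs;
}

/**
 * Reads when one invocation computes `outW` adjacent outputs for each of
 * `channels` channels (assumed meaning of the 4x4 tile).
 */
function readsTiledStrategy(
    filterH: number, filterW: number, outW: number, channels: number,
    stride = 1): number {
  // Adjacent outputs share input columns: 4 outputs at stride 1 with a
  // 3-wide filter span (4 - 1) * 1 + 3 = 6 columns.
  const inputCols = (outW - 1) * stride + filterW;
  const inputReads = filterH * inputCols;  // 3x6 values of x per channel
  const filterReads = filterH * filterW;   // 3x3 filter taps per channel
  return (inputReads + filterReads) * channels;
}

const baseline = readsPerOutputStrategy(3, 3, 4 * 4);
const tiled = readsTiledStrategy(3, 3, 4, 4);
console.log(baseline, tiled, tiled / baseline);
```

With a larger stride the input windows overlap less, so `inputCols` grows and the saving shrinks, which fits the comment's point that the benefit comes from the 1x1 stride.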

lina128

Thank you for the explanation and great perf improvement!

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @jinjingforever and @kainino0x)


google-cla bot added the cla: yes label Jun 11, 2021

qjia7 force-pushed the depthwise_opt branch from 321ab69 to d62ac40 Compare June 11, 2021 10:06

qjia7 requested review from kainino0x, lina128 and jinjingforever June 11, 2021 10:44

lina128 approved these changes Jun 11, 2021


kainino0x approved these changes Jun 14, 2021


qjia7 added 2 commits June 15, 2021 12:17

webgpu: Optimize depthwise conv2d

7258b19

PERF

resolve the rebase issue

467dff6

qjia7 force-pushed the depthwise_opt branch from d62ac40 to 467dff6 Compare June 15, 2021 04:48

qjia7 merged commit 64f3ff1 into tensorflow:master Jun 15, 2021

qjia7 deleted the depthwise_opt branch May 5, 2023 07:22
