
Conversation

Collaborator

@Linchenn Linchenn commented Feb 16, 2023

Originally, we used idyCVal and idyCVal2 (each either 0.0 or 1.0) to indicate whether result.xy and result.zw are valid, so that result.xy * idyCVal and result.zw * idyCVal2 could return the results without checking through if-branches.

If the stride is 1, both idyCVal and idyCVal2 are always 1.0 and the original solution performs well. However, if the stride is 2, only one of idyCVal and idyCVal2 is 1.0 at any time, which means computing either result.xy or result.zw is a waste of time.

As a result, this PR adds if-branches to check idyCVal and idyCVal2 before computing, instead of always computing result.xy * idyCVal and result.zw * idyCVal2. This improves Conv2DBackpropInput ops in ArPortraitDepth by ~40%, and the ArPortraitDepth model improves from 23ms to 18.5ms (benchmarked on a LENOVO P620 2021).
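The shape of the change can be sketched as follows. This is a hypothetical, simplified rendering of the generated GLSL (the real shader in conv_backprop_packed_gpu.ts differs in detail), expressed as the TypeScript template strings tfjs uses to assemble shader source:

```typescript
// Hypothetical sketch of the two code-generation strategies; names and
// conditions are simplified and NOT the actual tfjs-backend-webgl source.

// Before: always compute both halves, masking invalid ones by
// multiplying with the 0.0/1.0 flags.
function maskedSource(outChannels: number): string {
  return `
  for (int d2 = 0; d2 < ${outChannels}; d2 += 2) {
    vec4 wValue = getW(wRPerm, wCPerm, d1, d2);
    result.xy += idyCVal * dot(xValue, wValue.xy);   // wasted work if idyCVal == 0.0
    result.zw += idyCVal2 * dot(xValue, wValue.zw);  // wasted work if idyCVal2 == 0.0
  }`;
}

// After: branch on the flags first, so a stride-2 invocation (where only
// one flag is 1.0) skips the half that would be masked to zero anyway.
function branchedSource(outChannels: number): string {
  return `
  if (idyCVal > 0.5 && idyCVal2 > 0.5) {
    for (int d2 = 0; d2 < ${outChannels}; d2 += 2) {
      // compute both result.xy and result.zw
    }
  } else if (idyCVal > 0.5) {
    for (int d2 = 0; d2 < ${outChannels}; d2 += 2) {
      // compute only result.xy
    }
  } else if (idyCVal2 > 0.5) {
    for (int d2 = 0; d2 < ${outChannels}; d2 += 2) {
      // compute only result.zw
    }
  }`;
}

console.log(branchedSource(64).includes('d2 < 64'));  // prints true
```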

Before the PR:
[benchmark screenshot]

After the PR:
[benchmark screenshot]

Reference #7371.


Collaborator Author

@Linchenn Linchenn left a comment


Reviewable status: 0 of 1 approvals obtained


tfjs-backend-webgl/src/conv_backprop_packed_gpu.ts line 78 at r1 (raw file):

            if (idyCVal && idyCVal2) {
              for (int d2 = 0; d2 < ${convInfo.outChannels}; d2 += 2) {
                vec4 wValue = getW(wRPerm, wCPerm, d1, d2);

Even though the three branches contain the same code, we still should not move the if-branches into the for loop. Otherwise the performance drops (~10% for the Conv2DBackpropInput op), probably because the if-branches would be executed on every iteration (and threads would likely be stalled/synced in each iteration).

Code quote:

              for (int d2 = 0; d2 < ${convInfo.outChannels}; d2 += 2) {
                vec4 wValue = getW(wRPerm, wCPerm, d1, d2);
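The placement distinction described in this comment can be illustrated with simplified, hypothetical GLSL fragments (not the actual shader source), held as TypeScript strings:

```typescript
// Hypothetical simplified fragments; the real generated shader differs.

// Branch hoisted outside the loop: the flag is evaluated once per invocation.
const branchOutside = `
if (idyCVal) {
  for (int d2 = 0; d2 < N; d2 += 2) {
    // accumulate result.xy
  }
}`;

// Branch inside the loop: the flag is re-evaluated every iteration, and
// divergent threads may be stalled/synced on each pass; the comment above
// measured roughly a 10% slowdown for Conv2DBackpropInput with this form.
const branchInside = `
for (int d2 = 0; d2 < N; d2 += 2) {
  if (idyCVal) {
    // accumulate result.xy
  }
}`;

// In the hoisted version the check precedes the loop.
console.log(branchOutside.indexOf('if (') < branchOutside.indexOf('for ('));  // prints true
```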

@Linchenn Linchenn requested review from pyu10055 and qjia7 February 16, 2023 19:42
Collaborator

@pyu10055 pyu10055 left a comment


great work, thanks!

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @qjia7)

Contributor

@qjia7 qjia7 left a comment


LGTM, thanks!

Collaborator Author

Linchenn commented Feb 17, 2023

Benchmark on Samsung Galaxy S22 Ultra:

Before this PR (with packed ConvBackpropInput):
[benchmark screenshot]

After this PR:
ConvBackpropInput op's inference time drops (the top-1 op drops from 5.05 to 4.10)
[benchmark screenshot]

@Linchenn Linchenn merged commit 7921dd5 into tensorflow:master Feb 17, 2023
Contributor

qjia7 commented Feb 20, 2023

I just saw a big regression on Intel devices (both CFL and ADL) with this PR.

Before (CFL): [op and model benchmark screenshots]
After (CFL): [op and model benchmark screenshots]

Before (ADL): [op and model benchmark screenshots]
After (ADL): [op and model benchmark screenshots]

Contributor

qjia7 commented Feb 20, 2023

It's surprising that there is such a big regression on Intel devices, since a similar algorithm works well for WebGPU. It may be a driver bug if it only happens on Intel devices; I can report an issue to our driver team. But it would be good if we could skip this optimization for Intel devices at the TFJS level temporarily.
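A gate like the one suggested could be sketched as below. The helper and its name are hypothetical, not an actual tfjs API; in a browser the renderer string is typically obtained via the standard WEBGL_debug_renderer_info extension:

```typescript
// Hypothetical helper (not real tfjs code): decide whether to use the
// branched Conv2DBackpropInput shader variant based on the GPU renderer
// string. In a browser the string can be read with:
//   const ext = gl.getExtension('WEBGL_debug_renderer_info');
//   const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
function shouldUseBranchedConvBackprop(renderer: string): boolean {
  // Skip the optimization on Intel GPUs until the suspected driver
  // bug is resolved.
  return !renderer.toLowerCase().includes('intel');
}

console.log(shouldUseBranchedConvBackprop(
    'ANGLE (NVIDIA GeForce RTX 3080 Direct3D11)'));       // prints true
console.log(shouldUseBranchedConvBackprop(
    'ANGLE (Intel(R) UHD Graphics 630 Direct3D11)'));      // prints false
```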
