[webgl] Modularized mean kernel with custom shader. #4033

annxingyuan · 2020-10-07T15:27:28Z

Changes

Adds a custom mean kernel to the WebGL backend that performs div --> sum in a single pass, which mitigates precision issues arising from large divisors.
On the DDSP model running on an iPhone 6, this patch brings the final output to within 0.015 of the reference values.
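To illustrate the kind of precision issue described above, here is a minimal sketch that assumes intermediate results are stored as half floats (float16 comes up later in the review). The helper `roundToFloat16` is hypothetical and only approximates float16 rounding; it is not part of tfjs. It shows that for 70000 ones, summing first overflows float16, while dividing first and storing each quotient as float16 introduces a visible error.

```typescript
// Approximate rounding of a JavaScript number to the nearest float16 value.
// Illustrative only: ties round half-up rather than half-to-even, and inputs
// within rounding error of a power of two may pick the wrong exponent.
function roundToFloat16(x: number): number {
  if (!Number.isFinite(x) || x === 0) {
    return x;
  }
  const sign = x < 0 ? -1 : 1;
  const abs = Math.abs(x);
  // 65504 is the largest finite float16; values from 65520 up round to infinity.
  if (abs >= 65520) {
    return sign * Infinity;
  }
  // float16 has 10 explicit mantissa bits; exponents below -14 are subnormal.
  const exponent = Math.max(Math.floor(Math.log2(abs)), -14);
  const spacing = Math.pow(2, exponent - 10);
  return sign * Math.round(abs / spacing) * spacing;
}

const elementCount = 70000;

// Summing first: the total of 70000 ones does not fit in float16.
const sumFirst = roundToFloat16(elementCount);

// Dividing first and storing each quotient as float16: 1 / 70000 lands in the
// subnormal range, so every stored term carries a rounding error.
const storedTerm = roundToFloat16(1 / elementCount);
const divFirstMean = storedTerm * elementCount;

console.log(sumFirst, storedTerm, divFirstMean);
```

Keeping the division and the partial sums inside one shader pass avoids writing the tiny per-element quotients to an intermediate texture, which is one plausible reading of why the fused kernel helps.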

annxingyuan · 2020-10-08T13:33:47Z

Hi @pyu10055 - FYI I've verified this fix with my iPhone on the DDSP demo.

annxingyuan · 2020-10-08T13:34:41Z

Hi @tafsiri - this PR is mostly about a precision issue in WebGL that Ping's been facing, but I was also hoping to get your feedback on the modularization bit.

pyu10055

Thanks for refactoring this. I have a high-level question about the multiple reduction stages; please take a look at the comments below.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):

/**
 * @license
 * Copyright 2017 Google LLC. All Rights Reserved.

2020

tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):

    if (divisor != null) {
      updateSnippet =
          `sumValue += dot(values * ${(1 / divisor).toPrecision(8)}, ones);`;

Why does this need to be limited to a precision of 8?
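For context on this question, here is a minimal sketch of what `toPrecision` does to the reciprocal before it is embedded in the shader source. `buildUpdateSnippet` is a hypothetical helper that mirrors the reviewed lines; the fallback branch without a divisor is an assumption, since the reviewed snippet does not show it.

```typescript
// Hypothetical helper mirroring the snippet under review. `digits` is the
// number of significant digits kept when the reciprocal is written into the
// shader source as a literal.
function buildUpdateSnippet(divisor: number | null, digits: number): string {
  if (divisor != null) {
    const reciprocal = (1 / divisor).toPrecision(digits);
    return `sumValue += dot(values * ${reciprocal}, ones);`;
  }
  // Assumed fallback when no divisor is given; not shown in the review.
  return 'sumValue += dot(values, ones);';
}

// 1 / 4 is padded out to 8 significant digits: "0.25000000".
const exactSnippet = buildUpdateSnippet(4, 8);

// 1 / 70000 is truncated to 8 significant digits in the generated source.
const truncatedSnippet = buildUpdateSnippet(70000, 8);

console.log(exactSnippet);
console.log(truncatedSnippet);
```

The literal in the generated shader therefore carries only as many significant digits as `toPrecision` is asked for, so the argument effectively caps the accuracy of the fused multiplication.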

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

function meanReduce(
    x: TensorInfo, reduceSize: number, backend: MathBackendWebGL): TensorInfo {
  const reductionStages = getReductionStages(x.shape);

Why does this still need multiple reduction stages if the divisor can be passed in to the mean program?
If the calculation is done in float32 within the shader, there should be no float16 precision problem, right?

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

2020

Done

tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Why does this need to be limited to a precision of 8?

Done

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Why does this still need multiple reduction stages if the divisor can be passed in to the mean program?
If the calculation is done in float32 within the shader, there should be no float16 precision problem, right?

Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.
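The multi-pass structure described here can be sketched on plain arrays. This is a CPU-side illustration under stated assumptions, not the WebGL implementation: `reducePass` and `meanByStages` are hypothetical names, and the window size stands in for however many elements one shader invocation reduces.

```typescript
// Sums each window of `windowSize` consecutive values. `scale` is applied to
// every element before it is added, which is where the division is fused.
function reducePass(
    values: number[], windowSize: number, scale: number): number[] {
  const out: number[] = [];
  for (let start = 0; start < values.length; start += windowSize) {
    const end = Math.min(start + windowSize, values.length);
    let sum = 0;
    for (let i = start; i < end; i++) {
      sum += values[i] * scale;
    }
    out.push(sum);
  }
  return out;
}

function meanByStages(values: number[], windowSize: number): number {
  if (windowSize < 2) {
    throw new Error('windowSize must be at least 2');
  }
  if (values.length === 0) {
    return NaN;
  }
  // First pass: multiply by 1 / n while summing each window.
  let result = reducePass(values, windowSize, 1 / values.length);
  // Remaining passes: plain sums until a single value is left.
  while (result.length > 1) {
    result = reducePass(result, windowSize, 1);
  }
  return result[0];
}

const stagedMean = meanByStages([1, 2, 3, 4, 5, 6, 7, 8], 4);
console.log(stagedMean);
```

With eight inputs and a window of four, the first pass produces two partial results and a second pass combines them, which matches the point that fusing the division does not remove the need for later passes.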

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.

I see. Would it be possible to merge mean into ReduceProgram as an operation?

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

I see. Would it be possible to merge mean into ReduceProgram as an operation?

This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?

tafsiri

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):

    result = backend.runWebGLProgram(program, [result], 'float32');

    if (previousResult.dataId !== x.dataId) {

From a modularization perspective, I was just wondering when this would be false? From what I can tell, backend.runWebGLProgram will always make a new dataId/TensorInfo.

tafsiri

Reviewed 1 of 6 files at r1, 1 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):

    if (meanInputIsTransposed) {
      webglBackend.disposeIntermediateTensorInfo(meanInput);

A code style suggestion: how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed points, at some point, to an input tensorInfo)? The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.

An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47), is to make an array of the tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorInfo into that array; then you can dispose it at the end. I found it's also a handy pattern for dealing with multiple conditions under which tensorInfos might get disposed (not the situation you have here).
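Both suggestions can be sketched together. The types below (`TensorInfoLike`, `FakeBackend`) and the function `meanLike` are hypothetical stand-ins, not the real tfjs interfaces; only the method name `disposeIntermediateTensorInfo` is taken from the reviewed code, and the transpose and reduction steps are placeholders.

```typescript
// Hypothetical stand-ins for the backend types; not the real tfjs interfaces.
interface TensorInfoLike {
  dataId: object;
  shape: number[];
}

class FakeBackend {
  disposed: TensorInfoLike[] = [];

  // Placeholder for a transpose kernel: always returns a new tensor info.
  transpose(x: TensorInfoLike): TensorInfoLike {
    return {dataId: {}, shape: x.shape.slice().reverse()};
  }

  disposeIntermediateTensorInfo(t: TensorInfoLike): void {
    this.disposed.push(t);
  }
}

function meanLike(
    x: TensorInfoLike, needsTranspose: boolean,
    backend: FakeBackend): TensorInfoLike {
  // Intermediates are collected here and disposed once, at the end.
  const toDispose: TensorInfoLike[] = [];

  let meanInput = x;
  if (needsTranspose) {
    meanInput = backend.transpose(x);
    toDispose.push(meanInput);
  }

  // Placeholder for the real reduction over meanInput.
  const out: TensorInfoLike = {dataId: {}, shape: [meanInput.shape[0]]};

  for (const t of toDispose) {
    // Guard against ever disposing the caller's input.
    if (t === x) {
      throw new Error('Refusing to dispose an input tensor');
    }
    backend.disposeIntermediateTensorInfo(t);
  }
  return out;
}

const sketchBackend = new FakeBackend();
const sketchInput: TensorInfoLike = {dataId: {}, shape: [2, 3]};
meanLike(sketchInput, true, sketchBackend);
console.log(sketchBackend.disposed.length);
```

Because only newly created tensor infos are ever pushed into the array, the input cannot end up disposed unless a later change pushes it explicitly, and the assertion catches that case.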

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

A code style suggestion: how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed points, at some point, to an input tensorInfo)? The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.

An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47), is to make an array of the tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorInfo into that array; then you can dispose it at the end. I found it's also a handy pattern for dealing with multiple conditions under which tensorInfos might get disposed (not the situation you have here).

Done

great idea!

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?

Done

I refactored this per our offline discussion - let me know if this is what you had in mind.

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

From a modularization perspective, I was just wondering when this would be false? From what I can tell, backend.runWebGLProgram will always make a new dataId/TensorInfo.

That's true, but previousResult is set to the current result before we call runWebGLProgram, so on the first pass through this loop previousResult equals x.
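The loop behavior described in this reply can be sketched as follows. `StageInfo` and `reduceWithCleanup` are hypothetical, the object literal stands in for the result of runWebGLProgram, and the sketch compares object identity where the reviewed code compares dataId.

```typescript
// Hypothetical stand-in for a tensor info: an id plus an element count.
interface StageInfo {
  id: number;
  size: number;
}

function reduceWithCleanup(
    x: StageInfo, windowSize: number):
    {result: StageInfo, disposedIds: number[]} {
  if (windowSize < 2) {
    throw new Error('windowSize must be at least 2');
  }
  let nextId = x.id + 1;
  const disposedIds: number[] = [];
  let result = x;

  while (result.size > 1) {
    // Captured before the program runs, so on the first iteration this is x.
    const previousResult = result;

    // Stand-in for runWebGLProgram: always produces a fresh object.
    result = {
      id: nextId++,
      size: Math.ceil(previousResult.size / windowSize)
    };

    // Only intermediates are cleaned up; the caller's input is left alone.
    if (previousResult !== x) {
      disposedIds.push(previousResult.id);
    }
  }
  return {result, disposedIds};
}

const cleanupRun = reduceWithCleanup({id: 0, size: 64}, 4);
console.log(cleanupRun.result.id, cleanupRun.disposedIds);
```

With 64 elements and a window of four, three passes run; the guard skips disposal only on the first pass, where the previous result is the input itself.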

pyu10055

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I refactored this per our offline discussion - let me know if this is what you had in mind.

Nice, Thanks!

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

import {expectArraysClose, expectArraysEqual} from '../test_util';

describeWithFlags('mean', ALL_ENVS, () => {

Can you add a test that covers a large reduction dimension?

annxingyuan

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Can you add a test that covers a large reduction dimension?

Done

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I meant an input like tf.ones([1, 70000]).

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

I meant an input like tf.ones([1, 70000]).

Will it be too slow for CPU?

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Will it be too slow for CPU?

Done

I moved the test to the webgl backend only.

pyu10055

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I moved the test to the webgl backend only.

Cool, but I did not see the WebGL-backend-only test?

annxingyuan

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)

tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Cool, but I did not see the WebGL-backend-only test?

https://github.com/tensorflow/tfjs/pull/4033/files#diff-c6e4bfd0982ba56ef4fab7f9a7f10183

setup

8a4b50e

google-cla bot added the cla: yes label Oct 7, 2020

annxingyuan added 9 commits October 7, 2020 14:18

mean

3214db2

setupgs

2c6fe3a

fix

b3c03ad

snapshot

1cc989a

simplify

cb03db2

clean

a3c6ab4

add comment

a52eeaa

clean

94cb946

fix

95f61c0

annxingyuan requested a review from pyu10055 October 8, 2020 13:32

annxingyuan marked this pull request as ready for review October 8, 2020 13:32

annxingyuan added 2 commits October 8, 2020 09:32

rename

69f3413

Merge branch 'master' into webgl_mean_2

d3765d6

annxingyuan requested a review from tafsiri October 8, 2020 13:34

annxingyuan changed the title from "[webgl] Custom mean kernel." to "[webgl] Modularized mean kernel with custom shader." Oct 8, 2020

annxingyuan added 7 commits October 8, 2020 10:51

fix

e0aa059

fix

92bdd3c

clean

cdc61c8

embiggen test

f2fce2e

Merge branch 'master' into webgl_mean_2

57b40da

inc prec

6251470

Merge branch 'master' into webgl_mean_2

eecb01f

pyu10055 requested changes Oct 9, 2020

add

5d20810

annxingyuan commented Oct 9, 2020

save

0318868

annxingyuan added 3 commits October 9, 2020 05:56

test

f5112d3

Merge branch 'master' into webgl_mean_2

942ed6c

merge

12fa219

pyu10055 requested changes Oct 9, 2020

annxingyuan commented Oct 9, 2020

tafsiri reviewed Oct 9, 2020

annxingyuan added 2 commits October 12, 2020 09:20

merge

6bc92ce

clean

4eb81c7

annxingyuan commented Oct 12, 2020

pyu10055 approved these changes Oct 12, 2020

mean test

7d842ce

annxingyuan commented Oct 12, 2020

pyu10055 requested changes Oct 12, 2020

add

355a399

annxingyuan commented Oct 12, 2020

pyu10055 requested changes Oct 12, 2020

annxingyuan commented Oct 12, 2020

pyu10055 approved these changes Oct 13, 2020

annxingyuan added 4 commits October 13, 2020 17:08

Merge branch 'master' into webgl_mean_2

e67814e

Merge branch 'master' into webgl_mean_2

c6c27bc

lint

eb5c4c0

Merge branch 'webgl_mean_2' of github.com:tensorflow/tfjs into webgl_…

0faa969

…mean_2

annxingyuan merged commit 8ba34b4 into master Oct 14, 2020

annxingyuan deleted the webgl_mean_2 branch October 14, 2020 12:23
