Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[webgl] Modularized mean kernel with custom shader. #4033

Merged
merged 32 commits into from
Oct 14, 2020
Merged

Conversation

annxingyuan
Copy link
Contributor

@annxingyuan annxingyuan commented Oct 7, 2020

Changes

  • Adds custom mean kernel to WebGL that achieves div --> sum in a single pass, which mitigates precision issues arising from large divisors.
  • On the DDSP model running on an iPhone6, this patch brings the final output to within .015 of the reference values.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@google-cla google-cla bot added the cla: yes label Oct 7, 2020
@annxingyuan annxingyuan marked this pull request as ready for review October 8, 2020 13:32
@annxingyuan
Copy link
Contributor Author

Hi @pyu10055 - FYI I've verified this fix with my iPhone on the DDSP demo.

@annxingyuan
Copy link
Contributor Author

Hi @tafsiri - this PR is mostly about a precision issue in WebGL that Ping's been facing, but was also hoping to get your feedback on the modularization bit.

@annxingyuan annxingyuan changed the title [webgl] Custom mean kernel. [webgl] Modularized mean kernel with custom shader. Oct 8, 2020
Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring this, a high level question on multiple reduction stages, please take a look at the comments below.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):

/**
 * @license
 * Copyright 2017 Google LLC. All Rights Reserved.

2020


tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):

    if (divisor != null) {
      updateSnippet =
          `sumValue += dot(values * ${(1 / divisor).toPrecision(8)}, ones);`;

why it need to limit to precision 8?


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

function meanReduce(
    x: TensorInfo, reduceSize: number, backend: MathBackendWebGL): TensorInfo {
  const reductionStages = getReductionStages(x.shape);

Why this still need multiple reduction stages if the divisor can passed in to the mean program?
If the calculation is done with float32 within the shader, there should be not precision problem of float16, right?

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)


tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

2020

Done


tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

why it need to limit to precision 8?

Done


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Why this still need multiple reduction stages if the divisor can passed in to the mean program?
If the calculation is done with float32 within the shader, there should be not precision problem of float16, right?

Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.

I see, will it be possible to merge mean as an operation for ReduceProgram?

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

I see, will it be possible to merge mean as an operation for ReduceProgram?

This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?

Copy link
Contributor

@tafsiri tafsiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):

    result = backend.runWebGLProgram(program, [result], 'float32');

    if (previousResult.dataId !== x.dataId) {

from a modularization perspective I was just wondering when this would be false? from what i can tell backend.runWebGLProgram will always make a new dataId/TensorInfo.

Copy link
Contributor

@tafsiri tafsiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1, 1 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):

    if (meanInputIsTransposed) {
      webglBackend.disposeIntermediateTensorInfo(meanInput);

A code style suggestion, how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed is at somepoint pointing to an input tensorInfo. The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.

An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47) is to make an array of tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorinfo into that array, then you can dispose it at the end. I found its also a handy pattern for dealing with multiple conditions under which tensorinfos might get disposed (not the situation you have here).

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

A code style suggestion, how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed is at somepoint pointing to an input tensorInfo. The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.

An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47) is to make an array of tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorinfo into that array, then you can dispose it at the end. I found its also a handy pattern for dealing with multiple conditions under which tensorinfos might get disposed (not the situation you have here).

Done

great idea!


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?

Done

I refactored this per our offline discussion - let me know if this is what you had in mind.


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

from a modularization perspective I was just wondering when this would be false? from what i can tell backend.runWebGLProgram will always make a new dataId/TensorInfo.

That's true, but previousResult is assigned to result before we call runWebGLProgram, so the first run through this loop previousResult equals x.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)


tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I refactored this per our offline discussion - let me know if this is what you had in mind.

Nice, Thanks!


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

import {expectArraysClose, expectArraysEqual} from '../test_util';

describeWithFlags('mean', ALL_ENVS, () => {

can you add a test that covers the large reduction dimension?

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add a test that covers the large reduction dimension?

Done

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I meant input as something like tf.ones([1, 70000])

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

I meant input as something like tf.ones([1, 70000])

will it be too slow for cpu?

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

will it be too slow for cpu?

Done

I moved the test to the webgl backend only.

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Done

I moved the test to the webgl backend only.

cool, I did not see the webgl backend only test?

Copy link
Contributor Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)


tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

cool, I did not see the webgl backend only test?

https://github.com/tensorflow/tfjs/pull/4033/files#diff-c6e4bfd0982ba56ef4fab7f9a7f10183

@annxingyuan annxingyuan merged commit 8ba34b4 into master Oct 14, 2020
@annxingyuan annxingyuan deleted the webgl_mean_2 branch October 14, 2020 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants