[webgl] Modularized mean kernel with custom shader. #4033
Conversation
Hi @pyu10055 - FYI I've verified this fix with my iPhone on the DDSP demo.
Hi @tafsiri - this PR is mostly about a precision issue in WebGL that Ping's been facing, but I was also hoping to get your feedback on the modularization bit.
Thanks for refactoring this, a high level question on multiple reduction stages, please take a look at the comments below.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):
/**
 * @license
 * Copyright 2017 Google LLC. All Rights Reserved.
Should be 2020.
tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):
if (divisor != null) {
  updateSnippet = `sumValue += dot(values * ${(1 / divisor).toPrecision(8)}, ones);`;
Why does it need to limit the precision to 8 digits?
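As context for the question above, here is a minimal plain-TypeScript sketch (not the actual mean_gpu.ts code) of how the reciprocal of the divisor gets baked into the generated GLSL source as a literal. The rationale comment is our assumption; the PR itself does not state why 8 digits was chosen.

```typescript
// Hypothetical sketch: the reciprocal is computed once in JS and
// embedded into the shader source as a numeric literal. toPrecision(8)
// keeps the literal compact; 8 significant decimal digits is (we
// assume) meant to roughly match float32 precision in the shader.
function makeUpdateSnippet(divisor: number | null): string {
  if (divisor != null) {
    return `sumValue += dot(values * ${(1 / divisor).toPrecision(8)}, ones);`;
  }
  return 'sumValue += dot(values, ones);';
}

console.log(makeUpdateSnippet(3));
// → sumValue += dot(values * 0.33333333, ones);
```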
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
function meanReduce(
    x: TensorInfo, reduceSize: number,
    backend: MathBackendWebGL): TensorInfo {
  const reductionStages = getReductionStages(x.shape);
Why does this still need multiple reduction stages if the divisor can be passed in to the mean program?
If the calculation is done with float32 within the shader, there should be no float16 precision problem, right?
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)
tfjs-backend-webgl/src/mean_gpu.ts, line 3 at r2 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Should be 2020.
Done
tfjs-backend-webgl/src/mean_gpu.ts, line 36 at r2 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Why does it need to limit the precision to 8 digits?
Done
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Why does this still need multiple reduction stages if the divisor can be passed in to the mean program?
If the calculation is done with float32 within the shader, there should be no float16 precision problem, right?
Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.
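Ann's explanation above can be sketched in plain TypeScript (an illustration of the multi-pass idea only, not the actual WebGL kernel):

```typescript
// Multi-pass reduction: each pass sums fixed-size windows, so the data
// shrinks by the window size per pass until one value remains. As in
// the PR, the divide is fused into the FIRST pass only; later passes
// are plain sums, like the existing sum kernel.
function reduceWindows(values: number[], windowSize: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < values.length; i += windowSize) {
    let sum = 0;
    for (let j = i; j < Math.min(i + windowSize, values.length); j++) {
      sum += values[j];
    }
    out.push(sum);
  }
  return out;
}

function meanByStages(values: number[], windowSize: number): number {
  const divisor = values.length;
  // First pass: scale by 1 / divisor while summing (the fused div).
  let current = reduceWindows(values.map(v => v / divisor), windowSize);
  // Remaining passes: plain sums until a single value is left.
  while (current.length > 1) {
    current = reduceWindows(current, windowSize);
  }
  return current[0];
}

console.log(meanByStages([1, 2, 3, 4], 2)); // → 2.5
```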
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Currently mean goes through a div kernel and then a sum kernel, and the sum kernel requires one or more passes (like other reduction ops - max, min, prod, etc.). This PR fuses div into the first pass of the sum kernel, but we still may need multiple passes in order to finish computing sum.
I see, will it be possible to merge mean as an operation for ReduceProgram?
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
I see, will it be possible to merge mean as an operation for ReduceProgram?
This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):
result = backend.runWebGLProgram(program, [result], 'float32');
if (previousResult.dataId !== x.dataId) {
From a modularization perspective, I was just wondering when this would be false. From what I can tell, backend.runWebGLProgram will always make a new dataId/TensorInfo.
Reviewed 1 of 6 files at r1, 1 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):
if (meanInputIsTransposed) {
  webglBackend.disposeIntermediateTensorInfo(meanInput);
A code style suggestion: how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed at some point points to an input tensorInfo)? The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.
An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47), is to make an array of tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorInfo into that array; then you can dispose them all at the end. I found it's also a handy pattern for dealing with multiple conditions under which tensorInfos might get disposed (not the situation you have here).
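In outline, the "collect then dispose" pattern described above might look like this (hypothetical names; this is a sketch, not the actual Mean.ts code, with reduce and dispose standing in for the backend calls):

```typescript
// Every newly created intermediate TensorInfo is pushed into an array
// and disposed at the end. The input x never enters the array, so it
// can never be disposed by accident, with no per-branch flags needed.
interface TensorInfo { dataId: object; }

function meanKernelSketch(
    x: TensorInfo,
    transposeIfNeeded: ((t: TensorInfo) => TensorInfo) | null,
    reduce: (t: TensorInfo) => TensorInfo,
    dispose: (t: TensorInfo) => void): TensorInfo {
  const intermediates: TensorInfo[] = [];
  let meanInput = x;
  if (transposeIfNeeded != null) {
    meanInput = transposeIfNeeded(meanInput);
    intermediates.push(meanInput);  // track only newly created infos
  }
  const out = reduce(meanInput);
  for (const t of intermediates) {
    dispose(t);  // inputs are never in this array, so they stay alive
  }
  return out;
}
```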
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean.ts, line 77 at r3 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
A code style suggestion: how would you feel about asserting meanInput !== x before disposing meanInput (here and in other places where a variable that is going to be disposed at some point points to an input tensorInfo)? The code you have here is correct, but since we don't generally test for accidentally disposing inputs, an assertion would guard against any change that resulted in meanInput not being reassigned when meanInputIsTransposed is true.
An alternative pattern, which I used here (https://github.com/tensorflow/tfjs/pull/4042/files#diff-e793ca80e577c3a0d5304d61e9cd957aR47), is to make an array of tensorInfos you want to dispose at the end and, inside the meanInputIsTransposed block, push the newly created tensorInfo into that array; then you can dispose them all at the end. I found it's also a handy pattern for dealing with multiple conditions under which tensorInfos might get disposed (not the situation you have here).
Done
great idea!
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
This PR treats mean as a reduction - the new MeanProgram works the same way as ReduceProgram with 'sum' - the only difference is that MeanProgram fuses div in the first pass. Is that what you mean by merge mean as an operation for ReduceProgram?
Done
I refactored this per our offline discussion - let me know if this is what you had in mind.
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 40 at r3 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
From a modularization perspective, I was just wondering when this would be false. From what I can tell, backend.runWebGLProgram will always make a new dataId/TensorInfo.
That's true, but previousResult is assigned to result before we call runWebGLProgram, so on the first pass through the loop previousResult equals x.
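The loop being discussed can be rendered like this (a hypothetical plain-TypeScript sketch; runProgram and dispose stand in for backend.runWebGLProgram and backend.disposeIntermediateTensorInfo, and only the guard logic mirrors the PR):

```typescript
// Each stage produces a fresh TensorInfo; the previous intermediate is
// disposed unless it is still the original input x.
interface TensorInfo { dataId: object; }

function runStages(
    x: TensorInfo, numStages: number,
    runProgram: (input: TensorInfo) => TensorInfo,
    dispose: (t: TensorInfo) => void): TensorInfo {
  let result = x;
  for (let i = 0; i < numStages; i++) {
    const previousResult = result;
    result = runProgram(previousResult);
    // On the first iteration previousResult === x, so the guard keeps
    // the caller's input alive; later intermediates are freed.
    if (previousResult.dataId !== x.dataId) {
      dispose(previousResult);
    }
  }
  return result;
}
```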
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)
tfjs-backend-webgl/src/kernels/Mean_impl.ts, line 27 at r2 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Done
I refactored this per our offline discussion - let me know if this is what you had in mind.
Nice, thanks!
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
import {expectArraysClose, expectArraysEqual} from '../test_util';

describeWithFlags('mean', ALL_ENVS, () => {
Can you add a test that covers a large reduction dimension?
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Can you add a test that covers a large reduction dimension?
Done
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Done
I meant an input like tf.ones([1, 70000]).
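A back-of-the-envelope check (plain TypeScript, not tfjs code) of why an input like tf.ones([1, 70000]) is the interesting case, based on our reading of the float16 precision issue mentioned earlier in the thread:

```typescript
// On devices whose WebGL textures store float16, summing 70000 ones
// BEFORE dividing would exceed float16's largest finite value, while
// the fused divide keeps every partial result at or below 1.
const F16_MAX = 65504;      // largest finite float16 value
const reduceSize = 70000;   // matches the suggested test input
const unfusedPeak = reduceSize * 1;  // sum first, divide later
const fusedPeak = 1;                 // divide fused into the first pass
console.log(unfusedPeak > F16_MAX, fusedPeak <= F16_MAX); // → true true
```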
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
I meant an input like tf.ones([1, 70000]).
Will it be too slow for the CPU backend?
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Will it be too slow for the CPU backend?
Done
I moved the test to the webgl backend only.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Done
I moved the test to the webgl backend only.
Cool, but I did not see the WebGL-backend-only test?
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @pyu10055, and @tafsiri)
tfjs-core/src/ops/mean_test.ts, line 22 at r4 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Cool, but I did not see the WebGL-backend-only test?
https://github.com/tensorflow/tfjs/pull/4033/files#diff-c6e4bfd0982ba56ef4fab7f9a7f10183