Add support for prod reduce op #1279
Conversation
src/kernels/webgl/reduce_gpu.ts
Outdated
@@ -59,6 +64,8 @@ export class ReduceProgram implements GPGPUProgram {
let updateSnippet = `
  if (${reduceType === 'sum'}) {
    sumValue += dot(values, ones);
  } else if (${reduceType === 'prod'}) {
    prodValue *= values[0] * values[1] * values[2] * values[3];
I happened to trace the code here recently. Since the updateSnippet
becomes part of the program code, performing the necessary calculation directly, without the if/else condition, may improve performance, especially because updateSnippet
is inside the for loop.
So the GPGPUProgram would look like this:
for (int i = 0; i < windowSize; i += 4) {  // loop bound fixed: was "i < 0", which never executes
int inIdx = inOffset + i;
vec4 values = vec4(
getValue(batch, inIdx),
getValue(batch, inIdx + 1),
getValue(batch, inIdx + 2),
getValue(batch, inIdx + 3)
);
prodValue *= values[0] * values[1] * values[2] * values[3]; // only the calculation but no if/else
}
@yhwang thanks for the suggestion. Curious: have you measured this improvement?
Thanks for adding this. One blocking comment about the gradient of prod()
Reviewed 7 of 7 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang, @pyu10055, @nsthorat, and @dsmilkov)
src/kernels/backend_cpu.ts, line 501 at r1 (raw file):
this.assertNotComplex(x, 'sum');
axis_util.assertAxesAreInnerMostDims('prod', axes, x.rank);
backends assume the data is validated from above, so you can remove these asserts (I know other methods have them too, for historical reasons, and we can remove those as well).
src/kernels/backend_webgl.ts, line 820 at r1 (raw file):
prod(x: Tensor, axes: number[]): Tensor {
  axis_util.assertAxesAreInnerMostDims('prod', axes, x.rank);
same here, no need for this assert since validation happens in the layer above.
src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
@yhwang thanks for the suggestion. Curious: have you measured this improvement?
Since the condition inside the if statement is constant, the webgl compiler will effectively prune the program so there should be no perf impact on the if statement.
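To make the pruning argument concrete: the `${reduceType === 'sum'}` interpolation is evaluated in TypeScript when the shader source string is built, so the GLSL compiler only ever sees a literal `true` or `false` and can discard the dead branch. Below is a minimal hypothetical sketch of the template (not the actual ReduceProgram code), showing the constant that ends up in the generated source.

```typescript
// Minimal sketch (hypothetical, not the actual ReduceProgram): the template
// literal is evaluated in TypeScript at shader-construction time, so the
// generated GLSL contains a constant condition like `if (false)`, which the
// shader compiler can prune with no runtime cost.
function buildUpdateSnippet(reduceType: 'sum' | 'prod'): string {
  return `
    if (${reduceType === 'sum'}) {
      sumValue += dot(values, ones);
    } else if (${reduceType === 'prod'}) {
      prodValue *= values[0] * values[1] * values[2] * values[3];
    }`;
}
```

For `reduceType === 'prod'`, the sum branch is emitted as `if (false) { ... }` and is dead code after compilation.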
src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):
sumValue += dot(values, ones);
} else if (${reduceType === 'prod'}) {
  prodValue *= values[0] * values[1] * values[2] * values[3];
you might save 1 cycle if you do:
vec2 tmp = vec2(values[0], values[1]) * vec2(values[2], values[3]);
prodValue *= tmp[0] * tmp[1];
src/ops/reduction_ops.ts, line 151 at r1 (raw file):
* is true, the rank of the `Tensor` is reduced by 1 for each entry in `axes`.
* If `keepDims` is true, the reduced dimensions are retained with length 1.
* If axes has no entries, all dimensions are reduced, and a `Tensor` with a
wrap axes in quotes
src/ops/reduction_ops.ts, line 183 at r1 (raw file):
const axes = axis_util.parseAxisParam(axis, $x.shape);
// Use a custom gradient to bypass 2 gradient backprops since sum is used
prod is not used as often, so no need to do custom gradient.
src/ops/reduction_ops.ts, line 206 at r1 (raw file):
});
const expandedDy = dy.reshape(expandedDyShape);
const derX = expandedDy.mul(ones(x.shape, 'float32'));
I may be misunderstanding something, but the gradient of prod is a bit more work than this - you have to divide the product by the input tensor. See https://github.com/tensorflow/tensorflow/blob/a0e76ce73c5f095fc61e06c19ff8e653cfd2965c/tensorflow/python/ops/math_grad.py#L137 for reference.
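To illustrate the reviewer's point, here is a hedged sketch of the prod gradient for a flat 1-D reduction, assuming no zero entries in the input. The helper name `prodGrad` is hypothetical and uses plain arrays rather than the tfjs Tensor API; the real implementation needs a cumprod-style kernel to handle zeros correctly, per the TF reference linked above.

```typescript
// Sketch of the prod gradient for a 1-D reduction, assuming no zero inputs:
//   d(prod(x))/dx_i = prod(x) / x_i, scaled by the upstream gradient dy.
// (Hypothetical helper, not the tfjs API; division by x_i fails when the
// input contains zeros, which is why TF's implementation uses cumprod.)
function prodGrad(x: number[], dy: number): number[] {
  const prod = x.reduce((a, b) => a * b, 1);
  return x.map(xi => (dy * prod) / xi);
}
```

For example, `prodGrad([1, 2, 3], 1)` yields `[6, 3, 2]`: the product 6 divided by each input element.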
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang, @pyu10055, @nsthorat, and @dsmilkov)
src/ops/reduction_ops_test.ts, line 920 at r1 (raw file):
it('gradients: prod(2d)', () => { const a = tf.tensor2d([1, 2, 3, 1, 0, 1], [3, 2]);
make sure not to have 0 when computing gradients since that nulls the gradient.
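A small numeric illustration of why a zero in the test data is problematic: written division-free, the gradient of prod with respect to `x_i` is the product of all the other entries, so a single zero anywhere nulls the gradient at every position except the zero's own. The helper name `prodGradByCofactor` is hypothetical, for illustration only.

```typescript
// Gradient of prod wrt x_i is the product of all other entries
// (division-free form). With a 0 anywhere in the input, every entry's
// gradient except the zero's own position becomes 0, which makes the
// gradient test uninformative. (Hypothetical helper, not the tfjs API.)
function prodGradByCofactor(x: number[]): number[] {
  return x.map((_, i) =>
      x.reduce((acc, xj, j) => (j === i ? acc : acc * xj), 1));
}
```

For example, `prodGradByCofactor([1, 0, 3])` yields `[0, 3, 0]`: only the zero's own position keeps a nonzero gradient.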
Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @nsthorat)
src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Since the condition inside the if statement is constant, the webgl compiler will effectively prune the program so there should be no perf impact on the if statement.
Ah, I see. In that case, the compiler will optimize it out.
… op, since it requires the cumprod op
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @pyu10055, and @nsthorat)
src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):
Previously, yhwang (Yihong Wang) wrote…
Ah, I see. In that case, the compiler will optimize it out.
Done.
src/ops/reduction_ops.ts, line 183 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
prod is not used as often, so no need to do custom gradient.
Removed the custom gradient.
src/ops/reduction_ops.ts, line 206 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
I may be misunderstanding something, but the gradient of prod is a bit more work than this - you have to divide the product by the input tensor. See https://github.com/tensorflow/tensorflow/blob/a0e76ce73c5f095fc61e06c19ff8e653cfd2965c/tensorflow/python/ops/math_grad.py#L137 for reference.
The gradient for prod requires a cumprod op, which we currently don't have; I will submit this op without the gradient function first.
src/ops/reduction_ops_test.ts, line 920 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
make sure not to have 0 when computing gradients since that nulls the gradient.
Removed the gradient function for this op.
Reviewed 6 of 6 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang and @nsthorat)
Description
Implement the reduce_prod op (the equivalent of Python TensorFlow's reduce_prod).
ref tensorflow/tfjs#692
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md