This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

added support prod reduce op #1279

Merged
merged 4 commits into from Oct 1, 2018

Conversation

pyu10055
Collaborator

@pyu10055 pyu10055 commented Sep 17, 2018

Description

implement the prod reduce op (the equivalent of TensorFlow Python's reduce_prod)
ref tensorflow/tfjs#692


For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md


This change is Reviewable

@@ -59,6 +64,8 @@ export class ReduceProgram implements GPGPUProgram {
let updateSnippet = `
if (${reduceType === 'sum'}) {
sumValue += dot(values, ones);
} else if (${reduceType === 'prod'}) {
prodValue *= values[0] * values[1] * values[2] * values[3];
Contributor


I happened to trace the code here recently. Since updateSnippet becomes part of the generated program code, performing the calculation directly, without the if/else condition, may improve performance, especially since updateSnippet is inside the for loop.

so the GPGPUProgram would look like this:

        for (int i = 0; i < ${windowSizeNearestVec4}; i += 4) {
          int inIdx = inOffset + i;
          vec4 values = vec4(
            getValue(batch, inIdx),
            getValue(batch, inIdx + 1),
            getValue(batch, inIdx + 2),
            getValue(batch, inIdx + 3)
          );

          prodValue *= values[0] * values[1] * values[2] * values[3];  // only the calculation, no if/else
        }
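The codegen idea above can be sketched in TypeScript. This is a hypothetical sketch (helper names like `buildUpdateSnippet` and `buildLoop` are illustrative, not the actual tfjs-core source): the branch on `reduceType` runs once at shader-generation time, so the emitted GLSL contains only the single operation it needs inside the loop.

```typescript
// Hypothetical sketch: choose the update line while assembling the program
// text, so no runtime branch is emitted into the GLSL loop body.
function buildUpdateSnippet(reduceType: 'sum' | 'prod'): string {
  return reduceType === 'sum'
      ? 'sumValue += dot(values, ones);'
      : 'prodValue *= values[0] * values[1] * values[2] * values[3];';
}

function buildLoop(
    reduceType: 'sum' | 'prod', windowSizeNearestVec4: number): string {
  // The TypeScript template literal inlines both the loop bound and the
  // chosen update snippet into the generated shader source.
  return `
    for (int i = 0; i < ${windowSizeNearestVec4}; i += 4) {
      int inIdx = inOffset + i;
      vec4 values = vec4(
        getValue(batch, inIdx),
        getValue(batch, inIdx + 1),
        getValue(batch, inIdx + 2),
        getValue(batch, inIdx + 3)
      );
      ${buildUpdateSnippet(reduceType)}
    }`;
}

const prodLoop = buildLoop('prod', 8);
```

The generated `prodLoop` string contains only the prod update and the inlined bound, with no if/else.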

Collaborator Author


@yhwang thanks for the suggestion. Out of curiosity, have you measured this improvement?

Contributor

@dsmilkov dsmilkov left a comment


Thanks for adding this. One blocking comment about the gradient of prod()

Reviewed 7 of 7 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang, @pyu10055, @nsthorat, and @dsmilkov)


src/kernels/backend_cpu.ts, line 501 at r1 (raw file):

    this.assertNotComplex(x, 'sum');

    axis_util.assertAxesAreInnerMostDims('prod', axes, x.rank);

backends assume the data is validated from above, so you can remove these asserts (I know other methods have them too, for historical reasons, and we can remove those as well).


src/kernels/backend_webgl.ts, line 820 at r1 (raw file):

  prod(x: Tensor, axes: number[]): Tensor {
    axis_util.assertAxesAreInnerMostDims('prod', axes, x.rank);

same here, no need for this assert since validation happens in the layer above.


src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

@yhwang thanks for the suggestion. Out of curiosity, have you measured this improvement?

Since the condition inside the if statement is constant, the WebGL compiler will effectively prune the program, so there should be no perf impact from the if statement.


src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):

        sumValue += dot(values, ones);
      } else if (${reduceType === 'prod'}) {
        prodValue *= values[0] * values[1] * values[2] * values[3];

you might save 1 cycle if you do:

    vec2 tmp = vec2(values[0], values[1]) * vec2(values[2], values[3]);
    prodValue *= tmp[0] * tmp[1];
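The pairwise grouping suggested above computes the same product as the straight chain; a small TypeScript sketch (scalar stand-in for the GLSL vec2 lanes) makes the equivalence concrete:

```typescript
// Sketch: the chain form uses three dependent scalar multiplies, while the
// pairwise form does one lane-wise multiply (vectorizable on the GPU) and
// then a single scalar multiply. Both yield the same product.
const values = [2, 3, 5, 7];

// Chain form: values[0] * values[1] * values[2] * values[3].
const chain = values[0] * values[1] * values[2] * values[3];

// Pairwise form, mirroring:
//   vec2 tmp = vec2(values[0], values[1]) * vec2(values[2], values[3]);
//   prodValue *= tmp[0] * tmp[1];
const tmp = [values[0] * values[2], values[1] * values[3]];
const pairwise = tmp[0] * tmp[1];
```

Multiplication is commutative and associative over these lanes, which is what makes the regrouping safe.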

src/ops/reduction_ops.ts, line 151 at r1 (raw file):

 * is true, the rank of the `Tensor` is reduced by 1 for each entry in `axes`.
 * If `keepDims` is true, the reduced dimensions are retained with length 1.
 * If axes has no entries, all dimensions are reduced, and a `Tensor` with a

wrap axes in quotes
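The reduction semantics described in the doc comment above can be sketched in plain TypeScript (an illustrative helper, not the tfjs implementation): reducing a 2D tensor over its inner axis drops that dimension, while keepDims retains it with length 1.

```typescript
// Illustrative sketch of prod-reduction over the inner axis of a 2D array.
// With keepDims=false the reduced axis is dropped (shape [3,2] -> [3]);
// with keepDims=true it is kept with length 1 (shape [3,2] -> [3,1]).
function prodInnerAxis(
    x: number[][], keepDims: boolean): number[] | number[][] {
  const reduced = x.map(row => row.reduce((acc, v) => acc * v, 1));
  return keepDims ? reduced.map(v => [v]) : reduced;
}

const a = [[1, 2], [3, 4], [5, 6]];       // shape [3, 2]
const dropped = prodInnerAxis(a, false);  // shape [3]
const kept = prodInnerAxis(a, true);      // shape [3, 1]
```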


src/ops/reduction_ops.ts, line 183 at r1 (raw file):

  const axes = axis_util.parseAxisParam(axis, $x.shape);

  // Use a custom gradient to bypass 2 gradient backprops since sum is used

prod is not used as often, so there's no need for a custom gradient.


src/ops/reduction_ops.ts, line 206 at r1 (raw file):

      });
      const expandedDy = dy.reshape(expandedDyShape);
      const derX = expandedDy.mul(ones(x.shape, 'float32'));

I may be misunderstanding something, but the gradient of prod is a bit more work than this - you have to divide the product by the input tensor. See https://github.com/tensorflow/tensorflow/blob/a0e76ce73c5f095fc61e06c19ff8e653cfd2965c/tensorflow/python/ops/math_grad.py#L137 for reference.
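The gradient being discussed can be sketched for the simplest case, a full reduction of a 1-D input with no zeros: d(prod(x))/dx_i = prod(x) / x_i, scaled by the upstream dy. This division-based form is exactly why zeros in the input are a problem, which is why TensorFlow's math_grad.py falls back to cumprods.

```typescript
// Sketch (not the tfjs implementation): gradient of a full prod reduction
// for a 1-D input, valid only when no element is zero.
function prodGrad(x: number[], dy: number): number[] {
  const total = x.reduce((acc, v) => acc * v, 1);  // prod(x)
  // Each partial derivative is the product of all the *other* elements,
  // i.e. prod(x) / x_i, scaled by the upstream gradient dy.
  return x.map(v => dy * total / v);
}

const g = prodGrad([2, 3, 4], 1);  // prod = 24 -> gradient [12, 8, 6]
```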

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang, @pyu10055, @nsthorat, and @dsmilkov)


src/ops/reduction_ops_test.ts, line 920 at r1 (raw file):

  it('gradients: prod(2d)', () => {
    const a = tf.tensor2d([1, 2, 3, 1, 0, 1], [3, 2]);

make sure not to have a 0 in the input when computing gradients, since a zero nulls the gradient.

Contributor

@yhwang yhwang left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055 and @nsthorat)


src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Since the condition inside the if statement is constant, the WebGL compiler will effectively prune the program, so there should be no perf impact from the if statement.

Ah, I see. In that case, the compiler will optimize it out.

Collaborator Author

@pyu10055 pyu10055 left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @pyu10055, and @nsthorat)


src/kernels/webgl/reduce_gpu.ts, line 68 at r1 (raw file):

Previously, yhwang (Yihong Wang) wrote…

Ah, I see. In that case, the compiler will optimize it out.

Done.


src/ops/reduction_ops.ts, line 183 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

prod is not used as often, so no need to do custom gradient.

removed the custom gradient.


src/ops/reduction_ops.ts, line 206 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

I may be misunderstanding something, but the gradient of prod is a bit more work than this - you have to divide the product by the input tensor. See https://github.com/tensorflow/tensorflow/blob/a0e76ce73c5f095fc61e06c19ff8e653cfd2965c/tensorflow/python/ops/math_grad.py#L137 for reference.

the gradient for prod requires a cumprod op, which we currently don't have; will submit this op without the gradient func first.
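The cumprod-based gradient mentioned here (the approach used in TensorFlow's math_grad.py) can be sketched for a 1-D full reduction: with left[i] = prod(x[0..i-1]) and right[i] = prod(x[i+1..]), the gradient is dy * left[i] * right[i]. No division is involved, so zeros in the input are handled correctly. The helper name is illustrative, not an existing tfjs op.

```typescript
// Sketch: division-free prod gradient via exclusive cumprods from both ends.
function prodGradCumprod(x: number[], dy: number): number[] {
  const n = x.length;
  const left = new Array<number>(n).fill(1);   // exclusive prefix products
  const right = new Array<number>(n).fill(1);  // exclusive suffix products
  for (let i = 1; i < n; i++) left[i] = left[i - 1] * x[i - 1];
  for (let i = n - 2; i >= 0; i--) right[i] = right[i + 1] * x[i + 1];
  // grad[i] = dy * prod of all elements except x[i].
  return x.map((_, i) => dy * left[i] * right[i]);
}

const grad = prodGradCumprod([2, 0, 4], 1);  // zero input handled: [0, 8, 0]
```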


src/ops/reduction_ops_test.ts, line 920 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

make sure not to have 0 when computing gradients since that nulls the gradient.

removed the gradient func for this op.

Contributor

@dsmilkov dsmilkov left a comment


Reviewed 6 of 6 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @yhwang and @nsthorat)
