
Conversation

@tafsiri
Contributor

@tafsiri tafsiri commented Jul 13, 2020

This splits out fused ops into their own files.

It doesn't make them modular, because they use a pattern that doesn't fit with how other kernels are implemented and will need some offline discussion. However, I wanted to at least save state on moving the code around.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@tafsiri tafsiri marked this pull request as draft July 13, 2020 19:01
@tafsiri
Contributor Author

tafsiri commented Jul 13, 2020

The underlying issue was that these kernels have an optional input tensor (bias) that is sometimes null. They also use a non-ideal way to pass tensors from the forward to the backward pass: they close over the reference rather than call save([]), because you can't pass null to save. This is also not compatible with the new modular gradients, and in my opinion isn't something we should generally do (nullable parameters). In the backward pass an extra result would be added when bias is not null.

Talked with @annxingyuan offline and here are two options I see for this:

  1. Split the fused kernels into two versions, one that takes a bias and one that doesn't. The op can still take an optional bias argument, but it will use that to decide which kernel to call (each of which will have a fixed/non-variable interface); see the sketch after this list.
  2. Do not support gradients for these kernels. My understanding from talking to Ann is that these fused op calls are generated by the converter for fast inference. We also only support gradients for fused ops using the linear activation (1 of 5 options), and we don't document these ops publicly, so they shouldn't appear in arbitrary code that is used for training. If we go down this route we can still keep the optional input tensor in the kernel definition (though this is still exceptional compared to other kernels), since we wouldn't need to worry about keeping a potentially nullable argument for the backward pass. If we don't really see a need for gradients for fused ops, this is a simple approach that allows for more dynamic kernels in the future.
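
For concreteness, here is a hypothetical sketch of option 1 (the kernel names, the helper name, and the runKernel call are illustrative assumptions, not the real tfjs kernels): the public op keeps its optional bias argument but dispatches to one of two kernels, each with a fixed input signature.

import * as tf from '@tensorflow/tfjs-core';

// Hypothetical sketch of option 1. Kernel names are illustrative only.
function fusedMatMulOp(
    a: tf.Tensor, b: tf.Tensor, bias?: tf.Tensor): tf.Tensor {
  if (bias != null) {
    // Kernel variant that always receives {a, b, bias}.
    return tf.engine().runKernel(
               '_FusedMatMulWithBias', {a, b, bias}, {}) as tf.Tensor;
  }
  // Kernel variant that always receives {a, b}.
  return tf.engine().runKernel('_FusedMatMul', {a, b}, {}) as tf.Tensor;
}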

Thoughts?

@tafsiri tafsiri changed the title from "split out fused ops into their own files" to "Modularize fused ops" on Jul 17, 2020
@tafsiri
Contributor Author

tafsiri commented Jul 17, 2020

Update: After discussing offline, I tried out option 2, but found that we do use the gradients for these in layers. So I went back to the custom op approach and ultimately found an approach that works (and that doesn't require actually changing customGrad or the engine). The summary of the approach is to return different customOps depending on which params are passed in.

Notes for future consideration: Because CustomGradFunc expects a comma-separated list of inputs followed by a saveFunc at the end, optional parameters cannot be added (because the last param is non-optional). A future refactor should consider separating the input tensors out into a separate array argument, but that changes the API of customOp.
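
A minimal sketch of that pattern (hypothetical op and illustrative gradient math; it mirrors the approach described above rather than the PR's actual code): a different customGrad op is built depending on whether a bias was provided, so save never receives a null tensor and the gradFunc only returns the extra bias gradient in the bias variant.

import * as tf from '@tensorflow/tfjs-core';

// Sketch only: dispatch to one of two customGrad ops based on the presence
// of the optional bias input.
function fusedMatMulLike(
    a: tf.Tensor2D, b: tf.Tensor2D, bias?: tf.Tensor): tf.Tensor {
  if (bias != null) {
    const customOpWithBias = tf.customGrad(
        (a: tf.Tensor, b: tf.Tensor, bias: tf.Tensor,
         save: (tensors: tf.Tensor[]) => void) => {
          save([a, b]);
          return {
            value: tf.add(tf.matMul(a, b), bias),
            gradFunc: (dy: tf.Tensor, saved: tf.Tensor[]) => {
              const [a, b] = saved;
              return [
                tf.matMul(dy, b, false, true),  // dA
                tf.matMul(a, dy, true, false),  // dB
                tf.sum(dy, 0)                   // dBias (illustrative)
              ];
            }
          };
        });
    return customOpWithBias(a, b, bias);
  }
  const customOp = tf.customGrad(
      (a: tf.Tensor, b: tf.Tensor,
       save: (tensors: tf.Tensor[]) => void) => {
        save([a, b]);
        return {
          value: tf.matMul(a, b),
          gradFunc: (dy: tf.Tensor, saved: tf.Tensor[]) => {
            const [a, b] = saved;
            return [
              tf.matMul(dy, b, false, true),  // dA
              tf.matMul(a, dy, true, false)   // dB
            ];
          }
        };
      });
  return customOp(a, b);
}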

@tafsiri
Contributor Author

tafsiri commented Jul 17, 2020

One final important note: I had to add a dy.reshape inside the grad func for fused matMul, due to the reshape it does at the end of the custom op.
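
Roughly, the shape fix looks like this (a sketch with illustrative shapes, not the PR's exact code): if the custom op reshapes its 3D matMul result on the way out, the incoming dy has to be reshaped back to 3D inside gradFunc before the a/b gradients are computed.

import * as tf from '@tensorflow/tfjs-core';

// Sketch only: undo the custom op's trailing reshape so dy matches the 3D
// shape the batched matMul gradients expect.
const gradFunc = (dy: tf.Tensor, saved: tf.Tensor[]): tf.Tensor[] => {
  const [a3D, b3D] = saved as [tf.Tensor3D, tf.Tensor3D];
  const dy3D = tf.reshape(
      dy, [a3D.shape[0], a3D.shape[1], b3D.shape[2]]);  // [batch, m, n]
  const aDer = tf.matMul(dy3D, b3D, false, true);       // [batch, m, k]
  const bDer = tf.matMul(a3D, dy3D, true, false);       // [batch, k, n]
  return [aDer, bDer];
};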

But this is now ready for review @annxingyuan

@tafsiri tafsiri marked this pull request as ready for review July 17, 2020 17:18
@tafsiri tafsiri requested a review from lina128 July 17, 2020 17:18
Contributor

@annxingyuan annxingyuan left a comment


Thank you Yannick!!!!

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)


tfjs-backend-wasm/src/setup_test.ts, line 120 at r4 (raw file):

                           // supported.
      'gradient',          // Gradients not defined yet.
      'backProp input x=[2,3,3,1] f=[2,2,1,1] s=1 p=0',

could you include a comment about why this test is excluded, for consistency


tfjs-backend-wasm/src/kernels/_FusedMatMul.ts, line 31 at r4 (raw file):

function setup(backend: BackendWasm) {
  wasmFusedMatMul = backend.wasm.cwrap('_FusedMatMul', null /* void */, [

use symbol
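
(Presumably "use symbol" means importing the _FusedMatMul kernel-name constant from tfjs-core's kernel_names rather than passing a string literal; a sketch under that assumption, with placeholder argument types:)

import {_FusedMatMul} from '@tensorflow/tfjs-core';
import {BackendWasm} from '../backend_wasm';

let wasmFusedMatMul: (...args: number[]) => void;

function setup(backend: BackendWasm) {
  // Pass the exported kernel-name symbol instead of the raw string.
  wasmFusedMatMul = backend.wasm.cwrap(_FusedMatMul, null /* void */, [
    'number',  // a_id (placeholder; see the kernel for the full list)
    'number',  // b_id (placeholder)
  ]);
}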


tfjs-backend-wasm/src/kernels/FusedDepthwiseConv2D.ts, line 35 at r4 (raw file):

function setup(backend: BackendWasm) {
  wasmFusedDepthwiseConv2d =
      backend.wasm.cwrap('FusedDepthwiseConv2D', null /* void */, [

use symbol


tfjs-core/src/kernel_names.ts, line 800 at r4 (raw file):

}
// tslint:disable-next-line: class-name
export interface _FusedMatMulAttrs {

do these start with underscores because they're tfjs-only kernels? should we put them at the bottom of the file?


tfjs-core/src/ops/fused_ops.ts, line 23 at r4 (raw file):

import {Activation} from './fused_types';

export {Activation, conv2d, depthwiseConv2d, matMul};

should we still keep this file around? i feel like we've been just defining objects in src/ops.ts to create namespaces?


tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):

    return dy.mul(y.step());
  }
  throw new Error(

since shouldFuse already ensures that only backpropable activations are used in backward mode, do we need this check?


tfjs-core/src/ops/fused_conv2d.ts, line 94 at r4 (raw file):

 * @param preluActivationWeights Tensor of prelu weights to be applied as part
 *     of a `prelu` activation, typically the same shape as `x`.
 */

delete duplicate comment block


tfjs-core/src/ops/fused_conv2d.ts, line 194 at r4 (raw file):

  if ($x.rank === 3) {
    reshapedTo4D = true;
    x4D = $x.as4D(1, $x.shape[0], $x.shape[1], $x.shape[2]);

should this be moved into forwardFunc?


tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):

    util.assert(
        conv_util.tupleValuesAreOne(dilations),
        () => 'Error in gradient of fused conv2D: ' +

should we move this check into shouldFuse?


tfjs-core/src/ops/fused_depthwise_conv2d.ts, line 197 at r4 (raw file):

      return [xDer, filterDer, biasDer];
    } else {
      return [xDer, filterDer];

nit: return without else


tfjs-core/src/ops/fused_mat_mul.ts, line 153 at r4 (raw file):

    if (!transposeA && !transposeB) {
      aDer = dyActivation.matMul(b3D, false, true);

should we avoid chaining here?

Contributor Author

@tafsiri tafsiri left a comment


Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @lina128)


tfjs-backend-wasm/src/setup_test.ts, line 120 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

could you include a comment about why this test is excluded, for consistency

Done


tfjs-backend-wasm/src/kernels/_FusedMatMul.ts, line 31 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

use symbol

Done


tfjs-backend-wasm/src/kernels/FusedDepthwiseConv2D.ts, line 35 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

use symbol

Done


tfjs-core/src/kernel_names.ts, line 800 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

do these start with underscores because they're tfjs-only kernels? should we put them at the bottom of the file?

They are in the tfjs-only kernels section. The underscore naming was already in place before this PR, so I just followed that. I don't really know why they start with underscores.


tfjs-core/src/ops/fused_ops.ts, line 23 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

should we still keep this file around? i feel like we've been just defining objects in src/ops.ts to create namespaces?

In this case it was handy because it shared some of the same names as those already imported in ops.ts.
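
(For reference, a sketch of the namespace-object alternative being discussed, with aliased imports to avoid the name collisions mentioned above; the import paths and exported names are assumed from the files in this PR:)

// In tfjs-core/src/ops/ops.ts (sketch): build the `fused` namespace from the
// individually modularized ops instead of a dedicated fused_ops.ts.
import {conv2d as fusedConv2d} from './fused_conv2d';
import {depthwiseConv2d as fusedDepthwiseConv2d} from './fused_depthwise_conv2d';
import {matMul as fusedMatMul} from './fused_mat_mul';

export const fused = {
  conv2d: fusedConv2d,
  depthwiseConv2d: fusedDepthwiseConv2d,
  matMul: fusedMatMul,
};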


tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

since shouldFuse already ensures that only backpropable activations are used in backward mode, do we need this check?

I don't think it hurts; also, it doesn't really check anything, it just throws if you reach the end of the function without having already returned. I think it gives you a better error message too.


tfjs-core/src/ops/fused_conv2d.ts, line 94 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

delete duplicate comment block

Done


tfjs-core/src/ops/fused_conv2d.ts, line 194 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

should this be moved into forwardFunc?

When modularizing this I pretty much tried to mirror the kernel interface already present; in this case the kernels are written to only take 4D tensors, and that is okay. It may be good to make them more flexible, but that isn't strictly necessary from a modularization point of view. If you want to take this opportunity to change that constraint, that could be done in a follow-up PR.


tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

should we move this check into shouldFuse?

could you say more? Running the check while trying to run the gradient seems to make sense to me. If shouldFuse went away wouldn't this still remain?


tfjs-core/src/ops/fused_depthwise_conv2d.ts, line 197 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

nit: return without else

Done


tfjs-core/src/ops/fused_mat_mul.ts, line 153 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

should we avoid chaining here?

Done here and below
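
(Illustratively, the change amounts to switching from the chained tensor method to the functional op; variable names are taken from the quoted snippet and the shapes are made up for the example:)

import * as tf from '@tensorflow/tfjs-core';

const dyActivation = tf.zeros([2, 3, 4]) as tf.Tensor3D;
const b3D = tf.zeros([2, 5, 4]) as tf.Tensor3D;

// Chained (before):
const aDerChained = dyActivation.matMul(b3D, false, true);
// Functional op (after):
const aDer = tf.matMul(dyActivation, b3D, false, true);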

Contributor

@annxingyuan annxingyuan left a comment


Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)


tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

I don't think it hurts; also, it doesn't really check anything, it just throws if you reach the end of the function without having already returned. I think it gives you a better error message too.

but the logic of shouldFuse means that you won't ever reach this error. also the error message implies we might someday implement the gradient for fused activations other than relu / linear, but we actually can't implement them because we would need the intermediate outputs. i think the error message should be 'Cannot compute gradient for fused activation ${activation}.'


tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

could you say more? Running the check while trying to run the gradient seems to make sense to me. If shouldFuse went away wouldn't this still remain?

shouldFuse checks if the activation is backpropable - I was saying shouldFuse could also check if the dilation is backpropable? so shouldFuse would return false if dilation is greater than 1.

Contributor Author

@tafsiri tafsiri left a comment


Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @lina128)


tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

but the logic of shouldFuse means that you won't ever reach this error. also the error message implies we might someday implement the gradient for fused activations other than relu / linear, but we actually can't implement them because we would need the intermediate outputs. i think the error message should be 'Cannot compute gradient for fused activation ${activation}.'

Done
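
(A rough sketch of what the updated check in fused_util.ts might look like; the helper name and surrounding structure are assumed, not copied from the diff:)

import {Tensor} from '../tensor';
import {Activation} from './fused_types';

// Only 'linear' and 'relu' can be backpropped; other fused activations would
// need intermediate outputs that the fused kernels do not expose.
export function getFusedDyActivation(
    dy: Tensor, y: Tensor, activation: Activation): Tensor {
  if (activation === 'linear') {
    return dy;
  }
  if (activation === 'relu') {
    return dy.mul(y.step());
  }
  throw new Error(
      `Cannot compute gradient for fused activation ${activation}.`);
}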


tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

shouldFuse checks if the activation is backpropable - I was saying shouldFuse could also check if the dilation is backpropable? so shouldFuse would return false if dilation is greater than 1.

Since shouldFuse is used in a number of different ops (including fusedMatMul, which doesn't have dilations), it seems that a check like that should live in the op itself and be called in addition to shouldFuse. Since changing this is somewhat orthogonal to this modularization, I'm just going to leave it as is for now and we can do that as an enhancement.

@tafsiri tafsiri merged commit f8956ae into master Jul 23, 2020
@tafsiri tafsiri deleted the mod-fused branch July 23, 2020 01:08
lina128 pushed a commit to lina128/tfjs that referenced this pull request Jul 23, 2020