Modularize fused ops #3597
Conversation
The underlying issue was that these kernels have an optional input tensor (bias) that is sometimes null. They also pass tensors from the forward to the backward pass in a non-ideal way: they close over the references rather than calling save([]), since you can't pass null to save. This isn't compatible with the new modular gradients, and in my opinion isn't something we should generally do (nullable parameters). In the backward pass, an extra result would be added when bias is not null. Talked with @annxingyuan offline and here are two options I see for this:
Thoughts?
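For illustration, a minimal sketch of the pattern being described (simplified; names and typing assumed, not the actual tfjs source): save cannot take a null entry, so the optional bias ends up closed over instead of saved, and no gradient slot exists for it.

```ts
import * as tf from '@tensorflow/tfjs-core';

// Simplified stand-in for a fused op with an optional bias input.
function fusedMatMulSketch(
    a: tf.Tensor2D, b: tf.Tensor2D, bias?: tf.Tensor): tf.Tensor {
  const customOp = tf.customGrad(
      (a_: tf.Tensor, b_: tf.Tensor, save: tf.GradSaveFunc) => {
        let value = tf.matMul(a_, b_);
        if (bias != null) {
          value = tf.add(value, bias);  // bias is closed over, not saved
        }
        // save([a_, b_, bias]) would fail when bias is null, hence the closure.
        save([a_, b_]);
        const gradFunc = (dy: tf.Tensor, saved: tf.Tensor[]) => {
          const [aSaved, bSaved] = saved;
          const aDer = tf.matMul(dy, bSaved, false, true);
          const bDer = tf.matMul(aSaved, dy, true, false);
          // When bias is non-null, a third gradient result would be needed,
          // which is what clashes with the modular gradient mapping.
          return [aDer, bDer];
        };
        return {value, gradFunc};
      });
  return customOp(a, b);
}
```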
Update: After discussion offline, I tried out option 2, but found out we do use the gradients for these in layers. So I went back to the custom op approach and ultimately found an approach that works (and that doesn't require actually changing customGrad or the engine). The summary of the approach is to return different customOps depending on which params are passed in. Notes for future consideration: because CustomGradFunc expects a comma-separated list of inputs followed by a saveFunc at the end, optional parameters cannot be added (the last param is non-optional). A future refactor should consider separating the input tensors into a separate array argument, but that changes the API of customOp.
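A sketch of that approach under the same assumptions as the block above (simplified, not the actual implementation): branch on whether bias was passed and return a different customGrad op, so the saved tensors and gradient results always line up with the real inputs.

```ts
import * as tf from '@tensorflow/tfjs-core';

// Two customGrad ops: one for the bias-free signature, one that takes bias
// as a real input and therefore saves it and returns a bias gradient.
function fusedMatMulModularSketch(
    a: tf.Tensor2D, b: tf.Tensor2D, bias?: tf.Tensor): tf.Tensor {
  if (bias == null) {
    const withoutBias = tf.customGrad(
        (a_: tf.Tensor, b_: tf.Tensor, save: tf.GradSaveFunc) => {
          save([a_, b_]);
          return {
            value: tf.matMul(a_, b_),
            gradFunc: (dy: tf.Tensor, saved: tf.Tensor[]) => [
              tf.matMul(dy, saved[1], false, true),
              tf.matMul(saved[0], dy, true, false)
            ]
          };
        });
    return withoutBias(a, b);
  }
  const withBias = tf.customGrad(
      (a_: tf.Tensor, b_: tf.Tensor, bias_: tf.Tensor,
       save: tf.GradSaveFunc) => {
        save([a_, b_, bias_]);
        return {
          value: tf.add(tf.matMul(a_, b_), bias_),
          gradFunc: (dy: tf.Tensor, saved: tf.Tensor[]) => [
            tf.matMul(dy, saved[1], false, true),
            tf.matMul(saved[0], dy, true, false),
            dy  // assumes bias has the full output shape (no broadcast reduce)
          ]
        };
      });
  return withBias(a, b, bias);
}
```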
One final important note: I had to add a dy.reshape inside the grad func for fused matMul, due to the reshape it does at the end of the custom op. This is now ready for review @annxingyuan
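For illustration only (shape handling assumed and simplified): because the custom op reshapes its internal 3D result to the final output shape before returning, the incoming dy has to be put back into that 3D shape inside gradFunc before the matMul derivatives are computed.

```ts
import * as tf from '@tensorflow/tfjs-core';

// dy arrives in the op's final output shape; reshape it back to the 3D shape
// the batched matMul used internally before computing the input gradients.
function matMulGradSketch(
    dy: tf.Tensor, a3D: tf.Tensor3D, b3D: tf.Tensor3D): tf.Tensor[] {
  const [batchDim, outerA] = a3D.shape;
  const outerB = b3D.shape[2];
  const dy3D = tf.reshape(dy, [batchDim, outerA, outerB]);
  const aDer = tf.matMul(dy3D, b3D, false, true);
  const bDer = tf.matMul(a3D, dy3D, true, false);
  return [aDer, bDer];
}
```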
annxingyuan left a comment
Thank you Yannick!!!!
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)
tfjs-backend-wasm/src/setup_test.ts, line 120 at r4 (raw file):
// supported.
'gradient',  // Gradients not defined yet.
'backProp input x=[2,3,3,1] f=[2,2,1,1] s=1 p=0',
could you include a comment about why this test is excluded, for consistency?
tfjs-backend-wasm/src/kernels/_FusedMatMul.ts, line 31 at r4 (raw file):
function setup(backend: BackendWasm) {
  wasmFusedMatMul = backend.wasm.cwrap('_FusedMatMul', null /* void */, [
use symbol
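Presumably "use symbol" here means referencing the exported kernel-name constant rather than the string literal, roughly along these lines (illustrative only; argument types elided and assumed):

```ts
import {_FusedMatMul} from '@tensorflow/tfjs-core';
import {BackendWasm} from '../backend_wasm';

let wasmFusedMatMul: (...args: number[]) => void;

function setup(backend: BackendWasm) {
  // Use the kernel-name constant so the cwrap'd symbol stays in sync with
  // the name the kernel is registered under.
  wasmFusedMatMul = backend.wasm.cwrap(
      _FusedMatMul, null /* void */, ['number', 'number' /* ... */]);
}
```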
tfjs-backend-wasm/src/kernels/FusedDepthwiseConv2D.ts, line 35 at r4 (raw file):
function setup(backend: BackendWasm) {
  wasmFusedDepthwiseConv2d = backend.wasm.cwrap('FusedDepthwiseConv2D', null /* void */, [
use symbol
tfjs-core/src/kernel_names.ts, line 800 at r4 (raw file):
}
// tslint:disable-next-line: class-name
export interface _FusedMatMulAttrs {
do these start with underscores because they're tfjs-only kernels? should we put them at the bottom of the file?
tfjs-core/src/ops/fused_ops.ts, line 23 at r4 (raw file):
import {Activation} from './fused_types';
export {Activation, conv2d, depthwiseConv2d, matMul};
should we still keep this file around? i feel like we've been just defining objects in src/ops.ts to create namespaces?
tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):
  return dy.mul(y.step());
}
throw new Error(
since shouldFuse already ensures that only backpropable activations are used in backward mode, do we need this check?
tfjs-core/src/ops/fused_conv2d.ts, line 94 at r4 (raw file):
* @param preluActivationWeights Tensor of prelu weights to be applied as part
*     of a `prelu` activation, typically the same shape as `x`.
*/
delete duplicate comment block
tfjs-core/src/ops/fused_conv2d.ts, line 194 at r4 (raw file):
if ($x.rank === 3) {
  reshapedTo4D = true;
  x4D = $x.as4D(1, $x.shape[0], $x.shape[1], $x.shape[2]);
should this be moved into forwardFunc?
tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):
util.assert(
    conv_util.tupleValuesAreOne(dilations),
    () => 'Error in gradient of fused conv2D: ' +
should we move this check into shouldFuse?
tfjs-core/src/ops/fused_depthwise_conv2d.ts, line 197 at r4 (raw file):
  return [xDer, filterDer, biasDer];
} else {
  return [xDer, filterDer];
nit: return without else
tfjs-core/src/ops/fused_mat_mul.ts, line 153 at r4 (raw file):
if (!transposeA && !transposeB) {
  aDer = dyActivation.matMul(b3D, false, true);
should we avoid chaining here?
tafsiri left a comment
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @lina128)
tfjs-backend-wasm/src/setup_test.ts, line 120 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
could you include a comment about why this test is excluded, for consistency?
Done
tfjs-backend-wasm/src/kernels/_FusedMatMul.ts, line 31 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
use symbol
Done
tfjs-backend-wasm/src/kernels/FusedDepthwiseConv2D.ts, line 35 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
use symbol
Done
tfjs-core/src/kernel_names.ts, line 800 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
do these start with underscores because they're tfjs-only kernels? should we put them at the bottom of the file?
They are in the tfjs-only kernels section. The underscore naming was already in place before this PR, so I just followed that. I don't really know why they start with underscores.
tfjs-core/src/ops/fused_ops.ts, line 23 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
should we still keep this file around? i feel like we've been just defining objects in src/ops.ts to create namespaces?
In this case it was handy because it shared some of the same names as those already imported in ops.ts.
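Roughly what that file amounts to (my reading, with paths taken from the filenames in this review): a small barrel that re-exports the now-separate fused ops so they can be exposed together without colliding with the non-fused matMul/conv2d names already imported in ops.ts.

```ts
// fused_ops.ts (sketch)
import {conv2d} from './fused_conv2d';
import {depthwiseConv2d} from './fused_depthwise_conv2d';
import {matMul} from './fused_mat_mul';
import {Activation} from './fused_types';

export {Activation, conv2d, depthwiseConv2d, matMul};
```

ops.ts can then presumably import this module wholesale and expose it as the fused namespace, which is why keeping the file around was handy.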
tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
since shouldFuse already ensures that only backpropable activations are used in backward mode, do we need this check?
I don't think it hurts, also it doesn't really check anything, it just throws if you reach the end of the function and haven't already returned. I think it gives you a better error message too.
tfjs-core/src/ops/fused_conv2d.ts, line 94 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
delete duplicate comment block
Done
tfjs-core/src/ops/fused_conv2d.ts, line 194 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
should this be moved into forwardFunc?
When modularizing this I pretty much tried to mirror the kernel interface already present; in this case the kernels are written to only take 4D tensors, and that is okay. It may be good to make them more flexible, but that isn't strictly necessary from a modularization point of view. If you want to take this opportunity to change that constraint, it could be done in a follow-up PR.
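For context, the pattern being discussed, simplified (helper and names assumed, using the same chained as4D/as3D calls as the quoted code): a rank-3 input is expanded to 4D before the kernel runs and the result is squeezed back afterwards, since the kernels themselves only handle 4D.

```ts
import * as tf from '@tensorflow/tfjs-core';

// Expand a rank-3 input to 4D for the kernel, then undo it on the result.
function runOn4D(
    x: tf.Tensor3D|tf.Tensor4D,
    kernel: (x4D: tf.Tensor4D) => tf.Tensor4D): tf.Tensor {
  const reshapedTo4D = x.rank === 3;
  const x4D = reshapedTo4D ?
      x.as4D(1, x.shape[0], x.shape[1], x.shape[2]) :
      x as tf.Tensor4D;
  const res = kernel(x4D);
  return reshapedTo4D ?
      res.as3D(res.shape[1], res.shape[2], res.shape[3]) :
      res;
}
```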
tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
should we move this check into shouldFuse?
could you say more? Running the check while trying to run the gradient seems to make sense to me. If shouldFuse went away wouldn't this still remain?
tfjs-core/src/ops/fused_depthwise_conv2d.ts, line 197 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
nit: return without else
Done
tfjs-core/src/ops/fused_mat_mul.ts, line 153 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
should we avoid chaining here?
Done here and below
annxingyuan left a comment
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)
tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
I don't think it hurts, also it doesn't really check anything, it just throws if you reach the end of the function and haven't already returned. I think it gives you a better error message too.
But the logic of shouldFuse means that you won't ever reach this error. Also, the error message implies we might someday implement the gradient for fused activations other than relu / linear, but we actually can't implement them because we would need the intermediate outputs. I think the error message should be 'Cannot compute gradient for fused activation ${activation}.'
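Concretely, the helper in question might end up along these lines (sketch only; helper name and the activation union are assumed), with the message suggested above:

```ts
import * as tf from '@tensorflow/tfjs-core';

type Activation = 'linear'|'relu'|'prelu'|'elu'|'relu6';  // abbreviated

// Only linear and relu gradients can be derived from dy and y alone; other
// activations would need intermediate outputs the fused kernels don't save.
function getFusedDyActivation(
    dy: tf.Tensor, y: tf.Tensor, activation: Activation): tf.Tensor {
  if (activation == null || activation === 'linear') {
    return dy;
  }
  if (activation === 'relu') {
    return tf.mul(dy, tf.step(y));
  }
  throw new Error(
      `Cannot compute gradient for fused activation ${activation}.`);
}
```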
tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
could you say more? Running the check while trying to run the gradient seems to make sense to me. If shouldFuse went away wouldn't this still remain?
shouldFuse checks whether the activation is backpropable; I was saying shouldFuse could also check whether the dilation is backpropable, so shouldFuse would return false if dilation is greater than 1.
tafsiri left a comment
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @lina128)
tfjs-core/src/ops/fused_util.ts, line 36 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
But the logic of shouldFuse means that you won't ever reach this error. Also, the error message implies we might someday implement the gradient for fused activations other than relu / linear, but we actually can't implement them because we would need the intermediate outputs. I think the error message should be 'Cannot compute gradient for fused activation ${activation}.'
Done
tfjs-core/src/ops/fused_conv2d.ts, line 249 at r4 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
shouldFuse checks whether the activation is backpropable; I was saying shouldFuse could also check whether the dilation is backpropable, so shouldFuse would return false if dilation is greater than 1.
Since shouldFuse is used in a number of different ops (including fusedMatMul, which doesn't have dilations), it seems that a check like that should live in the op itself and be called in addition to shouldFuse. Since changing this is somewhat orthogonal to this modularization, I'm just going to leave it as is for now and we can do that as an enhancement.
This splits out fused ops into their own files.
It doesn't make them modular, because they use a pattern that doesn't fit in with how other kernels are implemented and will need some offline discussion. However, I wanted to at least save state on moving the code around.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.