-
Notifications
You must be signed in to change notification settings - Fork 2k
[webg] Fix memleak in modularized reduce. #3872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@annxingyuan before reviewing this in detail would you be able to outline in the PR description where the mem leak is in the existing code and any thoughts on why the mem leak checker doesn't catch this? |
|
@tafsiri yes - I added some notes in the description - let me know if anything is unclear! |
tafsiri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great description and fix! Left a few suggested changes. But LGTM! Thanks for digging into this.
Reviewed 5 of 6 files at r1, 2 of 2 files at r2.
Reviewable status:complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)
tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 25 at r2 (raw file):
type ReduceTypes = 'all'|'any'|'max'|'min'|'sum'|'prod'; function getReductionSizes(inShape: number[]):
Would you be able to add a comment for this function? It would add some flavour to the notion of multiple reductions being needed to get the overall reduction done effectively.
tfjs-backend-webgl/src/kernels/Max_test.ts, line 23 at r2 (raw file):
describeWithFlags('Max', ALL_ENVS, () => { it('does not have memory leak.', async () => {
Could you name this something like "does not have memory leak when calling reduce multiple times" to help future us remember why this test is here in addition to the general test in core.
tfjs-core/src/ops/max_test.ts, line 36 at r2 (raw file):
it('with a large dimension', async () => { const aData = new Float32Array(100);
would making this a bit bigger (say 100x100) (e.g. tf.ones([100, 100])) be better for catching problems in implementations that may have larger effective window sizes
annxingyuan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)
tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 25 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
Would you be able to add a comment for this function? It would add some flavour to the notion of multiple reductions being needed to get the overall reduction done effectively.
Done
I renamed the function - hopefully now it's more clear what the purpose is.
tfjs-backend-webgl/src/kernels/Max_test.ts, line 23 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
Could you name this something like "does not have memory leak when calling reduce multiple times" to help future us remember why this test is here in addition to the general test in core.
Done
tfjs-core/src/ops/max_test.ts, line 36 at r2 (raw file):
Previously, tafsiri (Yannick Assogba) wrote…
would making this a bit bigger (say 100x100) (e.g. tf.ones([100, 100])) be better for catching problems in implementations that may have larger effective window sizes
Done
lina128
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)
tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):
} return reduce(output, dtype, reductionType, backend);
Hi Ann, I wonder instead of changing to iterative approach, can we do this in recursive approach, something like:
const result = reduce(output, dtype, reductionType, backend);
backend.disposeData(output.dataId);
return result;
annxingyuan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)
tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):
Previously, lina128 (Na Li) wrote…
Hi Ann, I wonder instead of changing to iterative approach, can we do this in recursive approach, something like:
const result = reduce(output, dtype, reductionType, backend); backend.disposeData(output.dataId); return result;
Hi Na, I think that could result in disposing the input tensor? Also the iterative approach aligns with other kernels such as cumsum.
annxingyuan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)
tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Hi Na, I think that could result in disposing the input tensor? Also the iterative approach aligns with other kernels such as cumsum.
Discussed offline - this approach does not dispose the input tensor, but the iterative approach aligns with other cumulation kernels in WebGL.
This fixes #3869.
The leak happens because we call reduce recursively without cleaning up intermediate outputs. This PR changes max to call reduce iteratively and cleans up intermediate outputs along the way.
The tests I added actually fail in master because of our check in engine for leaked dataIds. We didn't happen to have a max test with a large enough dimension to trigger multiple reductions.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.
This change is