[webgl] Fix memleak in modularized reduce. #3872

annxingyuan · 2020-09-02T14:38:16Z

This fixes #3869.

The leak happens because we call reduce recursively without cleaning up intermediate outputs. This PR changes max to call reduce iteratively and cleans up intermediate outputs along the way.
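To illustrate the iterative pattern described above, here is a minimal sketch. It does not use the real WebGL backend: `MockBackend`, `maxPass`, and `iterativeMax` are hypothetical names invented for this example, and the fixed `windowSize` parameter is an assumption standing in for whatever window the real kernel picks. The point is only the ownership rule: each pass produces a new output, and the previous intermediate is disposed as soon as it has been consumed, while the caller's input is left alone.

```typescript
// Hypothetical stand-ins for illustration; not the tfjs-backend-webgl API.
type DataId = number;
interface TensorInfo { dataId: DataId; values: number[]; }

class MockBackend {
  private nextId = 0;
  // Tracks which data ids are still allocated, like a leak checker would.
  readonly live = new Set<DataId>();

  makeTensorInfo(values: number[]): TensorInfo {
    const dataId = this.nextId++;
    this.live.add(dataId);
    return {dataId, values};
  }

  disposeData(dataId: DataId): void {
    this.live.delete(dataId);
  }
}

// One reduction pass: takes the max over consecutive windows of the input.
function maxPass(
    x: TensorInfo, windowSize: number, backend: MockBackend): TensorInfo {
  const out: number[] = [];
  for (let i = 0; i < x.values.length; i += windowSize) {
    out.push(Math.max(...x.values.slice(i, i + windowSize)));
  }
  return backend.makeTensorInfo(out);
}

// Runs passes in a loop until a single value remains.
function iterativeMax(
    x: TensorInfo, windowSize: number, backend: MockBackend): TensorInfo {
  if (windowSize < 2) {
    throw new Error('windowSize must be at least 2');
  }
  let result = x;
  while (result.values.length > 1) {
    const previous = result;
    result = maxPass(previous, windowSize, backend);
    // Dispose intermediates, but never the caller's input.
    if (previous.dataId !== x.dataId) {
      backend.disposeData(previous.dataId);
    }
  }
  return result;
}
```

With 100 input values and a window of 4, the sketch runs four passes (100, 25, 7, 2, 1 values) and leaves only the input and the final result allocated.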

The tests I added actually fail in master because of our check in engine for leaked dataIds. We didn't happen to have a max test with a large enough dimension to trigger multiple reductions.

tafsiri · 2020-09-03T00:32:35Z

@annxingyuan, before reviewing this in detail, would you be able to outline in the PR description where the memory leak is in the existing code, and share any thoughts on why the memory leak checker doesn't catch it?

annxingyuan · 2020-09-03T12:59:34Z

@tafsiri yes - I added some notes in the description - let me know if anything is unclear!

tafsiri

Great description and fix! Left a few suggested changes. But LGTM! Thanks for digging into this.

Reviewed 5 of 6 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 25 at r2 (raw file):

type ReduceTypes = 'all'|'any'|'max'|'min'|'sum'|'prod';

function getReductionSizes(inShape: number[]):

Would you be able to add a comment for this function? It would add some flavour to the notion of multiple reductions being needed to get the overall reduction done effectively.
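As a rough illustration of why several reductions may be needed, the sketch below plans the passes for a given input size. It is not the function under review: `planReductionStages`, `ReductionStage`, and the `maxWindow` cap are hypothetical, and the window rule (take the whole input if it fits under the cap, otherwise the cap) is an assumption chosen for simplicity rather than the backend's actual heuristic.

```typescript
// Hypothetical planning helper for illustration only.
interface ReductionStage {
  inSize: number;      // number of elements entering this pass
  windowSize: number;  // elements combined into one output element
  outSize: number;     // number of elements leaving this pass
}

function planReductionStages(
    inSize: number, maxWindow: number): ReductionStage[] {
  if (!Number.isInteger(inSize) || inSize < 1) {
    throw new Error('inSize must be a positive integer');
  }
  if (!Number.isInteger(maxWindow) || maxWindow < 2) {
    throw new Error('maxWindow must be an integer >= 2');
  }
  const stages: ReductionStage[] = [];
  let size = inSize;
  do {
    // Assumed rule: reduce everything at once if it fits under the cap.
    const windowSize = Math.min(size, maxWindow);
    const outSize = Math.ceil(size / windowSize);
    stages.push({inSize: size, windowSize, outSize});
    size = outSize;
  } while (size > 1);
  return stages;
}
```

Under these assumptions, an input of size 100 with a cap of 10 needs two passes (100 to 10, then 10 to 1), and every pass except the last produces an intermediate output that has to be cleaned up.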

tfjs-backend-webgl/src/kernels/Max_test.ts, line 23 at r2 (raw file):

describeWithFlags('Max', ALL_ENVS, () => {
  it('does not have memory leak.', async () => {

Could you name this something like "does not have memory leak when calling reduce multiple times"? That would help future us remember why this test is here in addition to the general test in core.

tfjs-core/src/ops/max_test.ts, line 36 at r2 (raw file):

  it('with a large dimension', async () => {
    const aData = new Float32Array(100);

Would making this a bit bigger (say 100x100, e.g. tf.ones([100, 100])) be better for catching problems in implementations that may have larger effective window sizes?

annxingyuan

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)

tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 25 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Would you be able to add a comment for this function? It would add some flavour to the notion of multiple reductions being needed to get the overall reduction done effectively.

Done

I renamed the function - hopefully now it's more clear what the purpose is.

tfjs-backend-webgl/src/kernels/Max_test.ts, line 23 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Could you name this something like "does not have memory leak when calling reduce multiple times"? That would help future us remember why this test is here in addition to the general test in core.

Done

tfjs-core/src/ops/max_test.ts, line 36 at r2 (raw file):

Previously, tafsiri (Yannick Assogba) wrote…

Would making this a bit bigger (say 100x100, e.g. tf.ones([100, 100])) be better for catching problems in implementations that may have larger effective window sizes?

Done

lina128

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @tafsiri)

tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):

  }

  return reduce(output, dtype, reductionType, backend);

Hi Ann, I wonder whether, instead of changing to an iterative approach, we could keep the recursive approach and do something like:

const result = reduce(output, dtype, reductionType, backend);
backend.disposeData(output.dataId);
return result;

annxingyuan

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)

tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):

Previously, lina128 (Na Li) wrote…

Hi Ann, I wonder whether, instead of changing to an iterative approach, we could keep the recursive approach and do something like:
const result = reduce(output, dtype, reductionType, backend);
backend.disposeData(output.dataId);
return result;

Hi Na, I think that could result in disposing the input tensor? Also the iterative approach aligns with other kernels such as cumsum.

annxingyuan

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @lina128 and @tafsiri)

tfjs-backend-webgl/src/kernel_utils/reduce.ts, line 38 at r3 (raw file):

Previously, annxingyuan (Ann Yuan) wrote…

Hi Na, I think that could result in disposing the input tensor? Also the iterative approach aligns with other kernels such as cumsum.

Discussed offline - this approach does not dispose the input tensor, but the iterative approach aligns with other cumulation kernels in WebGL.
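The conclusion above can be checked with a small sketch of the recursive variant. As before, this uses a mock rather than the real backend: `MockBackend` and `recursiveMax` are hypothetical names, and the fixed `windowSize` is an assumption. Each call creates its own `output`, recurses on it, and then disposes only that `output`, so the tensor passed in by the original caller is never touched.

```typescript
// Hypothetical stand-ins for illustration; not the tfjs-backend-webgl API.
type DataId = number;
interface TensorInfo { dataId: DataId; values: number[]; }

class MockBackend {
  private nextId = 0;
  readonly live = new Set<DataId>();

  makeTensorInfo(values: number[]): TensorInfo {
    const dataId = this.nextId++;
    this.live.add(dataId);
    return {dataId, values};
  }

  disposeData(dataId: DataId): void {
    this.live.delete(dataId);
  }
}

function recursiveMax(
    x: TensorInfo, windowSize: number, backend: MockBackend): TensorInfo {
  if (windowSize < 2) {
    throw new Error('windowSize must be at least 2');
  }
  // One pass: max over consecutive windows, written to a fresh output.
  const out: number[] = [];
  for (let i = 0; i < x.values.length; i += windowSize) {
    out.push(Math.max(...x.values.slice(i, i + windowSize)));
  }
  const output = backend.makeTensorInfo(out);

  // Base case: this output is the final result and is owned by the caller.
  if (output.values.length <= 1) {
    return output;
  }

  // Otherwise `output` is an intermediate created in this call. Dispose it
  // once the deeper call has consumed it. The argument `x` is not disposed.
  const result = recursiveMax(output, windowSize, backend);
  backend.disposeData(output.dataId);
  return result;
}
```

Both variants end with the same set of live allocations in this mock, which matches the point that the choice between them is about consistency with other kernels rather than correctness.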

annxingyuan added 5 commits September 2, 2020 09:00

test

8f0a4f1

fix

a5b4ec0

fix

272a631

remove fit

da6ccc9

fix

ed6a596

googlebot added the cla: yes label Sep 2, 2020

annxingyuan added 2 commits September 2, 2020 10:55

add test

9a04bf4

save

eb807e3

annxingyuan requested review from lina128 and tafsiri September 2, 2020 15:35

annxingyuan self-assigned this Sep 2, 2020

annxingyuan added 2 commits September 3, 2020 08:57

fix

83c5ae3

Merge branch 'master' into memleak

cd48f7e

tafsiri approved these changes Sep 3, 2020

pr comments

f1b095b

annxingyuan commented Sep 3, 2020

Merge branch 'master' into memleak

01500ab

lina128 reviewed Sep 3, 2020

annxingyuan commented Sep 3, 2020

annxingyuan added 5 commits September 3, 2020 15:04

fix

19a194b

Merge branch 'master' into memleak

f7f6ae6

revert

62541cd

fix

a4f6970

fix

98d31f0

annxingyuan merged commit 6d39fab into master Sep 3, 2020

annxingyuan deleted the memleak branch September 3, 2020 22:07
