Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix graph executor to not dispose output tensors #7505

Merged
merged 4 commits into from
Mar 22, 2023

Conversation

chunnienc
Copy link
Collaborator

@chunnienc chunnienc commented Mar 21, 2023

Fix #7504
Added a new test case for this and verified that the old executor fails the test.

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


This change is Reviewable

@chunnienc chunnienc marked this pull request as ready for review March 21, 2023 18:02
tfjs-converter/src/executor/graph_executor.ts Outdated Show resolved Hide resolved
}
if (liveUntilNodes == null) {

if (isControlFlow(node) || liveUntilNodes == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you moved the check for node disposability to check the liveUntilNodes (this makes sense since those are the ones we're disposing). Do we need this isControlFlow(node) check? That means we'll never dispose any nodes that live until a control flow node if I'm not mistaken.

Is this to make sure we always have a control flow node's inputs available (e.g. in the case of a while)?

Copy link
Collaborator Author

@chunnienc chunnienc Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to make sure we always have a control flow node's inputs available (e.g. in the case of a while)?

Yes. That is the behavior of the old algorithm. The branch here checks whether the current node allows us to dispose the input of it, while the checks in the loop are for whether we should dispose the input itself. See https://github.com/tensorflow/tfjs/pull/7505/files#diff-44e0a825cd7c6f31c03d9333db5dc21a8937d66a5b28d719c699863eef96ad8dR329

But the behavior is changed. The old function does not allow disposing inputs of an output node no matter it's control flow or not, while this PR allows it. I don't see any current or future problems can have with it.

Do we need this isControlFlow(node) check?

No, bc a graph with control flow should not be sync. But I still keep it in case the upstream graph builder has some special cases.

Co-authored-by: Matthew Soulanille <matthew@soulanille.net>
Copy link
Member

@mattsoulanille mattsoulanille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@pyu10055 pyu10055 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for fixing this bug, given this is not caught by our e2e model tests, can we add an e2e test for mobilenet graph model conversion to prevent future breakage.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @chunnienc and @mattsoulanille)

@chunnienc chunnienc merged commit 18eab05 into tensorflow:master Mar 22, 2023
@chunnienc chunnienc deleted the fix-output-disposal branch March 22, 2023 17:05
@chunnienc
Copy link
Collaborator Author

chunnienc commented Mar 22, 2023

thank you for fixing this bug, given this is not caught by our e2e model tests, can we add an e2e test for mobilenet graph model conversion to prevent future breakage.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @chunnienc and @mattsoulanille)

Sure. I'm looking into some ways to add golden model tests. Actually our existing standard models are not sufficient to catch this bug either. I did manual tests on our model set with local benchmark before our release, and all of them worked. This bug only occurs when the output nodes are not leaves or parents of outputs in the graph. So we may need to run golden model e2e tests + random inspect intermediate tensors.

@vladmandic
Copy link
Contributor

vladmandic commented Mar 22, 2023

This bug only occurs when the output nodes are not leaves or parents of leaves in the graph

Btw, isn't that somewhat typical for feature vectors?

@chunnienc
Copy link
Collaborator Author

Btw, isn't that somewhat typical for feature vectors?

Yes. But it turns out that all our (converted) models for testing and benchmarking have an additional "Identity" op node after each model output node, which makes them always leaves. It's something we need to fix on our side to make the tests more robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regression: tensor is disposed early
4 participants