Fix graph executor to not dispose output tensors #7505
Conversation
```diff
 }
-if (liveUntilNodes == null) {
+if (isControlFlow(node) || liveUntilNodes == null) {
```
I see you moved the disposability check to `liveUntilNodes` (this makes sense, since those are the ones we're disposing). Do we need this `isControlFlow(node)` check? If I'm not mistaken, it means we'll never dispose any nodes that live until a control flow node.
Is this to make sure we always have a control flow node's inputs available (e.g. in the case of a while loop)?
> Is this to make sure we always have a control flow node's inputs available (e.g. in the case of a while)?
Yes. That is the behavior of the old algorithm. The branch here checks whether the current node allows us to dispose its inputs, while the checks in the loop decide whether each input itself should be disposed. See https://github.com/tensorflow/tfjs/pull/7505/files#diff-44e0a825cd7c6f31c03d9333db5dc21a8937d66a5b28d719c699863eef96ad8dR329
But the behavior has changed: the old function never allowed disposing the inputs of an output node, regardless of whether it was control flow or not, while this PR allows it. I don't see any current or future problems with that.
> Do we need this isControlFlow(node) check?
No, because a graph with control flow should not be executed synchronously. But I'm keeping the check in case the upstream graph builder has some special cases.
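The two-level check discussed here can be sketched in a simplified form. Note that `GraphNode`, `shouldDisposeAfter`, and the shapes below are illustrative stand-ins, not the actual tfjs `GraphExecutor` types:

```typescript
// Simplified stand-in for a graph node; not the actual tfjs type.
interface GraphNode {
  name: string;
  isControlFlow: boolean;
}

// `liveUntilNodes` lists the nodes whose produced tensors live exactly
// until `current` finishes; null means nothing is scheduled for disposal.
function shouldDisposeAfter(
    current: GraphNode, liveUntilNodes: GraphNode[]|null): GraphNode[] {
  // Outer branch: does the current node allow disposing its inputs at all?
  // A control-flow node (e.g. a while loop) may re-read its inputs, so we
  // keep everything alive that lives until it.
  if (current.isControlFlow || liveUntilNodes == null) {
    return [];
  }
  // Inner checks: should each individual tensor be disposed? Only one such
  // per-tensor condition is modeled here, as an illustration.
  return liveUntilNodes.filter(n => !n.isControlFlow);
}
```

The outer branch corresponds to the `isControlFlow(node) || liveUntilNodes == null` guard in the diff above, and the filter stands in for the per-tensor checks inside the loop.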
Co-authored-by: Matthew Soulanille <matthew@soulanille.net>
LGTM
Thank you for fixing this bug. Given that it was not caught by our e2e model tests, can we add an e2e test for MobileNet graph model conversion to prevent future breakage?
Sure. I'm looking into some ways to add golden model tests. Actually, our existing standard models are not sufficient to catch this bug either. I did manual tests on our model set with the local benchmark before our release, and all of them worked. This bug only occurs when the output nodes are not leaves or parents of outputs in the graph. So we may need to run golden model e2e tests and also randomly inspect intermediate tensors.
Btw, isn't that somewhat typical for feature vectors?
Yes. But it turns out that all our (converted) models for testing and benchmarking have an additional "Identity" op node after each model output node, which makes the output nodes always leaves. It's something we need to fix on our side to make the tests more robust.
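To illustrate why the appended "Identity" nodes hide the bug: the disposal error only triggers when an output node still has downstream consumers, i.e. is not a leaf. A hypothetical check (not a tfjs API) over a simplified node shape:

```typescript
// Simplified node shape for illustration; not the actual tfjs type.
interface OpNode {
  name: string;
  children: OpNode[];
}

// The bug only triggers when a requested output node still has consumers.
// Converted test models append an "Identity" after each output, so the
// node actually marked as the output is always a leaf and never hits this.
function hasNonLeafOutput(outputNodes: OpNode[]): boolean {
  return outputNodes.some(node => node.children.length > 0);
}
```

A feature-vector output that feeds a classification head would make `hasNonLeafOutput` return true, which is exactly the case the existing model set never exercised.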
Fix #7504
Added a new test case for this and verified that the old executor fails the test.
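The shape of such a regression check can be sketched as follows. This is an illustrative toy, not the actual test added in the PR: it models disposal tracking with a fake tensor and asserts that requested outputs survive a cleanup pass.

```typescript
// Fake tensor that records whether it was disposed (illustrative only).
class FakeTensor {
  isDisposed = false;
  constructor(readonly name: string) {}
  dispose(): void {
    this.isDisposed = true;
  }
}

// Toy cleanup pass: dispose every intermediate tensor, but never a
// requested output, even when the output's node has downstream consumers.
function cleanup(
    tensors: FakeTensor[], outputs: Set<string>): FakeTensor[] {
  for (const t of tensors) {
    if (!outputs.has(t.name)) {
      t.dispose();
    }
  }
  return tensors.filter(t => outputs.has(t.name));
}
```

A test would then execute the graph, run the cleanup, and assert `isDisposed === false` on every returned output tensor.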