Skip to content

SavedModelBundle leak fix #335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 11, 2021
Merged

Conversation

Craigacp
Copy link
Collaborator

@Craigacp Craigacp commented Jun 11, 2021

This PR adds the deallocators to the TF_Graph and TF_Session that live inside a SavedModelBundle as previously they didn't free their memory on SavedModelBundle.close().

We may want to backport this to the 0.3.x branch. I think the main conflict there is that there are formatting changes in master not present in r0.3 but we could just write a similar patch, it's only 5 lines.

cc @saudet

@Craigacp
Copy link
Collaborator Author

Turns out Graph and Session use different idioms for closing their respective native object, and this difference broke the first version of this fix. We should probably make them use the same idiom, either both using JavaCPP's deallocators, or both directly calling the relevant TF C API functions on close. The split is confusing.

@karllessard karllessard merged commit 031a0c1 into tensorflow:master Jun 11, 2021
@Craigacp Craigacp deleted the saved-model-leak branch June 11, 2021 19:19
@@ -510,21 +508,18 @@ private static SavedModelBundle load(
TF_Graph graph = TF_NewGraph();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should call TF_Graph.newGraph() here, but I'm not sure...

Copy link
Collaborator Author

@Craigacp Craigacp Jun 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that initially, but Graph.close() doesn't use handle.releaseReference() like Session.close() does, and so it doesn't call the TF_Graph deallocator. I think you're the last person to touch both Graph.delete and Session.delete and I wasn't sure if they were different for a reason.

As I said in the comment above, I think they should be made consistent, but that's going to be a bigger change than just the one which fixes this leak. For example if we switch Graph over to use handle.releaseReference() and then use TF_Graph.newGraph everywhere it'll mean that the TF_Graph inside Graph is now sensitive to enclosing PointerScopes which it currently isn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that the TF_Graph becomes managed by the TF_Session and we shouldn't touch it anymore after that? It sounds like that from the documentation of the C API. In that case I'd put the call to TF_NewGraph() in TF_Session.loadSessionFromSavedModel() and forget about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think the graph is separate from the session, or at least that's my reading of the doc here - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api.h#L1229. The TF_NewSession doc above it suggests the TF_Graph is kept alive by the TF_Session, but it doesn't say that the session owns the graph, so I think we're still responsible for closing it.

@killeent-stripe
Copy link

@Craigacp Would definitely a +1 to say that backporting this fix to a 0.3.x release would be much appreciated!

Craigacp added a commit to Craigacp/tensorflow-java that referenced this pull request Jul 7, 2021
karllessard pushed a commit that referenced this pull request Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants