Memory leak when repeatedly loading and deleting keras models #40171
Was able to reproduce the issue with TF v2.2 and tf-nightly. Please find the attached gist. Thanks!
Verified that the same behavior is present in
Hello,
I have uploaded my experiments here. I have 3 models with roughly the same number of parameters, but spread across different numbers of layers. The memory leak seems to be smallest when there is only 1 hidden layer; there is almost a 10x increase in the leak when I go from 1 hidden layer to 15 hidden layers, all while keeping the number of parameters/variables (roughly) fixed. With just one layer I observed a memory usage increase of about 210 KB per load cycle, versus 1.7 MB per load cycle with multiple layers. Maybe there is something in the way Keras instantiates layers that is causing this issue? cc: @frreiss
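For anyone who wants to reproduce the comparison, a measurement loop along these lines should show the per-cycle growth. This is a minimal sketch, assuming `psutil` is installed and using hypothetical directory names for the three SavedModels:

```python
import gc
import os

import psutil  # assumed available; used only to read the process RSS
import tensorflow as tf

# Hypothetical SavedModel directories for the 1-, 5-, and 15-hidden-layer models
MODEL_DIRS = ["saved_1_layer", "saved_5_layers", "saved_15_layers"]
NUM_CYCLES = 20

process = psutil.Process(os.getpid())

for model_dir in MODEL_DIRS:
    rss_before = process.memory_info().rss
    for _ in range(NUM_CYCLES):
        model = tf.saved_model.load(model_dir)
        del model
        gc.collect()
    rss_after = process.memory_info().rss
    leak_per_cycle = (rss_after - rss_before) / NUM_CYCLES
    print(f"{model_dir}: ~{leak_per_cycle / 1024:.0f} KB leaked per load cycle")
```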
Preliminary results from profiling the example program @kmh4321 provided are attached below.

Here's the output of that profiling run:
I've identified the cause of this leak. While loading functions from the SavedModel file, the loader runs the lines in the source file here. These lines create a second copy of the body of each function and stick that second copy into a dictionary hanging off the global default graph.

This problem appears to have been patched on the master branch four months ago; see this commit. It would be a really good idea to backport this fix to the 2.2 branch. I'm not sure why it wasn't backported in the first place; TensorFlow 2.2.0 came out almost two months after this fix was pushed into the master branch. It would also be a good idea to backport this fix to the 2.1 branch.
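Incidentally, one way to observe this first leak from Python is to watch the default graph's function registry grow across load/delete cycles. This is a rough sketch; it pokes at the private `_functions` attribute (which may change between TF versions) and assumes a hypothetical SavedModel path:

```python
import gc
import tensorflow as tf

SAVED_MODEL_DIR = "my_saved_model"  # hypothetical path

for i in range(3):
    model = tf.saved_model.load(SAVED_MODEL_DIR)
    del model
    gc.collect()
    # In TF 2.2, copies of the restored function bodies accumulate here
    # and are never released.
    n_funcs = len(tf.compat.v1.get_default_graph()._functions)
    print(f"Cycle {i}: {n_funcs} functions registered on the default graph")
```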
I've identified a workaround that cuts this memory leakage by about 80%: load the model from a temporary background thread. Here's some code to copy and paste:

```python
def load_model_hack(saved_model_path: str):
    """
    Load a SavedModel from a background thread so that most of the garbage
    that saved_model.load() leaves on the calling thread's environment will
    be collected. Still leaks memory, but at a lower rate.
    """
    import threading
    from typing import Any, Dict

    def callback(path: str, result_holder: Dict[str, Any]) -> None:
        try:
            result_holder["model"] = tf.saved_model.load(path)
        except Exception as e:
            result_holder["error"] = e

    # Call saved_model.load() in a background thread
    thread_results = {}
    t = threading.Thread(target=callback, args=[saved_model_path, thread_results])
    t.start()
    t.join()

    # Forward any exceptions thrown by the background thread
    if "error" in thread_results:
        raise thread_results["error"]
    return thread_results["model"]
```

After putting this workaround in place, the test case described above still leaks memory on TensorFlow 2.2.0. However, the leakage occurs at a much slower rate. I suspect that there is a second memory leak in TensorFlow 2.2.0's model loading path, and that this second leak has also been patched on the master branch without backporting the patch to the 2.2 branch.
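In case it helps, this is roughly how the workaround slots into a load/unload loop (the SavedModel path is hypothetical):

```python
import gc

for _ in range(100):
    model = load_model_hack("my_saved_model")  # hypothetical SavedModel path
    # ... run inference with the model here ...
    del model
    gc.collect()
```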
@k-w-w would it be possible to add the fix from the current master branch to the 2.2 branch?
Some updates:
Details follow.

The root cause of the second memory leak is the lines immediately before the lines I pointed out in my previous comment:

```python
316    func = function_lib.ConcreteFunction(func_graph)
317    func.add_to_graph()
318    if context.executing_eagerly():
319      func.add_to_graph(ops.get_default_graph())
```

Line 316 creates a `ConcreteFunction` for each function in the SavedModel. Then line 317 calls `add_to_graph()` with no arguments, which registers the function with the current context, and line 319 registers the same function with the global default graph, which registers it with the context a second time. So every function in the SavedModel is passed twice to the context's function registry, and every function in the SavedModel ends up instantiated in the current context with a reference count of 2. When the model's surrogate Python object goes out of scope, the deleter attached to the `ConcreteFunction` removes only one of those two references, so the function stays pinned in memory.

This second memory leak is also fixed in commit 3421416. After that commit, line 317 is replaced with:

```python
func.add_to_graph(graph)
```

where `graph` is a single target graph, so each function is registered only once.

Here's an updated version of my workaround that corrects for both memory leaks:
"""
Load a SavedModel without leaking as much memory as usual.
This function applies two workarounds:
* Load the model from a temporary background thread so that
`saved_model.load()` won't leave garbage hanging off of the global
default graph
* Unpin functions that `saved_model.load()` pins twice, so that the
garbage collection logic in `ConcreteFunction` will correctly remove
these functions when the restored model goes out of scope..
"""
import threading
from typing import Any, Dict
def callback(path: str, result_holder: Dict[str, Any]) -> None:
"""Callback function to be executed in a background thread"""
try:
result_holder["model"] = tf.saved_model.load(path)
# Every function that was pinned twice (and hopefully exactly those
# functions) should now be in the background thread's global default
# graph. Unpin these functions once.
default_graph = tf.compat.v1.get_default_graph()
for function_name in default_graph._functions.keys():
tf.python.context.remove_function(function_name)
except Exception as e:
result_holder["error"] = e
# Call saved_model.load() in a background thread
thread_results = {}
t = threading.Thread(target=callback, args=[saved_model_path, thread_results])
t.start()
t.join()
# Forward any exceptions thrown by the background thread
if "error" in thread_results:
raise thread_results["error"]
return thread_results["model"] This workaround further reduces the amount of memory that my test program leaks. However it does not completely eliminate memory leakage. |
Updates:
Details follow.
(See line 264 in the listing below.)

```python
258    # Re-create everything except slot variables.
259    for node_id, proto in enumerate(self._proto.nodes):
260      if node_id in slot_variable_node_ids:
261        # Defer recreating slot variables so we can use the public Optimizer
262        # interface.
263        continue
264      node, setter = self._recreate(proto, node_id)   <<<<<<<<<<<
265      nodes[node_id] = node
266      node_setters[node_id] = setter
```

The `proto` argument passed to `_recreate()` on line 264 is a `SavedObject` protocol buffer:

```protobuf
message SavedObject {
  // Objects which this object depends on: named edges in the dependency
  // graph.
  //
  // Note: currently only valid if kind == "user_object".
  repeated TrackableObjectGraph.TrackableObject.ObjectReference children = 1;

  // Removed when forking SavedObject from TrackableObjectGraph.
  reserved "attributes";
  reserved 2;

  // Slot variables owned by this object. This describes the three-way
  // (optimizer, variable, slot variable) relationship; none of the three
  // depend on the others directly.
  //
  // Note: currently only valid if kind == "user_object".
  repeated TrackableObjectGraph.TrackableObject.SlotVariableReference
      slot_variables = 3;

  oneof kind {
    SavedUserObject user_object = 4;
    SavedAsset asset = 5;
    SavedFunction function = 6;
    SavedVariable variable = 7;
    SavedBareConcreteFunction bare_concrete_function = 8;
    SavedConstant constant = 9;
    SavedResource resource = 10;
  }
}
```
The `_recreate()` method dispatches on the `kind` field of that message:

```python
353  def _recreate(self, proto, node_id):
354    """Creates a Python object from a SavedObject protocol buffer."""
355    factory = {
356        "user_object": (
357            lambda: self._recreate_user_object(proto.user_object, node_id)),
358        "asset": lambda: self._recreate_asset(proto.asset),
359        "function": lambda: self._recreate_function(proto.function),
360        "bare_concrete_function": functools.partial(
361            self._recreate_bare_concrete_function,
362            proto.bare_concrete_function),
363        "variable": lambda: self._recreate_variable(proto.variable),
364        "constant": lambda: self._recreate_constant(proto.constant),
365        "resource": lambda: self._recreate_resource(proto.resource),
366    }
367    kind = proto.WhichOneof("kind")
368    if kind not in factory:
369      raise ValueError("Unknown SavedObject type: %r" % kind)
370    return factory[kind]()
...
396  def _recreate_function(self, proto):
397    return function_deserialization.recreate_function(
398        proto, self._concrete_functions), setattr
```

In the case of this memory leak, the `kind` field is `bare_concrete_function`, so `_recreate()` effectively does the following:

```python
if proto.kind is bare_concrete_function:
    node = function_deserialization.setup_bare_concrete_function(
        proto, self._concrete_functions)
    setter = getattr
    return node, setter
else:
    ...  # code that doesn't execute for this case
```
The call to `setup_bare_concrete_function()` lands here:

```python
159  def setup_bare_concrete_function(saved_bare_concrete_function,
160                                   concrete_functions):
161    """Makes a restored bare concrete function callable."""
162    # Bare concrete functions accept only flat lists of Tensors with unique
163    # names.
164    concrete_function = concrete_functions[
165        saved_bare_concrete_function.concrete_function_name]
166    # pylint: disable=protected-access
167    concrete_function._arg_keywords = (
168        saved_bare_concrete_function.argument_keywords)
169    concrete_function._num_positional_args = (
170        saved_bare_concrete_function.allowed_positional_arguments)
171    # pylint: enable=protected-access
172    concrete_function.add_to_graph()   <<<<<<<<<<
173    return concrete_function
```

Note the call to `add_to_graph()` on line 172. Of course, the code that created the concrete function in the first place has already pinned it in the current context, so these functions end up pinned yet another time. And due to the same root cause as leaks 1 and 2, the extra reference is never released when the restored model goes out of scope. Unfortunately, the workaround from my previous comment, which unpins each function exactly once, is not enough to clean up functions that have been pinned more than twice. Here's a version of the workaround that keeps unpinning each function until it is really gone:
"""
A version of _EagerDefinedFunctionDeleter (see
tensorflow/python/eager/function.py) that keeps deleting the target function
until an InvalidArgumentError exception is thrown. Checking for that
exception is the only way to ensure that a function really has been deleted
and is not, in fact, still taking up memory.
"""
def __init__(self, func_name):
self.func_name = func_name
def __del__(self):
MAX_ATTEMPTS = 10
try:
for i in range(MAX_ATTEMPTS):
tf.python.context.remove_function(self.func_name)
# If we get here, removal did *not* fail as expected.
print(f"WARNING: Failed to remove function "
f"'{self.func_name}' after {MAX_ATTEMPTS} attempts. "
f"This problem may result in a memory leak.")
except tf.errors.InvalidArgumentError:
# tf.python.context.remove_function() throws this exception when
# you try to remove a function that has already been removed.
# In the case of this `try` clause, such behavior is "normal".
pass
except Exception as e:
print(f"WARNING: {e} thrown when attempting to delete function "
f"'{self.func_name}'. This problem may result in a memory leak.")
def load_model_hack(saved_model_path: str):
"""
Load a SavedModel without leaking memory.
This function applies two workarounds:
* Load the model from a temporary background thread so that
`saved_model.load()` won't leave garbage hanging off of the global
default graph
* Patch the garbage collection callbacks of all `ConcreteFunction`s
in the returned model so that these functions will be properly removed
when the restored model goes out of scope.
"""
import threading
from typing import Any, Dict
def callback(path: str, result_holder: Dict[str, Any]) -> None:
"""Callback function to be executed in a background thread"""
try:
model = tf.saved_model.load(path)
# Every function that was pinned two or more times should now be
# in the current thread's global default graph variable.
# Replace the deletion callbacks of these functions with a more
# effective version.
default_graph = tf.compat.v1.get_default_graph()
for func in default_graph._functions.values():
func._function_deleter = DeleteWithExtremePrejudice(func.name)
# NOTE: This assignment will trigger the previous deletion callback.
# That's ok, because every function in this list has been pinned
# at least twice.
result_holder["model"] = model
except Exception as e:
result_holder["error"] = e
# Call saved_model.load() in a background thread
thread_results = {}
t = threading.Thread(target=callback, args=[saved_model_path, thread_results])
t.start()
t.join() After applying this workaround, my test program leaks significantly less memory than before, but it still leaks memory. |
@jvishnuvardhan, I think this issue needs to be kept open a while longer. So far, we have verified that there is a serious memory leak in `tf.saved_model.load()`.

I would categorize this leak as a blocker for any application that needs to cycle models in and out of memory (for example, to process a corpus of documents that span multiple languages, or to serve multiple versions of the same model). Our simple test program "only" leaks a few megabytes each time the model is loaded, but larger models with weights embedded in their graphs can leak hundreds of megabytes per load/unload cycle.

The leak is actually three leaks, all of which were patched in master back in March (in commit 3421416). However, the fix was not included in the May release of TensorFlow 2.2.0. As of today, five months later, the fix has not been ported to the 2.2.x, 2.1.x, or 2.0.x branches of TensorFlow. TensorFlow 2.3.0 includes the fix for these three memory leaks. However, fixing this bug in 2.3.0 does not resolve this issue for us; my colleagues are currently using TensorFlow 2.2.x.

In addition to the three large leaks, there is a fourth leak that is not currently patched in the master branch. You can see the presence of this fourth leak in the output of the notebook linked in your previous comment: with the simple 2-layer model in your example notebook, each call to `tf.saved_model.load()` still leaks memory.

I have tracked the root cause of the fourth leak to a problem in TensorFlow's mechanism for caching kernels. In addition to creating graphs, loading a model also creates kernels, which TensorFlow stores in a cache inside the eager context:

```cpp
612   std::unordered_map<Fprint128, core::RefCountPtr<KernelAndDevice>,
613                      Fprint128Hasher>
614       kernel_cache_ TF_GUARDED_BY(cache_mu_);
```

The kernel cache does not have a size limit. There is an API call to clear the cache, but the Python side of TensorFlow only uses that API call when resetting the global random seed. Each entry in the cache is parameterized by the "fingerprint" of the associated operation. This "fingerprint" is a hash value computed from multiple parameters of the operation, including all of the operation's attributes.

The best workaround I've found for this problem is to have your Python application periodically clear the cache via the internal API. Here's some Python code to do so:

```python
import tensorflow.python as tf_internals

context_handle = tf_internals.eager.context.context()._context_handle
if context_handle is not None:
    tf_internals.pywrap_tfe.TFE_ContextClearCaches(context_handle)
```

A more permanent fix would be to evict stale entries from the cache following a least recently used policy. I'm working on a PR to apply such a fix. The above workaround reduces but does not eliminate the memory leakage of my test program.
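For a long-running process that cycles models, the cache-clearing call can be applied periodically. Here is a sketch (the path and clearing interval are hypothetical, and these internal calls may change between TF versions):

```python
import gc

import tensorflow as tf
import tensorflow.python as tf_internals

MODEL_DIR = "my_saved_model"   # hypothetical SavedModel path
CLEAR_EVERY = 10               # hypothetical interval between cache clears

for i in range(100):
    model = tf.saved_model.load(MODEL_DIR)
    # ... serve requests with `model` here ...
    del model
    gc.collect()
    if (i + 1) % CLEAR_EVERY == 0:
        context_handle = tf_internals.eager.context.context()._context_handle
        if context_handle is not None:
            # Internal API: drops cached kernels held by the eager context.
            tf_internals.pywrap_tfe.TFE_ContextClearCaches(context_handle)
```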
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Not sure why tensorflow-butler thinks this issue hasn't had recent activity. There is an open pull request for part of the problem described here.
@frreiss regarding the backport, we typically don't backport bug fixes into previous releases. Is it possible for you to use a later version like TF 2.3.0, which has the fixes?
Hi @goldiegadde. I see from the release notes that TensorFlow 2.3.1 includes 25 bug fixes, and TensorFlow 2.2.1 includes 19 bug fixes. Perhaps you meant to say that you typically don't backport fixes for memory leaks? Moving to 2.3.x is the only viable option for us at this point, and my colleagues will be doing so in spite of the disruption that this entails. The fourth leak is currently low on my priority list because #33412 is a much more severe problem for us. Hopefully that fourth leak fixes itself.
I just tested the above colab notebook with the latest tf-nightly (version 2.5.0-dev20201214), and the memory increase with iterations looks reduced by a fair bit. It's unclear to me if this is fully fixed, but perhaps the fourth memory leak you mentioned has been fixed.
I think this issue is as fixed as it's going to be. What do you think, @idfah?
Wow, when reading about the third leak I thought it would be hilarious if there were a fourth (though I suspect there may even be a fifth). By any chance, have you documented the thinking and the tools used to track down these leaks?
My main piece of advice in terms of tools is that the tools lie. TensorFlow is a very complex program with multiple heaps (Python heap, C heap, TensorFlow's own memory manager, and CUDA) and global data structures that can look like leaks at first glance. You need to use several leak checkers and cross-reference their outputs. For this particular issue, I used several of these tools and compared their results.
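As an illustration of the cross-referencing step, a quick first check is to compare what Python's own allocator sees against the process RSS; when the two diverge, the leak is below the Python heap. A minimal sketch, assuming `psutil` is installed and a hypothetical SavedModel path (here `tracemalloc` and `psutil` stand in for whichever tools you prefer):

```python
import gc
import os
import tracemalloc

import psutil
import tensorflow as tf

process = psutil.Process(os.getpid())
tracemalloc.start()

rss_0 = process.memory_info().rss
py_0, _ = tracemalloc.get_traced_memory()

for _ in range(20):
    model = tf.saved_model.load("my_saved_model")  # hypothetical path
    del model
    gc.collect()

rss_1 = process.memory_info().rss
py_1, _ = tracemalloc.get_traced_memory()

print(f"Python-heap growth: {(py_1 - py_0) / 2**20:.1f} MiB")
print(f"Process RSS growth: {(rss_1 - rss_0) / 2**20:.1f} MiB")
# Large RSS growth with little Python-heap growth points at leaks in
# TensorFlow's C++ layer rather than in Python objects.
```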
If a Keras model is saved using `tf.saved_model.save` and then repeatedly loaded with `tf.saved_model.load` and deleted with `del`, it becomes apparent that there is a slow memory leak. `keras.backend.clear_session` does not resolve this issue. See the attached gist for an example that reproduces this issue in TensorFlow 2.2 on Google Colab.

System information
- Have I written custom code: I have attached a custom repro case, but this appears to happen for various types of typical Keras models.
- OS Platform and Distribution: Can reproduce in Google Colab and Docker RedHat images
- Mobile device: not tested
- TensorFlow installed from: binary (from pip)
- TensorFlow version: ('2.2.0', 'v2.2.0-0-g2b96f3662b')
- Python version: 3.6.9 (Google Colab)
- Bazel version: N/A
- GCC/compiler version: N/A
- CUDA/cuDNN version: default in Google Colab
- GPU model and memory: default in Google Colab
Describe the current behavior
When Keras models are saved / loaded repeatedly, memory usage gradually continues to grow over time. For dynamic model servers that load and unload models over time, this may eventually lead to a crash due to memory exhaustion.
Describe the expected behavior
All memory should be recovered after a Keras model instance is deleted with `del` and the garbage collector is run with `gc.collect()`.

Standalone code to reproduce the issue
The following GitHub gist demonstrates the issue (can also be run in Colab):
https://gist.github.com/idfah/dff83de8d2a6406c9b92221e6282a8d6