[40171] Limit size of kernel cache #42337
Closed
The `EagerContext` class caches instances of kernels that have been used by the current session. Because kernel state is opaque to `EagerContext`, the caching mechanism is very conservative about reusing kernels. Also, there is no limit on the number of kernels cached. As a result of these two factors, the size of the kernel cache grows in an unbounded fashion if the application calls `tf.saved_model.load()` repeatedly. This growth is the root cause of one of the memory leaks observed in issue #40171.

This PR puts an upper limit on the size of the kernel cache in `EagerContext`. If the number of kernels exceeds this limit, the cache discards the least recently used kernel. I have hard-coded the capacity to a conservative 10000 kernels, which should be enough to prevent thrashing in normal usage while still providing protection against memory leaks.
I also added a `RemoveKernelFromCache()` method to go with the existing `AddKernelToCache()` method.

Here is a graph showing memory usage for a Python script that repeatedly loads and unloads a toy Keras model:
The blue points show the memory footprint of the script before applying these changes, and the orange points show the memory footprint after applying them.
After the changes in this PR, memory usage of my test script increases until the kernel cache reaches its maximum size. After that, memory usage increases more slowly, because there are additional memory leaks in `saved_model.load()` that are not addressed in this PR.