[Bugfix] [Tests] Perform explicit garbage collection in between tests #1503
Purpose
Background
In order to clean up model memory, the LLM Compressor tests rely on the Python garbage collector to recognize dereferenced model objects and remove them from memory. This, in turn, drops PyTorch tensor references, which the PyTorch caching allocator recognizes, leading to CUDA memory being deallocated.
This whole collection process starts with the Python garbage collector. However, the garbage collector is not perfect and will sometimes take longer to recognize some objects as dereferenced than others. Specifically, objects with cyclical references seem to take significantly longer to collect, because detecting reference cycles is more computationally expensive than standard reference counting.
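The difference can be reproduced in isolation. A minimal sketch (`Node` is an illustrative toy class, not LLM Compressor code): an acyclic object is freed as soon as its reference count hits zero, while a cyclic one lingers until the cycle detector runs.

```python
import gc
import weakref

class Node:
    """Toy object that holds a reference back to itself, forming a cycle."""
    def __init__(self):
        self.ref = self  # cycle: reference counting alone can never free this

gc.disable()               # pause automatic collection so the effect is visible
node = Node()
alive = weakref.ref(node)  # lets us observe when the object is actually freed

del node                             # the last external reference is gone...
cycle_lingers = alive() is not None  # ...but the cycle keeps the object alive

gc.collect()               # the cycle detector finds and frees the orphan
cycle_freed = alive() is None
gc.enable()
```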
These Python objects can take so long to collect that CUDA can run out of memory before the garbage collector gets to them. Surprisingly, the PyTorch caching allocator does not call `gc.collect()` prior to raising an OOM error, a fact which has been confirmed through my own tests and anecdotally matches @yewentao256's experience with the PyTorch CUDA caching allocator.

Garbage Collection and LLM Compressor
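Since the allocator will not trigger Python garbage collection on its own, one workaround is to collect manually and retry when an allocation fails. A framework-agnostic sketch (`alloc_with_gc_retry` is a hypothetical helper, not part of LLM Compressor; with PyTorch you would pass a tensor-allocating callable and catch `torch.cuda.OutOfMemoryError`):

```python
import gc

def alloc_with_gc_retry(alloc, exc_types=(MemoryError,)):
    """Call alloc(); on an out-of-memory error, run gc.collect() and retry once.

    Because the allocator does not invoke the Python garbage collector itself,
    we do so manually before giving up, giving the cycle detector a chance to
    free objects (and the memory they pin) first.
    """
    try:
        return alloc()
    except exc_types:
        gc.collect()    # break reference cycles, releasing pinned memory
        return alloc()  # retry once; still raises if memory is truly exhausted

# Usage with a stand-in allocator that fails on its first attempt:
attempts = []
def flaky_alloc():
    attempts.append(1)
    if len(attempts) == 1:
        raise MemoryError("out of memory")
    return "allocated"

result = alloc_with_gc_retry(flaky_alloc)  # succeeds on the retry
```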
It seems that model objects which have been passed through `modify_save_pretrained` contain reference cycles, as their overridden functions reference their own models directly. From local testing, I see that models without reference cycles are cleaned up faster than models with them. However, this principle does not seem to generalize beyond one file, as the nightly tests still fail even when no cycle is present.

Changes
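This kind of cycle arises whenever a method is overridden with a wrapper that closes over the model and is then stored back on the model. A minimal sketch of the pattern (`FakeModel` and `patch_save` are illustrative stand-ins, not LLM Compressor's actual `modify_save_pretrained`):

```python
import gc
import weakref

class FakeModel:
    """Illustrative stand-in for a model with a save_pretrained method."""
    def save_pretrained(self, path):
        return f"saved to {path}"

def patch_save(model):
    original = model.save_pretrained  # bound method -> references model
    def wrapped_save(path):
        # (real code would e.g. compress weights here before delegating)
        return original(path)
    # model -> wrapped_save -> closure -> original -> model: a reference cycle
    model.save_pretrained = wrapped_save

gc.disable()                  # pause automatic collection to observe the cycle
model = FakeModel()
patch_save(model)
alive = weakref.ref(model)

del model                     # the cycle keeps the instance alive...
lingered = alive() is not None
gc.collect()                  # ...until the cycle detector runs
freed = alive() is None
gc.enable()
```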
Add a `gc.collect()` call after every test finishes in order to make sure memory bugs do not persist across tests.

Testing