[Bugfix] [Tests] Perform explicit garbage collection in between tests #1503
Purpose
Background
In order to clean up model memory, the LLM Compressor tests rely on the Python garbage collector to recognize dereferenced model objects and remove them from memory. This, in turn, drops PyTorch tensor references, which the PyTorch caching allocator recognizes, leading to CUDA memory being deallocated.
This whole collection process starts with the Python garbage collector. However, the garbage collector is not perfect and will sometimes take longer to recognize some objects as dereferenced than others. Specifically, objects with cyclical references seem to take significantly longer to collect, because detecting reference cycles is more computationally expensive than standard reference counting.
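The difference can be reproduced in isolation. A minimal sketch (`Node` is an illustrative toy class, not LLM Compressor code): an acyclic object is freed as soon as its reference count hits zero, while a cyclic one lingers until the cycle detector runs.

```python
import gc
import weakref

class Node:
    """Toy object that holds a reference back to itself, forming a cycle."""
    def __init__(self):
        self.ref = self  # cycle: reference counting alone can never free this

gc.disable()               # pause automatic collection so the effect is visible
node = Node()
alive = weakref.ref(node)  # lets us observe when the object is actually freed

del node                             # the last external reference is gone...
cycle_lingers = alive() is not None  # ...but the cycle keeps the object alive

gc.collect()               # the cycle detector finds and frees the orphan
cycle_freed = alive() is None
gc.enable()
```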
These Python objects can take so long to collect that CUDA can run out of memory before the garbage collector gets to them. Surprisingly, the PyTorch caching allocator does not call `gc.collect()` prior to raising an OOM error, a fact which has been confirmed through my own tests and anecdotally matches @yewentao256's experience with the PyTorch CUDA caching allocator.

Garbage Collection and LLM Compressor
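Since the allocator will not trigger Python garbage collection on its own, one workaround is to collect manually and retry when an allocation fails. A framework-agnostic sketch (`alloc_with_gc_retry` is a hypothetical helper, not part of LLM Compressor; with PyTorch you would pass a tensor-allocating callable and catch `torch.cuda.OutOfMemoryError`):

```python
import gc

def alloc_with_gc_retry(alloc, exc_types=(MemoryError,)):
    """Call alloc(); on an out-of-memory error, run gc.collect() and retry once.

    Because the allocator does not invoke the Python garbage collector itself,
    we do so manually before giving up, giving the cycle detector a chance to
    free objects (and the memory they pin) first.
    """
    try:
        return alloc()
    except exc_types:
        gc.collect()    # break reference cycles, releasing pinned memory
        return alloc()  # retry once; still raises if memory is truly exhausted

# Usage with a stand-in allocator that fails on its first attempt:
attempts = []
def flaky_alloc():
    attempts.append(1)
    if len(attempts) == 1:
        raise MemoryError("out of memory")
    return "allocated"

result = alloc_with_gc_retry(flaky_alloc)  # succeeds on the retry
```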
It seems that model objects which have been passed through `modify_save_pretrained` contain reference cycles, as their overridden functions reference their own models directly. From local testing, I see that models without reference cycles are cleaned up faster than models with them. However, this principle does not seem to generalize beyond one file, as the nightly tests still fail even when no cycle is present.

Changes
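This kind of cycle arises whenever a method is overridden with a wrapper that closes over the model and is then stored back on the model. A minimal sketch of the pattern (`FakeModel` and `patch_save` are illustrative stand-ins, not LLM Compressor's actual `modify_save_pretrained`):

```python
import gc
import weakref

class FakeModel:
    """Illustrative stand-in for a model with a save_pretrained method."""
    def save_pretrained(self, path):
        return f"saved to {path}"

def patch_save(model):
    original = model.save_pretrained  # bound method -> references model
    def wrapped_save(path):
        # (real code would e.g. compress weights here before delegating)
        return original(path)
    # model -> wrapped_save -> closure -> original -> model: a reference cycle
    model.save_pretrained = wrapped_save

gc.disable()                  # pause automatic collection to observe the cycle
model = FakeModel()
patch_save(model)
alive = weakref.ref(model)

del model                     # the cycle keeps the instance alive...
lingered = alive() is not None
gc.collect()                  # ...until the cycle detector runs
freed = alive() is None
gc.enable()
```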
Add a `gc.collect()` call after every test finishes in order to make sure memory bugs do not persist across tests.

Testing