
Fix comprehensive memory leaks in solver setup, interpolation caching, and SubDM syncing#178

Merged
lmoresi merged 1 commit into development from fix/memory-leak-comprehensive on May 10, 2026

Conversation

@bknight1
Member

Resolves #176

Summary of Changes

This PR comprehensively addresses severe memory leaks identified in the solver loop, particularly during dynamic coordinate updates (e.g., swarm advection) and tight solve iterations.

  1. DMInterpolationCache LRU Policy: Implemented a Least-Recently-Used (LRU) eviction policy in the DMInterpolationCache with a max_entries cap (default: 10). This prevents unbounded memory growth when evaluation coordinates change every step.
  2. SubDM Cleanup in MeshVariable: Added explicit subdm.destroy() calls in _sync_lvec_to_gvec and other MeshVariable synchronization paths. Previously, temporary SubDM objects created during data syncs were orphaned.
  3. Solver Property Guards: Modified SymbolicProperty and solver setters to use self._needs_function_rewire = True instead of self.is_setup = False when only pointwise functions change. This avoids catastrophic, redundant teardowns and rebuilds of the PETSc DM hierarchy.
  4. Stokes & Solver Base Cleanup: Implemented comprehensive explicit destruction (.destroy()) of old PETSc FE objects, DM hierarchies, and SNES objects within the _setup_discretisation and _build paths before allocating new ones.
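The LRU policy in item 1 can be sketched with an OrderedDict; this is an illustrative stand-in for DMInterpolationCache, not its actual code (the class name `LRUCacheSketch` and the `evictions` counter are hypothetical, while `max_entries` and `_cache` follow the names used in the PR):

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Minimal LRU cache with a hard entry cap (illustrative sketch)."""

    def __init__(self, max_entries=10):
        self.max_entries = max_entries
        self._cache = OrderedDict()  # insertion order == recency order
        self.evictions = 0

    def get(self, key):
        if key not in self._cache:
            return None
        # Touch the entry so it becomes most-recently-used
        self._cache.move_to_end(key)
        return self._cache[key]

    def put(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        elif len(self._cache) >= self.max_entries:
            # Evict the least-recently-used (first) item
            self._cache.popitem(last=False)
            self.evictions += 1
        self._cache[key] = value
```

With a cap in place, a solve loop whose evaluation coordinates change every step recycles at most `max_entries` interpolation structures instead of accumulating one per step.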

Verification and Test Findings

A new memory-leak regression test (tests/test_0006_memory_leak.py) was added. It isolates steady-state leakage by warming up for 30 steps and then measuring absolute memory growth over the subsequent 70 steps on a 32x32 mesh.

  • Before PR (development branch): Memory grew by 77.83 MiB (~1.1 MiB leaked per step) over the final 70 steps, failing the test.
  • After PR (fix/memory-leak-comprehensive branch): Memory growth dropped to 9.94 MiB (~0.14 MiB per step) over the same window. The remaining footprint reflects high-water-mark expansion rather than a leak, and the test passes reliably.

All standard regression and parallel tests continue to pass.
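The warmup-then-measure approach used by the test can be sketched as follows. This is not the PR's test code: the helper names (`rss_mib`, `measure_growth`) are illustrative, and the RSS read assumes Linux's /proc filesystem, as the actual test's VmRSS parsing does:

```python
def rss_mib():
    """Current resident set size in MiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # kB -> MiB
    return 0.0


def measure_growth(step_fn, n_steps=100, warmup=30):
    """Run step_fn n_steps times; return RSS growth (MiB) after warmup.

    Sampling the baseline after a warmup window excludes one-time
    allocator and cache growth, so only steady-state leakage is counted.
    """
    baseline = 0.0
    for i in range(n_steps):
        step_fn(i)
        if i + 1 == warmup:
            baseline = rss_mib()
    return rss_mib() - baseline
```

Dividing the returned growth by `n_steps - warmup` gives the per-step figure quoted above (e.g. 77.83 MiB / 70 steps ≈ 1.1 MiB per step).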

Copilot AI review requested due to automatic review settings May 10, 2026 10:17
@bknight1 requested a review from lmoresi as a code owner May 10, 2026 10:17
Contributor

Copilot AI left a comment


Pull request overview

This PR targets PETSc-related memory leaks and rebuild churn in the Underworld3 solver loop, especially under frequent coordinate updates and repeated solve iterations (per issue #176).

Changes:

  • Added an LRU + max_entries cap to DMInterpolationCache to prevent unbounded cache growth when evaluation coordinates vary.
  • Improved PETSc object lifecycle management by explicitly destroying temporary SubDMs and cached decomposition objects (IS/SubDM), and by tearing down stale solver DM/SNES state before rebuilding.
  • Switched many solver setters from full is_setup=False rebuilds to the narrower _needs_function_rewire=True path, and added a new memory leak regression test.
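The setter change in the last bullet amounts to a two-tier invalidation scheme: a full `is_setup=False` teardown only when the discretisation itself changes, and a cheap `_needs_function_rewire=True` when only pointwise functions change. A minimal sketch of the pattern, under the assumption that this is how the flags interact (the class, the `viscosity` property, and the counters are illustrative; only the two flag names come from the PR):

```python
class SolverSketch:
    """Illustrates the two-tier invalidation pattern described in the PR."""

    def __init__(self):
        self.is_setup = False
        self._needs_function_rewire = False
        self._viscosity = None
        self.rebuilds = 0  # expensive DM/SNES rebuilds performed
        self.rewires = 0   # cheap function rewires performed

    @property
    def viscosity(self):
        return self._viscosity

    @viscosity.setter
    def viscosity(self, value):
        self._viscosity = value
        # Only the pointwise functions depend on this, so request a
        # cheap rewire instead of tearing down the whole DM hierarchy.
        self._needs_function_rewire = True

    def solve(self):
        if not self.is_setup:
            self.rebuilds += 1  # expensive: rebuild DM hierarchy and SNES
            self.is_setup = True
            self._needs_function_rewire = False
        elif self._needs_function_rewire:
            self.rewires += 1   # cheap: re-point callbacks only
            self._needs_function_rewire = False
```

In a tight solve loop that updates a material parameter each iteration, this turns N teardown/rebuild cycles into one build plus N cheap rewires.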

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/test_0006_memory_leak.py | Adds a regression test intended to detect solver-loop memory growth across many coupled solves. |
| src/underworld3/utilities/_api_tools.py | Updates SymbolicProperty change-detection behavior for solver setup invalidation. |
| src/underworld3/systems/solvers.py | Replaces many is_setup=False invalidations with _needs_function_rewire=True; expands Stokes solve() signature. |
| src/underworld3/function/dminterpolation_cache.py | Implements an LRU eviction policy with max_entries and eviction stats. |
| src/underworld3/discretisation/discretisation_mesh_variables.py | Destroys temporary SubDM created during local→global vector sync. |
| src/underworld3/cython/petsc_generic_snes_solvers.pyx | Adds explicit cleanup for solver DM/SNES and decomposition caches; caches IS sets to avoid per-solve allocations; fixes local/global Vec restore. |
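The SubDM/IS cleanup rows above come down to one discipline: a temporary PETSc object must be destroyed before its Python reference is dropped, or the underlying C allocation is orphaned. A sketch of that discipline using a context manager and a stub object (the `destroying` helper and `FakeSubDM` are illustrative, not part of the PR; in real petsc4py code the temporary would come from something like `dm.createSubDM(fields)`):

```python
from contextlib import contextmanager


@contextmanager
def destroying(obj):
    """Ensure obj.destroy() runs even if the body raises -- the same
    discipline the PR applies to temporary SubDM/IS objects."""
    try:
        yield obj
    finally:
        obj.destroy()


class FakeSubDM:
    """Stub standing in for a petsc4py SubDM in this sketch."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


# Usage: the temporary is guaranteed to be released on every exit path.
with destroying(FakeSubDM()) as subdm:
    pass  # ... sync the local vector to the global vector via subdm ...
```

The `finally` clause is the important part: without it, an exception raised mid-sync leaves the SubDM alive, which is exactly how the orphaned objects described in the PR accumulated.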


The inline review comments quote hunks from several of the changed files; the fragments were interleaved in the page dump and are grouped by file below.

src/underworld3/utilities/_api_tools.py — change detection in SymbolicProperty.__set__ (comment on lines +113 to +118):

    def __set__(self, obj, value):
        """Set the value, with automatic unwrapping."""
        if hasattr(obj, "is_setup"):
            old_value = getattr(obj, self.attr_name, None)
            # Use 'is not' check first for speed
            if old_value is not value:
                # Check for mathematical inequality (handles SymPy expressions)
                if old_value != value:
                    ...

src/underworld3/function/dminterpolation_cache.py — eviction when the cache reaches its cap:

    if len(self._cache) >= self.max_entries:
        # Remove oldest (first) item
        oldest_key = next(iter(self._cache))
        del self._cache[oldest_key]

tests/test_0006_memory_leak.py — RSS helper and test setup (comment on lines +23 to +34; the helper reads VmRSS from /proc and ends in a bare except):

    for line in f:
        if line.startswith('VmRSS:'):
            return int(line.split()[1]) / 1024.0  # MiB
    except:
        ...

    def test_stokes_advdiff_memory_leak():
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        # Parameters
        res = 32
        n_steps = 100
        warmup_step = 30

        # Setup mesh
        mesh = uw.meshing.StructuredQuadBox(elementRes=(res, res))
        ...
        dt = 0.001
        advdiff.solve(timestep=dt)

        with swarm.access(swarm):
            ...

src/underworld3/systems/solvers.py — expanded Stokes solve() signature (comment on lines 1237 to 1245; the diff replaces `verbose=False` with `verbose: bool = False`):

    zero_init_guess: bool = True,
    timestep: float = None,
    _force_setup: bool = False,
    verbose: bool = False,
    debug: bool = False,
    debug_name: str = None,
    evalf=False,
    order=None,
    picard: int = 0,

Timestep change guard (source file not identified in the dump; the fragment appeared twice and one copy is kept):

    try:
        new_dt = float(value.data) if hasattr(value, 'data') else float(value)
        if np.isclose(old_dt, new_dt, rtol=1e-12, atol=1e-15):
            return
    except:
        ...
Member

@lmoresi left a comment


Looks good, Ben,

I added some diagnostics at the same time you figured out the fix. They should be complementary and help find the issues quickly next time.

@lmoresi merged commit 848b8b9 into development May 10, 2026
5 checks passed
@bknight1
Member Author

Sounds good, thanks @lmoresi
