Add memprobe diagnostic module for memory-growth tracking (#176)#179
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an opt-in uw.utilities.memprobe diagnostic module to help attribute memory growth in long MPI runs (motivated by issue #176), including lightweight RSS/KDTree counters, optional GC-based class counting, PETSc leak-dump option helper, solver-level instrumentation hooks, documentation, and smoke tests.
Changes:
- Introduces
underworld3.utilities.memprobewith snapshot/diff/probe/decorator APIs and a PETSc finalize leak-dump helper. - Instruments KDTree lifecycle counts in
ckdtree.pyxand decoratesStokes.solve()/NavierStokes.solve()with memprobe instrumentation. - Adds developer guide + pytest smoke tests for the new module.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/underworld3/utilities/memprobe.py |
Implements snapshot/diff formatting, probe context manager, instrumentation decorator, and PETSc option helper. |
src/underworld3/ckdtree.pyx |
Adds module-level live/constructed counters and exposes them for diagnostics. |
src/underworld3/systems/solvers.py |
Decorates key solver solve() methods to emit memprobe diffs when enabled. |
src/underworld3/utilities/__init__.py |
Exposes memprobe under underworld3.utilities. |
tests/test_0780_memprobe.py |
Smoke tests for memprobe API, KDTree counters, and enabled/disabled behavior. |
docs/developer/guides/memory-diagnostics.md |
Documents how to use memprobe and PETSc leak-tracking flags. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+79
to
+89
| def _rss_mb() -> float: | ||
| """Current resident-set size in MiB. | ||
|
|
||
| On Linux ``ru_maxrss`` is in KiB; on macOS it is in bytes. We probe the | ||
| platform once to avoid getting it wrong silently. | ||
| """ | ||
| rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss | ||
| if sys.platform == "darwin": | ||
| return rss / (1024 * 1024) | ||
| return rss / 1024 | ||
|
|
| """ | ||
| delta: dict[str, Any] = {} | ||
|
|
||
| drss = after["rss_mb"] - before["rss_mb"] |
Comment on lines
+262
to
+265
| opts.setValue("-malloc_dump", "") | ||
| opts.setValue("-objects_dump", "") | ||
| if filename: | ||
| opts.setValue("-malloc_view", filename) |
| itself can verify. | ||
| """ | ||
| import gc | ||
| import os |
Comment on lines
+19
to
+22
| # Incremented in __cinit__ and decremented in __dealloc__ — Cython | ||
| # guarantees deterministic destruction so this stays accurate across | ||
| # normal use. Read via uw.utilities.memprobe.snapshot(), or directly | ||
| # via uw.kdtree.live_count(). |
|
|
||
| | Signal | Source | Cost | | ||
| |---|---|---| | ||
| | Process RSS (MiB) | `resource.getrusage` | free | |
lmoresi
added a commit
that referenced
this pull request
May 10, 2026
- _rss_mb(): switch from resource.ru_maxrss (peak/high-water RSS) to psutil.Process().memory_info().rss (current RSS), with /proc/self/statm Linux fallback and ru_maxrss kept only as a last resort. Current RSS is preferred so freed memory shows as a negative delta. (Copilot #1, #6) - diff(): apply 0.01 MiB threshold before including rss_mb in the delta. Smaller noise prints as "+0.00 MiB" and defeats the "no change" fast path. (Copilot #2) - dump_petsc_leaks_at_finalize(): drop leading "-" from PETSc option keys (codebase convention is no dash; "-malloc_dump" was a no-op), and align docstring with what the function actually sets. (Copilot #3) - Remove unused `os` import from test module. (Copilot #4) - Soften ckdtree.pyx comment claiming Cython "guarantees deterministic destruction" — that's CPython refcounting, which can lag for objects trapped in reference cycles until cyclic GC runs. (Copilot #5) - Test: assert KDTree count deltas relative to immediate before/after, never to absolute baselines. Earlier tests in the same pytest session can leave KDTree refs alive that the cyclic GC may collect at any time, shifting the absolute count. CI surfaced this with "assert 332 == 338" — six trees from earlier tests collected during our gc.collect(). Underworld development team with AI support from Claude Code
lmoresi
added a commit
that referenced
this pull request
May 10, 2026
CI on PR #179 surfaced a flaky failure in test_bc_accepts_raw_numbers: any(k[0] == name for k in UWexpression._ephemeral_expr_names) RuntimeError: dictionary changed size during iteration The dict is mutated asynchronously by weakref finalizers running during cyclic GC, so iterating it directly races against those callbacks. The fix is to snapshot the keys with list(...) before iterating — at most a few hundred entries, negligible cost. Pre-existing bug — surfaces depending on test ordering, GC pressure, and timing. The memprobe PR's added tests happen to shift teardown state enough to trigger it consistently. Worth landing here so #179 isn't blocked. Underworld development team with AI support from Claude Code
Long parallel runs occasionally OOM on HPC even when each step looks
small. memprobe gives a way to "light up" memory tracking on demand,
sample at regular intervals, and pin which subsystem is growing.
Components:
- src/underworld3/utilities/memprobe.py — snapshot/diff/probe/instrument
API. Snapshots capture process RSS (resource.getrusage), KDTree live +
total-constructed counts, and (with full=True) per-class Python
instance counts via gc.get_objects. Diffs filter out unchanged keys
and sort py_classes by absolute change so dominant suspects surface
first.
- ckdtree.pyx — module-level live-instance counter, accurate via
Cython's deterministic __cinit__/__dealloc__. Exposed as
uw.kdtree.live_count() and uw.kdtree.total_constructed().
- Stokes.solve() and NavierStokes.solve() decorated with
@memprobe.instrument(...). The decorator fast-returns when ENABLED
is False (one bool check, sub-microsecond), so it's safe on hot paths.
Activation:
- UW_MEMPROBE=1 env var → enables instrumentation hooks at import time.
- memprobe.enable() / disable() for runtime toggling.
- with memprobe.probe("label"): … for ad-hoc blocks.
Skipped: parsing PETSc.Log object tables from Python. The runtime
flags -log_view and -malloc_dump give the same information more
reliably; documented in the guide and exposed via
memprobe.dump_petsc_leaks_at_finalize().
Adds tests/test_0780_memprobe.py (9 smoke tests) and
docs/developer/guides/memory-diagnostics.md with debugging recipes.
Does not fix issue #176 (Ben's OOM on HPC) — provides the
instrumentation he needs to bisect it.
Underworld development team with AI support from Claude Code
- _rss_mb(): switch from resource.ru_maxrss (peak/high-water RSS) to psutil.Process().memory_info().rss (current RSS), with /proc/self/statm Linux fallback and ru_maxrss kept only as a last resort. Current RSS is preferred so freed memory shows as a negative delta. (Copilot #1, #6) - diff(): apply 0.01 MiB threshold before including rss_mb in the delta. Smaller noise prints as "+0.00 MiB" and defeats the "no change" fast path. (Copilot #2) - dump_petsc_leaks_at_finalize(): drop leading "-" from PETSc option keys (codebase convention is no dash; "-malloc_dump" was a no-op), and align docstring with what the function actually sets. (Copilot #3) - Remove unused `os` import from test module. (Copilot #4) - Soften ckdtree.pyx comment claiming Cython "guarantees deterministic destruction" — that's CPython refcounting, which can lag for objects trapped in reference cycles until cyclic GC runs. (Copilot #5) - Test: assert KDTree count deltas relative to immediate before/after, never to absolute baselines. Earlier tests in the same pytest session can leave KDTree refs alive that the cyclic GC may collect at any time, shifting the absolute count. CI surfaced this with "assert 332 == 338" — six trees from earlier tests collected during our gc.collect(). Underworld development team with AI support from Claude Code
CI on PR #179 surfaced a flaky failure in test_bc_accepts_raw_numbers: any(k[0] == name for k in UWexpression._ephemeral_expr_names) RuntimeError: dictionary changed size during iteration The dict is mutated asynchronously by weakref finalizers running during cyclic GC, so iterating it directly races against those callbacks. The fix is to snapshot the keys with list(...) before iterating — at most a few hundred entries, negligible cost. Pre-existing bug — surfaces depending on test ordering, GC pressure, and timing. The memprobe PR's added tests happen to shift teardown state enough to trigger it consistently. Worth landing here so #179 isn't blocked. Underworld development team with AI support from Claude Code
d1a66a4 to
5f64de2
Compare
This was referenced May 11, 2026
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
uw.utilities.memprobe, an opt-in diagnostic module for surfacing memory growth in long parallel runs. Complementary to #178 (the fix for #176): #178 plugged the specific leak holes, this PR is the leak-detector that survives for the next time someone reports OOM.What it tracks
psutil→/proc/self/statm(Linux) →resource.ru_maxrss(last resort, peak only)KDTreelive + total-constructed__cinit__/__dealloc__gc.get_objects()walkfull=TrueKDTree counts are the signal neither PETSc's tracker nor #178's regression test covers — UW's nanoflann wrapper is invisible to PETSc.
PETSc-side object/allocation tracking is not parsed from Python. PETSc's
-log_viewand-malloc_dumpruntime flags give the same information more reliably and are documented in the guide.memprobe.dump_petsc_leaks_at_finalize()is a small helper that turns those on programmatically.Activation
UW_MEMPROBE=1env var → enables at import time, decorated solvers start emitting per-call diffs.memprobe.enable()/disable()for runtime toggling.with memprobe.probe("step 42"):for ad-hoc blocks.@memprobe.instrument("label")decorator — fast-returns when disabled (one bool check), safe on hot paths. Pre-applied toStokes.solve()andNavierStokes.solve().Sample output
Bonus: dictionary-iteration race fix
UWexpression._ephemeral_expr_nameswas iterated directly while weakref finalizers mutated it under cyclic GC, raisingRuntimeError: dictionary changed size during iteration. Pre-existing bug — surfaced flakily depending on test ordering. Fixed by snapshotting keys vialist(...)before iterating. Independent of memprobe but landed here since the new tests reliably triggered it.Files
src/underworld3/utilities/memprobe.py— modulesrc/underworld3/ckdtree.pyx— KDTree counter instrumentationsrc/underworld3/systems/solvers.py— decorate Stokes + NavierStokes solvesrc/underworld3/function/expressions.py— race fixtests/test_0780_memprobe.py— 9 smoke testsdocs/developer/guides/memory-diagnostics.md— usage + debugging recipesValidation against #178
Once merged, a useful cross-check: run any earlier-leaking workload with
UW_MEMPROBE=1and confirm flat per-step RSS deltas where they used to climb. That validates both PRs at once.Test plan
pytest tests/test_0780_memprobe.py— 9 passed locally and on CIUW_MEMPROBE=1emits expected diffslivereturns to baseline afterdel tree; gc.collect()test_0006_memory_leak.py(Ben's Fix comprehensive memory leaks in solver setup, interpolation caching, and SubDM syncing #178 regression test) — passesNote on Copilot review
Copilot's two outstanding inline comments on
_rss_mb()and the RSS threshold referenceresource.getrusage/ a missing threshold. Both were addressed inb4665d1— psutil is now the primary source (resource is a last-resort fallback only), and the threshold isabs(drss) >= 0.01 MiB. The comments are pinned to the same line numbers in the rebased file and so re-surfaced after the rebase.Underworld development team with AI support from Claude Code