Add memprobe diagnostic module for memory-growth tracking (#176) #179

Merged
lmoresi merged 3 commits into development from bugfix/memprobe-diagnostics
May 11, 2026

@lmoresi lmoresi commented May 10, 2026

Summary

Adds uw.utilities.memprobe, an opt-in diagnostic module for surfacing memory growth in long parallel runs. It is complementary to #178 (the fix for #176): #178 plugged the specific leaks, while this PR is the leak detector that stays in place for the next time someone reports an OOM.

What it tracks

| Signal | Source | Cost |
|---|---|---|
| Process RSS (MiB, current) | psutil → /proc/self/statm (Linux) → resource.ru_maxrss (last resort, peak only) | free |
| KDTree live + total-constructed | Cython class counters in `__cinit__`/`__dealloc__` | free |
| Per-class Python instance counts | gc.get_objects() walk | slow; gated behind full=True |
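
The RSS fallback order in the first row can be sketched roughly as follows. This is an illustration only, not the module's code: the function name `rss_mib` is hypothetical, and the exact exceptions handled are assumptions.

```python
import os
import sys


def rss_mib() -> float:
    """Best-effort RSS in MiB: current where possible, peak as a last resort.

    Hypothetical helper illustrating the fallback order described in the PR.
    """
    # 1. psutil reports the current RSS in bytes when it is installed.
    try:
        import psutil

        return psutil.Process().memory_info().rss / (1024 * 1024)
    except ImportError:
        pass

    # 2. Linux: the second field of /proc/self/statm is resident pages.
    try:
        with open("/proc/self/statm") as fh:
            resident_pages = int(fh.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)
    except (OSError, ValueError, IndexError):
        pass

    # 3. Last resort: peak RSS only. ru_maxrss is KiB on Linux, bytes on macOS.
    import resource

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

Because only the first two sources report current RSS, a negative delta after freeing memory can only appear when one of them is available.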

KDTree counts are the signal neither PETSc's tracker nor #178's regression test covers — UW's nanoflann wrapper is invisible to PETSc.
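
A pure-Python analogue of the live/total counters might look like the sketch below. The real counters live in Cython's `__cinit__`/`__dealloc__`; here `__init__`/`__del__` stand in for them, and the class name `CountedTree` is hypothetical.

```python
import gc


class CountedTree:
    """Hypothetical stand-in for the KDTree wrapper, counting its instances."""

    _live = 0
    _total_constructed = 0

    def __init__(self):
        CountedTree._live += 1
        CountedTree._total_constructed += 1

    def __del__(self):
        # Runs when the last reference goes away (or when cyclic GC frees it).
        CountedTree._live -= 1

    @classmethod
    def live_count(cls) -> int:
        return cls._live

    @classmethod
    def total_constructed(cls) -> int:
        return cls._total_constructed


def demo():
    """Build and free a batch, returning (live, total) deltas at each stage."""
    live0 = CountedTree.live_count()
    total0 = CountedTree.total_constructed()

    batch = [CountedTree() for _ in range(5)]
    built = (CountedTree.live_count() - live0,
             CountedTree.total_constructed() - total0)

    del batch
    gc.collect()
    freed = (CountedTree.live_count() - live0,
             CountedTree.total_constructed() - total0)
    return built, freed
```

Reading the counters as deltas against an immediate baseline, as `demo()` does, keeps the result independent of instances left alive by unrelated code.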

PETSc-side object/allocation tracking is not parsed from Python. PETSc's -log_view and -malloc_dump runtime flags give the same information more reliably and are documented in the guide. memprobe.dump_petsc_leaks_at_finalize() is a small helper that turns those on programmatically.

Activation

  • UW_MEMPROBE=1 env var → enables at import time, decorated solvers start emitting per-call diffs.
  • memprobe.enable() / disable() for runtime toggling.
  • with memprobe.probe("step 42"): for ad-hoc blocks.
  • @memprobe.instrument("label") decorator — fast-returns when disabled (one bool check), safe on hot paths. Pre-applied to Stokes.solve() and NavierStokes.solve().
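
The activation paths above can be sketched as a single module-level flag plus a decorator that checks it. This is a minimal sketch under stated assumptions: the `LOG` list, the `_tracked_objects` metric and the `solve` function are hypothetical stand-ins, and in this sketch `probe` always measures regardless of the flag, which may differ from the real module.

```python
import functools
import gc
import os
from contextlib import contextmanager

# Read once at import time, mirroring the UW_MEMPROBE=1 behaviour.
ENABLED = os.environ.get("UW_MEMPROBE") == "1"
LOG = []  # hypothetical sink; the real module prints its diffs


def enable():
    global ENABLED
    ENABLED = True


def disable():
    global ENABLED
    ENABLED = False


def _tracked_objects() -> int:
    # Placeholder metric standing in for a real memory snapshot.
    return len(gc.get_objects())


@contextmanager
def probe(label):
    """Measure the block and record one line when it exits."""
    before = _tracked_objects()
    try:
        yield
    finally:
        change = _tracked_objects() - before
        LOG.append(f"[memprobe] {label}: objects {change:+d}")


def instrument(label):
    """Decorator that costs one bool check when probing is disabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not ENABLED:
                return func(*args, **kwargs)
            with probe(label):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@instrument("Stokes.solve")
def solve(n):
    # Hypothetical workload in place of a real solver call.
    return sum(range(n))
```

Checking the flag inside the wrapper, rather than at decoration time, is what lets `enable()` and `disable()` take effect on functions that were already decorated at import.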

Sample output

[memprobe] Stokes.solve:
  RSS +0.42 MiB
[memprobe] build-kdtree-batch:
  RSS +0.03 MiB
  kdtree: live +5, total_constructed +5
[memprobe] free-kdtree-batch:
  kdtree: live -5

Bonus: dictionary-iteration race fix

UWexpression._ephemeral_expr_names was iterated directly while weakref finalizers mutated it under cyclic GC, raising RuntimeError: dictionary changed size during iteration. Pre-existing bug — surfaced flakily depending on test ordering. Fixed by snapshotting keys via list(...) before iterating. Independent of memprobe but landed here since the new tests reliably triggered it.
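
The failure mode and the fix can be reproduced in isolation. In this sketch a callback that removes an entry mid-loop stands in for a weakref finalizer firing during cyclic GC; the function names and the sample keys are hypothetical.

```python
def contains_unsafe(names: dict, name: str, hook) -> bool:
    """Iterate the live dict; a size change mid-loop raises RuntimeError."""
    for key in names:
        hook()  # stands in for a finalizer firing during iteration
        if key[0] == name:
            return True
    return False


def contains_safe(names: dict, name: str, hook) -> bool:
    """Iterate a snapshot of the keys, so concurrent removal is harmless."""
    for key in list(names):
        hook()
        if key[0] == name:
            return True
    return False


def make_registry():
    """Return a small registry plus a hook that removes one of its entries."""
    names = {("a", 1): None, ("b", 2): None, ("c", 3): None}

    def hook():
        names.pop(("c", 3), None)

    return names, hook
```

The snapshot costs one list allocation per lookup, which is negligible for a registry of a few hundred entries.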

Files

  • src/underworld3/utilities/memprobe.py — module
  • src/underworld3/ckdtree.pyx — KDTree counter instrumentation
  • src/underworld3/systems/solvers.py — decorate Stokes + NavierStokes solve
  • src/underworld3/function/expressions.py — race fix
  • tests/test_0780_memprobe.py — 9 smoke tests
  • docs/developer/guides/memory-diagnostics.md — usage + debugging recipes

Validation against #178

Once merged, a useful cross-check: run any earlier-leaking workload with UW_MEMPROBE=1 and confirm flat per-step RSS deltas where they used to climb. That validates both PRs at once.
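
One way to turn "flat per-step RSS deltas" into a check is sketched below. The helper names and the 0.5 MiB tolerance are assumptions for illustration, not part of memprobe.

```python
def per_step_deltas(rss_samples):
    """Differences between consecutive RSS samples (MiB)."""
    return [b - a for a, b in zip(rss_samples, rss_samples[1:])]


def looks_flat(rss_samples, tol_mib=0.5):
    """True if the mean per-step growth stays within tol_mib.

    Hypothetical helper: the tolerance is arbitrary and should be tuned
    to the workload.
    """
    deltas = per_step_deltas(rss_samples)
    if not deltas:
        return True
    mean_growth = sum(deltas) / len(deltas)
    return abs(mean_growth) <= tol_mib
```

Using the mean rather than the maximum delta tolerates a one-off allocation spike while still flagging steady growth.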

Test plan

Note on Copilot review

Copilot's two outstanding inline comments, on _rss_mb() and on the RSS threshold, refer to resource.getrusage and to a missing threshold respectively. Both were addressed in b4665d1: psutil is now the primary source (resource is a last-resort fallback only), and the threshold is abs(drss) >= 0.01 MiB. The comments are pinned to the same line numbers in the rebased file, so they re-surfaced after the rebase.
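
The thresholded diff can be sketched as follows. The `rss_mb` key and the 0.01 MiB threshold come from this PR; the `kdtree_live` / `kdtree_total` key names and the `format_diff` helper are assumptions made for the example.

```python
RSS_THRESHOLD_MIB = 0.01  # from the PR: abs(drss) >= 0.01 MiB


def diff(before: dict, after: dict) -> dict:
    """Return only the keys that changed meaningfully between snapshots."""
    delta = {}
    drss = after["rss_mb"] - before["rss_mb"]
    if abs(drss) >= RSS_THRESHOLD_MIB:
        delta["rss_mb"] = drss
    for key in ("kdtree_live", "kdtree_total"):  # hypothetical key names
        change = after[key] - before[key]
        if change:
            delta[key] = change
    return delta


def format_diff(label: str, delta: dict) -> str:
    """Render a delta in the style of the sample output."""
    lines = [f"[memprobe] {label}:"]
    if "rss_mb" in delta:
        lines.append(f"  RSS {delta['rss_mb']:+.2f} MiB")
    kd = []
    if "kdtree_live" in delta:
        kd.append(f"live {delta['kdtree_live']:+d}")
    if "kdtree_total" in delta:
        kd.append(f"total_constructed {delta['kdtree_total']:+d}")
    if kd:
        lines.append("  kdtree: " + ", ".join(kd))
    return "\n".join(lines)
```

Dropping sub-threshold RSS changes from the delta means an empty dict can be treated as "no change" without ever printing a `+0.00 MiB` line.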

Underworld development team with AI support from Claude Code

Copilot AI review requested due to automatic review settings May 10, 2026 10:40
@lmoresi lmoresi mentioned this pull request May 10, 2026

Copilot AI left a comment

Pull request overview

Adds an opt-in uw.utilities.memprobe diagnostic module to help attribute memory growth in long MPI runs (motivated by issue #176), including lightweight RSS/KDTree counters, optional GC-based class counting, PETSc leak-dump option helper, solver-level instrumentation hooks, documentation, and smoke tests.

Changes:

  • Introduces underworld3.utilities.memprobe with snapshot/diff/probe/decorator APIs and a PETSc finalize leak-dump helper.
  • Instruments KDTree lifecycle counts in ckdtree.pyx and decorates Stokes.solve() / NavierStokes.solve() with memprobe instrumentation.
  • Adds developer guide + pytest smoke tests for the new module.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/underworld3/utilities/memprobe.py | Implements snapshot/diff formatting, probe context manager, instrumentation decorator, and PETSc option helper. |
| src/underworld3/ckdtree.pyx | Adds module-level live/constructed counters and exposes them for diagnostics. |
| src/underworld3/systems/solvers.py | Decorates key solver solve() methods to emit memprobe diffs when enabled. |
| src/underworld3/utilities/__init__.py | Exposes memprobe under underworld3.utilities. |
| tests/test_0780_memprobe.py | Smoke tests for memprobe API, KDTree counters, and enabled/disabled behavior. |
| docs/developer/guides/memory-diagnostics.md | Documents how to use memprobe and PETSc leak-tracking flags. |

Comment on lines +79 to +89
def _rss_mb() -> float:
    """Current resident-set size in MiB.

    On Linux ``ru_maxrss`` is in KiB; on macOS it is in bytes. We probe the
    platform once to avoid getting it wrong silently.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024

    """
    delta: dict[str, Any] = {}

    drss = after["rss_mb"] - before["rss_mb"]
Comment thread src/underworld3/utilities/memprobe.py Outdated
Comment on lines +262 to +265
    opts.setValue("-malloc_dump", "")
    opts.setValue("-objects_dump", "")
    if filename:
        opts.setValue("-malloc_view", filename)
Comment thread tests/test_0780_memprobe.py Outdated
itself can verify.
"""
import gc
import os
Comment thread src/underworld3/ckdtree.pyx Outdated
Comment on lines +19 to +22
# Incremented in __cinit__ and decremented in __dealloc__ — Cython
# guarantees deterministic destruction so this stays accurate across
# normal use. Read via uw.utilities.memprobe.snapshot(), or directly
# via uw.kdtree.live_count().

| Signal | Source | Cost |
|---|---|---|
| Process RSS (MiB) | `resource.getrusage` | free |
lmoresi added a commit that referenced this pull request May 10, 2026
- _rss_mb(): switch from resource.ru_maxrss (peak/high-water RSS) to
  psutil.Process().memory_info().rss (current RSS), with /proc/self/statm
  Linux fallback and ru_maxrss kept only as a last resort. Current RSS
  is preferred so freed memory shows as a negative delta. (Copilot #1, #6)

- diff(): apply 0.01 MiB threshold before including rss_mb in the delta.
  Smaller noise prints as "+0.00 MiB" and defeats the "no change" fast
  path. (Copilot #2)

- dump_petsc_leaks_at_finalize(): drop leading "-" from PETSc option
  keys (codebase convention is no dash; "-malloc_dump" was a no-op),
  and align docstring with what the function actually sets. (Copilot #3)

- Remove unused `os` import from test module. (Copilot #4)

- Soften ckdtree.pyx comment claiming Cython "guarantees deterministic
  destruction" — that's CPython refcounting, which can lag for objects
  trapped in reference cycles until cyclic GC runs. (Copilot #5)

- Test: assert KDTree count deltas relative to immediate before/after,
  never to absolute baselines. Earlier tests in the same pytest session
  can leave KDTree refs alive that the cyclic GC may collect at any
  time, shifting the absolute count. CI surfaced this with
  "assert 332 == 338" — six trees from earlier tests collected during
  our gc.collect().

Underworld development team with AI support from Claude Code
lmoresi added a commit that referenced this pull request May 10, 2026
CI on PR #179 surfaced a flaky failure in test_bc_accepts_raw_numbers:

  any(k[0] == name for k in UWexpression._ephemeral_expr_names)
  RuntimeError: dictionary changed size during iteration

The dict is mutated asynchronously by weakref finalizers running during
cyclic GC, so iterating it directly races against those callbacks. The
fix is to snapshot the keys with list(...) before iterating — at most a
few hundred entries, negligible cost.

Pre-existing bug — surfaces depending on test ordering, GC pressure,
and timing. The memprobe PR's added tests happen to shift teardown
state enough to trigger it consistently. Worth landing here so #179
isn't blocked.

Underworld development team with AI support from Claude Code
lmoresi added 3 commits May 10, 2026 22:42
Long parallel runs occasionally OOM on HPC even when each step looks
small. memprobe gives a way to "light up" memory tracking on demand,
sample at regular intervals, and pin which subsystem is growing.

Components:
- src/underworld3/utilities/memprobe.py — snapshot/diff/probe/instrument
  API. Snapshots capture process RSS (resource.getrusage), KDTree live +
  total-constructed counts, and (with full=True) per-class Python
  instance counts via gc.get_objects. Diffs filter out unchanged keys
  and sort py_classes by absolute change so dominant suspects surface
  first.
- ckdtree.pyx — module-level live-instance counter, accurate via
  Cython's deterministic __cinit__/__dealloc__. Exposed as
  uw.kdtree.live_count() and uw.kdtree.total_constructed().
- Stokes.solve() and NavierStokes.solve() decorated with
  @memprobe.instrument(...). The decorator fast-returns when ENABLED
  is False (one bool check, sub-microsecond), so it's safe on hot paths.
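
The per-class instance counts and the ranking by absolute change described in the first component can be sketched as below. The function names and the `top` parameter are hypothetical; only `gc.get_objects()` is taken from the commit message.

```python
import gc
from collections import Counter


def class_counts() -> Counter:
    """Count GC-tracked objects per class. Slow: walks every tracked object."""
    return Counter(
        f"{type(obj).__module__}.{type(obj).__qualname__}"
        for obj in gc.get_objects()
    )


def ranked_changes(before, after, top=10):
    """Classes whose counts changed, largest absolute change first."""
    names = set(before) | set(after)
    changes = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    nonzero = [(n, d) for n, d in changes.items() if d != 0]
    return sorted(nonzero, key=lambda item: abs(item[1]), reverse=True)[:top]
```

Note that `gc.get_objects()` only returns objects tracked by the cyclic collector, so plain ints or strings never appear in these counts.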

Activation:
- UW_MEMPROBE=1 env var → enables instrumentation hooks at import time.
- memprobe.enable() / disable() for runtime toggling.
- with memprobe.probe("label"): … for ad-hoc blocks.

Skipped: parsing PETSc.Log object tables from Python. The runtime
flags -log_view and -malloc_dump give the same information more
reliably; documented in the guide and exposed via
memprobe.dump_petsc_leaks_at_finalize().

Adds tests/test_0780_memprobe.py (9 smoke tests) and
docs/developer/guides/memory-diagnostics.md with debugging recipes.

Does not fix issue #176 (Ben's OOM on HPC) — provides the
instrumentation he needs to bisect it.

Underworld development team with AI support from Claude Code
@lmoresi lmoresi force-pushed the bugfix/memprobe-diagnostics branch from d1a66a4 to 5f64de2 Compare May 10, 2026 12:45
@lmoresi lmoresi merged commit 21b037c into development May 11, 2026
1 check passed