
Perf: clone DM in Integral.evaluate() (3× speedup; closes #171 narrative) #172

Merged
lmoresi merged 1 commit into development from bugfix/integral-evaluate-linear-growth-171 on May 5, 2026

Conversation

@lmoresi
Member

@lmoresi lmoresi commented May 5, 2026

Summary

Replaces `Integral.evaluate()`'s shared-mesh-DS path with a per-call clone-DM + `createDS` (matching the working `CellWiseIntegral` pattern). On current development this delivers a clean ~3× per-call speedup for volume integrals.

The original linear-growth narrative from #171 no longer reproduces on current development — see "What changed" below. The fix is still worth landing as a perf cleanup and a structural improvement.

Re-framed for context; #171 closes either way.

What the reproducer shows

The full reproducer (`/tmp/repro_issue_171.py`, from `~/.claude/plans/repro-issue-171-integral-growth.md`) mirrors the original 1/64 Ra=1e7 setup (UnstructuredSimplexBox, P2-P1 Stokes, P3 T, qdegree=3): three cached `uw.maths.Integral` objects (volume, mean T, V·V), with 100 `evaluate()` calls per integrand at cellsize 1/32.

| branch | t_first | t_steady | ratio | linear slope | verdict |
|---|---|---|---|---|---|
| unpatched dev (`origin/development`) | 27.7 ms | 16.2 ms | 0.59x | flat (-0.08%/call) | NOT reproduced |
| with this PR | 12 ms | 5.1 ms | 0.43x | flat (-0.20%/call) | NOT reproduced |

Per-call cost is ~3× lower with the fix (16 ms → 5 ms), but the linear-growth claim that originally motivated #171 doesn't manifest on either branch at this scale.
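The flat-vs-growing verdicts above reduce to two statistics on the per-call timing series: the steady-state/first-call ratio and a least-squares slope. A minimal, self-contained sketch of that analysis in pure Python (the function name and the warm-up cutoff are illustrative, not taken from the reproducer script):

```python
from statistics import mean

def growth_verdict(times_ms, steady_from=10):
    """Classify a per-call timing series as flat or linearly growing.

    times_ms: per-call wall times in milliseconds, in call order.
    steady_from: index after which warm-up (first-call setup) is assumed over.
    Returns (steady/first ratio, slope as % of steady-state mean per call).
    """
    steady = times_ms[steady_from:]
    ratio = mean(steady) / times_ms[0]

    # Ordinary least-squares slope of time vs call index.
    xs = range(len(times_ms))
    x_bar, y_bar = mean(xs), mean(times_ms)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, times_ms)) \
            / sum((x - x_bar) ** 2 for x in xs)
    slope_pct = 100.0 * slope / mean(steady)
    return ratio, slope_pct

# A flat series: the first call pays one-off setup, then steady state.
flat = [27.7] + [16.2] * 99
ratio, slope_pct = growth_verdict(flat)   # ratio ~0.59, slope ~0 -> flat

# The #171 pathology would instead show a clearly positive slope:
growing = [5.0 + 0.65 * i for i in range(100)]
```

The ratio alone can't distinguish "expensive first call" from "cost growing with call count"; the slope is what separates the two, which is why both columns appear in the table.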

What changed since the original report

The original convection run at 1/64 (Ra=1e7, ~3000 timesteps) showed t_vol going from 5 s to 833 s. The predicted-with-bug cost at 1/32 was 1.3 s → 17 s. Actual unpatched today: 28 ms → 16 ms, roughly 1000× below the with-bug prediction. That is strong evidence that something which landed on development in the interim eliminated the linear-growth pathology before this PR. Plausible candidates: #160 (SNES + DM-hierarchy lifecycle leak), #155 (swarm-routed point eval), #161 (ETD lifecycle changes). Bisecting which one isn't tracked here.

Why land it anyway

  • Strict perf win (~3× per call) on current dev, repeatable on a clean reproducer.
  • Eliminates the only remaining shared-DS mutation site in `Integral.evaluate()`. Closes the door on the same pathology recurring later as other lifecycle code shifts around.
  • Matches `CellWiseIntegral`'s working pattern — closes the asymmetry that originally pointed us at this site.
  • Adds explicit `dmc.destroy()` so long runs don't accumulate cloned DMs (the broader DM lifecycle issue is tracked separately).

Why the perf is faster (mechanism)

The shared mesh DS has every registered field on it (T, V, P, plus solver internals). `DMPlexComputeIntegralFEM` walks all of them per call. The cloned DM has only the dummy P1 field we attach via `setField(0, fec)`, so the per-integration DS is leaner. Same reason CellWiseIntegral's pattern works.
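As a back-of-envelope illustration of that field-count effect, here is a toy cost model in plain Python (no petsc4py; the class and field names are made up for illustration) in which integration work scales with the number of fields registered on the DS, as described above:

```python
# Toy model of the mechanism, NOT petsc4py: per-call integration work is
# proportional to (fields on the DS) x (cells), because the DS carries
# every registered field's data through the cell loop.

class ToyDS:
    def __init__(self, fields):
        self.fields = list(fields)   # field names registered on this DS

def compute_integral_work(ds, n_cells):
    # Stand-in for DMPlexComputeIntegralFEM's cell loop: one unit of
    # work per (field, cell) pair.
    work = 0
    for _field in ds.fields:
        work += n_cells
    return work

# Shared mesh DS: every registered variable rides along (names illustrative).
shared = ToyDS(["T", "V", "P", "stokes_aux", "advdiff_aux"])

# Cloned-DM DS: only the dummy P1 field attached via setField(0, fec).
cloned = ToyDS(["dummy_P1"])

n_cells = 800
shared_work = compute_integral_work(shared, n_cells)   # 5 fields x 800 cells
cloned_work = compute_integral_work(cloned, n_cells)   # 1 field  x 800 cells
```

In this toy model the leaner DS does 5× less work per call; the real speedup observed (~2-3×) is smaller because not all of PETSc's per-call cost is field-proportional.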

Test plan

  • `/tmp/repro_issue_171.py` (1/32, 100 calls): both unpatched and patched flat, patched ~3× faster.
  • `pytest -m "level_1 and tier_a"` on `amr-dev`: 62 passed, 3 skipped, 0 failed. No numerical regressions.
  • Latent dim bug noted: `CellWiseIntegral` still references nonexistent `self.dim`; this PR uses `self.mesh.dim` correctly. CellWiseIntegral fix is out of scope.

Closes

#171 — the narrative shifts but the issue resolves either way (the linear-growth pathology is gone on current dev; this PR closes the only remaining mechanism that could re-introduce it, plus delivers a 3× perf win).

Underworld development team with AI support from Claude Code

Commit message:
Integral.evaluate() called PetscDSSetObjective on the shared mesh DS
every call, then DMPlexComputeIntegralFEM. The user reports this makes
diagnostic-integral cost grow linearly across long runs — convection
diagnostics at Ra=1e7 / 1/64 mesh / ~3000 timesteps showed t_vol going
from 5 s to 833 s (≈ +0.65 s per integral call), while solve costs
(t_stokes, t_adv) stayed flat over the same window. By step ~3700 the
diagnostic was ~20× the actual solve.

The asymmetric sister class CellWiseIntegral.evaluate() already takes
the right path: clone the mesh DM, attach a default FE field, create a
fresh DS, set the objective on that fresh DS, integrate, and return.
This commit brings Integral.evaluate() in line with that pattern, with
one improvement over CellWiseIntegral: explicit dmc.destroy() at the
end so long runs don't accumulate cloned DMs (the broader DM-lifecycle
cleanup is tracked separately).

The fix doesn't reproduce the symptomatic linear growth at small
mesh + small call-count locally (200 calls × 100-cell mesh stays
flat at ~2.5 ms per 3 integrals both before and after the fix) — the
behaviour is scale-dependent. Confirmation at the original Ra=1e7 /
1/64 / 3000-step scale is left as a re-run on the reporter's side.

Verified
--------

  - /tmp/repro_171.py — 200 evaluate() calls × 3 integrals each on a
    100-cell mesh: post-fix mean ratio last/first = 0.04x (flat).
  - /tmp/repro_171b.py — same, with a Stokes solve interleaved between
    integrations: post-fix ratio 0.21x (flat).
  - pytest -m "level_1 and tier_a" on amr-dev: 62 passed, 3 skipped,
    0 failed. No numerical regressions.

Underworld development team with AI support from Claude Code
Copilot AI review requested due to automatic review settings May 5, 2026 04:22
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to fix the long-run performance regression in uw.maths.Integral.evaluate() by stopping it from mutating the shared mesh DS on every call. In the Underworld3 maths/PETSc integration layer, it replaces the shared-DS path with a per-call scratch DM/DS path.

Changes:

  • Reworks Integral.evaluate() to clone the mesh DM and build a fresh DS for each evaluation.
  • Switches the integral computation to run on the cloned DM instead of the shared mesh DM.
  • Adds explicit cleanup of the cloned DM after each integral call.


Comment on lines +132 to +141

```diff
-cdef DM dm = self.dm
-cdef DS ds = self.dm.getDS()
+cdef DM dmc = self.mesh.dm.clone()
+cdef FE fec = FE().createDefault(self.mesh.dim, 1, False, -1)
+dmc.setField(0, fec)
+dmc.createDS()
+
+cdef DS ds = dmc.getDS()
 cdef PetscScalar val_array[256]

 # Now set callback...
 ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0]); CHKERRQ(ierr)
-ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, &(val_array[0]), NULL); CHKERRQ(ierr)
+ierr = DMPlexComputeIntegralFEM(dmc.dm, cgvec.vec, &(val_array[0]), NULL); CHKERRQ(ierr)
```
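Independent of PETSc, the clone-then-destroy lifecycle the patch adopts can be sanity-checked with a plain-Python mock (all names here are hypothetical, not Underworld3 API) showing that per-call cleanup keeps the number of live clones from growing across repeated calls:

```python
# Toy lifecycle model, NOT PETSc: the patched evaluate() clones a scratch
# DM per call and destroys it before returning, so repeated calls never
# accumulate live clones the way a missing destroy() would.

class MockDM:
    live = 0                      # clones created but not yet destroyed

    def clone(self):
        MockDM.live += 1
        return MockDM()

    def destroy(self):
        MockDM.live -= 1

def evaluate(mesh_dm):
    dmc = mesh_dm.clone()         # per-call scratch DM (fresh DS in real code)
    try:
        result = 0.0              # stand-in for the actual FEM integration
    finally:
        dmc.destroy()             # the explicit cleanup this PR adds
    return result

mesh_dm = MockDM()
for _ in range(100):
    evaluate(mesh_dm)
# MockDM.live returns to 0 after every call: no clone accumulation.
```

Without the `destroy()` call, `live` would reach 100 after this loop, which is exactly the long-run accumulation the explicit cleanup guards against.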
@lmoresi
Member Author

lmoresi commented May 5, 2026

Apples-to-apples timing on the harder reproducer (P3 T, P2 V, full Stokes + AdvDiff per step, 800-cell mesh, 200 steps, 3 integrals per step):

| branch | per-call cost (200-step mean) | linear-fit slope |
|---|---|---|
| unpatched dev (`origin/development`) | ~15.9 ms | flat (-0.003 ms/step) |
| with this PR's fix | ~7.6 ms | flat (+0.001 ms/step) |

Two findings:

  1. The linear-growth symptom doesn't manifest at this scale (200 steps × 800 cells). Your original report needed the 1/64 mesh × 3000+ steps for the +0.65 s/call accumulation to be visible. So the scale-dependent confirmation still wants a re-run at the original scale on your side.

  2. The fix is ~2× faster per call even at flat state, a bonus on top of the diagnosis. The likely cause: the shared mesh DS has every registered field (T, V, P, plus solver internals), and `DMPlexComputeIntegralFEM` walks all of them. The cloned DM in this PR has only the single dummy P1 field we attach, so the DS is leaner per integration. That's the same reason the `CellWiseIntegral` pattern works.

So the perf benefit is real and immediate, even before any linear-growth claim is confirmed at scale.

Underworld development team with AI support from Claude Code

@lmoresi lmoresi changed the title Fix Integral.evaluate() linear-time growth on shared mesh DS (#171) Perf: clone DM in Integral.evaluate() (3× speedup; closes #171 narrative) May 5, 2026
@lmoresi lmoresi merged commit fe05c63 into development May 5, 2026
4 of 5 checks passed
@lmoresi lmoresi deleted the bugfix/integral-evaluate-linear-growth-171 branch May 5, 2026 05:09
