Perf: clone DM in Integral.evaluate() (3× speedup; closes #171 narrative)#172
Integral.evaluate() called PetscDSSetObjective on the shared mesh DS on
every call and then ran DMPlexComputeIntegralFEM. The user reports that
this makes diagnostic-integral cost grow linearly across long runs: convection
diagnostics at Ra=1e7 / 1/64 mesh / ~3000 timesteps showed t_vol going
from 5 s to 833 s (≈ +0.65 s per integral call), while solve costs
(t_stokes, t_adv) stayed flat over the same window. By step ~3700 the
diagnostic was ~20× the actual solve.
The sister class CellWiseIntegral.evaluate() already takes
the right path: clone the mesh DM, attach a default FE field, create a
fresh DS, set the objective on that fresh DS, integrate, and return.
This commit brings Integral.evaluate() in line with that pattern, with
one improvement over CellWiseIntegral: explicit dmc.destroy() at the
end so long runs don't accumulate cloned DMs (the broader DM-lifecycle
cleanup is tracked separately).
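For orientation, a rough petsc4py-level sketch of the pattern described above (a sketch only: the function and variable names are illustrative, and the real `Integral.evaluate()` does the objective setup and `DMPlexComputeIntegralFEM` call at the Cython level):

```python
from petsc4py import PETSc

def integrate_on_scratch_dm(mesh_dm, dim):
    # Clone the shared mesh DM so its DS is never mutated by the integral.
    dmc = mesh_dm.clone()
    try:
        # Attach one default FE field and build a fresh DS on the clone,
        # mirroring the CellWiseIntegral pattern referenced above.
        fe = PETSc.FE().createDefault(dim, 1, False, -1)
        dmc.setField(0, fe)
        dmc.createDS()
        ds = dmc.getDS()
        # ... set the objective callback on `ds` and run the FEM integral
        # over `dmc` (done via PetscDSSetObjective / DMPlexComputeIntegralFEM
        # in the Cython implementation) ...
    finally:
        # Explicit cleanup so long runs don't accumulate cloned DMs.
        dmc.destroy()
```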
The symptomatic linear growth does not reproduce locally at small mesh and
small call-count (200 calls × a 100-cell mesh stays flat at ~2.5 ms per
3 integrals both before and after the fix); the behaviour is scale-dependent.
Confirmation at the original Ra=1e7 / 1/64 / 3000-step scale is left as a
re-run on the reporter's side.
Verified
--------
- /tmp/repro_171.py — 200 evaluate() calls × 3 integrals each on a
100-cell mesh: post-fix mean ratio last/first = 0.04x (flat).
- /tmp/repro_171b.py — same, with a Stokes solve interleaved between
integrations: post-fix ratio 0.21x (flat).
- pytest -m "level_1 and tier_a" on amr-dev: 62 passed, 3 skipped,
0 failed. No numerical regressions.
Underworld development team with AI support from Claude Code
Pull request overview
This PR aims to fix the long-run performance regression in uw.maths.Integral.evaluate() by stopping it from mutating the shared mesh DS on every call. In the Underworld3 maths/PETSc integration layer, it replaces the shared-DS path with a per-call scratch DM/DS path.
Changes:
- Reworks `Integral.evaluate()` to clone the mesh DM and build a fresh DS for each evaluation.
- Switches the integral computation to run on the cloned DM instead of the shared mesh DM.
- Adds explicit cleanup of the cloned DM after each integral call.
The reviewed hunks, consolidated into a single diff of the changed section in `Integral.evaluate()`:

```diff
-cdef DM dm = self.dm
-cdef DS ds = self.dm.getDS()
+cdef DM dmc = self.mesh.dm.clone()
+cdef FE fec = FE().createDefault(self.mesh.dim, 1, False, -1)
+dmc.setField(0, fec)
+dmc.createDS()
+cdef DS ds = dmc.getDS()
 cdef PetscScalar val_array[256]

 # Now set callback...
 ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0]); CHKERRQ(ierr)
-ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, &(val_array[0]), NULL); CHKERRQ(ierr)
+ierr = DMPlexComputeIntegralFEM(dmc.dm, cgvec.vec, &(val_array[0]), NULL); CHKERRQ(ierr)
```
Apples-to-apples timing on the harder reproducer (P3 T, P2 V, full Stokes + AdvDiff per step, 800-cell mesh, 200 steps, 3 integrals per step):
Two findings:
So the perf benefit is real and immediate, even before any linear-growth claim is confirmed at scale.
Underworld development team with AI support from Claude Code
Summary
Replaces `Integral.evaluate()`'s shared-mesh-DS path with a per-call clone-DM + `createDS` (matching the working `CellWiseIntegral` pattern). On current development this delivers a clean ~3× per-call speedup for volume integrals.
The original linear-growth narrative from #171 no longer reproduces on current development — see "What changed" below. The fix is still worth landing as a perf cleanup and a structural improvement.
Re-framed for context; #171 closes either way.
What the reproducer shows
@lmoresi's full reproducer (`/tmp/repro_issue_171.py` from `~/.claude/plans/repro-issue-171-integral-growth.md`) mirrors the original 1/64 Ra=1e7 setup (UnstructuredSimplexBox, P2-P1 Stokes, P3 T, qdegree=3): three cached `uw.maths.Integral` objects (volume, mean T, V·V), 100 evaluate() calls per integrand at cell size 1/32.
Per-call cost is ~3× lower with the fix (16 ms → 5 ms), but the linear-growth claim that originally motivated #171 doesn't manifest on either branch at this scale.
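For readers who want to eyeball the behaviour themselves, a minimal timing loop in the spirit of that reproducer (the mesh parameters, integrand, and constructor arguments below are assumptions for illustration, not the script referenced above):

```python
import time
import sympy
import underworld3 as uw

# Assumed setup: a simplex box at cell size 1/32 with qdegree=3; the real
# reproducer also builds P2-P1 Stokes and a P3 temperature field.
mesh = uw.meshing.UnstructuredSimplexBox(cellSize=1.0 / 32, qdegree=3)

# One cached integral object, reused across calls (volume integral of 1).
volume = uw.maths.Integral(mesh, sympy.sympify(1))

per_call = []
for _ in range(100):
    t0 = time.perf_counter()
    volume.evaluate()
    per_call.append(time.perf_counter() - t0)

# A flat last/first ratio is the "no linear growth" signal discussed above;
# the ~3x per-call difference shows up in the absolute numbers.
print(f"first {per_call[0] * 1e3:.1f} ms, last {per_call[-1] * 1e3:.1f} ms")
```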
What changed since the original report
The original convection run at 1/64 (Ra=1e7, ~3000 timesteps) showed t_vol going from 5 s to 833 s. Predicted-with-bug at 1/32 was 1.3 s → 17 s. Actual unpatched today: 28 ms → 16 ms. That's ~1000× lower than predicted-with-bug — strong evidence that some intermediate landing on development killed the linear-growth pathology, before this PR. Plausible candidates: #160 (SNES + DM-hierarchy lifecycle leak), #155 (swarm-routed point eval), #161 (ETD lifecycle changes). Bisecting which one isn't tracked here.
Why land it anyway
Why the perf is faster (mechanism)
The shared mesh DS has every registered field on it (T, V, P, plus solver internals). `DMPlexComputeIntegralFEM` walks all of them per call. The cloned DM has only the dummy P1 field we attach via `setField(0, fec)`, so the per-integration DS is leaner. Same reason CellWiseIntegral's pattern works.
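A small petsc4py-style illustration of that difference, assuming direct access to the mesh DM (hypothetical helper; the comments describe the intent rather than measured field counts):

```python
from petsc4py import PETSc

def ds_field_counts(mesh_dm, dim):
    """Compare field counts on the shared DS vs. a fresh scratch DS."""
    # Shared mesh DS: one field per registered variable plus solver
    # internals, all of which DMPlexComputeIntegralFEM walks per call.
    n_shared = mesh_dm.getDS().getNumFields()

    # Scratch DS: clone the DM, attach one dummy field, build a fresh DS.
    dmc = mesh_dm.clone()
    dmc.setField(0, PETSc.FE().createDefault(dim, 1, False, -1))
    dmc.createDS()
    n_scratch = dmc.getDS().getNumFields()  # just the dummy field
    dmc.destroy()

    return n_shared, n_scratch
```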
Test plan
Closes
#171 — the narrative shifts but the issue resolves either way (the linear-growth pathology is gone on current dev; this PR closes the only remaining mechanism that could re-introduce it, plus delivers a 3× perf win).
Underworld development team with AI support from Claude Code