## Symptom
In a long Rayleigh–Bénard run at Ra=1e7 (1/64 mesh, ~3100 timesteps), per-step solver costs (Stokes + advection–diffusion) stayed flat while the volume-integral diagnostic time (`t_vol` — used for `Vrms`, `mean_T`, `Vmax`) grew linearly with the number of integral evaluations.
| Quartile of run | mean `t_vol` | first sample | last sample |
|---|---|---|---|
| Q1 (steps 645–1425, 157 diag. calls) | 44 s | 5 s | 84 s |
| Q2 (steps 1430–2210) | 149 s | 84 s | 418 s |
| Q3 (steps 2215–2990) | 267 s | 194 s | 833 s |
| Q4 (steps 2995–3770) | 348 s | 296 s | 576 s |
Linear fit over the diagnostic samples: +0.13 s per simulation step, i.e. ≈ +0.65 s per integral call (diagnostics run every 5 steps, matching the 157 calls over steps 645–1425 in the table above). By step ~3700 the diagnostic step costs ~410 s — roughly 20× the actual solve (`t_stokes + t_adv` ≈ 21 s, also flat).

For comparison, `t_stokes` and `t_adv` over the same 3134 steps had slopes of +0.0001 s/step and +0.00009 s/step respectively (≈ 3% total drift, well inside noise). So the growth is specific to the volume-integral path.
(A plot of `t_vol` vs. step alongside `t_stokes` and `t_adv`, with the linear fit overlaid, can be attached.)
## Reproduction
- Script: the standard convection workflow (`docs/examples/workflows/convection.py`) running `_compute_diagnostics(...)`, which makes three `Integral.evaluate()` calls per diagnostic step (`Vrms`, `mean_T`, `Vmax`).
- The `Integral` objects themselves are cached (one object per integrand, reused — see `convection_config.py:378–407`), so this is not JIT recompilation: the JIT cache hits on every call. A minimal standalone repro is sketched below.
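The sketch below is a hypothetical driver, not a script from the repo: it builds one cached `Integral` and times repeated `evaluate()` calls. `uw.meshing.UnstructuredSimplexBox`, `uw.maths.Integral`, and the constant integrand are assumptions about the public API, modelled on the convection example; if the hypothesis in the next section holds, per-call time should climb roughly linearly.

```python
# repro_integral_growth.py: hypothetical driver, not part of the repo.
# Assumes uw.meshing.UnstructuredSimplexBox and uw.maths.Integral are the
# public entry points, as used by the convection workflow.
import time
import underworld3 as uw

mesh = uw.meshing.UnstructuredSimplexBox(cellSize=1 / 16)

# One cached Integral object, reused for every call. This mirrors the caching
# in convection_config.py, so any growth below cannot be JIT recompilation.
integral = uw.maths.Integral(mesh, 1.0)

for i in range(200):
    t0 = time.perf_counter()
    integral.evaluate()
    dt = time.perf_counter() - t0
    if i % 20 == 0:
        print(f"call {i:4d}: {dt:.3f} s")  # expect roughly linear growth in dt
```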
## Suspect codepath
`Integral.evaluate()` in `src/underworld3/cython/petsc_maths.pyx`:
```cython
self.dm = self.mesh.dm  # .clone()                          # line 89
...
cdef DS ds = self.dm.getDS()                                # line 105
ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0])   # line 109
ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, ...)      # line 110
```
Every `evaluate()` call:

- takes a reference to the shared mesh DS (no clone, no clear),
- registers a new objective callback via `PetscDSSetObjective(ds.ds, 0, fn)`,
- computes the integral.
The DS is never cleared between calls. Hypothesis (the obvious one): the integration function pointers are accumulating on the shared DS rather than being replaced, so every diagnostic call adds another callback that gets walked at integration time, producing the linear growth observed. A discriminating check is sketched below.
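A quick discriminating check, under the same assumed API as the repro sketch above: construct a fresh `Integral` for every call alongside the long-lived one. If both grow together, the accumulated state lives on the shared mesh DS rather than on the Python-side object, which is what this hypothesis predicts.

```python
# Discriminator sketch: same assumed API as the repro above.
import time
import underworld3 as uw

mesh = uw.meshing.UnstructuredSimplexBox(cellSize=1 / 16)

def timed_evaluate(integral):
    t0 = time.perf_counter()
    integral.evaluate()
    return time.perf_counter() - t0

reused = uw.maths.Integral(mesh, 1.0)
for i in range(50):
    t_reused = timed_evaluate(reused)
    # A brand-new Python object, but the same shared mesh DS underneath.
    t_fresh = timed_evaluate(uw.maths.Integral(mesh, 1.0))
    print(f"{i:3d}  reused: {t_reused:.3f} s   fresh: {t_fresh:.3f} s")

# If callbacks accumulate on the shared DS, both columns grow together;
# if only 'reused' grew, the state would sit on the Integral object instead.
```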
## Supporting evidence inside the same file
The sister class `CellWiseIntegral.evaluate()` (lines 245–302) takes the opposite approach for the same operation:
```cython
dmc = self.mesh.dm.clone()   # line 287
dmc.setField(0, fec)
dmc.createDS()               # line 290
...
PetscDSSetObjective(...)     # on the fresh DS
```
It clones the DM and creates a fresh DS for every integration. The asymmetry suggests the accumulation issue was at least implicitly known when `CellWiseIntegral` was written — `Integral.evaluate()` inherits the bug.

The commented-out `# .clone()` on line 89 also looks like a hint that someone tried cloning here and reverted it.
## Proposed fix (to investigate, not yet implemented)
Either:

1. Clear the DS objective before re-setting: before line 109, call something equivalent to `PetscDSSetObjective(ds.ds, 0, NULL)` — or whatever PETSc exposes for resetting per-field objectives.
2. Match `CellWiseIntegral`: clone the DM and `createDS()` per call (heavier, but guaranteed isolation).
Option (1) is preferred if PETSc supports it cleanly, since cloning the DM allocates on every call; a sketch follows below.
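A sketch of option (1) at the call site, written against the excerpt above. Whether PETSc treats a `NULL` function pointer here as "clear the per-field objective" is exactly the thing to verify; the reset line is an assumption, not a confirmed API contract.

```cython
# petsc_maths.pyx, Integral.evaluate(): option (1), sketched against the
# excerpt above. The NULL reset is an assumption to check against the
# PetscDSSetObjective docs for the PETSc version in use.
cdef DS ds = self.dm.getDS()                                # line 105, unchanged
ierr = PetscDSSetObjective(ds.ds, 0, NULL)                  # new: clear field-0 objective
ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0])   # line 109, unchanged
ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, ...)      # line 110, unchanged
```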
## Impact
- The convection example becomes diagnostic-bound after a few thousand steps. For longer scientific runs (the 3134-step run above took ~24 wall-clock hours; after step 2200, ~85% of the wall time was in `t_vol`), this is a hard ceiling on what the workflow can do unattended.
- Affects any user calling `mesh.integrate(...)` (or `Integral.evaluate()`) repeatedly across a long run — diagnostics, time series, mass-conservation checks, convergence monitors.
- Likely related to the broader DM lifecycle / object-accumulation pattern noted elsewhere in the codebase.
## Discovered while
Validating the SLCN advection–diffusion fix and the Rayleigh–Bénard workflow tooling — see the long Ra=1e7 1/64 cubic baseline run in `/Users/lmoresi/+Simulations/RayleighBenard/Ra1e7_1over64/`.
Underworld development team with AI support from Claude Code