## Symptom
In a long Rayleigh–Bénard run at Ra=1e7 (1/64 mesh, ~3100 timesteps), per-step solver costs (Stokes + advection–diffusion) stayed flat while the volume-integral diagnostic time (`t_vol` — used for `Vrms`, `mean_T`, `Vmax`) grew linearly with the number of integral evaluations.
| Quartile of run | mean `t_vol` | first sample | last sample |
|---|---|---|---|
| Q1 (steps 645–1425, 157 diag. calls) | 44 s | 5 s | 84 s |
| Q2 (steps 1430–2210) | 149 s | 84 s | 418 s |
| Q3 (steps 2215–2990) | 267 s | 194 s | 833 s |
| Q4 (steps 2995–3770) | 348 s | 296 s | 576 s |
Linear fit over the diagnostic samples: +0.13 s per simulation step, i.e. ≈ +0.65 s per integral call (diagnostics run every 5 steps, matching the 157 calls over steps 645–1425 in the table above). By step ~3700 the diagnostic step costs ~410 s — roughly 20× the actual solve (`t_stokes + t_adv` ≈ 21 s, also flat).

For comparison, `t_stokes` and `t_adv` over the same 3134 steps had slopes of +0.0001 s/step and +0.00009 s/step respectively (≈ 3% total drift, well inside noise). So the growth is specific to the volume-integral path.
(A plot of `t_vol` vs. step alongside `t_stokes` and `t_adv`, with the linear fit overlaid, can be attached.)
## Reproduction
- Script: the standard convection workflow (`docs/examples/workflows/convection.py`) running `_compute_diagnostics(...)`, which makes three `Integral.evaluate()` calls per diagnostic step (`Vrms`, `mean_T`, `Vmax`).
- The `Integral` objects themselves are cached (one object per integrand, reused — see `convection_config.py:378–407`), so this is not JIT recompilation: the JIT cache hits on every call. A minimal standalone repro is sketched below.
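The sketch below is a hypothetical driver, not a script from the repo: it builds one cached `Integral` and times repeated `evaluate()` calls. `uw.meshing.UnstructuredSimplexBox`, `uw.maths.Integral`, and the constant integrand are assumptions about the public API, modelled on the convection example; if the hypothesis in the next section holds, per-call time should climb roughly linearly.

```python
# repro_integral_growth.py: hypothetical driver, not part of the repo.
# Assumes uw.meshing.UnstructuredSimplexBox and uw.maths.Integral are the
# public entry points, as used by the convection workflow.
import time
import underworld3 as uw

mesh = uw.meshing.UnstructuredSimplexBox(cellSize=1 / 16)

# One cached Integral object, reused for every call. This mirrors the caching
# in convection_config.py, so any growth below cannot be JIT recompilation.
integral = uw.maths.Integral(mesh, 1.0)

for i in range(200):
    t0 = time.perf_counter()
    integral.evaluate()
    dt = time.perf_counter() - t0
    if i % 20 == 0:
        print(f"call {i:4d}: {dt:.3f} s")  # expect roughly linear growth in dt
```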
## Suspect codepath
`Integral.evaluate()` in `src/underworld3/cython/petsc_maths.pyx`:
```cython
self.dm = self.mesh.dm  # .clone()                          # line 89
...
cdef DS ds = self.dm.getDS()                                # line 105
ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0])   # line 109
ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, ...)      # line 110
```
Every `evaluate()` call:

- takes a reference to the shared mesh DS (no clone, no clear),
- registers a new objective callback via `PetscDSSetObjective(ds.ds, 0, fn)`,
- computes the integral.
The DS is never cleared between calls. Hypothesis (the obvious one): the integration function pointers are accumulating on the shared DS rather than being replaced, so every diagnostic call adds another callback that gets walked at integration time, producing the linear growth observed. A discriminating check is sketched below.
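A quick discriminating check, under the same assumed API as the repro sketch above: construct a fresh `Integral` for every call alongside the long-lived one. If both grow together, the accumulated state lives on the shared mesh DS rather than on the Python-side object, which is what this hypothesis predicts.

```python
# Discriminator sketch: same assumed API as the repro above.
import time
import underworld3 as uw

mesh = uw.meshing.UnstructuredSimplexBox(cellSize=1 / 16)

def timed_evaluate(integral):
    t0 = time.perf_counter()
    integral.evaluate()
    return time.perf_counter() - t0

reused = uw.maths.Integral(mesh, 1.0)
for i in range(50):
    t_reused = timed_evaluate(reused)
    # A brand-new Python object, but the same shared mesh DS underneath.
    t_fresh = timed_evaluate(uw.maths.Integral(mesh, 1.0))
    print(f"{i:3d}  reused: {t_reused:.3f} s   fresh: {t_fresh:.3f} s")

# If callbacks accumulate on the shared DS, both columns grow together;
# if only 'reused' grew, the state would sit on the Integral object instead.
```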
## Supporting evidence inside the same file
The sister class `CellWiseIntegral.evaluate()` (lines 245–302) takes the opposite approach for the same operation:
```cython
dmc = self.mesh.dm.clone()   # line 287
dmc.setField(0, fec)
dmc.createDS()               # line 290
...
PetscDSSetObjective(...)     # on the fresh DS
```
It clones the DM and creates a fresh DS for every integration. The asymmetry suggests the accumulation issue was at least implicitly known when `CellWiseIntegral` was written — `Integral.evaluate()` inherits the bug.

The commented-out `# .clone()` on line 89 also looks like a hint that someone tried cloning here and reverted it.
## Proposed fix (to investigate, not yet implemented)
Either:

1. Clear the DS objective before re-setting: before line 109, call something equivalent to `PetscDSSetObjective(ds.ds, 0, NULL)` — or whatever PETSc exposes for resetting per-field objectives.
2. Match `CellWiseIntegral`: clone the DM and `createDS()` per call (heavier, but guaranteed isolation).
Option (1) is preferred if PETSc supports it cleanly, since cloning the DM allocates on every call; a sketch follows below.
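A sketch of option (1) at the call site, written against the excerpt above. Whether PETSc treats a `NULL` function pointer here as "clear the per-field objective" is exactly the thing to verify; the reset line is an assumption, not a confirmed API contract.

```cython
# petsc_maths.pyx, Integral.evaluate(): option (1), sketched against the
# excerpt above. The NULL reset is an assumption to check against the
# PetscDSSetObjective docs for the PETSc version in use.
cdef DS ds = self.dm.getDS()                                # line 105, unchanged
ierr = PetscDSSetObjective(ds.ds, 0, NULL)                  # new: clear field-0 objective
ierr = PetscDSSetObjective(ds.ds, 0, ext.fns_residual[0])   # line 109, unchanged
ierr = DMPlexComputeIntegralFEM(dm.dm, cgvec.vec, ...)      # line 110, unchanged
```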
## Impact
- The convection example becomes diagnostic-bound after a few thousand steps. For longer scientific runs (the 3134-step run above took ~24 wall-clock hours; after step 2200, ~85% of the wall time was in `t_vol`), this is a hard ceiling on what the workflow can do unattended.
- Affects any user calling `mesh.integrate(...)` (or `Integral.evaluate()`) repeatedly across a long run — diagnostics, time series, mass-conservation checks, convergence monitors.
- Likely related to the broader DM lifecycle / object-accumulation pattern noted elsewhere in the codebase.
## Discovered while
Validating the SLCN advection–diffusion fix and the Rayleigh–Bénard workflow tooling — see the long Ra=1e7 1/64 cubic baseline run in `/Users/lmoresi/+Simulations/RayleighBenard/Ra1e7_1over64/`.
Underworld development team with AI support from Claude Code