Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap by lmoresi · Pull Request #213 · underworldcode/underworld3

lmoresi · 2026-05-27T00:39:44Z

Summary

Makes parallel mesh adaptation (np>=3) correct for the OT/MA mesh-motion movers. Four parallel-correctness fixes, uncovered in one investigation (the last three surfaced by running the adaptive convection harness at np=4). Serial behaviour is unchanged/bit-identical throughout; tier-A is green (177 passed / 0 failed) at every stage.

1. MUMPS-in-parallel — the heap corruption (root cause)

MUMPS (parallel LU) re-solved in a loop at np>=3 corrupts the heap (probabilistic SEGV/SIGBUS/MPI-deadlock). The cause is NOT the singular pure-Neumann solve, n_components>dim, or DMClone-shared caches (all disproved). The default GMRES+GAMG never touches MUMPS and is clean. The movers crashed because every sub-solve was wired through _use_direct_solver (lagged MUMPS LU), and _use_iterative_solver reintroduced MUMPS as the GAMG coarse solver.

_use_direct_solver: iterative fallback under MPI; serial keeps lagged MUMPS.
_use_iterative_solver: GAMG coarse uses redundant+svd (not MUMPS) under MPI.
_wire: forwards `elliptic` so the parallel fallback picks GAMG vs CG+Jacobi.
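
A minimal sketch of the gating described above, written as a plain options table rather than the actual smoothing.py code. The helper name `mover_solver_options` is hypothetical, as are the `preonly` choice for the direct path and the mapping of `elliptic=True` to GAMG (the PR only says the flag selects between GAMG and CG+Jacobi). The option keys themselves are standard PETSc option names; lagging of the LU factorisation is not shown.

```python
def mover_solver_options(mpi_size, want_direct, elliptic=True):
    """Return a PETSc-style options dict for one mover sub-solve.

    Hypothetical helper for illustration; not the underworld3 API.
    """
    if want_direct and mpi_size == 1:
        # Serial only: keep the direct MUMPS LU path.
        return {
            "ksp_type": "preonly",  # assumption: direct solve, no Krylov iterations
            "pc_type": "lu",
            "pc_factor_mat_solver_type": "mumps",
        }

    # Under MPI a request for the direct path falls through to the iterative one.
    if elliptic:  # assumption: elliptic sub-solves use GMRES+GAMG
        opts = {"ksp_type": "gmres", "pc_type": "gamg"}
        if mpi_size > 1:
            # Coarse level: redundant solve with SVD instead of MUMPS.
            opts["mg_coarse_pc_type"] = "redundant"
            opts["mg_coarse_redundant_pc_type"] = "svd"
        # Serial coarse-level settings are left untouched (not shown here).
        return opts

    # Assumption: non-elliptic sub-solves use CG+Jacobi.
    return {"ksp_type": "cg", "pc_type": "jacobi"}


parallel_opts = mover_solver_options(mpi_size=4, want_direct=True)
serial_opts = mover_solver_options(mpi_size=1, want_direct=True)
```

With these assumptions, no parallel configuration contains a `mumps` entry, while the serial direct path is unchanged.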

2. Parallel-correct `_patch_volumes`

The per-vertex dual area (= lumped P1 mass diagonal) is assembled via the FE mass matrix (DM.createMassMatrix + M·1) in parallel so PETSc does the cross-rank localToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary vertices. A TODO marks the switch to DMCreateMassMatrixLumped once petsc4py binds it.
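
The under-count can be reproduced with a toy mesh in pure Python. This is a sketch only: the two "ranks" are simulated by splitting a triangle list, the element-wise addition stands in for PETSc's localToGlobal(ADD), and the helper names are hypothetical. Each triangle contributes one third of its area to each of its vertices, which is the lumped P1 mass diagonal in 2D.

```python
def tri_area(p, q, r):
    """Area of a 2D triangle from its vertex coordinates."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def dual_areas(coords, tris):
    """Per-vertex dual area: each triangle gives area/3 to each of its vertices."""
    out = [0.0] * len(coords)
    for a, b, c in tris:
        share = tri_area(coords[a], coords[b], coords[c]) / 3.0
        for v in (a, b, c):
            out[v] += share
    return out


# Unit square split into four triangles around a centre vertex (index 4).
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

serial = dual_areas(coords, tris)

# Simulated partition: each "rank" sees only the triangles it owns.
rank0 = dual_areas(coords, tris[:2])
rank1 = dual_areas(coords, tris[2:])

# Stand-in for the cross-rank ADD: sum the contributions from both ranks.
assembled = [a + b for a, b in zip(rank0, rank1)]
```

Vertices 0, 2 and 4 sit on the simulated partition boundary, so `rank0` alone reports less than the serial value there; only the summed contributions recover the serial dual areas and conserve the total area.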

3. Collective remesh decision

mesh_metric_mismatch computed misalignment with np.corrcoef on rank-local cells → ranks disagreed across skip_threshold and the collective mover deadlocked. Now: alignment from globally-reduced moment sums (serial bit-identical) + the skip/adapt decision OR-reduced — if any rank remeshes, all do.
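
A sketch of the moment-sum idea in pure Python, assuming three simulated ranks; the tuple-wise sum stands in for an MPI allreduce(SUM) and the function names are hypothetical. Because the moments are additive, every rank that holds the reduced sums computes the same Pearson coefficient, whereas the rank-local coefficients differ.

```python
import math


def local_moments(x, y):
    """Additive moments of the local samples: Sx, Sy, Sxx, Syy, Sxy, n."""
    return (
        sum(x),
        sum(y),
        sum(a * a for a in x),
        sum(b * b for b in y),
        sum(a * b for a, b in zip(x, y)),
        len(x),
    )


def pearson_from_moments(sx, sy, sxx, syy, sxy, n):
    """Pearson correlation from moment sums; the 1/n factors cancel in the ratio."""
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)


# Illustrative per-cell values, split unevenly across three simulated ranks.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 1.9, 3.2, 3.8, 5.1, 4.9, 7.2, 7.8]
chunks = [(x[:3], y[:3]), (x[3:5], y[3:5]), (x[5:], y[5:])]

per_rank = [local_moments(cx, cy) for cx, cy in chunks]

# Stand-in for allreduce(SUM): add each moment across ranks.
global_moments = tuple(sum(vals) for vals in zip(*per_rank))

r_global = pearson_from_moments(*global_moments)           # same on every rank
r_local = [pearson_from_moments(*m) for m in per_rank]     # differs per rank
```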

4. `global_evaluate` for the post-adapt field remap

The field remap (evaluate old field at new DOF coords) used the local uw.function.evaluate; DOFs that move across a rank-partition seam land off-rank, where local evaluate returns stale/garbage → a growing field anomaly at the seams. Fixed to uw.function.global_evaluate (serial-identical) in core _ot_adapt.py (reference-mesh and adapted-position remaps — affects all mesh.OT_adapt users) and the harness.
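
The failure mode can be illustrated with a toy 1D model. This is not how uw.function.evaluate or uw.function.global_evaluate are implemented; the class `ToyRank`, the clamping behaviour used to mimic a stale off-rank value, and the linear field are all assumptions made for the sketch.

```python
def field(x):
    """Toy 'old' field, e.g. a linear temperature profile."""
    return 1.0 - x


class ToyRank:
    """A simulated rank that owns the closed interval [lo, hi] of a 1D domain."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def owns(self, x):
        return self.lo <= x <= self.hi

    def local_evaluate(self, x):
        # Off-rank points do not raise here: the query is silently clamped to
        # the owned interval, so the caller receives a plausible but wrong value.
        clamped = min(max(x, self.lo), self.hi)
        return field(clamped)


def toy_global_evaluate(ranks, x):
    """Route the point to the rank that owns it, then evaluate there."""
    for rank in ranks:
        if rank.owns(x):
            return rank.local_evaluate(x)
    raise ValueError("point lies outside the domain")


ranks = [ToyRank(0.0, 0.5), ToyRank(0.5, 1.0)]

# A DOF held by rank 0 that the mover pushed across the seam at x = 0.5.
new_x = 0.55
wrong = ranks[0].local_evaluate(new_x)      # evaluated on the wrong rank
right = toy_global_evaluate(ranks, new_x)   # evaluated on the owning rank
```

In this toy the error appears only for points that cross the seam, which matches the seam-localised anomaly described above.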

Verification

np=5: OT/MA movers + mesh.OT_adapt(field) crash/hang-free; solvers converge and match serial to ~1e-10.
np=4 adaptive convection (8 steps): clean run, unanimous adapt/skip across ranks, valid non-tangled adapted mesh; the field anomaly at the partition seams is removed (smoother T, correct boundary-layer values).
tier-A: 177 passed, 0 failed (throughout).

Follow-ups (not in this PR)

A general guard/warning for pc_factor_mat_solver_type=mumps / pc_type=lu re-solved at np>=3 (UW3-wide; this PR covers the movers).
Boundary slivers at the partition seams (anisotropic mover slip handling) — lower priority, value-correct.

Design doc: docs/developer/design/parallel-repeated-solve-corruption.md.

Underworld development team with AI support from Claude Code

The OT/MA mesh-motion movers (meshing/smoothing.py) crashed the heap (probabilistic SEGV/SIGBUS/MPI-deadlock) in parallel at np>=3.

Root cause: MUMPS (parallel LU) re-solved repeatedly corrupts the heap, NOT the singular pure-Neumann solve, n_components>dim, or DMClone-shared caches (all disproved; the default GMRES+GAMG never touches MUMPS and is clean). Every mover sub-solve was wired through _use_direct_solver (lagged MUMPS LU), and _use_iterative_solver reintroduced MUMPS as the GAMG coarse solver.

Fixes (mover-local; serial bit-identical, MUMPS kept as the serial 10x lever):
- _use_direct_solver: fall back to the iterative path when uw.mpi.size > 1.
- _use_iterative_solver: GAMG coarse uses redundant+svd (not MUMPS) under MPI.
- _wire: forward `elliptic` so the parallel fallback picks GAMG vs CG+Jacobi.
- _patch_volumes: in parallel, assemble the per-vertex dual area (= lumped P1 mass diagonal) via the FE mass matrix (M·1) so PETSc does the cross-rank localToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary vertices (under-refined parallel grading). Serial keeps the numpy path. Annotated to use DMCreateMassMatrixLumped once petsc4py binds it.

Verified at np=5: OT/MA movers + OT_adapt(field) crash/hang-free; all solver types converge and match serial to ~1e-10; _patch_volumes serial bit-identical (maxdiff 2.6e-18) and parallel area-conserving; tier-A 177 passed / 0 failed.

Copilot

Pull request overview

This PR addresses a reproducible parallel crash/hang in the adaptive mesh movers by avoiding repeated parallel MUMPS usage (direct LU) and by fixing a parallel correctness issue in per-vertex patch-volume computation used by equidistribution.

Changes:

Disable the MUMPS-based “direct” solver path under MPI by falling back to the iterative configuration; also remove MUMPS from the GAMG coarse solve in parallel (use redundant+svd).
Make _patch_volumes parallel-correct by computing the lumped P1 mass diagonal via FE assembly (M·1) when running under MPI.
Add a design note documenting the investigation and fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/underworld3/meshing/smoothing.py` | Gates MUMPS usage under MPI, adjusts GAMG coarse options, and fixes parallel patch-volume assembly for equidistribution movers. |
| `docs/developer/design/parallel-repeated-solve-corruption.md` | Documents the root-cause investigation and the mover-local mitigation strategy. |

Comments suppressed due to low confidence (2)

src/underworld3/meshing/smoothing.py:1414

In parallel (uw.mpi.size > 1), normalising patch with the local np.mean(patch) makes the scaling partition-dependent (each rank will use a different normalisation constant). Since patch is then used to define vol_field and to form b = rho_t * patch, this can introduce MPI-layout-dependent results. Prefer a partition-invariant normalisation (global mean over the MeshVariable’s global Vec) when running under MPI, while keeping the current numpy mean in serial for bit-identical behaviour.

            patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
            patch /= float(np.mean(patch))
        else:
            patch = np.ones(n_verts, dtype=np.double)
        _va = vol_field.array
        _va[...] = patch.reshape(_va.shape)

src/underworld3/meshing/smoothing.py:1761

patch_mean is currently computed as an average of per-rank means (allreduce(mean)/size), which is not the mean “over the domain” unless every rank owns the same number of vertices (and it can also be sensitive to ghost duplication depending on the local array layout). Since patch_mean is used to normalise both vol_field and the source term, this can make results depend on the partitioning. Prefer using the MeshVariable’s global mean under MPI (via vol_field.mean()), while keeping the numpy mean in serial for bit-identical behaviour.

            patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
        # Normalise so the mean over the domain is the cell mean.
        patch_mean = float(np.mean(patch))
        if uw.mpi.size > 1:
            patch_mean = uw.mpi.comm.allreduce(patch_mean) / uw.mpi.size
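
The point of the comment above can be checked with a small pure-Python sketch, assuming two simulated ranks that own different numbers of vertices and that each vertex is listed once (no ghost duplicates). The sums over the list of ranks stand in for allreduce(SUM); the data values are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


# Rank 0 owns six vertices, rank 1 owns two (illustrative patch values).
per_rank_patch = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [4.0, 4.0],
]

# What allreduce(local_mean) / size computes: an average of per-rank means.
mean_of_means = sum(mean(p) for p in per_rank_patch) / len(per_rank_patch)

# Partition-invariant alternative: reduce the sum and the count separately.
global_sum = sum(sum(p) for p in per_rank_patch)
global_count = sum(len(p) for p in per_rank_patch)
global_mean = global_sum / global_count
```

Here `mean_of_means` is 2.5 while `global_mean` is 1.75, so the normalisation constant would change with the partitioning even though the underlying data did not.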


+    if uw.mpi.size > 1:
+        _use_iterative_solver(solver, singular=singular, elliptic=elliptic)
+        return


+(lagged MUMPS LU)** path the movers wire in, *not* the Poisson solve, the DM, or
+singularity. The UW3 **default GMRES+GAMG solver is clean at np=5** (10/10, even
+for the singular `constant_nullspace` case). Fix is mover-local + low-risk.


@@ -0,0 +1,281 @@
+# Parallel repeated-FE-solve heap corruption (np ≥ 5)


Running the adaptive convection harness at np=4 deadlocked: mesh_metric_mismatch computed the mesh/metric misalignment with np.corrcoef over rank-LOCAL cells, so each rank got a different value straddling skip_threshold (0.34/0.91/0.77/0.45) → ranks disagreed on whether to remesh → the collective mover deadlocked (some ranks entered it, others skipped).

Fix (meshing/smoothing.py):
- mesh_metric_mismatch: compute the Pearson alignment from globally-reduced moment sums (Sx,Sy,Sxx,Syy,Sxy,n allreduced) so every rank agrees. Serial is bit-identical to np.corrcoef (the 1/n normalisation cancels in the ratio).
- smooth_mesh_interior: OR-reduce the skip/adapt decision so that if ANY rank needs to remesh, ALL ranks remesh (and all skip together otherwise). The mover is collective, so the decision must be unanimous. Rank-0-only logging.

Verified: np=4 adaptive convection (8 steps) runs clean: unanimous adapt at step 2, unanimous skip at 4/6/8, valid non-tangled adapted mesh; tier-A 177 passed / 0 failed; serial path unchanged.
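
A sketch of the OR-reduced decision using the four rank-local values quoted above. The threshold value 0.5 and the rule "adapt when the value exceeds the threshold" are assumptions for illustration; `any()` stands in for a logical-OR allreduce (with mpi4py this would typically be `comm.allreduce(flag, op=MPI.LOR)`).

```python
def local_wants_adapt(mismatch, skip_threshold):
    """Rank-local decision; assumed rule: adapt when mismatch exceeds the threshold."""
    return mismatch > skip_threshold


def collective_decision(local_flags):
    """Stand-in for a logical-OR allreduce: adapt if any rank wants to."""
    return any(local_flags)


rank_values = [0.34, 0.91, 0.77, 0.45]  # rank-local values from the np=4 run
skip_threshold = 0.5                    # hypothetical value for this sketch

# Before the fix: each rank decides alone, so the ranks split two against two.
local_flags = [local_wants_adapt(v, skip_threshold) for v in rank_values]

# After the fix: every rank adopts the same reduced decision.
agreed = collective_decision(local_flags)
final_flags = [agreed] * len(rank_values)
```

With the split decision, two ranks would enter the collective mover and two would skip it, which is the deadlock condition; after the reduction all four enter together.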

After the mover moves nodes, the driving field is FE-remapped by evaluating the OLD field at the NEW DOF coordinates. This used the LOCAL uw.function.evaluate, but a DOF that moves near a rank-partition boundary lands in a neighbouring rank's subdomain where a local evaluate returns stale/garbage (it does not raise, so the bad value persists and convection amplifies it -> a growing field anomaly localised to the partition seams).

Fix: use uw.function.global_evaluate (serial-identical drop-in, maxdiff 0.0; it gathers/resolves off-rank points across ranks):
- meshing/_ot_adapt.py: the reference-mesh remap and the adapted-position remap (affects every mesh.OT_adapt(..., fields_to_remap=...) user).
- scripts/stagnant_lid_adapt_loop.py: the harness's hand-rolled T remap.

Verified: parallel OT_adapt(field) np=4 clean; np=4 convection T smoother at the seams (peak ring jump 0.0113 -> 0.0091) and recovers the correct hotter boundary-layer T; tier-A 177 passed / 0 failed; serial bit-identical.

Copilot AI review requested due to automatic review settings May 27, 2026 00:39

Copilot started reviewing on behalf of lmoresi May 27, 2026 00:39

docs: mark parallel-mover-corruption design note as FIXED (PR #213)

a4484cf

Copilot AI reviewed May 27, 2026

lmoresi added 2 commits May 27, 2026 11:07

lmoresi changed the title ~~Fix parallel heap-corruption in adaptive mesh-movers (MUMPS at np>=3)~~ Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap May 27, 2026

lmoresi wants to merge 4 commits into development from bugfix/parallel-singular-corruption