Skip to content

Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap#213

Open
lmoresi wants to merge 4 commits into
developmentfrom
bugfix/parallel-singular-corruption
Open

Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap#213
lmoresi wants to merge 4 commits into
developmentfrom
bugfix/parallel-singular-corruption

Conversation

@lmoresi
Copy link
Copy Markdown
Member

@lmoresi lmoresi commented May 27, 2026

Summary

Makes parallel mesh adaptation (np>=3) correct for the OT/MA mesh-motion movers. Four parallel-correctness fixes, uncovered in one investigation (the last three surfaced by running the adaptive convection harness at np=4). Serial behaviour is unchanged/bit-identical throughout; tier-A is green (177 passed / 0 failed) at every stage.

1. MUMPS-in-parallel — the heap corruption (root cause)

MUMPS (parallel LU) re-solved in a loop at np>=3 corrupts the heap (probabilistic SEGV/SIGBUS/MPI-deadlock). NOT the singular pure-Neumann solve, n_components>dim, or DMClone-shared caches (all disproved). The default GMRES+GAMG never touches MUMPS and is clean. The movers crashed because every sub-solve was wired through _use_direct_solver (lagged MUMPS LU), and _use_iterative_solver reintroduced MUMPS as the GAMG coarse solver.

  • _use_direct_solver: iterative fallback under MPI; serial keeps lagged MUMPS.
  • _use_iterative_solver: GAMG coarse uses redundant+svd (not MUMPS) under MPI.
  • _wire: forwards elliptic.

2. Parallel-correct _patch_volumes

The per-vertex dual area (= lumped P1 mass diagonal) is assembled via the FE mass matrix (DM.createMassMatrix + M.1) in parallel so PETSc does the cross-rank localToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary vertices. TODO to switch to DMCreateMassMatrixLumped once petsc4py binds it.

3. Collective remesh decision

mesh_metric_mismatch computed misalignment with np.corrcoef on rank-local cells → ranks disagreed across skip_threshold and the collective mover deadlocked. Now: alignment from globally-reduced moment sums (serial bit-identical) + the skip/adapt decision OR-reduced — if any rank remeshes, all do.

4. global_evaluate for the post-adapt field remap

The field remap (evaluate old field at new DOF coords) used the local uw.function.evaluate; DOFs that move across a rank-partition seam land off-rank, where local evaluate returns stale/garbage → a growing field anomaly at the seams. Fixed to uw.function.global_evaluate (serial-identical) in core _ot_adapt.py (reference-mesh and adapted-position remaps — affects all mesh.OT_adapt users) and the harness.

Verification

  • np=5: OT/MA movers + mesh.OT_adapt(field) crash/hang-free; solvers converge and match serial to ~1e-10.
  • np=4 adaptive convection (8 steps): clean run, unanimous adapt/skip across ranks, valid non-tangled adapted mesh; the field anomaly at the partition seams is removed (smoother T, correct boundary-layer values).
  • tier-A: 177 passed, 0 failed (throughout).

Follow-ups (not in this PR)

  • A general guard/warning for pc_factor_mat_solver_type=mumps / pc_type=lu re-solved at np>=3 (UW3-wide; this PR covers the movers).
  • Boundary slivers at the partition seams (anisotropic mover slip handling) — lower priority, value-correct.

Design doc: docs/developer/design/parallel-repeated-solve-corruption.md.

Underworld development team with AI support from Claude Code

The OT/MA mesh-motion movers (meshing/smoothing.py) crashed the heap
(probabilistic SEGV/SIGBUS/MPI-deadlock) in parallel at np>=3. Root cause:
MUMPS (parallel LU) re-solved repeatedly corrupts the heap — NOT the singular
pure-Neumann solve, n_components>dim, or DMClone-shared caches (all disproved;
the default GMRES+GAMG never touches MUMPS and is clean). Every mover sub-solve
was wired through _use_direct_solver (lagged MUMPS LU), and _use_iterative_solver
reintroduced MUMPS as the GAMG coarse solver.

Fixes (mover-local; serial bit-identical, MUMPS kept as the serial 10x lever):
- _use_direct_solver: fall back to the iterative path when uw.mpi.size > 1.
- _use_iterative_solver: GAMG coarse uses redundant+svd (not MUMPS) under MPI.
- _wire: forward `elliptic` so the parallel fallback picks GAMG vs CG+Jacobi.
- _patch_volumes: in parallel, assemble the per-vertex dual area (= lumped P1
  mass diagonal) via the FE mass matrix (M.1) so PETSc does the cross-rank
  localToGlobal(ADD); the hand-rolled local sum under-counted partition-boundary
  vertices (under-refined parallel grading). Serial keeps the numpy path.
  Annotated to use DMCreateMassMatrixLumped once petsc4py binds it.

Verified at np=5: OT/MA movers + OT_adapt(field) crash/hang-free; all solver
types converge and match serial to ~1e-10; _patch_volumes serial bit-identical
(maxdiff 2.6e-18) and parallel area-conserving; tier-A 177 passed / 0 failed.

Underworld development team with AI support from Claude Code
Copilot AI review requested due to automatic review settings May 27, 2026 00:39
Underworld development team with AI support from Claude Code
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a reproducible parallel crash/hang in the adaptive mesh movers by avoiding repeated parallel MUMPS usage (direct LU) and by fixing a parallel correctness issue in per-vertex patch-volume computation used by equidistribution.

Changes:

  • Disable the MUMPS-based “direct” solver path under MPI by falling back to the iterative configuration; also remove MUMPS from the GAMG coarse solve in parallel (use redundant+svd).
  • Make _patch_volumes parallel-correct by computing the lumped P1 mass diagonal via FE assembly (M·1) when running under MPI.
  • Add a design note documenting the investigation and fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/underworld3/meshing/smoothing.py Gates MUMPS usage under MPI, adjusts GAMG coarse options, and fixes parallel patch-volume assembly for equidistribution movers.
docs/developer/design/parallel-repeated-solve-corruption.md Documents the root-cause investigation and the mover-local mitigation strategy.
Comments suppressed due to low confidence (2)

src/underworld3/meshing/smoothing.py:1414

  • In parallel (uw.mpi.size > 1), normalising patch with the local np.mean(patch) makes the scaling partition-dependent (each rank will use a different normalisation constant). Since patch is then used to define vol_field and to form b = rho_t * patch, this can introduce MPI-layout-dependent results. Prefer a partition-invariant normalisation (global mean over the MeshVariable’s global Vec) when running under MPI, while keeping the current numpy mean in serial for bit-identical behaviour.
            patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
            patch /= float(np.mean(patch))
        else:
            patch = np.ones(n_verts, dtype=np.double)
        _va = vol_field.array
        _va[...] = patch.reshape(_va.shape)

src/underworld3/meshing/smoothing.py:1761

  • patch_mean is currently computed as an average of per-rank means (allreduce(mean)/size), which is not the mean “over the domain” unless every rank owns the same number of vertices (and it can also be sensitive to ghost duplication depending on the local array layout). Since patch_mean is used to normalise both vol_field and the source term, this can make results depend on the partitioning. Prefer using the MeshVariable’s global mean under MPI (via vol_field.mean()), while keeping the numpy mean in serial for bit-identical behaviour.
            patch = _patch_volumes(tris, old_coords, n_verts, vol_field)
        # Normalise so the mean over the domain is the cell mean.
        patch_mean = float(np.mean(patch))
        if uw.mpi.size > 1:
            patch_mean = uw.mpi.comm.allreduce(patch_mean) / uw.mpi.size

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +895 to +897
if uw.mpi.size > 1:
_use_iterative_solver(solver, singular=singular, elliptic=elliptic)
return
Comment on lines +5 to +7
(lagged MUMPS LU)** path the movers wire in, *not* the Poisson solve, the DM, or
singularity. The UW3 **default GMRES+GAMG solver is clean at np=5** (10/10, even
for the singular `constant_nullspace` case). Fix is mover-local + low-risk.
@@ -0,0 +1,281 @@
# Parallel repeated-FE-solve heap corruption (np ≥ 5)
lmoresi added 2 commits May 27, 2026 11:07
Running the adaptive convection harness at np=4 deadlocked: mesh_metric_mismatch
computed the mesh/metric misalignment with np.corrcoef over rank-LOCAL cells, so
each rank got a different value straddling skip_threshold (0.34/0.91/0.77/0.45) →
ranks disagreed on whether to remesh → the collective mover deadlocked (some
ranks entered it, others skipped).

Fix (meshing/smoothing.py):
- mesh_metric_mismatch: compute the Pearson alignment from globally-reduced
  moment sums (Sx,Sy,Sxx,Syy,Sxy,n allreduced) so every rank agrees. Serial is
  bit-identical to np.corrcoef (the 1/n normalisation cancels in the ratio).
- smooth_mesh_interior: OR-reduce the skip/adapt decision so that if ANY rank
  needs to remesh, ALL ranks remesh (and all skip together otherwise). The mover
  is collective, so the decision must be unanimous. Rank-0-only logging.

Verified: np=4 adaptive convection (8 steps) runs clean — unanimous adapt at
step 2, unanimous skip at 4/6/8, valid non-tangled adapted mesh; tier-A 177
passed / 0 failed; serial path unchanged.

Underworld development team with AI support from Claude Code
After the mover moves nodes, the driving field is FE-remapped by evaluating the
OLD field at the NEW DOF coordinates. This used the LOCAL uw.function.evaluate,
but a DOF that moves near a rank-partition boundary lands in a neighbouring
rank's subdomain where a local evaluate returns stale/garbage (it does not
raise, so the bad value persists and convection amplifies it -> a growing field
anomaly localised to the partition seams).

Fix: use uw.function.global_evaluate (serial-identical drop-in, maxdiff 0.0; it
gathers/resolves off-rank points across ranks):
- meshing/_ot_adapt.py: the reference-mesh remap and the adapted-position remap
  (affects every mesh.OT_adapt(..., fields_to_remap=...) user).
- scripts/stagnant_lid_adapt_loop.py: the harness's hand-rolled T remap.

Verified: parallel OT_adapt(field) np=4 clean; np=4 convection T smoother at the
seams (peak ring jump 0.0113 -> 0.0091) and recovers the correct hotter
boundary-layer T; tier-A 177 passed / 0 failed; serial bit-identical.

Underworld development team with AI support from Claude Code
@lmoresi lmoresi changed the title Fix parallel heap-corruption in adaptive mesh-movers (MUMPS at np>=3) Parallel mesh-adaptation fixes: MUMPS heap-corruption, patch volumes, collective remesh, field remap May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants