diff --git a/docs/advanced/mesh-adaptation.md b/docs/advanced/mesh-adaptation.md
index 6120fa2c..9d33db38 100644
--- a/docs/advanced/mesh-adaptation.md
+++ b/docs/advanced/mesh-adaptation.md
@@ -60,6 +60,28 @@ for each edge vector $\mathbf{e}$. Edges that are too long get subdivided; regio
 **Key insight**: Higher metric values produce finer mesh. If you want 10× refinement, the metric values should be ~100× larger (since $M \propto 1/h^2$).
 ```
 
+### Two families of mesh adaptation
+
+UW3 offers **two complementary** ways to put resolution where it is
+needed:
+
+| | `mesh.adapt(...)` (this page) | `smooth_mesh_interior(method="anisotropic")` |
+|---|---|---|
+| Mechanism | **Re-mesh** (MMG): insert/remove/retriangulate | **Redistribute** the existing nodes (move only) |
+| Node budget | *Changes* — targets an **absolute** edge length `h` | **Fixed** — relative redistribution to a target *density* |
+| Topology | New mesh, **variables reset** (must transfer) | **Unchanged** — variables, DOFs, partition preserved |
+| Grading reach | Strong (can add nodes → ~10×) | Capped by the node count (~1.5–2×) |
+| Cell shape | Isotropic (`M = h⁻²I`) | **Anisotropic** — cells aligned to the feature |
+| Cost | Re-mesh + full variable transfer | A few cheap SPD elliptic solves (no re-mesh) |
+| Parallel | MMG re-partition | O(N), GAMG-parallelisable, no transfer |
+
+Use `mesh.adapt` when you need a genuinely finer mesh (more
+elements) and can afford to rebuild the problem. Use the
+**node-snuggling** redistribution when you want to *reshape* the
+existing mesh toward a feature every timestep cheaply, keeping the
+topology (and all fields) intact — see the **Node redistribution**
+section below.
+
 ---
 
 ## Metric Creation Functions
@@ -353,6 +375,69 @@ For the mathematically inclined, see the [Developer Design Document](../develope
 
 ---
 
+## Node redistribution — the snuggling mover
+
+When you want to concentrate resolution on an evolving feature
+**every timestep** without re-meshing — keeping the topology and
+all field data intact — use the anisotropic metric mover instead
+of `mesh.adapt`:
+
+```python
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+# ... mesh + a temperature field T after some solve ...
+
+# Relative target DENSITY from |∇T| (the fixed-node-budget
+# analogue of metric_from_gradient: same percentile-window idea,
+# but ρ is a *density*, not an absolute h — there is no node
+# budget to spend on an absolute size).
+rho = metric_density_from_gradient(mesh, T, amp=8.0)
+
+# Move the nodes to that metric (topology / DOFs / variables
+# all preserved — no transfer needed).
+smooth_mesh_interior(
+    mesh, metric=rho, method="anisotropic",
+    method_kwargs=dict(aniso_cap=2.0, relax=0.2, n_outer=12))
+```
+
+`metric_density_from_gradient` builds
+$\rho = 1 + \mathrm{amp}\cdot t$, $t = \mathrm{clip}\big((|\nabla
+T| - g_{lo})/(g_{hi}-g_{lo}),0,1\big)$ with $g_{lo},g_{hi}$ the
+lo/hi percentiles of $|\nabla T|$ — deliberately the same shape as
+{py:func}`underworld3.adaptivity.metric_from_gradient`, so the
+*intent* you express is identical whichever family you choose.
+The mover then builds a gradient-derived **anisotropic tensor**
+metric internally and solves an M-weighted Laplace (Winslow)
+coordinate map.
+
+```{important}
+This is a **gradient** metric: it resolves where the field
+*changes* (boundary layers, fronts, plume edges), and is
+isotropic-coarse at a smooth *peak* ($\nabla\rho=0$) — it
+deliberately de-refines a feature's core. For core resolution a
+curvature (Hessian) metric is the (future) tool. It also does
+**not** beat the fixed node-count cap — for a *separable* feature
+the explicit 1-D OT is exact and cheaper; the mover earns its keep
+on general non-separable features and on cell-alignment / quality
+(it never produces slivers).
+```
+
+Key knobs (via `method_kwargs`): `aniso_cap` (max cell anisotropy
+— the binding stability lever; ≈2 robust, ≳6 folds), `relax`
+(damping), `n_outer` (composed damped steps), `linear_solver`
+(`"direct"` MUMPS, or `"gamg"` for the parallel-scalable path —
+validated bit-parity). The full mathematical derivation (OT /
+Monge–Ampère, the metric-tensor / Winslow mover, dynamic field
+handling, Nusselt) is in
+{doc}`/developer/design/mesh-adaptation-formulation`; operational
+detail in {doc}`/developer/subsystems/mesh-metric-redistribution`;
+the dated R&D log in
+`docs/developer/design/ma-newton-cofactor-exploration.md`.
+
+---
+
 ## References
 
 1. MMG Platform: https://www.mmgtools.org/
diff --git a/docs/developer/design/ma-newton-cofactor-exploration.md b/docs/developer/design/ma-newton-cofactor-exploration.md
new file mode 100644
index 00000000..72acfc89
--- /dev/null
+++ b/docs/developer/design/ma-newton-cofactor-exploration.md
@@ -0,0 +1,888 @@
+# Monge–Ampère mesh redistribution: Newton/cofactor linearisation
+
+> **Status**: exploration (Phase 0), `feature/winslow-mesh-smoother`,
+> 2026-05-17. Companion to
+> `docs/developer/subsystems/mesh-metric-redistribution.md` (the
+> shipped BFO-Picard + direct-solver work) and the project memory
+> `project-ma-efficiency-direct-solver`.
+
+## Motivation
+
+The shipped MA path (`_winslow_elliptic`) is a damped
+**Benamou–Froese–Oberman Picard** iteration: each iteration solves a
+*constant-coefficient* Poisson `Δφ = √((φxx−φyy)²+4φxy²+4g)−2` with
+the recovered Hessian of the previous iterate, ~20–25 iterations,
+under-relaxation `ω=0.4`. The constant operator is what made the
+factor-once-reuse direct-solver speedup (~10×) possible — but that is
+a **serial** expedient (sparse direct factorisation does not scale to
+large-3D parallel per-timestep use; this build has only MUMPS + GAMG,
+no hypre/SuperLU_DIST).
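+
+For reference, a sketch of where that right-hand side comes from
+(2D, convex branch; the shorthand `a`, `d` is introduced here only
+for the derivation). With `a = 1+φxx`, `d = 1+φyy`, the MA equation
+`det(I+D²φ)=g` reads `ad − φxy² = g`, and
+
+$$ (a+d)^2 = (a-d)^2 + 4ad
+   = (\varphi_{xx}-\varphi_{yy})^2 + 4\varphi_{xy}^2 + 4g,
+   \qquad a+d = 2+\Delta\varphi , $$
+
+so taking the positive root gives
+`Δφ = √((φxx−φyy)²+4φxy²+4g) − 2`. The Picard step freezes the
+right-hand side at the previous iterate's recovered Hessian, which
+is why the left-hand operator stays the constant Laplacian.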
+
+A **Newton / quasi-Newton** linearisation is the textbook approach for
+smooth MA / mesh redistribution / OT. Linearising
+`R(φ)=det(I+D²φ)−g`:
+
+$$ \operatorname{cof}(I+D^2\varphi_k) : D^2\,\delta\varphi
+   \;=\; g-\det(I+D^2\varphi_k), \qquad
+   \varphi_{k+1}=\varphi_k+\lambda\,\delta\varphi $$
+
+Using the Jacobi (Piola) identity `∂_i cof(M)_{ij}=0`, the weak form
+is the **symmetric variable-coefficient elliptic** problem
+
+$$ a(\delta\varphi,v)=\int (C_k\nabla\delta\varphi)\cdot\nabla v,
+   \qquad C_k=\operatorname{cof}(I+D^2\varphi_k), $$
+
+with `C_k` SPD iff `φ_k` is convex (Brenier branch). In 2D, with
+`M_k = I+D²φ_k`: `C_k = [[1+φyy, −φxy],[−φxy, 1+φxx]] = det(M_k)·M_k⁻ᵀ`.
+Only **first derivatives of the unknown** appear (in the flux
+`F1=C_k∇δφ`); all 2nd-derivative content is in the *coefficient* `C_k`,
+read from the existing recovered-Hessian field (`_hessian_recovery_class`,
+first-derivatives-only — UW3-legal).
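+
+A sketch of the step from the Newton equation to that weak form
+(index notation, summation over repeated indices; the boundary term
+is written out as an assumption about how the BCs enter, not taken
+from the script):
+
+$$ C_k : D^2\delta\varphi
+   = \partial_i\big(C_{k,ij}\,\partial_j\delta\varphi\big)
+     - (\partial_i C_{k,ij})\,\partial_j\delta\varphi
+   = \nabla\!\cdot(C_k\nabla\delta\varphi), $$
+
+$$ \int (C_k\nabla\delta\varphi)\cdot\nabla v
+   = \int \big(\det(I+D^2\varphi_k)-g\big)\,v
+     + \oint v\,(C_k\nabla\delta\varphi)\cdot\mathbf{n} . $$
+
+The second term in the first line vanishes by the Piola identity;
+the boundary integral drops out for a test function vanishing on
+the pinned boundary or for a zero-flux (Neumann) condition.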
+
+It slots into the existing `uw.systems.Poisson` (`SNES_Scalar`):
+`F1 = constitutive_model.flux = c·∇u`, so a `DiffusionModel` subclass
+with `_c = C_k` *is* the Newton operator; `f = det(I+H_k)−g` is the
+source; `constant_nullspace` handles the pure-Neumann singularity
+exactly as the BFO path does. Single-field scalar SNES — **not** the
+rejected fully-coupled (φ,H) SNES.
+
+### What it can and cannot change
+
+- **Cannot** change the fixed-node grading ceiling (≈1.5–1.8× for an
+  8–20× target). Same equation, same recovered Hessian ⇒ same fixed
+  point. The OT ~10× needs *more nodes* (settled — see
+  `project-ma-recovered-hessian-picard-inadequate`). Newton is **not a
+  grading lever**; `ma_cost_grading.py` (1.02/1.43/1.71/1.54) is the
+  regression guard.
+- **Can** change convergence: few Newton iterations vs ~20–25 Picard
+  ⇒ insensitive to per-iteration setup cost (the GAMG-resetup failure
+  mode), and the per-step operator is SPD variable-coefficient
+  elliptic ⇒ **AMG-friendly** ⇒ the right structure for the parallel
+  rework.
+
+## Phase 0 — residual-contraction quantification
+
+**Goal**: confirm Newton contracts the MA residual
+`r_k = det(I+H_k) − g` in a handful of iterations vs the BFO-Picard's
+~20–25, on the canonical res-16 Annulus, *before* any source changes.
+Both schemes share the φ field, the recovered-Hessian solver, the `c`
+normalisation, `g`, the constant nullspace and pinned BCs — the **only
+difference is the inner potential update**. Geometry is held fixed
+(no node move) to isolate solver contraction.
+
+Script: `scripts/ma_newton_phase0.py` (no `src/` changes; uses
+`smoothing._hessian_recovery_class`, `_use_direct_solver`,
+`_auto_pinned_labels`).
+
+### Results
+
+**Run 1 (AMP=8, RES=16) — a methodological finding.** Measuring
+contraction of `r_k = det(I+H_k) − g` (H recovered) was the *wrong
+yardstick*: it has a large **irreducible floor** that *neither*
+scheme reduces — BFO plateaus at `‖r‖≈0.29` (from 0.46), Newton at
+`≈0.34`. That floor is precisely the recovered-Hessian
+under-estimation of `det(D²φ)` that the project memory identifies as
+the root cause of the ≈1.5–1.8× single-solve cap (the FE-MA fixed
+point is *self-consistently under-deformed*; `det(I+H_rec)−g` is O(0.3)
+even at the exact FE solution). Strong confirmation that **Newton on
+the cofactor operator cannot beat the grading ceiling** (same
+recovered Hessian, same floor) — exactly as predicted; it is not a
+grading lever.
+
+Consequence for the experiment: a residual-decrease line search on
+`‖det(I+H_rec)−g‖` is meaningless here (it rejected almost every
+Newton step, collapsing `λ→0.008` and freezing the iteration — *not*
+a fair Newton test). The correct Phase-0 question is the
+*efficiency* one: **does Newton reach the same fixed-node transport
+map in far fewer iterations than BFO's ~20–25?** The valid metric is
+the **transport-map increment** `Δ_k = ‖∇φ_k − ∇φ_{k-1}‖∞` (→0 as
+the map converges) and the realised `max|∇φ|` / honest grading
+(must match BFO — the regression guard). Run 2 uses that, with
+fixed damping (no residual-rejection; keep only a `det(I+H)>0`
+convexity backtrack).
+
+**Run 2 (AMP=8, RES=16) — transport-map contraction.** Metric:
+`d_k=max|∇φ_k|`, increment `Δ_k=max|∇φ_k−∇φ_{k-1}|`; final honest
+`d/n` (deep/near ratio of per-node mean edge length) after one signed-area-backtracked
+move. Three Newton convexity-safeguard variants, all vs the shipped BFO-Picard.
+
+| scheme | converges? | iters (Δ<1e-3·d₀) | final d/n | note |
+|---|---|---|---|---|
+| **BFO-Picard** (`+√` branch, ω=0.4) | yes | **16** | **1.713** | shipped; reference |
+| Newton, residual line-search | no | — (frozen) | — | λ→0.008; the `det(I+H_rec)−g` floor (Run 1) makes the search objective meaningless |
+| Newton, `det>0` backtrack only | no | — (stalls) | 1.58 | recovered-H noise breaks convexity under a finite step ⇒ λ→0.002, under-deforms |
+| Newton, **PD-projected H** (eps=0.05) | no | — (creeps) | 1.49 | no λ collapse, but `Δ_k` plateaus ≈2e-3 (never contracts), overshoots `max|∇φ|` past BFO, map inverts cells (move scale→0.5) |
+
+### Verdict — Newton/cofactor is NOT the efficiency/parallel path
+
+Decisive negative result, consistent with and extending the settled
+memory:
+
+1. **It cannot beat the grading cap** (predicted): same recovered
+   Hessian ⇒ same `det(I+H_rec)−g` floor (Run 1). Not a grading lever.
+2. **It is *less robust* than BFO at the same recovered-Hessian
+   quality** (new): all three convexity safeguards from the standard
+   remedy list fail to reach BFO's fixed point — the iteration either
+   freezes, stalls under-deformed (1.58), or creeps past it into a
+   cell-inverting state (1.49). BFO reaches d/n 1.713 in 16 iters.
+3. **Root cause**: BFO's `Δφ=√((φxx−φyy)²+4φxy²+4g)−2` is not "just a
+   linearisation" — it is a *closed-form convex-branch solve* that
+   expresses the new Laplacian via `g` and only the **deviatoric**
+   part of the recovered Hessian, side-stepping the noisy/
+   under-estimated full `det`. The cofactor-Newton operator feeds the
+   full noisy recovered Hessian into *both* the variable coefficient
+   `C_k` *and* the residual `det(I+H_k)−g`; at this recovery quality
+   that is fragile (non-convex repulsion) or, once convexity is
+   forced by projection, no longer the true MA equation (drifts
+   instead of contracting). UW3 forbids 2nd derivatives of mesh-var
+   functions, so a genuinely sharp `D²φ` (which Newton needs) is not
+   available — the original footgun. Newton would only pay off with a
+   fundamentally better Hessian / a wide-stencil MA discretisation:
+   research effort, **no expected grading gain (settled) and now a
+   demonstrated robustness loss**. Do not pursue.
+
+### Implication for the parallel requirement
+
+The validated efficiency lever stays the **factor/setup-once-reuse on
+the constant BFO Laplacian** (shipped, ~10× serial via MUMPS). For
+**parallel**, port that exact pattern to GAMG (the only AMG in this
+build): build the GAMG hierarchy **once per `_winslow_elliptic` call**
+(the operator is constant across the ~25 BFO iters) via
+`snes_lag_jacobian=-2` / `KSPSetReusePreconditioner` with the constant
+near-nullspace already wired, and warm-start the Krylov from the
+previous Picard φ. Parallel-scalable, keeps BFO's robust convex-branch
+structure, preserves grading. This — not Newton — is the parallel
+work item. Script: `scripts/ma_newton_phase0.py`; data
+`/tmp/metric_mesh/ma_newton_phase0.npz`.
+
+## BFO + GAMG-reuse parallel prototype — tested, fragile (2026-05-17)
+
+Wired as a *selectable* path: `_winslow_elliptic(...,
+linear_solver="gamg")` (default stays `"direct"`).
+`_use_iterative_solver`: FGMRES + GAMG(SOR smoother) for the elliptic
+φ-Poisson — CG was *not* justified there (UW3 DMPlex-FEM assembly +
+Neumann/nullspace gives no exact symmetry guarantee, and the SOR
+smoother is non-symmetric ⇒ non-SPD preconditioner; FGMRES tolerates
+both); CG + Jacobi for the provably-SPD mass systems.
+`snes_lag_jacobian=-2` / `snes_lag_preconditioner=-2` so the GAMG
+hierarchy is built **once per call** and reused across the ~25 Picard
+iters (verified: φ-KSP iter count flat ≈75 once warm), Krylov
+warm-started from the previous Picard φ.
+
+The reuse mechanism works and **grading is bit-for-bit preserved
+where it converges**. But the path is **not robust and does not
+scale here** (`scripts/ma_solver_scaling.py`, AMP=8, direct = serial
+MUMPS):
+
+| RES | nodes | direct cold/warm | gamg cold/warm | d/n dir/gmg |
+|----|------|------------------|----------------|-------------|
+| 24 | 1748 | 3.1 / 3.8 s | 27.7 / 27.6 s | 1.712 / **1.007** ⚠ |
+| 32 | 3059 | 6.9 / 8.7 s | 7.2 / 15.1 s | 1.722 / 1.722 |
+| 48 | 6655 | 11.5 / 23.2 s | 16.3 / **69.2** s | 1.729 / 1.729 |
+
+- **res-24 fails outright** — `DIVERGED_LINEAR_SOLVE` after 0 iters,
+  φ≈0, d/n 1.007 (no-op). A *correctness* failure at one resolution
+  while 32/48 converge: the hallmark of the documented
+  GAMG-on-pure-Neumann + `constant_nullspace` + warm-resolve
+  fragility (see the `_attach_constant_nullspace` code comment and
+  `project-ma-efficiency-direct-solver`).
+- Where it converges it is **2–3× slower than direct** and the
+  **warm≫cold degradation returns** (res-48: gamg warm 69 s vs cold
+  16 s) — the precise pathology the direct path *eliminated*. The
+  gamg/direct ratio is erratic (7.3 / 1.75 / 3.0), **not** shrinking
+  with N: no scalability signal at feasible 2D sizes.
+
+### Two challenges that reshaped the verdict
+
+**(a) "Did you wire the nullspace in?"** Verified at runtime: yes —
+on the gamg path `ps.constant_nullspace=True` attaches the constant
+`MatNullSpace` to the operator, the near-nullspace *and* the KSP
+operator, cold *and* warm. The divergence is **not** a missing/
+unprojected nullspace; the warm KSP runs to `its=10000`,
+`reason=-3` (DIVERGED_ITS) — a GAMG *convergence* failure. The
+direct path masks this entirely (MUMPS `icntl_24` null-pivot
+detection solves the singular system irrespective of the PETSc
+nullspace), which is why the iterative path is the first place a
+conditioning problem surfaces.
+
+**(b) "Why P3?"** No good reason — inherited from the original BFO
+implementation. Sweeping φ∈{P1,P2,P3} × {direct,gamg}
+(`scripts/ma_phi_order.py`):
+
+| effect | finding |
+|---|---|
+| grading is set by φ **order**, not the solver | P2 ≡ P3 (≈1.71); **P1 is ~18 % weaker** (≈1.40) — P1 is *not* grading-equivalent, P2 is the floor |
+| P3 is a **major GAMG confound** | res-24: P2+gamg converges (its=77, d/n 1.709 ✓) exactly where P3+gamg catastrophically fails (10000 its, d/n 1.007 ✗) |
+| P2 does **not** fully cure GAMG | res-32 P2 *warm* still diverges — GAMG remains erratic across (res, cold/warm) even at P2 |
+
+### Bankable win, independent of the parallel question
+
+φ=P2 ≡ P3 grading to ~3 significant figures across AMP 0/2/8/20 on the **direct**
+path (1.022/1.434/1.707/1.542 vs the recorded 1.02/1.43/1.71/1.54;
+AMP=0 no-op exact; no tangle) at **~2× lower cost** (smaller
+matrices — which also *helps* the direct factorisation scale, the
+exact opposite of a scaling concern). **`phi_degree` default is now
+2.** Canonical `cost_compare.py` at P2: MA cold ≈0.7–0.9 s (vs ~12–18 s
+original), grading bit-for-bit. Combined with the factor-once-reuse
+work this is ~15–20× over the original GAMG baseline.
+
+### Verdict & recommendation
+
+GAMG's failure was *partly* an own-goal (P3) — at P2 it converges in
+many more cases — but P2 still leaves it **erratic on the warm
+(post-`_deform_mesh`) re-solve**, so it is not a robust parallel
+path yet. Combined with: no alternative AMG in this build (hypre/ML
+absent), 2D sparse-direct being near-optimal at every feasible size,
+and (decisively) the user's accepted position — **MUMPS direct is
+fine for now; smaller matrices (P2) only help its scaling.** Keep
+`linear_solver="direct"` (MUMPS — itself MPI-parallel) as the
+validated path; retain `"gamg"` as experimental/documented-fragile
+(do not delete — lag/reuse machinery is correct). A robust iterative
+path would still need the pure-Neumann operator de-fragilised
+(single Dirichlet pin, not the constant nullspace — ∇φ is unaffected
+by the additive constant) and/or hypre, and is **gated behind**
+parallel-exact assembly + 3D (the smoother is 2D-triangle-only,
+serial-exact-assembly-only — the linear solver is *not* the parallel
+bottleneck yet). Scripts: `ma_gamg_vs_direct.py`,
+`ma_solver_scaling.py`, `ma_phi_order.py`, `ma_phi2_validate.py`.
+
+### Spring as the MA initial guess — settled (do not re-run)
+
+Asked whether seeding MA from the cheap `_winslow_spring` result
+helps convergence. This is **settled-rejected** in
+`project-ma-recovered-hessian-picard-inadequate`: spring-as-MA-
+preconditioner is dead — at full AMP the spring drives a cell to
+near-degeneracy and MA's signed-area backtrack *prevents* inversion
+but cannot *cure* an already-degenerate start (it freezes); a
+mild-spring→MA does converge but is **net slower than MA-only**
+(the spring pass costs without cutting MA's ~25 Picard iters enough
+to pay for itself). The mechanism is geometric — independent of
+φ-order or solver speed — so the conclusion stands, and with MA now
+~0.8 s the spring complexity is even less attractive. Not pursued.
+
+### P1 vs P2 × GAMG, scaling with #triangles (check, 2026-05-17)
+
+`scripts/ma_p1_gamg_scaling.py`, AMP=8, RES 16→64 (1.5k→22.7k tris):
+
+- **P1 does not rescue GAMG.** When P1+GAMG converges it is
+  textbook-good — **18–22 iters, N-independent** (vs P2's
+  77→99→103, slowly growing) — confirming P1 is genuinely more
+  AMG-friendly. *But it still fails erratically*: P1+GAMG diverges
+  at res-32 (10000 its) and res-64 (r=-4, d/n collapses to 1.021
+  no-op). P2+GAMG fails at 16 and 32. Neither order is reliable
+  across the sweep — the pure-Neumann + warm-resolve breakdown is
+  **order-independent and resolution-erratic**. Direct (MUMPS) is
+  `✓` at every (res, order).
+- Grading holds at every resolution: P1 ≈1.40 (1.397–1.421), P2
+  ≈1.71–1.75 — P1 is ~18 % weaker *everywhere*, not a grading
+  option regardless of solver.
+- **More important side-finding (direct path):** the *warm* cost
+  scales badly with N. P2-direct warm: 1.3 s (res-16) → 17.8 s
+  (res-48) → **46.4 s (res-64)**, far above cold (9.5 s at res-64).
+  The per-call post-`_deform_mesh` rebuild + MUMPS refactorisation +
+  cache-invalidated `evaluate()` re-interpolation grows faster than linearly with N here (warm cost ~36× for ~15× the triangles, res-16 to res-64)
+  and re-opens a warm≫cold gap at realistic resolution. This — not
+  the GAMG question — is the next per-timestep-scaling work item
+  (the res-16 warm≈cold result does not extrapolate). Scripts add
+  `ma_p1_gamg_scaling.py`.
+
+### d/n is anisotropy/sliver-blind — rim over-collapse (2026-05-17)
+
+User flagged the P2 rim cells as far tighter than the nominal 1/3.
+`scripts/ma_radial_anisotropy.py` (res-16, AMP=8, vs undeformed):
+
+| | band-mean radial (rim) | **min radial** | minA/meanA |
+|---|---|---|---|
+| undeformed | 1.00 | 1.00 | 0.575 |
+| P1 | 0.65 | 0.43 | 0.240 |
+| P2 | 0.38 (~1/3) | **0.14 (~1/7)** | **0.019** |
+| P3 | 0.38 | 0.13 | 0.026 |
+
+The reported deep/near ≈1.71 is a **per-node mean of all incident
+edges** — it averages the collapsed *radial* edges with the
+frozen/expanded *tangential* ones (tangential edges actually grow in
+the interior; see the figure) and so hides a near-degenerate radial
+sliver layer. Band-mean radial ≈0.38× matches the isotropic edge
+criterion, but the **thinnest layer is ≈0.14× (~1/7)** and the
+smallest cell is ~1/52 of the mean area.
+
+**Mechanism:** the outer ring is *pinned* (it is the boundary) and
+the metric peaks *exactly at* r=R_O — equidistribution demands
+maximal density where nodes cannot move, so it jams the next
+ring(s) against the fixed wall into one sliver layer, **independent
+of AMP**. The isotropic `AMP = 1/s² − 1` design rule is wrong here:
+in an annulus all transport is radial (tangential node count
+frozen) *and* a boundary-peaked metric against a pinned boundary
+over-collapses the wall layer.
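+
+For reference, the isotropic rule follows from assuming the 2D
+node density scales as `ρ ∝ 1/h²` with `ρ` ranging from 1 (far
+field) to `1+AMP` (peak); `s` is the target edge-length ratio
+peak/far:
+
+$$ s = \frac{h_\text{peak}}{h_\text{far}}
+     = \frac{1}{\sqrt{1+\mathrm{AMP}}}
+   \;\Longleftrightarrow\; \mathrm{AMP} = \frac{1}{s^2}-1,
+   \qquad \mathrm{AMP}=8 \Rightarrow s=\tfrac13 . $$
+
+That is the nominal 1/3 quoted above; the measured minimum radial
+spacing (≈0.14) is well below it.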
+
+**Consequences:** (1) d/n is fine as a *regression/consistency*
+guard but does **not** certify mesh quality near a boundary-peaked
+feature — use `minA/meanA` or a radial/tangential split. (2) Levers:
+offset the Gaussian peak inward (`r=R_O−k·W`, k≈2–3) so the band
+sits where nodes can redistribute on both sides; or cap AMP to a
+quality floor (`minA/meanA ≥ 0.1` ⇒ AMP ≲ 3); or design the metric
+from the *pinned-boundary 1-D radial OT*, not the isotropic rule.
+Fig `/tmp/metric_mesh/ma_radial_profile.png`; script
+`ma_radial_anisotropy.py`.
+
+### Localised features: GAMG is robust + the "snuggle" metric fix (2026-05-17)
+
+User: nodes should "snuggle up close to the feature"; the rim
+example was "too local" (bulk has no metric gradient → doesn't
+move). Interior blob (0.78,0), AMP=8, `ma_localised_reach_gamg.py`
++ `ma_heavytail_metric.py`:
+
+| metric | far/near (resolution) | inward (distant→feature) | minA | GAMG |
+|---|---|---|---|---|
+| Gaussian W=0.12 | 2.42 | +0.008 | 0.105 | ✓ ~30 it |
+| Gaussian W=0.30 | 1.55 | +0.010 | 0.267 | ✓ ~30 it |
+| **Lorentzian (core 0.12 + 1/d² tail)** | **2.74** | **+0.025** | 0.089 | ✓ ~31 it |
+
+- **A wider Gaussian is the WRONG fix.** One Gaussian width sets
+  *both* the resolution scale and the reach: narrow ⇒ sharp but
+  isolated pucker (bulk idle); broad ⇒ global motion but the
+  feature washes out (far/near→1.5). The fix is a **heavy-tailed
+  (Lorentzian) monitor**: a sharp core (best feature resolution,
+  far/near 2.74) + a slow `1/d²` tail (∇ρ≠0 everywhere ⇒ distant
+  nodes migrate IN ~3× more). The whole mesh rakes coherently
+  toward the feature (`/tmp/metric_mesh/ma_heavytail.png`). Mild
+  quality cost (minA 0.089 vs 0.105), no tangle. This is the
+  standard r-adaptation lesson (monitor needs global reach — heavy
+  tail or post-smoothing — not a narrow bump).
+- **GAMG is ROBUST for localised interior cases — revises the
+  earlier verdict.** Every metric shape × width × resolution
+  converged in ~27–54 iters, cost competitive with direct, *zero*
+  failures. The earlier GAMG fragility was **specifically** the
+  boundary-peaked-metric-against-pinned-boundary pathology (metric
+  spiking where the operator is pinned/singular). For the realistic
+  localised-feature use case the parallel GAMG path is viable —
+  the blanket "GAMG fragile" should be read as "fragile only for a
+  metric peaked on the pinned boundary". Scripts:
+  `ma_localised_reach_gamg.py`, `ma_heavytail_metric.py`.
+
+### Polar metric + boundary slip — settled negative (2026-05-18)
+
+Tested "define the metric in (r,θ) so it pulls in θ" + boundary
+slip. `ma_polar_lorentzian_slip{,_v2}.py`, `ma_lorentzian_slip_final.py`,
+interior/near-rim feature, AMP=8 res-24 (compact Cartesian
+Lorentzian at an *interior* point gave far/near 2.74 — the
+reference):
+
+| variant | far/near | rim drift | GAMG |
+|---|---|---|---|
+| polar, squared chord 2(1−cosΔθ) | 1.38 | — | ✓ |
+| polar, true wrapped angle, balanced cores | 1.12 | 1e-16 | ✓ |
+| compact Cartesian Lorentzian near rim, slip off | 1.21 | 1e-16 | ✓ |
+| …slip on | 1.12 (minA 0.32→0.48) | 3e-16 | ✓ |
+
+1. **Separable (r,θ) Lorentzian is the wrong shape** — an
+   anisotropic spoke, not a blob: the squared chord `2(1−cosΔθ)` saturates
+   at the antipode (no angular reach); the balanced/true-angle
+   version is a low-gradient radial ridge the smoother washes out
+   (far/near≈1.1, ≈ no-op). Use a **compact `|X−P|²` Lorentzian
+   about the feature point** — it has the correct combined
+   radial+angular extent and pulls in θ automatically (far/near
+   2.74 at an interior point).
+2. **Slip works mechanically, is not a concentrator.** Rim radial
+   drift ~1e-16 (nodes provably stay on the ring); GAMG robust
+   (~31 it) throughout. But slip ON near a boundary feature
+   *relaxes* the mesh (far/near 1.21→1.12, minA 0.32→0.48) — it
+   removes the hard pin so the rim equalises; it does NOT drag rim
+   nodes tangentially toward θ₀ (rim count near θ₀ 16→18). Slip
+   buys boundary *quality*, not feature *concentration*.
+3. **Boundary-proximal features are choked.** The same compact
+   Lorentzian gives far/near 2.74 at r₀=0.78 (interior) but only
+   1.21 at r₀=0.88 (near rim) — no node room between feature and
+   pinned wall; slip relaxes rather than fills. Same fixed-node +
+   pinned-boundary limit, feature side.
+
+Net: compact Cartesian `|X−P|²` Lorentzian about the feature point
+(pulls in θ inherently); keep features with interior room; slip is
+safe and good for boundary *quality* but is not the lever for a
+tangential pull. Drop the polar-separable formulation. Figures
+`/tmp/metric_mesh/ma_polar_slip{,_v2}.png`, `ma_lorentzian_slip.png`.
+
+### Angular OT target vs anisotropic scalar (2026-05-18) — (2) is a dead end
+
+User: the metric should exploit the *abundant tangential* node
+budget (slide spare angular nodes toward the feature) rather than
+the *scarce pinned radial* one. Built (1) the exact 1-D angular OT
+as the target for (2) a new opt-in `move_anisotropy=(w_r,w_θ)`
+that rescales the realised displacement in the local
+radial/tangential frame. Angle-only feature ρ(θ)=1+AMP/(1+(Δθ/Wθ)²),
+AMP=8, res-24:
+
+| | far/near | frac@θ₀ | minA | radial drift |
+|---|---|---|---|---|
+| undeformed | 1.00 | 0.159 | 0.547 | 0 |
+| **(1) exact angular OT [TARGET]** | **2.21** | **0.415** | 0.209 | 1e-16 |
+| (2) winslow isotropic | 0.98 | 0.158 | 0.356 | 6.8e-2 |
+| (2) winslow tangential-preferred | 0.99 | 0.158 | 0.392 | 7.9e-3 |
+
+- **(1) is exactly right** — rakes spare angular nodes into the θ₀
+  sector (frac 0.16→0.42, far/near 2.2), radius untouched (drift
+  1e-16), no tangle. For separable/structured features the explicit
+  1-D OT is the correct tool, used *directly*.
+- **(2) is a structural dead end.** Scalar BFO on the same metric
+  produces ≈zero angular concentration (far/near 0.98, frac 0.158 ≈
+  uniform) for *any* weighting. `move_anisotropy` works as designed
+  — it suppresses *spurious radial* drift (6.8e-2→8e-3) — but there
+  is no angular concentration to preserve: the scalar potential
+  never generates the coherent tangential transport. Reweighting
+  can shape transport the solver produces, not manufacture
+  transport it does not.
+- **Root cause = the foundational cap, both directions.** A scalar
+  equidistribution potential with fixed topology cannot deliver
+  large coherent *bulk* transport — radial (the ~1.7 cap) *or*
+  tangential (here). Hoop/fixed-topology stiffness cuts both ways.
+
+Verdict: "(1) as a target for (2)" *proves (2) cannot reach it*.
+Use the explicit 1-D OT directly for separable features
+(directional / dimensional-split redistribution); the generalisable
+heavy route is a true anisotropic metric-*tensor* adaptation — not
+anisotropic diffusivity / move-weighting on the scalar potential.
+`move_anisotropy` is kept as an opt-in *quality* knob (suppresses
+off-direction drift), not a concentrator. Script
+`ma_angular_ot_target.py`; fig `/tmp/metric_mesh/ma_angular_ot.png`.
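+
+For reference, a sketch of what an exact 1-D angular
+equidistribution looks like for this feature. This is the standard
+monotone 1-D OT construction, stated as an assumption about the
+form and not as a transcript of `ma_angular_ot_target.py`. Take
+`Δθ = θ−θ₀` wrapped to `[−π, π]` and let `ξ ∈ [0,1]` be a node's
+original (uniform) angular fraction; the node moves to the angle
+where the normalised cumulative density equals `ξ`:
+
+$$ F(\Delta\theta) =
+   \frac{(\Delta\theta+\pi)
+         + \mathrm{AMP}\,W_\theta\big[\arctan(\Delta\theta/W_\theta)
+         + \arctan(\pi/W_\theta)\big]}
+        {2\pi + 2\,\mathrm{AMP}\,W_\theta\arctan(\pi/W_\theta)}
+   \;=\; \xi . $$
+
+`F` is monotone with `F(−π)=0`, `F(π)=1`, so the map cannot
+reorder nodes, and it acts on the angle only (radius untouched).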
+
+### (3) metric-tensor machinery — construction verified (2026-05-18)
+
+`ma_metric_tensor_viz.py`: scalar density ρ(x) → `M = (1/h0²)[I +
+β ĝĝᵀ(|∇ρ|/∇ρ_ref)²]`, eigen-clamped to spacing ∈ [H_MIN,H_MAX]
+(≤8:1). Desired-cell ellipses drawn on a clean polar sample grid for
+a radial feature ρ(r) and an angular feature ρ(θ). Result is
+correct and confirms the design:
+
+- Radial feature → ellipses **tangentially elongated** (short along r̂,
+  long along the ring); circular where ∇ρ→0 (crest, far field).
+- Angular feature → ellipses **radially elongated** (short along θ̂,
+  long in r), concentrated in the θ₀ sector.
+- **The eigenframe auto-aligns to r̂ / θ̂ with no (r,θ) frame
+  specified anywhere** — M was fed only the Cartesian ∇ρ. This is
+  the resolution of the user's (r,θ) puzzle: scalar density in,
+  tensor alignment emergent from its gradient; API stays scalar.
+- Max anisotropy = the eigen-clamp band (8.3:1), as designed.
+
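+The anisotropy follows directly from the eigenstructure of `M`. A
+sketch, assuming `ĝ = ∇ρ/|∇ρ|` and writing `γ = |∇ρ|/∇ρ_ref`
+(shorthand introduced here), before the eigen-clamp is applied:
+
+$$ M\hat g = \frac{1+\beta\gamma^2}{h_0^2}\,\hat g, \qquad
+   M\hat t = \frac{1}{h_0^2}\,\hat t \;\;(\hat t\perp\hat g), \qquad
+   h_\parallel = \frac{h_0}{\sqrt{1+\beta\gamma^2}},\quad
+   h_\perp = h_0 , $$
+
+so the desired cell is short along `ĝ` by the factor
+`√(1+βγ²)` and keeps spacing `h0` across it; where `∇ρ→0` the two
+spacings coincide and the ellipse is a circle.
+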
+Honest nuance (visible in the figure): a *gradient*-based metric
+refines where ρ **changes** (the flanks) and is isotropic at a
+smooth peak (∇ρ=0) and far away. Correct for "resolve the feature's
+structure"; for small cells at the feature *core* use smoothed
+`|∇ρ|` or the Hessian-based `M=|H(ρ)|` (curvature-aligned; needs the
+recovered-Hessian path, extra cost). Gradient form is the
+first-derivative, UW3-clean first cut.
+
+Status: the metric *construction* (the ~1-day half) is verified and
+cheap. Remaining for (3): the anisotropic **mover** (metric-Winslow
+/ M-weighted displacement solve — the medium-effort half), with the
+standing caveat that it improves cell alignment/quality, not the
+fixed-node-count cap. Fig `/tmp/metric_mesh/ma_metric_tensor.png`.
+
+---
+
+## NEXT-PHASE KICKOFF BRIEF (read this first in a new session)
+
+**Goal:** build the anisotropic *mover* for approach (3). The metric
+*construction* is done & verified (`ma_metric_tensor_viz.py`,
+`M = (1/h0²)[I + β ĝĝᵀ(|∇ρ|/ref)²]`, eigen-clamped). What remains is
+the solver that moves nodes to satisfy a tensor metric M(x).
+
+**Read before starting (do NOT re-derive / re-explore):**
+- Memory `project-ma-efficiency-direct-solver` — the settled
+  dead-ends. Do not retry: Newton/cofactor; GAMG on a
+  boundary-peaked/pinned metric; polar-separable metrics; boundary
+  slip as a *concentrator*; anisotropic *reweighting of the scalar
+  BFO* (`move_anisotropy`) as a concentrator. All proven dead.
+- This design doc, the "(3) metric-tensor machinery" + the angular-
+  OT section (why scalar BFO can't do coherent bulk transport — the
+  fixed-topology cap, both directions).
+- `src/underworld3/meshing/smoothing.py`: the cache/lag/MUMPS infra,
+  `_use_direct_solver` / `_use_iterative_solver`, `linear_solver`,
+  `phi_degree=2` default, `move_anisotropy` (keep as a quality knob),
+  and the Phase-0 `_CofDiff` pattern (script
+  `ma_newton_phase0.py`) — the working example of a variable
+  *tensor*-coefficient `SNES_Scalar` in UW3 (reuse this for M).
+
+**Concrete plan:** a metric-Winslow / MMPDE M-weighted displacement
+solve — `∇·(M ∇ξ)=0`-type vector system (or the M-weighted Laplace
+smooth of the coordinate map), M the gradient-derived tensor field
+above, move = the solved displacement, with the existing signed-area
+backtrack + `boundary_slip`. Reuse: the tensor-constitutive pattern
+(`_CofDiff`-style `DiffusionModel` subclass with `_c = M`), the
+factor-once-reuse solver options, the cache. Validate on the SAME
+model problems with the SAME honest, anisotropy-aware diagnostics
+(`ma_radial_anisotropy.py`: minA + radial/tangential split, NOT
+d/n) and against the explicit 1-D OT target (`ma_angular_ot_target.py`,
+`ma_analytic_check.py`).
+
+**Standing caveat (accepted by the user):** (3) improves cell
+alignment/quality and removes the slivers/wasted-isotropic-resolution
+— it does **not** beat the fixed node-count cap (that needs
+`mesh.adapt`). For separable features the explicit 1-D OT (method 1)
+stays exact and strictly cheaper; (3) earns its keep only for the
+general non-separable case. Gradient-based M refines feature *edges*;
+Hessian-based `M=|H(ρ)|` (curvature-aligned, needs the recovered-
+Hessian path) is the follow-up if core-resolution is needed.
+
+**Scope estimate:** ~1–2 weeks to a validated prototype on the
+Annulus model problems. New feature branch off
+`feature/winslow-mesh-smoother`. Effort is the solver + its
+validation arc, not the metric (done).
+
+---
+
+## (3) anisotropic mover — IMPLEMENTED & VALIDATED (2026-05-18)
+
+Branch `feature/anisotropic-metric-mover` (off
+`feature/winslow-mesh-smoother`). `_winslow_anisotropic` in
+`smoothing.py`; `smooth_mesh_interior(..., method="anisotropic")`.
+
+### Formulation (as built)
+
+Displacement form of the **decoupled direct** M-weighted Laplace
+(Winslow) coordinate map. Per physical component `c`:
+
+$$ \nabla\!\cdot(D\nabla u_c) = -\textstyle\sum_j\partial_j D_{jc},
+   \qquad u_c=0 \text{ on the pinned boundary}, $$
+
+so `ψ_c = x_c + u_c` solves `∇·(D∇ψ_c)=0`, `ψ=x` on the boundary
+(the direct Winslow smoother — clusters nodes where `D` is large).
+`D = M` (the verified eigen-clamped `M = (1/h0²)[I + β ĝĝᵀ
+(|∇ρ|/gref)²]`). The two components share the *same* tensor
+operator `_c = D` via a `_CofDiff`-style `DiffusionModel`
+subclass; reuses `_use_direct_solver` (factor-once), the cache,
+the signed-area backtrack, `boundary_slip`, `move_anisotropy`.
+**Linear** — one solve/component/step, no Picard (cheaper than the
+BFO `_winslow_elliptic`). Homogeneous Dirichlet ⇒ non-singular ⇒
+**no `constant_nullspace`**, side-stepping the GAMG-pure-Neumann
+fragility entirely.
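+
+A minimal 1-D sketch of the displacement form above, assuming a
+uniform grid, `D` sampled at cell midpoints, and a dense NumPy solve
+standing in for the UW3 `Poisson` machinery (`winslow_1d` is a
+hypothetical helper, not part of `smoothing.py`). It only illustrates
+two properties stated in this note: `ψ = x + u` clusters nodes where
+`D` is large, and the result is unchanged under `D → αD`.
+
+```python
+import numpy as np
+
+
+def winslow_1d(D_mid, x):
+    """Return psi = x + u, where u solves (D u')' = -D' with u = 0 at both ends.
+
+    D_mid[i] is D at the midpoint of cell [x[i], x[i+1]]; x must be uniform.
+    """
+    n = len(x)
+    h = x[1] - x[0]
+    A = np.zeros((n, n))
+    b = np.zeros(n)
+    A[0, 0] = A[-1, -1] = 1.0  # homogeneous Dirichlet rows (u = 0)
+    for i in range(1, n - 1):
+        A[i, i - 1] = D_mid[i - 1]
+        A[i, i] = -(D_mid[i - 1] + D_mid[i])
+        A[i, i + 1] = D_mid[i]
+        b[i] = -h * (D_mid[i] - D_mid[i - 1])  # source -dD/dx, scaled by h^2
+    return x + np.linalg.solve(A, b)
+
+
+x = np.linspace(0.0, 1.0, 42)  # 41 cells, so one midpoint sits at 0.5
+xm = 0.5 * (x[1:] + x[:-1])
+D_mid = 1.0 + 3.0 * np.exp(-((xm - 0.5) / 0.1) ** 2)  # illustrative D peak
+
+psi = winslow_1d(D_mid, x)
+spacing = np.diff(psi)
+print("finest cell at x =", xm[np.argmin(spacing)])
+print("max change under D -> 7D:", np.abs(winslow_1d(7.0 * D_mid, x) - psi).max())
+```
+
+In 1-D the discrete map satisfies `D · Δψ = const`, so it cannot fold;
+the folding risk discussed below is a property of the 2-D decoupled
+form and is not reproduced by this sketch.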
+
+### Two formulation findings (do NOT re-derive)
+
+1. **The metric must be built ONCE and held fixed & Lagrangian**
+   (like `_winslow_spring`'s rest-lengths/A0). Re-projecting ∇ρ on
+   the progressively distorted mesh inside the outer loop is a
+   *positive feedback* — `D` blows up on squashed cells →
+   catastrophic over-collapse (minA/meanA → 1e-3). With `D` fixed,
+   the outer loop is a stable damped fixed-point iteration of one
+   linear operator toward the M-harmonic map.
+2. **The decoupled direct Winslow form has no
+   Rado–Kneser–Choquet non-folding guarantee**, so its stable
+   regime is bounded by the metric anisotropy/contrast. A single
+   un-damped elliptic jump folds; under-relaxation (`relax`) +
+   `n_outer` damped steps is required (the analogue of the BFO
+   `picard_relax=0.4`). Characterised Pareto frontier
+   (`scripts/aniso_param_sweep.py`, interior radial feature): `β`
+   is *not* the binding lever — the **eigen-clamp `aniso_cap`** is.
+
+   | `aniso_cap` | needs | minA/meanA | note |
+   |---|---|---|---|
+   | 2 | `relax≈0.1–0.2` | **≈0.47–0.50** | robust default |
+   | 4 | `relax≈0.05`, `n_outer≳25` | ≈0.35 | sharper, still clean |
+   | ≳6 | — | ≲0.02 (folds) | needs coupled/inverse Winslow |
+
+   Defaults shipped: `aniso_cap=2`, `relax=0.2`, `n_outer=12`,
+   `β=200`. AMP=0 is an **exact isotropic no-op** (a scale-aware
+   `g_eps=1e-9` floor rejects the ~1e-18 projection round-off of a
+   uniform-ρ zero gradient — without it the noisy `gref` fabricated
+   O(1) anisotropy).
+
+### Validation arc
+
+Diagnostics are anisotropy-aware (radial/tangential split +
+minA/meanA, NOT the anisotropy-blind d/n); grids rendered.
+
+| problem (res, AMP=8) | metric | (3) minA/meanA | isotropic MA | spring |
+|---|---|---|---|---|
+| radial @R_O (pathology) | — | **0.240** | 0.019 | 0.177 |
+| radial interior r=0.70 | — | **0.466** | 0.182 | 0.253 |
+| angular-only (separable) | — | **0.243** | 0.144 | — |
+| non-separable blob | — | **0.295** | 0.109 | 0.119 |
+
+- **(3) is the cleanest method everywhere**: ≈1.7–12.6× better
+  minA/meanA than the isotropic MA (0.243 vs 0.144 for angular-only,
+  0.240 vs 0.019 for radial @R_O), never slivers, linear/cheap
+  (~3 s res-16, no Picard).
+- **Concentration is milder** than MA (radial interior far/near ≈
+  MA; non-separable far/near 1.10 vs MA 1.37; angular ≈ uniform).
+  (3) trades grading *magnitude* for clean anisotropic *cell
+  alignment* — exactly its intended role.
+- **Separable features confirm the settled cap**: angular-only
+  (3) ≈ uniform concentration (far/near 1.02, frac@θ0 0.160) — it
+  CANNOT beat the explicit 1-D OT (`ma_angular_ot_target.py`
+  target far/near 2.21), same fixed-topology limit as the scalar
+  paths. (3) is for the **non-separable** case + quality, not
+  separable concentration.
+- Figures: `/tmp/metric_mesh/aniso_radial_peak{1p00,0p70}.png`,
+  `aniso_angular.png`, `aniso_nonsep.png` (the non-separable zoom
+  is the clearest: MA/spring pull a degenerate slivered knot into
+  the blob; (3) gives a clean, well-shaped, blob-aligned
+  densification).
+
+### Verdict
+
+A **validated prototype matching the brief**: (3) improves cell
+alignment/quality and removes the slivers/wasted isotropic
+resolution; it does **not** beat the fixed node-count cap (the
+explicit 1-D OT stays exact + cheaper for separable features).
+Open follow-ups (out of prototype scope): the **coupled/inverse**
+Winslow (RKC-non-folding) to admit `aniso_cap ≳ 6`; Hessian-based
+`M=|H(ρ)|` for feature-core resolution; parallel-exact assembly.
+Scripts: `aniso_smoke.py`, `aniso_param_sweep.py`,
+`aniso_validate_{radial,angular,nonsep}.py`,
+`aniso_blob_metric.py` (target-vs-realised), `aniso_convection_demo.py`
+(Ra=1e5 → refine on ∇T).
+
+### Architecture (pipeline & components)
+
+`_winslow_anisotropic` in `src/underworld3/meshing/smoothing.py`;
+reached via `smooth_mesh_interior(mesh, metric=ρ,
+method="anisotropic")`. `ρ` is a target *density* (larger ⇒ finer)
+— typically a Lagrangian `f(frozen_field.sym)`.
+
+**Cache build (once per mesh/topology/params key):**
+
+1. `grho` — projected `∇ρ`: a `Vector_Projection` with
+   `uw_function = [ρ.diff(Xᵢ)]`, `smoothing=0`. A *first* derivative
+   of the Lagrangian density only (UW3-legal).
+2. `Df` — a `TENSOR` MeshVariable holding the metric tensor;
+   initialised to the identity.
+3. `_TensorDiff(DiffusionModel)` — `_build_c_tensor` sets
+   `_c = Df.sym` (the `_CofDiff` pattern from `ma_newton_phase0.py`:
+   a variable tensor-coefficient `SNES_Scalar`).
+4. Per coordinate component `c`: a scalar `uw.systems.Poisson` with
+   that constitutive tensor, source
+   `f_c = Σⱼ ∂D_{jc}/∂xⱼ`, **homogeneous Dirichlet `u_c=0`** on the
+   pinned boundary (non-singular → no `constant_nullspace` → no
+   GAMG-pure-Neumann fragility), wired to `_use_direct_solver`
+   (MUMPS, factor-once-reuse) or the `_use_iterative_solver` GAMG
+   path. (`boundary_slip=True` ⇒ pure-Neumann + `constant_nullspace`
+   + ring-projection instead, as in `_winslow_elliptic`.)
+
+**Per call:**
+
+5. **Build `D` ONCE on the undeformed mesh.** `gproj.solve()`;
+   per node `M = (1/h₀²)[I + β ĝĝᵀ(|∇ρ|/gref)²]`; eigen-decompose;
+   **clamp eigenvalues** to `[1/h_max², 1/h_min²]` (the `aniso_cap`
+   band); reassemble → write `Df`. A scale-aware `g_eps=1e-9` floor
+   makes uniform ρ an exact no-op (rejects the ~1e-18 projection
+   round-off of a zero gradient). `D` is thereafter **fixed and
+   Lagrangian** — it rides material points through `_deform_mesh`;
+   re-projecting it each step is the positive-feedback collapse
+   (settled).
+6. **Damped MMPDE outer loop** (`n_outer` steps): solve the `cdim`
+   displacement Poissons `∇·(D∇u_c) = −Σⱼ∂ⱼD_{jc}` (so `ψ=x+u` is
+   the M-harmonic coordinate map); optional `move_anisotropy`
+   reweight; `step = relax·disp`; **coherent global signed-area
+   backtrack** (halve the scale until no triangle inverts) + slip
+   ring-projection; `mesh._deform_mesh`; stop when
+   `max|Δx| < outer_tol`.
+
+Reuses `_winslow_elliptic`'s backtrack, `boundary_slip`,
+`move_anisotropy`, the solver cache and the MUMPS
+factor-once-reuse wiring verbatim. **Linear** — one solve per
+component per outer step, no Picard (cheaper than the BFO MA).
+
+### GAMG parity + cost per step (2026-05-18 — measured)
+
+`scripts/aniso_cost_and_gamg.py`, interior radial feature, res
+16/24/32/48 (1.5k–12.9k tris), `direct` vs `gamg`. Times: **cold**
+(fresh mesh — MeshVariable+solver creation + 1st factorisation,
+one-off per remesh), **warm** (same mesh object, cache hit — the
+genuine per-timestep cost in a dynamic loop), per-outer-step, and
+the D-build.
+
+| res | ntri | warm direct | warm gamg | warm/outer | D-build | minA/meanA |
+|----|------|------|------|------|------|------|
+| 16 | 1522 | 3.08 s | 3.26 s | 0.25 s | 0.34 s | 0.4657 |
+| 24 | 3268 | 6.29 s | 6.30 s | 0.51 s | 0.64 s | 0.4256 |
+| 32 | 5814 | 10.94 s | 10.94 s | 0.89 s | 1.11 s | 0.3938 |
+| 48 | 12856 | 23.72 s | 23.98 s | 1.94 s | 2.41 s | 0.4452 |
+
+- **GAMG is robust here: it matches direct to within 5e-5**
+  (`|minA_g−minA_d| ≤ 5e-5` at every resolution; close parity, not
+  bit-for-bit identity). The mover is
+  **non-singular** (homogeneous Dirichlet, no constant nullspace),
+  so it does **not** hit the pure-Neumann + warm-resolve fragility
+  that made the MA `gamg` path erratic. This is the **first** of
+  the three metric methods with a working parity-preserving
+  parallel-capable solver path. (At feasible 2D sizes MUMPS is
+  near-optimal so `gamg` is not *faster* — the point is it *works
+  and matches*, so the parallel route is real.)
+- **cold ≈ warm at every resolution** — no warm-≫-cold
+  degradation (the MA path's O(N) post-deform rebuild pathology is
+  absent here; the cache reuses the MeshVariables/solvers, only the
+  operator is refactorised because `D`+geometry change each call).
+- **Cost is ~O(N) (linear in #cells).** warm 3.1→23.7 s for
+  ntri 1522→12856 (≈7.7× for ≈8.4× cells); per-outer-step and
+  D-build likewise ~O(N). No superlinear blow-up — the per-step
+  work is a fixed number of **sparse SPD-ish elliptic solves**
+  (the part GAMG parallelises with optimal O(N/P) complexity and
+  good weak scaling) plus embarrassingly-local per-node /
+  backtrack work.
+- **The cost lever is `n_outer`.** Default 12 ⇒ ~12 scalar
+  elliptic solves of the mesh size. The damped MMPDE converges
+  (most displacement is in the first few steps; `max|Δx|` decays),
+  so an `outer_tol` early-exit / a small `n_outer` cuts the warm
+  cost to ≈ `D-build + 3–5 · warm/outer` (≈1.1–1.6 s at res-16,
+  from the table: 0.34 + 3–5 × 0.25). The
+  per-step adaptation is then ≈ *a handful of pressure-solve-class
+  SPD solves* — genuinely cheap for an r-adaptation scheme (most
+  need nonlinear solves or global transport; this does not).
+- Honest hotspot: the per-node eigen-clamp is a Python loop
+  (`np.linalg.eigh` per node) — vectorisable to a batched
+  `eigh` on a stacked `(N,d,d)` array (a cheap win, matters more in
+  3D / at scale); currently dominated by the solves anyway.
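+
+The batched eigen-clamp mentioned in the last bullet can be sketched
+as follows. This is an illustration under stated assumptions, not the
+code in `smoothing.py`: the function names are hypothetical and the
+test tensors are random SPD matrices. It relies only on
+`np.linalg.eigh` accepting a stacked `(N,d,d)` array.
+
+```python
+import numpy as np
+
+
+def clamp_metric_loop(M, lam_min, lam_max):
+    """Per-node reference: eigen-decompose, clip eigenvalues, reassemble."""
+    out = np.empty_like(M)
+    for k in range(M.shape[0]):
+        w, V = np.linalg.eigh(M[k])
+        w = np.clip(w, lam_min, lam_max)
+        out[k] = (V * w) @ V.T  # V diag(w) V^T
+    return out
+
+
+def clamp_metric_batched(M, lam_min, lam_max):
+    """Same operation on the whole (N, d, d) stack in one eigh call."""
+    w, V = np.linalg.eigh(M)  # w: (N, d), V: (N, d, d)
+    w = np.clip(w, lam_min, lam_max)
+    return np.einsum("nij,nj,nkj->nik", V, w, V)
+
+
+rng = np.random.default_rng(0)
+A = rng.normal(size=(50, 3, 3))
+M = A @ A.transpose(0, 2, 1) + 0.1 * np.eye(3)  # random SPD test tensors
+
+M_loop = clamp_metric_loop(M, 0.5, 2.0)
+M_batch = clamp_metric_batched(M, 0.5, 2.0)
+print("max |loop - batched| =", np.abs(M_loop - M_batch).max())
+```
+
+The same two functions work unchanged for `d = 2`, which is the only
+case the current mover supports.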
+
+**Parallel verdict (the user's hypothesis, now evidenced):** the
+per-step cost is `1 ∇ρ projection + a vectorisable eigen-clamp +
+n_outer × (cdim non-singular SPD elliptic solves + a local
+backtrack)`, all O(N) and GAMG-parallelisable with proven
+2D parity. This is one of the few r-adaptation strategies with
+**no nonlinear solve and no global transport** — structurally
+inexpensive in parallel. (Caveat: the *assembly* — ∇ρ projection /
+D-build / backtrack — is still serial-exact; the parallel-exact
+cross-rank version is the remaining piece, not the solver.)
+
+### Solver limitations
+
+- **2D triangle meshes only** (hard `NotImplementedError`).
+- **Decoupled direct Winslow form → no Rado–Kneser–Choquet
+  non-folding guarantee.** Stable only for modest anisotropy:
+  `aniso_cap≈2` (robust default), `≈4` with gentler `relax` + more
+  `n_outer`, **`≳6` folds regardless**. The backtrack prevents
+  *inversion*, not extreme squashing — a property of the
+  formulation, not a tuning miss.
+- **Fixed node budget** — relative redistribution only; cannot
+  beat the node-count cap. For *separable* features the explicit
+  1-D OT is exact and strictly cheaper.
+- **Gradient metric resolves edges/fronts, not cores** —
+  isotropic-coarse (de-refined) where `∇ρ=0` (a smooth peak). Right
+  tool for boundary layers / interfaces / fronts; wrong tool for
+  resolving a smooth peak's centre (→ Hessian metric).
+- **Metric is Lagrangian-fixed** (built once). A tensor metric
+  should co-rotate with large deformation; we don't — fine for
+  modest moves, not large-strain.
+- **Serial-exact assembly only** — the ∇ρ projection / `D` build /
+  backtrack under-count at rank-partition boundaries (same caveat
+  as spring/MA). The *solver* is no longer the parallel blocker
+  (GAMG validated, see the cost section); the cross-rank
+  parallel-exact assembly is the remaining piece. MUMPS scales to
+  modest sizes; GAMG is the route beyond.
+- **Linear, component-decoupled** — an anisotropic Laplacian
+  smoother, not the full nonlinear (Jacobian-coupled) Winslow
+  generator.
+
+### Corners still unexplored
+
+- **Solution-accuracy proof.** Validated mesh *quality + alignment*
+  only — NOT yet that it *helps the PDE* (lower T-discretisation
+  error / better Nu at fixed node count vs a uniform mesh). That
+  accuracy/cost study is the real payoff and is untested.
+- **Dynamic-adaptive loop.** The demo is static ("20 steps then
+  refine once", `aniso_convection_demo.py`). Re-refining every N
+  steps with the metric riding the flow (ALE-style, interacting
+  with SLCN advection / the free-surface ALE) — the production use
+  case — is unexplored.
+- **Coupled / inverse Winslow** (computational ξ harmonic in
+  physical space → RKC-non-folding) to safely admit `aniso_cap ≳ 6`
+  and stronger alignment. The heavy MMPDE (map inversion /
+  resampling).
+- **Hessian metric `M=|H(ρ)|`** (curvature-aligned) for feature-
+  *core* resolution — reuse the recovered-Hessian path
+  (`_hessian_recovery_class`; first-derivative L2 recovery, since
+  UW3 forbids 2nd derivatives of mesh-var functions).
+- **A `metric_from_gradient`-style ρ helper** unifying the metric
+  API across `mesh.adapt` (absolute `h`, MMG re-meshes) and the
+  mover (relative `ρ`, fixed budget) — discussed, not built.
+- **GAMG path: VALIDATED (2026-05-18), see the cost section.**
+  Parity with direct to within 5e-5 in minA at res 16–48
+  (non-singular ⇒ no pure-Neumann fragility); the
+  parallel-scalable route is real.
+  *Remaining*: cross-rank **parallel-exact assembly** (the ∇ρ
+  projection / D-build / backtrack are serial-exact — the solver
+  is not the blocker), and a true MPI weak-scaling study.
+- **3D extensibility — concrete scope.** Already
+  dimension-general: the metric formula
+  `M=base[I+β ĝĝᵀ(|∇ρ|/gref)²]`, the eigen-clamp
+  (`np.linalg.eigh` works for 3×3), the `TENSOR` MeshVariable
+  (`dim²` comps), the displacement form `∇·(D∇u_c)=−Σⱼ∂ⱼD_{jc}`
+  over `c=0..cdim−1`, the per-component `Poisson` + `_TensorDiff`
+  (3×3 `_c`), and the solver wiring — and GAMG (now proven for
+  this operator) is exactly what makes 3D viable (3D sparse-direct
+  does not scale). 2D-specific work to remove: the
+  `cdim!=2` guard; `_tri_cells`/`_signed_areas` →
+  `_tet_cells`/`_signed_volumes` for the inversion backtrack (the
+  main piece — a shared limitation with spring/MA); ~5 lines of
+  the eigen-clamp / `Df.array[:,i,j]` writes generalised to
+  `cdim`; `boundary_slip`/`move_anisotropy` stay 2D (default
+  off/None). Modest, well-scoped (~1–2 days) — the solver core is
+  already dim-general; the careful step is validating the tet
+  signed-volume backtrack before it lands in the shared smoother.
+- **Auto-tuning** `aniso_cap`/`relax`/`n_outer` (largest cap that
+  keeps `minA/meanA` above a floor — the Pareto frontier is
+  characterised but not automated).
+- **Free-surface / deformed-boundary slip** (polyline projection —
+  shared open item with spring/MA).
+
+---
+
+## NEXT-PHASE KICKOFF BRIEF — dynamic adaptive convection (read first)
+
+**Phase just closed (2026-05-18):** the anisotropic mover is a
+validated 2D prototype, GAMG-parity, ~O(N), and the **API is
+locked in**:
+
+- `uw.meshing.smooth_mesh_interior(mesh, metric=ρ,
+  method="anisotropic", method_kwargs=dict(aniso_cap=2.0,
+  relax=0.2, n_outer=12, linear_solver="direct"))`
+- `uw.meshing.metric_density_from_gradient(mesh, field, amp=8.0,
+  lo_percentile=50, hi_percentile=97)` → the Lagrangian
+  `ρ = 1+amp·t` density (the relative analogue of
+  `adaptivity.metric_from_gradient`; cached for per-step use).
+- Docs: `docs/advanced/mesh-adaptation.md` (peer to `mesh.adapt`),
+  `docs/developer/subsystems/mesh-metric-redistribution.md`,
+  this design note.
+- Test harness: `scripts/adaptive_convection_harness.py`.
+
+**Goal of the next phase:** a *correct* dynamic-adaptive
+convection solve — coarse adaptively-snuggled mesh reproducing a
+fine uniform reference. The harness already runs the comparison
+(Ra=1e5, uniform res-24 reference vs res-16 adaptive,
+`Nu(t)`/`vrms(t)` rms error, figure).
+
+**THE open piece — the node-update / ALE correction.** When the
+mover displaces nodes by `Δx` over the step interval `Δt`, the
+mesh has velocity `v_mesh = Δx/Δt`. The SLCN advection–diffusion
+must transport along the material velocity *relative to the moving
+mesh*: `V_fn = v_fluid − v_mesh` for the post-adapt step (ALE), or
+T must be conservatively remapped onto the moved nodes. Without it
+the pure coordinate move is read as a spurious advection of T.
+**Precedent is settled in this codebase:** the free-surface ALE
+finding (memory `project_freesurface_ale_design` — a Lagrangian
+mesh move needs `V_fn = v − v_mesh` or convection is
+non-physically damped, Nu ~57 vs 143). The hook is
+`apply_adaptation_correction` in the harness: `--correction none`
+is the uncorrected baseline (expected to drift — it *quantifies*
+the error the correction must remove); `--correction ale` raises
+with the spec. **Acceptance test:** harness `rms ΔNu(adaptive
+res-16 vs uniform res-24)` small with the correction, large
+without.
+
+**Other follow-ups, priority order:**
+1. ALE correction + harness acceptance (above) — the headline.
+2. **3D port** — scoped ~1–2 days. The solver core is already
+   dimension-general; the 2D-specific work is the tet
+   signed-volume inversion backtrack (`_tri_cells`/`_signed_areas`
+   → tet) + dropping the `cdim!=2` guard + ~5 generalised lines.
+   **The metric stays `1/h²` per principal direction in 3D — it is
+   NOT `1/h³`** (a Riemannian metric measures *edge length*, which
+   is 1-D regardless of embedding dimension: `eᵀMe=1` ⇒ eigenvalue
+   `1/h²`; dimension enters only the complexity integral
+   `∫√(det M)` via `det M = ∏1/hᵢ²`). For the *mover* the overall
+   `D` scale is moreover irrelevant (the displacement PDE is
+   invariant under `D→αD`) — only the anisotropy/contrast ratios
+   matter, so 3D needs no scaling change at all.
+3. Parallel-exact cross-rank assembly + MPI weak-scaling (GAMG
+   solver path already validated to parity with direct,
+   `|minA_g−minA_d| ≤ 5e-5`).
+4. Hessian metric `M=|H(ρ)|` for feature-*core* resolution.
+5. `aniso_cap`/`relax`/`n_outer` auto-tuning to a `minA/meanA`
+   floor.
+
+**How to resume:** run
+`python scripts/adaptive_convection_harness.py --correction none`
+for the baseline error, then implement `apply_adaptation_correction`
+(`--correction ale`) and re-run to show the gap closes.
diff --git a/docs/developer/design/media/adapt_convection_a16x.png b/docs/developer/design/media/adapt_convection_a16x.png
new file mode 100644
index 00000000..28058834
Binary files /dev/null and b/docs/developer/design/media/adapt_convection_a16x.png differ
diff --git a/docs/developer/design/media/adapt_metric_tensor_construction.png b/docs/developer/design/media/adapt_metric_tensor_construction.png
new file mode 100644
index 00000000..26370385
Binary files /dev/null and b/docs/developer/design/media/adapt_metric_tensor_construction.png differ
diff --git a/docs/developer/design/media/adapt_nonseparable_validation.png b/docs/developer/design/media/adapt_nonseparable_validation.png
new file mode 100644
index 00000000..abed67de
Binary files /dev/null and b/docs/developer/design/media/adapt_nonseparable_validation.png differ
diff --git a/docs/developer/design/mesh-adaptation-formulation.md b/docs/developer/design/mesh-adaptation-formulation.md
new file mode 100644
index 00000000..25334f09
--- /dev/null
+++ b/docs/developer/design/mesh-adaptation-formulation.md
@@ -0,0 +1,573 @@
+# Mesh adaptation by metric-driven node redistribution — mathematical formulation
+
+> **Scope.** This is the self-contained *mathematical* reference for the
+> topology-preserving mesh-adaptation family in UW3
+> (`uw.meshing.smooth_mesh_interior`). It derives the three solution
+> strategies — **optimal-transport / Monge–Ampère**, the **volumetric
+> elastic spring**, and the **anisotropic metric-tensor (Winslow/MMPDE)
+> mover** — the gradient-metric construction, the handling of fields
+> under mesh motion in time-dependent problems, and the Nusselt
+> diagnostic. Operational guidance (when to use which, parameters) is in
+> {doc}`/developer/subsystems/mesh-metric-redistribution` and
+> {doc}`/advanced/mesh-adaptation`; the dated R&D log is
+> `ma-newton-cofactor-exploration.md`. Formulae here are transcribed
+> from `src/underworld3/meshing/smoothing.py`.
+
+## 1. The equidistribution principle
+
+All three strategies share one goal. Given a strictly-positive
+**monitor** (target density) field $\rho(\mathbf x)$ — larger where the
+mesh should be finer — find a coordinate map that, **at fixed topology
+and fixed node count**, redistributes the interior nodes so the cell
+size tracks $\rho$. In $d$ dimensions the design criterion is
+
+$$ h(\mathbf x)\;\propto\;\rho(\mathbf x)^{-1/d}, $$
+
+equivalently the *equidistribution* condition that the monitor mass per
+cell be uniform,
+
+$$ \rho(\mathbf x)\,\bigl|\det \mathbf J\bigr| \;=\; \text{const},
+   \qquad \mathbf J=\partial\mathbf x/\partial\boldsymbol\xi, $$
+
+with $\boldsymbol\xi$ the (uniform) computational coordinate. Boundary
+vertices are pinned (or slide tangentially, {ref}`§6 <sec-dynamic>`), so
+the domain $\Omega$ is unchanged — this is *redistribution*, **not**
+re-meshing.
+
+```{important}
+**The fixed-node-count cap.** With a fixed number of nodes and fixed
+connectivity, the achievable grading is bounded. For an 8–20×
+density-contrast target the realisable deep/near edge-length ratio is
+only ≈1.5–1.8×; the exact optimal-transport map is ≈10×. Reaching the
+latter needs *more nodes* — a topology change (`mesh.adapt` / MMG), not
+this smoother. Every fixed-topology local method (graph-Laplacian,
+weighted-Laplacian, all Monge–Ampère variants, the elastic spring)
+converges to the same ≈1.0×–1.8× band: the cap is intrinsic to
+fixed-topology redistribution, not a solver deficiency. The strategies
+below differ in *cell shape/alignment quality* and *cost*, not in their
+ability to exceed this cap.
+```
+
+## 2. Strategy A — Optimal transport / Monge–Ampère
+
+### 2.1 Brenier map and the Monge–Ampère equation
+
+The $L^2$-optimal map carrying the uniform measure to the target
+measure $\propto\rho$ is, by Brenier's theorem, the gradient of a
+**convex** potential, $\mathbf x=\nabla\Phi(\boldsymbol\xi)$.
+Substituting into the equidistribution condition gives the
+**Monge–Ampère equation**
+
+$$ \rho\bigl(\nabla\Phi\bigr)\,\det\!\bigl(D^2\Phi\bigr)\;=\;c . $$
+
+Writing the map as a perturbation of the identity,
+$\mathbf x=\boldsymbol\xi+\nabla\varphi$ (so $D^2\Phi=I+D^2\varphi$),
+the implementation solves
+
+$$ \det\!\bigl(I+D^2\varphi\bigr)\;=\;g,
+   \qquad g \;=\; \frac{c\,\rho_{\mathrm{cur}}}{\rho_{\mathrm{tgt}}}, $$
+
+and moves nodes by $\nabla\varphi$. The normalisation constant is
+chosen so that a **uniform monitor is an exact no-op**:
+
+$$ c \;=\; \Bigl\langle\, b^{-1/2}\,\Bigr\rangle^{-2},
+   \qquad b=\rho_{\mathrm{tgt}}\,\rho_{\mathrm{cur}} ,$$
+
+(`c = 1/mean(1/sqrt(b))**2`), which makes the first Picard iterate
+mean-zero so that $\rho_{\mathrm{tgt}}\!=\!\text{const}\Rightarrow
+\nabla\varphi\equiv 0$.
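+
+A small numerical check of this normalisation, as a sketch under two
+assumptions that are not taken from the implementation: the mean is a
+plain sample mean over nodes, and the starting mesh is uniform
+($\rho_{\mathrm{cur}}\equiv 1$). With a zero Hessian the convex-branch
+source of §2.2 reduces to $2(\sqrt g-1)$, which is the quantity that
+should average to zero.
+
+```python
+import numpy as np
+
+
+def ma_normalisation(rho_tgt, rho_cur):
+    """c = 1 / mean(1/sqrt(b))**2 with b = rho_tgt * rho_cur."""
+    b = rho_tgt * rho_cur
+    return 1.0 / np.mean(1.0 / np.sqrt(b)) ** 2
+
+
+rng = np.random.default_rng(0)
+rho_cur = np.ones(1000)                   # assumed: uniform starting mesh
+rho_tgt = 1.0 + 7.0 * rng.random(1000)    # illustrative target density
+
+c = ma_normalisation(rho_tgt, rho_cur)
+g = c * rho_cur / rho_tgt
+first_source = 2.0 * (np.sqrt(g) - 1.0)   # zero-Hessian convex-branch source
+
+# Uniform target: g should be identically 1, so the source vanishes.
+g_uniform = ma_normalisation(np.full(1000, 3.0), rho_cur) * rho_cur / 3.0
+
+print("mean of first source:", first_source.mean())
+print("max |g_uniform - 1|:", np.abs(g_uniform - 1.0).max())
+```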
+
+### 2.2 Benamou–Froese–Oberman convex branch (2-D)
+
+In 2-D, $\det(I+D^2\varphi)=(1+\varphi_{xx})(1+\varphi_{yy})-\varphi_{xy}^2$.
+Setting this equal to $g$ and solving the resulting quadratic for the
+Laplacian $\Delta\varphi=\varphi_{xx}+\varphi_{yy}$ gives the two
+roots; the **convex (Brenier) branch** is the $+\sqrt{\cdot}$ one:
+
+$$
+  \boxed{\;\Delta\varphi \;=\;
+   \sqrt{(\varphi_{xx}-\varphi_{yy})^2 + 4\,\varphi_{xy}^2 + 4\,g}
+   \;-\;2\;}
+$$
+
+(`f_src = sqrt((Hxx-Hyy)**2 + 4*Hxy**2 + 4*g) - 2`). The $+\sqrt{}$
+selects the convex root unconditionally — this is what makes the
+iteration stable without an explicit convexity safeguard. It is a
+**closed-form convex-branch solve**, not a linearisation: the new
+Laplacian is expressed through $g$ and only the *deviatoric* part of
+the Hessian, side-stepping the noisy/under-estimated full $\det$.
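+
+The boxed identity can be checked on a quadratic potential, whose
+Hessian is constant. The sketch below is illustrative only
+(`bfo_source` is a hypothetical stand-in mirroring the `f_src`
+expression quoted above): if $g$ is set to the exact determinant, the
+convex-branch formula should return the exact Laplacian whenever
+$I+D^2\varphi$ is positive definite.
+
+```python
+import numpy as np
+
+
+def bfo_source(Hxx, Hyy, Hxy, g):
+    """Convex-branch Laplacian: sqrt((Hxx-Hyy)^2 + 4 Hxy^2 + 4 g) - 2."""
+    return np.sqrt((Hxx - Hyy) ** 2 + 4.0 * Hxy**2 + 4.0 * g) - 2.0
+
+
+# Hessian entries of a quadratic phi; small enough that I + H stays convex.
+rng = np.random.default_rng(0)
+Hxx, Hyy, Hxy = rng.uniform(-0.3, 0.3, size=(3, 100))
+
+g_exact = (1.0 + Hxx) * (1.0 + Hyy) - Hxy**2  # det(I + D^2 phi)
+lap = bfo_source(Hxx, Hyy, Hxy, g_exact)
+
+print("max |source - (Hxx + Hyy)| =", np.abs(lap - (Hxx + Hyy)).max())
+```
+
+Only the deviatoric combination `Hxx - Hyy` and `Hxy` enter the
+formula; the trace is what the solve returns.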
+
+### 2.3 Damped Picard, recovered Hessian, the move
+
+The equation is solved by a **damped Picard iteration**: each iterate
+solves a *constant-coefficient* Poisson problem for $\varphi$ with the
+above source evaluated at the previous Hessian, then under-relaxes,
+
+$$
+   \varphi \;\leftarrow\; (1-\omega)\,\varphi
+   \;+\;\omega\,\varphi^{\text{solve}}, \qquad \omega\approx 0.4 ; 
+$$
+
+without the relaxation the recovered Hessian grows unbounded and the
+(otherwise well-posed) Neumann solve diverges. The Poisson operator is
+**pure-Neumann** (the map's natural BC is $\nabla\varphi\cdot\hat n=0$)
+and is closed with a constant nullspace.
+
+Because UW3 forbids second derivatives of mesh-variable functions, the
+Hessian is obtained by a **variationally-consistent first-derivative
+recovery** — the SPD mass-matrix system
+
+$$
+  \int H_{ij}\,\tau_{ij}\,dV \;+\;
+   \int \frac{\partial\varphi}{\partial x_i}\,
+        \frac{\partial\tau_{ij}}{\partial x_j}\,dV \;=\;0
+   \quad\Longrightarrow\quad
+   H_{ij}\approx\frac{\partial^2\varphi}{\partial x_i\partial x_j}, 
+$$
+
+i.e. the weak form of $\int H_{ij}\tau_{ij}=\int\partial^2_{ij}\varphi\,
+\tau_{ij}$ with the right-hand side integrated by parts (boundary term
+dropped = natural). Only first derivatives of $\varphi$ appear.
+
+Nodes are then displaced by $\nabla\varphi$ subject to a **coherent
+global signed-area backtrack**: a single scalar step factor is halved
+until *no triangle inverts* (orientation of every cell preserved),
+guaranteeing a valid mesh. (UW3's `SNES_Poisson` uses $F_0=-f$, so the
+source is applied with a sign, `_EQUIDIST_SIGN=-1`, that makes the
+validated linear first iterate $\Delta\varphi=(g-1)$ grade nodes toward
+high target density.)
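+
+The coherent signed-area backtrack can be sketched in a few lines of
+NumPy. This is an illustration, not the routine in `smoothing.py`:
+`signed_areas` and `backtrack_step` are hypothetical names, and the
+mesh is a four-triangle square with one interior node.
+
+```python
+import numpy as np
+
+
+def signed_areas(xy, tris):
+    """Signed area of each triangle (positive for counter-clockwise)."""
+    a, b, c = xy[tris[:, 0]], xy[tris[:, 1]], xy[tris[:, 2]]
+    return 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                  - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+
+
+def backtrack_step(xy, tris, disp, max_halvings=20):
+    """Halve one global step factor until every cell keeps its orientation."""
+    sign0 = np.sign(signed_areas(xy, tris))
+    scale = 1.0
+    for _ in range(max_halvings):
+        trial = xy + scale * disp
+        if np.all(np.sign(signed_areas(trial, tris)) == sign0):
+            return trial, scale
+        scale *= 0.5
+    return xy.copy(), 0.0  # give up: leave the mesh unchanged
+
+
+xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
+tris = np.array([[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]])
+disp = np.zeros_like(xy)
+disp[4] = [1.5, 0.0]  # would push the interior node outside the square
+
+moved, scale = backtrack_step(xy, tris, disp)
+print("accepted scale:", scale, "interior node:", moved[4])
+```
+
+Using a single scalar for the whole mesh keeps the accepted move a
+scaled copy of the solved displacement field, rather than freezing
+individual nodes.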
+
+### 2.4 The 1-D exact reference (separable features)
+
+For a *separable* monitor (e.g. radial $\rho(r)$ on an annulus, or
+angular $\rho(\theta)$) the exact equidistribution map is a 1-D
+**cumulative-mass inversion**, computable to machine precision with no
+FE solve: place node radii $r_k$ so that equal target mass
+
+$$
+m(r)=\int_{R_i}^{r}\rho(s)\,s\,\mathrm{d}s 
+$$
+
+(the $s\,\mathrm{d}s$ is the 2-D polar area element) lies between
+consecutive shells, $r_k=m^{-1}(k/N)$ with $m$ normalised to unit
+total mass. This is the optimal-transport
+map under radial symmetry; it achieves the full ≈10× grading and is
+*exact and strictly cheaper than any FE solve* — for separable features
+it is the tool of choice. It also serves as the ground-truth target
+against which the FE strategies are measured.
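+
+A minimal sketch of the cumulative-mass inversion for a radial
+monitor, assuming trapezoidal quadrature on a fine auxiliary grid and
+linear interpolation for the inverse (`equidistributed_radii` and the
+Gaussian monitor are illustrative choices, not the reference scripts).
+
+```python
+import numpy as np
+
+
+def equidistributed_radii(rho, r_in, r_out, n_shells, n_quad=4001):
+    """Shell radii r_k holding equal monitor mass m(r) = int rho(s) s ds."""
+    s = np.linspace(r_in, r_out, n_quad)
+    integrand = rho(s) * s
+    # cumulative trapezoid rule, m[0] = 0
+    m = np.concatenate(
+        ([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s)))
+    )
+    m /= m[-1]  # normalise to unit total mass
+    return np.interp(np.linspace(0.0, 1.0, n_shells + 1), m, s)
+
+
+def ring_monitor(r):
+    """Illustrative monitor: density peak on the ring r = 0.7."""
+    return 1.0 + 8.0 * np.exp(-((r - 0.7) / 0.05) ** 2)
+
+
+r_feat = equidistributed_radii(ring_monitor, 0.5, 1.0, 16)
+dr = np.diff(r_feat)
+print("shell widths, coarsest/finest:", dr.max() / dr.min())
+```
+
+For a uniform monitor the same routine returns equal-area shells,
+$r_k^2 = R_i^2 + (k/N)(R_o^2-R_i^2)$, which is a convenient sanity
+check.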
+
+```{note}
+
+**Why the single FE Monge–Ampère solve caps at ≈1.5–1.8×.** Every
+FE-MA-potential variant (linear Picard; recovered-Hessian Picard,
+smoothed and variational; BFO convex-branch + damping; outer map
+composition) converges to the *same* ≈30 %-of-exact, self-consistent
+but under-deformed (non-Brenier / weak-branch) transport map — right
+shape and sign, never tangling, but deep/mid nodes move only ~30 % of
+the exact distance. This is a property of the FE-MA-potential
+*formulation at fixed topology*, not of the linear solver, Hessian
+recovery, branch, resolution, or single-vs-composed solves. The coupled
+$(\varphi,H)$ Newton SNES solves the same equation ⇒ same ceiling.
+Strategy C exists because a *scalar* potential cannot deliver coherent
+*anisotropic* bulk transport at fixed topology either.
+
+```
+
+## 3. Strategy B — Volumetric elastic-spring equilibrium
+
+Decouple **shape** from **size**. Every mesh edge is a linear spring of
+*uniform* rest length $\bar L$ (the current mean edge), a pure shape
+regulariser that drives cells equant and kills slivers; the *size*
+grading lives entirely in a per-cell area target. Minimise the truss
+energy
+
+$$ 
+  E(\mathbf x)\;=\; w_{\text{shape}}
+     \sum_{e}\Bigl(\tfrac{|\mathbf x_i-\mathbf x_j|-\bar L}{\bar L}\Bigr)^{2}
+   \;+\; w_{\text{size}}
+     \sum_{t}\Bigl(\tfrac{A_t-A^0_t}{A^0_t}\Bigr)^{2},
+$$
+
+with per-cell target area $A^0_t\propto 1/\rho_{\mathrm{tgt}}$, rescaled
+so $\sum A^0_t=\sum a^{\text{init}}_t$ (total area conserved — pure
+redistribution). Defaults $w_{\text{shape}}=1,\ w_{\text{size}}=8$;
+results are robust to them. Minimised by **Jacobi-preconditioned
+nonlinear conjugate gradients** (Polak–Ribière$^+$) with an Armijo line
+search that rejects any cell-inverting trial — the tangle guard lives
+*inside* the optimiser, so it converges to the true equilibrium rather
+than creeping against a per-sweep freeze. Fast (≈0.3 s on a res-16
+annulus), robust, never degenerates; slightly streaky/anisotropic at
+sharp interior features.
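+
+The energy and the area-conserving target can be written down
+directly. The sketch below is an assumption-laden illustration, not
+the `_winslow_spring` code: the helper names are hypothetical, cell
+areas are taken as signed (counter-clockwise positive), and the test
+mesh is two equilateral triangles.
+
+```python
+import numpy as np
+
+
+def signed_areas(xy, tris):
+    a, b, c = xy[tris[:, 0]], xy[tris[:, 1]], xy[tris[:, 2]]
+    return 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                  - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+
+
+def target_areas(rho_cell, area_init):
+    """A0 proportional to 1/rho, rescaled so the total area is conserved."""
+    A0 = 1.0 / rho_cell
+    return A0 * area_init.sum() / A0.sum()
+
+
+def truss_energy(xy, edges, tris, A0, L_bar, w_shape=1.0, w_size=8.0):
+    """Shape term (uniform rest length L_bar) plus size term (areas A0)."""
+    L = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1)
+    A = signed_areas(xy, tris)
+    return (w_shape * np.sum(((L - L_bar) / L_bar) ** 2)
+            + w_size * np.sum(((A - A0) / A0) ** 2))
+
+
+s3 = np.sqrt(3.0) / 2.0
+xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, s3], [1.5, s3]])
+tris = np.array([[0, 1, 2], [1, 3, 2]])
+edges = np.array([[0, 1], [1, 2], [2, 0], [1, 3], [3, 2]])
+
+area_init = signed_areas(xy, tris)
+L_bar = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1).mean()
+
+E_uniform = truss_energy(xy, edges, tris, target_areas(np.ones(2), area_init), L_bar)
+A0_graded = target_areas(np.array([1.0, 4.0]), area_init)
+E_graded = truss_energy(xy, edges, tris, A0_graded, L_bar)
+print("E (uniform target):", E_uniform, " E (graded target):", E_graded)
+```
+
+With a uniform target the equilateral mesh is already at the minimum
+(zero energy); a graded target makes the size term non-zero, and it is
+that residual the optimiser trades against the shape term.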
+
+## 4. Strategy C — Anisotropic metric-tensor mover (production)
+
+A scalar equidistribution potential is isotropic and, at fixed
+topology, cannot produce coherent anisotropic bulk transport. Strategy
+C instead reshapes cells with a **gradient-derived anisotropic metric
+tensor** and an M-weighted harmonic (Winslow / MMPDE) coordinate map.
+
+### 4.1 The gradient-derived metric tensor
+
+From the scalar density $\rho$, form the *projected* gradient
+$\nabla\rho$ (a **first** derivative — UW3-clean; via a
+`Vector_Projection`), and at each node build
+
+$$ 
+\boxed{\;M \;=\; \frac{1}{h_0^{2}}
+   \Bigl[\, I \;+\; \beta\,\hat{\mathbf g}\hat{\mathbf g}^{\mathsf T}
+   \bigl(|\nabla\rho|/\nabla\rho_{\mathrm{ref}}\bigr)^{2}\Bigr],\qquad
+   \hat{\mathbf g}=\nabla\rho/|\nabla\rho|\;}
+$$
+
+(`M = base*(I + beta*(gn/gref)**2 * outer(gh,gh))`), then
+**eigen-clamp**: $M=\sum_i\lambda_i\mathbf v_i\mathbf v_i^{\mathsf T}$,
+clip $\lambda_i\in[\,1/h_{\max}^2,\;1/h_{\min}^2\,]$ with
+$h_{\min}=h_0/\sqrt{\texttt{aniso\_cap}}$,
+$h_{\max}=h_0$, and reassemble. $h_0$ is the mean edge length;
+$\nabla\rho_{\mathrm{ref}}$ the max projected $|\nabla\rho|$.
+
+The eigenframe **auto-aligns to the feature from the Cartesian
+$\nabla\rho$ alone** — no $(r,\theta)$ frame is supplied anywhere
+(figure below). A radial feature yields tangentially-elongated cells
+(short along $\hat{\mathbf r}$, long along the ring); an angular feature
+yields radially-elongated cells. Being a **gradient** metric it refines
+where $\rho$ *changes* (feature edges/flanks) and is isotropic-coarse at
+a smooth peak ($\nabla\rho\to0$) and in the far field — the correct
+behaviour for resolving fronts/interfaces; resolving a feature *core*
+needs a curvature (Hessian) metric instead.
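+
+The auto-alignment can be checked numerically. The sketch below builds
+the un-clamped metric from an analytic gradient of a radial monitor;
+it is illustrative only: `gradient_metric` is a hypothetical helper,
+the `g_eps` floor is a simplified absolute floor rather than the
+scale-aware one in the implementation, and the eigen-clamp is omitted.
+
+```python
+import numpy as np
+
+
+def gradient_metric(grad_rho, h0, beta, g_eps=1e-9):
+    """Stacked (N, d, d) metric base * (I + beta (gn/gref)^2 g_hat g_hat^T)."""
+    n, d = grad_rho.shape
+    base = 1.0 / h0**2
+    eye = np.eye(d)
+    gn = np.linalg.norm(grad_rho, axis=1)
+    gref = gn.max()
+    if gref <= g_eps:  # uniform density: exactly isotropic
+        return np.broadcast_to(base * eye, (n, d, d)).copy()
+    gh = grad_rho / np.maximum(gn, g_eps)[:, None]  # unit gradient (0 if gn = 0)
+    weight = beta * (gn / gref) ** 2
+    return base * (eye + weight[:, None, None] * gh[:, :, None] * gh[:, None, :])
+
+
+# Radial monitor rho(r) = 1 + 8 exp(-((r - 0.7)/0.1)^2) sampled on an annulus.
+rng = np.random.default_rng(1)
+r = rng.uniform(0.5, 1.0, 200)
+theta = rng.uniform(0.0, 2.0 * np.pi, 200)
+r_hat = np.column_stack([np.cos(theta), np.sin(theta)])
+drho_dr = -8.0 * 2.0 * (r - 0.7) / 0.1**2 * np.exp(-((r - 0.7) / 0.1) ** 2)
+
+M = gradient_metric(drho_dr[:, None] * r_hat, h0=0.05, beta=200.0)
+w, V = np.linalg.eigh(M)
+align = np.abs(np.sum(V[:, :, -1] * r_hat, axis=1))  # |v_max . r_hat|
+steep = np.abs(drho_dr) > 0.1 * np.abs(drho_dr).max()
+print("min alignment on the flanks:", align[steep].min())
+```
+
+The largest eigenvalue (shortest target edge) points along
+$\hat{\mathbf r}$ wherever the gradient is appreciable, while the
+smallest eigenvalue stays at $1/h_0^2$ everywhere.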
+
+#### Single-knob equidistribution (`resolution_ratio`)
+
+The anisotropic term is positive-semidefinite, so the bare metric is
+$M\succeq\tfrac1{h_0^2}I$: it can **only refine** (it keeps just
+$\nabla\rho$ and discards $\rho$'s magnitude, so it never asks for a
+cell coarser than $h_0$). On a **fixed node budget** that is fatally
+one-sided — flat regions cannot release nodes, the globally-steepest
+feature scavenges the budget and the interior plumes starve. *This is
+structural, not a tuning deficit:* no `aniso_cap`, $\beta$, or
+percentile setting frees the budget; they only re-aim one that is
+never released.
+
+The fix makes the isotropic part a genuinely **equidistributed**
+density. Evaluate $\rho$ on the (near-uniform, *undeformed*) metric
+mesh, form the geometric mean $G=\exp\langle\ln\rho\rangle$, and set
+
+$$
+\boxed{\;M \;=\; s(\mathbf x)\bigl(I
+   \;+\; \beta\,\hat{\mathbf g}\hat{\mathbf g}^{\mathsf T}
+   \bigl(|\nabla\rho|/\nabla\rho_{\mathrm{ref}}\bigr)^{2}\bigr),
+   \qquad
+   s(\mathbf x)=\tfrac1{h_0^{2}}\;\frac{\rho(\mathbf x)}{G}\;}
+$$
+
+eigen-clamped to $\lambda_i\in[\,1/h_{\max}^2,\,1/h_{\min}^2\,]$ with
+**$h_{\min}=h_0/R$, $h_{\max}=h_0R$** for the single knob
+$R=\texttt{resolution\_ratio}$. Because $\langle\ln s\rangle=\ln
+(1/h_0^2)$, the node budget is **centred**: steep regions ($\rho>G$)
+refine and flat regions ($\rho<G$) coarsen by *exactly complementary*
+amounts — **the conservation law of §1 supplies the de-resolution
+automatically; there is no coarsening parameter**. M-harmonic
+scale-invariance makes the normalisation constant irrelevant to the
+realised mesh, so $G$ only serves to place the clamp band
+symmetrically about the bulk; $R$ alone bounds the realised fine:coarse
+ratio at $R^2$. $R=1$ collapses the band → the metric reverts
+**bit-identically** to the refine-only historical default (an exact
+no-op vs. every prior result); $R\!\approx\!2$ is the validated
+production point.
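
The centring property is easy to check numerically. The sketch below assumes an unweighted nodal mean for $\langle\cdot\rangle$ (the production code may weight by cell area) and uses hypothetical helper names; it computes $G$, the scale $s$, and the clamp band for a given $R$.

```python
import math

def equidistributed_scale(rho, h0):
    """Return (s, G) with s = (1/h0^2) * rho/G and G the geometric mean of rho.

    Assumes an unweighted mean over the samples (illustrative only).
    """
    G = math.exp(sum(math.log(r) for r in rho) / len(rho))
    return [r / (G * h0**2) for r in rho], G

def clamp_band(h0, R):
    """Eigenvalue band [1/h_max^2, 1/h_min^2] for h_min = h0/R, h_max = h0*R."""
    return 1.0 / (h0 * R) ** 2, (R / h0) ** 2
```

By construction the mean of $\ln s$ equals $\ln(1/h_0^2)$ whatever the density samples are, and `clamp_band(h0, 1.0)` collapses to the single value $1/h_0^2$.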
+
+*Why a single normalised knob rather than two caps.* The earlier
+two-knob form ($s=h_0^{-2}\texttt{coarsen\_cap}^{\,q-1}$, separate
+$h_{\min}$/$h_{\max}$) exposed the refine⇄coarsen split as a free
+parameter, and a sweep showed that split is **not free** — it is a
+single quality budget with a sharp knee:
+
+| legacy `aniso_cap`·`coarsen_cap` (cc = `coarsen_cap`) | $\min A/\overline A$ | edge-length p95/p05 | mesh |
+|---|---|---|---|
+| 4  (refine-only)  | ~0.27 | ~2.0 | clean, **no de-resolution** |
+| 6  (cc 1.5)       | 0.20–0.25 | ~3.4 | clean, subtle grading |
+| 8  (cc 2)         | 0.17–0.23 | ~3.8 | clean, clearly graded |
+| 16 (cc 4)         | 0.04–0.14 | ~5.4 | sliver mess (over-coarse) |
+
+The good operating point was *asymmetric* (refine $2\times$, coarsen
+only $\approx1.4\times$): over-coarsening slivers the flats. The
+equidistribution normalisation makes that balance **automatic and
+field-adaptive** instead of a hand-tuned cap product — flat regions
+release exactly the budget the fronts consume, so the user sets only
+*how much resolution range* ($R$), never *where the split lies*. The
+legacy `aniso_cap`/`coarsen_cap` pair is retained solely as a
+bit-for-bit expert override for $R\le1$ so historical scripts
+reproduce; `resolution_ratio` is the documented API.
+
+**Temporal damping of the normaliser.** $G$ is recomputed from
+the *instantaneous* field at every adaptation event. The eigen-clamp
+band $[\,b/R^2,bR^2\,]$ is *fixed*, but $G$ floats; during a violent
+transient (e.g. the $\mathrm{Nu}\!\approx\!17$ convective overshoot)
+a $G$ excursion translates the whole $\rho/G$ distribution sideways
+across that fixed band, so a large fraction of nodes simultaneously
+saturate against a clamp limit and the mesh visibly *lurches* — and
+even in steady state the fine⇄coarse contrast pulses as $G$
+breathes. Because $G$ is geometric, low-pass it **in log space**
+across events,
+
+$$\ln G_{\mathrm{eff}}^{(n)}=a\,\ln G^{(n)}
+   +(1-a)\,\ln G_{\mathrm{eff}}^{(n-1)},$$
+
+with $a=\texttt{geom\_mean\_smoothing}$ ($a\!=\!1$ ⇒ no damping /
+instantaneous; $a\!\approx\!0.25$ ⇒ strongly damped; the first event
+seeds the state). This smooths **only the single global intensity
+scalar** — the spatial $\rho(\mathbf x)$ pattern still tracks the
+current field every event, so *where* the mesh refines stays fully
+responsive; only the global de-resolution *magnitude* is
+time-filtered. It is an internal constant (one carried scalar, not a
+grading knob): the user-facing API stays single-knob. The filter is
+equivalent to equidistributing a *time-averaged* density, which for a
+slowly evolving field is arguably more desirable than chasing every
+fluctuation, and is strictly better through the startup transient.
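
The log-space filter amounts to one carried scalar. A minimal sketch follows; the class name is hypothetical, and only the recurrence and the first-event seeding come from the text above.

```python
import math

class GeomMeanFilter:
    """Low-pass the geometric-mean normaliser G in log space across events."""

    def __init__(self, a):
        self.a = a              # geom_mean_smoothing; a = 1 means no damping
        self.log_g_eff = None   # carried state, unset until the first event

    def update(self, G):
        log_g = math.log(G)
        if self.log_g_eff is None:
            self.log_g_eff = log_g  # the first event seeds the state
        else:
            self.log_g_eff = self.a * log_g + (1.0 - self.a) * self.log_g_eff
        return math.exp(self.log_g_eff)
```

For example, with `a = 0.25` a jump of the instantaneous $G$ from 1 to 16 moves $G_{\mathrm{eff}}$ only to $16^{0.25}=2$ on that event.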
+
+```{figure} media/adapt_metric_tensor_construction.png
+The eigen-clamped metric tensor for a radial $\rho(r)$ (left) and an
+angular $\rho(\theta)$ (right): desired-cell ellipses (short axis
+$\parallel\nabla\rho$). The eigenframe aligns to $\hat{\mathbf r}$ /
+$\hat{\boldsymbol\theta}$ purely from the Cartesian $\nabla\rho$; the
+anisotropy is bounded by the eigen-clamp band.
+```
+
+### 4.2 The M-weighted (Winslow) coordinate map
+
+Solve the displacement form of the M-weighted Laplace map, **per
+physical coordinate component $c$**,
+
+$$
+   \boxed{\;\nabla\!\cdot\!\bigl(D\,\nabla u_c\bigr)
+   \;=\; -\,\nabla\!\cdot\!\bigl(D\,\mathbf e_c\bigr)
+   \;=\; -\sum_j \partial_j D_{jc},\qquad
+   u_c=0 \ \text{on the pinned boundary},\;}
+$$
+
+with $D=M$ the eigen-clamped tensor (`src = Σ_j Dsym[j,c].diff(X[j])`).
+Then $\psi_c=x_c+u_c$ is exactly the M-harmonic coordinate map
+$\nabla\!\cdot(D\nabla\psi_c)=0$, $\psi=x$ on the boundary (since
+$\nabla x_c=\mathbf e_c$). The 1-D analysis $(D\psi')'=0$ gives
+$\psi'\propto1/D$, so the direct Winslow smoother **clusters nodes
+where $D$ is large** — hence $D=M$ (large eigenvalues = small target
+spacing) grades the mesh toward the metric. The two components share
+the *same* tensor operator (a `_CofDiff`-style `DiffusionModel` with
+$\mathbf c=D$); the system is **linear** (one solve per component, no
+Picard) and homogeneous-Dirichlet ⇒ non-singular (no constant
+nullspace, side-stepping the GAMG-pure-Neumann fragility). The overall
+*scale* of $D$ is irrelevant — the PDE is invariant under $D\to\alpha
+D$ — only its anisotropy and spatial variation matter; the $1/h_0^2$
+normalisation only fixes the interpretation of the eigen-clamp band.
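
The 1-D statement $\psi'\propto1/D$ can be made concrete. The sketch below is a toy, not the UW3 solver: it integrates $1/D$ with a midpoint rule on a uniform grid of $[0,1]$ and normalises so that $\psi(0)=0$, $\psi(1)=1$. The function name is hypothetical.

```python
def harmonic_map_1d(D, n):
    """Solve (D psi')' = 0 on [0, 1] with psi(0) = 0, psi(1) = 1.

    psi' = C/D, so psi is the normalised cumulative integral of 1/D.
    Returns psi at the n+1 uniform nodes (midpoint rule per cell).
    """
    h = 1.0 / n
    inv = [h / D((i + 0.5) * h) for i in range(n)]
    total = sum(inv)
    psi = [0.0]
    for w in inv:
        psi.append(psi[-1] + w / total)
    return psi
```

With `D = 4` on $0.4<x<0.6$ and `D = 1` elsewhere, the mapped node spacing inside the band is one quarter of the spacing outside it: nodes cluster where $D$ is large, and multiplying $D$ by any constant leaves the result unchanged.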
+
+The displacement is applied with the same coherent signed-area
+backtrack as Strategy A.
+
+### 4.3 Stability — damped MMPDE
+
+The *decoupled, direct* Winslow form (each physical coordinate
+M-harmonic independently) has **no Rado–Kneser–Choquet non-folding
+guarantee**, so a single un-damped elliptic jump folds. It is therefore
+run as a **damped MMPDE**: the metric tensor is built **once** on the
+undeformed mesh and held fixed-Lagrangian (re-projecting $\nabla\rho$ on
+the progressively distorted mesh is a positive feedback that collapses
+the mesh); the displacement is under-relaxed,
+$\mathbf x\leftarrow\mathbf x+\texttt{relax}\cdot\mathbf u$, and
+composed over `n_outer` steps. The binding stability lever is the
+**eigen-clamp `aniso_cap`**, not $\beta$: `aniso_cap≈2` is robust at
+`relax≈0.1–0.2`; `≈4` is clean with a gentler `relax≈0.05` and more
+`n_outer`; `≳6` folds regardless (that would need the coupled/inverse
+Winslow — out of scope). A scale-aware floor `g_eps` makes a
+(near-)uniform $\rho$ an *exact* identity (it rejects the
+$\sim10^{-18}$ round-off of the projected zero gradient, which would
+otherwise be percentile-normalised into a spurious metric).
+
+## 5. Metric construction from a field gradient
+
+For the common case "refine where field $f$ has steep gradients" the
+helper `uw.meshing.metric_density_from_gradient` builds the **relative
+target density**
+
+$$ \rho \;=\; 1+\texttt{amp}\cdot t,\qquad
+   t=\operatorname{clip}\!\Bigl(
+     \tfrac{|\nabla f|-g_{\mathrm{lo}}}{g_{\mathrm{hi}}-g_{\mathrm{lo}}},
+     0,1\Bigr), $$
+
+with $g_{\mathrm{lo}},g_{\mathrm{hi}}$ the lo/hi percentiles of the
+projected $|\nabla f|$. This is the deliberate, intent-identical
+analogue of `uw.adaptivity.metric_from_gradient` (which maps the same
+normalised $|\nabla f|$ to an **absolute** target edge length
+$h\in[h_{\min},h_{\max}]$ for the MMG re-mesher). The distinction is the
+*node budget*:
+
+```{important}
+**`amp` is a no-op for the anisotropic mover (Strategy C); the
+effective metric-construction knobs are the percentile window and,
+in the mover, `aniso_cap`/$\beta$.** Strategy C builds $M$ from
+$|\nabla\rho|/g_{\mathrm{ref}}$ with $g_{\mathrm{ref}}=\max|\nabla\rho|$
+(§4.1). With $\rho=1+\texttt{amp}\cdot t$ both $|\nabla\rho|$ and
+$g_{\mathrm{ref}}$ scale linearly with `amp`, so it **cancels
+exactly** — $M$ is independent of `amp` (verified to machine
+precision; `amp`=16 vs 24 give bit-identical metrics). What does
+*not* cancel is the **percentile window**
+$(g_{\mathrm{lo}},g_{\mathrm{hi}})$: it reshapes $t$ (which gradient
+quantile is clipped flat vs in the linear ramp), hence $\nabla t$,
+hence $M$ — so it chooses *which* gradient strength triggers
+refinement. For the mover the user-facing model is now a **single
+knob, `resolution_ratio` $R$** (§4.1): it sets the resolution range
+($h_0/R$ … $h_0R$) while the equidistribution normalisation supplies
+the complementary coarsening *automatically* — there is no separate
+budget/percentile/coarsen lever to balance. The percentile window
+still shapes *where* the gradient counts as a front (a property of
+`metric_density_from_gradient`, upstream of the mover), but the
+refine⇄coarsen split is no longer a parameter. `amp` is inert here
+but *is* a real bunching intensity for the **isotropic** spring /
+Monge–Ampère methods (where the absolute $\rho$ magnitude enters
+$A^0\propto1/\rho$ and $g\propto1/\rho$) — it is method-dependent.
+```
+
+| | `mesh.adapt` (MMG) | `smooth_mesh_interior` (this) |
+|---|---|---|
+| mechanism | re-mesh: insert/remove nodes | redistribute existing nodes |
+| metric target | **absolute** $h$ | **relative** density $\rho$ |
+| variables | reset (must transfer) | preserved (topology fixed) |
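
The density formula of this section, and the `amp` cancellation stated in the note above, can be checked on a 1-D toy. This sketch does not call `uw.meshing.metric_density_from_gradient`; the two functions are hypothetical stand-ins, the percentiles $g_{\mathrm{lo}},g_{\mathrm{hi}}$ are passed in as precomputed numbers, and the gradient of $\rho$ is taken by one-sided differences rather than by projection.

```python
def density_from_gradient(grad_mag, g_lo, g_hi, amp):
    """rho = 1 + amp * clip((|grad f| - g_lo) / (g_hi - g_lo), 0, 1)."""
    return [1.0 + amp * min(max((g - g_lo) / (g_hi - g_lo), 0.0), 1.0)
            for g in grad_mag]

def normalised_gradient(rho, dx):
    """|d rho/dx| / max|d rho/dx| on a uniform 1-D grid (one-sided differences)."""
    g = [abs(rho[i + 1] - rho[i]) / dx for i in range(len(rho) - 1)]
    g_ref = max(g)
    return [v / g_ref for v in g]
```

Evaluating `normalised_gradient` for `amp=16` and `amp=24` on the same samples gives the same values to round-off, which is the ratio $|\nabla\rho|/g_{\mathrm{ref}}$ that enters the anisotropic term of $M$.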
+
+```{important}
+**The metric is $1/h^{2}$ per principal direction in 2-D *and* 3-D —
+not $1/h^{d}$.** A Riemannian metric measures *edge length* (a 1-D
+quantity: $\mathbf e^{\mathsf T}M\mathbf e=1\Rightarrow$ eigenvalue
+$1/h^2$), independent of embedding dimension. Dimension enters only the
+element-count integral $\int\!\sqrt{\det M}\,$ via
+$\det M=\prod_i 1/h_i^2$. So the 3-D extension keeps the same
+construction; no exponent change.
+```
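
A short numerical check of the $1/h^2$ statement, using hypothetical helper names: an edge of length $h_i$ along principal direction $i$ has unit metric length in both 2-D and 3-D, and dimension only shows up in $\sqrt{\det M}$.

```python
import math

def metric_edge_length(M, e):
    """Length of edge vector e measured in metric M: sqrt(e^T M e)."""
    n = len(e)
    return math.sqrt(sum(e[i] * M[i][j] * e[j]
                         for i in range(n) for j in range(n)))

def diagonal_metric(h):
    """Metric with eigenvalue 1/h_i^2 along each coordinate axis."""
    n = len(h)
    return [[1.0 / h[i] ** 2 if i == j else 0.0 for j in range(n)]
            for i in range(n)]
```

For target sizes `(0.1, 0.2, 0.4)` each axis-aligned edge of the matching length measures 1, while $\sqrt{\det M}=1/(h_1h_2h_3)=125$ is the element-count density.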
+
+(sec-dynamic)=
+## 6. Dynamic adaptation — fields under mesh motion
+
+In a time-dependent problem the mesh is adapted every few steps. The
+mover only moves coordinates: `_deform_mesh` rewrites the coordinate
+vector and invalidates caches but **never touches MeshVariable data**,
+so fields are **Lagrangian-carried** (DOF values unchanged, their
+support relocated). Used uncorrected, this injects the pure mesh
+displacement as a spurious advection. Two corrections:
+
+* **Interpolation / remap (general, solver-agnostic).** Because this
+  adapter is *topology-preserving* (vector size, DOF order and the
+  parallel partition are invariant) and the boundary is pinned (domain
+  unchanged, every new node in-domain), the old field can be evaluated
+  at the new node positions by the **local FE evaluate** at maximum
+  fidelity (true P3 basis, no cross-rank migration) — distinct from,
+  and much sharper than, the kd-tree `read_timestep` path used for the
+  decomposition-changing `mesh.adapt`. This is the relative-budget
+  analogue of the standard re-mesh-then-transfer workflow.
+* **ALE (specialised).** Keep the Lagrangian carry but drive the
+  advection–diffusion solver with the *relative* velocity
+  $V_{\mathrm{fn}}=\mathbf v-\mathbf v_{\mathrm{mesh}}$,
+  $\mathbf v_{\mathrm{mesh}}=\Delta\mathbf x/\Delta t$. Requires the
+  adapter to be *bound* to that solver (a coupling the remap does not
+  need); avoids interpolation diffusion.
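
The ALE correction reduces to a per-node subtraction. The sketch below is illustrative only (plain tuples, hypothetical function name); it forms $\mathbf v_{\mathrm{mesh}}=\Delta\mathbf x/\Delta t$ and returns the relative velocity handed to the advection term.

```python
def relative_velocity(v, x_old, x_new, dt):
    """Return v - v_mesh at one node, with v_mesh = (x_new - x_old) / dt."""
    v_mesh = [(xn - xo) / dt for xo, xn in zip(x_old, x_new)]
    return [vi - vm for vi, vm in zip(v, v_mesh)]
```

A node that the mover happens to carry exactly with the fluid sees zero relative velocity, so the Lagrangian carry introduces no spurious advection there.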
+
+```{important}
+**Pristine re-mesh for the dynamic loop.** Re-adapting the
+*already-graded* mesh each event compounds compression (mesh quality
+collapses over ~tens of events). The fix — the across-events analogue
+of holding the metric Lagrangian within a call — is to re-derive the
+adapted mesh from the **original** node positions at every adaptation:
+each event is a single, fresh *uniform → graded* map, so compression is
+bounded to single-adaptation quality indefinitely. With this, the
+strong-metric setting stays healthy over a full saturated run.
+```
+
+```{figure} media/adapt_convection_a16x.png
+Applied example: Ra=10⁵ annulus thermal convection, res-16 base mesh,
+pristine re-mesh every 5 steps with a strong $|\nabla T|$ metric. The
+mesh grades into the inner thermal boundary layer and the plume
+conduits while the cells stay well-shaped. Animate with
+`scripts/aniso_movie.py`.
+```
+
+## 7. Validation summary
+
+Validated with anisotropy-aware diagnostics (radial/tangential
+edge-length split and `minA/meanA`, *not* the anisotropy-blind d/n)
+against the exact 1-D OT reference.
+
+* The anisotropic mover (C) is the **cleanest** method everywhere
+  (`minA/meanA` 2.6–12× better than the isotropic Monge–Ampère, never
+  slivers), linear and cheap.
+* It **does not beat the fixed-node-count cap** (§1); for *separable*
+  features the exact 1-D OT (§2.4) is strictly better and cheaper. It
+  earns its keep on **non-separable** features and on cell
+  alignment/quality.
+* In a dynamic Ra=10⁵ adaptive run the adapted res-16 mesh reproduces
+  the res-24 reference's heat transport (Nu within ≈1 %) and kinetic
+  energy ($v_{\mathrm{rms}}$ within ≈3 %) at ≈0.69× the reference
+  wall-time; the adaptation overhead itself is a small fraction of one
+  Stokes solve.
+* **Single-knob equidistribution (`resolution_ratio`), fully
+  validated.** Gating/clamp probe: $R\!=\!1$ has *no* sub-`base`
+  eigenvalue and ratio $=$ `aniso_cap` exactly (the refine-only
+  no-op, confirmed empirically as well as by construction);
+  $R\!=\!2$ respects the clamp $[\,b/4,4b\,]$ with **0 violations**,
+  eigenvalues span `base` both ways, $\approx$62 % of nodes
+  coarsened to fund the fronts (parameter-free complementary
+  de-resolution). A fresh→saturation Ra=10⁵ run at $R\!=\!2$
+  (`a16e`, 326 steps, 65+ adaptation events) holds
+  `minA/meanA`$\approx$0.20 with **zero folds** through the full
+  overshoot→settle cycle and settles at $\mathrm{Nu}\approx3.87$ —
+  *quantitatively matching the hand-tuned legacy two-knob point*
+  (`coarsen_cap=2`: `minA/meanA`$\approx$0.20, $\mathrm{Nu}\approx
+  3.89$) **with one parameter-free knob**. The geometric-mean
+  centring self-lands in the sliver-safe regime; the user sets only
+  the resolution range, never the refine/coarsen split.
+
+```{figure} media/adapt_nonseparable_validation.png
+Non-separable Gaussian blob: target metric (top-right), the realised
+anisotropic-mover mesh, and zooms vs the isotropic Monge–Ampère /
+spring. The tensor mover produces clean, blob-aligned cells where the
+scalar methods pull a degenerate slivered knot.
+```
+
+## 8. The Nusselt diagnostic
+
+Heat transfer is reported as the **measured total radial heat flux
+relative to the conductive flux**. The total radial flux density is
+$q_r=v_r\,T-\partial_r T$ (advective + diffusive). It is projected to a
+nodal field and integrated over an **interior shell**:
+
+$$ Q(r)=\oint_{|\mathbf x|=r}\!\!\bigl(v_r T-\partial_r T\bigr)\,r\,
+   \mathrm{d}\theta,\qquad
+   \mathrm{Nu}=\frac{Q\bigl(\tfrac{R_i+R_o}{2}\bigr)}{Q_{\mathrm{cond}}}. $$
+
+At steady state $Q(r)$ is shell-independent (conservation), and an
+interior shell is **immune to thermal-boundary-layer resolution** —
+unlike a near-wall $\partial T/\partial r$ stencil, which under-resolves
+a sub-element boundary layer and reports ≈2–3× too low. The conductive
+normalisation uses the **true annular conduction** solution of
+$\nabla^2T=0$ with $T(R_i)=1,\,T(R_o)=0$, which is *logarithmic*
+$T_{\mathrm{cond}}=\ln(r/R_o)/\ln(R_i/R_o)$ (not the linear slab
+profile used for the Boussinesq buoyancy reference), giving the total
+conductive flow
+
+$$ Q_{\mathrm{cond}} \;=\; \frac{2\pi}{\ln\!\left(R_o/R_i\right)} . $$
+
+By construction $\mathrm{Nu}=1$ at pure conduction (verified to
+$1.0000$ on the analytic profile, shell-independent). Note the
+*local* conductive flux density is $1/(r\ln(R_o/R_i))\approx1.4$–$2.9$
+(Cartesian-like ≈2), while $Q_{\mathrm{cond}}\approx9.06$ is the
+*total* power through the circumference; the Nu ratio is invariant to
+this choice.
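
The normalisation can be verified independently of any mesh. The sketch below uses hypothetical function names, not the diagnostic in the harness: it integrates the analytic conductive flux density over an interior shell with a midpoint rule and divides by $Q_{\mathrm{cond}}$.

```python
import math

def q_cond(r_i, r_o):
    """Total conductive heat flow through the annulus: 2*pi / ln(r_o/r_i)."""
    return 2.0 * math.pi / math.log(r_o / r_i)

def shell_flux(q_r, r, n_theta=360):
    """Q(r): closed integral of q_r(r, theta) * r dtheta (midpoint rule)."""
    dtheta = 2.0 * math.pi / n_theta
    return sum(q_r(r, (k + 0.5) * dtheta) for k in range(n_theta)) * r * dtheta

def conductive_flux_density(r_i, r_o):
    """q_r = -dT/dr for T = ln(r/r_o)/ln(r_i/r_o), with v_r = 0."""
    return lambda r, theta: 1.0 / (r * math.log(r_o / r_i))
```

For $R_i=0.5$, $R_o=1$ this gives $Q_{\mathrm{cond}}=2\pi/\ln 2\approx9.06$, and the ratio evaluated on any interior shell is 1 to round-off.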
+
+## References & cross-links
+
+* {doc}`/developer/subsystems/mesh-metric-redistribution` — operational
+  guide (parameters, when to use which).
+* {doc}`/advanced/mesh-adaptation` — user-facing, alongside
+  `mesh.adapt`.
+* `docs/developer/design/ma-newton-cofactor-exploration.md` — the dated
+  R&D log (Newton/cofactor, GAMG, the validation arc).
+* Implementation: `src/underworld3/meshing/smoothing.py`
+  (`_winslow_elliptic`, `_winslow_spring`, `_winslow_anisotropic`,
+  `metric_density_from_gradient`). Reproduce the figures with
+  `scripts/ma_metric_tensor_viz.py`, `scripts/aniso_validate_*.py`,
+  `scripts/adaptive_saturation*.py`, `scripts/aniso_movie.py`.
diff --git a/docs/developer/design/ot-adapt-api-proposal.md b/docs/developer/design/ot-adapt-api-proposal.md
new file mode 100644
index 00000000..216cd2d7
--- /dev/null
+++ b/docs/developer/design/ot-adapt-api-proposal.md
@@ -0,0 +1,272 @@
+---
+title: "Mesh.OT_adapt() — public API proposal"
+date: 2026-05-24
+status: proposal
+---
+
+# `mesh.OT_adapt()` — public API proposal
+
+## Background
+
+The validated production pattern for time-stepping convection with
+metric-driven mesh adaptation (per the 2026-05-23/24 investigation,
+see `project_ot_reset_validated.md`) is, internally:
+
+1. Reset the mesh to its IC uniform coords
+2. FE-remap the driving field T onto the uniform-mesh DOFs
+3. Build the gradient-based metric ρ
+4. Run the OT mover from the uniform canvas
+5. FE-remap simulation fields onto the OT-adapted positions
+
+But the "reset" is an **implementation detail** of what makes the
+OT mover behave well across many adapt cycles. From the user's
+point of view this is just "adapt the mesh to track ∇T". The API
+should reflect that.
+
+## API surface
+
+A single method on the mesh:
+
+```python
+moved = mesh.OT_adapt(
+    field,
+    *,
+    refinement=3.0,
+    coarsening="auto",
+    grad_smoothing_length="auto",
+    metric_choice="front-following",
+    fields_to_remap=None,
+    fields_to_zero=None,
+    skip_threshold=None,
+    verbose=False,
+)
+```
+
+**Required:**
+
+- `field` — the scalar MeshVariable whose gradient drives
+  refinement (typically `T`)
+
+**Metric knobs** (production defaults validated):
+
+- `refinement` — cell-size envelope (h0/R). Primary feature knob;
+  validated range 1.5–5, 3 ≈ Nu sweet spot
+- `coarsening` — `"auto"` (= refinement^(1/d)) or numeric
+- `grad_smoothing_length` — screened-Poisson de-noising length L for
+  ∣∇field∣ (the most effective sliver lever). `"auto"` (default) ≈ the
+  mesh's uniform cell size, which is needed to keep R≈3 sliver-free;
+  `None` turns smoothing off; ≈ 2·h0 smooths more strongly.
+  User-supplied lengths are **unit-aware** (Pint /
+  non-dimensionalised via the projection)
+- `metric_choice` — `"front-following"` or `"gradient-uniform"`
+
+**Field handling:**
+
+- `fields_to_remap` — list of MeshVariables to FE-remap onto the
+  adapted positions (preserve as continuous fields). `field`
+  itself is always remapped implicitly
+- `fields_to_zero` — list of MeshVariables to zero post-adapt
+  (e.g. velocity, pressure on a cold restart)
+
+**Behaviour:**
+
+- `skip_threshold` — if mesh is already aligned within this
+  misalignment, skip the whole step and return False
+- Returns `True` if the mesh moved, `False` otherwise
+
+## Reference coordinates — function-managed cache
+
+The "reset target" coords are cached lazily on first call:
+
+```python
+import numpy as np
+
+def OT_adapt(self, field, **kwargs):
+    if not hasattr(self, "_ot_adapt_reference_coords"):
+        # First call: snapshot the current mesh state as
+        # the reset target for all future OT_adapt calls
+        self._ot_adapt_reference_coords = np.asarray(
+            self.X.coords).copy()
+    ...
+```
+
+This keeps mesh classes (`Annulus`, `Box`, …) unchanged — no
+modifications to their `__init__`. The cache lives on the mesh
+instance and survives across calls.
+
+**Staleness caveat.** If anyone (not OT_adapt itself) deforms the
+mesh between calls — e.g. a user manually calling
+`mesh._deform_mesh(some_X)` for unrelated purposes — the cached
+"pristine" coords will diverge from whatever the user thinks of
+as the reference state. The cached value is still well-defined
+(it's whatever the mesh was on the *first* OT_adapt call), but
+may no longer match the user's intent.
+
+Mitigations:
+
+- Document the caveat in the docstring (above the example)
+- Provide a `mesh.OT_adapt_reset_reference(coords=None)` method
+  to invalidate / override the cache (None → re-cache from
+  current state; explicit coords → use those as the new
+  reference). Lets the user opt-in to a deliberate re-baseline
+- Optional `reference_coords` kwarg on `OT_adapt` itself for
+  one-off override (doesn't update the cache)
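
The cache and the proposed reset method can be prototyped without a real mesh. The class below is a hypothetical stand-in that holds only a coordinate list; `OT_adapt_reset_reference` follows the semantics proposed in the list above (`None` re-caches from the current state, explicit coordinates replace the reference), and `_reference` is an illustrative name for the lazy lookup.

```python
import copy

class MeshStandIn:
    """Hypothetical stand-in for a UW3 mesh: stores node coordinates only."""

    def __init__(self, coords):
        self.coords = [list(p) for p in coords]

    def _reference(self):
        # Lazy snapshot on first use, as in the OT_adapt sketch above.
        if not hasattr(self, "_ot_adapt_reference_coords"):
            self._ot_adapt_reference_coords = copy.deepcopy(self.coords)
        return self._ot_adapt_reference_coords

    def OT_adapt_reset_reference(self, coords=None):
        # None: re-cache from the current state; otherwise use the given coords.
        source = self.coords if coords is None else coords
        self._ot_adapt_reference_coords = [list(p) for p in source]
```

The snapshot is a copy, so a later deformation of `coords` leaves the reference untouched until the user re-baselines explicitly, which is exactly the staleness behaviour the caveat describes.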
+
+## Boundary-slip as a DOF constraint (analogous to Stokes BCs)
+
+Same framing as Stokes `essential_bc` — at each boundary node,
+constrain certain DOFs of the **displacement** to zero, leave
+others free:
+
+| BC type | Constraint on displacement at boundary node |
+|---|---|
+| pinned | all components zero (no motion) |
+| slip | component along outward normal = 0 (only tangential motion) |
+| free | no constraint |
+
+For curved boundaries this is first-order: zeroing the normal
+component of an arbitrary displacement leaves the node *near*
+but not *exactly on* the boundary. A small snap-back projection
+restores the node to the surface after each step (the same
+"snap to fixed \|r\|" that the current ring code does, but now
+derived from the BC rather than hardcoded).
+
+**Use the existing `mesh.Gamma_N`** — UW3 already exposes a
+`sympy.Matrix` row of the normalised outward boundary normal as
+`mesh.Gamma_N` (in `discretisation_mesh.py:2213`). Every mesh
+class already defines it. No new method required.
+
+`OT_adapt` evaluates `mesh.Gamma_N` at the boundary node
+coordinates to get the per-node normal, then zeros the normal
+component of the OT displacement at those nodes — same pattern
+Stokes uses with `add_essential_bc`. No geometry-specific
+'box' / 'ring' strings, no per-class snap-back projection
+bespoke code.
+
+```python
+def _apply_slip_constraint(self, displacement, boundary_mask):
+    """Zero the normal component of displacement at boundary
+    nodes. Tangential motion left free."""
+    boundary_coords = self.X.coords[boundary_mask]
+    n_hat = np.asarray(uw.function.evaluate(
+        self.Gamma_N, boundary_coords)).reshape(-1, self.cdim)
+    disp_bnd = displacement[boundary_mask]
+    disp_normal = (disp_bnd * n_hat).sum(axis=1, keepdims=True)
+    displacement[boundary_mask] -= disp_normal * n_hat
+    return displacement
+```
+
+As noted above, the constraint is only first-order on curved
+boundaries (Annulus, SphericalShell), so the snap-back projection
+runs after each step (snap-to-fixed-\|r\| for radial cases, i.e. the
+current 'ring' code generalised). The snap-back can also be derived
+from the mesh's coordinate system (e.g. `mesh.CoordinateSystem`).
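
For a circular boundary the two steps (tangential projection, then snap-back) look like this. The sketch is a pure-Python illustration for a single node on $|\mathbf x|=\texttt{radius}$, with the normal taken as the radial direction; it is an assumption-laden toy, not the `Gamma_N`-based implementation, and the function name is hypothetical.

```python
import math

def slip_then_snap(point, displacement, radius):
    """Move one boundary node on the circle |x| = radius under a slip constraint."""
    r = math.hypot(point[0], point[1])
    n = (point[0] / r, point[1] / r)  # outward unit normal (radial)
    dn = displacement[0] * n[0] + displacement[1] * n[1]
    # Step 1: remove the normal component, keeping tangential motion only.
    tx = displacement[0] - dn * n[0]
    ty = displacement[1] - dn * n[1]
    moved = (point[0] + tx, point[1] + ty)  # lies slightly outside the circle
    # Step 2: snap back onto the circle by rescaling to the fixed radius.
    rm = math.hypot(moved[0], moved[1])
    return (moved[0] * radius / rm, moved[1] * radius / rm)
```

Starting from `(1, 0)` with displacement `(0.3, 0.2)`, the projected node sits at `(1, 0.2)`, about 2 % off the unit circle, and the snap-back restores $|\mathbf x|=1$ while keeping the tangential motion.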
+
+**Sphere2D edge case**: its normal is also `Gamma_N`
+(radial-outward everywhere); the difference is that the
+constraint applies to *every node*, not just boundary nodes. A
+`mesh.is_manifold` flag (or `mesh.constraint_mask` exposing
+"these nodes need the slip projection") tells OT_adapt to
+extend the projection. The API hook (`Gamma_N`) is the same;
+only the "which nodes get constrained" logic differs.
+
+Because every mesh class already defines `Gamma_N`, this treatment
+is uniform across geometries; no per-mesh `boundary_normal` method is
+needed.
+
+## What's *not* in the API
+
+- The legacy `incremental` OT path (slivers accumulate)
+- The post-OT spring polish (converged spring loses Nu)
+- The escalating-R chain (no gain over single R)
+- `metric_degree > 1` (broken on reset path — cached projection
+  goes stale)
+- `boundary_slip` mode string — automatic: uses
+  `mesh._boundary_tangent_project` if defined, else falls back
+  to pinned boundaries
+- The "reset" itself — caller doesn't see it; just calls
+  `mesh.OT_adapt(...)`
+
+## What still needs upstream work
+
+Two production-readiness gaps remain (per
+`project_ot_production_blockers.md`):
+
+1. **Sphere2D constrained-manifold OT** — the only true manifold
+   mesh in the table; OT mover needs to constrain *every* node
+   (not just boundary nodes) to the spherical surface. The
+   NotImplementedError hook is the API contract; the actual
+   implementation is research.
+2. **Parallel JIT determinism error** — blocks ANY parallel UW3
+   run, not specific to OT_adapt.
+
+## Caller code
+
+After the API lands, the harness's `_adapt_step` becomes:
+
+```python
+def _adapt_step():
+    return mesh.OT_adapt(
+        T,
+        refinement=args.refinement,
+        coarsening=args.coarsening,
+        grad_smoothing_length=args.grad_smooth_length,
+        metric_choice=args.metric_choice,
+        fields_to_remap=[T],
+        fields_to_zero=[V, P],
+        verbose=True,
+    )
+```
+
+A user wanting a one-shot adapt (no time loop) writes:
+
+```python
+mesh = uw.meshing.Annulus(radiusOuter=1.0, radiusInner=0.5,
+                          cellSize=1/16, qdegree=3)
+T = uw.discretisation.MeshVariable(...)
+# ... initialise T somehow ...
+mesh.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+```
+
+## Implementation location
+
+- Method lives on `Mesh` base class in
+  `src/underworld3/discretisation/discretisation_mesh.py`
+- Common implementation in
+  `src/underworld3/meshing/_ot_adapt.py` (new file), called
+  from the method
+- Per-mesh hooks implemented in each mesh class file in
+  `src/underworld3/meshing/`
+- The existing `_winslow_equidistribute`'s box/ring handling
+  becomes legacy; new code uses `mesh._boundary_tangent_project`
+
+## Open questions
+
+1. Should `fields_to_remap` default to `[field]` (i.e. just remap
+   the driving field if nothing else specified)?
+2. Should the post-adapt FE-remap zero out V,P automatically when
+   the mesh moves? (Probably not: the user knows their physics, and
+   an explicit `fields_to_zero` is cleaner.)
+3. Should there be a class-level constant on the mesh advertising
+   whether boundary slip is supported, so the caller can check
+   without try/except? E.g. `mesh.supports_boundary_slip`?
+4. Naming: `OT_adapt` (capitalised acronym, in the spirit of
+   PascalCase names such as `CoordinateSystem`) vs `ot_adapt`
+   (snake_case, matches most UW3 method conventions)? The UW3
+   codebase mixes both; what is the project preference?
+
+## Test plan
+
+- Unit test: `Annulus.OT_adapt(T)` on a fixed T moves mesh and
+  preserves T's spatial pattern within FE-remap tolerance
+- Regression test: harness using API matches hand-rolled current
+  version bit-for-bit
+- Negative test: `Sphere2D.OT_adapt(T)` raises
+  NotImplementedError with the expected message (the
+  constrained-manifold case)
+- Resume test: save + restart, call `OT_adapt` — the cache
+  initialises lazily from the *loaded* mesh's current coords
+  (which is the deformed state at the snapshot point). For
+  resume-from-snapshot scenarios the user should
+  `mesh.OT_adapt_reset_reference(coords=loaded_init_coords)`
+  with the explicit IC mesh, or otherwise document that resumed
+  runs use the snapshot's mesh as the "reference"
diff --git a/docs/developer/design/snes-atol-convergence-scale.md b/docs/developer/design/snes-atol-convergence-scale.md
new file mode 100644
index 00000000..9146d993
--- /dev/null
+++ b/docs/developer/design/snes-atol-convergence-scale.md
@@ -0,0 +1,216 @@
+---
+title: "SNES convergence: set snes_atol to the problem scale"
+---
+
+# SNES `snes_atol` — guess-independent convergence
+
+**Status:** design proposal, pending sign-off + benchmarking.
+**Scope:** UW3 SNES solver wrapper — internal to the `solve()`
+path in `cython/petsc_generic_snes_solvers.pyx` (which already
+branches on `zero_init_guess`). No user-facing API. Affects *every*
+UW3 SNES solve.
+**Origin:** the adaptive-mesh / Stokes warm-start divergence
+investigation (2026-05). This note is the root-cause writeup +
+proposed fix; the mesh-mover work was unrelated — it merely exposed
+this.
+
+## Summary
+
+UW3's `tolerance` setter configures `snes_rtol` but **never sets
+`snes_atol`**, leaving it at PETSc's default (`~1e-50`). PETSc's
+default convergence test then has only one viable criterion: a
+**relative tolerance referenced to the residual at the initial
+guess**. A warm-started solve whose initial residual is already
+small (re-solving a near-solved state — exactly what you *want* to be
+cheap) is handed an unreachably tight target and fails
+(`DIVERGED_LINE_SEARCH`), while the *same problem* cold-started
+converges. The fix is to also set `snes_atol` to the problem's
+natural residual scale, so convergence is judged **absolutely
+(guess-independent)** — including the desirable "re-solve the
+solution ⇒ zero iterations" behaviour.
+
+## Evidence (PETSc 3.25 source)
+
+`SNESConvergedDefault` (`src/snes/interface/snesut.c`):
+
+```c
+if (!it) {                              /* iteration 0 = initial guess */
+    snes->ttol   = fnorm * snes->rtol;  /* rtol target ∝ ‖F(x0)‖ */
+    snes->rnorm0 = fnorm;
+}
+...
+} else if (fnorm < snes->abstol && (it || !snes->forceiteration)) {
+    *reason = SNES_CONVERGED_FNORM_ABS;  /* absolute — guess-independent */
+} ...
+if (it && !*reason) {
+    if (fnorm <= snes->ttol) *reason = SNES_CONVERGED_FNORM_RELATIVE;
+    else if (snorm < snes->stol * xnorm)
+        *reason = SNES_CONVERGED_SNORM_RELATIVE;   /* it>=1 only */
+}
+```
+
+Key facts, verified in-tree:
+
+1. `rtol` is **defined** relative to the initial-guess residual
+   (`ttol = rtol·‖F(x0)‖`, set once at `it==0`). There is **no
+   option** to reference it to the problem/RHS scale. PETSc has not
+   changed this.
+2. The **absolute** path (`fnorm < snes_atol`) is gated by
+   `(it || !snes->forceiteration)`, so it is evaluated **even at
+   `it==0`**. With `snes_atol` set to the problem scale and
+   `snes_force_iteration` off (UW3's default), re-solving an
+   already-solved state converges at iteration 0 with **zero Newton
+   steps** — the intended behaviour.
+3. The step-norm path (`snorm < stol·xnorm`) is gated by `it && ...`
+   — it cannot deliver zero-iteration convergence and is pre-empted
+   when the line search aborts at the 0→1 transition.
+
+UW3 (`petsc_generic_snes_solvers.pyx`, `tolerance` setter):
+
+```python
+self.petsc_options["snes_rtol"] = self._tolerance       # set
+self.petsc_options["ksp_rtol"]  = self._tolerance * 1e-1 # set
+self.petsc_options["ksp_atol"]  = self._tolerance * 1e-6 # set
+#  snes_atol : NEVER set  → PETSc default ~1e-50 → absolute path dead
+```
+
+So convergence is decided **solely** by `fnorm ≤ rtol·‖F(x0)‖`.
+
+## Failure mechanism
+
+For a warm-started solve where the carried-forward guess is close to
+the solution, `‖F(x0)‖` is small ⇒ `ttol = rtol·‖F(x0)‖` is a tiny
+absolute number, often below what the (relative-tolerance) inner KSP
+delivers for the Newton correction. The line search cannot achieve
+sufficient decrease toward an unreachable target ⇒
+`DIVERGED_LINE_SEARCH`. Cold-start (`x0 = 0`) gives
+`‖F(x0)‖ ≈ ‖RHS‖` (large) ⇒ a sane `ttol` ⇒ converges. This is
+guess-relative, not problem-relative — and it means *improving the
+guess makes convergence harder*, the opposite of what a solver should
+do.
+
+Observed across the adaptive-convection runs: warm Stokes diverged
+repeatedly through violent transients (every step, until the field
+calmed), each instance recovering cleanly from a cold restart;
+`ksponly`/`basic` line-search "worked" only by bypassing the test;
+improving the warm guess (V,P remap) did **not** help — all exactly
+as the mechanism predicts.
+
+## Proposed fix
+
+**`snes_atol` is internal to the solver and never user-facing.**
+There is no new API, no `tolerance_abs` knob — exposing it would
+repeat the mistake this whole investigation argued against (robust
+defaults, not fragile expert knobs). The solver derives and applies
+it automatically, **per solve, conditioned on `zero_init_guess`**:
+
+```
+if not zero_init_guess:                 # WARM start
+    F0    = ‖F(x=0)‖   for the CURRENT operator/RHS   # problem scale
+    saved = snes_atol
+    snes_atol = snes_rtol * F0          # temporary, guess-independent
+    <SNES solve>                        # → SNES_CONVERGED_FNORM_ABS
+    snes_atol = saved                   # restore
+else:                                   # COLD start
+    <SNES solve>                        # rtol·‖F(x0=0)‖ already = scale
+```
+
+* **Warm solve:** the guess-relative `ttol = rtol·‖F(x_warm)‖` is
+  unreachable; the solver instead **computes the problem-scale
+  target residual and temporarily sets `snes_atol` to it for that
+  solve only**, then restores. Convergence then takes the absolute,
+  guess-independent path (`SNES_CONVERGED_FNORM_ABS`, evaluated even
+  at `it==0`), so re-solving an already-solved state converges in
+  **zero Newton iterations** — the intended behaviour.
+* **Cold solve:** untouched. `‖F(x0=0)‖` *is* the problem scale, so
+  the existing `rtol` path already targets the right residual; the
+  cold solve is also the natural place to (re)source the scale.
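+
+A minimal sketch of the set, solve, restore scoping described above,
+assuming a petsc4py-style object that exposes `getTolerances()` and
+`setTolerances(atol=...)`. The helper `scoped_snes_atol` and the
+`FakeSNES` stand-in are hypothetical names used only for
+illustration; the production change would live inside the solver's
+`solve()` path.
+
+```python
+from contextlib import contextmanager
+
+
+@contextmanager
+def scoped_snes_atol(snes, atol):
+    """Apply `atol` for one solve, then restore the previous value."""
+    _, saved_atol, _, _ = snes.getTolerances()   # (rtol, atol, stol, max_it)
+    snes.setTolerances(atol=atol)
+    try:
+        yield snes
+    finally:
+        # Restore even if the solve raises, so nothing leaks across solves.
+        snes.setTolerances(atol=saved_atol)
+
+
+class FakeSNES:
+    """Hypothetical stand-in; only `atol` is writable in this sketch."""
+
+    def __init__(self):
+        self.rtol, self.atol, self.stol, self.max_it = 1e-5, 1e-50, 1e-8, 50
+
+    def getTolerances(self):
+        return self.rtol, self.atol, self.stol, self.max_it
+
+    def setTolerances(self, rtol=None, atol=None, stol=None, max_it=None):
+        if atol is not None:
+            self.atol = atol
+
+
+snes = FakeSNES()
+F0 = 24.75                                   # ||F(x=0)|| for the current RHS
+with scoped_snes_atol(snes, snes.rtol * F0):
+    inside = snes.atol                       # problem-scale target during solve
+print(inside, snes.atol)                     # scaled value, then restored default
+```
+
+Using `try`/`finally` is one way to guarantee the restore step even
+when the solve fails, which is what keeps the mutation scoped to a
+single solve.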
+
+**Scale currency (design decision).** The target must be the
+**current** problem scale, *recomputed each warm solve* — one extra
+function evaluation at `x=0`, negligible against the solve — **not**
+a frozen startup `‖F₀‖`. The RHS scale (e.g. `‖buoyancy‖`) varies
+substantially through a transient; a frozen scale would be stale
+exactly where warm-start divergence bites. `F(0)` remains a valid
+scale for nonlinear rheology, so this is a convergence-*criterion*
+fix independent of linearity. (The `--stokes-snes-atol-auto`
+confirmation harness uses a *frozen* startup scale — a valid proof
+of the mechanism, but a simplification; production recomputes.)
+
+## Impact & risk
+
+This changes the convergence criterion for **every UW3 SNES solve**
+(Stokes, scalar Poisson, projections, advection–diffusion; the
+mesh-mover's `ksponly` sub-solves are unaffected — no Newton test).
+Per the repository rule *solver stability is paramount — no changes
+without benchmarking*:
+
+* **Cold-started** solves: behaviour ≈ unchanged
+  (`atol` not applied; `rtol·‖F0_cold‖` already the accuracy floor).
+* **Warm-started** solves: spurious divergence *fixed*;
+  "re-solve the solution ⇒ 0 iterations" now works; accuracy is the
+  same `rtol·‖F(0)‖` a working cold solve targets — no under-solving.
+* The `snes_atol` mutation is **scoped to one solve and restored**,
+  so it cannot leak across solvers/steps or interact with a user's
+  own `petsc_options`.
+* Benchmark the standard suite (Stokes/Poisson convergence-order,
+  the `tier_a` set) before merge — it must show unchanged accuracy
+  and order, only removed spurious warm divergences.
+
+Recommended landing: internal to the `solve()` path in
+`petsc_generic_snes_solvers.pyx` (which already branches on
+`zero_init_guess`); no API surface; benchmark suite green; one line
+in the solver guide noting the automatic behaviour.
+
+## Validation
+
+* Root cause verified against PETSc 3.25 source (above) and the UW3
+  `tolerance` setter.
+* Confirmation experiment (`scripts/adaptive_saturation.py
+  --model a16r15a --stokes-snes-atol-auto`, equidist R=1.5, warm,
+  V,P-remap on, default `newtonls`+`bt`, **no cold-recover**;
+  `‖F0‖=24.75 ⇒ snes_atol=2.47e-4`): **full settled run** —
+  warm `STOKES DIVERGED` **24 → 31** (i.e. *no net benefit*, if
+  anything slightly worse, vs the identical run without the
+  absolute criterion). *(An earlier step-70 partial read showed a
+  spurious 24→9 — corrected here: it was a mid-trajectory snapshot
+  before the later transient windows, not the result.)*
+
+  This is *consistent with* the mechanism, and clarifies the scope:
+
+  * The absolute path (`SNES_CONVERGED_FNORM_ABS` at `it==0`) only
+    fires when `‖F(x_warm)‖ < snes_atol`. In a **violent-transient
+    -dominated** run the warm-guess residual is almost always
+    ≫ `atol` (the field changed substantially per step), so the
+    absolute path essentially never triggers; SNES proceeds to the
+    line search, which aborts on the inexact inner Newton step
+    *before any convergence test is consulted*. The
+    near-converged-guess class this fix targets is **nearly absent**
+    in this benchmark, so `snes_atol` provides no net benefit here
+    and merely perturbs which steps fail (net +7).
+  * Where it *does* help — and the reason it should still land — is
+    the regime it is *for*: steady-state continuation, restarts,
+    lightly-evolving problems, any re-solve of a near-solved state.
+    There `‖F(x_warm)‖ < atol` genuinely holds and SNES converges
+    in **zero Newton iterations** instead of failing on an
+    unreachable guess-relative `ttol`. That is a real, general UW3
+    gap (PETSc-source-verified), independent of this benchmark.
+    This experiment does **not** exhibit that regime, so it neither
+    confirms nor refutes the fix's value there — it only shows the
+    fix does not help violent transients (as the mechanism
+    predicts).
+
+**Conclusion:** `snes_atol` is a correct, general improvement for
+the near-converged-guess regime (justified by the PETSc-source
+diagnosis, *not* demonstrated by this transient-dominated run —
+which shows no benefit, as expected). It is **not** the cure for
+warm-start through a violent transient. That cure is a separate,
+*demonstrated* result: an accurate inner Newton solve (`a16r15d`,
+MUMPS-LU inner solve, warm, default `bt`, no recover/atol →
+**24 → 0** warm `STOKES DIVERGED`) — the inner KSP must deliver an
+acceptable step on the graded / stiff-Robin operator, generalised
+as a tight inner tolerance / strong PC / direct where affordable
+(not "always direct"). Cold-restart-on-divergence is the
+operational safety net. The pieces are independent and
+complementary; this note covers only the `snes_atol` piece — see
+the inner-solve result for the transient cure.
diff --git a/docs/developer/design/solver-strategies-catalogue.md b/docs/developer/design/solver-strategies-catalogue.md
new file mode 100644
index 00000000..3558b36d
--- /dev/null
+++ b/docs/developer/design/solver-strategies-catalogue.md
@@ -0,0 +1,382 @@
+---
+title: "Solver strategies catalogue"
+---
+
+# Solver strategies — switches, dials, and when to reach for them
+
+**Scope:** the index + picking guide for solver knowledge across
+**all UW3 PDE families** — Stokes (linear, variable viscosity,
+nonlinear / strain-rate-dependent, yield / viscoplastic), Darcy,
+Poisson, Navier–Stokes — plus time-integration *order* for
+visco-elastic (VE) and visco-elasto-plastic (VEP) problems,
+boundary-treatment / pressure-space / parallel-correctness
+choices, and the diagnostic tooling that supports them.
+
+**Status:** working notes / catalogue, ahead of full documentation.
+**Consult and contribute as standard PDE work** (see
+`memory/feedback_solver_strategies_catalogue.md`): start here when
+hitting a solver wall, and add findings back when settled. Each
+entry: what it does, the mechanism, the evidence, when to reach
+for it, and the caveats. The catalogue is the *aggregation point*
+— individual deep findings live in sibling design notes in this
+directory, linked from here.
+
+**Current body of content (2026-05):** the adaptive-mesh + Stokes
+warm-start investigation populated the catalogue with its first
+batch of entries (V,P remap, `snes_atol`, cold-restart, SNES
+line-search variants, GAMG anisotropy tuning, direct inner solve,
+`mesh.quality()`, the error-estimator and geometric-MG design
+arcs). **Intended growth:** entries for the other PDE families
+(Darcy, Poisson, Navier–Stokes), time-order guidance for VE/VEP
+(consolidating the existing project-memory findings on BDF
+order, yield-coupling, two-Stokes split, dt-yield interactions),
+variable-viscosity / viscosity-contrast pressure-space choices,
+and viscoplastic flow strategies. Extend as those threads land
+or as referenced project memories are touched.
+
+The investigation's mental model: solver fragility on an adaptive
+problem has several *independent* failure classes, each with its
+own appropriate cure. Reaching for the wrong cure for a given
+failure can give the right answer for the wrong reason and mask
+the real cause — so it's worth being explicit about which cure
+addresses which class.
+
+## Failure classes — quick reference
+
+| symptom | underlying class | indicated cure |
+|---|---|---|
+| Re-solving a near-solved state fails (`DIVERGED_LINE_SEARCH` from a tiny initial residual) | guess-relative-only convergence (`snes_atol` unset) | **snes_atol** absolute path |
+| Warm-start from a stale guess on a just-moved mesh fails | V,P not remapped across the mesh move | **V,P remap** (mirror T) |
+| Warm-start fails through a violent transient *despite* a fresh, correct previous solution | inner KSP gives an inexact Newton step that `bt` line search rejects on an anisotropic operator | **accurate inner solve** (best PC for the operator), or **bypass the line search** (l2 / direct) |
+| Failures recur in same-mesh bursts after a single failure | corrupted V,P propagates as next warm start | **cold-restart fallback** |
+| Adaptation degrades element regularity → AMG aggregation degrades | mesh-quality side of the coupled mesh⇄solver problem | **mesh.quality()** monitoring + a less-aggressive grading dial (the equidist `resolution_ratio` is the user-facing one; legacy `coarsen_cap` / `aniso_cap` are demoted overrides) |
+| Refinement bunches even where it isn't needed; can't say "add nodes" | percentile metric is relative-not-absolute | **error-estimator-driven metric** (design arc) |
+
+## Diagnostics (harness)
+
+The harness (`scripts/adaptive_saturation.py`) carries the flags
+the investigation accumulated; they belong as durable diagnostic
+tooling, not just one-off probes:
+
+* `--snes-debug` — after each adv/Stokes solve, query
+  `snes.getConvergedReason()` + `getIterationNumber()` and tag
+  which physics solver diverged + reason code + iter count.
+  Replaces the solver-anonymous PETSc retry message. **Does not**
+  set global PETSc viewers (they leak into the mover's `ksponly`
+  sub-solves and spam phantom `DIVERGED_MAX_IT iterations 0`).
+* `--resume-from N` + `--src-tag SRC` — restart from a specific
+  checkpoint of another model, write outputs under the current
+  `--model` tag. Enables the *clean-restart probe* pattern: a
+  reproducible failure window from a known state without re-running
+  the entire trajectory.
+* `--stokes-cold-recover N` — see "cold-restart fallback" below.
+* `--no-vp-remap` — A/B disable the V,P remap; see "V,P remap".
+* `--stokes-snes-opt {default,basic,l2,tr,ksponly,direct,
+  gamg-n1,gamg-thr,gamg-noagr,gamg-sor,gamg-full,...}` — selects a
+  preset bundle of PETSc options on the Stokes solver; see the
+  SNES-line-search and GAMG sections.
+* `--stokes-snes-atol-auto` — captures cold ‖F₀‖ and sets a fixed
+  `snes_atol`; see "snes_atol" / `snes-atol-convergence-scale.md`.
+
+The **`mesh.quality()` API** + the `view()` summary line is the
+mesh-side diagnostic — shape quality `q = 4√3·A/Σℓ²` (min /
+percentiles), max interior angle, aspect ratio, neighbour
+size-jump, and the joint "large-AND-stretched" cell count. The
+relevant *tail* metrics for FE conditioning are what `minA/meanA`
+hides.
+
+## V,P remap on mesh move
+
+**Class:** correctness, esp. for *nonlinear* solves where Newton
+has a small convergence basin.
+
+**What:** when the mesh moves, evaluate the previous V,P at the
+new DOF coords (the same FE-evaluate-at-new-coords that T already
+gets) and write the results onto the new mesh — *do not* leave the
+old nodal values on moved nodes.
+
+**Evidence:** without it, warm-start at adapt steps takes a
+spatially-scrambled guess → `DIVERGED_LINE_SEARCH` on adapt steps
+specifically. a16r15v cleared all adapt-step failures by remapping,
+but the total was unchanged (24 → 24): the remaining failures are
+concentrated on non-adapt steps.
+
+**Status:** implemented in the harness adapt functions
+(`adapt_local_fe_interp`, `adapt_pristine`). **Production version
+belongs in UW3's adaptation/deform path** (gated). The pristine /
+local subtlety: V,P live in `X_prev` geometry (not the pristine
+X0c geometry that T transfers through) — that asymmetry matters.
+
+## `snes_atol` — guess-independent convergence
+
+**Class:** near-converged-guess re-solve (steady-state continuation,
+restarts, lightly-evolving). The PETSc default `snes_atol ~ 1e-50`
+makes the absolute convergence path effectively dead; UW3 sets
+`snes_rtol` but not `snes_atol` ⇒ only the guess-relative
+`rtol·‖F(x₀)‖` criterion is live.
+
+**What:** set `snes_atol` to the problem's natural residual scale
+(e.g. `rtol · ‖F(x=0)‖_current`, recomputed per warm solve,
+temporarily applied and restored), so `SNES_CONVERGED_FNORM_ABS`
+fires at it==0 when the warm guess is already good — zero Newton
+iterations.
+
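+**Evidence:** PETSc 3.25 `SNESConvergedDefault` source-verified.
+The confirmation experiment was consistent with the mechanism but
+did not demonstrate the benefit: it *did not* fix the
+violent-transient class (a different mechanism; warm divergences
+24 → 31), and the near-converged class was nearly absent from that
+run, so the case for that class rests on the source diagnosis.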
+
+**Status:** design note in `snes-atol-convergence-scale.md`,
+gated on sign-off + benchmarking. Internal/automatic, **no user
+API** (user never sets `atol` directly).
+
+## Cold-restart fallback
+
+**Class:** operational safety net for any divergence that survives
+the other fixes (genuine nonlinear divergence from a bad guess
+where no line-search config rescues; or the transient case
+described next, before the inner-solve fix is in place).
+
+**What:** on a Stokes `DIVERGED_LINE_SEARCH` (or any negative
+reason), discard the (now corrupted) warm V,P and re-solve cold
+(`zero_init_guess=True`) on the *same mesh, same T* before
+advancing. Warm-first, cold-on-failure — standard robust
+nonlinear-solver practice.
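+
+The warm-first, cold-on-failure pattern can be sketched as a small
+wrapper. The callable `solve(zero_init_guess=...)` is a hypothetical
+interface (it is assumed to run one Stokes solve and return the SNES
+converged-reason code), and `fake_solve` is a stand-in so the
+example runs on its own; the harness implementation may differ.
+
+```python
+SNES_DIVERGED_LINE_SEARCH = -6        # PETSc reason code
+SNES_CONVERGED_FNORM_RELATIVE = 3     # PETSc reason code
+
+
+def solve_with_cold_fallback(solve):
+    """Warm-first, cold-on-failure. Returns (reason, recovered)."""
+    reason = solve(zero_init_guess=False)      # warm start from previous V,P
+    if reason < 0:                             # any negative reason = diverged
+        reason = solve(zero_init_guess=True)   # same mesh, same T, zero guess
+        return reason, True
+    return reason, False
+
+
+def fake_solve(zero_init_guess):
+    """Stand-in solver: the warm attempt fails, the cold one converges."""
+    if zero_init_guess:
+        return SNES_CONVERGED_FNORM_RELATIVE
+    return SNES_DIVERGED_LINE_SEARCH
+
+
+print(solve_with_cold_fallback(fake_solve))    # (3, True)
+```
+
+Returning the `recovered` flag lets the caller count recoveries per
+run, which is how consecutive-step bursts in a transient become
+visible.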
+
+**Evidence:** harness `--stokes-cold-recover N`. a16r15r:
+31/31 recoveries succeeded; the run settled cleanly. Important
+nuance: in a violent transient, cold-restart fires on *runs* of
+consecutive steps (not isolated events) — every step warm-fails
+because the previous step's true solution is itself a poor Newton
+start for the next step. Cold-restart guarantees correctness, but
+in the violent transient regime is *not* cheap (one cold solve
+per step in the danger window).
+
+**Status:** harness flag; production = port to UW3's SNES
+solve() path.
+
+## SNES line search / type variants
+
+| `--stokes-snes-opt` | mechanism | takeaway |
+|---|---|---|
+| `default` (`newtonls`+`bt`) | full backtracking | the existing default; brittle to inexact Newton steps |
+| `basic` | full step, no backtracking | works on *linear* problems; **removes globalisation → unsafe nonlinear** — diagnostic only |
+| `l2` | minimises ‖F‖ along the Newton direction | clean *general* line-search variant (legitimate fallback), but **slow** (extra residual evaluations); fixes the bt-rejection symptom, not the cause |
+| `tr` (`newtontr`) | trust region | **hopeless on the Stokes saddle point** (indefinite Jacobian, TR quadratic model ill-posed); 98 fails at *step 1* — do not use |
+| `ksponly` | one linear KSP solve, no Newton/line-search | works only because Stokes is *linear* here; **invalid for nonlinear rheology** |
+| `direct` | MUMPS LU on the full Stokes Jacobian | exact inner solve; 24→0 warm divergences. **Gold standard at small/2D scale; not feasible at scale** |
+
+The cleanest pattern (from the GAMG sweep, see below): **none of
+these is the production cure.** The principled fix is to make the
+*existing* default `newtonls`+`bt` work, by giving it an *accurate
+enough Newton step* — i.e., fix the inner KSP/PC, not the outer
+line search.
+
+## GAMG anisotropy tuning
+
+**Class:** AMG aggregation defaults degrade on anisotropic
+operators (stretched / graded cells from adaptive refinement),
+producing aggregates that span the weak direction. The inner KSP
+under-converges the Newton correction; `bt` line search rejects
+the step; SNES reports `DIVERGED_LINE_SEARCH`.
+
+UW3's default Stokes PC is **GAMG (aggregation AMG)** with
+`pc_gamg_type=agg`, `pc_gamg_agg_nsmooths=2` (PETSc default is 1),
+`pc_mg_type=additive`. Smoother defaults: Chebyshev + Jacobi.
+
+**CRITICAL — option scope (corrected 2026-05-20).** UW3 Stokes
+nests its GAMG inside the velocity Schur sub-block at prefix
+``fieldsplit_velocity_pc_gamg_*`` (see
+``cython/petsc_generic_snes_solvers.pyx`` ~L4199-4205). Setting
+``pc_gamg_*`` at the bare/global scope ⇒ silent no-op — PETSc
+reads the option key at the velocity sub-block prefix and never
+inherits from the bare prefix. Verified bit-identical KSP
+residuals to default on a static one-shot probe
+(``scripts/_sl_preset_verify.py``), and bit-identical warm-fail
+signature to default on the dynamic 40-step probe (both gave
+4 fails at steps 61-64 with iter counts [4,6,4,1] —
+indistinguishable).
+
+The **earlier GAMG sweep in this catalogue used the WRONG
+scope** and therefore "validated" a string of no-op presets
+against each other. Re-run with the correct
+``fieldsplit_velocity_pc_gamg_*`` prefix gives a very different
+table — including one preset that **actively breaks** the
+solver:
+
+**Corrected sweep (restart-from-50 testbed, 40 steps, baseline
+4 warm DIVERGED at steps 61–64, all options at the proper
+``fieldsplit_velocity_*`` prefix):**
+
+| `--stokes-snes-opt` | option(s) (at `fieldsplit_velocity_*` prefix) | warm fails | mechanism |
+|---|---|---|---|
+| `gamg-n1-corr` | `pc_gamg_agg_nsmooths=1` (PETSc default) | **0** ✓ | revert UW3's `=2` override; smoothed aggregates of degree 2 on graded mesh hurt |
+| `gamg-thr-corr` | `pc_gamg_threshold=0.02`, `threshold_scale=0.5` | **23** ✗ DANGEROUS | aggressive thresholding prunes the weak-direction connections AMG actually needs on adapted velocity operator — *worse* than default |
+| `gamg-noagr-corr` | `pc_gamg_aggressive_coarsening=0` | **0** ✓ | suppress finest-level MIS-2 aggressive coarsening |
+| `gamg-sor-corr` | `mg_levels_ksp_type=richardson`, `pc_type=sor`, `ksp_max_it=2` | **0** ✓ | stronger smoother absorbs sub-optimal aggregates |
+| `gamg-full-corr` | combined | **0** ✓ | no improvement over single fixes |
+| `gamg-noagrsor-corr` | noagr + sor | **0** ✓ | no improvement over either alone |
+
+**Findings:**
+
+1. Five of six correct-scope variants close the failure window
+   independently. They produce indistinguishable wall times
+   (≈5 min for 40 steps at res-16) → no clear performance winner
+   on this small problem. Any of them can serve as the
+   surgical fix.
+2. **`gamg-thr-corr` is dangerous** — 23 fails vs 4 baseline.
+   The threshold+threshold_scale pair at the velocity sub-block
+   removes structure GAMG needs. Do not use. (Was silently a
+   no-op at the wrong scope, masking this danger.)
+3. The mechanistic story (Cheb+Jac × poor aggregates →
+   divergence; fix either side and it works) survives — the
+   evidence base just shrunk to noagr/n1/sor/full/noagrsor.
+
+**Recommended UW3 default change (corrected):**
+``fieldsplit_velocity_pc_gamg_aggressive_coarsening = 0`` on the
+Stokes solver. Single integer; surgical; closes the failure
+window; preserves Cheb+Jac for HPC parallel scalability. **Note
+the scope** — bare ``pc_gamg_aggressive_coarsening = 0`` does
+nothing.
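+
+One way to guard against the bare-scope mistake is to route every
+velocity-block GAMG option through a helper that applies the prefix.
+This is only a sketch: `set_velocity_gamg_option` is a hypothetical
+name, and the plain `dict` stands in for the solver's
+`petsc_options` mapping.
+
+```python
+VELOCITY_PREFIX = "fieldsplit_velocity_"
+
+
+def set_velocity_gamg_option(petsc_options, name, value):
+    """Write an option at the velocity sub-block prefix, never bare."""
+    if name.startswith(VELOCITY_PREFIX):
+        key = name
+    else:
+        key = VELOCITY_PREFIX + name
+    petsc_options[key] = value
+    return key
+
+
+opts = {}   # stands in for the solver's `petsc_options` mapping
+set_velocity_gamg_option(opts, "pc_gamg_aggressive_coarsening", 0)
+print(opts)  # {'fieldsplit_velocity_pc_gamg_aggressive_coarsening': 0}
+```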
+
+**Verification methodology (mandatory for future GAMG-tuning
+claims):** before claiming a tuning helps, verify the option is
+actually applied to the GAMG instance it targets. Static probe:
+run the SAME problem twice with and without the option, on a
+fixed T snapshot, with ``snes_monitor`` and ``ksp_monitor``
+enabled. If the KSP residual values are bit-identical between
+the two runs, the option is a no-op (wrong scope) and any
+"benefit" elsewhere is illusory. See
+``scripts/_sl_preset_verify.py`` for the verification harness.
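+
+The static probe reduces to comparing two captured monitor logs. The
+sketch below is not the `_sl_preset_verify.py` harness; the function
+names are hypothetical and the log strings are synthetic. It assumes
+the usual `ksp_monitor` line format (`<it> KSP Residual norm
+<value>`) and compares the printed values, so "identical" here means
+identical to the printed precision.
+
+```python
+import re
+
+# Matches lines such as "  0 KSP Residual norm 1.234567890123e+00".
+_KSP_LINE = re.compile(r"^\s*\d+ KSP Residual norm (\S+)")
+
+
+def ksp_residuals(monitor_text):
+    """Return the residual strings printed by the KSP monitor, in order."""
+    found = []
+    for line in monitor_text.splitlines():
+        match = _KSP_LINE.match(line)
+        if match:
+            found.append(match.group(1))
+    return found
+
+
+def option_is_noop(log_without, log_with):
+    """True if both runs printed exactly the same KSP residual history."""
+    a, b = ksp_residuals(log_without), ksp_residuals(log_with)
+    return bool(a) and a == b
+
+
+# Synthetic logs for illustration only (not output from a real run).
+base = "  0 KSP Residual norm 1.0e+00\n  1 KSP Residual norm 3.2e-03\n"
+same = base
+diff = "  0 KSP Residual norm 1.0e+00\n  1 KSP Residual norm 1.1e-04\n"
+print(option_is_noop(base, same), option_is_noop(base, diff))  # True False
+```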
+
+**Caveats:**
+- The 40-step restart probe is a narrow window (4 failure
+  opportunities). Closing it does *not* prove a candidate
+  survives a full settled trajectory or harder problems.
+- These tests are on a *simple* PDE (constant-viscosity
+  Stokes, T-fixed buoyancy). The story may differ with
+  nonlinear rheology / yield / temperature- and strain-rate-
+  dependent viscosity. The next stress test is the harder
+  PDE family, not a more aggressive (Ra=1e6) run of the same
+  simple problem.
+
+## Direct inner solve (MUMPS LU)
+
+**Class:** the gold-standard *demonstration* of the
+"accurate-inner-Newton-step → bt accepts λ=1 → robust" mechanism.
+At small/2D scale (e.g. res-16 annulus), MUMPS LU on the full
+Stokes Jacobian is cheap and exact.
+
+**What:** `pc_type=lu`, `ksp_type=preonly`,
+`pc_factor_mat_solver_type=mumps`, `mat_mumps_icntl_24=1`.
+
+**Evidence:** a16r15d (warm, default `bt`, no recover) → 0 warm
+DIVERGED (vs 24 baseline). The cleanest single-experiment proof
+that the failure is inner-step accuracy, not the outer solver
+type.
+
+**Status:** keep as a diagnostic / sanity tool. Generalise as
+"solve the inner Newton correction accurately on the adapted
+operator" — implemented in production via tight KSP or strong PC
+(see GAMG-tuning above), *not* by always-direct.
+
+## Error-estimator-driven metric (design arc)
+
+**Class:** the absolute, resolution-aware refinement criterion —
+the principled successor to the percentile metric. The
+percentile is purely relative (always bunches the top X% of
+*whatever* distribution; can't say "this needs more nodes than
+redistribution can give"; can't recognise "the uniform mesh is
+already fine"). This is the *adaptation analogue* of the missing
+`snes_atol`: in both cases the fix is "judge against the problem,
+not the distribution."
+
+**Routes:**
+- *(a) Recovery-based (ZZ) — cheap first cut:* recovered ∇u minus
+  FE ∇u as a per-cell error indicator. Reuses the existing
+  projected-gradient machinery; no hierarchy needed.
+- *(b) Hierarchical / τ two-grid estimator (richer):* leverage
+  UW3's `dm_hierarchy` for both the error estimator *and* a
+  **geometric multigrid preconditioner** that sidesteps
+  AMG-anisotropy entirely. Two birds from one structure.
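+
+To make route (a) concrete, here is a 1-D sketch of a
+recovery-based indicator for a piecewise-linear field. It is a
+simplification under stated assumptions: the recovered gradient is a
+length-weighted nodal average rather than UW3's projected-gradient
+solve, and `zz_indicator_1d` is a hypothetical name.
+
+```python
+def zz_indicator_1d(x, u):
+    """Per-cell ZZ-style indicator for piecewise-linear u on nodes x."""
+    n = len(x) - 1
+    h = [x[i + 1] - x[i] for i in range(n)]
+    grad = [(u[i + 1] - u[i]) / h[i] for i in range(n)]   # FE gradient per cell
+    # Recovered nodal gradient: length-weighted average of adjacent cells.
+    rec = [grad[0]]
+    for i in range(1, n):
+        rec.append((h[i - 1] * grad[i - 1] + h[i] * grad[i])
+                   / (h[i - 1] + h[i]))
+    rec.append(grad[-1])
+    eta = []
+    for i in range(n):
+        a, b = rec[i] - grad[i], rec[i + 1] - grad[i]
+        # Exact L2 norm of (linear recovered - constant FE) over the cell.
+        eta.append((h[i] * (a * a + a * b + b * b) / 3.0) ** 0.5)
+    return eta
+
+
+x = [0.0, 0.25, 0.5, 0.75, 1.0]
+kinked = [abs(xi - 0.5) for xi in x]     # gradient jumps at x = 0.5
+print(zz_indicator_1d(x, kinked))        # largest in the two cells at the kink
+```
+
+Unlike a percentile of the field, the indicator is zero wherever the
+discrete gradient is already smooth, so it can report that a mesh is
+fine as it stands.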
+
+**Status:** scoped, not started. To be written up as a design
+note (cf. `snes-atol-convergence-scale.md`) before implementation.
+
+## Geometric MG via `dm_hierarchy`
+
+**Class:** the alternative to AMG that is *inherently*
+anisotropy-robust (the hierarchy is built geometrically, not from
+the operator's connection graph).
+
+**Status:** UW3 has `dm_hierarchy` / `refineHierarchy`
+infrastructure. Pairs naturally with the error-estimator design
+arc — the same multi-level structure yields both the
+anisotropy-robust PC and the absolute error indicator.
+
+## Mesh-quality / `mesh.quality()` API
+
+**Class:** the diagnostic on the *mesh* side of the coupled
+mesh⇄solver problem.
+
+**What:** `mesh.quality()` returns per-mesh aggregate + tail
+metrics — shape quality `q = 4√3·A/Σℓ²` (min, percentiles,
+mean), max interior angle, aspect ratio (max, p99), neighbour
+size-jump, joint "large-AND-stretched" count, plus the dimension-
+agnostic `vol_min_over_mean`. `mesh.view()` prints a one-line
+summary with a hazard flag for `q<0.2` cells.
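+
+For reference, the triangle shape quality quoted above can be
+computed directly from vertex coordinates. This is a standalone
+sketch of the formula `q = 4√3·A/Σℓ²`, not the `mesh.quality()`
+implementation; `shape_quality` is a hypothetical helper name.
+
+```python
+import math
+
+
+def shape_quality(p0, p1, p2):
+    """q = 4*sqrt(3)*A / sum(l^2): 1 for equilateral, near 0 for slivers."""
+    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
+    area = 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
+    sum_l2 = ((x1 - x0) ** 2 + (y1 - y0) ** 2
+              + (x2 - x1) ** 2 + (y2 - y1) ** 2
+              + (x0 - x2) ** 2 + (y0 - y2) ** 2)
+    return 4.0 * math.sqrt(3.0) * area / sum_l2
+
+
+equilateral = shape_quality((0, 0), (1, 0), (0.5, math.sqrt(3) / 2))
+sliver = shape_quality((0, 0), (1, 0), (0.5, 0.05))
+print(equilateral, sliver)   # about 1.0, and a value below the 0.2 hazard flag
+```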
+
+**Why it matters here:** bulk `minA/meanA` hid the equidist mover's
+poor-cell problem; the tail metrics exposed it. AMG aggregation
+degrades on poor cells (the GAMG anisotropy section above) —
+mesh-quality monitoring is therefore not aesthetic, it directly
+predicts solver robustness.
+
+## Failure-class → strategy map (the picking guide)
+
+```
+Symptom                           First-line cure         Backup
+--------------------------------  ----------------------  ----------------------
+"Re-solve = no fewer iterations"  snes_atol               cold-restart
+than a fresh solve                                        
+                                                          
+Warm-start fails at adapt step    V,P remap               cold-restart
+                                                          
+Warm-start fails in violent       Accurate inner          cold-restart
+transient (non-adapt)             solve (GAMG tuning /    + l2 (slow but safe)
+                                  direct at small scale)  
+                                                          
+Adaptive metric bunches a smooth  (design arc) error-     reduce R / use coarsen
+solution / can't signal "more     estimator metric        cap; mesh.quality()
+nodes needed"                                             monitors regularity
+                                                          
+AMG diverges on adapted mesh      fieldsplit_velocity_    gamg-n1-corr; gamg-sor-corr;
+                                  pc_gamg_aggressive_     geometric MG (long-term)
+                                  coarsening=0
+```
+
+## Open follow-ups
+
+- Fresh full-settled validation of
+  `fieldsplit_velocity_pc_gamg_aggressive_coarsening=0` alone on
+  a16r15-equivalent (verify 24→0 on the full trajectory, not just
+  the 40-step probe).
+- Combined `gamg-noagrsor` discriminator run (in flight as of the
+  catalogue's first draft).
+- Test the strategies on a harder PDE family (nonlinear /
+  temperature- or strain-rate-dependent viscosity, yield) — the
+  current evidence is on simple Stokes only.
+- Design notes: error-estimator metric; geometric-MG via
+  `dm_hierarchy`.
+- Port the harness-side fixes (V,P remap, cold-restart) into the
+  UW3 core (adaptation/deform path + SNES `solve()`).
+
+## Related artefacts
+
+- `docs/developer/design/snes-atol-convergence-scale.md` —
+  full design note for the snes_atol fix.
+- `docs/developer/design/mesh-adaptation-formulation.md` —
+  the equidistribution mover formulation + single-knob
+  `resolution_ratio` API.
+- `scripts/adaptive_saturation.py` — the diagnostic harness
+  (the flags listed under "Diagnostics" above).
+- `scripts/_cellquality.py`, `_dial_quality_compare.py`,
+  `_pctl_parallel_check.py`, `_equidist_probe.py` — focused
+  validation / sweep scripts kept for reproducibility.
diff --git a/docs/developer/index.md b/docs/developer/index.md
index c53dcfc8..498f0a59 100644
--- a/docs/developer/index.md
+++ b/docs/developer/index.md
@@ -126,6 +126,7 @@ design/UNITS_SIMPLIFIED_DESIGN_2025-11
 design/WHY_UNITS_NOT_DIMENSIONALITY
 design/SYMBOL_DISAMBIGUATION_2025-12
 design/ADAPTIVE_MESHING_DESIGN
+design/mesh-adaptation-formulation
 design/ARCHITECTURE_ANALYSIS
 design/MATHEMATICAL_MIXIN_DESIGN
 design/COORDINATE_MIGRATION_GUIDE
diff --git a/docs/developer/subsystems/mesh-metric-redistribution.md b/docs/developer/subsystems/mesh-metric-redistribution.md
new file mode 100644
index 00000000..e4ce5a99
--- /dev/null
+++ b/docs/developer/subsystems/mesh-metric-redistribution.md
@@ -0,0 +1,245 @@
+# Metric-driven mesh redistribution (`smooth_mesh_interior`)
+
+Topology-preserving node redistribution toward a target *size /
+density* field. Vertex indices, DOF maps and the parallel partition
+are **unchanged** — only coordinates move (contrast `mesh.adapt()`,
+which remeshes / changes topology).
+
+> **Mathematics:** the full derivations (optimal-transport /
+> Monge–Ampère, the volumetric spring, the anisotropic
+> metric-tensor / Winslow mover, the gradient-metric construction,
+> dynamic field handling, and the Nusselt diagnostic) are in
+> {doc}`/developer/design/mesh-adaptation-formulation`. This page is
+> the operational guide.
+
+```python
+import underworld3 as uw
+from underworld3.meshing import smooth_mesh_interior
+
+smooth_mesh_interior(mesh, metric=f, method="spring")       # fast
+smooth_mesh_interior(mesh, metric=f, method="ma")           # robust
+smooth_mesh_interior(mesh, metric=f, method="anisotropic")  # cleanest, aligned
+```
+
+## When to use it
+
+- **Restore the grading of a previously-adapted mesh** after it has
+  deformed (free-surface evolution, large strain): a *Lagrangian*
+  metric rides the material points and pulls the design grading
+  back.
+- **Concentrate resolution at a feature** — bunch nodes by a factor
+  of ~2 around a high-gradient region (e.g. a moving fault, a
+  thermal boundary layer) without adding points.
+
+```{important}
+With a **fixed node count** the achievable grading is bounded:
+≈1.3–1.8× deep/near on the test problems. The optimal-transport
+ideal (≈10× for an 8× density target) requires *more nodes* — a
+topology change (`mesh.adapt`), not this smoother. A ×2-ish
+bunching is squarely in range; do not expect extreme refinement
+from redistribution alone.
+```
+
+## The metric
+
+A strictly-positive density expression; larger ⇒ smaller cells.
+For Lagrangian behaviour, build it from a frozen state variable
+set **once** to the reference coordinate and never reassigned, so
+its value rides each material point through deformation:
+
+```python
+import numpy as np
+import sympy
+# `uw` and `mesh` are as in the example at the top of this page
+r0 = uw.discretisation.MeshVariable("r0", mesh,
+        vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+r0.data[:, 0] = np.linalg.norm(mesh.X.coords, axis=1)   # set once
+f = 1 + 8 * sympy.exp(-((r0.sym[0] - 1.0) / 0.12) ** 2)  # design grading
+```
+
+`metric=None` (default) is the original graph-Laplacian Jacobi
+smoother (equalises connectivity; no grading) — unchanged.
+
+## The three solvers
+
+| | `method="spring"` (default) | `method="ma"` | `method="anisotropic"` |
+|---|---|---|---|
+| Operator | Volumetric elastic-spring equilibrium | Benamou–Froese–Oberman Monge–Ampère | Decoupled direct M-weighted Laplace (Winslow) coordinate map |
+| Idea | *equal* edge springs (shape) **+** per-cell `A0 ∝ 1/ρ_tgt` (size) | `det(I+D²φ)=g`, move by ∇φ, recovered-Hessian damped Picard | `∇·(D∇u_c)=-Σ∂_jD_{jc}`, `D` = eigen-clamped gradient-derived **tensor** `M`, fixed Lagrangian, damped MMPDE |
+| Refinement | isotropic | isotropic | **anisotropic** (cells short ⟂ feature, long along it) |
+| Cost (res-16 Annulus) | **~0.3 s** | ~12–20 s (~60×) | ~3 s (linear, no Picard) |
+| Grading magnitude | 1.65 / 1.79 | 1.71 / 1.54 | mild (trades magnitude for alignment) |
+| Mesh quality (minA/meanA) | healthy | healthy but **slivers near a sharp feature** (≈0.02–0.18) | **cleanest — never slivers** (≈0.24–0.50, 2.6–12× MA) |
+| Boundary sensitivity | high (see `boundary_slip`) | low (natural Neumann) | low (homogeneous Dirichlet, non-singular) |
+
+**Recommendation:** `spring` for routine per-step use in
+time-stepping (cheap, robust). `ma` when isotropic refinement
+*magnitude* around a localised feature matters and the cost is
+affordable. `anisotropic` when **cell alignment / quality**
+matters — it is the cleanest (never slivers, linear/cheap) and
+reshapes cells to the feature, but deliberately grades *gently*:
+it does **not** beat the fixed node-count cap, and for a
+*separable* feature the explicit 1-D OT (radial/angular
+cumulative-mass inversion) is exact and strictly cheaper —
+`anisotropic` earns its keep on the general **non-separable**
+case. See `docs/developer/design/ma-newton-cofactor-exploration.md`
+("(3) anisotropic mover — IMPLEMENTED & VALIDATED").
+
+### `boundary_slip`
+
+Off by default. When on, boundary nodes slide *tangentially* along
+their boundary and are snapped back to it every step (the radial
+DOF is removed — they **cannot** leave the surface; drift is
+machine-ε; circular/spherical boundaries only, serial). It
+**strongly helps the spring** (~+10 % grading, ~3× faster — its
+hard-pinned boundary was the bottleneck) and is a **near-no-op for
+`ma`**. It is off by default because for a free surface the
+boundary *is* the moving surface and sliding interacts with the
+free-surface coupling — enable per use-context.
+
+```{warning}
+The per-ring radius projection is exact only for
+circular/spherical boundaries. A general deformed / free-surface
+boundary needs projection onto the boundary *polyline* instead —
+not yet implemented (matters for the spring; low priority for MA,
+which is insensitive to the boundary treatment).
+```
+
+## Implementation notes
+
+- Spring equilibrium = minimise `½Σ_e((|x_i-x_j|-L̄)/L̄)² +
+  size_w·Σ_t((A_t-A0_t)/A0_t)²` by Jacobi-preconditioned nonlinear
+  CG (Polak–Ribière⁺) with an Armijo line search that **rejects
+  cell-inverting steps** (the tangle guard is inside the
+  optimiser). `shape_w/size_w` default 1/8 — results are robust to
+  them.
+- MA uses the core `SNES_Scalar.constant_nullspace` hook
+  (`petsc_generic_snes_solvers.pyx`) and a variationally-consistent
+  weak Hessian recovery (`_hessian_recovery_class`, an SPD
+  mass-matrix `SNES_MultiComponent` solve — only first derivatives
+  of φ, since UW3 forbids second derivatives of mesh-variable
+  functions).
+- **MA solver efficiency** (`_use_direct_solver`, 2026-05-17). The
+  Picard loop fixes the mesh, so the φ-Poisson Laplacian and the
+  Hessian-recovery mass matrix are *constant operators* re-solved
+  ~25× with only the RHS changing. The UW3 default (GMRES + GAMG)
+  paid a full multigrid **setup every inner solve** (the constant
+  near-nullspace re-attach forces it) — ~0.9 s/iter for the Hessian
+  alone. The cached φ/Hessian/∇φ sub-solvers are therefore put on:
+
+  | option | φ Poisson | Hessian / ∇φ |
+  |---|---|---|
+  | `snes_type` | `ksponly` | `ksponly` |
+  | `ksp_type` | `preonly` | `preonly` |
+  | `pc_type` | `lu` | `lu` |
+  | `pc_factor_mat_solver_type` | `mumps` | `mumps` |
+  | `snes_lag_jacobian` | `-2` | `-2` |
+  | `snes_lag_preconditioner` | `-2` | `-2` |
+  | `mat_mumps_icntl_24` | `1` (null-pivot) | — |
+
+  The lag (`-2` = compute once, never again) confines the
+  factorisation to the **first** inner solve; the rest are MUMPS
+  back-substitutions. `_deform_mesh` rebuilds the SNES
+  (`is_setup=False`) so the lag counter resets and the operator is
+  correctly re-factorised on the next call's first solve — reuse
+  never spans a coordinate change. A direct solve is *exact*
+  (tighter than the GMRES rtol) so the Picard fixed point — hence
+  the grading/quality — is **bit-for-bit unchanged** (validated
+  by `ma_cost_grading.py`: d/n 1.02/1.43/1.71/1.54 identical to the
+  GAMG baseline). `n_picard` default 40→25 (grading flat from
+  iter ≈20).
+- **φ order: `phi_degree` default 3 → 2.** The deep/near grading is
+  set by the φ *order*, not the solver: P2 ≡ P3 to ~3 dp across
+  AMP 0/2/8/20 (matches the recorded baseline; AMP=0 no-op exact;
+  no tangle) at **~2× lower cost** (smaller matrices — which also
+  *help* the direct factorisation scale). **P1 is not
+  grading-equivalent** (≈1.40 vs 1.71 at AMP=8, ~18 % weaker) — P2
+  is the floor. Net with the reuse work: cold ~12–18 s → **~0.7–0.9 s**
+  (canonical `cost_compare.py`), grading bit-for-bit, ≈15–20×.
+
+  ```{warning}
+  Sparse direct factorisation does not scale to large-3D parallel
+  per-timestep use, so a `linear_solver="gamg"` path applying the
+  same factor/setup-once-reuse to FGMRES + GAMG was prototyped
+  (selectable; `"direct"` remains the default and sole validated
+  path). Findings: the constant nullspace **is** correctly wired
+  (verified — not the failure); P3 was a **major GAMG confound**
+  (P2+gamg converges where P3+gamg catastrophically diverges to a
+  no-op); but even at P2 the *warm* (post-`_deform_mesh`) GAMG
+  re-solve stays erratic, and this build has no alternative AMG
+  (hypre/ML absent). The reuse *pattern* is sound; GAMG on this
+  pure-Neumann operator is not robust here. Accepted position:
+  **MUMPS direct is fine for now** (it is itself MPI-parallel) and
+  the P2 size reduction only helps it. A robust iterative path would
+  still need the pure-Neumann operator de-fragilised (single
+  Dirichlet pin, not the constant nullspace — ∇φ unaffected by the
+  additive constant) and/or hypre, and is **gated behind** the
+  smoother first becoming parallel-exact in *assembly* + 3D (the
+  solver is not the parallel bottleneck yet). Full results:
+  `docs/developer/design/ma-newton-cofactor-exploration.md`.
+  ```
+- Both paths are **serial-exact**; spring/MA edge & cell sums are
+  accumulated over locally-visible entities, so rank-partition
+  boundaries under-count in parallel (the Jacobi `metric=None` path
+  *is* parallel-exact). Cross-rank assembly is future work.
+
+## Validation & diagnostics
+
+`scripts/` (not packaged): `show_metric_mesh.py` /
+`plot_metric_meshes.py` (Annulus surface band, Spring vs MA, honest
+metric + mesh pictures), `interior_refine.py` (localised interior
+blob — the realistic case), `slip_test.py` / `ma_slip_test.py`
+(boundary-slip A/B), `setup_sanity.py` (metric/pinning sanity),
+`ma_analytic_check.py` (exact radial equidistribution ground
+truth), `cost_compare.py`. Figures land in `/tmp/metric_mesh/`.
+
+```{note}
+The honest grading metric is **per-node mean incident edge length
+binned by final radius** (deep/near). An earlier centroid-band
+metric averaged the thin strong layer with the bulk Lagrangian
+shift and understated grading ~40% — use the per-node metric.
+```
+
+## Open items (future sessions)
+
+- **Monge–Ampère efficiency** — *done* (2026-05-17): ~10× via
+  factor-once-reuse direct sub-solves, grading bit-for-bit unchanged
+  (see the Implementation note). Two follow-on directions were then
+  explored and **both closed negative** (see the design doc
+  `ma-newton-cofactor-exploration.md`):
+  - *Newton / cofactor linearisation* — tested (Phase 0). Same MA
+    equation ⇒ same fixed-node grading (settled, not a lever), and it
+    is *less robust* than BFO at the recovered-Hessian quality
+    available (all standard convexity safeguards fail to reach BFO's
+    fixed point). UW3 forbids 2nd derivatives of mesh-var functions
+    so a sharp `D²φ` isn't available. Do not pursue.
+  - *BFO + GAMG-reuse* (parallel path) — prototyped as
+    `linear_solver="gamg"`. Reuse pattern fires; nullspace verified
+    correctly wired (not the failure); P3 was a major confound
+    (P2+gamg converges where P3+gamg diverges) but even at P2 the
+    warm re-solve stays erratic; no alternative AMG in this build.
+    Accepted: MUMPS direct (itself MPI-parallel) for now, and the
+    P2 size cut only helps it. A robust iterative path needs a
+    single-Dirichlet-pin (not the constant nullspace) and/or hypre,
+    gated behind parallel-exact assembly + 3D (the solver is not the
+    bottleneck yet). *Bankable spin-off: `phi_degree` default 3→2 —
+    grading-identical, ~2× cheaper, see the Implementation note.*
+- **Anisotropic tensor mover** (`method="anisotropic"`) — *done*
+  (2026-05-18, validated prototype). Decoupled direct M-weighted
+  Laplace coordinate map with the eigen-clamped gradient-derived
+  metric tensor; fixed-Lagrangian `D`, damped MMPDE. Cleanest
+  method everywhere (never slivers, 2.6–12× MA's minA/meanA),
+  linear/cheap; trades grading magnitude for clean anisotropic
+  alignment — does not beat the node-count cap. **GAMG validated
+  (2026-05-18): bit-parity with direct, no pure-Neumann fragility
+  (the operator is non-singular) — the parallel-scalable path is
+  real; cost ~O(N), ≈ a handful of SPD elliptic solves per
+  adaptation step.** Open follow-ups (out of prototype scope):
+  cross-rank parallel-exact assembly + MPI weak-scaling; **3D**
+  (solver core already dim-general — needs a tet signed-volume
+  backtrack + the `cdim!=2` guard removed); the **coupled/inverse**
+  Winslow (RKC-non-folding) to admit `aniso_cap ≳ 6`;
+  Hessian-based `M=|H(ρ)|` for feature-*core* resolution; the
+  solution-accuracy/cost study + the dynamic-adaptive loop. See
+  the design doc "(3) … IMPLEMENTED & VALIDATED" (Architecture /
+  GAMG+cost / limitations / corners).
+- General deformed / free-surface boundary slip (polyline
+  projection).
+- Parallel-exact spring/MA assembly.
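The spring objective quoted in the Implementation notes above can be sketched in plain NumPy. This is only an illustration of the formula, not the UW3 implementation: the function name `spring_energy`, taking `L̄` as the current mean edge length, and the weight values used below are all assumptions made for the example.

```python
import numpy as np


def spring_energy(xy, edges, tris, A0, shape_w=1.0, size_w=1.0):
    """Edge-length (shape) plus cell-area (size) energy of a 2-D triangle mesh.

    Hypothetical helper: the weights are illustrative, and L-bar is assumed
    to be the current mean edge length.
    """
    # Shape term: every edge is a spring with the same rest length L-bar.
    L = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1)
    Lbar = L.mean()
    e_shape = 0.5 * np.sum(((L - Lbar) / Lbar) ** 2)

    # Size term: signed triangle areas compared with the per-cell targets A0.
    p0, p1, p2 = xy[tris[:, 0]], xy[tris[:, 1]], xy[tris[:, 2]]
    A = 0.5 * ((p1[:, 0] - p0[:, 0]) * (p2[:, 1] - p0[:, 1])
               - (p1[:, 1] - p0[:, 1]) * (p2[:, 0] - p0[:, 0]))
    e_size = np.sum(((A - A0) / A0) ** 2)
    return shape_w * e_shape + size_w * e_size, A


# One equilateral triangle whose target area equals its actual area.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
edges = np.array([[0, 1], [1, 2], [2, 0]])
tris = np.array([[0, 1, 2]])
A0 = np.array([np.sqrt(3.0) / 4.0])

energy, areas = spring_energy(xy, edges, tris, A0)

# Swapping two vertices inverts the cell: the signed area turns negative,
# which is the condition a tangle guard would reject in a line search.
_, flipped = spring_energy(xy, edges, np.array([[0, 2, 1]]), A0)
```

A line search that rejects cell-inverting steps only needs the sign of the returned areas, so the guard and the objective share one area computation.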
diff --git a/docs/developer/subsystems/meshing.md b/docs/developer/subsystems/meshing.md
index ae7d0f1e..9681fbd3 100644
--- a/docs/developer/subsystems/meshing.md
+++ b/docs/developer/subsystems/meshing.md
@@ -60,6 +60,15 @@ This section needs:
 - Integration with discretisation system
 ```
 
+## Related
+
+- [Metric-driven mesh redistribution](mesh-metric-redistribution.md)
+  — topology-preserving node redistribution toward a target
+  size/density field (`smooth_mesh_interior`; spring, Monge–Ampère
+  and anisotropic methods). Restores the grading of a deformed
+  adapted mesh or bunches nodes ~2× at a feature; contrast
+  `mesh.adapt()` which remeshes.
+
 ---
 
 *This document serves as a placeholder for comprehensive meshing system documentation.*
\ No newline at end of file
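The grading figures quoted in the redistribution docs (for example the "~2×" above) use the per-node mean incident edge length, binned by radius and reported as deep/near. A minimal NumPy sketch of that diagnostic follows. The function names and the single split radius `r_split` are hypothetical; the actual binning used by the validation scripts is not shown in this patch.

```python
import numpy as np


def per_node_mean_edge_length(xy, edges):
    """Mean length of the edges incident on each node."""
    L = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1)
    total = np.zeros(len(xy))
    count = np.zeros(len(xy))
    for col in (0, 1):
        # np.add.at accumulates correctly when a node index repeats.
        np.add.at(total, edges[:, col], L)
        np.add.at(count, edges[:, col], 1.0)
    return total / np.maximum(count, 1.0)


def grading_ratio(xy, edges, r_split):
    """Deep/near ratio of the per-node mean edge length (r_split is assumed)."""
    h = per_node_mean_edge_length(xy, edges)
    r = np.linalg.norm(xy, axis=1)
    return h[r < r_split].mean() / h[r >= r_split].mean()


# A radial chain of nodes that bunches toward r = 1.
xy = np.array([[0.5, 0.0], [0.7, 0.0], [0.9, 0.0], [0.95, 0.0], [1.0, 0.0]])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
ratio = grading_ratio(xy, edges, r_split=0.8)
```

Because each node carries its own length scale, a thin strongly graded layer is not averaged away with the bulk, which is the failure the docs attribute to the earlier centroid-band metric.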
diff --git a/scripts/_amp_check.py b/scripts/_amp_check.py
new file mode 100644
index 00000000..03f222fc
--- /dev/null
+++ b/scripts/_amp_check.py
@@ -0,0 +1,44 @@
+"""Does `amp` actually change the anisotropic mover's metric?
+The mover builds M = base[I + β ĝĝᵀ (|∇ρ|/gref)²], gref=max|∇ρ|.
+With ρ = 1 + amp·t the gradient is amp·∇t and gref is amp·max|∇t|
+⇒ (|∇ρ|/gref) and ĝ are amp-INVARIANT ⇒ M independent of amp.
+Verify numerically: build the metric at amp=16 and amp=24, project
+∇ρ, compare the normalised-gradient field the mover actually uses.
+"""
+import numpy as np, sympy
+import underworld3 as uw
+from underworld3.meshing import metric_density_from_gradient
+
+m = uw.meshing.Annulus(radiusOuter=1.0, radiusInner=0.5,
+                       cellSize=1/16, qdegree=3)
+T = uw.discretisation.MeshVariable("T", m, vtype=uw.VarType.SCALAR,
+                                   degree=3, continuous=True)
+r = np.sqrt((np.asarray(T.coords) ** 2).sum(1))
+T.data[:, 0] = np.exp(-((r - 0.75) / 0.1) ** 2)   # a feature
+X = m.CoordinateSystem.X
+
+
+def norm_grad_field(amp, name):
+    rho = metric_density_from_gradient(m, T, amp=amp, name=name)
+    g = uw.discretisation.MeshVariable(
+        f"g_{name}", m, vtype=uw.VarType.VECTOR, degree=1,
+        continuous=True)
+    p = uw.systems.Vector_Projection(m, g)
+    p.smoothing = 0.0
+    p.uw_function = sympy.Matrix([rho.diff(X[i])
+                                  for i in range(2)]).T
+    p.solve()
+    gv = np.asarray(uw.function.evaluate(
+        g.sym, np.asarray(g.coords))).reshape(-1, 2)
+    gn = np.linalg.norm(gv, axis=1)
+    gref = gn.max()
+    return gn / gref            # the (|∇ρ|/gref) the mover uses
+
+
+a16 = norm_grad_field(16.0, "a16")
+a24 = norm_grad_field(24.0, "a24")
+print(f"max |  (|∇ρ|/gref)_amp24  −  _amp16  | = "
+      f"{np.abs(a24 - a16).max():.3e}")
+print(f"⇒ metric tensor M is "
+      f"{'IDENTICAL (amp is a no-op)' if np.abs(a24-a16).max()<1e-9 else 'DIFFERENT'}"
+      f" between amp=16 and amp=24")
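The invariance argument in the `_amp_check.py` docstring can be checked without UW3. The sketch below assumes, as the docstring states, that the density has the form ρ = 1 + amp·t, and uses a 1-D finite-difference gradient in place of the projected FE gradient. The grid and the helper name are illustrative only.

```python
import numpy as np

# Radial coordinate and the same Gaussian feature shape as the script.
x = np.linspace(0.5, 1.0, 201)
t = np.exp(-((x - 0.75) / 0.1) ** 2)


def normalised_gradient(amp):
    """|grad rho| / max|grad rho| for rho = 1 + amp * t (assumed form)."""
    rho = 1.0 + amp * t
    g = np.abs(np.gradient(rho, x))
    return g / g.max()


diff = np.abs(normalised_gradient(24.0) - normalised_gradient(16.0)).max()
print(f"max difference between amp=24 and amp=16: {diff:.3e}")
```

The factor `amp` cancels between numerator and denominator, so any difference is floating-point round-off. A metric built only from this ratio and the gradient direction therefore cannot depend on `amp`.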
diff --git a/scripts/_nu_proper.py b/scripts/_nu_proper.py
new file mode 100644
index 00000000..b7e30e27
--- /dev/null
+++ b/scripts/_nu_proper.py
@@ -0,0 +1,106 @@
+"""Proper Nusselt: total radial heat flux q_r = v_r·T - ∂T/∂r
+projected to a nodal field, integrated over SHELLS (robust on
+interior shells where advection is smooth — not the
+BL-resolution-sensitive boundary ∂T/∂r).
+
+Decisive checks:
+  (A) analytic conduction (v=0, T=A ln r + B): every shell flow
+      must = Q_cond = 2π/ln(Ro/Ri) ⇒ Nu = 1 EXACTLY. This
+      objectively validates Q_cond and the method.
+  (B) a settled checkpoint: compare boundary-diffusive Nu (old
+      method) vs interior-shell total-flux Nu (proper) — if the
+      proper one is much larger the boundary integral was
+      under-resolving the 1-element BL (user's hypothesis).
+"""
+import numpy as np, glob, re, os
+import sympy
+import underworld3 as uw
+
+R_i, R_o = 0.5, 1.0
+Q_COND = 2.0 * np.pi / np.log(R_o / R_i)          # total cond flow
+D = "/tmp/metric_mesh/sat"
+
+
+def shell_flow(mesh, T, v, r, n=720):
+    """Total radial heat flow through the circle of radius r:
+    ∮ (v_r T - ∂T/∂r) r dθ, evaluated from a projected nodal
+    flux field (FE-consistent), sampled on the ring."""
+    X = mesh.CoordinateSystem.X
+    er = mesh.CoordinateSystem.unit_e_0
+    gradT_r = T.sym[0].diff(X[0]) * er[0] + T.sym[0].diff(X[1]) * er[1]
+    vr = (v.sym[0] * er[0] + v.sym[1] * er[1]) if v is not None \
+        else sympy.Integer(0)
+    qsym = vr * T.sym[0] - gradT_r
+    # project the flux to a nodal scalar (the user's "projected to
+    # the nodes") so the integrand is the FE field, not raw ∂T
+    qf = uw.discretisation.MeshVariable(
+        f"qr_{id(mesh)}", mesh, vtype=uw.VarType.SCALAR,
+        degree=2, continuous=True)
+    proj = uw.systems.Projection(mesh, qf)
+    proj.uw_function = qsym
+    proj.smoothing = 0.0
+    proj.solve()
+    th = np.linspace(0, 2 * np.pi, n, endpoint=False)
+    pts = np.column_stack([r * np.cos(th), r * np.sin(th)])
+    q = np.asarray(uw.function.evaluate(qf.sym[0], pts)).reshape(-1)
+    return float(q.mean() * r * 2.0 * np.pi)       # ∮ q r dθ
+
+
+def boundary_diffusive_flow(mesh, T):
+    X = mesh.CoordinateSystem.X
+    er = mesh.CoordinateSystem.unit_e_0
+    g = T.sym[0].diff(X[0]) * er[0] + T.sym[0].diff(X[1]) * er[1]
+    return -float(uw.maths.BdIntegral(
+        mesh, g, mesh.boundaries.Upper.name).evaluate())
+
+
+# ---- (A) analytic conduction test --------------------------------
+print(f"Q_cond (analytic total) = {Q_COND:.4f}")
+m = uw.meshing.Annulus(radiusOuter=R_o, radiusInner=R_i,
+                        cellSize=1.0 / 24, qdegree=3)
+Tc = uw.discretisation.MeshVariable("Tc", m, vtype=uw.VarType.SCALAR,
+                                    degree=3, continuous=True)
+A = 1.0 / np.log(R_i / R_o)
+r_e = np.sqrt((np.asarray(Tc.coords) ** 2).sum(axis=1))
+Tc.data[:, 0] = A * np.log(r_e) + (-A * np.log(R_o))   # =1@Ri,0@Ro
+print("(A) analytic conduction (v=0): Nu per shell "
+      "(must be ~1 everywhere):")
+for rr in (R_i + 1e-6, 0.6, 0.7, 0.75, 0.85, R_o - 1e-6):
+    fl = shell_flow(m, Tc, None, rr)
+    print(f"   r={rr:5.3f}  shell-flow={fl:8.4f}  "
+          f"Nu={fl / Q_COND:6.4f}")
+bd = boundary_diffusive_flow(m, Tc)
+print(f"   boundary-diffusive flow={bd:8.4f}  "
+      f"Nu={bd / Q_COND:6.4f}  (should also be ~1)")
+
+
+# ---- (B) settled checkpoints: old vs proper ----------------------
+def latest(tag):
+    ix = []
+    for f in glob.glob(f"{D}/sat_{tag}.mesh.T.*.h5"):
+        mm = re.search(r"\.mesh\.T\.(\d+)\.h5$", os.path.basename(f))
+        if mm:
+            ix.append(int(mm.group(1)))
+    return max(ix) if ix else None
+
+
+print("\n(B) settled checkpoints — Nu boundary-diffusive (old) "
+      "vs interior-shell total-flux (proper):")
+for tag in ("ref24", "u16", "a16s"):
+    i = latest(tag)
+    if i is None:
+        continue
+    mm = uw.discretisation.Mesh(f"{D}/sat_{tag}.mesh.{i:05}.h5")
+    T = uw.discretisation.MeshVariable("T", mm,
+        vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    V = uw.discretisation.MeshVariable("V", mm,
+        vtype=uw.VarType.VECTOR, degree=2, continuous=True)
+    T.read_timestep(f"sat_{tag}", "T", i, outputPath=D)
+    V.read_timestep(f"sat_{tag}", "V", i, outputPath=D)
+    bd = boundary_diffusive_flow(mm, T) / Q_COND
+    mid = shell_flow(mm, T, V, 0.5 * (R_i + R_o)) / Q_COND
+    s2 = shell_flow(mm, T, V, R_i + 0.30 * (R_o - R_i)) / Q_COND
+    s3 = shell_flow(mm, T, V, R_i + 0.70 * (R_o - R_i)) / Q_COND
+    print(f"  {tag:6s} ckpt{i:4d}: Nu_bdy(old)={bd:6.3f}  "
+          f"Nu_shell @0.30={s2:6.3f} @0.50={mid:6.3f} "
+          f"@0.70={s3:6.3f}  (steady ⇒ shells agree)")
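Check (A) in `_nu_proper.py` has a closed-form answer that can be reproduced with NumPy alone. The sketch below replaces the projected FE flux with the analytic gradient of T = A ln(r/R_o), so it validates only the ring quadrature and the value of `Q_COND`, not the projection. The function name `shell_flow_analytic` is hypothetical.

```python
import numpy as np

R_i, R_o = 0.5, 1.0
Q_COND = 2.0 * np.pi / np.log(R_o / R_i)   # total conductive heat flow
A = 1.0 / np.log(R_i / R_o)                # T = A ln(r / R_o): 1 at R_i, 0 at R_o


def shell_flow_analytic(r, n=720):
    """Ring integral of q_r = -dT/dr over the circle of radius r (v = 0)."""
    th = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x, y = r * np.cos(th), r * np.sin(th)
    r2 = x ** 2 + y ** 2
    # Cartesian gradient of A ln(r), dotted with the radial unit vector.
    dTdr = (A * x / r2) * (x / r) + (A * y / r2) * (y / r)
    q = -dTdr
    return float(q.mean() * r * 2.0 * np.pi)


nus = [shell_flow_analytic(r) / Q_COND for r in (0.5, 0.6, 0.75, 0.9, 1.0)]
```

Every shell returns Nu = 1 to round-off because q_r·r is constant in pure conduction. Any departure seen in the UW3 version therefore measures projection or sampling error.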
diff --git a/scripts/_nu_recheck.py b/scripts/_nu_recheck.py
new file mode 100644
index 00000000..1f72056a
--- /dev/null
+++ b/scripts/_nu_recheck.py
@@ -0,0 +1,40 @@
+"""Validate the fixed Nusselt (workflow BdIntegral / conductive)
+on the existing settled checkpoints — no re-simulation."""
+import numpy as np, glob, re, os
+import underworld3 as uw
+
+D = "/tmp/metric_mesh/sat"
+QC = 2.0 * np.pi / np.log(1.0 / 0.5)   # annulus log-conduction flux
+
+
+def latest(tag):
+    ix = []
+    for f in glob.glob(f"{D}/sat_{tag}.mesh.T.*.h5"):
+        m = re.search(r"\.mesh\.T\.(\d+)\.h5$", os.path.basename(f))
+        if m:
+            ix.append(int(m.group(1)))
+    return max(ix) if ix else None
+
+
+print(f"Q_cond (annulus, logarithmic) = {QC:.4f}  "
+      f"(Nu = 1 at pure conduction)")
+print(f"{'run':>22} {'ckpt':>5} {'Q_meas':>9} {'Nu(fixed)':>9}  "
+      f"old-stencil")
+for tag, lab, old in [("ref24", "ref res-24", "~1.69"),
+                      ("u16", "uniform res-16", "~1.14"),
+                      ("a16p", "a16p conservative", "~1.13"),
+                      ("a16s", "a16s aggressive", "~1.13")]:
+    i = latest(tag)
+    if i is None:
+        print(f"{lab:>22}   (no ckpt)")
+        continue
+    m = uw.discretisation.Mesh(f"{D}/sat_{tag}.mesh.{i:05}.h5")
+    T = uw.discretisation.MeshVariable(
+        "T", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    T.read_timestep(f"sat_{tag}", "T", i, outputPath=D)
+    X = m.CoordinateSystem.X
+    er = m.CoordinateSystem.unit_e_0
+    g = T.sym[0].diff(X[0]) * er[0] + T.sym[0].diff(X[1]) * er[1]
+    Qm = -float(uw.maths.BdIntegral(
+        m, g, m.boundaries.Upper.name).evaluate())
+    print(f"{lab:>22} {i:5d} {Qm:9.3f} {Qm/QC:9.3f}      {old}")
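`_nu_proper.py` and `_nu_recheck.py` both define the same `latest(tag)` helper. Its index extraction can be isolated as a pure function and tested without touching the filesystem. The sketch below is one way to do that; the function name `latest_index` and the sample paths are illustrative, not files produced by this patch.

```python
import os
import re

# Same pattern as the scripts: the checkpoint index in "<name>.mesh.T.<idx>.h5".
_CKPT = re.compile(r"\.mesh\.T\.(\d+)\.h5$")


def latest_index(paths):
    """Largest checkpoint index among the given paths, or None if none match."""
    found = []
    for p in paths:
        m = _CKPT.search(os.path.basename(p))
        if m:
            found.append(int(m.group(1)))
    return max(found) if found else None


sample = [
    "/tmp/example/sat_u16.mesh.T.00003.h5",
    "/tmp/example/sat_u16.mesh.T.00012.h5",
    "/tmp/example/sat_u16.mesh.00012.h5",   # mesh file, not a T checkpoint
]
newest = latest_index(sample)
```

Passing `glob.glob(...)` output to such a function would reproduce the scripts' behaviour while keeping a single definition.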
diff --git a/scripts/_profile_OT.py b/scripts/_profile_OT.py
new file mode 100644
index 00000000..ef6a508b
--- /dev/null
+++ b/scripts/_profile_OT.py
@@ -0,0 +1,47 @@
+"""Per-OT-step timing breakdown. Want to know:
+  (a) first call vs subsequent (JIT amortization)
+  (b) where the dominant cost is — assembly? factor? solve?
+  (c) what cellSize / element count we're actually running at
+"""
+import os
+import sys
+import time
+import numpy as np
+import underworld3 as uw
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+t0 = time.time()
+m = build_uniform_mesh()
+print(f"mesh build:          {time.time()-t0:6.2f} s   "
+      f"({m.dm.getDepthStratum(0)[1]-m.dm.getDepthStratum(0)[0]} verts, "
+      f"{m.dm.getHeightStratum(0)[1]-m.dm.getHeightStratum(0)[0]} cells)")
+
+t0 = time.time()
+rho = analytic_rho(m)
+print(f"analytic_rho symb:   {time.time()-t0:6.2f} s")
+
+# First OT call — full setup + JIT compile
+t0 = time.time()
+uw.meshing.smooth_mesh_interior(
+    m, metric=rho, method="ot", verbose=False,
+    boundary_slip="box",
+    method_kwargs=dict(n_outer=1, relax=0.1, step_frac=0.3))
+print(f"OT call #1 (setup):  {time.time()-t0:6.2f} s")
+
+# Subsequent OT calls — should be cached
+for k in range(2, 7):
+    t0 = time.time()
+    uw.meshing.smooth_mesh_interior(
+        m, metric=rho, method="ot", verbose=False,
+        boundary_slip="box",
+        method_kwargs=dict(n_outer=1, relax=0.1, step_frac=0.3))
+    print(f"OT call #{k}:          {time.time()-t0:6.2f} s")
diff --git a/scripts/_profile_OT_inside.py b/scripts/_profile_OT_inside.py
new file mode 100644
index 00000000..dfbeffb1
--- /dev/null
+++ b/scripts/_profile_OT_inside.py
@@ -0,0 +1,95 @@
+"""Profile the cost breakdown INSIDE one OT step."""
+import os, sys, time
+import numpy as np
+import sympy
+import underworld3 as uw
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m = build_uniform_mesh()
+rho = analytic_rho(m)
+print(f"mesh: {m.dm.getDepthStratum(0)[1]-m.dm.getDepthStratum(0)[0]} verts, "
+      f"{m.dm.getHeightStratum(0)[1]-m.dm.getHeightStratum(0)[0]} cells")
+
+# Set up the same machinery the OT step uses, then time
+# individual pieces.
+phi = uw.discretisation.MeshVariable(
+    "prof_phi", m, vtype=uw.VarType.SCALAR, degree=2,
+    continuous=True)
+ps = uw.systems.Poisson(m, phi)
+ps.constitutive_model = uw.constitutive_models.DiffusionModel
+ps.constitutive_model.Parameters.diffusivity = rho
+ps.constant_nullspace = True
+vol_field = uw.discretisation.MeshVariable(
+    "prof_vol", m, vtype=uw.VarType.SCALAR, degree=1,
+    continuous=True)
+gradphi = uw.discretisation.MeshVariable(
+    "prof_gphi", m, vtype=uw.VarType.VECTOR, degree=1,
+    continuous=True)
+gproj = uw.systems.Vector_Projection(m, gradphi)
+gproj.smoothing = 0.0
+X = m.CoordinateSystem.X
+gproj.uw_function = sympy.Matrix(
+    [phi.sym[0].diff(X[i]) for i in range(2)]).T
+
+K_val = 1.0
+vol_field.data[:, 0] = 1.0
+f_src = rho * sympy.log(rho * vol_field.sym[0] /
+                         sympy.Float(K_val))
+ps.f = sympy.Matrix([[f_src]])
+
+# Warm-up — JIT compile + first solve setup
+t0 = time.time()
+ps.solve(zero_init_guess=True)
+print(f"ps.solve warm-up: {time.time()-t0:6.2f} s")
+
+t0 = time.time()
+gproj.solve()
+print(f"gproj.solve warm-up: {time.time()-t0:6.2f} s")
+
+# Now time subsequent solves (no f change, no mesh change)
+print("\n-- repeat solves WITHOUT changing ps.f or mesh:")
+for k in range(3):
+    t0 = time.time()
+    ps.solve(zero_init_guess=True)
+    print(f"  ps.solve #{k+2}: {time.time()-t0:6.2f} s")
+    t0 = time.time()
+    gproj.solve()
+    print(f"  gproj.solve #{k+2}: {time.time()-t0:6.2f} s")
+
+# Time solves WITH ps.f changed (new K_val each time)
+print("\n-- repeat solves WITH ps.f reassigned each time:")
+for k in range(3):
+    K_val = 1.0 + 0.1 * k
+    f_src = rho * sympy.log(rho * vol_field.sym[0] /
+                             sympy.Float(K_val))
+    t0 = time.time()
+    ps.f = sympy.Matrix([[f_src]])
+    print(f"  ps.f assign: {time.time()-t0:6.3f} s")
+    t0 = time.time()
+    ps.solve(zero_init_guess=True)
+    print(f"  ps.solve after f change: {time.time()-t0:6.2f} s")
+
+# Time solves WITH mesh deformed (re-assembly cost)
+print("\n-- repeat solves AFTER _deform_mesh:")
+for k in range(3):
+    coords = np.asarray(m.X.coords)
+    new = coords.copy()
+    new[:] += 1e-5 * np.random.randn(*new.shape)
+    t0 = time.time()
+    m._deform_mesh(new)
+    print(f"  _deform_mesh: {time.time()-t0:6.3f} s")
+    t0 = time.time()
+    ps.solve(zero_init_guess=True)
+    print(f"  ps.solve after deform: {time.time()-t0:6.2f} s")
+    t0 = time.time()
+    gproj.solve()
+    print(f"  gproj.solve after deform: {time.time()-t0:6.2f} s")
diff --git a/scripts/_test_metric_shapes_MA.py b/scripts/_test_metric_shapes_MA.py
new file mode 100644
index 00000000..70223ce9
--- /dev/null
+++ b/scripts/_test_metric_shapes_MA.py
@@ -0,0 +1,119 @@
+"""Smoke-test: drive the existing _winslow_elliptic (BFO
+convex-branch Picard MA solver) on the synthetic shapes
+with the analytic Eulerian ρ.
+
+If the existing MA implementation just works on this box
+geometry, we have our OT comparison without writing new
+code. Per the pivot memory, the prior failure modes were
+on Annulus + re-solve. Single-shot (n_outer=1) on a box
+may be fine.
+"""
+import os
+import sys
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_MA')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m = build_uniform_mesh()
+rho_sym = analytic_rho(m)
+
+# ρ background: render via a high-degree MeshVariable on a
+# separate UNDEFORMED reference mesh (uses the same pyvista
+# path as the T plots). Fixed in physical space — does not
+# move with the deforming mesh.
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(
+    uw.function.evaluate(rho_sym, np.asarray(T_bg.coords))
+).reshape(-1)
+rho_max = float(T_bg.data[:, 0].max())
+rho_clip = (1.0, rho_max)
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+# Capture mesh states across the MA Picard outer iters.
+captured = [np.asarray(m.X.coords).copy()]
+orig_deform = m._deform_mesh
+
+
+def deform_and_capture(new_coords, *args, **kwargs):
+    res = orig_deform(new_coords, *args, **kwargs)
+    captured.append(np.asarray(m.X.coords).copy())
+    return res
+
+
+m._deform_mesh = deform_and_capture
+
+print("Running MA (target-side ρ + box slip) on shapes...")
+try:
+    uw.meshing.smooth_mesh_interior(
+        m, metric=rho_sym, method="ma", verbose=True,
+        boundary_slip="box",
+        method_kwargs=dict(n_outer=1, n_picard=25, relax=1.0,
+                            target_side_rho=True))
+except Exception as e:
+    print(f"\nMA SOLVE FAILED: {type(e).__name__}: {e}")
+    import traceback
+    traceback.print_exc()
+m._deform_mesh = orig_deform
+
+print(f"\ncaptured {len(captured)} mesh states")
+for it, c in enumerate(captured):
+    if it == 0:
+        continue
+    dx = np.linalg.norm(c - captured[it - 1], axis=1)
+    cum = np.linalg.norm(c - captured[0], axis=1)
+    print(f"  outer {it}: this-step max={dx.max():.3e}  "
+          f"cumulative max={cum.max():.3e}")
+
+# Render: undeformed vs MA result
+panels = [(0, "iter 0 (undeformed)")] + [
+    (i, f"after MA outer {i}") for i in range(1, len(captured))]
+if len(panels) <= 1:
+    print("WARN: MA didn't deform the mesh — nothing to render")
+ncols = min(len(panels), 4)
+nrows = (len(panels) + ncols - 1) // ncols if panels else 1
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for panel_idx, (it, label) in enumerate(panels):
+    row, col = panel_idx // ncols, panel_idx % ncols
+    m_viz = build_uniform_mesh()
+    m_viz._deform_mesh(captured[it])
+    edges = vis.mesh_to_pv_mesh(m_viz).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(label, font_size=22, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_MA.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
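The plotting scripts place a variable number of panels on a grid with at most four columns, using ceiling division for the row count. A small sketch of that layout arithmetic follows. The helper `panel_grid` is hypothetical, and it differs from the script in clamping the column count to at least one so that an empty panel list cannot cause a division by zero.

```python
def panel_grid(n_panels, max_cols=4):
    """Rows, columns and (row, col) cell positions for n_panels subplots."""
    ncols = max(1, min(n_panels, max_cols))
    nrows = (n_panels + ncols - 1) // ncols      # ceiling division
    cells = [(i // ncols, i % ncols) for i in range(n_panels)]
    return nrows, ncols, cells


nrows, ncols, cells = panel_grid(5)
```

With five captured states this gives a 2 × 4 grid whose last panel sits at the start of the second row, leaving three cells empty.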
diff --git a/scripts/_test_metric_shapes_MA_sweep.py b/scripts/_test_metric_shapes_MA_sweep.py
new file mode 100644
index 00000000..02c7750b
--- /dev/null
+++ b/scripts/_test_metric_shapes_MA_sweep.py
@@ -0,0 +1,113 @@
+"""MA convergence sweep: vary n_picard and n_outer with target-side
+ρ + box slip. Probe whether the residual phase mismatch + bulk
+under-redistribution come from incomplete Picard convergence
+(more n_picard) or insufficient outer composition (more n_outer).
+
+Panels:
+  A. 1 outer × 25 picard  (current best)
+  B. 1 outer × 100 picard (more Picard convergence)
+  C. 3 outer × 50 picard  (more outer composition)
+  D. 3 outer × 100 picard (both)
+"""
+import os
+import sys
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_MA_sweep')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+CASES = [
+    ("A. 1 outer x 25 picard",   1,  25),
+    ("B. 1 outer x 100 picard",  1, 100),
+    ("C. 3 outer x 50 picard",   3,  50),
+    ("D. 3 outer x 100 picard",  3, 100),
+]
+
+
+# ρ background (same physical position for every panel).
+m_bg = build_uniform_mesh()
+rho_bg_sym = analytic_rho(m_bg)
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    rho_bg_sym, np.asarray(T_bg.coords))).reshape(-1)
+rho_max = float(T_bg.data[:, 0].max())
+rho_clip = (1.0, rho_max)
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+for label, n_outer, n_picard in CASES:
+    out_dir = os.path.join(
+        OUT, label.replace(" ", "_").replace(".", "p"))
+    if os.path.exists(os.path.join(out_dir, "adapted.mesh.00000.h5")):
+        print(f"{label}: cached")
+        continue
+    os.makedirs(out_dir, exist_ok=True)
+    print(f"\n=== {label} ===")
+    m = build_uniform_mesh()
+    rho = analytic_rho(m)
+    try:
+        uw.meshing.smooth_mesh_interior(
+            m, metric=rho, method="ma", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=n_outer, n_picard=n_picard,
+                               relax=1.0, target_side_rho=True))
+    except Exception as e:
+        print(f"FAILED: {type(e).__name__}: {e}")
+        continue
+    m.write_timestep(filename="adapted", index=0,
+                     outputPath=out_dir, meshVars=[],
+                     meshUpdates=True, create_xdmf=True)
+
+
+ncols, nrows = 2, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, n_outer, n_picard) in enumerate(CASES):
+    row, col = i // ncols, i % ncols
+    out_dir = os.path.join(
+        OUT, label.replace(" ", "_").replace(".", "p"))
+    mesh_path = os.path.join(out_dir, "adapted.mesh.00000.h5")
+    if not os.path.exists(mesh_path):
+        pl.subplot(row, col)
+        pl.add_text(f"{label}\n(failed)", font_size=20,
+                    color='red')
+        continue
+    m_viz = uw.discretisation.Mesh(mesh_path)
+    edges = vis.mesh_to_pv_mesh(m_viz).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(label, font_size=22, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_MA_sweep.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT.py b/scripts/_test_metric_shapes_OT.py
new file mode 100644
index 00000000..8dccc583
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT.py
@@ -0,0 +1,93 @@
+"""OT improvement step (method='ot'): each call applies one
+weighted-Poisson equidistribution flow step. Composable — the
+input mesh has no special status. Test by chaining multiple
+calls and watching the imbalance ratio drop.
+
+5 panels (see STEPS_TO_CAPTURE):
+  0. undeformed
+  1. after 1 OT step
+  5. after 5 OT steps
+  15, 40. after 15 and 40 OT steps
+"""
+import os
+import sys
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m = build_uniform_mesh()
+rho_sym = analytic_rho(m)
+
+# ρ background (analytic, on a separate undeformed reference)
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+# capture coords across calls
+captured = [np.asarray(m.X.coords).copy()]
+STEPS_TO_CAPTURE = {0, 1, 5, 15, 40}
+
+N_STEPS = 40
+print("OT-improve sweep — each call is one improvement step:")
+imb_traj = []
+for k in range(1, N_STEPS + 1):
+    uw.meshing.smooth_mesh_interior(
+        m, metric=rho_sym, method="ot", verbose=True,
+        boundary_slip="box",
+        method_kwargs=dict(n_outer=1, relax=0.1))
+    if k in STEPS_TO_CAPTURE:
+        captured.append(np.asarray(m.X.coords).copy())
+
+# Render: undeformed + each captured step
+sorted_steps = sorted(STEPS_TO_CAPTURE)
+ncols = min(len(sorted_steps), 4)
+nrows = (len(sorted_steps) + ncols - 1) // ncols
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for panel_idx, step in enumerate(sorted_steps):
+    row, col = panel_idx // ncols, panel_idx % ncols
+    m_viz = build_uniform_mesh()
+    cap_idx = sorted_steps.index(step)
+    m_viz._deform_mesh(captured[cap_idx])
+    edges = vis.mesh_to_pv_mesh(m_viz).extract_all_edges()
+    pl.subplot(row, col)
+    label = "undeformed" if step == 0 else f"after {step} OT step(s)"
+    pl.add_text(label, font_size=24, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_blob.py b/scripts/_test_metric_shapes_OT_blob.py
new file mode 100644
index 00000000..5fbbcc07
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_blob.py
@@ -0,0 +1,211 @@
+"""Multi-resolution OT with a coarse 'FILLED BLOB' metric:
+
+  ρ_blob(x) = 1 + AMP · 0.5 · (1 + tanh(d_shape(x) / EPS_big))
+
+This is a smoothed shape indicator: ρ ≈ 1+AMP INSIDE the
+shape, ≈ 1 OUTSIDE, with smooth transition. Mass-transport
+OT under this metric pulls nodes INTO each shape — the
+long-range transport effect we want.
+
+Then run the sharp boundary-band metric (the original
+sech² ρ) to migrate those interior nodes to the boundary.
+
+Panels:
+  A. raw sharp OT × 10 (reference — no multi-res)
+  B. blob OT × 5 only — see if nodes flow INTO shapes
+  C. (B's result) + sharp OT × 5
+  D. blob × 3, sharp × 7; E. blob × 7, sharp × 3
+  F. blob × 5 again, drawn over the sharp ρ background
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_blob')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import (
+    sym_smax, sym_smin, sym_smin3, sym_sabs, sym_sech2,
+    AMP)
+from _test_metric_shapes_OT_multires import (
+    analytic_rho_eps, build_uniform_mesh)
+
+
+def analytic_rho_blob(mesh, eps_smooth):
+    """ρ = 1 + AMP * 0.5 * (1 + tanh(d / eps_smooth)) per shape.
+    Filled-blob indicator. Use a smaller eps_smooth for sharper
+    blob edges; large eps_smooth blurs the blob outline.
+    """
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2),
+        half_plane(v2, v0))
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + AMP * sympy.Rational(1, 2) * (
+            1 + sympy.tanh(d / eps_smooth))
+    return rho
+
+
+def build_bg(rho_fn, eps_or_kw):
+    m_bg = build_uniform_mesh()
+    T_bg = uw.discretisation.MeshVariable(
+        f"T_bg_{id(rho_fn)}_{int(eps_or_kw*1000)}",
+        m_bg, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+        rho_fn(m_bg, eps_or_kw),
+        np.asarray(T_bg.coords))).reshape(-1)
+    clip = (1.0, float(T_bg.data[:, 0].max()))
+    pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+    pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+    return pv_bg, clip
+
+
+# Two backgrounds for context — sharp (true target) and blob.
+pv_bg_sharp, rho_clip_sharp = build_bg(analytic_rho_eps, 0.04)
+pv_bg_blob, rho_clip_blob = build_bg(analytic_rho_blob, 0.06)
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+def measure_sharp_imb(mesh):
+    """Imbalance of the current mesh against the sharp ρ."""
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=analytic_rho_eps(mesh, eps=0.04),
+            method="ot", verbose=True, boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=0.0,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else float("nan")
+
+
+def run_schedule(steps):
+    """steps: list of ('blob'|'sharp', eps, n)"""
+    m = build_uniform_mesh()
+    traj_blob, traj_sharp = [], []
+    for kind, eps, n in steps:
+        if kind == "blob":
+            rho = analytic_rho_blob(m, eps)
+        else:
+            rho = analytic_rho_eps(m, eps)
+        for _ in range(n):
+            ret = step_OT(m, rho)
+            if ret is not None:
+                if kind == "blob":
+                    traj_blob.append(ret)
+                else:
+                    traj_sharp.append(ret)
+    return m, traj_blob, traj_sharp
+
+
+RECIPES = [
+    ("A. sharp OT x 10  (reference)",
+     [("sharp", 0.04, 10)],
+     "sharp"),
+    ("B. blob OT x 5    (does it pull nodes IN?)",
+     [("blob", 0.06, 5)],
+     "blob"),
+    ("C. blob x 5 + sharp x 5",
+     [("blob", 0.06, 5), ("sharp", 0.04, 5)],
+     "sharp"),
+    ("D. blob x 3 + sharp x 7",
+     [("blob", 0.06, 3), ("sharp", 0.04, 7)],
+     "sharp"),
+    ("E. blob x 7 + sharp x 3",
+     [("blob", 0.06, 7), ("sharp", 0.04, 3)],
+     "sharp"),
+    ("F. blob x 5 alone (different background)",
+     [("blob", 0.06, 5)],
+     "blob_alt"),
+]
+
+
+results = {}
+for label, schedule, bg_kind in RECIPES:
+    print(f"\n=== {label} ===")
+    m, tb, ts = run_schedule(schedule)
+    sharp_imb = measure_sharp_imb(m)
+    print(f"  blob-imb traj : "
+          f"{' '.join(f'{v:.3f}' for v in tb)}")
+    print(f"  sharp-imb traj: "
+          f"{' '.join(f'{v:.3f}' for v in ts)}")
+    print(f"  FINAL sharp imb: {sharp_imb:.3f}")
+    results[label] = (m, sharp_imb, bg_kind)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _, bg_kind) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, sharp_imb, _ = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nsharp imb={sharp_imb:.3f}",
+                font_size=18, color='black')
+    if bg_kind == "blob":
+        pl.add_mesh(pv_bg_blob, scalars="rho", cmap="Blues",
+                    clim=rho_clip_blob, show_edges=False,
+                    lighting=False, show_scalar_bar=False,
+                    opacity=0.85)
+    else:
+        pl.add_mesh(pv_bg_sharp, scalars="rho", cmap="Blues",
+                    clim=rho_clip_sharp, show_edges=False,
+                    lighting=False, show_scalar_bar=False,
+                    opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_blob.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_blob_amp.py b/scripts/_test_metric_shapes_OT_blob_amp.py
new file mode 100644
index 00000000..5735bdb3
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_blob_amp.py
@@ -0,0 +1,236 @@
+"""Coarse-blob OT — amplification sweep.
+
+Tests whether OT under a filled-blob ρ does meaningful long-range
+node transport into shape interiors, and whether amplifying the
+blob makes OT pull harder. Every panel = 5 OT steps from the
+uniform mesh, rendered against the SHARP ρ (so you can judge
+whether the nodes ended up in the right places).
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_blob_amp')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+# Only import from the GUARDED analytic_disp module. The
+# multires script has unguarded top-level work — avoid that
+# import chain.
+from _test_metric_shapes_analytic_disp import (
+    sym_smax, sym_smin, sym_smin3, sym_sabs, sym_sech2, AMP)
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+def analytic_rho_eps(mesh, eps):
+    """Sech²-band ρ with parametrised band width (sharp ρ)."""
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2),
+        half_plane(v2, v0))
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + AMP * sym_sech2(d / eps)
+    return rho
+
+
+def shape_distances(mesh):
+    """Return (d_sq, d_dh, d_tr): signed-distance sympy fields,
+    POSITIVE inside each shape, smooth at the boundary."""
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2),
+        half_plane(v2, v0))
+    return d_sq, d_dh, d_tr
+
+
+def rho_blob_tanh(mesh, amp, eps_smooth):
+    """ρ = 1 + amp · 0.5 · (1 + tanh(d / eps))   per shape."""
+    d_sq, d_dh, d_tr = shape_distances(mesh)
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + amp * sympy.Rational(1, 2) * (
+            1 + sympy.tanh(d / eps_smooth))
+    return rho
+
+
+def rho_blob_gauss(mesh, amp, sigma):
+    """ρ = 1 + amp · exp(-(d_centroid)^2 / sigma^2) per shape —
+    radial blob centred on each shape's CENTROID (not distance
+    to boundary). Ignores shape geometry; pure 2-D Gaussian."""
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    centres = [(0.55, 0.35),     # rotated square centre
+               (-0.55, 0.45),    # doughnut centre
+               (0.10, -0.43)]    # triangle centroid
+    rho = sympy.Integer(1)
+    for cx, cy in centres:
+        r2 = (x - cx) ** 2 + (y - cy) ** 2
+        rho = rho + amp * sympy.exp(-r2 / sigma ** 2)
+    return rho
+
+
+def build_bg(rho_fn, *args, label=""):
+    m_bg = build_uniform_mesh()
+    T_bg = uw.discretisation.MeshVariable(
+        f"T_bg_{label}", m_bg, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+        rho_fn(m_bg, *args),
+        np.asarray(T_bg.coords))).reshape(-1)
+    clip = (1.0, float(T_bg.data[:, 0].max()))
+    pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+    pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+    return pv_bg, clip
+
+
+# SHARP background for ALL panels (so we judge where nodes land)
+pv_bg_sharp, clip_sharp = build_bg(
+    analytic_rho_eps, 0.04, label="sharp")
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+def measure_sharp_imb(mesh):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=analytic_rho_eps(mesh, eps=0.04),
+            method="ot", verbose=True, boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=0.0,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else float("nan")
+
+
+def run5(rho_fn, *args):
+    m = build_uniform_mesh()
+    rho = rho_fn(m, *args)
+    for _ in range(5):
+        step_OT(m, rho)
+    return m, measure_sharp_imb(m)
+
+
+# Six strategies, all 5 OT steps from uniform.
+def make_rho_eps(*a):
+    return lambda m: analytic_rho_eps(m, *a)
+
+
+RECIPES = [
+    ("A. sharp boundary band  AMP=8 (reference)",
+     lambda: run5(analytic_rho_eps, 0.04)),
+    ("B. tanh blob  AMP=8, eps=0.06",
+     lambda: run5(rho_blob_tanh, 8.0, 0.06)),
+    ("C. tanh blob  AMP=20, eps=0.06  (amplified)",
+     lambda: run5(rho_blob_tanh, 20.0, 0.06)),
+    ("D. tanh blob  AMP=50, eps=0.06  (HUGE amp)",
+     lambda: run5(rho_blob_tanh, 50.0, 0.06)),
+    ("E. Gaussian blob  AMP=8, sigma=0.18  (centred)",
+     lambda: run5(rho_blob_gauss, 8.0, 0.18)),
+    ("F. Gaussian blob  AMP=20, sigma=0.25",
+     lambda: run5(rho_blob_gauss, 20.0, 0.25)),
+]
+
+results = {}
+import time
+for label, runfn in RECIPES:
+    t0 = time.time()
+    print(f"\n=== {label} ===")
+    m, sharp_imb = runfn()
+    print(f"  sharp imb={sharp_imb:.3f}  "
+          f"({time.time()-t0:.1f}s)")
+    results[label] = (m, sharp_imb)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, sharp_imb = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nsharp imb={sharp_imb:.3f}",
+                font_size=18, color='black')
+    pl.add_mesh(pv_bg_sharp, scalars="rho", cmap="Blues",
+                clim=clip_sharp, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_blob_amp.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_blob_loose.py b/scripts/_test_metric_shapes_OT_blob_loose.py
new file mode 100644
index 00000000..a27f0781
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_blob_loose.py
@@ -0,0 +1,174 @@
+"""Blob OT with the brakes off: larger relax and step_frac so
+the OT can actually transport nodes long distances into the
+shape interiors. Previous run was strangled by step_frac=0.3
+and relax=0.1 (sharp-metric tuning) — total movement budget
+was 0.006 over 5 steps, vs shapes ~0.5 wide.
+"""
+import os, sys, io, re, contextlib, time
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_blob_loose')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import (
+    sym_smax, sym_smin, sym_smin3, sym_sabs, sym_sech2, AMP)
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+def shape_distances(mesh):
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2),
+        half_plane(v2, v0))
+    return d_sq, d_dh, d_tr
+
+
+def rho_blob_tanh(mesh, amp, eps):
+    d_sq, d_dh, d_tr = shape_distances(mesh)
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + amp * sympy.Rational(1, 2) * (
+            1 + sympy.tanh(d / eps))
+    return rho
+
+
+def rho_sharp(mesh, eps=0.04):
+    d_sq, d_dh, d_tr = shape_distances(mesh)
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + AMP * sym_sech2(d / eps)
+    return rho
+
+
+# Background = SHARP ρ (always)
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    rho_sharp(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho, relax, step_frac):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=step_frac))
+    out = buf.getvalue()
+    imb_m = re.search(r"imb=([0-9.e+-]+)", out)
+    scl_m = re.search(r"scale=([0-9.e+-]+)", out)
+    dx_m = re.search(r"max\|Δx\|=([0-9.e+-]+)", out)
+    return (float(imb_m.group(1)) if imb_m else None,
+            float(scl_m.group(1)) if scl_m else None,
+            float(dx_m.group(1)) if dx_m else None)
+
+
+def run(rho_fn, relax, step_frac, n=5):
+    m = build_uniform_mesh()
+    rho = rho_fn(m)
+    history = []
+    for k in range(n):
+        imb, scl, dx = step_OT(m, rho, relax, step_frac)
+        history.append((imb, scl, dx))
+    # measure final imb against sharp ρ
+    rho_s = rho_sharp(m)
+    imb_s, _, _ = step_OT(m, rho_s, 0.0, 0.3)
+    return m, history, imb_s
+
+
+RECIPES = [
+    # (label, rho factory, relax, step_frac)
+    ("A. sharp, relax=0.1, sf=0.3  (baseline)",
+     lambda m: rho_sharp(m), 0.1, 0.3),
+    ("B. blob AMP=8 eps=0.06, relax=0.1, sf=0.3",
+     lambda m: rho_blob_tanh(m, 8.0, 0.06), 0.1, 0.3),
+    ("C. blob AMP=8 eps=0.06, relax=1.0, sf=1.0",
+     lambda m: rho_blob_tanh(m, 8.0, 0.06), 1.0, 1.0),
+    ("D. blob AMP=20 eps=0.06, relax=1.0, sf=1.0",
+     lambda m: rho_blob_tanh(m, 20.0, 0.06), 1.0, 1.0),
+    ("E. blob AMP=20 eps=0.06, relax=0.5, sf=0.6",
+     lambda m: rho_blob_tanh(m, 20.0, 0.06), 0.5, 0.6),
+    ("F. blob AMP=20 eps=0.15 (wide), relax=1.0, sf=1.0",
+     lambda m: rho_blob_tanh(m, 20.0, 0.15), 1.0, 1.0),
+]
+
+results = {}
+for label, rho_fn, relax, sf in RECIPES:
+    t0 = time.time()
+    print(f"\n=== {label} ===")
+    m, hist, imb_s = run(rho_fn, relax, sf, n=5)
+    # Per-step diagnostics: was scale capped, and how far did nodes move?
+    for k, (imb, scl, dx) in enumerate(hist):
+        print(f"  step {k+1}: imb={imb}  scale={scl}  "
+              f"max|Δx|={dx:.3e}" if dx is not None
+              else f"  step {k+1}: --")
+    print(f"  final sharp imb={imb_s:.3f}  "
+          f"({time.time()-t0:.1f}s)")
+    results[label] = (m, imb_s)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, *_) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, imb_s = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nsharp imb={imb_s:.3f}",
+                font_size=18, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_blob_loose.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_blob_then_sharp.py b/scripts/_test_metric_shapes_OT_blob_then_sharp.py
new file mode 100644
index 00000000..92538494
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_blob_then_sharp.py
@@ -0,0 +1,127 @@
+"""Blob → Sharp chain.
+
+Original hypothesis: a wide AMP=20 blob with relax=1.0/sf=1.0
+transports nodes into broad halos around features (recipe F from
+blob_loose), and sharp OT then narrows those halos onto the true
+band positions. The schedule coded below instead uses the narrow,
+conservative blob (recipe B from blob_loose) as the first stage.
+Compare to raw sharp × N at matched compute.
+"""
+import os, sys, io, re, contextlib, time
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_blob_then_sharp')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_OT_blob_loose import (
+    build_uniform_mesh, rho_blob_tanh, rho_sharp)
+
+
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    rho_sharp(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho, relax, step_frac):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=step_frac))
+    out = buf.getvalue()
+    imb_m = re.search(r"imb=([0-9.e+-]+)", out)
+    dx_m = re.search(r"max\|Δx\|=([0-9.e+-]+)", out)
+    return (float(imb_m.group(1)) if imb_m else None,
+            float(dx_m.group(1)) if dx_m else None)
+
+
+def run_chain(schedule, label):
+    print(f"\n=== {label} ===")
+    t0 = time.time()
+    m = build_uniform_mesh()
+    history = []
+    for kind, n in schedule:
+        if kind == "narrow":
+            # narrow conservative blob — uses sharp's brake
+            # tuning (B from blob_loose: AMP=8 eps=0.06,
+            # relax=0.1, sf=0.3). Gentle preconditioning.
+            rho = rho_blob_tanh(m, 8.0, 0.06)
+            relax, sf = 0.1, 0.3
+        elif kind == "sharp":
+            rho = rho_sharp(m)
+            relax, sf = 0.1, 0.3
+        for _ in range(n):
+            imb, dx = step_OT(m, rho, relax, sf)
+            history.append((kind, imb, dx))
+            print(f"  {kind}: imb={imb}  max|Δx|={dx:.3e}"
+                  if dx is not None else f"  {kind}: --")
+    # final sharp imb measurement
+    imb_s, _ = step_OT(m, rho_sharp(m), 0.0, 0.3)
+    print(f"  FINAL sharp imb={imb_s:.3f}  "
+          f"({time.time()-t0:.1f}s)")
+    return m, imb_s
+
+
+# All recipes have 10 OT calls total (matched compute).
+RECIPES = [
+    ("A. sharp OT x 10 (raw reference)",
+     [("sharp", 10)]),
+    ("B. narrow x 5, sharp x 5",
+     [("narrow", 5), ("sharp", 5)]),
+    ("C. narrow x 3, sharp x 7",
+     [("narrow", 3), ("sharp", 7)]),
+    ("D. narrow x 7, sharp x 3",
+     [("narrow", 7), ("sharp", 3)]),
+    ("E. narrow x 2, sharp x 8",
+     [("narrow", 2), ("sharp", 8)]),
+    ("F. narrow x 10 alone (no sharp)",
+     [("narrow", 10)]),
+]
+
+results = {}
+for label, sched in RECIPES:
+    m, imb_s = run_chain(sched, label)
+    results[label] = (m, imb_s)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, imb_s = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nsharp imb={imb_s:.3f}",
+                font_size=18, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_blob_then_sharp.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_interleave.py b/scripts/_test_metric_shapes_OT_interleave.py
new file mode 100644
index 00000000..091d6ce8
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_interleave.py
@@ -0,0 +1,158 @@
+"""OT-improve + heuristic interleave sweep on synthetic shapes.
+
+Tests the architectural payoff of "OT as a composable step":
+each panel of the 2×3 grid is a different recipe alternating OT
+improvements with shape-quality moves. We track the OT imbalance
+metric (std of log(V·ρ/K)) per step and render the final mesh.
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_interleave')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+# ρ background (on its own undeformed reference)
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    # Pull the imbalance from the verbose line.
+    out = buf.getvalue()
+    m = re.search(r"imb=([0-9.e+-]+)", out)
+    return float(m.group(1)) if m else None
+
+
+def step_spring_metric(mesh, rho):
+    # spring with metric — uses ρ to grade per-cell rest lengths
+    with contextlib.redirect_stdout(io.StringIO()):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="spring",
+            boundary_slip="box")
+
+
+def step_jacobi(mesh):
+    # plain graph-Laplacian smoothing (no metric)
+    with contextlib.redirect_stdout(io.StringIO()):
+        uw.meshing.smooth_mesh_interior(
+            mesh, n_iters=3, alpha=0.5)
+
+
+# Recipes: each is a list of strings 'OT'|'spring'|'jacobi'
+# describing the per-call action. OT budget: 5 steps for A, C, D;
+# 9 for E, F; 15 for B (n_ot is reported per recipe).
+RECIPES = [
+    ("A. OT x 5 (baseline)",
+     ["OT"] * 5),
+    ("B. OT x 15 (raw push)",
+     ["OT"] * 15),
+    ("C. (OT, Jacobi) x 5",
+     ["OT", "jacobi"] * 5),
+    ("D. (OT, spring) x 5",
+     ["OT", "spring"] * 5),
+    ("E. (OT x 3, Jacobi) x 3",
+     (["OT"] * 3 + ["jacobi"]) * 3),
+    ("F. (OT x 3, spring) x 3",
+     (["OT"] * 3 + ["spring"]) * 3),
+]
+
+
+def run_recipe(label, actions, rho_sym):
+    m = build_uniform_mesh()
+    # Build a sympy ρ on THIS mesh's coord system (rho_sym is unused).
+    rho = analytic_rho(m)
+    imb_traj = []
+    for step in actions:
+        if step == "OT":
+            imb = step_OT(m, rho)
+            if imb is not None:
+                imb_traj.append(imb)
+        elif step == "spring":
+            step_spring_metric(m, rho)
+        elif step == "jacobi":
+            step_jacobi(m)
+        else:
+            raise ValueError(step)
+    return m, imb_traj
+
+
+# Run all recipes
+results = {}
+for label, actions in RECIPES:
+    print(f"\n=== {label} ({len(actions)} actions) ===")
+    m, imb = run_recipe(label, actions, None)
+    final_imb = imb[-1] if imb else float("nan")
+    n_ot = sum(1 for a in actions if a == "OT")
+    print(f"  {n_ot} OT steps; imb traj first/last: "
+          f"{imb[0] if imb else 'NA'} → {final_imb}")
+    results[label] = (m, imb, final_imb, n_ot)
+
+
+# Render 2×3 grid
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, imb, final_imb, n_ot = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nimb={final_imb:.3f}  "
+                f"({n_ot} OT)",
+                font_size=20, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_interleave.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
+
+# Print imb trajectories for the record
+print("\n--- imbalance trajectories (per OT step) ---")
+for label, _ in RECIPES:
+    _, imb, _, _ = results[label]
+    traj_str = " ".join(f"{v:.3f}" for v in imb)
+    print(f"  {label}\n    {traj_str}")
diff --git a/scripts/_test_metric_shapes_OT_interleave2.py b/scripts/_test_metric_shapes_OT_interleave2.py
new file mode 100644
index 00000000..5bba284d
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_interleave2.py
@@ -0,0 +1,153 @@
+"""Refined OT-interleave sweep: always end on OT; vary the
+intermediate heuristic and its tuning. Apples-to-apples on
+"how many OT calls" (5 OT each, as in baseline A; D uses 3).
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_interleave2')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=0.1,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+def step_jacobi(mesh, n_iters=3, alpha=0.5):
+    with contextlib.redirect_stdout(io.StringIO()):
+        uw.meshing.smooth_mesh_interior(
+            mesh, n_iters=n_iters, alpha=alpha)
+
+
+def step_spring(mesh, rho, size_w=8.0, shape_w=1.0, n_sweeps=300):
+    with contextlib.redirect_stdout(io.StringIO()):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="spring",
+            boundary_slip="box",
+            method_kwargs=dict(size_w=size_w, shape_w=shape_w,
+                                n_sweeps=n_sweeps))
+
+
+# Each recipe: list of callables that take (mesh, rho).
+def OT(m, r):
+    return step_OT(m, r)
+
+
+def jac(n=3):
+    def _f(m, r):
+        step_jacobi(m, n_iters=n)
+    return _f
+
+
+def spr(size_w=8.0, shape_w=1.0, n_sweeps=300):
+    def _f(m, r):
+        step_spring(m, r, size_w=size_w, shape_w=shape_w,
+                    n_sweeps=n_sweeps)
+    return _f
+
+
+# All recipes end on OT. Each has 5 OT calls, except D (3 OT).
+RECIPES = [
+    ("A. OT x 5 (ref)",
+     [OT] * 5),
+    ("B. (OT, jac3) x 4 + OT",
+     [OT, jac(3), OT, jac(3), OT, jac(3), OT, jac(3), OT]),
+    ("C. (jac3, OT) x 5",
+     [jac(3), OT, jac(3), OT, jac(3), OT, jac(3), OT, jac(3), OT]),
+    ("D. OT, jac3, OT, jac3, OT (3 OT)",
+     [OT, jac(3), OT, jac(3), OT]),
+    ("E. (OT, spr_shape) x 4 + OT  (size_w=0)",
+     [OT, spr(size_w=0.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=0.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=0.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=0.0, shape_w=1.0, n_sweeps=60),
+      OT]),
+    ("F. (OT, spr_light) x 4 + OT  (size_w=2)",
+     [OT, spr(size_w=2.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=2.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=2.0, shape_w=1.0, n_sweeps=60),
+      OT, spr(size_w=2.0, shape_w=1.0, n_sweeps=60),
+      OT]),
+]
+
+
+results = {}
+for label, actions in RECIPES:
+    print(f"\n=== {label} ===")
+    m = build_uniform_mesh()
+    rho = analytic_rho(m)
+    imb_traj = []
+    for act in actions:
+        ret = act(m, rho)
+        if ret is not None:
+            imb_traj.append(ret)
+    final = imb_traj[-1] if imb_traj else float("nan")
+    n_ot = sum(1 for a in actions if a is OT)
+    print(f"  {n_ot} OT calls; imb {(imb_traj or [final])[0]:.3f} → {final:.3f}")
+    print(f"  traj: {' '.join(f'{v:.3f}' for v in imb_traj)}")
+    results[label] = (m, imb_traj, final, n_ot)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, imb, final, n_ot = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nimb={final:.3f}  ({n_ot} OT)",
+                font_size=20, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_interleave2.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_jacobi.py b/scripts/_test_metric_shapes_OT_jacobi.py
new file mode 100644
index 00000000..0cd432d5
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_jacobi.py
@@ -0,0 +1,136 @@
+"""Jacobi-as-preconditioner sweep: test whether a light
+Jacobi pass BEFORE each OT step lets OT take a larger relax
+than it could safely take alone.
+
+Hypothesis: pure OT at relax=0.1 is mesh-quality-limited
+(backtrack engages once cells get anisotropic). If Jacobi
+restores cell shape between OTs, OT can run at higher relax
+and converge faster. Trade-off: Jacobi pulls toward centroid
+(anti-OT redistribution) so a sweet spot exists.
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_jacobi')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho(m_bg), np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+def step_jacobi(mesh, n_iters=1, alpha=0.3):
+    with contextlib.redirect_stdout(io.StringIO()):
+        uw.meshing.smooth_mesh_interior(
+            mesh, n_iters=n_iters, alpha=alpha)
+
+
+def OT(relax=0.1):
+    def _f(m, r):
+        return step_OT(m, r, relax=relax)
+    return _f
+
+
+def jac(n=1, a=0.3):
+    def _f(m, r):
+        step_jacobi(m, n_iters=n, alpha=a)
+    return _f
+
+
+# All recipes end on OT. Compare matched OT-count (5 OT) vs
+# matched-imbalance ("how fast does it converge?").
+RECIPES = [
+    ("A. OT(0.1) x 5 (baseline)",
+     [OT(0.1)] * 5),
+    ("B. OT(0.3) x 5 (raw bigger step)",
+     [OT(0.3)] * 5),
+    ("C. (jac1_a0.3, OT0.1) x 4 + OT0.1",
+     [jac(1, 0.3), OT(0.1)] * 4 + [OT(0.1)]),
+    ("D. (jac1_a0.3, OT0.3) x 4 + OT0.3",
+     [jac(1, 0.3), OT(0.3)] * 4 + [OT(0.3)]),
+    ("E. (jac3_a0.5, OT0.3) x 4 + OT0.3",
+     [jac(3, 0.5), OT(0.3)] * 4 + [OT(0.3)]),
+    ("F. (jac1_a0.3, OT0.6) x 4 + OT0.6",
+     [jac(1, 0.3), OT(0.6)] * 4 + [OT(0.6)]),
+]
+
+
+results = {}
+for label, actions in RECIPES:
+    print(f"\n=== {label} ===")
+    m = build_uniform_mesh()
+    rho = analytic_rho(m)
+    imb_traj = []
+    for act in actions:
+        ret = act(m, rho)
+        if ret is not None:
+            imb_traj.append(ret)
+    final = imb_traj[-1] if imb_traj else float("nan")
+    print(f"  imb {(imb_traj or [final])[0]:.3f} → {final:.3f}")
+    print(f"  traj: {' '.join(f'{v:.3f}' for v in imb_traj)}")
+    results[label] = (m, imb_traj, final)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(RECIPES):
+    row, col = i // ncols, i % ncols
+    m, imb, final = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nimb={final:.3f}",
+                font_size=20, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_jacobi.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_levels.py b/scripts/_test_metric_shapes_OT_levels.py
new file mode 100644
index 00000000..909fc019
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_levels.py
@@ -0,0 +1,165 @@
+"""Compare the two halves of recipe B side-by-side:
+  (i)  5 OT steps with SMOOTHED ρ (EPS=0.16) only
+  (ii) 5 OT steps with SHARP ρ (EPS=0.04) only
+
+This isolates what each level of the multi-res chain does on
+its own — so we can see whether the smoothed level moves nodes
+toward useful "coarse" positions or just settles into something
+unrelated to the sharp metric.
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_levels')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_OT_multires import (
+    analytic_rho_eps, build_uniform_mesh)
+
+
+# Background = TRUE sharp ρ (always)
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho_eps(m_bg, eps=0.04),
+    np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+# Background = SMOOTH ρ for the smoothed panel
+m_bg2 = build_uniform_mesh()
+T_bg2 = uw.discretisation.MeshVariable(
+    "T_bg_rho_smooth", m_bg2, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg2.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho_eps(m_bg2, eps=0.16),
+    np.asarray(T_bg2.coords))).reshape(-1)
+rho_clip2 = (1.0, float(T_bg2.data[:, 0].max()))
+pv_bg2 = vis.meshVariable_to_pv_mesh_object(T_bg2)
+pv_bg2.point_data["rho"] = np.asarray(T_bg2.data[:, 0])
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+def run_N_OT(eps, n_steps):
+    m = build_uniform_mesh()
+    rho = analytic_rho_eps(m, eps=eps)
+    traj = []
+    for _ in range(n_steps):
+        ret = step_OT(m, rho, relax=0.1)
+        if ret is not None:
+            traj.append(ret)
+    return m, traj
+
+
+# (i) Smoothed-only: 5 OT × EPS=0.16
+print("=== (i) 5 OT @ EPS=0.16 (smoothed ρ only) ===")
+m_smooth, traj_smooth = run_N_OT(eps=0.16, n_steps=5)
+print(f"  imb @ smoothed: "
+      f"{' '.join(f'{v:.3f}' for v in traj_smooth)}")
+
+# Measure final state's imbalance against SHARP ρ too — what
+# does the smoothed-OT mesh look like to the real target?
+buf = io.StringIO()
+with contextlib.redirect_stdout(buf):
+    uw.meshing.smooth_mesh_interior(
+        m_smooth, metric=analytic_rho_eps(m_smooth, eps=0.04),
+        method="ot", verbose=True, boundary_slip="box",
+        method_kwargs=dict(n_outer=1, relax=0.0, step_frac=0.3))
+m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+sharp_imb_smooth = float(m.group(1)) if m else float("nan")
+print(f"  imb of smoothed result vs SHARP ρ: {sharp_imb_smooth:.3f}")
+
+# (ii) Sharp-only: 5 OT × EPS=0.04
+print("\n=== (ii) 5 OT @ EPS=0.04 (sharp ρ only) ===")
+m_sharp, traj_sharp = run_N_OT(eps=0.04, n_steps=5)
+print(f"  imb @ sharp: "
+      f"{' '.join(f'{v:.3f}' for v in traj_sharp)}")
+
+
+# 2×2 render. Top row: smoothed-mesh on smoothed-ρ and sharp-mesh
+# on sharp-ρ. Bottom row: both meshes against the SAME (sharp) ρ
+# for direct mesh comparison.
+ncols, nrows = 2, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1300 * ncols, 1300 * nrows),
+                border=False)
+pl.set_background("white")
+
+# Top row: each panel against its own ρ
+edges_smooth = vis.mesh_to_pv_mesh(m_smooth).extract_all_edges()
+edges_sharp = vis.mesh_to_pv_mesh(m_sharp).extract_all_edges()
+
+pl.subplot(0, 0)
+pl.add_text(f"(i) 5 OT @ EPS=0.16  (smoothed rho)\n"
+            f"imb={traj_smooth[-1]:.3f}",
+            font_size=20, color='black')
+pl.add_mesh(pv_bg2, scalars="rho", cmap="Blues",
+            clim=rho_clip2, show_edges=False, lighting=False,
+            show_scalar_bar=False, opacity=0.85)
+pl.add_mesh(edges_smooth, color="black", line_width=1.0,
+            lighting=False, opacity=0.85)
+pl.view_xy(); pl.camera.zoom(1.15)
+
+pl.subplot(0, 1)
+pl.add_text(f"(ii) 5 OT @ EPS=0.04  (sharp rho)\n"
+            f"imb={traj_sharp[-1]:.3f}",
+            font_size=20, color='black')
+pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+            clim=rho_clip, show_edges=False, lighting=False,
+            show_scalar_bar=False, opacity=0.85)
+pl.add_mesh(edges_sharp, color="black", line_width=1.0,
+            lighting=False, opacity=0.85)
+pl.view_xy(); pl.camera.zoom(1.15)
+
+# Bottom row: both meshes against the SAME SHARP background
+pl.subplot(1, 0)
+pl.add_text(f"(i) smoothed-OT mesh on SHARP rho\n"
+            f"sharp imb={sharp_imb_smooth:.3f}",
+            font_size=20, color='black')
+pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+            clim=rho_clip, show_edges=False, lighting=False,
+            show_scalar_bar=False, opacity=0.85)
+pl.add_mesh(edges_smooth, color="black", line_width=1.0,
+            lighting=False, opacity=0.85)
+pl.view_xy(); pl.camera.zoom(1.15)
+
+pl.subplot(1, 1)
+pl.add_text(f"(ii) sharp-OT mesh on SHARP rho\n"
+            f"sharp imb={traj_sharp[-1]:.3f}",
+            font_size=20, color='black')
+pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+            clim=rho_clip, show_edges=False, lighting=False,
+            show_scalar_bar=False, opacity=0.85)
+pl.add_mesh(edges_sharp, color="black", line_width=1.0,
+            lighting=False, opacity=0.85)
+pl.view_xy(); pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_levels.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_OT_multires.py b/scripts/_test_metric_shapes_OT_multires.py
new file mode 100644
index 00000000..3784dba5
--- /dev/null
+++ b/scripts/_test_metric_shapes_OT_multires.py
@@ -0,0 +1,179 @@
+"""Multi-resolution OT (ρ-widening homotopy) on synthetic shapes.
+
+Idea: start with a SMOOTHED ρ (broad sech² bumps, large EPS)
+so the OT source has long support → nodes can transport across
+the domain in early steps. Progressively narrow EPS until we
+reach the true sharp ρ — annealing-style multi-scale OT.
+
+Compares against raw OT × N at matched compute. The
+`analytic_rho_eps` parametrised builder takes an EPS so we can
+choose the bump width per level.
+"""
+import os
+import sys
+import io
+import re
+import contextlib
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_OT_multires')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import (
+    sym_smax, sym_smin, sym_smin3, sym_sabs, sym_sech2,
+    AMP)
+
+
+def analytic_rho_eps(mesh, eps):
+    """Same shapes as analytic_rho but with parametrised band
+    width EPS — wide EPS = smoothed metric for multi-res."""
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2),
+        half_plane(v2, v0))
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + AMP * sym_sech2(d / eps)
+    return rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+# ρ background — true (sharp) ρ for visualisation
+m_bg = build_uniform_mesh()
+T_bg = uw.discretisation.MeshVariable(
+    "T_bg_rho", m_bg, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True)
+T_bg.data[:, 0] = np.asarray(uw.function.evaluate(
+    analytic_rho_eps(m_bg, eps=0.04),
+    np.asarray(T_bg.coords))).reshape(-1)
+rho_clip = (1.0, float(T_bg.data[:, 0].max()))
+pv_bg = vis.meshVariable_to_pv_mesh_object(T_bg)
+pv_bg.point_data["rho"] = np.asarray(T_bg.data[:, 0])
+
+
+def step_OT(mesh, rho, relax=0.1):
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=relax,
+                                step_frac=0.3))
+    m = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    return float(m.group(1)) if m else None
+
+
+# Multi-res schedules: (eps, n_OT_at_this_level)
+SCHEDULES = [
+    ("A. raw OT x 10 (EPS=0.04)",
+     [(0.04, 10)]),
+    ("B. 2-level: EPS=0.16 x 5, EPS=0.04 x 5",
+     [(0.16, 5), (0.04, 5)]),
+    ("C. 3-level: EPS=0.16,0.08,0.04 x (3,3,4)",
+     [(0.16, 3), (0.08, 3), (0.04, 4)]),
+    ("D. 3-level wide: EPS=0.32,0.10,0.04 x (3,3,4)",
+     [(0.32, 3), (0.10, 3), (0.04, 4)]),
+    ("E. 4-level: EPS=0.32,0.16,0.08,0.04 x (2,2,2,4)",
+     [(0.32, 2), (0.16, 2), (0.08, 2), (0.04, 4)]),
+    ("F. 5-level: EPS=0.48,0.24,0.12,0.06,0.04 x (2,2,2,2,2)",
+     [(0.48, 2), (0.24, 2), (0.12, 2), (0.06, 2), (0.04, 2)]),
+]
+
+
+results = {}
+for label, schedule in SCHEDULES:
+    n_ot = sum(n for _, n in schedule)
+    print(f"\n=== {label} ({n_ot} OT total) ===")
+    m = build_uniform_mesh()
+    imb_traj = []
+    for eps, n_steps in schedule:
+        rho_lvl = analytic_rho_eps(m, eps)
+        for _ in range(n_steps):
+            ret = step_OT(m, rho_lvl, relax=0.1)
+            if ret is not None:
+                imb_traj.append(ret)
+    # Final imbalance is measured against the SHARP ρ — that's
+    # what we actually care about. Do one zero-step measurement
+    # using the true ρ.
+    rho_sharp = analytic_rho_eps(m, eps=0.04)
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            m, metric=rho_sharp, method="ot", verbose=True,
+            boundary_slip="box",
+            method_kwargs=dict(n_outer=1, relax=0.0,
+                                step_frac=0.3))
+    msr = re.search(r"imb=([0-9.e+-]+)", buf.getvalue())
+    sharp_final = float(msr.group(1)) if msr else float("nan")
+    print("  level-eps trajectory:")
+    cursor = 0
+    for eps, n_steps in schedule:
+        segment = imb_traj[cursor:cursor + n_steps]
+        cursor += n_steps
+        print(f"    EPS={eps}: "
+              f"{' '.join(f'{v:.3f}' for v in segment)}")
+    print(f"  FINAL imb vs SHARP ρ (EPS=0.04): "
+          f"{sharp_final:.3f}")
+    results[label] = (m, imb_traj, sharp_final, n_ot)
+
+
+ncols, nrows = 3, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, _) in enumerate(SCHEDULES):
+    row, col = i // ncols, i % ncols
+    m, imb, sharp_final, n_ot = results[label]
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"{label}\nsharp imb={sharp_final:.3f}  "
+                f"({n_ot} OT)",
+                font_size=18, color='black')
+    pl.add_mesh(pv_bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False,
+                opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_OT_multires.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_analytic_disp.py b/scripts/_test_metric_shapes_analytic_disp.py
new file mode 100644
index 00000000..d91ce293
--- /dev/null
+++ b/scripts/_test_metric_shapes_analytic_disp.py
@@ -0,0 +1,153 @@
+"""Diagnostic probe: per-iter max|Δx| trajectory with an
+*analytic* (Eulerian) sympy ρ vs. the Lagrangian
+`metric_density_from_gradient` ρ.
+
+If the user's hypothesis is correct, then with analytic ρ +
+`metric_refresh_per_iter=True` the per-iter displacement should
+decay roughly geometrically (true damped Picard) rather than
+stalling at ~constant (the documented Lagrangian failure).
+
+The analytic ρ mirrors the three synthetic shapes in
+``_test_metric_shapes.py`` via sech²-banded smooth signed
+distances — a pure sympy function of mesh.X, no MeshVariable
+behind it.
+"""
+import io
+import contextlib
+import os
+import re
+import sys
+import numpy as np
+import sympy
+import underworld3 as uw
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes import build_mesh_with_field
+
+EPS = 0.04       # band width in ρ for analytic Eulerian probe
+AMP = 8.0        # peak (boundary) excess over bulk: ρ_peak ≈ 1+AMP
+SOFT = 0.005     # smooth-max softness (units of coords)
+
+
+def sym_smax(a, b, soft=SOFT):
+    """Algebraic smooth max — sqrt of an Add, C-printable.
+
+    (a+b+sqrt((a−b)² + s²))/2 → max(a,b) as s→0; smooth for s>0.
+    """
+    return (a + b + sympy.sqrt((a - b) ** 2 + soft ** 2)) / 2
+
+
+def sym_smin(a, b, soft=SOFT):
+    return -sym_smax(-a, -b, soft)
+
+
+def sym_smin3(a, b, c, soft=SOFT):
+    return sym_smin(sym_smin(a, b, soft), c, soft)
+
+
+def sym_sabs(x, soft=SOFT):
+    """Smooth abs via sqrt (used for the square SDF)."""
+    return sympy.sqrt(x * x + soft * soft)
+
+
+def sym_sech2(z):
+    """sech²(z) = 1/cosh²(z) — printable in C99 (cosh is supported)."""
+    return 1 / sympy.cosh(z) ** 2
+
+
+def analytic_rho(mesh):
+    """Build sympy ρ(x,y) = 1 + AMP · Σ sech²(d_shape / EPS),
+    mirroring the three shapes from _test_metric_shapes.py.
+
+    Pure sympy expression in mesh.X — truly Eulerian.
+    """
+    X = mesh.CoordinateSystem.X
+    x, y = X[0], X[1]
+
+    # Square: centre (0.55, 0.35), side 0.4, angle 30°.
+    cx_sq, cy_sq, side = 0.55, 0.35, 0.4
+    ang_rad = 30.0 * np.pi / 180.0
+    ct, st = float(np.cos(ang_rad)), float(np.sin(ang_rad))
+    dxs, dys = x - cx_sq, y - cy_sq
+    xp = ct * dxs + st * dys
+    yp = -st * dxs + ct * dys
+    d_sq = side / 2 - sym_smax(sym_sabs(xp), sym_sabs(yp))
+
+    # Doughnut: centre (-0.55, 0.45), r ∈ [0.15, 0.30].
+    cx_dh, cy_dh, r_in, r_out = -0.55, 0.45, 0.15, 0.30
+    r = sympy.sqrt((x - cx_dh) ** 2 + (y - cy_dh) ** 2)
+    d_dh = sym_smin(r - r_in, r_out - r)
+
+    # Triangle: CCW vertices.
+    v0 = (sympy.Float(0.05), sympy.Float(-0.65))
+    v1 = (sympy.Float(0.55), sympy.Float(-0.35))
+    v2 = (sympy.Float(-0.30), sympy.Float(-0.30))
+
+    def half_plane(a, b):
+        ex, ey = b[0] - a[0], b[1] - a[1]
+        nx, ny = -ey, ex
+        nl = sympy.sqrt(nx * nx + ny * ny)
+        return ((x - a[0]) * nx + (y - a[1]) * ny) / nl
+
+    d_tr = sym_smin3(
+        half_plane(v0, v1), half_plane(v1, v2), half_plane(v2, v0))
+
+    # ρ = 1 + AMP · Σ sech²(d / EPS). cosh-based, C-printable.
+    rho = sympy.Integer(1)
+    for d in (d_sq, d_dh, d_tr):
+        rho = rho + AMP * sym_sech2(d / EPS)
+    return rho
+
+
+def _extract_disp(verbose_out: str):
+    rows = []
+    for line in verbose_out.splitlines():
+        mch = re.search(
+            r"outer\s+(\d+)/\d+:.*max\|Δx\|=([0-9.e+-]+)", line)
+        if mch:
+            rows.append((int(mch.group(1)), float(mch.group(2))))
+    return rows
+
+
+def _run_mover(label, metric, n_outer, relax, refresh):
+    m, T = build_mesh_with_field()
+    if isinstance(metric, str) and metric == "lagrangian":
+        rho = uw.meshing.metric_density_from_gradient(
+            m, T, refinement=3.0,
+            name=f"lag_{label}_{n_outer}_{relax}_{refresh}")
+    else:
+        rho = analytic_rho(m)
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        uw.meshing.smooth_mesh_interior(
+            m, metric=rho, method="anisotropic", strategy="med",
+            method_kwargs=dict(
+                n_outer=n_outer, relax=relax,
+                metric_refresh_per_iter=refresh),
+            verbose=True)
+    return _extract_disp(buf.getvalue())
+
+
+CASES = [
+    # (label, ρ kind, refresh)
+    ("Lagrangian ρ — refresh=False (baseline)", "lagrangian", False),
+    ("Lagrangian ρ — refresh=True", "lagrangian", True),
+    ("Analytic ρ   — refresh=False", "analytic", False),
+    ("Analytic ρ   — refresh=True (Eulerian)", "analytic", True),
+]
+
+N_OUTER = 12
+RELAXES = [1.0, 0.5, 0.2]
+
+
+if __name__ == "__main__":
+    for relax in RELAXES:
+        print(f"\n{'=' * 66}")
+        print(f"  n_outer={N_OUTER}, relax={relax}")
+        print(f"{'=' * 66}")
+        for label, kind, refresh in CASES:
+            rows = _run_mover(label, kind, N_OUTER, relax, refresh)
+            traj = " ".join(f"{d:.2e}" for _, d in rows)
+            print(f"\n  {label}")
+            print(f"    {traj}")
+    # end if __name__
diff --git a/scripts/_test_metric_shapes_analytic_iter2.py b/scripts/_test_metric_shapes_analytic_iter2.py
new file mode 100644
index 00000000..21c6e41b
--- /dev/null
+++ b/scripts/_test_metric_shapes_analytic_iter2.py
@@ -0,0 +1,109 @@
+"""Visual confirmation: 2×2 panel mesh comparison testing the
+Lagrangian-vs-Eulerian metric hypothesis.
+
+Panels:
+  (A) Lagrangian ρ, default 12×0.2   — current production
+  (B) Lagrangian ρ, 1×1.0            — user's "clean" reference
+  (C) Analytic Eulerian ρ + refresh, 12×0.2  — truly Eulerian
+  (D) Analytic Eulerian ρ + refresh, 1×1.0
+
+If the Eulerian-D hypothesis were correct, panel (C) should
+look CLEANER than (A). Per the disp trajectory in
+_test_metric_shapes_analytic_disp.py, max|Δx| at 12×0.2 GROWS
+with Eulerian refresh (positive feedback) — so we expect (C)
+to look WORSE than (A).
+"""
+import os
+import sys
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_analytic_iter2')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes import build_mesh_with_field, shape_field
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+CASES = [
+    ("A. Lagrangian rho, 12 x 0.2 (default)",
+     "lagrangian", dict(n_outer=12, relax=0.2,
+                         metric_refresh_per_iter=False)),
+    ("B. Lagrangian rho, 1 x 1.0 (clean ref)",
+     "lagrangian", dict(n_outer=1, relax=1.0,
+                         metric_refresh_per_iter=False)),
+    ("C. Analytic Eul rho + refresh, 12 x 0.2",
+     "analytic",   dict(n_outer=12, relax=0.2,
+                         metric_refresh_per_iter=True)),
+    ("D. Analytic Eul rho + refresh, 1 x 1.0",
+     "analytic",   dict(n_outer=1, relax=1.0,
+                         metric_refresh_per_iter=True)),
+]
+
+
+for label, kind, kw in CASES:
+    out_dir = os.path.join(
+        OUT, label.replace(" ", "_").replace(",", "")
+              .replace(".", "p"))
+    os.makedirs(out_dir, exist_ok=True)
+    if os.path.exists(os.path.join(out_dir, "adapted.mesh.00000.h5")):
+        print(f"{label}: cached")
+        continue
+    print(f"{label}: adapting")
+    m, T = build_mesh_with_field()
+    if kind == "lagrangian":
+        rho = uw.meshing.metric_density_from_gradient(
+            m, T, refinement=3.0, name=label.replace(" ", "_"))
+    else:
+        rho = analytic_rho(m)
+    uw.meshing.smooth_mesh_interior(
+        m, metric=rho, method="anisotropic", strategy="med",
+        method_kwargs=kw, verbose=False)
+    m.write_timestep(filename="adapted", index=0,
+                     outputPath=out_dir, meshVars=[T],
+                     meshUpdates=True, create_xdmf=True)
+
+
+ncols, nrows = 2, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1500 * ncols, 1500 * nrows),
+                border=False)
+pl.set_background("white")
+
+for i, (label, kind, kw) in enumerate(CASES):
+    row, col = i // ncols, i % ncols
+    out_dir = os.path.join(
+        OUT, label.replace(" ", "_").replace(",", "")
+              .replace(".", "p"))
+    m = uw.discretisation.Mesh(
+        os.path.join(out_dir, "adapted.mesh.00000.h5"))
+    # Make a degree-3 viz var on the adapted mesh, fill it with
+    # the ANALYTIC indicator field evaluated at the deformed
+    # DOF positions. This shows the shapes at their TRUE physical
+    # positions, with the adapted mesh overlaid — the only honest
+    # way to see whether the mesh tracks the features.
+    T = uw.discretisation.MeshVariable(
+        f"T_view_{i}", m, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T.data[:, 0] = shape_field(np.asarray(T.coords))
+    pv_T = vis.meshVariable_to_pv_mesh_object(T)
+    pv_T.point_data["T"] = np.asarray(T.data[:, 0])
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(label, font_size=20, color='black')
+    pl.add_mesh(pv_T, scalars="T", cmap="Blues",
+                clim=(0.0, 1.0), show_edges=False,
+                lighting=False, show_scalar_bar=False)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_analytic_iter2.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"wrote {out_png}")
diff --git a/scripts/_test_metric_shapes_iter_inside.py b/scripts/_test_metric_shapes_iter_inside.py
new file mode 100644
index 00000000..26a299d0
--- /dev/null
+++ b/scripts/_test_metric_shapes_iter_inside.py
@@ -0,0 +1,120 @@
+"""Per-outer-iteration mesh snapshots from INSIDE ONE
+smooth_mesh_interior call with truly Eulerian metric, refresh
+and relax=1.0.
+
+Why this matters: calling smooth_mesh_interior multiple times
+with n_outer=1 each does NOT faithfully replay the mover's
+internal outer-iteration loop (a separate cache bug: the second
+call gives u=0). To see what iter 2 actually does, we need to
+snapshot the mesh state inside the running outer loop.
+
+We do that by patching mesh._deform_mesh to capture old/new
+coords on every call.
+
+Question being answered: with an Eulerian target metric, does
+iter 2 *correct* iter 1 (small move toward target) or apply
+another full displacement (overshoot)?
+"""
+import os
+import sys
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_iter_inside')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+m = build_uniform_mesh()
+rho_sym = analytic_rho(m)
+
+# ---- ρ background lattice (independent of mesh, analytic) ---
+X = m.CoordinateSystem.X
+rho_lam = sympy.lambdify((X[0], X[1]), rho_sym, "numpy")
+xv = np.linspace(-1.0, 1.0, 401)
+yv = np.linspace(-1.0, 1.0, 401)
+XX, YY = np.meshgrid(xv, yv, indexing="ij")  # RHO[i, j] = rho(x_i, y_j)
+RHO = rho_lam(XX, YY)
+
+bg = pv.ImageData(
+    dimensions=(401, 401, 1),
+    spacing=(2.0 / 400, 2.0 / 400, 1.0),
+    origin=(-1.0, -1.0, 0.0))
+bg.point_data["rho"] = RHO.ravel(order="F")
+rho_clip = (1.0, float(RHO.max()))
+
+# ---- snapshot patch ----------------------------------------
+captured = [np.asarray(m.X.coords).copy()]
+orig_deform = m._deform_mesh
+
+
+def deform_and_capture(new_coords, *args, **kwargs):
+    res = orig_deform(new_coords, *args, **kwargs)
+    captured.append(np.asarray(m.X.coords).copy())
+    return res
+
+
+m._deform_mesh = deform_and_capture
+
+# ---- one call, n_outer=12 ----------------------------------
+N_OUTER = 12
+RELAX = 1.0
+uw.meshing.smooth_mesh_interior(
+    m, metric=rho_sym, method="anisotropic", strategy="med",
+    method_kwargs=dict(n_outer=N_OUTER, relax=RELAX,
+                       metric_refresh_per_iter=True),
+    verbose=True)
+
+m._deform_mesh = orig_deform
+
+print(f"\ncaptured {len(captured)} mesh states "
+      f"(iter 0 + {len(captured)-1} outer iters)")
+for it, c in enumerate(captured):
+    if it == 0:
+        continue
+    dx = np.linalg.norm(c - captured[it - 1], axis=1)
+    cum = np.linalg.norm(c - captured[0], axis=1)
+    print(f"  iter {it:2d}: this-step max={dx.max():.3e}  "
+          f"cumulative max={cum.max():.3e}")
+
+# ---- render ------------------------------------------------
+panels = [0, 1, 2, 3, 4, 6, 8, 12]
+panels = [p for p in panels if p < len(captured)]
+ncols, nrows = 4, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+for panel_idx, it in enumerate(panels):
+    row, col = panel_idx // ncols, panel_idx % ncols
+    m_viz = build_uniform_mesh()
+    m_viz._deform_mesh(captured[it])
+    edges = vis.mesh_to_pv_mesh(m_viz).extract_all_edges()
+    pl.subplot(row, col)
+    pl.add_text(f"iter {it}", font_size=24, color='black')
+    pl.add_mesh(bg, scalars="rho", cmap="Blues", clim=rho_clip,
+                show_edges=False, lighting=False,
+                show_scalar_bar=False, opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_iter_inside.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/_test_metric_shapes_iter_sequence.py b/scripts/_test_metric_shapes_iter_sequence.py
new file mode 100644
index 00000000..23eef3b2
--- /dev/null
+++ b/scripts/_test_metric_shapes_iter_sequence.py
@@ -0,0 +1,113 @@
+"""Per-iteration mesh sequence with TRULY EULERIAN metric.
+
+Take the analytic ρ(x,y) (pure sympy, frozen in physical space)
+and step the mover one outer iteration at a time, relax=1.0,
+refresh=True. Save the mesh + the proposed full displacement
+(scale=1 *before* backtrack) at every step.
+
+Render the sequence with ρ as the background (NOT the
+Lagrangian T) — so the shapes stay where they really are and we
+can see whether the mesh is being pulled toward the metric or
+oscillating around it.
+
+Question being answered: with an Eulerian target metric, does
+iter 2 *correct* iter 1 (move smaller, toward equidistribution),
+or does it apply another full displacement on top?
+"""
+import os
+import sys
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+OUT = os.path.expanduser(
+    '~/+Simulations/StagnantLid/synthetic_shapes_iter_sequence')
+os.makedirs(OUT, exist_ok=True)
+
+sys.path.insert(0, os.path.dirname(__file__))
+from _test_metric_shapes_analytic_disp import analytic_rho
+
+# Iterations to capture (0 = undeformed)
+ITERS_TO_CAPTURE = [0, 1, 2, 3, 4, 6, 8, 12]
+
+
+def build_uniform_mesh():
+    return uw.meshing.UnstructuredSimplexBox(
+        minCoords=(-1.0, -1.0), maxCoords=(1.0, 1.0),
+        cellSize=0.04, qdegree=3)
+
+
+# Re-use one mesh, step it one outer iter at a time, recording.
+m = build_uniform_mesh()
+rho_sym = analytic_rho(m)
+
+# Sample ρ on a viz lattice (independent of the mesh) for the
+# background. ρ is a sympy expression — evaluate analytically.
+import sympy
+X = m.CoordinateSystem.X
+rho_lam = sympy.lambdify((X[0], X[1]), rho_sym, "numpy")
+xv = np.linspace(-1.0, 1.0, 401)
+yv = np.linspace(-1.0, 1.0, 401)
+XX, YY = np.meshgrid(xv, yv, indexing="ij")  # RHO[i, j] = rho(x_i, y_j)
+RHO = rho_lam(XX, YY)
+rho_clip = (1.0, float(RHO.max()))  # ρ ∈ [1, ~9]
+
+captured = {}
+captured[0] = np.asarray(m.X.coords).copy()
+print("iter 0 (undeformed) captured")
+
+for it in range(1, max(ITERS_TO_CAPTURE) + 1):
+    coords_before = np.asarray(m.X.coords).copy()
+    uw.meshing.smooth_mesh_interior(
+        m, metric=rho_sym, method="anisotropic", strategy="med",
+        method_kwargs=dict(n_outer=1, relax=1.0,
+                            metric_refresh_per_iter=True),
+        verbose=False)
+    coords_after = np.asarray(m.X.coords).copy()
+    dx = np.linalg.norm(coords_after - coords_before, axis=1)
+    print(f"iter {it}: max|Δx|={dx.max():.3e}  mean|Δx|={dx.mean():.3e}")
+    if it in ITERS_TO_CAPTURE:
+        captured[it] = coords_after
+
+
+# Render: 2x4 grid showing ρ background + adapted mesh at each iter
+ncols, nrows = 4, 2
+pl = pv.Plotter(shape=(nrows, ncols), off_screen=True,
+                window_size=(1200 * ncols, 1200 * nrows),
+                border=False)
+pl.set_background("white")
+
+# Build a uniform reference mesh to deform back to each captured
+# state, just for visualisation purposes.
+m_viz_template = build_uniform_mesh()
+
+for panel_idx, it in enumerate(ITERS_TO_CAPTURE):
+    row, col = panel_idx // ncols, panel_idx % ncols
+    m_viz = build_uniform_mesh()
+    m_viz._deform_mesh(captured[it])
+
+    edges = vis.mesh_to_pv_mesh(m_viz).extract_all_edges()
+
+    # ρ background — a structured grid sampled analytically
+    bg = pv.ImageData(
+        dimensions=(401, 401, 1),
+        spacing=(2.0 / 400, 2.0 / 400, 1.0),
+        origin=(-1.0, -1.0, 0.0))
+    bg.point_data["rho"] = RHO.ravel(order="F")
+
+    pl.subplot(row, col)
+    pl.add_text(f"iter {it}", font_size=24, color='black')
+    pl.add_mesh(bg, scalars="rho", cmap="Blues",
+                clim=rho_clip, show_edges=False,
+                lighting=False, show_scalar_bar=False, opacity=0.85)
+    pl.add_mesh(edges, color="black", line_width=1.0,
+                lighting=False, opacity=0.85)
+    pl.view_xy()
+    pl.camera.zoom(1.15)
+
+out_png = os.path.join(OUT, "plot_iter_sequence.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"\nwrote {out_png}")
diff --git a/scripts/adapt_cost_microbench.py b/scripts/adapt_cost_microbench.py
new file mode 100644
index 00000000..dfe435a7
--- /dev/null
+++ b/scripts/adapt_cost_microbench.py
@@ -0,0 +1,112 @@
+"""Micro-benchmark: how many (adv+stokes) solve-steps does ONE
+pristine adaptation cost? Isolated single process (no contention),
+res-16 Ra=1e5, same setup as the saturation runner. Breaks the
+adaptation into metric / mover / remap+restokes.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+RA, RES, r_inner, r_o = 1.0e5, 16, 0.5, 1.0
+
+m = uw.meshing.Annulus(radiusOuter=r_o, radiusInner=r_inner,
+                        cellSize=1.0 / RES, qdegree=3)
+r, th = m.CoordinateSystem.R
+v = uw.discretisation.MeshVariable("V", m, vtype=uw.VarType.VECTOR,
+                                   degree=2, continuous=True)
+P = uw.discretisation.MeshVariable("P", m, vtype=uw.VarType.SCALAR,
+                                   degree=1, continuous=True)
+T = uw.discretisation.MeshVariable("T", m, vtype=uw.VarType.SCALAR,
+                                   degree=3, continuous=True)
+stokes = uw.systems.Stokes(m, velocityField=v, pressureField=P)
+stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+stokes.tolerance = 1.0e-5
+stokes.penalty = 0.0
+ur = m.CoordinateSystem.unit_e_0
+stokes.add_essential_bc((0.0, 0.0), m.boundaries.Lower.name)
+stokes.add_natural_bc(1.0e6 * v.sym.dot(ur) * ur,
+                      m.boundaries.Upper.name)
+stokes.bodyforce = RA * (T.sym[0] - (r_o - r) / (r_o - r_inner)) * ur
+adv = uw.systems.AdvDiffusionSLCN(m, u_Field=T, V_fn=v.sym,
+                                  verbose=False, theta=0.5,
+                                  monotone_mode="clamp")
+adv.constitutive_model = uw.constitutive_models.DiffusionModel
+adv.constitutive_model.Parameters.diffusivity = 1.0
+adv.tolerance = 1.0e-4
+adv.add_dirichlet_bc(1.0, m.boundaries.Lower.name)
+adv.add_dirichlet_bc(0.0, m.boundaries.Upper.name)
+init_t = (0.01 * sympy.sin(5.0 * th)
+          * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+          + (r_o - r) / (r_o - r_inner))
+T.data[...] = np.asarray(uw.function.evaluate(
+    init_t, T.coords)).reshape(-1, 1)
+X0 = np.asarray(m.X.coords).copy()
+X0_Tx = np.asarray(T.coords).copy()
+stokes.solve(zero_init_guess=True)
+
+# warm a few steps so the field is representative (plumes growing)
+for _ in range(8):
+    dt = adv.estimate_dt()
+    adv.solve(timestep=dt, zero_init_guess=False)
+    stokes.solve(zero_init_guess=False)
+
+# (1) time plain (adv+stokes) steps
+N = 8
+t0 = time.perf_counter()
+for _ in range(N):
+    dt = adv.estimate_dt()
+    adv.solve(timestep=dt, zero_init_guess=False)
+    stokes.solve(zero_init_guess=False)
+t_step = (time.perf_counter() - t0) / N
+print(f"plain (adv+stokes) step      : {t_step:6.3f} s  (mean of {N})")
+
+# (2) time ONE pristine adaptation, broken down
+X_prev = np.asarray(m.X.coords).copy()
+T_prev = np.asarray(T.data).copy()
+
+ta = time.perf_counter()
+vals0 = np.asarray(uw.function.evaluate(T.sym[0], X0_Tx)).reshape(-1)
+m._deform_mesh(X0); T.data[:, 0] = vals0
+t_remap_in = time.perf_counter() - ta
+
+tb = time.perf_counter()
+rho = metric_density_from_gradient(m, T, amp=8.0, name="mb")
+t_metric = time.perf_counter() - tb
+
+X0c = np.asarray(m.X.coords).copy(); T0 = np.asarray(T.data).copy()
+tc = time.perf_counter()
+smooth_mesh_interior(m, metric=rho, method="anisotropic",
+                     method_kwargs=dict(aniso_cap=2.0, relax=0.2,
+                                        n_outer=8))
+t_mover = time.perf_counter() - tc
+
+new_X = np.asarray(m.X.coords).copy()
+new_Tx = np.asarray(T.coords).copy()
+td = time.perf_counter()
+m._deform_mesh(X0c); T.data[...] = T0
+valsN = np.asarray(uw.function.evaluate(T.sym[0], new_Tx)).reshape(-1)
+m._deform_mesh(new_X); T.data[:, 0] = valsN
+t_remap_out = time.perf_counter() - td
+
+te = time.perf_counter()
+stokes.solve(zero_init_guess=False)
+t_restokes = time.perf_counter() - te
+
+t_adapt = (t_remap_in + t_metric + t_mover + t_remap_out
+           + t_restokes)
+print(f"  remap-in  (eval+deform)    : {t_remap_in:6.3f} s")
+print(f"  metric    (grad projection): {t_metric:6.3f} s")
+print(f"  MOVER     (anisotropic)    : {t_mover:6.3f} s")
+print(f"  remap-out (eval+deform x2) : {t_remap_out:6.3f} s")
+print(f"  re-stokes                  : {t_restokes:6.3f} s")
+print(f"ONE pristine adaptation TOTAL: {t_adapt:6.3f} s")
+print(f"\nratio  adaptation / (adv+stokes step) = "
+      f"{t_adapt / t_step:5.1f}x")
+print(f"amortised over adapt-every-5: +{t_adapt/(5*t_step):.1f}x "
+      f"work vs a non-adaptive res-16 run "
+      f"(5 steps cost {5*t_step:.2f}s + 1 adapt {t_adapt:.2f}s)")
diff --git a/scripts/adapt_vs_ref24_cost.py b/scripts/adapt_vs_ref24_cost.py
new file mode 100644
index 00000000..06f5a129
--- /dev/null
+++ b/scripts/adapt_vs_ref24_cost.py
@@ -0,0 +1,102 @@
+"""Wall-clock per-step cost: res-24 plain step vs res-16 plain
+step vs one a16s aggressive pristine-adaptation overhead (with
+the REORDERED loop — so the adaptation adds only metric+mover+
+remap; the loop's single Stokes is already counted in the step).
+Contends with live runs ⇒ absolutes inflated, RATIO valid.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+RA, r_inner, r_o = 1.0e5, 0.5, 1.0
+
+
+def build(res):
+    m = uw.meshing.Annulus(radiusOuter=r_o, radiusInner=r_inner,
+                           cellSize=1.0 / res, qdegree=3)
+    r, th = m.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(f"V{res}", m,
+        vtype=uw.VarType.VECTOR, degree=2, continuous=True)
+    P = uw.discretisation.MeshVariable(f"P{res}", m,
+        vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+    T = uw.discretisation.MeshVariable(f"T{res}", m,
+        vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    st = uw.systems.Stokes(m, velocityField=v, pressureField=P)
+    st.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    st.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    st.tolerance = 1.0e-5
+    st.penalty = 0.0
+    ur = m.CoordinateSystem.unit_e_0
+    st.add_essential_bc((0.0, 0.0), m.boundaries.Lower.name)
+    st.add_natural_bc(1.0e6 * v.sym.dot(ur) * ur,
+                      m.boundaries.Upper.name)
+    st.bodyforce = RA * (T.sym[0]
+                         - (r_o - r) / (r_o - r_inner)) * ur
+    ad = uw.systems.AdvDiffusionSLCN(m, u_Field=T, V_fn=v.sym,
+        verbose=False, theta=0.5, monotone_mode="clamp")
+    ad.constitutive_model = uw.constitutive_models.DiffusionModel
+    ad.constitutive_model.Parameters.diffusivity = 1.0
+    ad.tolerance = 1.0e-4
+    ad.add_dirichlet_bc(1.0, m.boundaries.Lower.name)
+    ad.add_dirichlet_bc(0.0, m.boundaries.Upper.name)
+    T.data[...] = np.asarray(uw.function.evaluate(
+        0.01 * sympy.sin(5.0 * th) * sympy.sin(
+            np.pi * (r - r_inner) / (r_o - r_inner))
+        + (r_o - r) / (r_o - r_inner), T.coords)).reshape(-1, 1)
+    return m, v, P, T, st, ad
+
+
+def time_steps(res, n=4):
+    m, v, P, T, st, ad = build(res)
+    st.solve(zero_init_guess=True)
+    for _ in range(4):                       # warm (grow plumes)
+        ad.solve(timestep=ad.estimate_dt(), zero_init_guess=False)
+        st.solve(zero_init_guess=False)
+    t0 = time.perf_counter()
+    for _ in range(n):
+        ad.solve(timestep=ad.estimate_dt(), zero_init_guess=False)
+        st.solve(zero_init_guess=False)
+    ts = (time.perf_counter() - t0) / n
+    return ts, m, v, P, T, st, ad
+
+
+t24, *_ = time_steps(24)
+print(f"res-24 plain (adv+stokes) step : {t24:7.3f} s")
+t16, m, v, P, T, st, ad = time_steps(16)
+print(f"res-16 plain (adv+stokes) step : {t16:7.3f} s")
+
+# one a16s aggressive pristine adaptation OVERHEAD (no stokes —
+# the reordered loop's single stokes is counted in t16)
+X0 = np.asarray(m.X.coords).copy()
+X0_Tx = np.asarray(T.coords).copy()
+X_prev = np.asarray(m.X.coords).copy()
+ta = time.perf_counter()
+vals0 = np.asarray(uw.function.evaluate(T.sym[0], X0_Tx)).reshape(-1)
+m._deform_mesh(X0); T.data[:, 0] = vals0
+rho = metric_density_from_gradient(m, T, amp=16.0, name="mb")
+X0c = np.asarray(m.X.coords).copy(); T0 = np.asarray(T.data).copy()
+smooth_mesh_interior(m, metric=rho, method="anisotropic",
+                     method_kwargs=dict(aniso_cap=4.0, relax=0.05,
+                                        n_outer=25))
+new_X = np.asarray(m.X.coords).copy()
+new_Tx = np.asarray(T.coords).copy()
+m._deform_mesh(X0c); T.data[...] = T0
+valsN = np.asarray(uw.function.evaluate(T.sym[0], new_Tx)).reshape(-1)
+m._deform_mesh(new_X); T.data[:, 0] = valsN
+t_ov = time.perf_counter() - ta
+print(f"a16s adaptation OVERHEAD (no stokes): {t_ov:7.3f} s")
+
+eff = t16 + t_ov / 5.0
+r_step = eff / t24
+print(f"\na16s effective per-step (5 steps + 1 adapt)/5 = "
+      f"{eff:7.3f} s")
+print(f"per-STEP slowdown   a16s / ref24 = {r_step:5.2f}x")
+print("dt penalty (ref24/a16s, from histories) = 1.07x")
+print(f"per-SIM-TIME slowdown a16s / ref24 ≈ {r_step*1.07:5.2f}x")
+print(f"(context: a16p adapt is cheaper [n_outer=8] & dt 0.87x; "
+      f"uniform res-16 is ~{t16/t24:.2f}x/step & 0.61x steps "
+      f"⇒ ~{(t16/t24)*0.61:.2f}x/sim-time vs ref24)")
diff --git a/scripts/adaptive_convection_harness.py b/scripts/adaptive_convection_harness.py
new file mode 100644
index 00000000..f2eba279
--- /dev/null
+++ b/scripts/adaptive_convection_harness.py
@@ -0,0 +1,328 @@
+"""Adaptive-convection TEST HARNESS (scaffold for the next phase).
+
+Goal: does Ra=1e5 annulus convection on a coarse res-16 mesh that
+is *adaptively snuggled* on |∇T| reproduce the diagnostics of a
+uniform res-24 reference (the "truth") — at far fewer nodes?
+
+Structure
+  * reference  : uniform res-24, run N steps, record Nu/vrms(t).
+  * adaptive   : res-16, every ADAPT_EVERY steps call the locked-in
+                 API (metric_density_from_gradient +
+                 smooth_mesh_interior method="anisotropic"); record
+                 Nu/vrms(t).
+  * compare    : Nu(t), vrms(t) reference vs adaptive (matched
+                 physical time) + an error summary + a figure.
+
+THE OPEN PIECE (next phase): when the mover moves nodes mid-run,
+the mesh has a velocity ``v_mesh = Δx_adapt / Δt`` that the
+advection–diffusion system must see (ALE: effective transport
+velocity ``v_fluid − v_mesh``), or T must be conservatively
+remapped onto the moved nodes. Without it the adaptation injects a
+spurious advection. ``adapt_with_correction`` is the explicit
+hook: ``--correction none`` runs the *uncorrected* baseline
+(expected to drift; the next phase fixes it), ``interp`` remaps T
+by local FE evaluation, and ``ale`` raises with the spec to implement.
+
+Both runs' histories are cached to npz (never re-run).
+"""
+from __future__ import annotations
+import os
+import argparse
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+p = argparse.ArgumentParser()
+p.add_argument("--ref-res", type=int, default=24)
+p.add_argument("--adapt-res", type=int, default=16)
+p.add_argument("--n-steps", type=int, default=20)
+p.add_argument("--adapt-every", type=int, default=5)
+p.add_argument("--Ra", type=float, default=1.0e5)
+p.add_argument("--amp", type=float, default=8.0)
+p.add_argument("--correction", choices=["none", "interp", "ale"],
+               default="none")
+args = p.parse_args()
+r_inner, r_o = 0.5, 1.0
+CDIR = "/tmp/metric_mesh"
+SNAP = "/tmp/metric_mesh/harness_remap"
+os.makedirs(CDIR, exist_ok=True)
+os.makedirs(SNAP, exist_ok=True)
+
+
+def build(res, tag):
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / res, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        f"V{tag}", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        f"P{tag}", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        f"T{tag}", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    # The validated benchmark config (trusted): no-slip inner,
+    # free-slip outer — no Stokes nullspace.
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = args.Ra * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    init_t = (0.01 * sympy.sin(5.0 * th)
+              * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+              + (r_o - r) / (r_o - r_inner))
+    T.data[...] = np.asarray(uw.function.evaluate(
+        init_t, T.coords)).reshape(-1, 1)
+    return mesh, v, P, T, stokes, adv, unit_r
+
+
+# Analytic steady-conduction flux through the annulus outer
+# boundary: true ∇²T=0 solution is LOGARITHMIC ⇒
+# Q_cond = 2π / ln(R_o/R_i).
+_Q_COND = 2.0 * np.pi / np.log(r_o / r_inner)
+
+
+_NU_CACHE = {}
+_R_MID = 0.5 * (r_inner + r_o)
+
+
+def nusselt(mesh, T, v=None, cellsize=None):
+    r"""Nu = total radial heat flow through the interior mid-shell
+    / conductive flow. q_r = v_r·T - ∂T/∂r projected to a nodal
+    field, integrated on r=(R_i+R_o)/2 — shell-independent at
+    steady state and immune to thermal-BL resolution (validated:
+    scripts/_nu_proper.py — analytic conduction ⇒ Nu=1.0000;
+    settled checkpoints shell-consistent and ≈ the boundary
+    method). Q_cond = 2π/ln(R_o/R_i) (annular log conduction).
+    Cached per mesh. v required for the advective term."""
+    key = id(mesh)
+    cache = _NU_CACHE.get(key)
+    if cache is None:
+        qf = uw.discretisation.MeshVariable(
+            f"nu_qr_{key:x}", mesh, vtype=uw.VarType.SCALAR,
+            degree=2, continuous=True)
+        proj = uw.systems.Projection(mesh, qf)
+        proj.smoothing = 0.0
+        X = mesh.CoordinateSystem.X
+        er = mesh.CoordinateSystem.unit_e_0
+        gradT_r = (T.sym[0].diff(X[0]) * er[0]
+                   + T.sym[0].diff(X[1]) * er[1])
+        vr = ((v.sym[0] * er[0] + v.sym[1] * er[1])
+              if v is not None else sympy.Integer(0))
+        proj.uw_function = vr * T.sym[0] - gradT_r
+        _NU_CACHE[key] = (qf, proj)
+    else:
+        qf, proj = cache
+    proj.solve()
+    th = np.linspace(0, 2 * np.pi, 720, endpoint=False)
+    pts = np.column_stack([_R_MID * np.cos(th),
+                           _R_MID * np.sin(th)])
+    q = np.asarray(uw.function.evaluate(
+        qf.sym[0], pts)).reshape(-1)
+    return float(q.mean() * _R_MID * 2.0 * np.pi) / _Q_COND
+
+
+def vrms(mesh, v):
+    a = np.asarray(uw.function.evaluate(
+        v.sym.dot(v.sym), mesh.X.coords))
+    return float(np.sqrt(np.mean(a)))
+
+
+def adapt_with_correction(mode, mesh, T, v, P, stokes, dt):
+    r"""Run one adaptation + the chosen field-handling correction.
+
+    The mover (`smooth_mesh_interior`) Lagrangian-carries the
+    fields: `_deform_mesh` only moves coordinates, DOF values are
+    untouched, so the field is rigidly transported by `Δx`.
+
+    * ``none`` — leave the Lagrangian carry as-is. Uncorrected
+      baseline: the mesh move is read by the next SLCN step as a
+      spurious advection (≈ v_mesh) — it drifts, by design.
+    * ``interp`` — **local FE remap, maximum fidelity.** This
+      adapter is topology-preserving: vector size, DOF ordering
+      and the parallel partition are invariant, and the pinned
+      boundary leaves the domain unchanged. So the correct remap
+      is the *old P3 field evaluated at the new node positions*
+      via the local FE evaluate (`uw.function.evaluate`, true P3
+      basis) — NOT `read_timestep`'s kd-tree/RBF path, which
+      exists for the decomposition-changing `mesh.adapt` case and
+      is diffusive (it made vrms error *worse* than uncorrected
+      here). Because the layout is invariant the snapshot/restore
+      are trivial same-shape array ops with no cross-rank
+      migration; because the domain is unchanged every new node is
+      in-domain (no extrapolation). Done by a brief
+      deform-back → evaluate → deform-forward around the mover
+      (the mover commits the move internally, so the old field is
+      reconstructed by restoring coords+values, evaluated at the
+      new positions, then the move is re-committed). Re-solve
+      Stokes so v is consistent with the new mesh + remapped T.
+    * ``ale`` — Lagrangian + `V_fn = v − v_mesh`. Requires the
+      adapter to be *bound* to this adv-diff solver (feed the mesh
+      velocity into its trace-back); a specialized optimization,
+      not the general path. Next-phase; raises for now.
+
+    The snapshot must happen BEFORE the mover, so the whole
+    adaptation step is orchestrated here.
+    """
+    rho = metric_density_from_gradient(mesh, T, amp=args.amp,
+                                       name="harness")
+    if mode == "interp":
+        old_X = np.asarray(mesh.X.coords).copy()
+        old_T = np.asarray(T.data).copy()
+    smooth_mesh_interior(
+        mesh, metric=rho, method="anisotropic",
+        method_kwargs=dict(aniso_cap=2.0, relax=0.2, n_outer=8))
+    if mode == "none":
+        return
+    if mode == "interp":
+        new_X = np.asarray(mesh.X.coords).copy()
+        new_Tx = np.asarray(T.coords).copy()   # new T-DOF coords
+        # reconstruct the OLD field (layout invariant ⇒ trivial)
+        mesh._deform_mesh(old_X)
+        T.data[...] = old_T
+        # local FE evaluate of the old P3 field at the new nodes
+        vals = np.asarray(uw.function.evaluate(
+            T.sym[0], new_Tx)).reshape(-1)
+        # re-commit the move + write the remapped field
+        mesh._deform_mesh(new_X)
+        T.data[:, 0] = vals
+        # refresh v on the new mesh + remapped T (next step's
+        # adv.solve uses v.sym for the SLCN trace-back).
+        stokes.solve(zero_init_guess=False)
+        return
+    raise NotImplementedError(
+        "ALE correction is a NEXT-PHASE, adapter-bound-to-advdiff "
+        "optimization: feed v_mesh = Δx/Δt into the SLCN "
+        "trace-back (V_fn = v − v_mesh). Use --correction interp "
+        "(the general, solver-agnostic remap) for now.")
+
+
+def run(mesh, v, P, T, stokes, adv, cellsize, adaptive, tag):
+    cache = f"{CDIR}/harness_{tag}_Ra{args.Ra:.0e}_n{args.n_steps}.npz"
+    if os.path.exists(cache):
+        print(f"  [{tag}] loading cached history {cache}")
+        z = np.load(cache)
+        return z["t"], z["Nu"], z["vrms"]
+    stokes.solve(zero_init_guess=True)
+    t_sim = 0.0
+    hist_t, hist_Nu, hist_v = [], [], []
+    for s in range(args.n_steps):
+        dt = adv.estimate_dt()
+        adv.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        if adaptive and (s + 1) % args.adapt_every == 0:
+            adapt_with_correction(args.correction, mesh, T, v, P,
+                                  stokes, dt)
+        Nu = nusselt(mesh, T, v)
+        vr = vrms(mesh, v)
+        hist_t.append(t_sim)
+        hist_Nu.append(Nu)
+        hist_v.append(vr)
+        if (s + 1) % 5 == 0 or s == 0:
+            tt = T.data[:, 0]
+            print(f"  [{tag}] step {s+1:2d} t={t_sim:.4f} "
+                  f"Nu={Nu:+.3f} vrms={vr:.3e} "
+                  f"T=[{tt.min():+.2f},{tt.max():+.2f}]", flush=True)
+    t = np.array(hist_t)
+    Nu = np.array(hist_Nu)
+    vr = np.array(hist_v)
+    np.savez(cache, t=t, Nu=Nu, vrms=vr)
+    print(f"  [{tag}] cached → {cache}")
+    return t, Nu, vr
+
+
+print(f"=== adaptive-convection harness  Ra={args.Ra:.0e}  "
+      f"ref res-{args.ref_res} (uniform) vs adapt res-"
+      f"{args.adapt_res} (every {args.adapt_every}, "
+      f"correction={args.correction}) ===")
+print(f"reference (uniform res-{args.ref_res}):")
+m, v, P, T, st, ad, ur = build(args.ref_res, "ref")
+tR, NuR, vR = run(m, v, P, T, st, ad, 1.0 / args.ref_res,
+                  False, f"ref{args.ref_res}")
+print(f"adaptive (res-{args.adapt_res}, correction="
+      f"{args.correction}):")
+m, v, P, T, st, ad, ur = build(args.adapt_res, "ad")
+tA, NuA, vA = run(m, v, P, T, st, ad, 1.0 / args.adapt_res,
+                  True, f"adapt{args.adapt_res}_{args.correction}")
+
+# overlay the cached uncorrected baseline (to SEE what the
+# correction does relative to the Lagrangian-carry drift)
+base = None
+if args.correction != "none":
+    bpath = (f"{CDIR}/harness_adapt{args.adapt_res}_none_"
+             f"Ra{args.Ra:.0e}_n{args.n_steps}.npz")
+    if os.path.exists(bpath):
+        zb = np.load(bpath)
+        base = (zb["t"], zb["Nu"], zb["vrms"])
+        print(f"overlaying uncorrected baseline {bpath}")
+
+# compare on the overlapping physical-time window
+tmax = min(tR.max(), tA.max())
+tg = np.linspace(max(tR.min(), tA.min()), tmax, 60)
+NuR_i = np.interp(tg, tR, NuR)
+NuA_i = np.interp(tg, tA, NuA)
+vR_i = np.interp(tg, tR, vR)
+vA_i = np.interp(tg, tA, vA)
+nu_err = float(np.sqrt(np.mean((NuA_i - NuR_i) ** 2)))
+v_err = float(np.sqrt(np.mean((vA_i - vR_i) ** 2))
+              / max(np.mean(np.abs(vR_i)), 1e-30))
+print(f"\nrms ΔNu(adaptive-ref) = {nu_err:.4f}   "
+      f"rel rms Δvrms = {v_err:.4f}   "
+      f"(adaptive res-{args.adapt_res} vs ref res-{args.ref_res}; "
+      f"correction={args.correction})")
+
+fig, ax = plt.subplots(1, 2, figsize=(15, 5.4))
+ax[0].plot(tR, NuR, "o-", color="k", lw=1.6, ms=3,
+           label=f"ref uniform res-{args.ref_res}")
+ax[0].plot(tA, NuA, "s--", color="#1f4e8c", lw=1.6, ms=3,
+           label=f"adapt res-{args.adapt_res} ({args.correction})")
+if base is not None:
+    ax[0].plot(base[0], base[1], "^:", color="#c0392b", lw=1.3,
+               ms=3, label=f"adapt res-{args.adapt_res} (none)")
+ax[0].set_xlabel("sim time")
+ax[0].set_ylabel("Nu")
+ax[0].set_title("Nusselt(t)")
+ax[1].plot(tR, vR, "o-", color="k", lw=1.6, ms=3,
+           label=f"ref uniform res-{args.ref_res}")
+ax[1].plot(tA, vA, "s--", color="#1f4e8c", lw=1.6, ms=3,
+           label=f"adapt res-{args.adapt_res} ({args.correction})")
+if base is not None:
+    ax[1].plot(base[0], base[2], "^:", color="#c0392b", lw=1.3,
+               ms=3, label=f"adapt res-{args.adapt_res} (none)")
+ax[1].set_xlabel("sim time")
+ax[1].set_ylabel("vrms")
+ax[1].set_title("vrms(t)")
+for a in ax:
+    a.legend(fontsize=9)
+    a.grid(alpha=0.3)
+fig.suptitle(f"Adaptive-convection harness — Ra={args.Ra:.0e}  "
+             f"(rms ΔNu={nu_err:.3f}; correction="
+             f"{args.correction} — ALE correction is next-phase)",
+             fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+out = f"{CDIR}/adaptive_convection_harness.png"
+fig.savefig(out, dpi=130)
+print(f"saved {out}")
diff --git a/scripts/adaptive_first_jump.py b/scripts/adaptive_first_jump.py
new file mode 100644
index 00000000..9635de11
--- /dev/null
+++ b/scripts/adaptive_first_jump.py
@@ -0,0 +1,129 @@
+"""Watch the mesh through the FIRST adaptation jump of an Ra=1e5
+adaptive convection run (res-16, harness BCs — BCs not the point).
+
+Runs 5 convection steps, UW-checkpoints the pre-adapt state, fires
+the anisotropic mover once, UW-checkpoints the post-adapt state.
+Visualisation follows the UW/pyvista requirement: load each
+checkpoint mesh, read_timestep the P3 T onto it, render T on its
+own DOF cloud (faithful high-order, NOT vertex-only) overlaid with
+the deformed-mesh edges; white bg, lighting off, off-screen.
+
+The h5 checkpoints ARE the cache — re-running only re-renders.
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+RA, RES, N_PRE, AMP = 1.0e5, 16, 5, 8.0
+r_inner, r_o = 0.5, 1.0
+SNAP = "/tmp/metric_mesh/aj_snaps"
+os.makedirs(SNAP, exist_ok=True)
+TNAME, VNAME = "T", "V"
+
+
+def build():
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / RES, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        VNAME, mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        "P", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        TNAME, mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = RA * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    init_t = (0.01 * sympy.sin(5.0 * th)
+              * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+              + (r_o - r) / (r_o - r_inner))
+    T.data[...] = np.asarray(uw.function.evaluate(
+        init_t, T.coords)).reshape(-1, 1)
+    return mesh, v, P, T, stokes, adv
+
+
+after_h5 = f"{SNAP}/aj_after.mesh.00000.h5"
+if not os.path.exists(after_h5):
+    mesh, v, P, T, stokes, adv = build()
+    stokes.solve(zero_init_guess=True)
+    t_sim = 0.0
+    for s in range(N_PRE):
+        dt = adv.estimate_dt()
+        adv.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        print(f"  step {s+1} t={t_sim:.4f}", flush=True)
+    mesh.write_timestep("aj_before", 0, outputPath=SNAP,
+                        meshVars=[T, v], meshUpdates=True,
+                        create_xdmf=True)
+    print("checkpointed pre-adapt (aj_before)")
+    rho = metric_density_from_gradient(mesh, T, amp=AMP,
+                                       name="aj")
+    smooth_mesh_interior(
+        mesh, metric=rho, method="anisotropic",
+        method_kwargs=dict(aniso_cap=2.0, relax=0.2, n_outer=8),
+        verbose=True)
+    mesh.write_timestep("aj_after", 0, outputPath=SNAP,
+                        meshVars=[T, v], meshUpdates=True,
+                        create_xdmf=True)
+    print("checkpointed post-adapt (aj_after)")
+else:
+    print(f"using cached checkpoints in {SNAP}")
+
+# ---- UW/pyvista render: T on its DOF cloud + deformed edges -----
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+pl = pv.Plotter(shape=(1, 2), off_screen=True,
+                window_size=(2000, 1000))
+pl.set_background("white")
+for col, tag in enumerate(("aj_before", "aj_after")):
+    m = uw.discretisation.Mesh(f"{SNAP}/{tag}.mesh.00000.h5")
+    Tv = uw.discretisation.MeshVariable(
+        TNAME, m, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    Tv.read_timestep(tag, TNAME, 0, outputPath=SNAP)
+    pv_T = vis.meshVariable_to_pv_mesh_object(Tv)
+    pv_T.point_data["T"] = np.asarray(Tv.data[:, 0])
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(0, col)
+    pl.add_text("before 1st jump" if col == 0
+                else "after 1st jump (mesh snuggled)",
+                font_size=12, color="black")
+    pl.add_mesh(pv_T, scalars="T", cmap="inferno",
+                clim=(0.0, 1.0), show_edges=False,
+                lighting=False, show_scalar_bar=(col == 1),
+                scalar_bar_args=dict(title="T", color="black"))
+    pl.add_mesh(edges, color="black", line_width=0.6,
+                lighting=False)
+    pl.view_xy()
+    pl.camera.zoom(1.3)
+out = "/tmp/metric_mesh/adaptive_first_jump.png"
+pl.screenshot(out)
+print(f"saved {out}")
diff --git a/scripts/adaptive_mesh_evolution.py b/scripts/adaptive_mesh_evolution.py
new file mode 100644
index 00000000..a5324333
--- /dev/null
+++ b/scripts/adaptive_mesh_evolution.py
@@ -0,0 +1,148 @@
+"""Watch the mesh re-snuggle at each adaptation event during an
+Ra=1e5 adaptive convection run (res-16, harness setup: no-slip
+inner / free-slip outer — BCs not the point here).
+
+Captures (coords, T@vertices) just BEFORE and just AFTER every
+adaptation, then renders one row per event:
+  col 0  T + mesh just before the mover fires
+  col 1  T + mesh just after  (the "snuggle" move)
+so the mesh visibly chases the growing plumes / boundary layers.
+Snapshots cached → replot is free.
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+
+RA, RES, N_STEPS, ADAPT_EVERY, AMP = 1.0e5, 16, 15, 5, 8.0
+r_inner, r_o = 0.5, 1.0
+CACHE = f"/tmp/metric_mesh/mesh_evo_Ra{RA:.0e}_res{RES}_n{N_STEPS}.npz"
+
+
+def build():
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / RES, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        "V", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        "P", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        "T", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = RA * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    init_t = (0.01 * sympy.sin(5.0 * th)
+              * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+              + (r_o - r) / (r_o - r_inner))
+    T.data[...] = np.asarray(uw.function.evaluate(
+        init_t, T.coords)).reshape(-1, 1)
+    return mesh, v, P, T, stokes, adv
+
+
+mesh, v, P, T, stokes, adv = build()
+tris = _tri_cells(mesh.dm)
+
+
+def Tvert(X):
+    return np.asarray(uw.function.evaluate(
+        T.sym[0], X)).reshape(-1)
+
+
+if os.path.exists(CACHE):
+    print(f"loading cached snapshots {CACHE}")
+    z = np.load(CACHE, allow_pickle=True)
+    snaps = list(z["snaps"])
+else:
+    stokes.solve(zero_init_guess=True)
+    snaps = []
+    t_sim = 0.0
+    for s in range(N_STEPS):
+        dt = adv.estimate_dt()
+        adv.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        if (s + 1) % ADAPT_EVERY == 0:
+            Xb = np.asarray(mesh.X.coords).copy()
+            Tb = Tvert(Xb)
+            Ab = np.abs(_signed_areas(Xb, tris))
+            rho = metric_density_from_gradient(
+                mesh, T, amp=AMP, name="evo")
+            smooth_mesh_interior(
+                mesh, metric=rho, method="anisotropic",
+                method_kwargs=dict(aniso_cap=2.0, relax=0.2,
+                                   n_outer=8))
+            Xa = np.asarray(mesh.X.coords).copy()
+            Ta = Tvert(Xa)
+            Aa = np.abs(_signed_areas(Xa, tris))
+            dmax = float(np.linalg.norm(Xa - Xb, axis=1).max())
+            snaps.append(dict(
+                step=s + 1, t=t_sim, Xb=Xb, Tb=Tb, Xa=Xa, Ta=Ta,
+                qb=Ab.min() / Ab.mean(), qa=Aa.min() / Aa.mean(),
+                dmax=dmax))
+            print(f"  adapt @ step {s+1:2d} t={t_sim:.4f}  "
+                  f"max|Δx|={dmax:.3e}  minA/meanA "
+                  f"{Ab.min()/Ab.mean():.3f}→{Aa.min()/Aa.mean():.3f}",
+                  flush=True)
+    np.savez(CACHE, snaps=np.array(snaps, dtype=object))
+    print(f"cached → {CACHE}")
+
+n = len(snaps)
+fig, ax = plt.subplots(n, 2, figsize=(11, 5.2 * n))
+if n == 1:
+    ax = ax[None, :]
+for i, sn in enumerate(snaps):
+    for j, (X, Tn, tag, q) in enumerate([
+            (sn["Xb"], sn["Tb"], "before", sn["qb"]),
+            (sn["Xa"], sn["Ta"], "after", sn["qa"])]):
+        a = ax[i, j]
+        tr = mtri.Triangulation(X[:, 0], X[:, 1], tris)
+        a.tricontourf(tr, Tn, levels=22, cmap="inferno")
+        a.triplot(tr, lw=0.35,
+                  color=("white" if j == 0 else "#7fdbff"),
+                  alpha=0.55)
+        a.set_aspect("equal")
+        a.set_xticks([])
+        a.set_yticks([])
+        a.set_title(
+            f"adapt #{i+1} (step {sn['step']}, t={sn['t']:.4f}) "
+            f"— {tag}  minA/meanA={q:.3f}"
+            + (f"  max|Δx|={sn['dmax']:.2e}" if j == 1 else ""),
+            fontsize=10)
+fig.suptitle(f"Mesh re-snuggling through adaptive convection "
+             f"(Ra={RA:.0e}, res-{RES}, adapt every "
+             f"{ADAPT_EVERY} steps) — T + mesh, before → after "
+             f"each update", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.985])
+out = "/tmp/metric_mesh/adaptive_mesh_evolution.png"
+fig.savefig(out, dpi=125)
+print(f"saved {out}")
diff --git a/scripts/adaptive_saturation.py b/scripts/adaptive_saturation.py
new file mode 100644
index 00000000..33da5a8b
--- /dev/null
+++ b/scripts/adaptive_saturation.py
@@ -0,0 +1,827 @@
+"""One model of the 3-way saturation experiment (launch 3 in
+parallel: --model ref24 | u16 | a16).
+
+Ra=1e5 annulus convection (no-slip inner / free-slip outer — the
+trusted benchmark BC, no Stokes nullspace). Run PAST the
+exponential overshoot into the settled-Nu regime (this is an
+exponentially-growing instability — the meaningful comparison is
+the saturated state, not the perturbation-sensitive exponential
+phase). Stop on a Nu-settle detector or a hard step/time cap.
+
+  ref24 : uniform res-24 (the reference)
+  u16   : uniform res-16
+  a16   : res-16 + anisotropic adaptation every ADAPT_EVERY steps
+          with the validated LOCAL-FE interp remap (topology-
+          preserving + fixed domain ⇒ old P3 field evaluated at
+          new nodes via uw.function.evaluate; max fidelity, no
+          kd-tree).
+
+Checkpointed regularly: UW write_timestep h5 (meshUpdates=True so
+each checkpoint is self-contained for the pyvista plotter) + an
+atomic per-step history npz. Plot with adaptive_saturation_plot.py
+any time (reads partial progress).
+"""
+from __future__ import annotations
+import os
+import sys
+import argparse
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+
+p = argparse.ArgumentParser()
+p.add_argument("--model", required=True,
+               choices=["ref24", "u16", "a16", "a16p", "a16s",
+                        "a16x", "a16y", "a16z", "a16c",
+                        "a16c2", "a16c15", "a16e", "a16ed",
+                        "a16r15", "a16r15p", "a16r15e",
+                        "a16c2v", "a16r15r", "a16r15v",
+                        "a16r15b","a16r15l2","a16r15tr","a16r15ko","a16r15a","a16r15d","a16r15_6","u16_6",
+                        "a16r15-n1","a16r15-thr","a16r15-noagr","a16r15-sor","a16r15-full","a16r15-noagrsor",
+                        "a16r15-n1-c","a16r15-thr-c","a16r15-noagr-c",
+                        "a16r15-sor-c","a16r15-full-c","a16r15-noagrsor-c",
+                        "a16r15-base-redo"])
+p.add_argument("--Ra", type=float, default=1.0e5)
+p.add_argument("--adapt-every", type=int, default=5)
+p.add_argument("--max-steps", type=int, default=500)
+p.add_argument("--t-end", type=float, default=0.6)
+p.add_argument("--ckpt-every", type=int, default=20)
+p.add_argument("--resume", action="store_true",
+               help="warm-start from this model's last checkpoint "
+                    "+ history and run --max-steps MORE steps "
+                    "(settle detector disabled — run the full "
+                    "extension; for pushing vrms to steady state)")
+p.add_argument("--resume-from", type=int, default=0,
+               help="restart from THIS checkpoint index instead "
+                    "of the max (0=max; e.g. 50 to test "
+                    "path-dependence from a clean pre-failure "
+                    "state)")
+p.add_argument("--src-tag", type=str, default="",
+               help="read the resume checkpoint+hist from this "
+                    "tag, but write outputs under --model's tag "
+                    "(so a probe can restart off another run "
+                    "without clobbering its files)")
+p.add_argument("--snes-debug", action="store_true",
+               help="after each adv/stokes solve, query the SNES "
+                    "converged reason + iteration count and tag "
+                    "WHICH solver diverged; enable snes_monitor "
+                    "for the residual history of the failure")
+p.add_argument("--stokes-cold-recover", type=int, default=0,
+               help="if the warm-started Stokes solve DIVERGES, "
+                    "re-solve on the SAME mesh from a COLD start "
+                    "(zero_init_guess=True) up to N times instead "
+                    "of propagating the corrupted V/P to the next "
+                    "step. Breaks the same-mesh failure burst.")
+p.add_argument("--no-vp-remap", action="store_true",
+               help="DISABLE the V,P remap across the mesh move "
+                    "(default: remap V,P onto the new node "
+                    "positions exactly like T — the correct/"
+                    "complete behaviour; only the old buggy "
+                    "T-only path leaves V,P scrambled). Use to "
+                    "A/B the remap's effect on Stokes warm-fails.")
+p.add_argument("--stokes-snes-opt",
+               choices=["default","basic","l2","tr","ksponly","direct",
+                        "gamg-n1","gamg-thr","gamg-noagr","gamg-sor","gamg-full","gamg-noagrsor",
+                        "gamg-n1-corr","gamg-thr-corr","gamg-noagr-corr",
+                        "gamg-sor-corr","gamg-full-corr","gamg-noagrsor-corr"],
+               default="default",
+               help="override the Stokes SNES: basic/l2 line-search, "
+                    "tr (trust region), ksponly (treat the linear "
+                    "Stokes as linear), direct (MUMPS LU), or gamg-* "
+                    "probe presets. Warm-start robustness investigation.")
+p.add_argument("--stokes-snes-atol-auto", action="store_true",
+               help="capture the COLD first-solve initial residual "
+                    "‖F(x0=0)‖ and set a FIXED snes_atol = "
+                    "snes_rtol·‖F0‖ — the guess-INDEPENDENT "
+                    "problem-scale convergence criterion UW3's "
+                    "tolerance setter omits. Confirmation of the "
+                    "snes_atol root-cause fix.")
+args = p.parse_args()
+
+RES = 24 if args.model == "ref24" else 16
+ADAPT = args.model in ("a16", "a16p", "a16s", "a16x", "a16y",
+                       "a16z", "a16c", "a16c2", "a16c15", "a16e",
+                       "a16ed", "a16r15", "a16r15p", "a16r15e",
+                       "a16c2v", "a16r15r", "a16r15v",
+                       "a16r15b","a16r15l2","a16r15tr","a16r15ko","a16r15a","a16r15d","a16r15_6","a16r15-n1",
+                       "a16r15-thr","a16r15-noagr","a16r15-sor","a16r15-full","a16r15-noagrsor",
+                       "a16r15-n1-c","a16r15-thr-c","a16r15-noagr-c",
+                       "a16r15-sor-c","a16r15-full-c","a16r15-noagrsor-c",
+                       "a16r15-base-redo")
+PRISTINE = args.model in ("a16p", "a16s", "a16x", "a16y", "a16z",
+                          "a16c", "a16c2", "a16c15", "a16e",
+                          "a16ed", "a16r15", "a16r15p", "a16r15e",
+                          "a16c2v", "a16r15r", "a16r15v",
+                          "a16r15b","a16r15l2","a16r15tr","a16r15ko","a16r15a","a16r15d","a16r15_6","a16r15-n1",
+                          "a16r15-thr","a16r15-noagr","a16r15-sor","a16r15-full","a16r15-noagrsor",
+                          "a16r15-n1-c","a16r15-thr-c","a16r15-noagr-c",
+                          "a16r15-sor-c","a16r15-full-c","a16r15-noagrsor-c",
+                          "a16r15-base-redo")
+
+# Per-model metric strength. a16p = the conservative validated
+# defaults (was tuned vs the now-removed cumulative over-
+# compression). a16s = "more aggressive gradient following": the
+# documented clean-but-strong Pareto corner (cap 2→4 needs a
+# gentler relax + more n_outer per the validation arc; amp 8→16).
+# Pristine re-mesh keeps each event a single uniform→graded map,
+# so the static single-adaptation Pareto applies (no compounding).
+# `beta` (mover anisotropy strength) and `aniso_cap` (eigen-clamp)
+# are the EFFECTIVE knobs; `amp` cancels for the anisotropic mover
+# (no-op, see scripts/_amp_check.py). Existing entries keep
+# beta=200 (the mover default they actually ran with).
+_MP = {
+    "a16":  dict(amp=8.0,  aniso_cap=2.0, relax=0.2,  n_outer=8,
+                 beta=200.0),
+    "a16p": dict(amp=8.0,  aniso_cap=2.0, relax=0.2,  n_outer=8,
+                 beta=200.0),
+    "a16s": dict(amp=16.0, aniso_cap=4.0, relax=0.05, n_outer=25,
+                 beta=200.0),
+    # a16x: amp 16→24. WARNING — for the anisotropic mover `amp`
+    # is a NO-OP: M is built from |∇ρ|/max|∇ρ|, and with
+    # ρ=1+amp·t both scale with amp ⇒ it cancels exactly (verified
+    # to machine ε, scripts/_amp_check.py). a16x ≡ a16s in metric;
+    # kept only as a control / extra movie. To actually intensify
+    # bunching change `aniso_cap` (sharper peak, ≥6 folds) and/or
+    # `beta` in the mover (wider band); the percentile window in
+    # metric_density_from_gradient reshapes *where* it refines.
+    # (`amp` IS effective for the isotropic spring/MA methods.)
+    "a16x": dict(amp=24.0, aniso_cap=4.0, relax=0.05, n_outer=25,
+                 beta=200.0),
+    # a16y = GENUINELY more aggressive: the real levers — sharper
+    # peak (aniso_cap 4→5, the binding lever; ≥6 folds) + wider
+    # refined band (beta 200→300), with extra damping (relax
+    # 0.05→0.04, n_outer 25→30) so cap=5 stays non-folding under
+    # pristine re-mesh. NOT amp (no-op for the mover).
+    "a16y": dict(amp=16.0, aniso_cap=5.0, relax=0.04, n_outer=30,
+                 beta=300.0),
+    # a16z = BUDGET-CONCENTRATION via the percentile window (the
+    # genuine "more bunching" lever — amp/cap/β are flat). Same
+    # *stable* metric strength as a16s (cap=4/β=200/relax=0.05/
+    # n_outer=25) so any change is cleanly the percentile, not
+    # cap=5. lo_pct 50→85, hi_pct 97→99: only the steepest ~15% of
+    # |∇T| qualifies; the fixed node budget concentrates on the
+    # sharpest fronts (BL flanks / weak plumes go coarse).
+    # Numerics stress-test (sharper metric in a thin band ⇒ bigger
+    # per-event displacement ⇒ fold risk) — watched.
+    "a16z": dict(amp=16.0, aniso_cap=4.0, relax=0.05, n_outer=25,
+                 beta=200.0, lo_pct=85.0, hi_pct=99.0),
+    # a16c = the STRUCTURAL FIX: a16s metric strength + percentile
+    # (cap=4 / β=200 / relax=0.05 / n_outer=25 / pct 50-97) but
+    # coarsen_cap=4 turns the refine-only metric (M ⪰ base·I — flat
+    # zones pinned at h0, single steepest feature starves the rest)
+    # into its true anisotropic-EQUIDISTRIBUTION form: low-ρ nodes
+    # de-resolve to h0·√4 = 2·h0 and release budget to ALL fronts
+    # (BL *and* plumes). Clean A/B vs a16s (identical params,
+    # coarsen_cap=1). Per-node max anisotropy now ≈cap·cc=16 —
+    # the no-fold check (valid=True) is the point of this run.
+    "a16c": dict(amp=16.0, aniso_cap=4.0, coarsen_cap=4.0,
+                 relax=0.05, n_outer=25, beta=200.0),
+    # a16c cc=4 over-coarsens: de-resolution mechanism proven
+    # (p95/p05~5.4) but minA/meanA crashed 0.27→0.04-0.14 (slivers,
+    # irregular mesh). Back coarsen_cap DOWN to find the quality
+    # knee — flats to h0·√cc (cc=2 → 1.41·h0, cc=1.5 → 1.22·h0)
+    # instead of cc=4's 2·h0. aniso_cap stays 4; per-node anisotropy
+    # = cap·cc (8 / 6 here, vs a16c's 16). Same a16s base otherwise.
+    "a16c2":  dict(amp=16.0, aniso_cap=4.0, coarsen_cap=2.0,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16c2v = re-verify the "robust" cc=2 (regime 2) config on the
+    # CURRENT build. The cited 0-DIVERGED result predates the
+    # regime-1/2/3 restructuring + snes_max_it=1; this confirms the
+    # regime-2 path is still bit-faithful & solver-robust now.
+    # Identical params to a16c2; separate tag preserves provenance.
+    "a16c2v": dict(amp=16.0, aniso_cap=4.0, coarsen_cap=2.0,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16r15r = R=1.5 equidist (cleanest static mesh ⇒ V/P-guess
+    # is the only variable) for the cold-start RECOVERY test:
+    # run with --stokes-cold-recover N to see if breaking the
+    # same-mesh failure-propagation burst makes equidist robust.
+    "a16r15r": dict(amp=16.0, resolution_ratio=1.5,
+                    relax=0.05, n_outer=25, beta=200.0),
+    # a16r15v = R=1.5 equidist, V,P-remap ON (correct baseline),
+    # NO cold-recover. Isolates the V,P-remap effect: warm STOKES
+    # DIVERGED vs a16r15's recorded 30 (no remap). Expect the
+    # adapt-step (every 5th) bad-guess→KSP-stall fails to vanish;
+    # any residue = the genuinely-hard non-adapt transient.
+    "a16r15v": dict(amp=16.0, resolution_ratio=1.5,
+                    relax=0.05, n_outer=25, beta=200.0),
+    # a16r15a (dict entry further down, after the GAMG variants) =
+    # the snes_atol root-cause CONFIRMATION: R=1.5 equidist, warm
+    # start, V,P-remap ON, default newtonls+bt, NO cold-recover,
+    # but --stokes-snes-atol-auto (fixed problem-scale snes_atol).
+    # Predict: warm STOKES DIVERGED → 0 (vs a16r15v's 24) because
+    # convergence uses the guess-independent absolute path.
+    # GAMG-anisotropy probe variants: clones of a16r15 (R=1.5).
+    # Each pairs with --stokes-snes-opt gamg-<...> on resume
+    # from a16r15 ckpt 50 (the reproducible failure case).
+    "a16r15-n1":   dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-thr":  dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-noagr":dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-sor":  dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-full": dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-noagrsor": dict(amp=16.0, resolution_ratio=1.5,
+                       relax=0.05, n_outer=25, beta=200.0),
+    # CORRECT-SCOPE GAMG variants (paired with -corr presets;
+    # _SNES_OPT correctly nests at fieldsplit_velocity_pc_gamg_*).
+    "a16r15-n1-c":   dict(amp=16.0, resolution_ratio=1.5,
+                          relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-thr-c":  dict(amp=16.0, resolution_ratio=1.5,
+                          relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-noagr-c":dict(amp=16.0, resolution_ratio=1.5,
+                          relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-sor-c":  dict(amp=16.0, resolution_ratio=1.5,
+                          relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-full-c": dict(amp=16.0, resolution_ratio=1.5,
+                          relax=0.05, n_outer=25, beta=200.0),
+    "a16r15-noagrsor-c": dict(amp=16.0, resolution_ratio=1.5,
+                              relax=0.05, n_outer=25, beta=200.0),
+    # Fresh baseline (default SNES opt) for comparison.
+    "a16r15-base-redo": dict(amp=16.0, resolution_ratio=1.5,
+                              relax=0.05, n_outer=25, beta=200.0),
+    "a16r15_6": dict(amp=16.0, resolution_ratio=1.5,
+                     relax=0.05, n_outer=25, beta=200.0),
+    "a16r15a": dict(amp=16.0, resolution_ratio=1.5,
+                    relax=0.05, n_outer=25, beta=200.0),
+    # a16r15d = the EXACT-inner-solve probe: R=1.5 equidist, warm,
+    # V,P-remap ON, default newtonls+bt, NO cold-recover, NO
+    # atol-auto, --stokes-snes-opt direct (MUMPS LU). Tests if an
+    # exact Newton step kills the transient line-search failures.
+    # vs a16r15v baseline 24.
+    "a16r15d": dict(amp=16.0, resolution_ratio=1.5,
+                    relax=0.05, n_outer=25, beta=200.0),
+    # Warm-start SNES investigation arms — all R=1.5 equidist,
+    # V,P-remap ON, no cold-recover; only --stokes-snes-opt
+    # differs (separate tags = separate output files). Baseline =
+    # a16r15v's 24 warm STOKES DIVERGED (default newtonls+bt).
+    "a16r15b":  dict(amp=16.0, resolution_ratio=1.5,
+                     relax=0.05, n_outer=25, beta=200.0),
+    "a16r15l2": dict(amp=16.0, resolution_ratio=1.5,
+                     relax=0.05, n_outer=25, beta=200.0),
+    "a16r15tr": dict(amp=16.0, resolution_ratio=1.5,
+                     relax=0.05, n_outer=25, beta=200.0),
+    "a16r15ko": dict(amp=16.0, resolution_ratio=1.5,
+                     relax=0.05, n_outer=25, beta=200.0),
+    "a16c15": dict(amp=16.0, aniso_cap=4.0, coarsen_cap=1.5,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16e = the FINAL single-knob equidistribution API:
+    # resolution_ratio=2 only (no coarsen_cap/aniso_cap → regime 1,
+    # s=base·ρ/G, complementary coarsening automatic by budget
+    # conservation). Same a16s base (relax/n_outer/beta). Expected
+    # ≈ a16c2 quality/physics (R=2 is the validated operating
+    # point) but with the split parameter-free. Settled re-validation.
+    "a16e":   dict(amp=16.0, resolution_ratio=2.0,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16ed = a16e + EMA temporal damping of the equidist
+    # normaliser G (geom_mean_smoothing=0.25, ln-space EMA across
+    # adaptation events). Should kill the startup-transient
+    # over-reaction + the steady-state grading-contrast pulse while
+    # leaving settled quality/Nu = a16e. Clean A/B vs the saved
+    # (undamped, geom_mean_smoothing=1.0) a16e checkpoints.
+    "a16ed":  dict(amp=16.0, resolution_ratio=2.0,
+                   geom_mean_smoothing=0.25,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16r15 = the CORRECTED production config. Frozen-snapshot
+    # sweep (scripts/_snap_sweep.py, step-20 overshoot field)
+    # showed R is the binding cell-quality lever, not bump-keying
+    # or EMA: R=2 sits on the poor-cell knee (Stokes-stressing
+    # clusters), R=1.5 is clean (beats cc2's quality tail at the
+    # worst instant, BIG&THIN=0) — with the DEFAULT construction
+    # (iso-keyed bump, EMA off). So this is the minimal clean
+    # single-knob config: resolution_ratio=1.5 only. Confirmation
+    # target: zero DIVERGED (= a16c2), cc2-class quality through
+    # the full lifecycle, settled Nu≈3.85 (≈ a16c2/a16e).
+    "a16r15": dict(amp=16.0, resolution_ratio=1.5,
+                   relax=0.05, n_outer=25, beta=200.0),
+    # a16r15p = a16r15 config, separate output tag — for the
+    # restart/instrumentation probe (reads a16r15 ckpt via
+    # --src-tag, writes its own files, no clobber).
+    "a16r15p": dict(amp=16.0, resolution_ratio=1.5,
+                    relax=0.05, n_outer=25, beta=200.0),
+    # a16r15e = task 34: the only untested combination — clean
+    # clamp (R=1.5, no static poor cells per the snapshot) PLUS
+    # EMA temporal damping of the equidist normaliser
+    # (geom_mean_smoothing=0.25). The restart probe proved the
+    # Stokes -6 failures are reproducible & dynamics-coupled (not
+    # path/static), so the per-event normaliser jump during the
+    # violent undershoot is the suspect — exactly what the EMA
+    # smooths. Target: ZERO STOKES DIVERGED (= cc2 bar). a16ed was
+    # R=2 (confounded by worse static quality); this is the clean
+    # test. Run with --snes-debug to tag any residual failures.
+    "a16r15e": dict(amp=16.0, resolution_ratio=1.5,
+                    geom_mean_smoothing=0.25,
+                    relax=0.05, n_outer=25, beta=200.0),
+}
+MP = _MP.get(args.model, _MP["a16p"])
+r_inner, r_o = 0.5, 1.0
+DIR = "/tmp/metric_mesh/sat"
+os.makedirs(DIR, exist_ok=True)
+TAG = args.model                       # outputs written here
+SRC = args.src_tag or TAG              # resume state read here
+HIST = f"{DIR}/sat_{TAG}_hist.npz"
+SRC_HIST = f"{DIR}/sat_{SRC}_hist.npz"
+
+
+def build():
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / RES, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        "V", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        "P", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        "T", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = args.Ra * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    init_t = (0.01 * sympy.sin(5.0 * th)
+              * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+              + (r_o - r) / (r_o - r_inner))
+    T.data[...] = np.asarray(uw.function.evaluate(
+        init_t, T.coords)).reshape(-1, 1)
+    return mesh, v, P, T, stokes, adv
+
+
+# Analytic steady-conduction heat flux through the OUTER boundary
+# of the annulus. The true conductive solution of ∇²T=0 with
+# T(R_i)=1, T(R_o)=0 is LOGARITHMIC (NOT the linear slab profile
+# the buoyancy reference uses): T_cond = ln(r/R_o)/ln(R_i/R_o) ⇒
+# Q_cond = -∮_outer ∇T_cond·n dS = 2π / ln(R_o/R_i).
+_Q_COND = 2.0 * np.pi / np.log(r_o / r_inner)
+
+
+_NU_CACHE = {}
+_R_MID = 0.5 * (r_inner + r_o)
+
+
+def nusselt(mesh, T, v=None, h=None):
+    r"""Nu = total radial heat flow through an INTERIOR shell /
+    conductive flow. Total flux q_r = v_r·T - ∂T/∂r is projected
+    to a nodal field (FE-consistent) and integrated on the
+    mid-shell r=(R_i+R_o)/2 — robust: at steady state the flow is
+    shell-independent (conservation) and the interior shell is
+    immune to thermal-BL resolution (unlike the boundary ∂T/∂r).
+    Validated (scripts/_nu_proper.py): analytic conduction ⇒
+    Nu=1.0000 on every shell; settled checkpoints agree with the
+    boundary method to ~1-2% and across shells. Q_cond = total
+    annular log-conduction flow 2π/ln(R_o/R_i) (Nu→1 at
+    conduction; verified). Cached per mesh (called every step).
+    v required for the advective term; omit ⇒ diffusive only."""
+    key = (id(mesh), v is not None)   # advective/diffusive differ
+    cache = _NU_CACHE.get(key)
+    if cache is None:
+        qf = uw.discretisation.MeshVariable(
+            f"nu_qr_{key[0]:x}_{int(key[1])}", mesh,
+            vtype=uw.VarType.SCALAR, degree=2, continuous=True)
+        proj = uw.systems.Projection(mesh, qf)
+        proj.smoothing = 0.0
+        X = mesh.CoordinateSystem.X
+        er = mesh.CoordinateSystem.unit_e_0
+        gradT_r = (T.sym[0].diff(X[0]) * er[0]
+                   + T.sym[0].diff(X[1]) * er[1])
+        vr = ((v.sym[0] * er[0] + v.sym[1] * er[1])
+              if v is not None else sympy.Integer(0))
+        proj.uw_function = vr * T.sym[0] - gradT_r
+        _NU_CACHE[key] = (qf, proj)
+    else:
+        qf, proj = cache
+    proj.solve()
+    th = np.linspace(0, 2 * np.pi, 720, endpoint=False)
+    pts = np.column_stack([_R_MID * np.cos(th),
+                           _R_MID * np.sin(th)])
+    q = np.asarray(uw.function.evaluate(
+        qf.sym[0], pts)).reshape(-1)
+    return float(q.mean() * _R_MID * 2.0 * np.pi) / _Q_COND
+
+
+def vrms(mesh, v):
+    # Unweighted nodal RMS (not area-weighted; biased on graded meshes).
+    a = np.asarray(uw.function.evaluate(v.sym.dot(v.sym), mesh.X.coords))
+    return float(np.sqrt(np.mean(a)))
+
+
+def adapt_local_fe_interp(mesh, T, v, P, stokes):
+    """Anisotropic adapt + local-FE remap (validated): old P3
+    field evaluated at the new node positions; layout-invariant ⇒
+    trivial restore, no migration; fixed domain ⇒ in-domain.
+    V,P are remapped the same way as T (correct/complete) unless
+    --no-vp-remap (the old T-only path that scrambled V,P)."""
+    rho = metric_density_from_gradient(
+        mesh, T, amp=MP["amp"], name="sat",
+        lo_percentile=MP.get("lo_pct", 50.0),
+        hi_percentile=MP.get("hi_pct", 97.0))
+    old_X = np.asarray(mesh.X.coords).copy()
+    old_T = np.asarray(T.data).copy()
+    old_v = np.asarray(v.data).copy()
+    old_P = np.asarray(P.data).copy()
+    smooth_mesh_interior(
+        mesh, metric=rho, method="anisotropic",
+        method_kwargs=dict(aniso_cap=MP.get("aniso_cap", 2.0),
+                           coarsen_cap=MP.get("coarsen_cap", 1.0),
+                           resolution_ratio=MP.get(
+                               "resolution_ratio", 1.0),
+                           geom_mean_smoothing=MP.get(
+                               "geom_mean_smoothing", 1.0),
+                           relax=MP["relax"],
+                           n_outer=MP["n_outer"],
+                           beta=MP["beta"]))
+    new_X = np.asarray(mesh.X.coords).copy()
+    new_Tx = np.asarray(T.coords).copy()
+    new_Vx = np.asarray(v.coords).copy()
+    new_Px = np.asarray(P.coords).copy()
+    mesh._deform_mesh(old_X)
+    T.data[...] = old_T
+    vals = np.asarray(uw.function.evaluate(
+        T.sym[0], new_Tx)).reshape(-1)
+    if not args.no_vp_remap:
+        # old_X geometry with old_v/old_P restored = the true
+        # previous (v,P) field; sample it at the NEW dof positions
+        # so the next Stokes warm-start is the previous solution
+        # at the new nodes — not stale values on moved nodes.
+        v.data[...] = old_v
+        P.data[...] = old_P
+        valsV = np.asarray(uw.function.evaluate(v.sym, new_Vx))
+        valsP = np.asarray(uw.function.evaluate(
+            P.sym[0], new_Px)).reshape(-1)
+    mesh._deform_mesh(new_X)
+    T.data[:, 0] = vals
+    if not args.no_vp_remap:
+        v.data[...] = valsV.reshape(v.data.shape)
+        P.data[:, 0] = valsP
+    # NO stokes here — the loop's single stokes.solve (now placed
+    # AFTER adaptation) recomputes v on the adapted mesh; the old
+    # in-adapt re-solve was redundant (v is stale-by-construction
+    # in the segregated scheme; nothing reads it before that solve).
+
+
+def adapt_pristine(mesh, T, v, P, stokes, X0, X0_Tx):
+    """Re-adapt from the ORIGINAL mesh points each event (not the
+    already-snuggled mesh) — the across-events analogue of making
+    the metric Lagrangian-once within a call. Each event maps
+    pristine X0 → graded(current T), so compression does NOT
+    compound (minA/meanA bounded to single-adaptation quality
+    instead of collapsing to ~0.07).
+
+    Field handling stays the validated local-FE remap. Sequence:
+    (1) put the *physical* T (currently on the previous graded
+    mesh) onto pristine X0 by an FE evaluate at the pristine T-DOF
+    coords; (2) build the metric from that pristine-mesh T;
+    (3) mover baseline = pristine X0 ⇒ fresh non-compounding graded
+    map; (4) FE-remap T onto the new graded mesh; (5) refresh v.
+    """
+    X_prev = np.asarray(mesh.X.coords).copy()
+    T_prev = np.asarray(T.data).copy()
+    # v,P true previous solution lives in the X_prev geometry
+    # (it was solved there); captured for the post-move remap.
+    v_prev = np.asarray(v.data).copy()
+    P_prev = np.asarray(P.data).copy()
+    # (1) physical T (mesh@X_prev, T_prev) → pristine X0 T-DOFs
+    vals0 = np.asarray(uw.function.evaluate(
+        T.sym[0], X0_Tx)).reshape(-1)
+    mesh._deform_mesh(X0)
+    T.data[:, 0] = vals0
+    # (2) metric from the physical T now on the pristine mesh
+    rho = metric_density_from_gradient(
+        mesh, T, amp=MP["amp"], name="sat",
+        lo_percentile=MP.get("lo_pct", 50.0),
+        hi_percentile=MP.get("hi_pct", 97.0))
+    # (3) mover baseline is pristine X0 (fresh, non-compounding)
+    X0c = np.asarray(mesh.X.coords).copy()
+    T0 = np.asarray(T.data).copy()
+    smooth_mesh_interior(
+        mesh, metric=rho, method="anisotropic",
+        method_kwargs=dict(aniso_cap=MP.get("aniso_cap", 2.0),
+                           coarsen_cap=MP.get("coarsen_cap", 1.0),
+                           resolution_ratio=MP.get(
+                               "resolution_ratio", 1.0),
+                           geom_mean_smoothing=MP.get(
+                               "geom_mean_smoothing", 1.0),
+                           relax=MP["relax"],
+                           n_outer=MP["n_outer"],
+                           beta=MP["beta"]))
+    new_X = np.asarray(mesh.X.coords).copy()
+    new_Tx = np.asarray(T.coords).copy()
+    new_Vx = np.asarray(v.coords).copy()
+    new_Px = np.asarray(P.coords).copy()
+    # (4) FE-remap the pristine-mesh T onto the new graded mesh
+    mesh._deform_mesh(X0c)
+    T.data[...] = T0
+    valsN = np.asarray(uw.function.evaluate(
+        T.sym[0], new_Tx)).reshape(-1)
+    mesh._deform_mesh(new_X)
+    T.data[:, 0] = valsN
+    # (4b) FE-remap V,P (the correct/complete behaviour): the true
+    # previous solution lives in the X_prev geometry — restore it
+    # there, sample at the NEW V/P dof positions, write onto the
+    # graded mesh. Without this the next Stokes warm-start is the
+    # old nodal values on moved nodes (a garbage guess → KSP stall
+    # → SNES DIVERGED_LINE_SEARCH). --no-vp-remap keeps the old
+    # T-only path for the controlled A/B.
+    if not args.no_vp_remap:
+        mesh._deform_mesh(X_prev)
+        v.data[...] = v_prev
+        P.data[...] = P_prev
+        valsV = np.asarray(uw.function.evaluate(v.sym, new_Vx))
+        valsP = np.asarray(uw.function.evaluate(
+            P.sym[0], new_Px)).reshape(-1)
+        mesh._deform_mesh(new_X)
+        v.data[...] = valsV.reshape(v.data.shape)
+        P.data[:, 0] = valsP
+    # (5) NO stokes here — see adapt_local_fe_interp note; the
+    # loop's single post-adaptation stokes.solve does it.
+
+
+def save_hist(hist):
+    a = np.array(hist, dtype=float)
+    tmp = HIST + ".tmp.npz"
+    np.savez(tmp, step=a[:, 0], t=a[:, 1], dt=a[:, 2],
+             Nu=a[:, 3], vrms=a[:, 4])
+    os.replace(tmp, HIST)
+
+
+def settled(nu_hist):
+    """Past the overshoot and flattened: Nu has exceeded the
+    nonlinear threshold and the trailing window is flat."""
+    if len(nu_hist) < 60:
+        return False
+    w = np.array(nu_hist[-40:])
+    if np.max(np.abs(np.array(nu_hist))) < 3.0:
+        return False                      # still pre-overshoot
+    return (w.max() - w.min()) < 0.06 * abs(w.mean())
+
+
+mesh, v, P, T, stokes, adv = build()
+h = 1.0 / RES
+
+# Warm-start SNES investigation: override the Stokes nonlinear
+# solver. This Stokes is effectively LINEAR (constant viscosity,
+# T-fixed buoyancy) yet the default newtonls+bt line-search fails
+# warm-started from the *true previous solution* on the equidist
+# mesh through the violent transient (a16r15v: 24 fails, all
+# non-adapt steps) while converging cold — a line-search
+# misconfiguration signature, not a guess problem.
+_SNES_OPT = {
+    "default": {},                                  # newtonls+bt
+    "basic":   {"snes_linesearch_type": "basic"},   # full step
+    "l2":      {"snes_linesearch_type": "l2"},
+    "tr":      {"snes_type": "newtontr"},            # trust region
+    "ksponly": {"snes_type": "ksponly"},             # it's linear
+    # EXACT inner Newton solve: MUMPS LU on the (small, res-16 2D)
+    # Stokes Jacobian, replacing the iterative fieldsplit inner
+    # solve. Tests whether the transient DIVERGED_LINE_SEARCH is
+    # the bt line search rejecting an INEXACT Newton step — an
+    # exact step should be accepted at λ=1. Keeps default
+    # newtonls+bt (we test the *step*, not the line search).
+    "direct":  {"ksp_type": "preonly",
+                "pc_type": "lu",
+                "pc_factor_mat_solver_type": "mumps",
+                "mat_mumps_icntl_24": 1},            # null-pivot detect
+    # Anisotropy-tuned GAMG (source-verified PETSc options).
+    # AMG aggregation degrades on stretched/anisotropic cells
+    # — these knobs counteract that without abandoning GAMG.
+    "gamg-n1":    {"pc_gamg_agg_nsmooths": 1},  # 2→1 (PETSc default)
+    "gamg-thr":   {"pc_gamg_threshold": 0.02,
+                   "pc_gamg_threshold_scale": 0.5},
+    "gamg-noagr": {"pc_gamg_aggressive_coarsening": 0},
+    "gamg-sor":   {"mg_levels_ksp_type": "richardson",
+                   "mg_levels_pc_type": "sor",
+                   "mg_levels_ksp_max_it": 2},
+    "gamg-full":  {"pc_gamg_agg_nsmooths": 1,
+                   "pc_gamg_threshold": 0.02,
+                   "pc_gamg_threshold_scale": 0.5,
+                   "pc_gamg_aggressive_coarsening": 0,
+                   "pc_gamg_mis_k_minimum_degree_ordering": True,
+                   "mg_levels_ksp_type": "richardson",
+                   "mg_levels_pc_type": "sor",
+                   "mg_levels_ksp_max_it": 2},
+    # Discriminator: aggregation-fix (noagr) PLUS smoother
+    # swap (sor). On the 40-step probe both alone gave 0;
+    # combined ought to also = 0. Use Krylov-iter / wall-time
+    # as the finer discriminator if both produce 0 fails.
+    "gamg-noagrsor": {"pc_gamg_aggressive_coarsening": 0,
+                       "mg_levels_ksp_type": "richardson",
+                       "mg_levels_pc_type": "sor",
+                       "mg_levels_ksp_max_it": 2},
+    # CORRECT-SCOPE variants: UW3 Stokes nests GAMG inside the
+    # velocity Schur sub-block at prefix
+    # `fieldsplit_velocity_pc_gamg_*`. The original (above)
+    # entries set the options at the global (Solver_NN_) scope,
+    # which is silently ignored by the velocity subsolver —
+    # verified bit-identical KSP residuals to default (see
+    # scripts/_sl_preset_verify.py). These -corr variants use
+    # the correct prefix and DO change the KSP convergence
+    # path. Run alongside the original (wrong-scope) entries
+    # to A/B them.
+    "gamg-n1-corr": {
+        "fieldsplit_velocity_pc_gamg_agg_nsmooths": 1},
+    "gamg-thr-corr": {
+        "fieldsplit_velocity_pc_gamg_threshold": 0.02,
+        "fieldsplit_velocity_pc_gamg_threshold_scale": 0.5},
+    "gamg-noagr-corr": {
+        "fieldsplit_velocity_pc_gamg_aggressive_coarsening": 0},
+    "gamg-sor-corr": {
+        "fieldsplit_velocity_mg_levels_ksp_type": "richardson",
+        "fieldsplit_velocity_mg_levels_pc_type": "sor",
+        "fieldsplit_velocity_mg_levels_ksp_max_it": 2},
+    "gamg-full-corr": {
+        "fieldsplit_velocity_pc_gamg_agg_nsmooths": 1,
+        "fieldsplit_velocity_pc_gamg_threshold": 0.02,
+        "fieldsplit_velocity_pc_gamg_threshold_scale": 0.5,
+        "fieldsplit_velocity_pc_gamg_aggressive_coarsening": 0,
+        "fieldsplit_velocity_pc_gamg_mis_k_minimum_degree_ordering": True,
+        "fieldsplit_velocity_mg_levels_ksp_type": "richardson",
+        "fieldsplit_velocity_mg_levels_pc_type": "sor",
+        "fieldsplit_velocity_mg_levels_ksp_max_it": 2},
+    "gamg-noagrsor-corr": {
+        "fieldsplit_velocity_pc_gamg_aggressive_coarsening": 0,
+        "fieldsplit_velocity_mg_levels_ksp_type": "richardson",
+        "fieldsplit_velocity_mg_levels_pc_type": "sor",
+        "fieldsplit_velocity_mg_levels_ksp_max_it": 2},
+}
+for _k, _vopt in _SNES_OPT[args.stokes_snes_opt].items():
+    stokes.petsc_options[_k] = _vopt
+if args.stokes_snes_opt != "default":
+    print(f"[stokes-snes-opt={args.stokes_snes_opt}] "
+          f"{_SNES_OPT[args.stokes_snes_opt]}", flush=True)
+
+# pristine reference captured once (mesh + T-DOF coords undeformed)
+X0 = np.asarray(mesh.X.coords).copy()
+X0_Tx = np.asarray(T.coords).copy()
+
+# NB: do NOT enable the global PETSc `snes_converged_reason`
+# viewer for debugging — it leaks into the mover's ksponly linear
+# sub-solves (created later inside smooth_mesh_interior) and floods
+# the log with misleading "DIVERGED_MAX_IT iterations 0" lines (a
+# linear solve has no Newton iterations; the SNES wrapper
+# mis-labels it). The programmatic _snes_chk below tags WHICH UW
+# physics solver diverged + reason + iter-count with zero leakage.
+
+
+def _snes_chk(solver, name, step):
+    """Post-solve SNES convergence tag — reason<0 ⇒ diverged
+    (DIVERGED_LINE_SEARCH = -6). Identifies which physics solve
+    failed (the pyx retry message is solver-anonymous)."""
+    if not args.snes_debug:
+        return None
+    try:
+        sn = solver.snes
+        reason = int(sn.getConvergedReason())
+        its = int(sn.getIterationNumber())
+        if reason < 0:
+            print(f"  !! [step {step}] {name} SNES DIVERGED "
+                  f"reason={reason} its={its}", flush=True)
+        return reason
+    except Exception as e:
+        print(f"  !! [step {step}] {name} snes-chk error: {e}",
+              flush=True)
+        return None
+
+
+if args.resume:
+    import glob
+    import re
+    _pat = re.compile(r"\.mesh\.T\.(\d+)\.h5$")
+    _hits = map(_pat.search, glob.glob(f"{DIR}/sat_{SRC}.mesh.T.*.h5"))
+    _all = sorted(int(_m.group(1)) for _m in _hits if _m)
+    _idx = (args.resume_from if args.resume_from > 0
+            else max(_all))   # ValueError if no checkpoint exists
+    T.read_timestep(f"sat_{SRC}", "T", _idx, outputPath=DIR)
+    v.read_timestep(f"sat_{SRC}", "V", _idx, outputPath=DIR)
+    _z = np.load(SRC_HIST)
+    hist = [[int(_z["step"][i]), float(_z["t"][i]),
+             float(_z["dt"][i]), float(_z["Nu"][i]),
+             float(_z["vrms"][i])] for i in range(len(_z["step"]))
+            if int(_z["step"][i]) <= _idx]
+    STEP0 = _idx
+    t_sim = (hist[-1][1] if hist else 0.0)
+    stokes.solve(zero_init_guess=False)   # sync v with loaded T
+    print(f"=== sat {TAG} RESUME from {SRC} ckpt {STEP0} "
+          f"t={t_sim:.4f} (+{args.max_steps} steps, "
+          f"snes_debug={args.snes_debug}) ===", flush=True)
+else:
+    stokes.solve(zero_init_guess=True)        # solve 1: creates SNES
+    if args.stokes_snes_atol_auto:
+        # First-pass scale: the SNES object exists only after a
+        # solve, so enable history NOW, then re-solve cold (same T,
+        # x0=0) to record ‖F(x0=0)‖ ≈ the problem/RHS residual
+        # scale. Set a FIXED snes_atol = snes_rtol·‖F0‖ so SNES
+        # converges on the guess-INDEPENDENT absolute path
+        # (SNESConvergedDefault snesut.c:752, evaluated even at
+        # it==0 since forceiteration is off) — the criterion UW3's
+        # tolerance setter never sets (snes_atol left at PETSc
+        # ~1e-50). Same accuracy a working cold solve achieves,
+        # just measured against the problem, not the guess.
+        _F0 = 0.0
+        try:
+            stokes.snes.setConvergenceHistory(reset=True)
+            stokes.solve(zero_init_guess=True)   # solve 2: records
+            _rh, _ = stokes.snes.getConvergenceHistory()
+            _F0 = float(_rh[0]) if _rh is not None and len(_rh) \
+                else 0.0
+        except Exception as _e:
+            print(f"[snes-atol-auto] history read failed: {_e!r}",
+                  flush=True)
+        _rtol = 1.0e-5                       # build(): stokes.tolerance
+        if _F0 > 0.0:
+            _atol = _rtol * _F0
+            stokes.petsc_options["snes_atol"] = _atol
+            print(f"[snes-atol-auto] cold ‖F0‖={_F0:.4e} ⇒ "
+                  f"snes_atol={_atol:.4e} (rtol={_rtol:.0e}); "
+                  f"absolute, guess-independent", flush=True)
+        else:
+            print("[snes-atol-auto] WARN: ‖F0‖ unavailable; "
+                  "atol unchanged (run is NOT a valid "
+                  "confirmation)", flush=True)
+    t_sim = 0.0
+    hist = []
+    STEP0 = 0
+    print(f"=== sat {TAG} (res-{RES}, adapt={ADAPT}) "
+          f"Ra={args.Ra:.0e} max_steps={args.max_steps} ===",
+          flush=True)
+for s in range(args.max_steps):
+    STEP = STEP0 + s + 1
+    dt = adv.estimate_dt()
+    adv.solve(timestep=dt, zero_init_guess=False)
+    _snes_chk(adv, "ADV ", STEP)
+    # Loop-reorder fix: adapt BETWEEN adv.solve and the single
+    # stokes.solve. The node move + remap uses the just-advected
+    # T; the one stokes.solve below then recomputes v on the
+    # adapted mesh (no redundant old-mesh solve, no in-adapt
+    # re-solve). Physically equivalent end-of-step state, ~2 fewer
+    # Stokes solves per adaptation step.
+    if ADAPT and STEP % args.adapt_every == 0:
+        if PRISTINE:
+            adapt_pristine(mesh, T, v, P, stokes, X0, X0_Tx)
+        else:
+            adapt_local_fe_interp(mesh, T, v, P, stokes)
+    stokes.solve(zero_init_guess=False)
+    _r = _snes_chk(stokes, "STOKES", STEP)
+    # Same-mesh recovery: a diverged warm solve leaves V/P
+    # corrupted ("solution vector may not have been updated");
+    # propagating it to the next step makes the failure
+    # self-sustaining (the same-mesh burst). Re-solve COLD on the
+    # SAME mesh+T (Stokes is linear here ⇒ a converged cold solve
+    # is the correct v,P) before moving on.
+    if args.stokes_cold_recover and _r is not None and _r < 0:
+        for _att in range(1, args.stokes_cold_recover + 1):
+            print(f"  ~~ [step {STEP}] STOKES cold-recover "
+                  f"attempt {_att} (zero_init_guess=True)",
+                  flush=True)
+            stokes.solve(zero_init_guess=True)
+            _r = _snes_chk(stokes, f"STOKES.r{_att}", STEP)
+            if _r is None or _r >= 0:
+                break
+    t_sim += dt
+    Nu = nusselt(mesh, T, v)
+    vr = vrms(mesh, v)
+    hist.append([STEP, t_sim, dt, Nu, vr])
+    if STEP % args.ckpt_every == 0 or (not args.resume and s == 0):
+        save_hist(hist)
+        mesh.write_timestep(f"sat_{TAG}", STEP, outputPath=DIR,
+                            meshVars=[T, v], meshUpdates=True,
+                            create_xdmf=False)
+        tt = T.data[:, 0]
+        print(f"  [{TAG}] step {STEP:3d} t={t_sim:.4f} "
+              f"dt={dt:.2e} Nu={Nu:+.3f} vrms={vr:.3e} "
+              f"T=[{tt.min():+.2f},{tt.max():+.2f}]", flush=True)
+    if t_sim >= args.t_end:
+        print(f"  [{TAG}] reached t_end={args.t_end}", flush=True)
+        break
+    if (not args.resume) and settled([hh[3] for hh in hist]):
+        print(f"  [{TAG}] Nu settled at step {STEP} "
+              f"t={t_sim:.4f} Nu≈{Nu:.2f}", flush=True)
+        break
+save_hist(hist)
+mesh.write_timestep(f"sat_{TAG}", len(hist), outputPath=DIR,
+                    meshVars=[T, v], meshUpdates=True,
+                    create_xdmf=False)
+print(f"  [{TAG}] DONE {len(hist)} steps, t={t_sim:.4f}, "
+      f"Nu={hist[-1][3]:+.3f}, final ckpt idx={len(hist)}",
+      flush=True)
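Reviewer sketch (not part of the patch): the `_Q_COND` normalisation used by `nusselt()` above can be checked without UW3. The block below is a minimal standalone check, assuming only NumPy; the helper names `t_cond` and `shell_flow` are hypothetical and do not exist in the script. It differentiates the logarithmic conduction profile numerically and confirms that the total radial flow is the same on every interior shell and equals 2π/ln(R_o/R_i), so Nu = 1 at pure conduction.

```python
import numpy as np

R_I, R_O = 0.5, 1.0  # same radii as r_inner, r_o in the script


def t_cond(r):
    """Log conduction profile with T(R_I) = 1 and T(R_O) = 0."""
    return np.log(r / R_O) / np.log(R_I / R_O)


def shell_flow(r, dr=1.0e-6):
    """Total outward conductive flow, -dT/dr * 2*pi*r, through the shell at r.

    dT/dr is approximated with a central difference (hypothetical helper).
    """
    dTdr = (t_cond(r + dr) - t_cond(r - dr)) / (2.0 * dr)
    return -dTdr * 2.0 * np.pi * r


# Analytic total conductive flow, as in _Q_COND.
q_cond = 2.0 * np.pi / np.log(R_O / R_I)

# Flow through three interior shells, normalised by q_cond.
flows = np.array([shell_flow(r) for r in (0.6, 0.75, 0.9)])
nu = flows / q_cond
print(nu)
```

Because the flow is shell-independent for the conduction profile, the mid-shell choice in `nusselt()` is a matter of robustness rather than of value.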
diff --git a/scripts/adaptive_saturation_plot.py b/scripts/adaptive_saturation_plot.py
new file mode 100644
index 00000000..f88908ff
--- /dev/null
+++ b/scripts/adaptive_saturation_plot.py
@@ -0,0 +1,150 @@
+"""Progress plotter for the 3-way saturation experiment. Reads
+whatever the parallel runs have written so far (partial OK):
+
+  1. Nu(t) and vrms(t) overlay (matplotlib — scalar histories).
+  2. Latest-checkpoint T + mesh per model via the UW/pyvista path
+     (P3 T on its own DOF cloud + deformed-mesh edges, white bg,
+     lighting off — the required high-order render).
+
+Run any time while the runs progress.
+"""
+from __future__ import annotations
+import os
+import glob
+import re
+import numpy as np
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import underworld3 as uw
+import underworld3.visualisation as vis
+
+DIR = "/tmp/metric_mesh/sat"
+MODELS = [("ref24", 24, "k", "ref uniform res-24"),
+          ("u16", 16, "#c0392b", "uniform res-16"),
+          ("a16", 16, "#e07b00",
+           "res-16 + adapt (cumulative, over-adapted)"),
+          ("a16p", 16, "#1f4e8c",
+           "res-16 + adapt (pristine, conservative)"),
+          ("a16s", 16, "#2ca02c",
+           "res-16 + adapt (pristine, AGGRESSIVE)"),
+          ("a16x", 16, "#9467bd",
+           "res-16 + adapt (pristine, amp=24 ≡ a16s [amp no-op])"),
+          ("a16y", 16, "#8c564b",
+           "res-16 + adapt (pristine, cap=5 β=300 — true strong)"),
+          ("a16z", 16, "#17becf",
+           "res-16 + adapt (pristine, pct 85/99 — budget conc.)"),
+          ("a16c", 16, "#d62728",
+           "res-16 + adapt (pristine, coarsen_cap=4 — over-coarse)"),
+          ("a16c2", 16, "#e377c2",
+           "res-16 + adapt (pristine, coarsen_cap=2)"),
+          ("a16c15", 16, "#7f7f7f",
+           "res-16 + adapt (pristine, coarsen_cap=1.5)"),
+          ("a16e", 16, "#bcbd22",
+           "res-16 + adapt (equidistribution, resolution_ratio=2)"),
+          ("a16ed", 16, "#ff7f0e",
+           "res-16 + adapt (equidist R=2 + EMA-G damping)"),
+          ("a16r15", 16, "#1f77b4",
+           "res-16 + adapt (equidist R=1.5 — CORRECTED prod)"),
+          ("a16r15e", 16, "#006d5b",
+           "res-16 + adapt (equidist R=1.5 + EMA-G damping)")]
+
+
+def latest_ckpt(tag):
+    fs = glob.glob(f"{DIR}/sat_{tag}.mesh.T.*.h5")
+    idx = []
+    for f in fs:
+        m = re.search(r"\.mesh\.T\.(\d+)\.h5$", f)
+        if m:
+            idx.append(int(m.group(1)))
+    return max(idx) if idx else None
+
+
+# ---- 1. Nu(t) / vrms(t) -------------------------------------------
+fig, ax = plt.subplots(1, 2, figsize=(15, 5.4))
+print(f"{'model':>26} | {'steps':>5} {'t_end':>8} "
+      f"{'Nu_last':>8} {'vrms_last':>10}")
+print("-" * 66)
+for tag, res, col, lab in MODELS:
+    hp = f"{DIR}/sat_{tag}_hist.npz"
+    if not os.path.exists(hp):
+        print(f"{lab:>26} |  (no history yet)")
+        continue
+    try:
+        z = np.load(hp)
+    except Exception:
+        print(f"{lab:>26} |  (history mid-write, skip)")
+        continue
+    t, Nu, vr = z["t"], z["Nu"], z["vrms"]
+    ax[0].plot(t, Nu, "-", color=col, lw=1.7, label=lab)
+    ax[1].plot(t, vr, "-", color=col, lw=1.7, label=lab)
+    print(f"{lab:>26} | {int(z['step'][-1]):5d} {t[-1]:8.4f} "
+          f"{Nu[-1]:+8.3f} {vr[-1]:10.3e}")
+ax[0].set_xlabel("dimensionless time")
+ax[0].set_ylabel("Nu")
+ax[0].set_title("Nusselt(t) — overshoot then settle")
+ax[1].set_xlabel("dimensionless time")
+ax[1].set_ylabel("vrms")
+ax[1].set_title("vrms(t)")
+for a in ax:
+    a.legend(fontsize=9)
+    a.grid(alpha=0.3)
+fig.suptitle("3-way saturation: res-24 vs res-16 vs res-16+adapt "
+             "(does adapt land BETWEEN once settled?)",
+             fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig(f"{DIR}/sat_timeseries.png", dpi=130)
+print(f"\nsaved {DIR}/sat_timeseries.png")
+
+# ---- 2. latest T + mesh per model (UW/pyvista path) ---------------
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+present = [mdl for mdl in MODELS
+           if latest_ckpt(mdl[0]) is not None]
+if present:
+    import math
+    n = len(present)
+    ncol = math.ceil(math.sqrt(n))
+    nrow = math.ceil(n / ncol)
+    pl = pv.Plotter(shape=(nrow, ncol), off_screen=True,
+                    window_size=(950 * ncol, 950 * nrow))
+    pl.set_background("white")
+    from underworld3.meshing.smoothing import (
+        _tri_cells, _signed_areas)
+    for k, (tag, res, _, lab) in enumerate(present):
+        rr, cc = divmod(k, ncol)
+        idx = latest_ckpt(tag)
+        m = uw.discretisation.Mesh(
+            f"{DIR}/sat_{tag}.mesh.{idx:05}.h5")
+        Tv = uw.discretisation.MeshVariable(
+            "T", m, vtype=uw.VarType.SCALAR, degree=3,
+            continuous=True)
+        Tv.read_timestep(f"sat_{tag}", "T", idx, outputPath=DIR)
+        pv_T = vis.meshVariable_to_pv_mesh_object(Tv)
+        pv_T.point_data["T"] = np.asarray(Tv.data[:, 0])
+        edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+        # quantify mesh compression (the over-adaptation check)
+        tris = _tri_cells(m.dm)
+        A = np.abs(_signed_areas(np.asarray(m.X.coords), tris))
+        qmin = A.min() / A.mean()
+        pl.subplot(rr, cc)
+        pl.add_text(f"{lab}\nckpt {idx}   minA/meanA={qmin:.3f}",
+                    font_size=16, color="black")
+        # RdBu_r: T=0 (cold) blue, T=1 (hot) red, 0.5 white — the
+        # free-surface viz convention (white bg + lighting=False so
+        # mid-T isn't dirty grey; crisp dark edges for cell quality)
+        pl.add_mesh(pv_T, scalars="T", cmap="RdBu_r",
+                    clim=(0.0, 1.0), show_edges=False,
+                    lighting=False,
+                    show_scalar_bar=(k == n - 1),
+                    scalar_bar_args=dict(title="T", color="black"))
+        pl.add_mesh(edges, color="#202020", line_width=0.8,
+                    lighting=False)
+        pl.view_xy()
+        pl.camera.zoom(1.3)
+    out = f"{DIR}/sat_fields.png"
+    pl.screenshot(out)
+    print(f"saved {out}")
+else:
+    print("no checkpoints yet for the field render")
diff --git a/scripts/aniso_api_smoke.py b/scripts/aniso_api_smoke.py
new file mode 100644
index 00000000..f51f2cf9
--- /dev/null
+++ b/scripts/aniso_api_smoke.py
@@ -0,0 +1,70 @@
+"""Smoke test for the locked-in adaptation API:
+  uw.meshing.metric_density_from_gradient  + smooth_mesh_interior(
+  method="anisotropic", method_kwargs=...).
+Checks: public import, the helper's cache (callable per-step with
+no duplicate-MeshVariable error), method_kwargs pass-through, a
+valid moved mesh, and the uniform-field ~no-op.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import (
+    smooth_mesh_interior, metric_density_from_gradient)
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+
+R_O, R_I, RES = 1.0, 0.5, 16
+
+
+def fresh():
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    T = uw.discretisation.MeshVariable(
+        "T", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    return m, T
+
+
+# 1. gradient-driven density on a feature field, then move
+m, T = fresh()
+X0 = np.asarray(m.X.coords).copy()
+tris = _tri_cells(m.dm)
+r = np.sqrt((np.asarray(T.coords) ** 2).sum(axis=1))
+T.data[:, 0] = np.exp(-((r - 0.7) / 0.12) ** 2)      # a "front"
+rho = metric_density_from_gradient(m, T, amp=8.0)
+# cache test: a second call must NOT raise (no dup MeshVariable)
+rho2 = metric_density_from_gradient(m, T, amp=8.0)
+print("metric_density_from_gradient cache OK (2 calls)")
+smooth_mesh_interior(m, metric=rho, method="anisotropic",
+                     method_kwargs=dict(aniso_cap=2.0, relax=0.2,
+                                        n_outer=8), verbose=True)
+X1 = np.asarray(m.X.coords).copy()
+a = _signed_areas(X1, tris)
+orient = np.sign(np.median(_signed_areas(X0, tris))) or 1.0
+moved = float(np.linalg.norm(X1 - X0, axis=1).max())
+mA = np.abs(a).min() / np.abs(a).mean()
+print(f"moved max|Δx|={moved:.4e}  minA/meanA={mA:.4f}  "
+      f"valid={(a*orient).min() > 0}")
+assert (a * orient).min() > 0.0, "tangled mesh"
+assert moved > 1e-4, "expected node movement on a gradient metric"
+
+# 2. method_kwargs really reaches the mover (bad kwarg → TypeError)
+try:
+    smooth_mesh_interior(m, metric=rho, method="anisotropic",
+                         method_kwargs=dict(not_a_real_kwarg=1))
+    raise AssertionError("bad method_kwargs silently accepted")
+except TypeError:
+    print("method_kwargs pass-through OK (unknown kwarg → TypeError)")
+
+# 3. uniform field ⇒ ρ≈1 ⇒ ~no-op (gradient ~0 everywhere)
+m3, T3 = fresh()
+T3.data[:, 0] = 1.0
+X0 = np.asarray(m3.X.coords).copy()
+rho3 = metric_density_from_gradient(m3, T3, amp=8.0)
+smooth_mesh_interior(m3, metric=rho3, method="anisotropic",
+                     method_kwargs=dict(n_outer=4))
+d = float(np.linalg.norm(
+    np.asarray(m3.X.coords) - X0, axis=1).max())
+print(f"uniform-field move max|Δx|={d:.2e} (≈ no-op)")
+assert d < 1e-3, f"uniform field should be ~no-op, got {d:.2e}"
+
+print("API smoke PASSED")
diff --git a/scripts/aniso_bl_asymmetry.py b/scripts/aniso_bl_asymmetry.py
new file mode 100644
index 00000000..b2418a0c
--- /dev/null
+++ b/scripts/aniso_bl_asymmetry.py
@@ -0,0 +1,158 @@
+"""Why does the inner (Lower) thermal BL NOT gather nodes the way
+the outer (Upper) region does, even though the metric ρ∝|∇T| is
+BRIGHTEST at the inner BL?
+
+Decisive radial profiles on the cached res-32 warm state:
+  * ρ(r)      — the metric density (∝|∇T|): does it peak at r=R_I?
+  * |∇ρ|(r)   — what the mover ACTUALLY refines on (the
+                gradient-metric clusters where |∇ρ| is large, NOT
+                where ρ is large — the blob-core lesson).
+  * node radial distribution BEFORE vs AFTER the mover — where
+    did points actually snuggle?
+
+Three things differ between the two boundaries:
+  (1) velocity BC: inner = no-slip (add_essential_bc), outer =
+      free-slip (add_natural_bc) → different BL dynamics;
+  (2) the inner |∇T| peak sits ON the pinned wall, where ∇ρ≈0
+      (the blob-CORE de-refinement) and the nodes cannot move;
+  (3) annulus geometry: inner circumference πD=π vs outer 2π —
+      half the tangential room at r=R_I.
+This script isolates (2): it shows that ρ peaks at R_I but |∇ρ|
+(the real driver) does NOT; it is ~0 at the on-wall peak.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import underworld3 as uw
+from underworld3.meshing import smooth_mesh_interior
+from underworld3.meshing.smoothing import _tri_cells
+
+RA, AMP, RES = 1.0e5, 8.0, 32
+N_WARM = 5
+r_inner, r_o = 0.5, 1.0
+C32 = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES}_warm{N_WARM}.npz"
+
+
+def build():
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / RES, qdegree=3)
+    v = uw.discretisation.MeshVariable(
+        "V32", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        "T32", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    return mesh, v, T
+
+
+mesh, v, T = build()
+z = np.load(C32)
+T.data[...] = z["T"].reshape(T.data.shape)
+v.data[...] = z["V"].reshape(v.data.shape)
+X0 = np.asarray(mesh.X.coords).copy()
+r0 = np.hypot(X0[:, 0], X0[:, 1])
+tris = _tri_cells(mesh.dm)
+
+# --- ρ field and ∇ρ field -----------------------------------------
+Xs = mesh.CoordinateSystem.X
+gradT = uw.discretisation.MeshVariable(
+    "gT", mesh, vtype=uw.VarType.VECTOR, degree=1, continuous=True)
+gp = uw.systems.Vector_Projection(mesh, gradT)
+gp.smoothing = 0.0
+gp.uw_function = sympy.Matrix(
+    [T.sym[0].diff(Xs[i]) for i in range(2)]).T
+gp.solve()
+rho0 = uw.discretisation.MeshVariable(
+    "r0f", mesh, vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+gmag = np.linalg.norm(np.asarray(uw.function.evaluate(
+    gradT.sym, rho0.coords)).reshape(-1, 2), axis=1)
+g_lo, g_hi = np.percentile(gmag, 50.0), np.percentile(gmag, 97.0)
+rho0.data[:, 0] = np.clip(
+    (gmag - g_lo) / max(g_hi - g_lo, 1e-30), 0.0, 1.0)
+metric = 1.0 + AMP * rho0.sym[0]
+
+# |∇ρ| = AMP·|∇(normalised|∇T|)| — exactly what M is built from.
+gradR = uw.discretisation.MeshVariable(
+    "gR", mesh, vtype=uw.VarType.VECTOR, degree=1, continuous=True)
+gpr = uw.systems.Vector_Projection(mesh, gradR)
+gpr.smoothing = 0.0
+gpr.uw_function = sympy.Matrix(
+    [metric.diff(Xs[i]) for i in range(2)]).T
+gpr.solve()
+
+bins = np.linspace(r_inner, r_o, 26)
+bc = 0.5 * (bins[1:] + bins[:-1])
+rho_v = np.asarray(uw.function.evaluate(metric, X0)).reshape(-1)
+gradrho_v = np.linalg.norm(np.asarray(uw.function.evaluate(
+    gradR.sym, X0)).reshape(-1, 2), axis=1)
+
+
+def prof(val, rr):
+    return np.array([val[(rr >= bins[i]) & (rr < bins[i + 1])].mean()
+                     if ((rr >= bins[i]) & (rr < bins[i + 1])).any()
+                     else np.nan for i in range(len(bins) - 1)])
+
+
+rho_p = prof(rho_v, r0)
+grho_p = prof(gradrho_v, r0)
+cnt_before = np.array([
+    ((r0 >= bins[i]) & (r0 < bins[i + 1])).sum()
+    for i in range(len(bins) - 1)], dtype=float)
+
+# --- run the mover, recount nodes by radius -----------------------
+smooth_mesh_interior(mesh, metric=metric, method="anisotropic",
+                     verbose=False)
+Xr = np.asarray(mesh.X.coords).copy()
+rr = np.hypot(Xr[:, 0], Xr[:, 1])
+cnt_after = np.array([
+    ((rr >= bins[i]) & (rr < bins[i + 1])).sum()
+    for i in range(len(bins) - 1)], dtype=float)
+
+print(f"{'r':>6} {'rho':>7} {'|grad rho|':>10} {'n_before':>9} "
+      f"{'n_after':>8} {'Δn':>6}")
+for i in range(len(bc)):
+    print(f"{bc[i]:6.3f} {rho_p[i]:7.2f} {grho_p[i]:10.3f} "
+          f"{cnt_before[i]:9.0f} {cnt_after[i]:8.0f} "
+          f"{cnt_after[i]-cnt_before[i]:+6.0f}")
+
+fig, ax = plt.subplots(1, 2, figsize=(15, 5.6))
+a = ax[0]
+a.plot(bc, rho_p, "o-", color="#1f4e8c", label=r"$\rho=1+8\,\hat{|\nabla T|}$ (metric)")
+a.set_xlabel("radius"); a.set_ylabel(r"$\rho$", color="#1f4e8c")
+a.tick_params(axis="y", colors="#1f4e8c")
+a.axvline(r_inner, color="#c0392b", ls="--", lw=1.2,
+          label="inner wall (Lower, no-slip)")
+a.axvline(r_o, color="#e07b00", ls="--", lw=1.2,
+          label="outer wall (Upper, free-slip)")
+a2 = a.twinx()
+a2.plot(bc, grho_p, "s--", color="#c0392b",
+        label=r"$|\nabla\rho|$ — what the mover refines on")
+a2.set_ylabel(r"$|\nabla\rho|$", color="#c0392b")
+a2.tick_params(axis="y", colors="#c0392b")
+a.set_title("ρ peaks AT the inner wall, but |∇ρ| (the real\n"
+            "driver) is ≈0 there — the blob-CORE de-refinement")
+h1, l1 = a.get_legend_handles_labels()
+h2, l2 = a2.get_legend_handles_labels()
+a.legend(h1 + h2, l1 + l2, fontsize=8, loc="upper center")
+a.grid(alpha=0.3)
+
+a = ax[1]
+w = (bins[1] - bins[0]) * 0.4
+a.bar(bc - w / 2, cnt_before, w, color="0.6", label="before")
+a.bar(bc + w / 2, cnt_after, w, color="#1f4e8c", label="after")
+a.axvline(r_inner, color="#c0392b", ls="--", lw=1.2)
+a.axvline(r_o, color="#e07b00", ls="--", lw=1.2)
+a.set_xlabel("radius"); a.set_ylabel("node count in radial bin")
+a.set_title("nodes gather toward the OUTER half / plume region,\n"
+            "NOT the inner BL (its ρ-peak is pinned + ∇ρ≈0)")
+a.legend(fontsize=9)
+a.grid(alpha=0.3)
+fig.suptitle("Inner vs outer BL: why the gradient metric + pinned "
+             "wall refines one and not the other", fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/aniso_bl_asymmetry.png", dpi=130)
+print("saved /tmp/metric_mesh/aniso_bl_asymmetry.png")
diff --git a/scripts/aniso_blob_metric.py b/scripts/aniso_blob_metric.py
new file mode 100644
index 00000000..d44be4ab
--- /dev/null
+++ b/scripts/aniso_blob_metric.py
@@ -0,0 +1,233 @@
+"""What does the (3) TARGET metric look like for the non-separable
+blob, and does the realised mesh match it?
+
+The metric is GRADIENT-based: M = (1/h0²)[I + β ĝĝᵀ(|∇ρ|/gref)²],
+eigen-clamped (shipped default aniso_cap=2, β=200). For a Gaussian
+blob ρ=1+AMP·exp(-|X-P|²/W²):
+  * centre  (∇ρ=0)  → isotropic, coarsest (clamp floor)
+  * flank   (|∇ρ| max @ |X-P|≈W/√2) → finest, anisotropic, short
+                                       axis pointing at P
+  * far     (∇ρ→0)  → isotropic, coarsest
+So it resolves the blob EDGE, not the CORE.
+
+Panels:
+  A  ρ contours + the desired-cell ellipses (the TARGET metric,
+     EXACTLY the construction the shipped mover uses).
+  B  the realised method="anisotropic" mesh (zoom), with the
+     |∇ρ|-max ring drawn — do the small/flat cells sit on it?
+  C  mean edge length & cell-aspect vs distance d=|X-P|: expect a
+     DIP at d≈W/√2 (flank) with the centre (d→0) and far field NOT
+     refined — the quantitative form of the question.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+from matplotlib.patches import Ellipse
+from matplotlib.collections import PatchCollection
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_anisotropic, _auto_pinned_labels, _tri_cells,
+    _edge_pairs, _signed_areas)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+PX, PY, W = 0.78, 0.0, 0.10
+BETA, ACAP = 200.0, 2.0          # shipped defaults
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]
+    Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    f = 1.0 + AMP * sympy.exp(
+        -(((x - PX) ** 2 + (y - PY) ** 2) / W ** 2))
+    return m, f, X0.copy()
+
+
+def grad_blob(x, y):
+    d2 = (x - PX) ** 2 + (y - PY) ** 2
+    e = np.exp(-d2 / W ** 2)
+    rho = 1.0 + AMP * e
+    gx = AMP * e * (-2.0 * (x - PX) / W ** 2)
+    gy = AMP * e * (-2.0 * (y - PY) / W ** 2)
+    return rho, np.stack([gx, gy], axis=1)
+
+
+m0, _, X0u = mk("u")
+tris = _tri_cells(m0.dm)
+edges = _edge_pairs(m0.dm)
+ep = edges                       # same edge list, not recomputed
+h0 = float(np.linalg.norm(
+    X0u[ep[:, 1]] - X0u[ep[:, 0]], axis=1).mean())   # mover's h0
+
+
+def build_metric(grad):
+    """EXACTLY the shipped mover's D construction (β, aniso_cap,
+    h0, g_eps) — returns desired-cell semi-axes + eigenvectors."""
+    n = grad.shape[0]
+    gn = np.linalg.norm(grad, axis=1)
+    g_eps = 1.0e-9
+    gmax = gn.max()
+    gref = gmax if gmax > g_eps else 1.0
+    base = 1.0 / h0 ** 2
+    lam_lo = 1.0 / h0 ** 2
+    lam_hi = 1.0 / (h0 / np.sqrt(ACAP)) ** 2
+    H = np.empty((n, 2))
+    V = np.zeros((n, 2, 2))
+    for i in range(n):
+        g, gni = grad[i], gn[i]
+        if gni > g_eps and gmax > g_eps:
+            gh = g / gni
+            M = base * (np.eye(2) + BETA * (gni / gref) ** 2
+                        * np.outer(gh, gh))
+        else:
+            M = base * np.eye(2)
+        w, Vec = np.linalg.eigh(M)
+        w = np.clip(w, lam_lo, lam_hi)
+        H[i] = 1.0 / np.sqrt(w)        # desired spacing per axis
+        V[i] = Vec
+    return H, V, gn
+
+
+# --- A: target metric ellipse field on a clean sample grid -------
+gx = np.linspace(PX - 0.34, PX + 0.34, 26)
+gy = np.linspace(PY - 0.34, PY + 0.34, 26)
+GX, GY = np.meshgrid(gx, gy)
+inside = (np.hypot(GX, GY) > R_I + 0.02) & (np.hypot(GX, GY)
+                                            < R_O - 0.02)
+Xs = np.stack([GX[inside], GY[inside]], axis=1)
+rho_s, grad_s = grad_blob(Xs[:, 0], Xs[:, 1])
+H, V, gn = build_metric(grad_s)
+
+# --- run the shipped mover ---------------------------------------
+m, f, X0 = mk("an")
+_winslow_anisotropic(m, f, _auto_pinned_labels(m), True)
+Xan = np.asarray(m.X.coords).copy()
+
+# --- C: profiles vs d = |X-P| ------------------------------------
+def edge_profile(X):
+    p0, p1 = X[edges[:, 0]], X[edges[:, 1]]
+    mid = 0.5 * (p0 + p1)
+    d = np.hypot(mid[:, 0] - PX, mid[:, 1] - PY)
+    L = np.linalg.norm(p1 - p0, axis=1)
+    bins = np.linspace(0.0, 6 * W, 16)
+    bc = 0.5 * (bins[1:] + bins[:-1])
+    out = [L[(d >= bins[i]) & (d < bins[i + 1])].mean()
+           if ((d >= bins[i]) & (d < bins[i + 1])).any() else np.nan
+           for i in range(len(bins) - 1)]
+    return bc, np.array(out)
+
+bc, La = edge_profile(Xan)
+_, Lu = edge_profile(X0u)
+
+# per-cell aspect ratio (longest/shortest edge) vs d, realised
+ca = Xan[tris[:, 0]]
+cb = Xan[tris[:, 1]]
+cc = Xan[tris[:, 2]]
+cen = (ca + cb + cc) / 3.0
+dc = np.hypot(cen[:, 0] - PX, cen[:, 1] - PY)
+e01 = np.linalg.norm(cb - ca, axis=1)
+e12 = np.linalg.norm(cc - cb, axis=1)
+e20 = np.linalg.norm(ca - cc, axis=1)
+asp = np.maximum.reduce([e01, e12, e20]) / np.maximum(
+    np.minimum.reduce([e01, e12, e20]), 1e-30)
+abins = np.linspace(0.0, 6 * W, 16)
+abc = 0.5 * (abins[1:] + abins[:-1])
+aspm = np.array([asp[(dc >= abins[i]) & (dc < abins[i + 1])].mean()
+                 if ((dc >= abins[i]) & (dc < abins[i + 1])).any()
+                 else np.nan for i in range(len(abins) - 1)])
+
+r_gmax = W / np.sqrt(2.0)          # radius of max |∇ρ|
+
+fig = plt.figure(figsize=(20, 6.2))
+gs = fig.add_gridspec(1, 3, width_ratios=[1, 1, 1.05])
+
+# Panel A — target metric
+aA = fig.add_subplot(gs[0, 0])
+rr = np.hypot(X0u[:, 0], X0u[:, 1])
+aA.tricontourf(mtri.Triangulation(X0u[:, 0], X0u[:, 1], tris),
+               1.0 + AMP * np.exp(
+                   -(((X0u[:, 0] - PX) ** 2
+                      + (X0u[:, 1] - PY) ** 2) / W ** 2)),
+               levels=18, cmap="Blues", alpha=0.45)
+pats, cv = [], []
+disp_sc = 0.9
+for i in range(Xs.shape[0]):
+    ang = np.degrees(np.arctan2(V[i, 1, 1], V[i, 0, 1]))
+    pats.append(Ellipse((Xs[i, 0], Xs[i, 1]),
+                         width=2 * disp_sc * H[i, 1],
+                         height=2 * disp_sc * H[i, 0], angle=ang))
+    cv.append(H[i].max() / max(H[i].min(), 1e-30))
+pc = PatchCollection(pats, facecolor="none", lw=0.8,
+                     edgecolor="#c0392b")
+aA.add_collection(pc)
+th = np.linspace(0, 2 * np.pi, 200)
+aA.plot(PX + r_gmax * np.cos(th), PY + r_gmax * np.sin(th),
+        "k--", lw=1.2, label="|∇ρ| max ring")
+aA.plot(PX, PY, "k+", ms=12, mew=2, label="blob centre (∇ρ=0)")
+aA.set_xlim(PX - 0.34, PX + 0.34)
+aA.set_ylim(PY - 0.34, PY + 0.34)
+aA.set_aspect("equal")
+aA.set_xticks([])
+aA.set_yticks([])
+aA.legend(fontsize=8, loc="upper right")
+aA.set_title("A  TARGET metric — desired cells\n(small/flat on the "
+             "edge ring; circular+coarse at centre & far)",
+             fontsize=10)
+
+# Panel B — realised mesh (zoom)
+aB = fig.add_subplot(gs[0, 1])
+aB.triplot(mtri.Triangulation(Xan[:, 0], Xan[:, 1], tris),
+           lw=0.5, color="#1f4e8c")
+aB.plot(PX + r_gmax * np.cos(th), PY + r_gmax * np.sin(th),
+        "k--", lw=1.2)
+aB.plot(PX, PY, "k+", ms=12, mew=2)
+aB.set_xlim(PX - 0.34, PX + 0.34)
+aB.set_ylim(PY - 0.34, PY + 0.34)
+aB.set_aspect("equal")
+aB.set_xticks([])
+aB.set_yticks([])
+aB.set_title("B  realised method=\"anisotropic\" mesh\n(do the "
+             "small/aligned cells sit on the ring?)", fontsize=10)
+
+# Panel C — profiles vs d
+aC = fig.add_subplot(gs[0, 2])
+aC.plot(bc, Lu, "o-", color="0.6", lw=1.6, ms=4,
+        label="undeformed edge len")
+aC.plot(bc, La, "o-", color="#1f4e8c", lw=1.8, ms=4,
+        label="(3) edge len")
+aC.axvline(r_gmax, color="k", ls="--", lw=1.1,
+           label="|∇ρ| max (d=W/√2)")
+aC.axvline(0.0, color="#c0392b", ls=":", lw=1.0)
+aC.set_xlabel("distance from blob centre  d = |X-P|")
+aC.set_ylabel("mean edge length")
+aC.legend(fontsize=8, loc="lower right")
+aC.grid(alpha=0.3)
+aC2 = aC.twinx()
+aC2.plot(abc, aspm, "s--", color="#e07b00", lw=1.4, ms=3,
+         label="(3) cell aspect")
+aC2.set_ylabel("cell aspect ratio", color="#e07b00")
+aC2.tick_params(axis="y", colors="#e07b00")
+aC.set_title("C  edge length & cell aspect vs d\n(min at the edge "
+             "ring; centre & far NOT refined)", fontsize=10)
+
+fig.suptitle("(3) is a GRADIENT metric: it resolves the blob EDGE "
+             "(|∇ρ| ring), not the CORE — by design", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/aniso_blob_metric.png", dpi=130)
+print("saved /tmp/metric_mesh/aniso_blob_metric.png")
+print(f"h0={h0:.4f}  |∇ρ|-max ring radius d=W/√2={r_gmax:.4f}")
+emin = np.nanargmin(La)
+print(f"realised (3) min mean-edge at d≈{bc[emin]:.3f} "
+      f"(ring is d≈{r_gmax:.3f}); centre d≈0 edge="
+      f"{La[0]:.4f} vs undef {Lu[0]:.4f}; "
+      f"far edge={La[-1]:.4f} vs undef {Lu[-1]:.4f}")
diff --git a/scripts/aniso_convection_demo.py b/scripts/aniso_convection_demo.py
new file mode 100644
index 00000000..474b0768
--- /dev/null
+++ b/scripts/aniso_convection_demo.py
@@ -0,0 +1,208 @@
+"""Ra=1e5 annulus convection (20 steps, FIXED mesh) → then refine
+the mesh on the temperature gradient with the (3) anisotropic
+mover. This first demo is not dynamically adaptive: it runs 20
+steps on the fixed mesh, then refines once.
+
+The metric is the mover-side analogue of UW3's adaptation metric
+`adaptivity.metric_from_gradient`: that maps a *normalised*
+gradient magnitude |∇T| (clipped between a low/high window) to a
+target edge length h ∈ [h_min, h_max] (isotropic M = h⁻²I, because
+MMG can change the node count). Our mover has a FIXED node budget,
+so it does the *relative* analogue: a target *density*
+ρ = 1 + amp · t,  t = clip((|∇T|-g_lo)/(g_hi-g_lo), 0, 1),
+fed as `metric=` (larger ρ ⇒ finer cells). `amp` is the
+"bunching intensity"; the [g_lo,g_hi] window is the same
+percentile idea as metric_from_gradient.
+
+|∇T| is pre-projected to a *scalar* Lagrangian field (rho0) so the
+mover's internal `metric.diff(X)` is a FIRST derivative of a field
+(UW3-legal) — and the metric rides material points (the mover
+requires a Lagrangian metric).
+
+The 20-step Ra=1e5 state is cached to npz so the solve is not
+repeated on re-runs; delete the cache file to force a fresh solve.
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing import smooth_mesh_interior
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+
+RES, N_STEPS, RA = 16, 20, 1.0e5
+AMP = 8.0                       # bunching intensity (ρ = 1+AMP·t)
+G_LO_PCT, G_HI_PCT = 50.0, 97.0   # |∇T| normalisation window
+CACHE = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES}_n{N_STEPS}.npz"
+r_inner, r_o = 0.5, 1.0
+
+
+def build():
+    cellsize = 1.0 / RES
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=cellsize, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        "V", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True, varsymbol=r"\mathbf{v}")
+    P = uw.discretisation.MeshVariable(
+        "P", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True, varsymbol="p")
+    t_soln = uw.discretisation.MeshVariable(
+        "T", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True, varsymbol="T")
+    return mesh, r, th, v, P, t_soln, cellsize
+
+
+def run_convection(mesh, r, th, v, P, t_soln, cellsize):
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = RA * (t_soln.sym[0] - T_cond) * unit_r
+
+    adv_diff = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=t_soln, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv_diff.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv_diff.constitutive_model.Parameters.diffusivity = 1.0
+    adv_diff.tolerance = 1.0e-4
+    adv_diff.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv_diff.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+
+    init_t = (0.01 * sympy.sin(5.0 * th)
+              * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+              + (r_o - r) / (r_o - r_inner))
+    t_soln.data[...] = np.asarray(uw.function.evaluate(
+        init_t, t_soln.coords)).reshape(-1, 1)
+    stokes.solve(zero_init_guess=True)
+
+    t_sim = 0.0
+    for s in range(N_STEPS):
+        dt = adv_diff.estimate_dt()
+        adv_diff.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        tt = t_soln.data[:, 0]
+        print(f"  step {s+1:2d}: t={t_sim:.4f} Δt={dt:.2e} "
+              f"T=[{tt.min():+.3f},{tt.max():+.3f}]", flush=True)
+    return t_sim
+
+
+# --- 1. developed convection (cached) ------------------------------
+mesh, r, th, v, P, t_soln, cellsize = build()
+if os.path.exists(CACHE):
+    print(f"loading cached {N_STEPS}-step state {CACHE}")
+    z = np.load(CACHE)
+    t_soln.data[...] = z["T"].reshape(t_soln.data.shape)
+    v.data[...] = z["V"].reshape(v.data.shape)
+else:
+    print(f"=== Ra={RA:.0e} annulus convection, {N_STEPS} steps, "
+          f"res-{RES} (FIXED mesh) ===")
+    run_convection(mesh, r, th, v, P, t_soln, cellsize)
+    np.savez(CACHE, T=np.asarray(t_soln.data),
+             V=np.asarray(v.data),
+             Xc=np.asarray(mesh.X.coords))
+    print(f"cached → {CACHE}")
+
+X_orig = np.asarray(mesh.X.coords).copy()
+tris = _tri_cells(mesh.dm)
+
+# --- 2. metric ρ ∝ normalised |∇T|  (Lagrangian scalar field) ------
+Xs = mesh.CoordinateSystem.X
+gradT = uw.discretisation.MeshVariable(
+    "gradT", mesh, vtype=uw.VarType.VECTOR, degree=1,
+    continuous=True)
+gp = uw.systems.Vector_Projection(mesh, gradT)
+gp.smoothing = 0.0
+gp.uw_function = sympy.Matrix(
+    [t_soln.sym[0].diff(Xs[i]) for i in range(2)]).T
+gp.solve()
+
+rho0 = uw.discretisation.MeshVariable(   # frozen Lagrangian density
+    "rho0", mesh, vtype=uw.VarType.SCALAR, degree=1,
+    continuous=True)
+gmag = np.linalg.norm(
+    np.asarray(uw.function.evaluate(gradT.sym, rho0.coords)
+               ).reshape(-1, 2), axis=1)
+g_lo = np.percentile(gmag, G_LO_PCT)
+g_hi = np.percentile(gmag, G_HI_PCT)
+t_norm = np.clip((gmag - g_lo) / max(g_hi - g_lo, 1e-30), 0.0, 1.0)
+rho0.data[:, 0] = t_norm                       # store normalised t
+metric = 1.0 + AMP * rho0.sym[0]               # ρ = 1 + AMP·t
+rho_field = np.asarray(uw.function.evaluate(
+    metric, X_orig)).reshape(-1)
+print(f"|∇T|: g_lo(p{G_LO_PCT:.0f})={g_lo:.3f} "
+      f"g_hi(p{G_HI_PCT:.0f})={g_hi:.3f}  ρ∈"
+      f"[{rho_field.min():.2f},{rho_field.max():.2f}]")
+
+# --- 3. refine the mesh on the T-gradient metric -------------------
+A0 = np.abs(_signed_areas(X_orig, tris))
+print("=== refine: method='anisotropic' on ρ∝|∇T| ===")
+smooth_mesh_interior(mesh, metric=metric, method="anisotropic",
+                     verbose=True)
+X_ref = np.asarray(mesh.X.coords).copy()
+A1 = np.abs(_signed_areas(X_ref, tris))
+print(f"minA/meanA  before={A0.min()/A0.mean():.4f}  "
+      f"after={A1.min()/A1.mean():.4f}  "
+      f"max|Δx|={np.linalg.norm(X_ref-X_orig,axis=1).max():.3e}")
+
+# --- 4. figure -----------------------------------------------------
+# T at the *vertex* coords (t_soln is P3 → its .data has far more
+# DOFs than vertices; the triangulation is on vertices). T rides
+# DOFs Lagrangianly through the move, so the per-vertex value is
+# the same on the original and refined meshes — only the vertex
+# *positions* differ.
+Tn = np.asarray(uw.function.evaluate(
+    t_soln.sym[0], X_ref)).reshape(-1)   # current vertex positions
+tro = mtri.Triangulation(X_orig[:, 0], X_orig[:, 1], tris)
+trr = mtri.Triangulation(X_ref[:, 0], X_ref[:, 1], tris)
+fig, ax = plt.subplots(2, 2, figsize=(13.5, 13))
+
+a = ax[0, 0]
+a.tricontourf(tro, Tn, levels=24, cmap="inferno")
+a.triplot(tro, lw=0.18, color="white", alpha=0.35)
+a.set_title(f"developed T  (Ra={RA:.0e}, {N_STEPS} steps, "
+            f"FIXED mesh res-{RES})", fontsize=11)
+
+a = ax[0, 1]
+cf = a.tricontourf(tro, rho_field, levels=24, cmap="viridis")
+a.set_title(r"target metric  $\rho = 1 + %g\,\hat{|\nabla T|}$  "
+            "(what we refine on)" % AMP, fontsize=11)
+fig.colorbar(cf, ax=a, fraction=0.046, pad=0.02)
+
+a = ax[1, 0]
+a.triplot(trr, lw=0.4, color="#1f4e8c")
+a.set_title(f"refined mesh  (anisotropic mover; minA/meanA "
+            f"{A0.min()/A0.mean():.2f}→{A1.min()/A1.mean():.2f})",
+            fontsize=11)
+
+a = ax[1, 1]
+a.tricontourf(trr, Tn, levels=24, cmap="inferno")
+a.triplot(trr, lw=0.3, color="white", alpha=0.4)
+a.set_title("refined mesh + T  (cells gather along the thermal "
+            "BLs / plume edges)", fontsize=11)
+
+for a in ax.ravel():
+    a.set_aspect("equal")
+    a.set_xticks([])
+    a.set_yticks([])
+fig.suptitle("Ra=1e5 annulus convection → refine on ∇T with the "
+             "(3) anisotropic mover (fixed node budget)",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+out = "/tmp/metric_mesh/aniso_convection.png"
+fig.savefig(out, dpi=130)
+print(f"saved {out}")
diff --git a/scripts/aniso_convection_freeslip.py b/scripts/aniso_convection_freeslip.py
new file mode 100644
index 00000000..38b718c2
--- /dev/null
+++ b/scripts/aniso_convection_freeslip.py
@@ -0,0 +1,209 @@
+"""Controlled test: SYMMETRIC velocity BC — free-slip on BOTH
+boundaries (release the inner no-slip), nullspaces lit up
+(constant pressure + the annulus rigid-rotation mode (-y,x)).
+Warm-start T from the res-16 run, let convection develop for a
+while, then refine on ∇T with the (3) anisotropic mover.
+
+If the inner/outer BL gathering becomes symmetric (cf.
+aniso_bl_asymmetry.png with no-slip inner) → the velocity BC was
+the dominant cause; any residual asymmetry is the annulus
+geometry (inner circumference π vs outer 2π) + the gradient-metric
+/ pinned-wall effect.
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing import smooth_mesh_interior
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+
+RA, AMP, RES16, RES = 1.0e5, 8.0, 16, 32
+N_RUN = 25
+G_LO_PCT, G_HI_PCT = 50.0, 97.0
+r_inner, r_o = 0.5, 1.0
+C16 = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES16}_n20.npz"
+CFS = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES}_freeslip{N_RUN}.npz"
+
+
+def build(res, tag):
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / res, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        f"V{tag}", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        f"P{tag}", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T = uw.discretisation.MeshVariable(
+        f"T{tag}", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    return mesh, r, th, v, P, T
+
+
+def make_solvers(mesh, r, v, P, T):
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    # FREE-SLIP on BOTH boundaries: no-penetration penalty on the
+    # NORMAL component only (tangential / rotation free) — releases
+    # the inner no-slip.
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    # Nullspaces lit up EXPLICITLY: constant pressure + the annulus
+    # rigid-rotation mode (-y,x) (= r·e_θ; an exact null mode since
+    # the BC penalises only the normal velocity and the radial
+    # buoyancy has zero torque).
+    x, y = mesh.CoordinateSystem.X
+    stokes.petsc_use_pressure_nullspace = True
+    stokes.petsc_velocity_nullspace_basis = [sympy.Matrix([-y, x])]
+    # Linear Stokes ⇒ the Newton line search is unnecessary and
+    # spuriously fails (DIVERGED_LINE_SEARCH @0 iters) on the
+    # correctly-singular nullspace-bearing operator. Take the full
+    # step.
+    stokes.petsc_options["snes_linesearch_type"] = "basic"
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = RA * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    return stokes, adv
+
+
+mesh, r, th, v, P, T = build(RES, "fs")
+if os.path.exists(CFS):
+    print(f"loading cached free-slip state {CFS}")
+    z = np.load(CFS)
+    T.data[...] = z["T"].reshape(T.data.shape)
+    v.data[...] = z["V"].reshape(v.data.shape)
+    stokes, adv = make_solvers(mesh, r, v, P, T)
+else:
+    if not os.path.exists(C16):
+        raise SystemExit(f"missing res-16 cache {C16}")
+    print(f"warm-start res-{RES} from res-{RES16}; free-slip BOTH "
+          f"+ nullspaces; {N_RUN} steps")
+    m16, r16, t16, v16, P16, T16 = build(RES16, "16")
+    z16 = np.load(C16)
+    T16.data[...] = z16["T"].reshape(T16.data.shape)
+    T.data[:, 0] = np.asarray(uw.function.evaluate(
+        T16.sym[0], T.coords)).reshape(-1)
+    stokes, adv = make_solvers(mesh, r, v, P, T)
+    stokes.solve(zero_init_guess=True)
+    t_sim = 0.0
+    for s in range(N_RUN):
+        dt = adv.estimate_dt()
+        adv.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        tt = T.data[:, 0]
+        if (s + 1) % 5 == 0 or s == 0:
+            print(f"  step {s+1:2d}: Δt={dt:.2e} "
+                  f"T=[{tt.min():+.3f},{tt.max():+.3f}]", flush=True)
+    np.savez(CFS, T=np.asarray(T.data), V=np.asarray(v.data))
+    print(f"cached → {CFS}")
+
+X0 = np.asarray(mesh.X.coords).copy()
+r0 = np.hypot(X0[:, 0], X0[:, 1])
+tris = _tri_cells(mesh.dm)
+
+# metric ρ ∝ normalised |∇T|
+Xs = mesh.CoordinateSystem.X
+gradT = uw.discretisation.MeshVariable(
+    "gTfs", mesh, vtype=uw.VarType.VECTOR, degree=1,
+    continuous=True)
+gp = uw.systems.Vector_Projection(mesh, gradT)
+gp.smoothing = 0.0
+gp.uw_function = sympy.Matrix(
+    [T.sym[0].diff(Xs[i]) for i in range(2)]).T
+gp.solve()
+rho0 = uw.discretisation.MeshVariable(
+    "r0fs", mesh, vtype=uw.VarType.SCALAR, degree=1,
+    continuous=True)
+gmag = np.linalg.norm(np.asarray(uw.function.evaluate(
+    gradT.sym, rho0.coords)).reshape(-1, 2), axis=1)
+g_lo, g_hi = np.percentile(gmag, G_LO_PCT), np.percentile(
+    gmag, G_HI_PCT)
+rho0.data[:, 0] = np.clip(
+    (gmag - g_lo) / max(g_hi - g_lo, 1e-30), 0.0, 1.0)
+metric = 1.0 + AMP * rho0.sym[0]
+rho_field = np.asarray(uw.function.evaluate(
+    metric, X0)).reshape(-1)
+
+A0 = np.abs(_signed_areas(X0, tris))
+bins = np.linspace(r_inner, r_o, 26)
+cnt_before = np.array([((r0 >= bins[i]) & (r0 < bins[i + 1])).sum()
+                       for i in range(len(bins) - 1)], dtype=float)
+print("=== refine: free-slip-both, method='anisotropic' ===")
+smooth_mesh_interior(mesh, metric=metric, method="anisotropic",
+                     verbose=False)
+Xr = np.asarray(mesh.X.coords).copy()
+rr = np.hypot(Xr[:, 0], Xr[:, 1])
+A1 = np.abs(_signed_areas(Xr, tris))
+cnt_after = np.array([((rr >= bins[i]) & (rr < bins[i + 1])).sum()
+                      for i in range(len(bins) - 1)], dtype=float)
+bc = 0.5 * (bins[1:] + bins[:-1])
+print(f"minA/meanA {A0.min()/A0.mean():.3f}→{A1.min()/A1.mean():.3f}"
+      f"  inner-BL Δn(r<0.6)={cnt_after[bc<0.6].sum()-cnt_before[bc<0.6].sum():+.0f}"
+      f"  outer-BL Δn(r>0.9)={cnt_after[bc>0.9].sum()-cnt_before[bc>0.9].sum():+.0f}")
+
+# figure
+Tn = np.asarray(uw.function.evaluate(T.sym[0], X0)).reshape(-1)
+tro = mtri.Triangulation(X0[:, 0], X0[:, 1], tris)
+trr = mtri.Triangulation(Xr[:, 0], Xr[:, 1], tris)
+fig, ax = plt.subplots(2, 2, figsize=(13.5, 13))
+a = ax[0, 0]
+a.tricontourf(tro, Tn, levels=24, cmap="inferno")
+a.triplot(tro, lw=0.15, color="white", alpha=0.3)
+a.set_title(f"T  (free-slip BOTH, nullspaces lit; res-{RES}, "
+            f"{N_RUN} steps from res-{RES16})", fontsize=11)
+a = ax[0, 1]
+cf = a.tricontourf(tro, rho_field, levels=24, cmap="viridis")
+a.set_title(r"metric  $\rho=1+%g\,\hat{|\nabla T|}$" % AMP,
+            fontsize=11)
+fig.colorbar(cf, ax=a, fraction=0.046, pad=0.02)
+a = ax[1, 0]
+a.triplot(trr, lw=0.35, color="#1f4e8c")
+a.set_title(f"refined mesh  (minA/meanA "
+            f"{A0.min()/A0.mean():.2f}→{A1.min()/A1.mean():.2f})",
+            fontsize=11)
+a = ax[1, 1]
+w = (bins[1] - bins[0]) * 0.4
+a.bar(bc - w / 2, cnt_before, w, color="0.6", label="before")
+a.bar(bc + w / 2, cnt_after, w, color="#1f4e8c", label="after")
+a.axvline(r_inner, color="#c0392b", ls="--", lw=1.2,
+          label="inner (free-slip now)")
+a.axvline(r_o, color="#e07b00", ls="--", lw=1.2,
+          label="outer (free-slip)")
+a.set_xlabel("radius")
+a.set_ylabel("node count")
+a.set_title("radial node gather — symmetric now?")
+a.legend(fontsize=8)
+a.grid(alpha=0.3)
+for a in (ax[0, 0], ax[0, 1], ax[1, 0]):
+    a.set_aspect("equal")
+    a.set_xticks([])
+    a.set_yticks([])
+fig.suptitle("Free-slip BOTH boundaries (inner no-slip released, "
+             "nullspaces lit) → convection → refine on ∇T",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+out = "/tmp/metric_mesh/aniso_convection_freeslip.png"
+fig.savefig(out, dpi=130)
+print(f"saved {out}")
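
Note on the velocity nullspace used in the free-slip script above. This is a minimal NumPy sketch, independent of underworld3, of the two properties the script's comment relies on: the field (-y, x) has no normal component on circles about the origin, and its strain rate is zero. It is therefore unaffected by both the normal-velocity penalty and the viscous operator. The helper name `rigid_rotation` and the sample radii are illustrative and not part of the PR.

```python
import numpy as np


def rigid_rotation(xy):
    """Rigid rotation about the origin: v = (-y, x)."""
    return np.stack([-xy[:, 1], xy[:, 0]], axis=1)


# Sample both annulus boundaries (radii taken from the scripts: 0.5 and 1.0).
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
max_normal = 0.0
for radius in (0.5, 1.0):
    xy = radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    unit_r = xy / radius
    v = rigid_rotation(xy)
    # Normal (radial) component v . r_hat, which the penalty BC acts on.
    max_normal = max(max_normal, np.abs((v * unit_r).sum(axis=1)).max())

# The velocity gradient of (-y, x) is constant and antisymmetric,
# so the symmetric part (the strain rate) vanishes identically.
grad_v = np.array([[0.0, -1.0], [1.0, 0.0]])
strain_rate = 0.5 * (grad_v + grad_v.T)

print(f"max |v.n| on boundaries = {max_normal:.1e}")
print(f"max |strain rate|       = {np.abs(strain_rate).max():.1e}")
```

The same reasoning is why the mode has to be handed to the solver explicitly: nothing in the discrete operator constrains it.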
diff --git a/scripts/aniso_convection_res32.py b/scripts/aniso_convection_res32.py
new file mode 100644
index 00000000..0d942044
--- /dev/null
+++ b/scripts/aniso_convection_res32.py
@@ -0,0 +1,181 @@
+"""res-32 annulus convection, WARM-STARTED from the cached res-16
+20-step state, run a few steps, then refine on ∇T with the (3)
+anisotropic mover ("point snuggling").
+
+Warm start: rebuild the res-16 P3 T field from the cached npz,
+interpolate it onto the res-32 T nodes (uw.function.evaluate
+across meshes — same annulus geometry), re-solve Stokes for a
+consistent velocity, then run N_WARM adv-diff + Stokes steps so
+the field settles on the finer mesh. res-32 post-warm state is
+itself cached (save results, never re-run).
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing import smooth_mesh_interior
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+
+RA, AMP = 1.0e5, 8.0
+RES16, RES32 = 16, 32
+N_WARM = 5
+G_LO_PCT, G_HI_PCT = 50.0, 97.0
+r_inner, r_o = 0.5, 1.0
+C16 = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES16}_n20.npz"
+C32 = f"/tmp/metric_mesh/conv_ra{RA:.0e}_res{RES32}_warm{N_WARM}.npz"
+
+
+def build(res, tag):
+    mesh = uw.meshing.Annulus(
+        radiusOuter=r_o, radiusInner=r_inner,
+        cellSize=1.0 / res, qdegree=3)
+    r, th = mesh.CoordinateSystem.R
+    v = uw.discretisation.MeshVariable(
+        f"V{tag}", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True, varsymbol=r"\mathbf{v}")
+    P = uw.discretisation.MeshVariable(
+        f"P{tag}", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True, varsymbol="p")
+    T = uw.discretisation.MeshVariable(
+        f"T{tag}", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True, varsymbol="T")
+    return mesh, r, th, v, P, T
+
+
+def make_solvers(mesh, r, v, P, T):
+    stokes = uw.systems.Stokes(mesh, velocityField=v,
+                               pressureField=P)
+    stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    stokes.constitutive_model.Parameters.shear_viscosity_0 = 1.0
+    stokes.tolerance = 1.0e-5
+    stokes.penalty = 0.0
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    stokes.add_natural_bc(1.0e6 * v.sym.dot(unit_r) * unit_r,
+                          mesh.boundaries.Upper.name)
+    T_cond = (r_o - r) / (r_o - r_inner)
+    stokes.bodyforce = RA * (T.sym[0] - T_cond) * unit_r
+    adv = uw.systems.AdvDiffusionSLCN(
+        mesh, u_Field=T, V_fn=v.sym, verbose=False,
+        theta=0.5, monotone_mode="clamp")
+    adv.constitutive_model = uw.constitutive_models.DiffusionModel
+    adv.constitutive_model.Parameters.diffusivity = 1.0
+    adv.tolerance = 1.0e-4
+    adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+    adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+    return stokes, adv
+
+
+# --- res-32 warm-started state (cached) ----------------------------
+mesh, r, th, v, P, T = build(RES32, "32")
+if os.path.exists(C32):
+    print(f"loading cached res-32 warm state {C32}")
+    z = np.load(C32)
+    T.data[...] = z["T"].reshape(T.data.shape)
+    v.data[...] = z["V"].reshape(v.data.shape)
+    stokes, adv = make_solvers(mesh, r, v, P, T)
+else:
+    if not os.path.exists(C16):
+        raise SystemExit(f"missing res-16 cache {C16} — run "
+                         f"aniso_convection_demo.py first")
+    print(f"warm-starting res-{RES32} from res-{RES16} ({C16})")
+    m16, r16, th16, v16, P16, T16 = build(RES16, "16")
+    z16 = np.load(C16)
+    T16.data[...] = z16["T"].reshape(T16.data.shape)
+    # interpolate the res-16 T onto the res-32 T nodes
+    T.data[:, 0] = np.asarray(uw.function.evaluate(
+        T16.sym[0], T.coords)).reshape(-1)
+    stokes, adv = make_solvers(mesh, r, v, P, T)
+    stokes.solve(zero_init_guess=True)         # consistent V
+    t_sim = 0.0
+    print(f"=== res-{RES32} warm run, {N_WARM} steps ===")
+    for s in range(N_WARM):
+        dt = adv.estimate_dt()
+        adv.solve(timestep=dt, zero_init_guess=False)
+        stokes.solve(zero_init_guess=False)
+        t_sim += dt
+        tt = T.data[:, 0]
+        print(f"  step {s+1}: Δt={dt:.2e} "
+              f"T=[{tt.min():+.3f},{tt.max():+.3f}]", flush=True)
+    np.savez(C32, T=np.asarray(T.data), V=np.asarray(v.data))
+    print(f"cached → {C32}")
+
+X_orig = np.asarray(mesh.X.coords).copy()
+tris = _tri_cells(mesh.dm)
+
+# --- metric ρ ∝ normalised |∇T| (Lagrangian scalar field) ----------
+Xs = mesh.CoordinateSystem.X
+gradT = uw.discretisation.MeshVariable(
+    "gradT32", mesh, vtype=uw.VarType.VECTOR, degree=1,
+    continuous=True)
+gp = uw.systems.Vector_Projection(mesh, gradT)
+gp.smoothing = 0.0
+gp.uw_function = sympy.Matrix(
+    [T.sym[0].diff(Xs[i]) for i in range(2)]).T
+gp.solve()
+rho0 = uw.discretisation.MeshVariable(
+    "rho032", mesh, vtype=uw.VarType.SCALAR, degree=1,
+    continuous=True)
+gmag = np.linalg.norm(np.asarray(uw.function.evaluate(
+    gradT.sym, rho0.coords)).reshape(-1, 2), axis=1)
+g_lo = np.percentile(gmag, G_LO_PCT)
+g_hi = np.percentile(gmag, G_HI_PCT)
+rho0.data[:, 0] = np.clip(
+    (gmag - g_lo) / max(g_hi - g_lo, 1e-30), 0.0, 1.0)
+metric = 1.0 + AMP * rho0.sym[0]
+rho_field = np.asarray(uw.function.evaluate(
+    metric, X_orig)).reshape(-1)
+
+# --- refine ("point snuggling") ------------------------------------
+A0 = np.abs(_signed_areas(X_orig, tris))
+print("=== refine res-32: method='anisotropic' on ρ∝|∇T| ===")
+smooth_mesh_interior(mesh, metric=metric, method="anisotropic",
+                     verbose=True)
+X_ref = np.asarray(mesh.X.coords).copy()
+A1 = np.abs(_signed_areas(X_ref, tris))
+print(f"minA/meanA  before={A0.min()/A0.mean():.4f}  "
+      f"after={A1.min()/A1.mean():.4f}  "
+      f"max|Δx|={np.linalg.norm(X_ref-X_orig,axis=1).max():.3e}")
+
+# --- figure --------------------------------------------------------
+Tn = np.asarray(uw.function.evaluate(
+    T.sym[0], X_orig)).reshape(-1)
+tro = mtri.Triangulation(X_orig[:, 0], X_orig[:, 1], tris)
+trr = mtri.Triangulation(X_ref[:, 0], X_ref[:, 1], tris)
+fig, ax = plt.subplots(2, 2, figsize=(13.5, 13))
+a = ax[0, 0]
+a.tricontourf(tro, Tn, levels=24, cmap="inferno")
+a.triplot(tro, lw=0.15, color="white", alpha=0.3)
+a.set_title(f"T  (Ra={RA:.0e}, res-{RES32} warm-started from "
+            f"res-{RES16}, +{N_WARM} steps)", fontsize=11)
+a = ax[0, 1]
+cf = a.tricontourf(tro, rho_field, levels=24, cmap="viridis")
+a.set_title(r"target metric  $\rho = 1+%g\,\hat{|\nabla T|}$"
+            % AMP, fontsize=11)
+fig.colorbar(cf, ax=a, fraction=0.046, pad=0.02)
+a = ax[1, 0]
+a.triplot(trr, lw=0.35, color="#1f4e8c")
+a.set_title(f"refined res-{RES32} mesh  (minA/meanA "
+            f"{A0.min()/A0.mean():.2f}→{A1.min()/A1.mean():.2f})",
+            fontsize=11)
+a = ax[1, 1]
+a.tricontourf(trr, Tn, levels=24, cmap="inferno")
+a.triplot(trr, lw=0.25, color="white", alpha=0.35)
+a.set_title("refined mesh + T  (points snuggle into the BLs / "
+            "plume edges)", fontsize=11)
+for a in ax.ravel():
+    a.set_aspect("equal")
+    a.set_xticks([])
+    a.set_yticks([])
+fig.suptitle(f"res-{RES32} convection (warm from res-{RES16}) → "
+             f"refine on ∇T with the (3) anisotropic mover",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+out = "/tmp/metric_mesh/aniso_convection_res32.png"
+fig.savefig(out, dpi=130)
+print(f"saved {out}")
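
Note on the metric construction shared by the two convection scripts above. The target density is built as `1 + AMP * clip((|∇T| - g_lo) / (g_hi - g_lo), 0, 1)`, with `g_lo` and `g_hi` taken as percentiles of the gradient magnitude. The sketch below reproduces only that normalisation in plain NumPy on synthetic data, to show the resulting range and how much of the domain saturates. The function name `gradient_metric` and the log-normal sample are assumptions for illustration.

```python
import numpy as np


def gradient_metric(gmag, amp=8.0, lo_pct=50.0, hi_pct=97.0):
    """Map gradient magnitudes to a density in [1, 1 + amp].

    Values below the lo_pct percentile get the background density 1;
    values above the hi_pct percentile saturate at 1 + amp.
    """
    g_lo, g_hi = np.percentile(gmag, [lo_pct, hi_pct])
    # Guard against a flat field (g_hi == g_lo), as the scripts do.
    rho_hat = np.clip((gmag - g_lo) / max(g_hi - g_lo, 1e-30), 0.0, 1.0)
    return 1.0 + amp * rho_hat


rng = np.random.default_rng(0)
gmag = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # synthetic |grad T|
metric = gradient_metric(gmag)

print(f"range            = [{metric.min():.2f}, {metric.max():.2f}]")
print(f"background share = {(metric == 1.0).mean():.2f}")
print(f"saturated share  = {(metric == metric.max()).mean():.2f}")
```

With the 50/97 percentile window, roughly half of the nodes ask for no refinement at all and only the top few percent request the full amplitude, which keeps a handful of extreme gradients from dominating the redistribution.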
diff --git a/scripts/aniso_cost_and_gamg.py b/scripts/aniso_cost_and_gamg.py
new file mode 100644
index 00000000..a3d340bb
--- /dev/null
+++ b/scripts/aniso_cost_and_gamg.py
@@ -0,0 +1,103 @@
+"""(3) mover — GAMG parity + cost-per-step characterisation.
+
+Two extensibility questions:
+
+1. GAMG.  The mover is NON-singular (homogeneous Dirichlet, no
+   constant nullspace), so unlike the MA pure-Neumann path GAMG
+   should be robust here. Check grading/quality PARITY (gamg must
+   match direct) and cost across resolutions.
+2. Cost per step.  Decompose into
+     * COLD  — fresh mesh: MeshVariable + solver creation + 1st
+               factorisation (one-off per remesh / topology change)
+     * WARM  — same mesh object again (cache hit): the genuine
+               per-timestep cost in a dynamic-adaptive loop
+     * D-build (gproj solve + the per-node eigen-clamp) vs the
+       per-outer-step displacement solves (n_outer=1 vs default
+       slope).
+   Scales with #triangles (res 16→48) → the parallel / 3D
+   extrapolation.
+
+Interior radial feature (clean cap=2 regime). minA/meanA is the
+parity check.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_anisotropic, _auto_pinned_labels, _tri_cells,
+    _signed_areas)
+
+R_O, R_I, WIDTH, AMP, PEAK = 1.0, 0.5, 0.12, 8.0, 0.70
+
+
+def case(tag, res):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - PEAK) / WIDTH) ** 2)
+    return m, f
+
+
+def quality(m, tris):
+    A = np.abs(_signed_areas(np.asarray(m.X.coords), tris))
+    return A.min() / A.mean()
+
+
+print(f"{'res':>4} {'ntri':>6} {'solver':>7} | {'cold':>7} "
+      f"{'warm':>7} {'warm/out':>9} {'Dbuild':>7} | "
+      f"{'minA/meanA':>10}")
+print("-" * 74)
+
+for res in (16, 24, 32, 48):
+    m_probe, _ = case("probe", res)
+    ntri = _tri_cells(m_probe.dm).shape[0]
+    del m_probe
+    row = {}
+    for solver in ("direct", "gamg"):
+        # COLD: fresh mesh, full setup, n_outer=12 (default)
+        m, f = case(f"{solver}_c{res}", res)
+        tris = _tri_cells(m.dm)
+        pin = _auto_pinned_labels(m)
+        X_und = np.asarray(m.X.coords).copy()   # undeformed coords
+        t0 = time.perf_counter()
+        _winslow_anisotropic(m, f, pin, False,
+                             linear_solver=solver)
+        t_cold = time.perf_counter() - t0
+        mA = quality(m, tris)
+        # WARM: SAME mesh object again (cache hit) — the real
+        # per-adaptation-step cost in a dynamic loop. Restore the
+        # undeformed coords so it does identical work.
+        m._deform_mesh(X_und.copy())
+        t0 = time.perf_counter()
+        _winslow_anisotropic(m, f, pin, False,
+                             linear_solver=solver)
+        t_warm = time.perf_counter() - t0
+        # n_outer=1 on the warm cache → fixed (D-build + 1 solve)
+        m._deform_mesh(X_und.copy())
+        t0 = time.perf_counter()
+        _winslow_anisotropic(m, f, pin, False, n_outer=1,
+                             linear_solver=solver)
+        t_n1 = time.perf_counter() - t0
+        per_out = (t_warm - t_n1) / 11.0          # 12 vs 1 slope
+        row[solver] = (t_cold, t_warm, per_out, t_n1, mA)
+        print(f"{res:4d} {ntri:6d} {solver:>7} | {t_cold:7.2f} "
+              f"{t_warm:7.2f} {per_out:9.3f} {t_n1:7.2f} | "
+              f"{mA:10.4f}")
+    d = row["direct"]
+    g = row["gamg"]
+    print(f"{'':>4} {'':>6} {'Δ(g-d)':>7} | "
+          f"{g[0]-d[0]:+7.2f} {g[1]-d[1]:+7.2f} "
+          f"{g[2]-d[2]:+9.3f} {'':>7} | "
+          f"parity |minA_g-minA_d|={abs(g[4]-d[4]):.2e}")
+
+print("\n(cold = one-off per remesh: var+solver creation + 1st "
+      "factorisation;\n warm = per-timestep cost in a dynamic "
+      "loop = D-build + n_outer solves;\n warm/out = marginal cost "
+      "of one extra MMPDE outer step;\n Dbuild ≈ n_outer=1 warm "
+      "= gproj ∇ρ solve + per-node eigen-clamp + 1 disp solve.)")
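
Note on the cost decomposition in the script above. It estimates the marginal cost of one outer step as `(t_warm - t_n1) / 11.0`, which assumes a linear model `t(n) = fixed + n * per_step` and the default `n_outer=12` stated in the script's comment. The sketch below writes that model out with the step count as a parameter instead of a literal. The function name `split_cost` and the timings are made up for illustration.

```python
def split_cost(t_full, t_single, n_outer=12):
    """Split a warm timing into (fixed, per_step) under t(n) = fixed + n * per_step.

    t_full   : wall time with n_outer outer steps
    t_single : wall time with exactly one outer step
    """
    if n_outer < 2:
        raise ValueError("need n_outer >= 2 to estimate a slope")
    per_step = (t_full - t_single) / (n_outer - 1)
    fixed = t_single - per_step  # setup cost with no outer step at all
    return fixed, per_step


# Made-up timings in seconds, only to exercise the arithmetic.
fixed, per_step = split_cost(t_full=1.3, t_single=0.2, n_outer=12)
print(f"fixed = {fixed:.3f} s, per outer step = {per_step:.3f} s")
```

Note that the script's `Dbuild` column reports `t_n1` itself (setup plus one solve), whereas `fixed` here subtracts that one solve.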
diff --git a/scripts/aniso_movie.py b/scripts/aniso_movie.py
new file mode 100644
index 00000000..7cbdb1bb
--- /dev/null
+++ b/scripts/aniso_movie.py
@@ -0,0 +1,99 @@
+"""Movie of the mesh adapting during a saturation run. One frame
+per checkpoint (dense ckpt_every=5 ⇒ one per adaptation event):
+P3 T on its own DOF cloud + the deformed-mesh edges, RdBu_r /
+white bg / lighting off (the free-surface viz convention).
+Reads whatever checkpoints exist (partial OK; re-run as more
+land). Always writes PNG frames; MP4 + GIF need ffmpeg on PATH.
+
+  python scripts/aniso_movie.py [tag=a16x] [fps=8]
+"""
+import sys
+import os
+import glob
+import re
+import shutil
+import subprocess
+import numpy as np
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+TAG = sys.argv[1] if len(sys.argv) > 1 else "a16x"
+FPS = int(sys.argv[2]) if len(sys.argv) > 2 else 8
+D = "/tmp/metric_mesh/sat"
+OUT = f"{D}/aniso_{TAG}_movie"
+
+
+def ckpts(tag):
+    ix = []
+    for f in glob.glob(f"{D}/sat_{tag}.mesh.T.*.h5"):
+        m = re.search(r"\.mesh\.T\.(\d+)\.h5$", os.path.basename(f))
+        if m:
+            ix.append(int(m.group(1)))
+    return sorted(ix)
+
+
+idx = ckpts(TAG)
+if not idx:
+    print(f"no checkpoints for sat_{TAG} yet")
+    sys.exit(0)
+print(f"{TAG}: {len(idx)} checkpoints {idx[0]}..{idx[-1]}")
+
+pv.OFF_SCREEN = True
+FRD = f"{D}/_frames_{TAG}"
+os.makedirs(FRD, exist_ok=True)
+for old in glob.glob(f"{FRD}/f*.png"):
+    os.remove(old)
+nfr = 0
+for fi, k in enumerate(idx):
+    m = uw.discretisation.Mesh(f"{D}/sat_{TAG}.mesh.{k:05}.h5")
+    Tv = uw.discretisation.MeshVariable(
+        "T", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    Tv.read_timestep(f"sat_{TAG}", "T", k, outputPath=D)
+    pv_T = vis.meshVariable_to_pv_mesh_object(Tv)
+    pv_T.point_data["T"] = np.asarray(Tv.data[:, 0])
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl = pv.Plotter(off_screen=True, window_size=(1100, 1100))
+    pl.set_background("white")
+    pl.add_text(f"{TAG}   adaptation checkpoint {k}",
+                font_size=14, color="black")
+    pl.add_mesh(pv_T, scalars="T", cmap="RdBu_r", clim=(0.0, 1.0),
+                show_edges=False, lighting=False,
+                show_scalar_bar=True,
+                scalar_bar_args=dict(title="T", color="black"))
+    pl.add_mesh(edges, color="#202020", line_width=0.7,
+                lighting=False)
+    pl.view_xy()
+    pl.camera.zoom(1.35)
+    pl.screenshot(f"{FRD}/f{fi:04d}.png")
+    pl.close()
+    nfr += 1
+    print(f"  frame {fi} (ckpt {k})", flush=True)
+
+if not shutil.which("ffmpeg"):
+    print(f"frames in {FRD} ({nfr}); ffmpeg not found — "
+          f"no video assembled")
+    sys.exit(0)
+mp4 = OUT + ".mp4"
+subprocess.run(
+    ["ffmpeg", "-y", "-framerate", str(FPS), "-i",
+     f"{FRD}/f%04d.png", "-c:v", "libx264", "-pix_fmt",
+     "yuv420p", "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2", mp4],
+    check=True, capture_output=True)
+print(f"saved {mp4}  ({nfr} frames @ {FPS} fps)")
+# also a moderate-size gif for quick inline viewing (ffmpeg
+# palette: clean colours)
+pal = f"{FRD}/_pal.png"
+gif = OUT + ".gif"
+subprocess.run(
+    ["ffmpeg", "-y", "-framerate", str(FPS), "-i",
+     f"{FRD}/f%04d.png", "-vf",
+     f"fps={FPS},scale=720:-1:flags=lanczos,palettegen", pal],
+    check=True, capture_output=True)
+subprocess.run(
+    ["ffmpeg", "-y", "-framerate", str(FPS), "-i",
+     f"{FRD}/f%04d.png", "-i", pal, "-lavfi",
+     f"fps={FPS},scale=720:-1:flags=lanczos[x];[x][1:v]"
+     "paletteuse", gif],
+    check=True, capture_output=True)
+print(f"saved {gif}")
diff --git a/scripts/aniso_param_sweep.py b/scripts/aniso_param_sweep.py
new file mode 100644
index 00000000..d14a33a4
--- /dev/null
+++ b/scripts/aniso_param_sweep.py
@@ -0,0 +1,74 @@
+"""Characterise the anisotropic mover: sweep beta / aniso_cap /
+relax on the interior radial feature (PEAK=0.70) and report
+minA/meanA + rim-radial. Is there a stable regime, or is the
+decoupled direct-Winslow form structurally folding-prone here?
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_anisotropic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES, AMP, PEAK = 1.0, 0.5, 0.12, 16, 8.0, 0.70
+
+
+def case(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - PEAK) / WIDTH) ** 2)
+    return m, f
+
+
+m0, _ = case("u")
+edges = _edge_pairs(m0.dm)
+tris = _tri_cells(m0.dm)
+X0 = np.asarray(m0.X.coords).copy()
+A0 = np.abs(_signed_areas(X0, tris))
+print(f"undeformed minA/meanA = {A0.min()/A0.mean():.4f}")
+
+
+def split(coords):
+    p0, p1 = coords[edges[:, 0]], coords[edges[:, 1]]
+    mid = 0.5 * (p0 + p1)
+    rm = np.linalg.norm(mid, axis=1)
+    rh = mid / np.maximum(rm, 1e-30)[:, None]
+    ev = p1 - p0
+    L = np.linalg.norm(ev, axis=1)
+    fr = np.abs((ev * rh).sum(axis=1)) / np.maximum(L, 1e-30)
+    return rm, L, fr > np.cos(np.pi / 4)
+
+
+print(f"\n{'beta':>6} {'cap':>4} {'relax':>5} {'nout':>4} | "
+      f"{'minA/meanA':>10} {'rim-rad/dr0':>11} {'max|dx|':>9}")
+print("-" * 60)
+i = 0
+for beta in (5.0, 20.0, 50.0, 200.0):
+    for cap in (2.0, 4.0, 8.0):
+        for relax in (0.2, 0.4):
+            i += 1
+            m, f = case(f"s{i}")
+            pin = _auto_pinned_labels(m)
+            try:
+                _winslow_anisotropic(
+                    m, f, pin, False, beta=beta, aniso_cap=cap,
+                    relax=relax, n_outer=12)
+            except Exception as e:
+                print(f"{beta:6.0f} {cap:4.0f} {relax:5.2f}  ERR {e}")
+                continue
+            X = np.asarray(m.X.coords).copy()
+            A = np.abs(_signed_areas(X, tris))
+            mA = A.min() / A.mean()
+            rm, L, isr = split(X)
+            rim = rm > (R_O - 0.06)
+            rr = L[rim & isr].mean() if (rim & isr).any() else np.nan
+            dr0 = (R_O - R_I) / RES
+            dx = float(np.linalg.norm(X - X0, axis=1).max())
+            print(f"{beta:6.0f} {cap:4.0f} {relax:5.2f} {12:4d} | "
+                  f"{mA:10.4f} {rr/dr0:11.3f} {dx:9.4f}")
diff --git a/scripts/aniso_smoke.py b/scripts/aniso_smoke.py
new file mode 100644
index 00000000..f3a60040
--- /dev/null
+++ b/scripts/aniso_smoke.py
@@ -0,0 +1,57 @@
+"""Smoke test for the (3) anisotropic tensor mover.
+
+Radial Gaussian feature on a res-16 Annulus. Confirms: runs, the
+mesh stays valid (no inverted cell), AMP=0 is an exact no-op, and
+the move is non-trivial for AMP>0. Anisotropy-aware numbers + a
+render come in the proper validation script; this is plumbing only.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_anisotropic, _auto_pinned_labels, _tri_cells,
+    _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+
+
+def case(amp, tag, **kw):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords).copy()
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    pin = _auto_pinned_labels(m)
+    tris = _tri_cells(m.dm)
+    a0 = _signed_areas(X0, tris)
+    orient = np.sign(np.median(a0)) or 1.0
+    t = time.perf_counter()
+    _winslow_anisotropic(m, f, pin, True, **kw)
+    dt = time.perf_counter() - t
+    X1 = np.asarray(m.X.coords).copy()
+    a1 = _signed_areas(X1, tris) * orient
+    moved = float(np.linalg.norm(X1 - X0, axis=1).max())
+    valid = bool(a1.min() > 0.0)
+    print(f"[{tag}] amp={amp:5.1f} kw={kw}  time={dt:5.2f}s  "
+          f"max|Δx|={moved:.4e}  minA*orient={a1.min():.3e}  "
+          f"valid={valid}")
+    return moved, valid
+
+
+print("=== anisotropic mover smoke test (res-16 Annulus) ===")
+m0, v0 = case(0.0, "amp0", n_outer=3)
+assert v0, "AMP=0 produced an invalid mesh"
+assert m0 < 1e-9, f"AMP=0 must be an exact no-op, got max|Δx|={m0:.2e}"
+print("  -> AMP=0 exact no-op OK")
+
+m8, v8 = case(8.0, "amp8", n_outer=5)
+assert v8, "AMP=8 produced an invalid (tangled) mesh"
+assert m8 > 1e-4, f"AMP=8 should move nodes, got {m8:.2e}"
+print("  -> AMP=8 moves nodes, mesh valid OK")
+
+print("smoke test PASSED")
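
Note on the validity check in the smoke test above. It multiplies the signed triangle areas by the orientation of the undeformed mesh and requires the minimum to stay positive. The sketch below shows that test on a two-triangle square in plain NumPy. `signed_areas` here is a stand-in written for this note; the PR's private `_signed_areas` helper is not shown in this diff and may differ in detail.

```python
import numpy as np


def signed_areas(coords, tris):
    """Signed area of each triangle (positive for counter-clockwise)."""
    p0, p1, p2 = (coords[tris[:, k]] for k in range(3))
    e1, e2 = p1 - p0, p2 - p0
    return 0.5 * (e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])


def mesh_is_valid(coords, tris, reference_coords):
    """True if no cell has flipped relative to the reference mesh."""
    orient = np.sign(np.median(signed_areas(reference_coords, tris))) or 1.0
    return bool((signed_areas(coords, tris) * orient).min() > 0.0)


# Unit square split into two counter-clockwise triangles.
X0 = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
tris = np.array([[0, 1, 2], [0, 2, 3]])

X_ok = X0.copy()
X_ok[2] = [0.9, 0.8]       # small move: cells shrink but keep orientation

X_bad = X0.copy()
X_bad[2] = [-0.5, -0.5]    # node dragged across the opposite edge: inverted cell

print("small move valid:", mesh_is_valid(X_ok, tris, X0))
print("large move valid:", mesh_is_valid(X_bad, tris, X0))
```

Taking `np.abs` of the areas, as the `minA/meanA` quality metric in the other scripts does, would hide the inversion, so the signed form is the one to use for a pass/fail check.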
diff --git a/scripts/aniso_validate_angular.py b/scripts/aniso_validate_angular.py
new file mode 100644
index 00000000..8cc905c4
--- /dev/null
+++ b/scripts/aniso_validate_angular.py
@@ -0,0 +1,142 @@
+"""(3) anisotropic mover — angular-feature validation.
+
+Angular-only feature ρ(θ)=1+AMP/(1+(Δθ/Wθ)²), constant in r
+(res-24 Annulus, AMP=8) — the SAME problem as
+scripts/ma_angular_ot_target.py. The exact 1-D angular OT (radius
+untouched) is the gold-standard TARGET. The settled result: the
+*scalar* BFO (_winslow_elliptic) produces ≈ZERO angular
+concentration (far/near≈1.0) — a structural dead end for a scalar
+potential. Question for (3): the metric tensor built from the
+*tangential* ∇ρ elongates cells radially (short along θ), so does the
+tensor mover generate genuine angular concentration where the
+scalar one cannot?
+
+Metrics from ma_angular_ot_target: far/near edge ratio, fraction
+of nodes within Wθ of θ0, minA/meanA, max radial drift. Grids
+rendered.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _winslow_anisotropic,
+    _edge_pairs, _auto_pinned_labels, _tri_cells, _signed_areas)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+TH0, WTH = 0.6, 0.50
+
+
+def wrap(a):
+    return np.arctan2(np.sin(a), np.cos(a))
+
+
+_g = np.linspace(-np.pi, np.pi, 200_000)
+_rho = 1.0 + AMP / (1.0 + (wrap(_g - TH0) / WTH) ** 2)
+_M = np.concatenate(
+    [[0.0], np.cumsum(0.5 * (_rho[1:] + _rho[:-1]) * np.diff(_g))])
+_M /= _M[-1]
+
+
+def T(theta):
+    u = (wrap(theta) + np.pi) / (2 * np.pi)
+    return np.interp(u, _M, _g)
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]
+    Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    s_ = y * np.cos(TH0) - x * np.sin(TH0)
+    c_ = x * np.cos(TH0) + y * np.sin(TH0)
+    dthw = sympy.atan2(s_, c_)
+    f = 1.0 + AMP / (1.0 + (dthw / WTH) ** 2)
+    return m, f, X0.copy()
+
+
+def report(name, X, X0, edges, tris):
+    th = np.arctan2(X[:, 1], X[:, 0])
+    r = np.hypot(X[:, 0], X[:, 1])
+    r0 = np.hypot(X0[:, 0], X0[:, 1])
+    dth = np.abs(wrap(th - TH0))
+    nv = X.shape[0]
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    s = np.zeros(nv)
+    cc = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le)
+        np.add.at(cc, a, 1.0)
+    nl = s / np.maximum(cc, 1.0)
+    near, far = dth < WTH, dth > 3 * WTH
+    fn = float(nl[far].mean() / nl[near].mean())
+    frac = float((dth < WTH).mean())
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    rdrift = float(np.abs(r - r0).max())
+    print(f"  {name:<22} far/near={fn:5.2f}  frac@θ0={frac:5.3f}  "
+          f"minA/meanA={minA:5.3f}  radialDrift={rdrift:.2e}")
+    return fn, frac, minA, rdrift
+
+
+m0, _, X0u = mk("u")
+edges = _edge_pairs(m0.dm)
+tris = _tri_cells(m0.dm)
+print(f"Angular-only feature θ0={TH0}, Wθ={WTH}, AMP={AMP}, "
+      f"res-{RES}  (uniform frac@θ0≈{WTH/np.pi:.3f})")
+report("undeformed", X0u, X0u, edges, tris)
+
+# (1) exact angular OT TARGET (radius untouched)
+r_u = np.hypot(X0u[:, 0], X0u[:, 1])
+th_u = np.arctan2(X0u[:, 1], X0u[:, 0])
+Th = T(th_u)
+Xtgt = np.stack([r_u * np.cos(Th), r_u * np.sin(Th)], axis=1)
+report("(1) exact angular OT", Xtgt, X0u, edges, tris)
+
+panels = [("undeformed", X0u), ("(1) exact angular OT", Xtgt)]
+
+m, f, X0 = mk("ma")
+_winslow_elliptic(m, f, _auto_pinned_labels(m), False,
+                  phi_degree=2)
+Xma = np.asarray(m.X.coords).copy()
+report("(2) scalar MA", Xma, X0, edges, tris)
+panels.append(("(2) scalar MA", Xma))
+
+m, f, X0 = mk("an")
+_winslow_anisotropic(m, f, _auto_pinned_labels(m), True,
+                     phi_degree=2)
+Xan = np.asarray(m.X.coords).copy()
+report("(3) anisotropic", Xan, X0, edges, tris)
+panels.append(("(3) anisotropic", Xan))
+
+fig, ax = plt.subplots(1, 5, figsize=(24, 5.3))
+for a, (name, X) in zip(ax, panels):
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.3, color="#1f4e8c")
+    a.plot(0.5 * (R_I + R_O) * np.cos(TH0),
+           0.5 * (R_I + R_O) * np.sin(TH0), "o", ms=11,
+           mfc="none", mec="#c0392b", mew=2)
+    a.plot([R_I * np.cos(TH0), R_O * np.cos(TH0)],
+           [R_I * np.sin(TH0), R_O * np.sin(TH0)],
+           "#c0392b", lw=0.8, alpha=0.5)
+    a.set_aspect("equal")
+    a.set_xticks([])
+    a.set_yticks([])
+    a.set_title(name, fontsize=12)
+ax[4].axis("off")
+fig.suptitle(f"Angular-only feature (red spoke θ0={TH0}) — exact "
+             f"angular OT target vs scalar MA vs (3) tensor mover",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/aniso_angular.png", dpi=130)
+print("\nsaved /tmp/metric_mesh/aniso_angular.png")
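
Note on the exact angular OT target in the script above. The map `T` is the inverse of the normalised cumulative integral of ρ(θ), so uniformly spaced angles are sent to angles whose density is proportional to ρ. The sketch below repeats that construction in plain NumPy with the script's parameters (AMP=8, θ0=0.6, Wθ=0.5) and measures how the share of points within Wθ of θ0 changes. The names `ot_map`, `cdf`, and `grid`, and the sample counts, are choices made for this note.

```python
import numpy as np

AMP, TH0, WTH = 8.0, 0.6, 0.5


def wrap(a):
    """Wrap angles to (-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))


# Tabulate rho on a fine grid and integrate with the trapezoid rule.
grid = np.linspace(-np.pi, np.pi, 20_001)
rho = 1.0 + AMP / (1.0 + (wrap(grid - TH0) / WTH) ** 2)
cdf = np.concatenate(
    [[0.0], np.cumsum(0.5 * (rho[1:] + rho[:-1]) * np.diff(grid))])
cdf /= cdf[-1]


def ot_map(theta):
    """Send a uniform angle to the angle with the same cumulative mass."""
    u = (wrap(theta) + np.pi) / (2.0 * np.pi)
    return np.interp(u, cdf, grid)


theta_uniform = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
theta_mapped = ot_map(theta_uniform)

frac_before = (np.abs(wrap(theta_uniform - TH0)) < WTH).mean()
frac_after = (np.abs(wrap(theta_mapped - TH0)) < WTH).mean()
print(f"share within Wθ of θ0: before={frac_before:.3f}  after={frac_after:.3f}")
```

Because the map acts on θ alone, applying it to mesh nodes leaves every radius unchanged, which is why the script can use radial drift as a diagnostic for the general movers.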
diff --git a/scripts/aniso_validate_nonsep.py b/scripts/aniso_validate_nonsep.py
new file mode 100644
index 00000000..c3459dc9
--- /dev/null
+++ b/scripts/aniso_validate_nonsep.py
@@ -0,0 +1,122 @@
+"""(3) anisotropic mover — NON-SEPARABLE feature: the case it
+earns its keep (kickoff brief).
+
+Compact Cartesian Gaussian blob at an interior point P=(0.78,0):
+ρ = 1 + AMP·exp(-|X-P|²/W²). Neither pure-r nor pure-θ, so the
+explicit 1-D OT (exact + cheap for separable features) does NOT
+apply — this is the regime the general tensor mover is for.
+
+Compare (3) vs the isotropic scalar paths (MA, spring) on:
+  * minA/meanA               — sliver / quality (higher = better)
+  * far/near edge ratio       — concentration toward the blob
+  * fraction of nodes within W of P  — did nodes migrate in
+and SHOW the grids (zoomed on the blob). Success per the brief:
+(3) gives cleaner, blob-aligned cells (fewer slivers) at
+comparable concentration — it does NOT beat the node-count cap.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _winslow_anisotropic, _winslow_spring,
+    _edge_pairs, _auto_pinned_labels, _tri_cells, _signed_areas)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+PX, PY, W = 0.78, 0.0, 0.10
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]
+    Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    f = 1.0 + AMP * sympy.exp(
+        -(((x - PX) ** 2 + (y - PY) ** 2) / W ** 2))
+    return m, f, X0.copy()
+
+
+m0, _, X0u = mk("u")
+edges = _edge_pairs(m0.dm)
+tris = _tri_cells(m0.dm)
+
+
+def report(name, X, X0):
+    d = np.hypot(X[:, 0] - PX, X[:, 1] - PY)
+    nv = X.shape[0]
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    s = np.zeros(nv)
+    cc = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le)
+        np.add.at(cc, a, 1.0)
+    nl = s / np.maximum(cc, 1.0)
+    near, far = d < 1.5 * W, d > 5 * W
+    fn = float(nl[far].mean() / nl[near].mean())
+    frac = float((d < 1.5 * W).mean())
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    drift = float(np.abs(np.hypot(X[:, 0], X[:, 1])
+                         - np.hypot(X0[:, 0], X0[:, 1])).max())
+    print(f"  {name:<18} far/near={fn:5.2f}  frac@P={frac:6.4f}  "
+          f"minA/meanA={minA:6.4f}  rdrift={drift:.2e}")
+    return fn, frac, minA
+
+
+print(f"Non-separable blob P=({PX},{PY}) W={W}, AMP={AMP}, "
+      f"res-{RES}  (uniform frac@P≈"
+      f"{float((np.hypot(X0u[:,0]-PX,X0u[:,1]-PY)<1.5*W).mean()):.4f})")
+report("undeformed", X0u, X0u)
+panels = [("undeformed", X0u)]
+
+for tag, name, fn in [
+    ("ma", "scalar MA", lambda m, f, p: _winslow_elliptic(
+        m, f, p, False, phi_degree=2)),
+    ("sp", "spring", lambda m, f, p: _winslow_spring(
+        m, f, p, False)),
+    ("an", "(3) anisotropic", lambda m, f, p: _winslow_anisotropic(
+        m, f, p, True, phi_degree=2))]:
+    m, f, X0 = mk(tag)
+    fn(m, f, _auto_pinned_labels(m))
+    X = np.asarray(m.X.coords).copy()
+    report(name, X, X0)
+    panels.append((name, X))
+
+fig, ax = plt.subplots(2, 4, figsize=(19, 9.6))
+tr = mtri.Triangulation(X0u[:, 0], X0u[:, 1], tris)
+for k, (name, X) in enumerate(panels):
+    a = ax[0, k]
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.3, color="#1f4e8c")
+    a.plot(PX, PY, "o", ms=10, mfc="none", mec="#c0392b", mew=2)
+    a.set_aspect("equal")
+    a.set_xticks([])
+    a.set_yticks([])
+    a.set_title(name, fontsize=12)
+    b = ax[1, k]
+    b.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.5, color="#1f4e8c")
+    b.plot(PX, PY, "o", ms=12, mfc="none", mec="#c0392b", mew=2)
+    b.set_aspect("equal")
+    b.set_xlim(PX - 0.28, PX + 0.28)
+    b.set_ylim(PY - 0.28, PY + 0.28)
+    b.set_xticks([])
+    b.set_yticks([])
+    b.set_title(f"{name} — zoom on blob", fontsize=11)
+fig.suptitle(f"NON-SEPARABLE blob (the case (3) is for) — P="
+             f"({PX},{PY}), res-{RES} AMP={AMP}  "
+             f"(success = cleaner, blob-aligned cells / fewer "
+             f"slivers, NOT a bigger far/near)", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+fig.savefig("/tmp/metric_mesh/aniso_nonsep.png", dpi=125)
+print("\nsaved /tmp/metric_mesh/aniso_nonsep.png")
diff --git a/scripts/aniso_validate_radial.py b/scripts/aniso_validate_radial.py
new file mode 100644
index 00000000..7989e238
--- /dev/null
+++ b/scripts/aniso_validate_radial.py
@@ -0,0 +1,175 @@
+"""(3) anisotropic tensor mover — radial-feature validation.
+
+Canonical comparison point (res-16 Annulus, radial Gaussian
+peaked at r=R_O, AMP=8) — the SAME problem as
+scripts/ma_radial_anisotropy.py so the numbers line up with the
+settled MA results. Diagnostics are ANISOTROPY-AWARE (radial /
+tangential edge split vs radius + minA/meanA), NOT the
+anisotropy-blind d/n, and overlay the exact 1-D radial OT.
+
+Success criterion (per the kickoff brief): (3) does NOT beat the
+fixed node-count grading cap. It earns its keep by IMPROVING cell
+alignment / quality — i.e. a LESS degenerate rim layer (higher
+minA/meanA) than the isotropic MA, with comparable radial grading
+and the tangential edges not frozen/blown-out. Grids are rendered
+(the project norm is to SHOW, not just quote scalars).
+"""
+from __future__ import annotations
+import sys
+import time
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _winslow_anisotropic, _winslow_spring,
+    _edge_pairs, _auto_pinned_labels, _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+# Peak radius: R_O = the boundary-peaked pathology (documented:
+# every method over-collapses the pinned rim; the SEPARABLE case
+# where the explicit 1-D OT is the right tool). An interior peak
+# (e.g. R_O-2.5*WIDTH) gives the feature room on both sides — the
+# honest place to judge (3)'s alignment/quality on a radial
+# feature.  `python aniso_validate_radial.py 0.70`
+PEAK = float(sys.argv[1]) if len(sys.argv) > 1 else R_O
+TAG = f"peak{PEAK:.2f}".replace(".", "p")
+
+
+def case(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}_{TAG}", m, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - PEAK) / WIDTH) ** 2)
+    return m, f
+
+
+def split_edges(coords, edges):
+    p0, p1 = coords[edges[:, 0]], coords[edges[:, 1]]
+    mid = 0.5 * (p0 + p1)
+    rmid = np.linalg.norm(mid, axis=1)
+    rhat = mid / np.maximum(rmid, 1e-30)[:, None]
+    ev = p1 - p0
+    L = np.linalg.norm(ev, axis=1)
+    rad_frac = np.abs((ev * rhat).sum(axis=1)) / np.maximum(L, 1e-30)
+    return rmid, L, rad_frac > np.cos(np.pi / 4)
+
+
+m0, _ = case("u")
+edges = _edge_pairs(m0.dm)
+tris = _tri_cells(m0.dm)
+X0 = np.asarray(m0.X.coords).copy()
+dr0 = (R_O - R_I) / RES
+
+# exact 1-D radial OT profile (ground truth, radial direction)
+s = np.linspace(R_I, R_O, 200_000)
+rho = 1.0 + AMP * np.exp(-(((s - PEAK) / WIDTH) ** 2))
+dens = rho * s
+mcum = np.concatenate(
+    [[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * (s[1] - s[0]))])
+mcum /= mcum[-1]
+rn = np.interp(np.linspace(0, 1, RES + 1), mcum, s)
+ot_rmid = 0.5 * (rn[1:] + rn[:-1])
+ot_dr = np.diff(rn)
+
+bins = np.linspace(R_I, R_O, 13)
+bc = 0.5 * (bins[1:] + bins[:-1])
+
+
+def radial_profile(coords):
+    rmid, L, is_rad = split_edges(coords, edges)
+    out_r, out_t = [], []
+    for i in range(len(bins) - 1):
+        mb = (rmid >= bins[i]) & (rmid < bins[i + 1])
+        out_r.append(L[mb & is_rad].mean() if (mb & is_rad).any()
+                     else np.nan)
+        out_t.append(L[mb & ~is_rad].mean() if (mb & ~is_rad).any()
+                     else np.nan)
+    return np.array(out_r), np.array(out_t)
+
+
+def run(name, fn):
+    m, f = case(name)
+    pin = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    fn(m, f, pin)
+    dt = time.perf_counter() - t
+    return np.asarray(m.X.coords).copy(), dt
+
+
+runs = [("undeformed", X0, 0.0)]
+X, dt = run("ma", lambda m, f, p: _winslow_elliptic(
+    m, f, p, False, phi_degree=2))
+runs.append(("MA (isotropic)", X, dt))
+X, dt = run("aniso", lambda m, f, p: _winslow_anisotropic(
+    m, f, p, True, phi_degree=2))   # robust defaults (cap 2)
+runs.append(("anisotropic (3)", X, dt))
+X, dt = run("spring", lambda m, f, p: _winslow_spring(
+    m, f, p, False))
+runs.append(("spring", X, dt))
+
+print(f"\nAnnulus res-{RES}, radial Gaussian peak r={PEAK:.2f}, "
+      f"AMP={AMP:g}")
+print(f"{'mesh':>16} | rim radial Δ | vs undef | MIN radial Δ "
+      f"| vs undef | minA/meanA | time")
+print("-" * 92)
+und_rr = und_minr = None
+prof = {}
+for name, Xc, dt in runs:
+    pr, pt = radial_profile(Xc)
+    prof[name] = (pr, pt)
+    rr = pr[np.isfinite(pr)][-1]
+    rmid, L, is_rad = split_edges(Xc, edges)
+    minr = L[is_rad].min()
+    A = np.abs(_signed_areas(Xc, tris))
+    minA = A.min() / A.mean()
+    if name == "undeformed":
+        und_rr, und_minr = rr, minr
+    print(f"{name:>16} | {rr:12.4f} | {rr/und_rr:8.3f} | "
+          f"{minr:12.5f} | {minr/und_minr:8.3f} | {minA:10.4f} "
+          f"| {dt:5.2f}s")
+print(f"exact 1-D radial OT  min Δr/dr0 = {ot_dr.min()/dr0:.3f} "
+      f"(deep/near = {ot_dr.max()/ot_dr.min():.2f})")
+
+# ---- figure: profiles + grids -------------------------------------
+fig = plt.figure(figsize=(19, 9.5))
+gs = fig.add_gridspec(2, 4, height_ratios=[1.0, 1.15])
+axr = fig.add_subplot(gs[0, 0:2])
+axt = fig.add_subplot(gs[0, 2:4])
+cols = {"undeformed": "k", "MA (isotropic)": "#c0392b",
+        "anisotropic (3)": "#1f4e8c", "spring": "#2a9d8f"}
+for name, _, _ in runs:
+    pr, pt = prof[name]
+    axr.plot(bc, pr, "o-", lw=1.6, ms=4, color=cols[name], label=name)
+    axt.plot(bc, pt, "o-", lw=1.6, ms=4, color=cols[name], label=name)
+axr.plot(ot_rmid, ot_dr, "k--", lw=1.4, label="exact 1-D radial OT")
+axr.set_title("RADIAL edge length vs radius")
+axt.set_title("TANGENTIAL edge length vs radius")
+for a in (axr, axt):
+    a.set_xlabel("radius"); a.set_ylabel("edge length")
+    a.axvline(R_O, color="grey", ls=":", lw=0.8)
+    a.legend(fontsize=8); a.grid(alpha=0.3)
+for k, (name, Xc, _) in enumerate(runs):
+    a = fig.add_subplot(gs[1, k])
+    a.triplot(mtri.Triangulation(Xc[:, 0], Xc[:, 1], tris),
+              lw=0.35, color=cols[name])
+    th = np.linspace(0, 2 * np.pi, 300)
+    a.plot(R_O * np.cos(th), R_O * np.sin(th), "grey", lw=0.6,
+           ls=":")
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(name, fontsize=11)
+fig.suptitle(f"(3) anisotropic mover vs isotropic MA — radial "
+             f"Gaussian peak r={PEAK:.2f}, res-{RES} AMP={AMP:g}  "
+             f"(success = fewer slivers / better alignment, "
+             f"NOT a bigger d/n)", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+_out = f"/tmp/metric_mesh/aniso_radial_{TAG}.png"
+fig.savefig(_out, dpi=125)
+print(f"\nsaved {_out}")
diff --git a/scripts/cost_compare.py b/scripts/cost_compare.py
new file mode 100644
index 00000000..d9ca2f00
--- /dev/null
+++ b/scripts/cost_compare.py
@@ -0,0 +1,79 @@
+"""Cost vs grading: time the elastic-spring and the Monge–Ampère
+(BFO) metric paths on identical undeformed Annulus setups, with the
+HONEST deep/near metric. Per-step printed so it is killable early.
+
+Wall time is the cold cost a caller sees on the FIRST call (it
+includes one-time PETSc DM / JIT solver build for MA and the edge
+cache for spring; subsequent calls on the same-topology mesh reuse
+the cache — a warm re-time is also reported).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+AMPS = [0.0, 2.0, 8.0, 20.0]
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print(f"{'AMP':>5} {'nodes':>6} {'edges':>6} | "
+      f"{'spring t/s':>10} {'warm':>6} {'d/n':>5} | "
+      f"{'MA t/s':>8} {'warm':>6} {'d/n':>5}")
+print("-" * 78)
+for k, amp in enumerate(AMPS):
+    # spring (public API default path)
+    m, f = case(amp, f"sp{k}")
+    e = _edge_pairs(m.dm)
+    nv, ne = np.asarray(m.X.coords).shape[0], e.shape[0]
+    t = time.perf_counter()
+    uw.meshing.smooth_mesh_interior(m, metric=f, verbose=False)
+    ts_cold = time.perf_counter() - t
+    rs = honest_ratio(np.asarray(m.X.coords), e)
+    t = time.perf_counter()                       # warm re-call
+    uw.meshing.smooth_mesh_interior(m, metric=f, verbose=False)
+    ts_warm = time.perf_counter() - t
+
+    # MA (preserved, called directly)
+    m, f = case(amp, f"ma{k}")
+    e = _edge_pairs(m.dm)
+    pin = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_elliptic(m, f, pin, False)
+    tm_cold = time.perf_counter() - t
+    rm = honest_ratio(np.asarray(m.X.coords), e)
+    t = time.perf_counter()                       # warm re-call
+    _winslow_elliptic(m, f, pin, False)
+    tm_warm = time.perf_counter() - t
+
+    print(f"{amp:5.1f} {nv:6d} {ne:6d} | "
+          f"{ts_cold:10.2f} {ts_warm:6.2f} {rs:5.2f} | "
+          f"{tm_cold:8.2f} {tm_warm:6.2f} {rm:5.2f}", flush=True)
diff --git a/scripts/interior_refine.py b/scripts/interior_refine.py
new file mode 100644
index 00000000..f6fb43f4
--- /dev/null
+++ b/scripts/interior_refine.py
@@ -0,0 +1,137 @@
+"""The SENSIBLE test: refine around a LOCALISED INTERIOR region
+(a Gaussian blob away from both boundary rings) — what we actually
+want a metric smoother to do. Interior nodes are free to
+redistribute and the boundary node-count is NOT the binding
+constraint here (unlike the thin surface band).
+
+Lagrangian feature: an initial-position vector field X0v is set
+ONCE; metric = 1 + AMP·exp(-|X0v-c|²/W²). It tracks the material
+feature through any deformation.
+
+Methods: volumetric spring (the fast winner) vs MA. Reported:
+  - local edge ratio  far / near-blob   (>1 ⇒ refined at the blob)
+  - mesh quality minA/meanA
+  - wall time
+Mesh pictures: /tmp/metric_mesh/interior.png  (blob centre marked)
+"""
+from __future__ import annotations
+import os
+import time
+import numpy as np
+import matplotlib.pyplot as plt
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _winslow_elliptic, _edge_pairs,
+    _auto_pinned_labels, _tri_cells, _signed_areas)
+
+OUT = "/tmp/metric_mesh"
+os.makedirs(OUT, exist_ok=True)
+R_O, R_I, RES = 1.0, 0.5, 16
+CX, CY, W = 0.78, 0.0, 0.12          # interior blob centre/width
+AMPS = [8.0, 20.0]
+
+
+def mesh_tris(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    out = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            out.append(vs)
+    return np.asarray(out, np.int64)
+
+
+def local_ratio(coords, edges):
+    """mean incident edge length: far-field / near-blob (final
+    position). >1 ⇒ smaller cells at the blob (refined)."""
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); cnt = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(cnt, a, 1.0)
+    nl = s / np.maximum(cnt, 1.0)
+    d = np.sqrt((coords[:, 0] - CX) ** 2 + (coords[:, 1] - CY) ** 2)
+    near = d < W
+    far = d > 4 * W
+    if not near.any() or not far.any():
+        return float("nan")
+    return float(nl[far].mean() / nl[near].mean())
+
+
+def quality(m):
+    t = _tri_cells(m.dm)
+    a = np.abs(_signed_areas(np.asarray(m.X.coords), t))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    X0v = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    X0v.data[:, 0] = X0[:, 0]
+    X0v.data[:, 1] = X0[:, 1]
+    d2 = (X0v.sym[0] - CX) ** 2 + (X0v.sym[1] - CY) ** 2
+    f = 1.0 + amp * sympy.exp(-d2 / W ** 2)
+    return m, f
+
+
+res = []
+for k, amp in enumerate(AMPS):
+    m, f = case(amp, f"s{k}")
+    e = _edge_pairs(m.dm); TRI = mesh_tris(m)
+    p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, f, p, False)
+    ts = time.perf_counter() - t
+    cs = np.asarray(m.X.coords).copy()
+    print(f"[vol-spring] AMP={amp:4.0f}  far/near="
+          f"{local_ratio(cs, e):.2f}  q={quality(m):.3f}  "
+          f"{ts:.2f}s", flush=True)
+
+    m, f = case(amp, f"m{k}")
+    e = _edge_pairs(m.dm)
+    p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_elliptic(m, f, p, False)
+    tm = time.perf_counter() - t
+    cm = np.asarray(m.X.coords).copy()
+    print(f"[MA]         AMP={amp:4.0f}  far/near="
+          f"{local_ratio(cm, e):.2f}  q={quality(m):.3f}  "
+          f"{tm:.2f}s", flush=True)
+    res.append((amp, cs, cm, TRI))
+
+if uw.mpi.rank == 0:
+    th = np.linspace(0, 2 * np.pi, 240)
+    fig, ax = plt.subplots(2, len(AMPS),
+                           figsize=(7.2 * len(AMPS), 14),
+                           facecolor="white")
+    for ci, (amp, cs, cm, TRI) in enumerate(res):
+        for ri, (cc, lab) in enumerate(
+                ((cs, "VOL-SPRING"), (cm, "MA"))):
+            a = ax[ri, ci]
+            a.triplot(cc[:, 0], cc[:, 1], TRI,
+                      color="black", lw=0.6)
+            a.plot(R_O * np.cos(th), R_O * np.sin(th),
+                    "tab:red", lw=1.2)
+            a.plot(R_I * np.cos(th), R_I * np.sin(th),
+                    "tab:blue", lw=1.2)
+            a.add_patch(plt.Circle((CX, CY), W, fill=False,
+                        ec="tab:green", lw=2.0))
+            a.set_title(f"{lab}  AMP={amp:g}", fontsize=14)
+            a.set_aspect("equal")
+            a.set_xticks([]); a.set_yticks([])
+    fig.suptitle("Localised INTERIOR refinement (green = blob, "
+                 "metric peak). Cells should shrink inside it.",
+                 fontsize=15)
+    fig.tight_layout(rect=[0, 0, 1, 0.96])
+    pth = os.path.join(OUT, "interior.png")
+    fig.savefig(pth, dpi=130, bbox_inches="tight")
+    print("Saved", pth)
diff --git a/scripts/ma_analytic_check.py b/scripts/ma_analytic_check.py
new file mode 100644
index 00000000..97a9eae6
--- /dev/null
+++ b/scripts/ma_analytic_check.py
@@ -0,0 +1,46 @@
+"""Ground-truth: the EXACT radial equidistribution map for the
+undeformed Annulus, computed as a 1D quadrature (no FE, no UW3).
+
+If the exact optimal-transport / equidistribution map itself only
+grades deep/near by ~1.07, then "move once by grad(phi)" cannot
+reach strong grading and the plan premise needs revisiting. If it
+grades strongly (deep/near >> 1.1), the weak FE result is a solver
+accuracy problem and the coupled (phi,H) SNES escalation is
+justified.
+
+Radial 2D equidistribution: place node radii r_k so equal target
+"mass" m(r) = ∫ rho_tgt(s) s ds sits between consecutive shells
+(area element 2*pi*r dr). Radial spacing dr/dxi ∝ 1/(rho_tgt * r).
+This monotone rearrangement IS the OT map under radial symmetry.
+"""
+import numpy as np
+
+R_I, R_O = 0.5, 1.0
+WIDTH = 0.12
+N = 200_000          # quadrature resolution
+N_SHELLS = 16        # ~ matches RES=16 radial resolution
+
+s = np.linspace(R_I, R_O, N)
+ds = s[1] - s[0]
+
+for amp in (0.0, 2.0, 8.0, 20.0):
+    rho = 1.0 + amp * np.exp(-(((s - R_O) / WIDTH) ** 2))
+    dens = rho * s                       # 2D area weighting
+    m = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * ds)])
+    m /= m[-1]                           # normalised cumulative mass
+
+    targets = np.linspace(0.0, 1.0, N_SHELLS + 1)
+    r_nodes = np.interp(targets, m, s)   # equidistributed shell radii
+    dr = np.diff(r_nodes)                # radial spacing per shell
+    r_mid = 0.5 * (r_nodes[1:] + r_nodes[:-1])
+
+    near = r_mid > (R_O - WIDTH)
+    deep = r_mid < (R_O - 0.30)
+    dr_near = dr[near].mean()
+    dr_deep = dr[deep].mean()
+    # uniform start spacing for reference
+    dr0 = (R_O - R_I) / N_SHELLS
+    print(
+        f"AMP={amp:5.1f}  ideal radial spacing  near={dr_near:.4f} "
+        f"deep={dr_deep:.4f}  deep/near={dr_deep/dr_near:6.2f}  "
+        f"(uniform dr0={dr0:.4f}; near/dr0={dr_near/dr0:.3f})")
diff --git a/scripts/ma_angular_ot_target.py b/scripts/ma_angular_ot_target.py
new file mode 100644
index 00000000..33afebbc
--- /dev/null
+++ b/scripts/ma_angular_ot_target.py
@@ -0,0 +1,145 @@
+"""(1) exact angular optimal-transport map as the TARGET for (2)
+directional move-weighting.
+
+Feature is localised in ANGLE only: rho(theta) = 1 + AMP/(1 +
+(Δθ/Wθ)^2), constant in r — every ring has the same angular
+density, so the exact equidistribution is a single monotone angular
+reparametrisation Θ=T(θ) (radius untouched ⇒ zero radial drift, no
+tangle, boundary nodes slide exactly along their ring). This is the
+gold-standard "slide the spare angular nodes toward the feature".
+
+(2) = _winslow_elliptic with boundary_slip + the new opt-in
+move_anisotropy=(w_r,w_θ): rescale the realised displacement in the
+local radial/tangential frame so the same scalar metric is met
+mostly tangentially. Sweep w_θ/w_r and see how close (2) gets to
+the (1) target. AMP=8, res-24.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+TH0, WTH = 0.6, 0.50
+
+
+def wrap(a):
+    return np.arctan2(np.sin(a), np.cos(a))
+
+
+# exact 1-D angular OT push-forward T(θ)
+_g = np.linspace(-np.pi, np.pi, 200_000)
+_rho = 1.0 + AMP / (1.0 + (wrap(_g - TH0) / WTH) ** 2)
+_M = np.concatenate(
+    [[0.0], np.cumsum(0.5 * (_rho[1:] + _rho[:-1]) * np.diff(_g))])
+_M /= _M[-1]
+
+
+def T(theta):
+    u = (wrap(theta) + np.pi) / (2 * np.pi)
+    return np.interp(u, _M, _g)
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]; Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    s_ = y * np.cos(TH0) - x * np.sin(TH0)
+    c_ = x * np.cos(TH0) + y * np.sin(TH0)
+    dthw = sympy.atan2(s_, c_)
+    f = 1.0 + AMP / (1.0 + (dthw / WTH) ** 2)
+    return m, f, X0.copy()
+
+
+def report(name, X, X0, edges, tris):
+    th = np.arctan2(X[:, 1], X[:, 0])
+    r = np.hypot(X[:, 0], X[:, 1])
+    r0 = np.hypot(X0[:, 0], X0[:, 1])
+    dth = np.abs(wrap(th - TH0))
+    nv = X.shape[0]
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    s = np.zeros(nv); cc = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(cc, a, 1.0)
+    nl = s / np.maximum(cc, 1.0)
+    near, far = dth < WTH, dth > 3 * WTH
+    fn = float(nl[far].mean() / nl[near].mean())
+    frac = float((dth < WTH).mean())          # concentration
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    rdrift = float(np.abs(r - r0).max())
+    print(f"  {name:<26} far/near={fn:5.2f}  frac@θ0={frac:5.3f}  "
+          f"minA={minA:5.3f}  radialDrift={rdrift:.2e}")
+    return fn, frac, minA, rdrift
+
+
+m0, _, X0u = mk("u")
+edges = _edge_pairs(m0.dm)
+tris = _tri_cells(m0.dm)
+panels = [("undeformed", X0u)]
+print(f"Angular-only feature θ0={TH0}, Wθ={WTH}, AMP={AMP}, "
+      f"res-{RES}  (uniform frac@θ0≈{WTH/np.pi:.3f})")
+report("undeformed", X0u, X0u, edges, tris)
+
+# (1) TARGET — exact angular OT (radius untouched)
+r_u = np.hypot(X0u[:, 0], X0u[:, 1])
+th_u = np.arctan2(X0u[:, 1], X0u[:, 0])
+Th = T(th_u)
+Xtgt = np.stack([r_u * np.cos(Th), r_u * np.sin(Th)], axis=1)
+print("(1) exact angular OT  [TARGET]:")
+tgt = report("angular-OT target", Xtgt, X0u, edges, tris)
+panels.append(("(1) exact angular OT", Xtgt))
+
+# (2) winslow + slip + move_anisotropy sweep
+print("(2) _winslow_elliptic + boundary_slip, move_anisotropy:")
+sweep = [None, (1.0, 1.0), (1.0, 5.0), (0.2, 1.0), (0.05, 1.0)]
+best = None
+for ma in sweep:
+    m, f, X0 = mk(f"ma{ma}")
+    e = _edge_pairs(m.dm); tr = _tri_cells(m.dm)
+    pin = _auto_pinned_labels(m)
+    _winslow_elliptic(m, f, pin, False, boundary_slip=True,
+                      linear_solver="direct", phi_degree=2,
+                      move_anisotropy=ma)
+    X = np.asarray(m.X.coords).copy()
+    tag = f"move_aniso={ma}"
+    fn, fr, mA, rd = report(tag, X, X0, e, tr)
+    if ma in (None, (0.05, 1.0)):
+        panels.append((f"(2) {('iso' if ma is None else 'tang-pref')}",
+                       X))
+    if best is None or abs(fr - tgt[1]) < best[0]:
+        best = (abs(fr - tgt[1]), tag)
+print(f"  → closest to target concentration: {best[1]}")
+
+fig, ax = plt.subplots(1, 4, figsize=(20, 5.3))
+for a, (name, X) in zip(ax, panels[:4]):
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.35, color="#1f4e8c")
+    thP = np.linspace(0, 2 * np.pi, 200)
+    a.plot(0.5 * (R_I + R_O) * np.cos(TH0),
+           0.5 * (R_I + R_O) * np.sin(TH0), "o", ms=11,
+           mfc="none", mec="#c0392b", mew=2)
+    a.plot([R_I * np.cos(TH0), R_O * np.cos(TH0)],
+           [R_I * np.sin(TH0), R_O * np.sin(TH0)],
+           "#c0392b", lw=0.8, alpha=0.5)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(name, fontsize=12)
+fig.suptitle(f"Angular-only feature (red spoke at θ0={TH0}) — exact "
+             f"angular OT vs winslow iso vs tangential-preferred",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/ma_angular_ot.png", dpi=130)
+print("saved /tmp/metric_mesh/ma_angular_ot.png")
diff --git a/scripts/ma_cost_grading.py b/scripts/ma_cost_grading.py
new file mode 100644
index 00000000..095a38b0
--- /dev/null
+++ b/scripts/ma_cost_grading.py
@@ -0,0 +1,72 @@
+"""MA-only cost + HONEST grading + mesh-validity check, cold & warm,
+across AMP. Confirms the direct-solver speedup preserves the
+grading/quality and the AMP=0 exact-no-op invariant.
+
+Recorded BFO baselines (project memory, GAMG path, honest metric):
+  AMP 0 → d/n ≈ 1.02 (no-op)   AMP 8 → 1.71   AMP 20 → 1.54
+Grading must match these to within noise; minA>0 (no tangle).
+Spring is NOT re-run here (reference its recorded ~0.3 s / 1.65–1.79).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+AMPS = [0.0, 2.0, 8.0, 20.0]
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+def min_area(m):
+    tris = _tri_cells(m.dm)
+    if tris is None:
+        return float("nan")
+    a = np.abs(_signed_areas(np.asarray(m.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+print(f"{'AMP':>5} {'cold/s':>7} {'warm/s':>7} {'d/n':>6} "
+      f"{'minA/meanA':>10}   baseline d/n")
+print("-" * 56)
+BASE = {0.0: 1.02, 2.0: 1.43, 8.0: 1.71, 20.0: 1.54}
+for k, amp in enumerate(AMPS):
+    m, f = case(amp, f"c{k}")
+    e = _edge_pairs(m.dm)
+    pin = _auto_pinned_labels(m)
+    t = time.perf_counter(); _winslow_elliptic(m, f, pin, False)
+    cold = time.perf_counter() - t
+    dn = honest_ratio(np.asarray(m.X.coords), e)
+    ma = min_area(m)
+    t = time.perf_counter(); _winslow_elliptic(m, f, pin, False)
+    warm = time.perf_counter() - t
+    print(f"{amp:5.1f} {cold:7.2f} {warm:7.2f} {dn:6.3f} "
+          f"{ma:10.4f}   (was ~{BASE[amp]:.2f})", flush=True)
diff --git a/scripts/ma_gamg_vs_direct.py b/scripts/ma_gamg_vs_direct.py
new file mode 100644
index 00000000..2cf819b9
--- /dev/null
+++ b/scripts/ma_gamg_vs_direct.py
@@ -0,0 +1,112 @@
+"""BFO MA: parallel-scalable GAMG-reuse path vs the serial MUMPS
+direct path. Same _winslow_elliptic, only linear_solver= differs.
+
+Validates (a) grading unchanged to within noise (d/n must match the
+recorded 1.02/1.43/1.71/1.54), (b) cost cold+warm, (c) that the
+factor/setup-once-reuse + Krylov warm-start actually fire — reported
+as the φ-Poisson KSP iteration count per Picard iter (should be a
+few once warm, NOT a fresh GAMG setup each time).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _WINSLOW_CACHE)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+AMPS = [0.0, 2.0, 8.0, 20.0]
+BASE = {0.0: 1.02, 2.0: 1.43, 8.0: 1.71, 20.0: 1.54}
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def min_area(m):
+    tris = _tri_cells(m.dm)
+    a = np.abs(_signed_areas(np.asarray(m.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+def ksp_its(m, amp):
+    """Re-run the given AMP with a per-Picard φ-KSP-iteration probe."""
+    m2, f = case(amp, "kp")
+    e = _edge_pairs(m2.dm)
+    pin = _auto_pinned_labels(m2)
+    _winslow_elliptic(m2, f, pin, False, linear_solver="gamg")
+    # warm call, instrument the cached φ solver's KSP
+    k = [kk for kk in _WINSLOW_CACHE
+         if kk[0] == id(m2) and kk[-1] == "gamg"][0]
+    phi, ps, gradphi, gproj, hsolver, vol = _WINSLOW_CACHE[k]
+    raw = ps.solve
+    its = []
+    def w(*a, **kw):
+        r = raw(*a, **kw)
+        try:
+            its.append(ps.snes.getKSP().getIterationNumber())
+        except Exception:
+            its.append(-1)
+        return r
+    ps.solve = w
+    _winslow_elliptic(m2, f, pin, False, linear_solver="gamg")
+    return its
+
+
+print(f"{'AMP':>5} | {'direct':>16} | {'gamg':>16} | grading")
+print(f"{'':>5} | {'cold':>7}{'warm':>9} | {'cold':>7}{'warm':>9} | "
+      f"{'dir d/n':>8}{'gmg d/n':>9}  base")
+print("-" * 74)
+for k, amp in enumerate(AMPS):
+    md, f = case(amp, f"d{k}")
+    e = _edge_pairs(md.dm); pin = _auto_pinned_labels(md)
+    t = time.perf_counter()
+    _winslow_elliptic(md, f, pin, False, linear_solver="direct")
+    dc = time.perf_counter() - t
+    dn_d = honest_ratio(np.asarray(md.X.coords), e)
+    t = time.perf_counter()
+    _winslow_elliptic(md, f, pin, False, linear_solver="direct")
+    dw = time.perf_counter() - t
+
+    mg, f = case(amp, f"g{k}")
+    e = _edge_pairs(mg.dm); pin = _auto_pinned_labels(mg)
+    t = time.perf_counter()
+    _winslow_elliptic(mg, f, pin, False, linear_solver="gamg")
+    gc = time.perf_counter() - t
+    dn_g = honest_ratio(np.asarray(mg.X.coords), e)
+    t = time.perf_counter()
+    _winslow_elliptic(mg, f, pin, False, linear_solver="gamg")
+    gw = time.perf_counter() - t
+
+    print(f"{amp:5.1f} | {dc:7.2f}{dw:9.2f} | {gc:7.2f}{gw:9.2f} | "
+          f"{dn_d:8.3f}{dn_g:9.3f}  ~{BASE[amp]:.2f}", flush=True)
+
+its = ksp_its(None, 8.0)
+print(f"\nφ-Poisson KSP iters / Picard (gamg, AMP=8, warm call): "
+      f"first={its[0]} rest={its[1:6]}... max={max(its)} "
+      f"mean={np.mean(its):.1f}")
+print("(low & flat ⇒ hierarchy built once + Krylov warm-start firing)")
diff --git a/scripts/ma_heavytail_metric.py b/scripts/ma_heavytail_metric.py
new file mode 100644
index 00000000..cd87507c
--- /dev/null
+++ b/scripts/ma_heavytail_metric.py
@@ -0,0 +1,106 @@
+"""The snuggle fix: a single Gaussian's width sets BOTH the
+resolution scale and the reach, so narrow = sharp-but-isolated,
+broad = global-but-washed-out. A heavy-tailed (Lorentzian) monitor
+has a sharp core (real resolution at the feature) AND a slow ~1/d²
+tail (every node feels an inward pull → the bulk migrates toward
+the feature). Interior blob (0.78,0), AMP=8, res-24.
+
+Diagnostics that actually mean something:
+  far/near  : refinement AT the feature (>1 ⇒ resolved; the point)
+  inward    : mean (d0 - d_final) for nodes that START far
+              (d0 > 0.35) — POSITIVE ⇒ distant nodes snuggled IN
+  minA/meanA: quality (no sliver)
+Also: GAMG robust on the heavy-tail interior metric?
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _WINSLOW_CACHE)
+
+R_O, R_I, AMP, CX, CY, RES = 1.0, 0.5, 8.0, 0.78, 0.0, 24
+WC = 0.12                                  # sharp core length scale
+
+
+def mk(kind, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]; Xv.data[:, 1] = X0[:, 1]
+    d2 = (Xv.sym[0] - CX) ** 2 + (Xv.sym[1] - CY) ** 2
+    if kind == "gauss-narrow":
+        f = 1.0 + AMP * sympy.exp(-d2 / WC ** 2)
+    elif kind == "gauss-broad":
+        f = 1.0 + AMP * sympy.exp(-d2 / 0.30 ** 2)
+    else:   # lorentzian: sharp core (scale WC) + slow 1/d^2 tail
+        f = 1.0 + AMP / (1.0 + d2 / WC ** 2)
+    return m, f, X0.copy()
+
+
+def diag(X, X0, edges, tris):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    nv = X.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    d0 = np.sqrt((X0[:, 0] - CX) ** 2 + (X0[:, 1] - CY) ** 2)
+    df = np.sqrt((X[:, 0] - CX) ** 2 + (X[:, 1] - CY) ** 2)
+    near, far = d0 < WC, d0 > 4 * WC
+    ratio = (float(nl[far].mean() / nl[near].mean())
+             if near.any() and far.any() else float("nan"))
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    distant = d0 > 0.35                       # started far from blob
+    inward = float((d0[distant] - df[distant]).mean())
+    return ratio, minA, inward, float(np.linalg.norm(X - X0,
+                                                     axis=1).max())
+
+
+fig, ax = plt.subplots(1, 3, figsize=(16, 5.4))
+print(f"{'metric':>13} {'solver':>7} | {'far/near':>8} {'minA':>6} "
+      f"{'inward':>7} {'maxdx':>6} | gamg KSP")
+print("-" * 74)
+for a, kind in zip(ax, ("gauss-narrow", "gauss-broad",
+                        "lorentzian")):
+    for mode in ("direct", "gamg"):
+        m, f, X0 = mk(kind, f"{kind}{mode}")
+        e = _edge_pairs(m.dm); tris = _tri_cells(m.dm)
+        pin = _auto_pinned_labels(m)
+        _winslow_elliptic(m, f, pin, False,
+                          linear_solver=mode, phi_degree=2)
+        X = np.asarray(m.X.coords).copy()
+        r, mA, inw, mdx = diag(X, X0, e, tris)
+        ks = ""
+        if mode == "gamg":
+            k = [kk for kk in _WINSLOW_CACHE if kk[0] == id(m)
+                 and kk[-2] == "gamg" and kk[-1] == 2][0]
+            ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+            ks = f"r={ksp.getConvergedReason()} it={ksp.getIterationNumber()}"
+        print(f"{kind:>13} {mode:>7} | {r:8.3f} {mA:6.3f} "
+              f"{inw:+7.4f} {mdx:6.3f} | {ks}", flush=True)
+        if mode == "direct":
+            a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+                      lw=0.4, color="#1f4e8c")
+            a.plot(CX, CY, "o", ms=11, mfc="none", mec="#c0392b",
+                   mew=2)
+            a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+            a.set_title(f"{kind}\nfar/near={r:.2f}  minA={mA:.3f}  "
+                        f"inward={inw:+.3f}", fontsize=11)
+fig.suptitle("Snuggle test, interior blob AMP=8 res-24 — Lorentzian: "
+             "sharp core + heavy tail ⇒ distant nodes migrate IN, "
+             "feature stays sharp", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.93])
+fig.savefig("/tmp/metric_mesh/ma_heavytail.png", dpi=135)
+print("\nsaved /tmp/metric_mesh/ma_heavytail.png")
diff --git a/scripts/ma_localised_reach_gamg.py b/scripts/ma_localised_reach_gamg.py
new file mode 100644
index 00000000..50864c38
--- /dev/null
+++ b/scripts/ma_localised_reach_gamg.py
@@ -0,0 +1,126 @@
+"""Two questions:
+
+(1) "Too local" — a narrow Gaussian blob has ~zero gradient over
+    most of the mesh so the bulk never moves. Widen the metric's
+    reach (W) and the whole mesh feels a pull → nodes migrate IN
+    toward the feature ("snuggle up") instead of a local crush.
+    Diagnostic: fraction of interior nodes that actually move, the
+    far/near-blob refinement ratio, and mesh quality (minA/meanA).
+
+(2) For these LOCALISED interior cases (blob away from the pinned
+    boundary — no boundary-peaked-vs-pinned pathology), how robust
+    is GAMG vs the boundary-peaked annulus where it was erratic?
+    Report KSP reason/its + cost, direct vs gamg, two resolutions.
+
+Interior blob at (0.78, 0), AMP=8. Saves the grid figure too.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _pinned_mask, _WINSLOW_CACHE)
+
+R_O, R_I, AMP = 1.0, 0.5, 8.0
+CX, CY = 0.78, 0.0
+WIDTHS = [0.12, 0.30, 0.50]          # narrow / broad / very-broad
+RESS = [24, 40]
+
+
+def case(res, W, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    X0v = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    X0v.data[:, 0] = X0[:, 0]; X0v.data[:, 1] = X0[:, 1]
+    d2 = (X0v.sym[0] - CX) ** 2 + (X0v.sym[1] - CY) ** 2
+    f = 1.0 + AMP * sympy.exp(-d2 / W ** 2)
+    return m, f, X0.copy()
+
+
+def diagnostics(X, X0, edges, tris, W):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    nv = X.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    d = np.sqrt((X[:, 0] - CX) ** 2 + (X[:, 1] - CY) ** 2)
+    near, far = d < W, d > 4 * W
+    ratio = (float(nl[far].mean() / nl[near].mean())
+             if near.any() and far.any() else float("nan"))
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    disp = np.linalg.norm(X - X0, axis=1)
+    h = (R_O - R_I) / 24.0                  # fixed; independent of res
+    moved = float((disp > 0.05 * h).mean())     # fraction that moved
+    return ratio, minA, moved, disp.max()
+
+
+print(f"{'W':>5} {'RES':>4} {'solver':>7} | {'far/near':>8} "
+      f"{'minA':>6} {'moved%':>6} {'maxdx':>6} | {'cold':>6} "
+      f"{'warm':>6} | gamg KSP")
+print("-" * 86)
+for W in WIDTHS:
+    for res in RESS:
+        for mode in ("direct", "gamg"):
+            m, f, X0 = case(res, W, f"{mode}{res}{int(W*100)}")
+            e = _edge_pairs(m.dm); tris = _tri_cells(m.dm)
+            pin = _auto_pinned_labels(m)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=2)
+            cold = time.perf_counter() - t
+            X = np.asarray(m.X.coords).copy()
+            ratio, minA, moved, mdx = diagnostics(X, X0, e, tris, W)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=2)
+            warm = time.perf_counter() - t
+            ks = ""
+            if mode == "gamg":
+                k = [kk for kk in _WINSLOW_CACHE if kk[0] == id(m)
+                     and kk[-2] == "gamg" and kk[-1] == 2][0]
+                ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+                ks = (f"r={ksp.getConvergedReason()} "
+                      f"it={ksp.getIterationNumber()}")
+            print(f"{W:5.2f} {res:4d} {mode:>7} | {ratio:8.3f} "
+                  f"{minA:6.3f} {moved*100:5.0f}% {mdx:6.3f} | "
+                  f"{cold:6.2f} {warm:6.2f} | {ks}", flush=True)
+    print("-" * 86)
+
+# grid picture: narrow vs broad vs very-broad (direct, res-24)
+fig, ax = plt.subplots(1, 3, figsize=(16, 5.4))
+for a, W in zip(ax, WIDTHS):
+    m, f, X0 = case(24, W, f"fig{int(W*100)}")
+    pin = _auto_pinned_labels(m)
+    tris = _tri_cells(m.dm); e = _edge_pairs(m.dm)
+    _winslow_elliptic(m, f, pin, False, linear_solver="direct",
+                      phi_degree=2)
+    X = np.asarray(m.X.coords).copy()
+    ratio, minA, moved, mdx = diagnostics(X, X0, e, tris, W)
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.4, color="#1f4e8c")
+    a.plot(CX, CY, "o", ms=10, mfc="none", mec="#c0392b", mew=2)
+    th = np.linspace(0, 2 * np.pi, 200)
+    a.plot(CX + W * np.cos(th), CY + W * np.sin(th),
+            "#c0392b", lw=0.8, alpha=0.6)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(f"W={W:g}   far/near={ratio:.2f}\n"
+                f"moved={moved*100:.0f}%  minA/meanA={minA:.3f}",
+                fontsize=12)
+fig.suptitle("Interior blob (red), AMP=8, res-24 — narrow metric "
+             "moves only local nodes; broad metric draws the whole "
+             "mesh in", fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.94])
+fig.savefig("/tmp/metric_mesh/ma_localised_reach.png", dpi=135)
+print("\nsaved /tmp/metric_mesh/ma_localised_reach.png")
diff --git a/scripts/ma_lorentzian_slip_final.py b/scripts/ma_lorentzian_slip_final.py
new file mode 100644
index 00000000..7685792b
--- /dev/null
+++ b/scripts/ma_lorentzian_slip_final.py
@@ -0,0 +1,108 @@
+"""Decisive: the PROVEN concentrator is a compact Cartesian
+Lorentzian about the feature POINT P (last run: far/near 2.74,
+distant nodes migrate in). The two polar-separable formulations
+failed — v1 chord 2(1-cosΔθ) saturates at the antipode (no angular
+reach); v2 (r-R0)²+Δθ² with a heavy radial tail is a low-gradient
+radial *spoke* that the smoother washes out (far/near ~1.1).
+
+So to get a θ-pull: keep the compact Cartesian Lorentzian but place
+the feature NEAR the outer boundary (r0=0.88, offset off the pinned
+rim to avoid the sliver pathology) and turn boundary_slip ON — the
+rim nodes should then slide tangentially toward θ0. Compare
+slip OFF/ON × direct/gamg. res-24, AMP=8.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _pinned_mask, _WINSLOW_CACHE)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+R0, TH0, WC = 0.88, 0.6, 0.15                  # near-rim feature
+PX, PY = R0 * np.cos(TH0), R0 * np.sin(TH0)
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]; Xv.data[:, 1] = X0[:, 1]
+    d2 = (Xv.sym[0] - PX) ** 2 + (Xv.sym[1] - PY) ** 2  # compact, x,y
+    f = 1.0 + AMP / (1.0 + d2 / WC ** 2)
+    return m, f, X0.copy()
+
+
+def diag(X, X0, edges, tris, m):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    nv = X.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    dP0 = np.hypot(X0[:, 0] - PX, X0[:, 1] - PY)
+    dPf = np.hypot(X[:, 0] - PX, X[:, 1] - PY)
+    near, far = dP0 < WC, dP0 > 4 * WC
+    ratio = float(nl[far].mean() / nl[near].mean())
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    inward = float((dP0[dP0 > 0.5] - dPf[dP0 > 0.5]).mean())
+    rf = np.hypot(X[:, 0], X[:, 1])
+    af = np.arctan2(X[:, 1] * np.cos(TH0) - X[:, 0] * np.sin(TH0),
+                    X[:, 0] * np.cos(TH0) + X[:, 1] * np.sin(TH0))
+    is_b = _pinned_mask(m.dm, tuple(_auto_pinned_labels(m)))
+    outer = is_b & (rf > 0.9 * R_O)
+    nclust = int((np.abs(af[outer]) < 0.35).sum())
+    drift = float(np.abs(rf[outer] - R_O).max())
+    return ratio, minA, inward, nclust, drift
+
+
+print(f"{'slip':>5} {'solver':>7} | {'far/near':>8} {'minA':>6} "
+      f"{'inward':>7} {'#rim@θ0':>7} {'drift':>9} | gamg KSP")
+print("-" * 80)
+panels = []
+m0, _, X0u = mk("u")
+tris = _tri_cells(m0.dm)
+panels.append(("undeformed", X0u))
+for slip in (False, True):
+    for mode in ("direct", "gamg"):
+        m, f, X0 = mk(f"s{int(slip)}{mode}")
+        e = _edge_pairs(m.dm); tr = _tri_cells(m.dm)
+        pin = _auto_pinned_labels(m)
+        _winslow_elliptic(m, f, pin, False, boundary_slip=slip,
+                          linear_solver=mode, phi_degree=2)
+        X = np.asarray(m.X.coords).copy()
+        r, mA, inw, nc, drf = diag(X, X0, e, tr, m)
+        ks = ""
+        if mode == "gamg":
+            k = [kk for kk in _WINSLOW_CACHE if kk[0] == id(m)
+                 and kk[-2] == "gamg" and kk[-1] == 2][0]
+            ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+            ks = f"r={ksp.getConvergedReason()} it={ksp.getIterationNumber()}"
+        print(f"{str(slip):>5} {mode:>7} | {r:8.3f} {mA:6.3f} "
+              f"{inw:+7.4f} {nc:7d} {drf:9.2e} | {ks}", flush=True)
+        if mode == "direct":
+            panels.append((f"slip={slip}", X))
+
+fig, ax = plt.subplots(1, 3, figsize=(16, 5.4))
+for a, (name, X) in zip(ax, panels):
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.4, color="#1f4e8c")
+    a.plot(PX, PY, "o", ms=12, mfc="none", mec="#c0392b", mew=2)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(name, fontsize=12)
+fig.suptitle(f"Compact Cartesian Lorentzian near rim (r={R0}, "
+             f"θ={TH0:g}) red, AMP={AMP:g} res-{RES} — slip ON ⇒ rim "
+             f"nodes slide tangentially toward θ₀", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.94])
+fig.savefig("/tmp/metric_mesh/ma_lorentzian_slip.png", dpi=135)
+print("\nsaved /tmp/metric_mesh/ma_lorentzian_slip.png")
diff --git a/scripts/ma_metric_tensor_viz.py b/scripts/ma_metric_tensor_viz.py
new file mode 100644
index 00000000..56f44833
--- /dev/null
+++ b/scripts/ma_metric_tensor_viz.py
@@ -0,0 +1,148 @@
+"""(3) metric-tensor machinery — visual sanity check.
+
+Scalar density rho(x) in  ->  anisotropic metric tensor derived from
+its GRADIENT, with NO (r,theta) frame ever specified. The point: the
+eigenframe of M = I + beta (grad rho (x) grad rho) auto-aligns to
+the feature (r-hat for a radial feature, theta-hat for an angular
+one). We draw the desired-cell ellipse (semi-axes h_i = lambda_i^-1/2
+of M, eigen-clamped) at sampled nodes for two features and check the
+orientation/flattening is correct BEFORE building the anisotropic
+mover.
+
+grad rho is computed analytically in CARTESIAN (chain rule through
+r,theta only to get the closed form) — the construction itself only
+ever sees grad rho in (x,y); the (r,theta) alignment is emergent.
+In production grad rho would be a Vector_Projection of rho
+(first-derivative, UW3-clean) — same machinery, same result.
+"""
+from __future__ import annotations
+import numpy as np
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+from matplotlib.patches import Ellipse
+from matplotlib.collections import PatchCollection
+import underworld3 as uw
+from underworld3.meshing.smoothing import _tri_cells
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 16, 8.0
+H0 = (R_O - R_I) / RES                 # nominal spacing
+H_MIN, H_MAX = 0.12 * H0, 1.0 * H0     # eigen-clamp band (≤ ~8.3:1)
+BETA = 200.0                           # anisotropy strength
+
+
+def grad_radial(x, y):
+    """rho = 1 + AMP exp(-((r-r0)/Wr)^2); grad is purely radial."""
+    r0, Wr = 0.85, 0.12
+    r = np.hypot(x, y)
+    rho = 1.0 + AMP * np.exp(-((r - r0) / Wr) ** 2)
+    drdr = AMP * np.exp(-((r - r0) / Wr) ** 2) * (-2 * (r - r0) / Wr ** 2)
+    gx = drdr * x / r
+    gy = drdr * y / r
+    return rho, np.stack([gx, gy], axis=1)
+
+
+def grad_angular(x, y):
+    """rho = 1 + AMP/(1+(Δθ/Wθ)^2); grad is purely tangential."""
+    th0, Wth = 0.6, 0.5
+    r = np.hypot(x, y)
+    th = np.arctan2(y, x)
+    dth = np.arctan2(np.sin(th - th0), np.cos(th - th0))
+    u = dth / Wth
+    rho = 1.0 + AMP / (1.0 + u ** 2)
+    drdth = AMP * (-1.0) / (1.0 + u ** 2) ** 2 * (2 * u / Wth)
+    that = np.stack([-y / r, x / r], axis=1)          # θ̂
+    g = (drdth / r)[:, None] * that
+    return rho, g
+
+
+def build_metric(grad):
+    """THE machinery: M = (1/h0^2)[ I + beta ĝĝᵀ (|g|/gref)^2 ],
+    eigen-clamped so spacing ∈ [H_MIN, H_MAX]. Returns desired-cell
+    semi-axes (h1,h2) and eigenvectors per node."""
+    n = grad.shape[0]
+    gnorm = np.linalg.norm(grad, axis=1)
+    gref = gnorm.max() if gnorm.max() > 0 else 1.0
+    lam_hi = 1.0 / H_MIN ** 2
+    lam_lo = 1.0 / H_MAX ** 2
+    H1 = np.empty(n); H2 = np.empty(n)
+    V1 = np.zeros((n, 2)); V2 = np.zeros((n, 2))
+    base = 1.0 / H0 ** 2
+    for i in range(n):
+        g = grad[i]
+        gn = gnorm[i]
+        M = base * np.eye(2)
+        if gn > 1e-30:
+            gh = g / gn
+            M = base * (np.eye(2)
+                        + BETA * (gn / gref) ** 2 * np.outer(gh, gh))
+        w, Vec = np.linalg.eigh(M)              # ascending
+        w = np.clip(w, lam_lo, lam_hi)
+        # desired spacing along each eigenvector = w^-1/2
+        H2[i], H1[i] = 1.0 / np.sqrt(w[1]), 1.0 / np.sqrt(w[0])
+        V2[i], V1[i] = Vec[:, 1], Vec[:, 0]
+    return H1, V1, H2, V2, gnorm
+
+
+m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                        cellSize=1.0 / RES, qdegree=3)
+Xn = np.asarray(m.X.coords)
+tris = _tri_cells(m.dm)
+# clean polar SAMPLE grid (decouple the ellipse field from mesh-node
+# density — the construction is identical, only the sample points
+# are a regular grid for legibility)
+rr = np.linspace(R_I + 0.02, R_O - 0.02, 11)
+tt = np.linspace(0, 2 * np.pi, 49)[:-1]
+RG, TG = np.meshgrid(rr, tt)
+Xs = np.stack([(RG * np.cos(TG)).ravel(),
+               (RG * np.sin(TG)).ravel()], axis=1)
+
+fig, ax = plt.subplots(1, 2, figsize=(15, 7.2))
+for a, (name, gfn, mark) in zip(
+        ax,
+        [("radial feature  ρ(r)", grad_radial, ("ring", 0.85)),
+         ("angular feature  ρ(θ)", grad_angular, ("spoke", 0.6))]):
+    rho_n, _ = gfn(Xn[:, 0], Xn[:, 1])
+    _, grad = gfn(Xs[:, 0], Xs[:, 1])
+    H1, V1, H2, V2, gn = build_metric(grad)
+    a.tricontourf(mtri.Triangulation(Xn[:, 0], Xn[:, 1], tris),
+                  rho_n, levels=20, cmap="Blues", alpha=0.40)
+    aniso = H1 / np.maximum(H2, 1e-30)          # ≥1, =1 isotropic
+    sc = 1.1                                    # display scale
+    pats, cv = [], []
+    for i in range(Xs.shape[0]):
+        ang = np.degrees(np.arctan2(V2[i, 1], V2[i, 0]))
+        pats.append(Ellipse((Xs[i, 0], Xs[i, 1]),
+                            width=2 * sc * H2[i],     # along ∇ρ
+                            height=2 * sc * H1[i],    # along feature
+                            angle=ang))
+        cv.append(aniso[i])
+    pc = PatchCollection(pats, facecolor="none", lw=0.9,
+                         cmap="autumn_r")
+    pc.set_array(np.array(cv))
+    pc.set_clim(1.0, aniso.max())
+    a.add_collection(pc)
+    if mark[0] == "ring":
+        th = np.linspace(0, 2 * np.pi, 300)
+        a.plot(mark[1] * np.cos(th), mark[1] * np.sin(th),
+                "k--", lw=1.0, alpha=0.5)
+    else:
+        a.plot([R_I * np.cos(mark[1]), R_O * np.cos(mark[1])],
+                [R_I * np.sin(mark[1]), R_O * np.sin(mark[1])],
+                "k--", lw=1.0, alpha=0.5)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_xlim(-1.08, 1.08); a.set_ylim(-1.08, 1.08)
+    a.set_title(f"{name}\nellipse = desired cell (short ⟂ to the "
+                f"feature); max anisotropy {aniso.max():.1f}:1",
+                fontsize=12)
+    fig.colorbar(pc, ax=a, fraction=0.046, pad=0.02,
+                 label="anisotropy ratio")
+
+fig.suptitle("(3) gradient-derived metric tensor — eigenframe "
+             "auto-aligns to the feature (no r,θ specified)",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/ma_metric_tensor.png", dpi=135)
+print("saved /tmp/metric_mesh/ma_metric_tensor.png")
+print(f"H0={H0:.4f}  clamp=[{H_MIN:.4f},{H_MAX:.4f}]  beta={BETA}")
diff --git a/scripts/ma_newton_phase0.py b/scripts/ma_newton_phase0.py
new file mode 100644
index 00000000..ab750617
--- /dev/null
+++ b/scripts/ma_newton_phase0.py
@@ -0,0 +1,253 @@
+"""Phase 0 (run 2) — Newton/cofactor vs BFO-Picard, measured on the
+TRANSPORT MAP (the right yardstick).
+
+Run 1 showed `det(I+H_rec)-g` has a large irreducible floor (the
+recovered-Hessian under-estimation = the settled grading-cap root
+cause); it is NOT a contraction metric. The efficiency question is:
+does Newton reach the SAME fixed-node map in far fewer iterations
+than BFO's ~20-25?  Metrics, geometry held FIXED:
+
+  d_k   = max nodal |grad phi_k|              (size of the map)
+  Dlt_k = max |grad phi_k - grad phi_{k-1}|   (map increment -> 0)
+
+and, after ONE signed-area-backtracked node move at the end, the
+honest deep/near grading (must match between schemes == same fixed
+point == regression guard).
+
+  * BFO    : Delta phi = sqrt(...)-2, omega=0.4   (shipped)
+  * Newton : div(C_k grad dphi) = g-det(I+H_k), C_k=cof(I+H_k),
+             phi <- phi + lam dphi, fixed lam; I+H_k is projected to
+             PD (eigenvalues clipped >= PD_EPS) as convexity safeguard
+
+No src/ changes. Per-iter print. Writes the design-doc summary.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.constitutive_models import DiffusionModel
+from underworld3.meshing.smoothing import (
+    _hessian_recovery_class, _use_direct_solver, _auto_pinned_labels,
+    _edge_pairs, _pinned_mask, _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+N_BFO, N_NEWT, NEWT_LAM = 40, 15, 0.5
+
+m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                        cellSize=1.0 / RES, qdegree=3)
+r0 = uw.discretisation.MeshVariable("r0n", m, vtype=uw.VarType.SCALAR,
+                                    degree=1, continuous=True)
+coords0 = np.asarray(m.X.coords).copy()
+r0.data[:, 0] = np.sqrt((coords0 ** 2).sum(axis=1))
+metric = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+rho_t = np.asarray(uw.function.evaluate(metric, coords0)).reshape(-1)
+c = 1.0 / (float(np.mean(1.0 / np.sqrt(rho_t))) ** 2)
+g_expr = c / metric
+cdim = m.cdim
+pin = _auto_pinned_labels(m)
+edges = _edge_pairs(m.dm)
+is_pin = _pinned_mask(m.dm, tuple(pin))
+tris = _tri_cells(m.dm)
+
+
+def honest_dn_after_move(disp):
+    free = ~is_pin
+    a0 = _signed_areas(coords0, tris)
+    orient = np.sign(np.median(a0)) or 1.0
+    scale, new = 1.0, coords0.copy()
+    for _ in range(12):
+        trial = coords0.copy()
+        trial[free] += scale * disp[free]
+        if (_signed_areas(trial, tris) * orient).min() > 0.0:
+            new = trial
+            break
+        scale *= 0.5
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(new[v1] - new[v0], axis=1)
+    nv = new.shape[0]
+    s = np.zeros(nv); cnt = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(cnt, a, 1.0)
+    nl = s / np.maximum(cnt, 1.0)
+    rr = np.sqrt((new ** 2).sum(axis=1))
+    deep = (rr >= R_I) & (rr < R_I + 0.20)
+    near = (rr > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean()), scale
+
+
+def make_gproj(phi):
+    gv = uw.discretisation.MeshVariable(
+        f"gp_{id(phi)}", m, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    gp = uw.systems.Vector_Projection(m, gv)
+    gp.smoothing = 0.0
+    _use_direct_solver(gp)
+    X = m.CoordinateSystem.X
+    gp.uw_function = sympy.Matrix(
+        [phi.sym[0].diff(X[i]) for i in range(cdim)]).T
+    return gp, gv
+
+
+def disp_of(gp, gv):
+    gp.solve()
+    return np.asarray(uw.function.evaluate(
+        gv.sym, coords0)).reshape(coords0.shape)
+
+
+def report(tag, hist_d, hist_dl, dn, sc):
+    a = np.array(hist_dl)
+    tol = 1e-3 * hist_d[0] if hist_d[0] else 1e-4
+    w = np.nonzero(a < tol)[0]
+    nconv = int(w[0]) if w.size else -1
+    print(f"  {tag}: final max|grad phi|={hist_d[-1]:.4e}  "
+          f"iters(Dlt<1e-3·d0)={nconv}  d/n={dn:.4f} (scale={sc:.3f})")
+    return nconv
+
+
+def run_bfo():
+    phi = uw.discretisation.MeshVariable(
+        "phi_b", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    ps = uw.systems.Poisson(m, phi)
+    ps.constitutive_model = uw.constitutive_models.DiffusionModel
+    ps.constitutive_model.Parameters.diffusivity = 1.0
+    ps.constant_nullspace = True
+    _use_direct_solver(ps, singular=True)
+    hs = _hessian_recovery_class()(m, phi, degree=2, verbose=False)
+    hs.tolerance = 1.0e-6
+    _use_direct_solver(hs)
+    Hf = hs.u.sym
+    Hxx, Hyy = Hf[0], Hf[3]
+    Hxy = (Hf[1] + Hf[2]) / 2
+    f_src = sympy.sqrt((Hxx - Hyy) ** 2 + 4 * Hxy ** 2 + 4 * g_expr) - 2
+    ps.f = sympy.Matrix([[-1.0 * f_src]])
+    hs.u.array[...] = 0.0
+    gp, gv = make_gproj(phi)
+    omega = 0.4
+    prev = None
+    hd, hl = [], []
+    print("  BFO-Picard:")
+    for it in range(N_BFO):
+        pp = np.asarray(phi.array).copy()
+        ps.solve(zero_init_guess=True)
+        phi.array[...] = (1 - omega) * pp + omega * np.asarray(phi.array)
+        hs.solve()
+        d = disp_of(gp, gv)
+        dmax = float(np.linalg.norm(d, axis=1).max())
+        dl = float(np.abs(d - prev).max()) if prev is not None else dmax
+        prev = d
+        hd.append(dmax); hl.append(dl)
+        if it < 5 or it % 5 == 0 or it == N_BFO - 1:
+            print(f"    it {it:2d}  max|gφ|={dmax:.4e}  Δ={dl:.3e}",
+                  flush=True)
+    dn, sc = honest_dn_after_move(prev)
+    return hd, hl, dn, sc
+
+
+_CK = [None]
+PD_EPS = 0.05
+
+
+class _CofDiff(DiffusionModel):
+    def _build_c_tensor(self):
+        self._c = _CK[0]
+
+
+def project_pd(hs):
+    """Projection onto convex Hessians: clip eigenvalues of the
+    symmetrised (I+H) to >= PD_EPS, write H back. The MA convexity
+    safeguard that BFO's +sqrt branch supplies structurally."""
+    A = np.asarray(hs.u.array).reshape(-1, 4)
+    Hxx, Hyy = A[:, 0], A[:, 3]
+    Hxy = 0.5 * (A[:, 1] + A[:, 2])
+    a = 1.0 + Hxx
+    d = 1.0 + Hyy
+    b = Hxy
+    tr = a + d
+    dsc = np.sqrt(np.maximum((a - d) ** 2 + 4 * b ** 2, 0.0))
+    l1 = 0.5 * (tr - dsc)
+    l2 = 0.5 * (tr + dsc)
+    c1 = np.clip(l1, PD_EPS, None)
+    c2 = np.clip(l2, PD_EPS, None)
+    # eigenvectors of [[a,b],[b,d]]: rebuild M' = sum ci vi vi^T
+    # v for l1: (b, l1-a) normalised (fallback to axis if b~0)
+    vx = b.copy()
+    vy = l1 - a
+    nrm = np.sqrt(vx ** 2 + vy ** 2)
+    small = nrm < 1e-14
+    vx = np.where(small, 1.0, vx / np.where(small, 1.0, nrm))
+    vy = np.where(small, 0.0, vy / np.where(small, 1.0, nrm))
+    wx, wy = -vy, vx                       # orthogonal eigenvector
+    Mxx = c1 * vx * vx + c2 * wx * wx
+    Mxy = c1 * vx * vy + c2 * wx * wy
+    Myy = c1 * vy * vy + c2 * wy * wy
+    A[:, 0] = Mxx - 1.0
+    A[:, 1] = Mxy
+    A[:, 2] = Mxy
+    A[:, 3] = Myy - 1.0
+    hs.u.array[...] = A.reshape(hs.u.array.shape)
+
+
+def run_newton():
+    phi = uw.discretisation.MeshVariable(
+        "phi_n", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    dphi = uw.discretisation.MeshVariable(
+        "dphi_n", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    hs = _hessian_recovery_class()(m, phi, degree=2, verbose=False)
+    hs.tolerance = 1.0e-6
+    _use_direct_solver(hs)
+    Hf = hs.u.sym
+    Hxx, Hyy = Hf[0], Hf[3]
+    Hxy = (Hf[1] + Hf[2]) / 2
+    M = sympy.Matrix([[1 + Hxx, Hxy], [Hxy, 1 + Hyy]])
+    Ck = sympy.Matrix([[1 + Hyy, -Hxy], [-Hxy, 1 + Hxx]])
+    detIH = M.det()
+    _CK[0] = Ck
+    ps = uw.systems.Poisson(m, dphi)
+    ps.constitutive_model = _CofDiff
+    ps.constant_nullspace = True
+    _use_direct_solver(ps, singular=True)
+    ps.f = sympy.Matrix([[detIH - g_expr]])      # div(Ck∇dphi)=g-det
+    phi.array[...] = 0.0
+    hs.u.array[...] = 0.0
+    gp, gv = make_gproj(phi)
+    prev = None
+    hd, hl = [], []
+    print(f"  Newton/cofactor (lam={NEWT_LAM}, PD-projected Hessian, "
+          f"eps={PD_EPS}):")
+    for it in range(N_NEWT):
+        hs.solve()
+        project_pd(hs)                       # convexity safeguard
+        ps.solve(zero_init_guess=True)
+        step = np.asarray(dphi.array).copy()
+        base = np.asarray(phi.array).copy()
+        lam = NEWT_LAM
+        phi.array[...] = base + lam * step
+        hs.solve()
+        project_pd(hs)
+        d = disp_of(gp, gv)
+        dmax = float(np.linalg.norm(d, axis=1).max())
+        dl = float(np.abs(d - prev).max()) if prev is not None else dmax
+        prev = d
+        hd.append(dmax); hl.append(dl)
+        print(f"    it {it:2d}  max|gφ|={dmax:.4e}  Δ={dl:.3e}  "
+              f"lam={lam:.3g}", flush=True)
+        if dl < 1e-4 and it > 0:
+            break
+    dn, sc = honest_dn_after_move(prev)
+    return hd, hl, dn, sc
+
+
+t = time.perf_counter()
+bd, bl, bdn, bsc = run_bfo()
+nd, nl, ndn, nsc = run_newton()
+dt = time.perf_counter() - t
+print(f"\n=== Phase 0 run2  AMP={AMP} RES={RES}  ({dt:.1f}s) ===")
+nb = report("BFO   ", bd, bl, bdn, bsc)
+nn = report("NEWT  ", nd, nl, ndn, nsc)
+print(f"map agreement: |d/n_BFO - d/n_NEWT| = {abs(bdn-ndn):.4f}  "
+      f"(same fixed point if ~0; baseline d/n≈1.71)")
+np.savez("/tmp/metric_mesh/ma_newton_phase0.npz",
+         bfo_d=bd, bfo_dl=bl, nwt_d=nd, nwt_dl=nl,
+         bfo_dn=bdn, nwt_dn=ndn, amp=AMP, res=RES)
+print("saved /tmp/metric_mesh/ma_newton_phase0.npz")
diff --git a/scripts/ma_p1_gamg_scaling.py b/scripts/ma_p1_gamg_scaling.py
new file mode 100644
index 00000000..afb26f2c
--- /dev/null
+++ b/scripts/ma_p1_gamg_scaling.py
@@ -0,0 +1,79 @@
+"""P1 vs P2, GAMG vs direct, scaling with #triangles. P1 gives the
+smallest / most AMG-friendly matrices — does P1+GAMG converge
+robustly and scale where P2/P3+GAMG were erratic?  Grading expected
+P1≈1.40, P2≈1.71 (P1 is ~18% weaker — this is a robustness/scaling
+check, not a grading proposal). AMP=8. Per-row print (killable).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _WINSLOW_CACHE)
+
+R_O, R_I, WIDTH, AMP = 1.0, 0.5, 0.12, 8.0
+RESS = [16, 24, 32, 48, 64]
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    return float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+                 / nl[r > R_O - 0.05].mean())
+
+
+def case(res, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print(f"{'RES':>4} {'tris':>6} {'P':>2} {'solver':>7} | "
+      f"{'cold':>6} {'warm':>6} | {'d/n':>6} | gamg KSP | ok?")
+print("-" * 74)
+for res in RESS:
+    for pdeg in (1, 2):
+        for mode in ("direct", "gamg"):
+            m, f = case(res, f"{mode}{res}p{pdeg}")
+            e = _edge_pairs(m.dm)
+            ntri = _tri_cells(m.dm).shape[0]
+            pin = _auto_pinned_labels(m)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=pdeg)
+            cold = time.perf_counter() - t
+            dn = honest_ratio(np.asarray(m.X.coords), e)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=pdeg)
+            warm = time.perf_counter() - t
+            ks, ok = "", "y"
+            if mode == "gamg":
+                k = [kk for kk in _WINSLOW_CACHE
+                     if kk[0] == id(m) and kk[-2] == "gamg"
+                     and kk[-1] == pdeg][0]
+                ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+                rsn, nit = ksp.getConvergedReason(), ksp.getIterationNumber()
+                ks = f"r={rsn} it={nit}"
+                ok = "y" if (rsn > 0 and nit < 9999) else "FAIL"
+            exp = 1.40 if pdeg == 1 else 1.71
+            if abs(dn - exp) > 0.12:
+                ok = "FAIL"
+            print(f"{res:4d} {ntri:6d} {pdeg:2d} {mode:>7} | "
+                  f"{cold:6.2f} {warm:6.2f} | {dn:6.3f} | "
+                  f"{ks:>10} | {ok}", flush=True)
+    print("-" * 74)
diff --git a/scripts/ma_phi2_validate.py b/scripts/ma_phi2_validate.py
new file mode 100644
index 00000000..5de9de41
--- /dev/null
+++ b/scripts/ma_phi2_validate.py
@@ -0,0 +1,75 @@
+"""Validate φ=P2 vs the shipped φ=P3 on the DIRECT path across the
+full AMP sweep: grading must match the recorded baseline
+(1.02/1.43/1.71/1.54), AMP=0 exact no-op, no tangle (minA>0), and
+P2 should be ~2× cheaper. If it holds, phi_degree default 3→2 is a
+clean solver-independent win on top of the shipped ~10×.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+AMPS = [0.0, 2.0, 8.0, 20.0]
+BASE = {0.0: 1.02, 2.0: 1.43, 8.0: 1.71, 20.0: 1.54}
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    return float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+                 / nl[r > R_O - 0.05].mean())
+
+
+def min_area(m):
+    tris = _tri_cells(m.dm)
+    a = np.abs(_signed_areas(np.asarray(m.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print(f"{'AMP':>5} | {'P3 d/n':>7} {'P2 d/n':>7} {'base':>5} | "
+      f"{'P3 minA':>8} {'P2 minA':>8} | {'P3 t':>6} {'P2 t':>6} (cold)")
+print("-" * 72)
+for k, amp in enumerate(AMPS):
+    m3, f = case(amp, f"p3_{k}")
+    e = _edge_pairs(m3.dm); pin = _auto_pinned_labels(m3)
+    t = time.perf_counter()
+    _winslow_elliptic(m3, f, pin, False, phi_degree=3)
+    t3 = time.perf_counter() - t
+    dn3 = honest_ratio(np.asarray(m3.X.coords), e)
+    ma3 = min_area(m3)
+
+    m2, f = case(amp, f"p2_{k}")
+    e = _edge_pairs(m2.dm); pin = _auto_pinned_labels(m2)
+    t = time.perf_counter()
+    _winslow_elliptic(m2, f, pin, False, phi_degree=2)
+    t2 = time.perf_counter() - t
+    dn2 = honest_ratio(np.asarray(m2.X.coords), e)
+    ma2 = min_area(m2)
+
+    flag = "" if abs(dn3 - dn2) < 8e-3 else "  <-MISMATCH"
+    print(f"{amp:5.1f} | {dn3:7.3f} {dn2:7.3f} ~{BASE[amp]:.2f} | "
+          f"{ma3:8.4f} {ma2:8.4f} | {t3:6.2f} {t2:6.2f}{flag}",
+          flush=True)
diff --git a/scripts/ma_phi_order.py b/scripts/ma_phi_order.py
new file mode 100644
index 00000000..5d6494d4
--- /dev/null
+++ b/scripts/ma_phi_order.py
@@ -0,0 +1,75 @@
+"""Does the φ-potential order drive the GAMG fragility?
+
+Memory records P2/P3 give identical grading (the det term is inert;
+the cap is structural). P3 is GAMG-hostile (high-order stiffness
+defeats aggregation AMG). Test phi_degree ∈ {1,2,3} × {direct,gamg}
+at the resolution where gamg failed (24) and one above (32).
+Report grading (must stay ~1.71), cost, and the φ-KSP converged
+reason / iters (gamg) — the robustness signal.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels, _WINSLOW_CACHE)
+
+R_O, R_I, WIDTH, AMP = 1.0, 0.5, 0.12, 8.0
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    return float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+                 / nl[r > R_O - 0.05].mean())
+
+
+def case(res, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print(f"{'RES':>4} {'P':>2} {'solver':>7} | {'cold':>6} {'warm':>6} "
+      f"| {'d/n':>6} | φ-KSP reason/its (gamg)")
+print("-" * 70)
+for res in (24, 32):
+    for pdeg in (1, 2, 3):
+        for mode in ("direct", "gamg"):
+            m, f = case(res, f"{mode}{res}p{pdeg}")
+            e = _edge_pairs(m.dm)
+            pin = _auto_pinned_labels(m)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=pdeg)
+            cold = time.perf_counter() - t
+            dn = honest_ratio(np.asarray(m.X.coords), e)
+            t = time.perf_counter()
+            _winslow_elliptic(m, f, pin, False,
+                              linear_solver=mode, phi_degree=pdeg)
+            warm = time.perf_counter() - t
+            ks = ""
+            if mode == "gamg":
+                k = [kk for kk in _WINSLOW_CACHE
+                     if kk[0] == id(m) and kk[-2] == "gamg"
+                     and kk[-1] == pdeg][0]
+                ps = _WINSLOW_CACHE[k][1]
+                ksp = ps.snes.getKSP()
+                ks = (f"reason={ksp.getConvergedReason()} "
+                      f"its={ksp.getIterationNumber()}")
+            print(f"{res:4d} {pdeg:2d} {mode:>7} | {cold:6.2f} "
+                  f"{warm:6.2f} | {dn:6.3f} | {ks}", flush=True)
+    print("-" * 70)
diff --git a/scripts/ma_polar_lorentzian_slip.py b/scripts/ma_polar_lorentzian_slip.py
new file mode 100644
index 00000000..169c9234
--- /dev/null
+++ b/scripts/ma_polar_lorentzian_slip.py
@@ -0,0 +1,119 @@
+"""Polar Lorentzian metric + boundary sliding.
+
+Metric is defined in (r, theta): a feature at (r0, th0), sharp
+radial core Wr + angular core Wth, heavy Lorentzian tail so the
+whole annulus feels a pull in BOTH r and theta. Angular distance
+uses the branch-cut-free chord  ang2 = 2(1 - cos(th-th0))
+(no atan2 — safer for the JIT; ≈ (Δθ)² for small Δθ, periodic).
+
+  f = 1 + AMP / (1 + ((r-r0)/Wr)^2 + 2(1-cos(th-th0))/Wth^2)
+
+Compare boundary_slip OFF vs ON (per-ring tangential slide, radial
+DOF removed → nodes provably stay on the ring): with the angular
+feature + slip, boundary nodes should slide around toward th0.
+Interior-offset feature (r0=0.85, off the pinned rim) to avoid the
+boundary-spike sliver pathology. Also direct vs gamg robustness.
+res-24, AMP=8.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _pinned_mask, _WINSLOW_CACHE)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+R0, TH0, WR, WTH = 0.85, 0.6, 0.12, 0.35       # feature in (r,θ)
+PX, PY = R0 * np.cos(TH0), R0 * np.sin(TH0)    # feature in (x,y)
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]; Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    r = sympy.sqrt(x ** 2 + y ** 2)
+    cosdth = (x * np.cos(TH0) + y * np.sin(TH0)) / r   # cos(θ-θ0)
+    ang2 = 2 * (1 - cosdth)
+    d2 = ((r - R0) / WR) ** 2 + ang2 / WTH ** 2
+    f = 1.0 + AMP / (1.0 + d2)
+    return m, f, X0.copy()
+
+
+def diag(X, X0, edges, tris, m):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    nv = X.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    dP0 = np.hypot(X0[:, 0] - PX, X0[:, 1] - PY)
+    dPf = np.hypot(X[:, 0] - PX, X[:, 1] - PY)
+    near, far = dP0 < 0.15, dP0 > 0.6
+    ratio = (float(nl[far].mean() / nl[near].mean())
+             if near.any() and far.any() else float("nan"))
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    distant = dP0 > 0.5
+    inward = float((dP0[distant] - dPf[distant]).mean())
+    # outer-ring angular concentration toward TH0 + radial drift
+    is_b = _pinned_mask(m.dm, tuple(_auto_pinned_labels(m)))
+    rb = np.hypot(X[:, 0], X[:, 1])
+    outer = is_b & (rb > 0.9 * R_O)
+    th = np.arctan2(X[outer, 1], X[outer, 0])
+    dth = np.abs(np.arctan2(np.sin(th - TH0), np.cos(th - TH0)))
+    ncluster = int((dth < WTH).sum())            # nodes within core
+    drift = float(np.abs(rb[outer] - R_O).max())  # ~0 ⇒ stayed on rim
+    return ratio, minA, inward, ncluster, drift
+
+
+print(f"{'slip':>5} {'solver':>7} | {'far/near':>8} {'minA':>6} "
+      f"{'inward':>7} {'#@rim<Wth':>9} {'rimDrift':>9} | gamg KSP")
+print("-" * 80)
+panels = []
+m0, _, X0u = mk("u")
+tris = _tri_cells(m0.dm)
+panels.append(("undeformed", X0u))
+for slip in (False, True):
+    for mode in ("direct", "gamg"):
+        m, f, X0 = mk(f"s{int(slip)}{mode}")
+        e = _edge_pairs(m.dm); tr = _tri_cells(m.dm)
+        pin = _auto_pinned_labels(m)
+        _winslow_elliptic(m, f, pin, False, boundary_slip=slip,
+                          linear_solver=mode, phi_degree=2)
+        X = np.asarray(m.X.coords).copy()
+        r, mA, inw, nc, drf = diag(X, X0, e, tr, m)
+        ks = ""
+        if mode == "gamg":
+            k = [kk for kk in _WINSLOW_CACHE if kk[0] == id(m)
+                 and kk[-2] == "gamg" and kk[-1] == 2][0]
+            ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+            ks = f"r={ksp.getConvergedReason()} it={ksp.getIterationNumber()}"
+        print(f"{str(slip):>5} {mode:>7} | {r:8.3f} {mA:6.3f} "
+              f"{inw:+7.4f} {nc:9d} {drf:9.2e} | {ks}", flush=True)
+        if mode == "direct":
+            panels.append((f"slip={slip}", X))
+
+fig, ax = plt.subplots(1, 3, figsize=(16, 5.4))
+for a, (name, X) in zip(ax, panels):
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.4, color="#1f4e8c")
+    a.plot(PX, PY, "o", ms=12, mfc="none", mec="#c0392b", mew=2)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(name, fontsize=12)
+fig.suptitle(f"Polar Lorentzian, feature (r={R0}, θ={TH0:g}) red, "
+             f"AMP={AMP:g} res-{RES} — slip lets rim nodes slide "
+             f"toward θ₀", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.94])
+fig.savefig("/tmp/metric_mesh/ma_polar_slip.png", dpi=135)
+print("\nsaved /tmp/metric_mesh/ma_polar_slip.png")
diff --git a/scripts/ma_polar_lorentzian_slip_v2.py b/scripts/ma_polar_lorentzian_slip_v2.py
new file mode 100644
index 00000000..789145b9
--- /dev/null
+++ b/scripts/ma_polar_lorentzian_slip_v2.py
@@ -0,0 +1,139 @@
+"""Polar Lorentzian metric + boundary sliding.
+
+Metric is defined in (r, theta): a feature at (r0, th0), sharp
+radial core Wr + angular core Wth, heavy Lorentzian tail so the
+whole annulus feels a pull in BOTH r and theta. Angular distance
+is the true signed wrapped angle  dth = atan2(sin(th-th0),
+cos(th-th0))  (v1's chord 2(1-cos) saturated at the antipode).
+
+  f = 1 + AMP / (1 + ((r-r0)/Wr)^2 + (dth/Wth)^2)
+
+Compare boundary_slip OFF vs ON (per-ring tangential slide, radial
+DOF removed → nodes provably stay on the ring): with the angular
+feature + slip, boundary nodes should slide around toward th0.
+Interior-offset feature (r0=0.85, off the pinned rim) to avoid the
+boundary-spike sliver pathology. Also direct vs gamg robustness.
+res-24, AMP=8.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas, _pinned_mask, _WINSLOW_CACHE)
+
+R_O, R_I, RES, AMP = 1.0, 0.5, 24, 8.0
+# Balanced (r,θ) cores so the feature is a genuine blob, not a thin
+# radial shell that swamps the angular localisation (v1 lesson).
+R0, TH0, WR, WTH = 0.85, 0.6, 0.18, 0.55       # feature in (r,θ)
+PX, PY = R0 * np.cos(TH0), R0 * np.sin(TH0)    # feature in (x,y)
+
+
+def mk(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    Xv = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    Xv.data[:, 0] = X0[:, 0]; Xv.data[:, 1] = X0[:, 1]
+    x, y = Xv.sym[0], Xv.sym[1]
+    r = sympy.sqrt(x ** 2 + y ** 2)
+    # TRUE signed wrapped angular separation (no chord saturation):
+    # single atan2 of linear combos of x,y (scale-invariant ⇒ /r
+    # cancels, no nesting). dthw ∈ (-π,π], ≈ θ-θ0 near the feature,
+    # grows linearly with angle so the Lorentzian tail has real
+    # angular reach (the v1 chord saturated at the antipode).
+    s_ = y * np.cos(TH0) - x * np.sin(TH0)     # r·sin(θ-θ0)
+    c_ = x * np.cos(TH0) + y * np.sin(TH0)     # r·cos(θ-θ0)
+    dthw = sympy.atan2(s_, c_)
+    d2 = ((r - R0) / WR) ** 2 + (dthw / WTH) ** 2
+    f = 1.0 + AMP / (1.0 + d2)
+    return m, f, X0.copy()
+
+
+def diag(X, X0, edges, tris, m):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(X[v1] - X[v0], axis=1)
+    nv = X.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    dP0 = np.hypot(X0[:, 0] - PX, X0[:, 1] - PY)
+    dPf = np.hypot(X[:, 0] - PX, X[:, 1] - PY)
+    near, far = dP0 < 0.15, dP0 > 0.6
+    ratio = (float(nl[far].mean() / nl[near].mean())
+             if near.any() and far.any() else float("nan"))
+    A = np.abs(_signed_areas(X, tris))
+    minA = float(A.min() / A.mean())
+    distant = dP0 > 0.5
+    inward = float((dP0[distant] - dPf[distant]).mean())
+    # explicit ANGULAR snuggle: nodes that started in the radial
+    # band but angularly far — did |Δθ| shrink (slide toward θ0)?
+    r0 = np.hypot(X0[:, 0], X0[:, 1])
+    rf = np.hypot(X[:, 0], X[:, 1])
+    a0 = np.arctan2(
+        X0[:, 1] * np.cos(TH0) - X0[:, 0] * np.sin(TH0),
+        X0[:, 0] * np.cos(TH0) + X0[:, 1] * np.sin(TH0))
+    af = np.arctan2(
+        X[:, 1] * np.cos(TH0) - X[:, 0] * np.sin(TH0),
+        X[:, 0] * np.cos(TH0) + X[:, 1] * np.sin(TH0))
+    band = (np.abs(r0 - R0) < 2 * WR) & (np.abs(a0) > WTH)
+    ang_snuggle = (float((np.abs(a0[band]) - np.abs(af[band])).mean())
+                   if band.any() else float("nan"))
+    # outer-ring angular concentration toward TH0 + radial drift
+    is_b = _pinned_mask(m.dm, tuple(_auto_pinned_labels(m)))
+    outer = is_b & (rf > 0.9 * R_O)
+    ncluster = int((np.abs(af[outer]) < WTH).sum())
+    drift = float(np.abs(rf[outer] - R_O).max())  # ~0 ⇒ on rim
+    return ratio, minA, inward, ang_snuggle, ncluster, drift
+
+
+print(f"{'slip':>5} {'solver':>7} | {'far/near':>8} {'minA':>6} "
+      f"{'inward':>7} {'angSnug':>8} {'#@rim<Wθ':>8} {'rimDrift':>9} "
+      f"| gamg KSP")
+print("-" * 92)
+panels = []
+m0, _, X0u = mk("u")
+tris = _tri_cells(m0.dm)
+panels.append(("undeformed", X0u))
+for slip in (False, True):
+    for mode in ("direct", "gamg"):
+        m, f, X0 = mk(f"s{int(slip)}{mode}")
+        e = _edge_pairs(m.dm); tr = _tri_cells(m.dm)
+        pin = _auto_pinned_labels(m)
+        _winslow_elliptic(m, f, pin, False, boundary_slip=slip,
+                          linear_solver=mode, phi_degree=2)
+        X = np.asarray(m.X.coords).copy()
+        r, mA, inw, asn, nc, drf = diag(X, X0, e, tr, m)
+        ks = ""
+        if mode == "gamg":
+            k = [kk for kk in _WINSLOW_CACHE if kk[0] == id(m)
+                 and kk[-2] == "gamg" and kk[-1] == 2][0]
+            ksp = _WINSLOW_CACHE[k][1].snes.getKSP()
+            ks = f"r={ksp.getConvergedReason()} it={ksp.getIterationNumber()}"
+        print(f"{str(slip):>5} {mode:>7} | {r:8.3f} {mA:6.3f} "
+              f"{inw:+7.4f} {asn:+8.4f} {nc:8d} {drf:9.2e} | {ks}",
+              flush=True)
+        if mode == "direct":
+            panels.append((f"slip={slip}", X))
+
+fig, ax = plt.subplots(1, 3, figsize=(16, 5.4))
+for a, (name, X) in zip(ax, panels):
+    a.triplot(mtri.Triangulation(X[:, 0], X[:, 1], tris),
+              lw=0.4, color="#1f4e8c")
+    a.plot(PX, PY, "o", ms=12, mfc="none", mec="#c0392b", mew=2)
+    a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+    a.set_title(name, fontsize=12)
+fig.suptitle(f"Polar Lorentzian v2 (balanced cores, true angle), "
+             f"feature (r={R0}, θ={TH0:g}) red, AMP={AMP:g} res-{RES} "
+             f"— slip lets rim nodes slide toward θ₀", fontsize=12)
+fig.tight_layout(rect=[0, 0, 1, 0.94])
+fig.savefig("/tmp/metric_mesh/ma_polar_slip_v2.png", dpi=135)
+print("\nsaved /tmp/metric_mesh/ma_polar_slip_v2.png")
diff --git a/scripts/ma_profile_diag.py b/scripts/ma_profile_diag.py
new file mode 100644
index 00000000..90ed0f3e
--- /dev/null
+++ b/scripts/ma_profile_diag.py
@@ -0,0 +1,75 @@
+"""Decisive diagnostic: compare the FE Monge-Ampere realized radial
+node displacement to the EXACT equidistribution displacement.
+
+Exact: node initially at fractional radial position xi (uniform
+start) must move to r_eq(xi) = interp(xi, cumulative-mass, s). The
+required radial displacement profile dr_exact(r0) is what grad(phi)
+must reproduce. If the FE dr is an order of magnitude smaller / a
+near-uniform shift rather than the sharply varying exact profile,
+the FE MA solve is converging to a spurious (non-Brenier) branch,
+not just an inaccurate Hessian.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _auto_pinned_labels)
+
+R_I, R_O = 0.5, 1.0
+WIDTH = 0.12
+AMP = 8.0
+RES = 16
+
+# ---- exact radial equidistribution map ----
+N = 200_000
+s = np.linspace(R_I, R_O, N)
+ds = s[1] - s[0]
+rho = 1.0 + AMP * np.exp(-(((s - R_O) / WIDTH) ** 2))
+dens = rho * s
+m = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * ds)])
+m /= m[-1]
+
+
+def r_exact(r0):
+    xi = (r0 - R_I) / (R_O - R_I)
+    return np.interp(xi, m, s)
+
+
+# ---- FE Monge-Ampere solve ----
+mesh = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                          cellSize=1.0 / RES, qdegree=3)
+r0v = uw.discretisation.MeshVariable(
+    "r0diag", mesh, vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+X0 = np.asarray(mesh.X.coords)
+r0v.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+f = 1.0 + AMP * sympy.exp(-(((r0v.sym[0]) - R_O) / WIDTH) ** 2)
+
+c0 = np.asarray(mesh.X.coords).copy()
+rad0 = np.sqrt((c0 ** 2).sum(axis=1))
+pinned = _auto_pinned_labels(mesh)
+_winslow_elliptic(mesh, f, pinned, True, n_picard=120, relax=1.0,
+                  step_frac=None, picard_relax=0.25)
+c1 = np.asarray(mesh.X.coords).copy()
+rad1 = np.sqrt((c1 ** 2).sum(axis=1))
+
+dr_fe = rad1 - rad0
+dr_ex = r_exact(rad0) - rad0
+
+# bin by initial radius
+edges = np.linspace(R_I, R_O, 11)
+print(f"\nAMP={AMP}  radial displacement: FE vs EXACT (by r0 bin)")
+print(f"{'r0 bin':>14}  {'mean dr_FE':>11}  {'mean dr_EXACT':>13}  "
+      f"{'ratio':>7}")
+for k in range(len(edges) - 1):
+    lo, hi = edges[k], edges[k + 1]
+    sel = (rad0 >= lo) & (rad0 < hi)
+    if sel.sum() == 0:
+        continue
+    fe = dr_fe[sel].mean()
+    ex = dr_ex[sel].mean()
+    rr = fe / ex if abs(ex) > 1e-9 else float("nan")
+    print(f"  [{lo:.3f},{hi:.3f})  {fe:>11.4f}  {ex:>13.4f}  "
+          f"{rr:>7.3f}")
+print(f"\n max|dr_FE|={np.abs(dr_fe).max():.4f}   "
+      f"max|dr_EXACT|={np.abs(dr_ex).max():.4f}")
diff --git a/scripts/ma_profile_phases.py b/scripts/ma_profile_phases.py
new file mode 100644
index 00000000..0aa0ed56
--- /dev/null
+++ b/scripts/ma_profile_phases.py
@@ -0,0 +1,82 @@
+"""Phase-resolved profile of a single _winslow_elliptic (MA) call,
+cold then warm, AMP=8 on the res-16 Annulus. Wraps the inner
+solver objects' .solve() with timers (via the cache) so we see
+where the ~12 s cold / ~34 s warm goes: φ Poisson, Hessian
+recovery, ∇φ projection, evaluate, _deform_mesh, and the
+first-iter DM/SNES (re)build.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import smoothing as S
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+
+
+def case(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+T = {}
+
+
+def _wrap(obj, label):
+    raw = obj.solve
+    def timed(*a, **k):
+        t = time.perf_counter()
+        r = raw(*a, **k)
+        T.setdefault(label, []).append(time.perf_counter() - t)
+        return r
+    obj.solve = timed
+    return raw
+
+
+m, f = case("p0")
+pin = S._auto_pinned_labels(m)
+
+t = time.perf_counter()
+S._winslow_elliptic(m, f, pin, False)
+cold = time.perf_counter() - t
+
+# wrap cached inner solvers for the warm call
+key = list(S._WINSLOW_CACHE)[0]
+phi, ps, gradphi, gproj, hsolver, vol_field = S._WINSLOW_CACHE[key]
+_wrap(ps, "phi_poisson")
+_wrap(hsolver, "hessian")
+_wrap(gproj, "gradphi_proj")
+_dm_raw = m._deform_mesh
+def _dm_timed(*a, **k):
+    t0 = time.perf_counter()
+    r = _dm_raw(*a, **k)
+    T.setdefault("deform_mesh", []).append(time.perf_counter() - t0)
+    return r
+m._deform_mesh = _dm_timed
+
+t = time.perf_counter()
+S._winslow_elliptic(m, f, pin, False)
+warm = time.perf_counter() - t
+
+print(f"\n=== MA AMP={AMP} RES={RES}  cold={cold:.2f}s  warm={warm:.2f}s ===")
+for label in ("phi_poisson", "hessian", "gradphi_proj", "deform_mesh"):
+    v = T.get(label, [])
+    if not v:
+        continue
+    a = np.array(v)
+    print(f"{label:14s} n={len(a):3d}  total={a.sum():7.2f}s  "
+          f"mean={a.mean()*1e3:8.1f}ms  first={a[0]*1e3:8.1f}ms  "
+          f"rest_mean={(a[1:].mean()*1e3 if len(a)>1 else 0):8.1f}ms")
+acct = sum(np.array(T.get(l, [0])).sum()
+           for l in ("phi_poisson", "hessian", "gradphi_proj",
+                     "deform_mesh"))
+print(f"{'accounted':14s}            total={acct:7.2f}s  "
+      f"(warm {warm:.2f}s; unaccounted {warm-acct:.2f}s)")
diff --git a/scripts/ma_radial_anisotropy.py b/scripts/ma_radial_anisotropy.py
new file mode 100644
index 00000000..fd7be9e4
--- /dev/null
+++ b/scripts/ma_radial_anisotropy.py
@@ -0,0 +1,127 @@
+"""Anisotropy-aware truth: measure RADIAL vs TANGENTIAL edge length
+vs radius for the MA-redistributed Annulus (res-16, AMP=8), P1/P2/P3
++ undeformed. The reported deep/near ≈1.71 is a per-node mean of ALL
+incident edges — it averages the collapsed radial edges with the
+nearly unchanged tangential ones and so HIDES the true radial
+compression the user sees in the grid. Overlay the exact 1D radial
+OT spacing for reference. Saves /tmp/metric_mesh/ma_radial_profile.png.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels)
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+
+
+def case(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+def split_edges(coords, edges):
+    """Return (r_mid, L, is_rad) for each edge: is_rad flags edges
+    whose vector lies within 45° of the local radial direction."""
+    p0, p1 = coords[edges[:, 0]], coords[edges[:, 1]]
+    mid = 0.5 * (p0 + p1)
+    rmid = np.linalg.norm(mid, axis=1)
+    rhat = mid / np.maximum(rmid, 1e-30)[:, None]
+    ev = p1 - p0
+    L = np.linalg.norm(ev, axis=1)
+    radial_frac = np.abs((ev * rhat).sum(axis=1)) / np.maximum(L, 1e-30)
+    is_rad = radial_frac > np.cos(np.pi / 4)      # within 45° of r̂
+    return rmid, L, is_rad
+
+
+m0, _ = case("u")
+edges = _edge_pairs(m0.dm)
+X0 = np.asarray(m0.X.coords).copy()
+dr0 = (R_O - R_I) / RES                            # nominal radial Δ
+
+# exact 1-D radial OT profile (the ground truth for the radial dir)
+s = np.linspace(R_I, R_O, 200_000)
+rho = 1.0 + AMP * np.exp(-(((s - R_O) / WIDTH) ** 2))
+dens = rho * s
+mcum = np.concatenate(
+    [[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * (s[1] - s[0]))])
+mcum /= mcum[-1]
+rn = np.interp(np.linspace(0, 1, RES + 1), mcum, s)
+ot_rmid = 0.5 * (rn[1:] + rn[:-1])
+ot_dr = np.diff(rn)
+
+bins = np.linspace(R_I, R_O, 13)
+bc = 0.5 * (bins[1:] + bins[:-1])
+
+
+def radial_profile(coords):
+    rmid, L, is_rad = split_edges(coords, edges)
+    out_r, out_t = [], []
+    for i in range(len(bins) - 1):
+        m = (rmid >= bins[i]) & (rmid < bins[i + 1])
+        out_r.append(L[m & is_rad].mean() if (m & is_rad).any()
+                     else np.nan)
+        out_t.append(L[m & ~is_rad].mean() if (m & ~is_rad).any()
+                     else np.nan)
+    return np.array(out_r), np.array(out_t)
+
+
+fig, (axr, axt) = plt.subplots(1, 2, figsize=(15, 5.6))
+runs = [("undeformed", X0)]
+for p in (1, 2, 3):
+    m, f = case(f"p{p}")
+    pin = _auto_pinned_labels(m)
+    _winslow_elliptic(m, f, pin, False, phi_degree=p)
+    runs.append((f"P{p}", np.asarray(m.X.coords).copy()))
+
+cols = {"undeformed": "k", "P1": "#2a9d8f",
+        "P2": "#1f4e8c", "P3": "#c0392b"}
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+tris = _tri_cells(m0.dm)
+und_rad = None
+und_minrad = None
+print(f"{'mesh':>10} | band-mean radial Δ (rim) | vs undef | "
+      f"MIN radial Δ | vs undef | minA/meanA")
+print("-" * 82)
+for name, X in runs:
+    pr, pt = radial_profile(X)
+    axr.plot(bc, pr, "o-", lw=1.6, ms=4, color=cols[name], label=name)
+    axt.plot(bc, pt, "o-", lw=1.6, ms=4, color=cols[name], label=name)
+    rr = pr[np.isfinite(pr)][-1]
+    # global MIN radial edge length (the thinnest sliver anywhere)
+    rmid, L, is_rad = split_edges(X, edges)
+    minrad = L[is_rad].min()
+    A = np.abs(_signed_areas(X, tris))
+    minA = A.min() / A.mean()
+    if name == "undeformed":
+        und_rad, und_minrad = rr, minrad
+    print(f"{name:>10} | {rr:24.4f} | {rr/und_rad:8.3f} | "
+          f"{minrad:12.5f} | {minrad/und_minrad:8.3f} | {minA:8.4f}")
+
+axr.plot(ot_rmid, ot_dr, "k--", lw=1.4,
+         label="exact 1-D radial OT")
+axr.set_title("RADIAL edge length vs radius")
+axt.set_title("TANGENTIAL edge length vs radius")
+for a in (axr, axt):
+    a.set_xlabel("radius"); a.set_ylabel("edge length")
+    a.axvline(R_O, color="grey", ls=":", lw=0.8)
+    a.legend(fontsize=9); a.grid(alpha=0.3)
+fig.suptitle(f"Annulus res-{RES}, AMP={AMP:g}: the d/n≈1.71 metric "
+             f"averages these two — radial collapses, tangential ~frozen",
+             fontsize=13)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+fig.savefig("/tmp/metric_mesh/ma_radial_profile.png", dpi=130)
+print("\nsaved /tmp/metric_mesh/ma_radial_profile.png")
+print(f"exact 1-D radial OT near/dr0 (AMP={AMP:g}) = "
+      f"{ot_dr.min()/dr0:.3f}")
diff --git a/scripts/ma_res_sweep.py b/scripts/ma_res_sweep.py
new file mode 100644
index 00000000..c7adbf54
--- /dev/null
+++ b/scripts/ma_res_sweep.py
@@ -0,0 +1,69 @@
+"""Is the BFO convex-branch undershoot a single-shot under-resolution
+limit? Sweep mesh resolution for AMP=8 and compare the converged FE
+deep/near edge ratio to the (resolution-independent) exact ~10.5.
+
+If the FE ratio climbs strongly toward ~10 as RES increases, a
+single MA solve is resolution-limited in the ~WIDTH-wide metric
+band ⇒ the path forward is an outer map iteration (mesh adapts
+toward the feature, re-solve with better local resolution). If it
+stays ~1.1 at all RES, the formulation is wrong.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _auto_pinned_labels)
+
+R_I, R_O = 0.5, 1.0
+WIDTH = 0.12
+AMP = 8.0
+
+
+def mesh_triangles(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    tris = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            tris.append(vs)
+    return np.asarray(tris, dtype=np.int64)
+
+
+def edge_ratio(coords, TRI):
+    a, b, c = coords[TRI[:, 0]], coords[TRI[:, 1]], coords[TRI[:, 2]]
+    el = np.concatenate([
+        np.linalg.norm(a - b, axis=1),
+        np.linalg.norm(b - c, axis=1),
+        np.linalg.norm(c - a, axis=1)])
+    cent = (a + b + c) / 3.0
+    cr = np.repeat(np.sqrt((cent ** 2).sum(axis=1)), 3)
+    near = cr > (R_O - WIDTH)
+    deep = cr < (R_O - 0.30)
+    return el[near].mean(), el[deep].mean()
+
+
+print(f"AMP={AMP}  exact deep/near ≈ 10.5 (resolution-independent)")
+for RES in (16, 32, 48):
+    mesh = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                              cellSize=1.0 / RES, qdegree=3)
+    TRI = mesh_triangles(mesh)
+    c0 = np.asarray(mesh.X.coords).copy()
+    r0 = uw.discretisation.MeshVariable(
+        f"r0rs{RES}", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    X0 = np.asarray(mesh.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    pinned = _auto_pinned_labels(mesh)
+    _winslow_elliptic(mesh, f, pinned, False, n_picard=120,
+                      relax=1.0, step_frac=None, picard_relax=0.25)
+    c1 = np.asarray(mesh.X.coords).copy()
+    en0, ed0 = edge_ratio(c0, TRI)
+    en1, ed1 = edge_ratio(c1, TRI)
+    print(f"  RES={RES:>2}  cells/band≈{WIDTH*RES:.1f}  "
+          f"deep/near {ed0/en0:.3f}->{ed1/en1:.3f}  "
+          f"near {en0:.4f}->{en1:.4f}")
diff --git a/scripts/ma_show_grids.py b/scripts/ma_show_grids.py
new file mode 100644
index 00000000..bd9717f4
--- /dev/null
+++ b/scripts/ma_show_grids.py
@@ -0,0 +1,95 @@
+"""Render the MA-redistributed Annulus grids: undeformed vs
+phi_degree 1/2/3 (direct path, AMP=8, res-16). Node moves only —
+topology fixed — so we triplot the fixed connectivity with the
+deformed coords. Title shows the honest deep/near grading.
+Saves /tmp/metric_mesh/ma_grids.png.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels, _tri_cells)
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    return float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+                 / nl[r > R_O - 0.05].mean())
+
+
+def case(tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+fig, ax = plt.subplots(1, 4, figsize=(20, 5.4))
+m0, _ = case("u")
+tris = _tri_cells(m0.dm)
+edges = _edge_pairs(m0.dm)
+X0 = np.asarray(m0.X.coords).copy()
+
+panels = [("undeformed", X0, None)]
+for pdeg in (1, 2, 3):
+    m, f = case(f"p{pdeg}")
+    pin = _auto_pinned_labels(m)
+    _winslow_elliptic(m, f, pin, False, phi_degree=pdeg)
+    Xd = np.asarray(m.X.coords).copy()
+    panels.append((f"phi P{pdeg}", Xd,
+                    honest_ratio(Xd, edges)))
+
+for a, (name, X, dn) in zip(ax, panels):
+    t = mtri.Triangulation(X[:, 0], X[:, 1], tris)
+    a.triplot(t, lw=0.4, color="#1f4e8c")
+    a.set_aspect("equal")
+    a.set_xticks([]); a.set_yticks([])
+    ttl = name if dn is None else f"{name}   d/n = {dn:.3f}"
+    a.set_title(ttl, fontsize=13)
+    th = np.linspace(0, 2 * np.pi, 400)
+    for rr, c in ((R_O, "#c0392b"), (R_I, "#c0392b")):
+        a.plot(rr * np.cos(th), rr * np.sin(th), c, lw=0.6, alpha=0.5)
+
+fig.suptitle(
+    f"MA metric redistribution — Annulus res-{RES}, AMP={AMP:g} "
+    f"(metric peaks at the outer rim r={R_O:g})", fontsize=14)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+out = "/tmp/metric_mesh/ma_grids.png"
+fig.savefig(out, dpi=130)
+print("saved", out)
+
+# zoom on the outer band where the grading concentrates
+fig2, ax2 = plt.subplots(1, 4, figsize=(20, 5.4))
+for a, (name, X, dn) in zip(ax2, panels):
+    t = mtri.Triangulation(X[:, 0], X[:, 1], tris)
+    a.triplot(t, lw=0.5, color="#1f4e8c")
+    a.set_aspect("equal")
+    a.set_xlim(-0.15, 0.15); a.set_ylim(0.82, 1.04)
+    a.set_xticks([]); a.set_yticks([])
+    a.set_title(name if dn is None else f"{name}  d/n={dn:.3f}",
+                fontsize=13)
+fig2.suptitle("Outer-rim zoom (top of annulus) — node bunching "
+              "toward r=1", fontsize=14)
+fig2.tight_layout(rect=[0, 0, 1, 0.95])
+out2 = "/tmp/metric_mesh/ma_grids_zoom.png"
+fig2.savefig(out2, dpi=130)
+print("saved", out2)
diff --git a/scripts/ma_show_grids_hires.py b/scripts/ma_show_grids_hires.py
new file mode 100644
index 00000000..3f191fd6
--- /dev/null
+++ b/scripts/ma_show_grids_hires.py
@@ -0,0 +1,114 @@
+"""High-res MA grids: undeformed / P1 / P2 at RES 32 & 48 (AMP=8),
+full annulus + outer-rim zoom, annotated with the HONEST anisotropic
+numbers (d/n, minA/meanA, min-radial-edge vs undeformed) so the rim
+over-collapse is visible and quantified at finer resolution.
+Saves /tmp/metric_mesh/ma_grids_hires{,_zoom}.png.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+import matplotlib.tri as mtri
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, AMP = 1.0, 0.5, 0.12, 8.0
+RESS = [32, 48]
+DEGS = [None, 1, 2]            # None = undeformed
+
+
+def case(res, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+def metrics(coords, edges, tris):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    dn = float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+               / nl[r > R_O - 0.05].mean())
+    p0, p1 = coords[v0], coords[v1]
+    mid = 0.5 * (p0 + p1)
+    rm = np.linalg.norm(mid, axis=1)
+    rhat = mid / np.maximum(rm, 1e-30)[:, None]
+    ev = p1 - p0
+    rad = np.abs((ev * rhat).sum(axis=1)) / np.maximum(Le, 1e-30) \
+        > np.cos(np.pi / 4)
+    minrad = Le[rad].min()
+    A = np.abs(_signed_areas(coords, tris))
+    return dn, minrad, float(A.min() / A.mean())
+
+
+def build():
+    data = {}
+    for res in RESS:
+        m0, _ = case(res, f"u{res}")
+        edges = _edge_pairs(m0.dm)
+        tris = _tri_cells(m0.dm)
+        X0 = np.asarray(m0.X.coords).copy()
+        und_minrad = metrics(X0, edges, tris)[1]
+        col = {None: (X0, None, None, 1.0)}
+        for d in DEGS:
+            if d is None:
+                continue
+            m, f = case(res, f"p{d}_{res}")
+            pin = _auto_pinned_labels(m)
+            _winslow_elliptic(m, f, pin, False, phi_degree=d)
+            X = np.asarray(m.X.coords).copy()
+            dn, mr, mA = metrics(X, edges, tris)
+            col[d] = (X, dn, mA, mr / und_minrad)
+        data[res] = (tris, col)
+    return data
+
+
+data = build()
+titles = {None: "undeformed", 1: "P1", 2: "P2"}
+
+for zoom in (False, True):
+    fig, ax = plt.subplots(len(RESS), len(DEGS),
+                           figsize=(15, 5.0 * len(RESS)))
+    for i, res in enumerate(RESS):
+        tris, col = data[res]
+        for j, d in enumerate(DEGS):
+            a = ax[i, j]
+            X, dn, mA, mrr = col[d]
+            t = mtri.Triangulation(X[:, 0], X[:, 1], tris)
+            a.triplot(t, lw=0.22 if not zoom else 0.5,
+                      color="#1f4e8c")
+            a.set_aspect("equal"); a.set_xticks([]); a.set_yticks([])
+            if zoom:
+                a.set_xlim(-0.16, 0.16); a.set_ylim(0.83, 1.03)
+            lbl = f"res-{res}  {titles[d]}"
+            if dn is not None:
+                lbl += (f"\nd/n={dn:.2f}  minA/meanA={mA:.3f}"
+                        f"  minRad={mrr:.2f}× undef")
+            a.set_title(lbl, fontsize=11)
+            if not zoom:
+                th = np.linspace(0, 2 * np.pi, 400)
+                a.plot(R_O * np.cos(th), R_O * np.sin(th),
+                       "#c0392b", lw=0.5, alpha=0.4)
+    tag = "outer-rim zoom" if zoom else "full annulus"
+    fig.suptitle(f"MA grids, AMP={AMP:g} — {tag} "
+                 f"(metric peaks at the pinned rim r=1)", fontsize=14)
+    fig.tight_layout(rect=[0, 0, 1, 0.96])
+    out = ("/tmp/metric_mesh/ma_grids_hires_zoom.png" if zoom
+           else "/tmp/metric_mesh/ma_grids_hires.png")
+    fig.savefig(out, dpi=135)
+    print("saved", out)
diff --git a/scripts/ma_slip_test.py b/scripts/ma_slip_test.py
new file mode 100644
index 00000000..02d435d2
--- /dev/null
+++ b/scripts/ma_slip_test.py
@@ -0,0 +1,134 @@
+"""Does boundary tangential slip relieve the volumetric spring's
+touchy/anisotropic refinement? Volumetric spring, slip OFF vs ON,
+on the localised INTERIOR blob (where the streakiness showed) and
+the surface band. Plots a zoom for visual judgement.
+
+SAFETY CHECK (user concern: nodes drifting off the surface): we
+report max |r_final − r_orig| over boundary nodes — with the
+per-ring radius projection this MUST be ~0 (slip is purely
+tangential; the radial DOF is removed).
+"""
+from __future__ import annotations
+import os
+import time
+import numpy as np
+import matplotlib.pyplot as plt
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels,
+    _pinned_mask, _tri_cells, _signed_areas)
+
+OUT = "/tmp/metric_mesh"
+R_O, R_I, RES = 1.0, 0.5, 16
+CX, CY, W = 0.78, 0.0, 0.12
+AMPS = [8.0, 20.0]
+
+
+def mesh_tris(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    o = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            o.append(vs)
+    return np.asarray(o, np.int64)
+
+
+def far_near(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); cnt = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(cnt, a, 1.0)
+    nl = s / np.maximum(cnt, 1.0)
+    d = np.hypot(coords[:, 0] - CX, coords[:, 1] - CY)
+    return float(nl[d > 4 * W].mean() / nl[d < W].mean())
+
+
+def quality(coords, TRI):
+    a = np.abs(_signed_areas(coords, TRI))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    X0v = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    X0v.data[:, 0] = X0[:, 0]
+    X0v.data[:, 1] = X0[:, 1]
+    d2 = (X0v.sym[0] - CX) ** 2 + (X0v.sym[1] - CY) ** 2
+    f = 1.0 + amp * sympy.exp(-d2 / W ** 2)
+    return m, f, X0
+
+
+def bnd_drift(m, X0):
+    """max radial change of boundary nodes about the origin."""
+    isb = _pinned_mask(m.dm, _auto_pinned_labels(m))
+    c1 = np.asarray(m.X.coords)
+    r0 = np.linalg.norm(X0[isb], axis=1)
+    r1 = np.linalg.norm(c1[isb], axis=1)
+    return float(np.abs(r1 - r0).max())
+
+
+res = []
+for amp in AMPS:
+    row = []
+    for slip in (False, True):
+        m, f, X0 = case(amp, f"{int(amp)}{int(slip)}")
+        e = _edge_pairs(m.dm); TRI = mesh_tris(m)
+        p = _auto_pinned_labels(m)
+        t = time.perf_counter()
+        _winslow_elliptic(m, f, p, False, boundary_slip=slip)
+        dt = time.perf_counter() - t
+        c1 = np.asarray(m.X.coords).copy()
+        print(f"AMP={amp:4.0f} slip={str(slip):5} "
+              f"far/near={far_near(c1, e):.2f} "
+              f"q={quality(c1, TRI):.3f} "
+              f"bnd_drift={bnd_drift(m, X0):.2e} "
+              f"{dt:.2f}s", flush=True)
+        row.append((slip, c1, TRI))
+    res.append((amp, row))
+
+if uw.mpi.rank == 0:
+    th = np.linspace(0, 2 * np.pi, 240)
+    fig, ax = plt.subplots(2, 4, figsize=(24, 12),
+                           facecolor="white")
+    for ri, (amp, row) in enumerate(res):
+        for ci, (slip, c1, TRI) in enumerate(row):
+            # full
+            a = ax[ri, 2 * ci]
+            a.triplot(c1[:, 0], c1[:, 1], TRI, "k-", lw=0.5)
+            a.plot(R_O * np.cos(th), R_O * np.sin(th),
+                   "tab:red", lw=1.2)
+            a.add_patch(plt.Circle((CX, CY), W, fill=False,
+                        ec="tab:green", lw=2))
+            a.set_title(f"AMP={amp:g} slip={slip} (full)",
+                        fontsize=13)
+            a.set_aspect("equal"); a.set_xticks([])
+            a.set_yticks([])
+            # zoom on blob
+            a = ax[ri, 2 * ci + 1]
+            a.triplot(c1[:, 0], c1[:, 1], TRI, "k-", lw=0.9)
+            a.add_patch(plt.Circle((CX, CY), W, fill=False,
+                        ec="tab:green", lw=2))
+            a.set_xlim(CX - 3 * W, CX + 3 * W)
+            a.set_ylim(CY - 3 * W, CY + 3 * W)
+            a.set_title(f"AMP={amp:g} slip={slip} (blob zoom)",
+                        fontsize=13)
+            a.set_aspect("equal"); a.set_xticks([])
+            a.set_yticks([])
+    fig.suptitle("MA: boundary slip OFF vs ON — "
+                 "does tangential rim motion relieve the "
+                 "anisotropy at the interior blob?", fontsize=14)
+    fig.tight_layout(rect=[0, 0, 1, 0.96])
+    pth = os.path.join(OUT, "ma_slip.png")
+    fig.savefig(pth, dpi=130, bbox_inches="tight")
+    print("Saved", pth)
diff --git a/scripts/ma_solver_scaling.py b/scripts/ma_solver_scaling.py
new file mode 100644
index 00000000..021031fe
--- /dev/null
+++ b/scripts/ma_solver_scaling.py
@@ -0,0 +1,71 @@
+"""Cost scaling: serial MUMPS direct vs GAMG-reuse, AMP=8, as the
+Annulus is refined. Direct is optimal for tiny 2D problems; the
+question is whether the GAMG-reuse path's cost grows *slower* (the
+parallel/3D-scalable argument). The d/n grading of the two paths must
+agree to within 5e-3 at every resolution. Per-RES print (killable).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs, _auto_pinned_labels)
+
+R_O, R_I, WIDTH, AMP = 1.0, 0.5, 0.12, 8.0
+RESS = [16, 24, 32, 48]
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    return float(nl[(r >= R_I) & (r < R_I + 0.20)].mean()
+                 / nl[r > R_O - 0.05].mean())
+
+
+def case(res, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / res, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+def timed(m, f, mode):
+    e = _edge_pairs(m.dm); pin = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_elliptic(m, f, pin, False, linear_solver=mode)
+    cold = time.perf_counter() - t
+    dn = honest_ratio(np.asarray(m.X.coords), e)
+    t = time.perf_counter()
+    _winslow_elliptic(m, f, pin, False, linear_solver=mode)
+    warm = time.perf_counter() - t
+    return cold, warm, dn
+
+
+print(f"{'RES':>4} {'nodes':>7} | {'direct cold/warm':>18} | "
+      f"{'gamg cold/warm':>18} | {'warm ratio':>10} | d/n")
+print("-" * 78)
+for res in RESS:
+    md, f = case(res, f"d{res}")
+    nv = np.asarray(md.X.coords).shape[0]
+    dc, dw, dnd = timed(md, f, "direct")
+    mg, f = case(res, f"g{res}")
+    gc, gw, dng = timed(mg, f, "gamg")
+    ratio = gw / dw if dw > 0 else float("nan")
+    flag = "" if abs(dnd - dng) < 5e-3 else "  <-GRADING MISMATCH"
+    print(f"{res:4d} {nv:7d} | {dc:8.2f}{dw:9.2f}   | "
+          f"{gc:8.2f}{gw:9.2f}   | {ratio:9.2f}x | "
+          f"{dnd:.3f}/{dng:.3f}{flag}", flush=True)
+print("\n(direct optimal for tiny 2D; watch whether the gamg/direct "
+      "warm ratio shrinks with N — the 3D/parallel-scalable signal)")
diff --git a/scripts/ma_tune.py b/scripts/ma_tune.py
new file mode 100644
index 00000000..bdb5f128
--- /dev/null
+++ b/scripts/ma_tune.py
@@ -0,0 +1,98 @@
+"""Tune the Monge-Ampere equidistribution move on an undeformed
+Annulus: sweep n_outer (n_picard, relax, step_frac, picard_relax held
+fixed) and report the deep/near edge-length ratio AND the minimum
+signed triangle area (tangling check). Calls _winslow_elliptic
+directly so no rebuild is needed between parameter trials.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _auto_pinned_labels)
+
+R_O, R_I = 1.0, 0.5
+RES = 16
+WIDTH = 0.12
+
+
+def mesh_triangles(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    tris = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            tris.append(vs)
+    return np.asarray(tris, dtype=np.int64)
+
+
+def signed_areas(coords, TRI):
+    a, b, c = coords[TRI[:, 0]], coords[TRI[:, 1]], coords[TRI[:, 2]]
+    return 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                  - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+
+
+def edge_ratio(coords, TRI):
+    a, b, c = coords[TRI[:, 0]], coords[TRI[:, 1]], coords[TRI[:, 2]]
+    el = np.concatenate([
+        np.linalg.norm(a - b, axis=1),
+        np.linalg.norm(b - c, axis=1),
+        np.linalg.norm(c - a, axis=1)])
+    cent = (a + b + c) / 3.0
+    cr = np.repeat(np.sqrt((cent ** 2).sum(axis=1)), 3)
+    near = cr > (R_O - WIDTH)
+    deep = cr < (R_O - 0.30)
+    return el[near].mean(), el[deep].mean()
+
+
+TRIALS = [
+    dict(n_outer=1,  n_picard=40, relax=1.0, step_frac=None,
+         picard_relax=0.4),
+    dict(n_outer=4,  n_picard=40, relax=1.0, step_frac=None,
+         picard_relax=0.4),
+    dict(n_outer=8,  n_picard=40, relax=1.0, step_frac=None,
+         picard_relax=0.4),
+    dict(n_outer=15, n_picard=40, relax=1.0, step_frac=None,
+         picard_relax=0.4),
+]
+
+for amp in (0.0, 8.0, 20.0):
+    print(f"\n================  AMP = {amp:g}  ================")
+    for t in TRIALS:
+        mesh = uw.meshing.Annulus(
+            radiusOuter=R_O, radiusInner=R_I,
+            cellSize=1.0 / RES, qdegree=3)
+        TRI = mesh_triangles(mesh)
+        c0 = np.asarray(mesh.X.coords).copy()
+        a0 = signed_areas(c0, TRI)
+        orient = np.sign(np.median(a0))  # consistent CCW/CW sign
+
+        r0 = uw.discretisation.MeshVariable(
+            f"r0_{int(amp)}_{t['n_outer']}",  # n_outer is what varies
+            mesh, vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+        X0 = np.asarray(mesh.X.coords)
+        r0.data[:, 0] = np.sqrt(X0[:, 0] ** 2 + X0[:, 1] ** 2)
+        f = 1.0 + amp * sympy.exp(
+            -(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+
+        pinned = _auto_pinned_labels(mesh)
+        _winslow_elliptic(mesh, f, pinned, False,
+                          n_outer=t["n_outer"],
+                          n_picard=t["n_picard"],
+                          relax=t["relax"],
+                          step_frac=t["step_frac"],
+                          picard_relax=t["picard_relax"])
+
+        c1 = np.asarray(mesh.X.coords).copy()
+        a1 = signed_areas(c1, TRI) * orient   # positive = good
+        en0, ed0 = edge_ratio(c0, TRI)
+        en1, ed1 = edge_ratio(c1, TRI)
+        n_inv = int((a1 <= 0.0).sum())
+        print(
+            f"  outer={t['n_outer']:>2}  "
+            f"ratio {ed0/en0:.3f}->{ed1/en1:.3f}  "
+            f"near {en0:.4f}->{en1:.4f}  deep {ed0:.4f}->{ed1:.4f}  "
+            f"min_area {a1.min():.2e}  inverted={n_inv}")
diff --git a/scripts/neumann_solve_probe.py b/scripts/neumann_solve_probe.py
new file mode 100644
index 00000000..79dda180
--- /dev/null
+++ b/scripts/neumann_solve_probe.py
@@ -0,0 +1,73 @@
+"""Root-cause isolation: is the pure-Neumann + constant_nullspace
+scalar Poisson solve itself reliable, independent of all the
+Monge-Ampere machinery?
+
+Manufactured radially-symmetric solution on the Annulus with
+phi'(R_I)=phi'(R_O)=0 (so natural zero-flux Neumann is exact):
+
+    phi_exact(r) = cos(pi * (r - R_I) / (R_O - R_I))
+    phi'_exact(r) = -(pi/L) sin(pi (r-R_I)/L),  L = R_O - R_I
+    source s = Laplacian phi = phi'' + phi'/r   (exact area mean = 0;
+        subtract the nodal mean so the discrete problem is compatible)
+
+Solve  Laplacian phi = s  with NO essential BC + constant_nullspace,
+then compare numeric |grad phi| to the exact profile across mesh
+resolution. If the error explodes / grad phi -> 0 as RES rises, the
+Neumann nullspace solve is the broken foundation under the MA work.
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+
+R_I, R_O = 0.5, 1.0
+L = R_O - R_I
+
+for RES in (16, 32, 48):
+    mesh = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                              cellSize=1.0 / RES, qdegree=3)
+    phi = uw.discretisation.MeshVariable(
+        f"phi_np{RES}", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    ps = uw.systems.Poisson(mesh, phi)
+    ps.constitutive_model = uw.constitutive_models.DiffusionModel
+    ps.constitutive_model.Parameters.diffusivity = 1.0
+    ps.constant_nullspace = True
+
+    x, y = mesh.X
+    rr = sympy.Symbol("rr", positive=True)
+    phi_r = sympy.cos(sympy.pi * (rr - R_I) / L)
+    fp = sympy.diff(phi_r, rr)
+    fpp = sympy.diff(fp, rr)
+    s_r = fpp + fp / rr           # radial Laplacian
+    r_cart = sympy.sqrt(x ** 2 + y ** 2)
+    s = s_r.subs(rr, r_cart)
+
+    # make the source mean-zero for Neumann compatibility; the plain
+    # nodal mean below only approximates the area-weighted mean
+    coords = np.asarray(mesh.X.coords)
+    s_nodal = np.asarray(uw.function.evaluate(s, coords)).reshape(-1)
+    rad = np.sqrt((coords ** 2).sum(axis=1))
+    s_mean = float(np.mean(s_nodal))
+    ps.f = sympy.Matrix([[s - s_mean]])
+    ps.solve(zero_init_guess=True)
+
+    # numeric vs exact |grad phi| (radial), via a safe vector proj
+    gv = uw.discretisation.MeshVariable(
+        f"gphi_np{RES}", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    gp = uw.systems.Vector_Projection(mesh, gv)
+    gp.smoothing = 0.0
+    gp.uw_function = sympy.Matrix(
+        [phi.sym[0].diff(mesh.X[0]), phi.sym[0].diff(mesh.X[1])]).T
+    gp.solve()
+    g_arr = np.asarray(
+        uw.function.evaluate(gv.sym, coords)).reshape(len(coords), -1)
+    gnum = np.linalg.norm(g_arr[:, :2], axis=1)
+    gex = (np.pi / L) * np.abs(
+        np.sin(np.pi * (rad - R_I) / L))
+
+    rel = np.linalg.norm(gnum - gex) / max(
+        np.linalg.norm(gex), 1e-30)
+    print(f"RES={RES:>2}  max|gradphi| num={gnum.max():.4f} "
+          f"exact={gex.max():.4f}  rel-L2(grad)={rel:.3e}")
diff --git a/scripts/plot_metric_meshes.py b/scripts/plot_metric_meshes.py
new file mode 100644
index 00000000..e2f50e01
--- /dev/null
+++ b/scripts/plot_metric_meshes.py
@@ -0,0 +1,86 @@
+"""Plot-only (loads the saved case_*.npz from show_metric_mesh.py
+— no re-solving). Produces large, clear mesh figures to judge the
+metric-driven grading visually:
+
+  /tmp/metric_mesh/meshes_big.png   full annulus, Spring vs MA
+  /tmp/metric_mesh/meshes_zoom.png  zoomed outer-band wedge
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import matplotlib.pyplot as plt
+
+OUT = "/tmp/metric_mesh"
+R_O, R_I, WIDTH = 1.0, 0.5, 0.12
+AMPS = [0, 2, 8, 20]
+th = np.linspace(0, 2 * np.pi, 360)
+
+
+def load(method, amp):
+    d = np.load(os.path.join(OUT, f"case_{method}_amp{amp}.npz"))
+    return d["coords1"], d["tri"]
+
+
+# ---------- 1. big full-annulus grid ----------
+fig, axes = plt.subplots(2, 4, figsize=(22, 11.5),
+                         facecolor="white")
+for ri, method in enumerate(("spring", "ma")):
+    for ci, amp in enumerate(AMPS):
+        c1, TRI = load(method, amp)
+        ax = axes[ri, ci]
+        ax.set_facecolor("white")
+        ax.triplot(c1[:, 0], c1[:, 1], TRI, color="black", lw=0.6)
+        ax.plot(R_O * np.cos(th), R_O * np.sin(th),
+                color="tab:red", lw=1.4)
+        ax.plot(R_I * np.cos(th), R_I * np.sin(th),
+                color="tab:blue", lw=1.4)
+        ax.set_title(("uniform AMP=0" if amp == 0
+                      else f"AMP={amp}"), fontsize=14)
+        ax.set_aspect("equal")
+        ax.set_xticks([]); ax.set_yticks([])
+    axes[ri, 0].set_ylabel(
+        "ELASTIC-SPRING" if method == "spring"
+        else "MONGE–AMPÈRE", fontsize=15)
+fig.suptitle("Metric-driven mesh grading — full annulus "
+             "(red = outer surface where the metric peaks)",
+             fontsize=15)
+fig.tight_layout(rect=[0, 0, 1, 0.96])
+p1 = os.path.join(OUT, "meshes_big.png")
+fig.savefig(p1, dpi=130, bbox_inches="tight")
+print("Saved", p1)
+
+# ---------- 2. zoomed outer-band wedge (AMP 8 & 20) ----------
+fig, axes = plt.subplots(2, 4, figsize=(22, 11.5),
+                         facecolor="white")
+cols = [("spring", 8), ("ma", 8), ("spring", 20), ("ma", 20)]
+# top row: full; bottom row: zoom into the green dashed box near r=R_O
+for ci, (method, amp) in enumerate(cols):
+    c1, TRI = load(method, amp)
+    label = ("Spring" if method == "spring" else "MA")
+    # full
+    ax = axes[0, ci]
+    ax.triplot(c1[:, 0], c1[:, 1], TRI, color="black", lw=0.6)
+    ax.plot(R_O * np.cos(th), R_O * np.sin(th),
+            color="tab:red", lw=1.4)
+    ax.add_patch(plt.Rectangle((0.30, -0.05), 0.78, 0.95,
+                 fill=False, ec="tab:green", lw=1.5, ls="--"))
+    ax.set_title(f"{label}  AMP={amp}", fontsize=14)
+    ax.set_aspect("equal"); ax.set_xticks([]); ax.set_yticks([])
+    # zoom: a box on the right side spanning the outer band
+    ax = axes[1, ci]
+    ax.triplot(c1[:, 0], c1[:, 1], TRI, color="black", lw=0.9)
+    ax.plot(R_O * np.cos(th), R_O * np.sin(th),
+            color="tab:red", lw=1.8)
+    ax.set_xlim(0.30, 1.08)
+    ax.set_ylim(-0.05, 0.90)
+    ax.set_title(f"{label}  AMP={amp}  (zoom: deep→surface)",
+                 fontsize=13)
+    ax.set_aspect("equal"); ax.set_xticks([]); ax.set_yticks([])
+fig.suptitle("Zoom on the deep-interior → outer-surface "
+             "transition (green dashed = zoom box). Look for the "
+             "fine band hugging the red surface vs coarse interior.",
+             fontsize=14)
+fig.tight_layout(rect=[0, 0, 1, 0.95])
+p2 = os.path.join(OUT, "meshes_zoom.png")
+fig.savefig(p2, dpi=130, bbox_inches="tight")
+print("Saved", p2)
diff --git a/scripts/precond_bench.py b/scripts/precond_bench.py
new file mode 100644
index 00000000..05d487eb
--- /dev/null
+++ b/scripts/precond_bench.py
@@ -0,0 +1,105 @@
+"""MA efficiency: is the cheap elastic spring a good PRECONDITIONER
+for the (stable but expensive) Monge–Ampère solve?
+
+For AMP ∈ {8, 20}, on identical fresh Annulus setups, compare:
+  A  MA only            (_winslow_elliptic from uniform; baseline)
+  B  spring only        (reference — cheap but weaker)
+  C  spring → MA n_outer=1 (MA polishes; treats source as from-uniform)
+  D  spring → MA n_outer=3 (MA polishes the patch-volume RESIDUAL)
+
+Report: wall time, honest deep/near grading, and mesh quality
+(min triangle area / mean — lower = nastier cells).
+Per-line flush so it is killable early.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _winslow_elliptic, _edge_pairs,
+    _auto_pinned_labels, _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def quality(mesh):
+    dm = mesh.dm
+    tris = _tri_cells(dm)
+    if tris is None:
+        return float("nan")
+    a = np.abs(_signed_areas(np.asarray(mesh.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print(f"{'AMP':>4} {'variant':>22} {'time/s':>8} "
+      f"{'deep/near':>10} {'minA/meanA':>11}")
+print("-" * 60)
+for amp in (8.0, 20.0):
+    # A: MA only
+    m, f = case(amp, f"A{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_elliptic(m, f, p, False)
+    dt = time.perf_counter() - t
+    print(f"{amp:4.0f} {'A MA only':>22} {dt:8.2f} "
+          f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+          f"{quality(m):11.4f}", flush=True)
+
+    # B: spring only
+    m, f = case(amp, f"B{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, f, p, False)
+    dt = time.perf_counter() - t
+    print(f"{amp:4.0f} {'B spring only':>22} {dt:8.2f} "
+          f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+          f"{quality(m):11.4f}", flush=True)
+
+    # C: spring -> MA (n_outer=1, source as-from-uniform)
+    m, f = case(amp, f"C{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, f, p, False)
+    _winslow_elliptic(m, f, p, False, n_outer=1)
+    dt = time.perf_counter() - t
+    print(f"{amp:4.0f} {'C spring->MA n1':>22} {dt:8.2f} "
+          f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+          f"{quality(m):11.4f}", flush=True)
+
+    # D: spring -> MA (n_outer=3, MA drives the patch-vol residual)
+    m, f = case(amp, f"D{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, f, p, False)
+    _winslow_elliptic(m, f, p, False, n_outer=3)
+    dt = time.perf_counter() - t
+    print(f"{amp:4.0f} {'D spring->MA n3':>22} {dt:8.2f} "
+          f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+          f"{quality(m):11.4f}", flush=True)
diff --git a/scripts/precond_mild.py b/scripts/precond_mild.py
new file mode 100644
index 00000000..1e5b0789
--- /dev/null
+++ b/scripts/precond_mild.py
@@ -0,0 +1,106 @@
+"""Mild-spring preconditioner for MA. Idea (user): spring at a weak
+metric (or few sweeps) sets the node-motion DIRECTION while keeping
+the mesh VALID (it degenerates only when pushed hard); then MA does
+the strong part stably and, from a pre-aligned start, hopefully
+more cheaply.
+
+For target AMP ∈ {8, 20}, identical fresh Annulus setups:
+  A   MA only  (full AMP)                         [baseline]
+  Bm  spring only, MILD (AMP=2)                    [stays valid?]
+  E   spring(AMP=2) -> MA(full, n_outer=1)
+  F   spring(AMP=2) -> MA(full, n_outer=3 resid.)
+  G   spring(full AMP, 30 sweeps) -> MA(full,n1)   ["few iters"]
+Report wall time, honest deep/near, mesh quality minA/meanA
+(0 ⇒ degenerate sliver; healthy ~0.02–0.3).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _winslow_elliptic, _edge_pairs,
+    _auto_pinned_labels, _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+SPRING_MILD_AMP = 2.0
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def quality(mesh):
+    tris = _tri_cells(mesh.dm)
+    if tris is None:
+        return float("nan")
+    a = np.abs(_signed_areas(np.asarray(mesh.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    full = 1.0 + amp * sympy.exp(
+        -(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    mild = 1.0 + SPRING_MILD_AMP * sympy.exp(
+        -(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, full, mild
+
+
+def show(amp, name, m, e, dt):
+    print(f"{amp:4.0f} {name:>26} {dt:8.2f} "
+          f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+          f"{quality(m):11.4f}", flush=True)
+
+
+print(f"{'AMP':>4} {'variant':>26} {'time/s':>8} "
+      f"{'deep/near':>10} {'minA/meanA':>11}")
+print("-" * 64)
+for amp in (8.0, 20.0):
+    m, full, mild = case(amp, f"A{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter(); _winslow_elliptic(m, full, p, False)
+    show(amp, "A MA only", m, e, time.perf_counter() - t)
+
+    m, full, mild = case(amp, f"Bm{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter(); _winslow_spring(m, mild, p, False)
+    show(amp, "Bm spring MILD only", m, e, time.perf_counter() - t)
+
+    m, full, mild = case(amp, f"E{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, mild, p, False)
+    _winslow_elliptic(m, full, p, False, n_outer=1)
+    show(amp, "E springMILD->MA n1", m, e, time.perf_counter() - t)
+
+    m, full, mild = case(amp, f"F{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, mild, p, False)
+    _winslow_elliptic(m, full, p, False, n_outer=3)
+    show(amp, "F springMILD->MA n3", m, e, time.perf_counter() - t)
+
+    m, full, mild = case(amp, f"G{int(amp)}")
+    e = _edge_pairs(m.dm); p = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    _winslow_spring(m, full, p, False, n_sweeps=30)
+    _winslow_elliptic(m, full, p, False, n_outer=1)
+    show(amp, "G springFULL30->MA n1", m, e,
+         time.perf_counter() - t)
diff --git a/scripts/setup_sanity.py b/scripts/setup_sanity.py
new file mode 100644
index 00000000..299adb85
--- /dev/null
+++ b/scripts/setup_sanity.py
@@ -0,0 +1,139 @@
+"""Is the metric-grading SETUP correct? Checks the things common to
+EVERY method (so a bug here would make them all look weak):
+
+  1. fraction of PINNED vertices (over-pinning ⇒ nothing can move)
+  2. the metric ρ_tgt actually seen by the smoother
+     (uw.function.evaluate(metric, coords)): min/max + radial
+     profile (flat ⇒ no target ⇒ no grading possible)
+  3. the spring rest-length field L0 deep-vs-near ratio
+     (should be ~ (ρ_near/ρ_deep)^(1/2) ≈ 3 for AMP=8)
+  4. how far nodes actually move when smoothed, by radius band
+  5. per-node mean incident edge length by FINAL radius, deep/near
+"""
+from __future__ import annotations
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing import smoothing as S
+
+R_I, R_O, W, AMP, RES = 0.5, 1.0, 0.12, 8.0, 16
+
+mesh = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                          cellSize=1.0 / RES, qdegree=3)
+dm = mesh.dm
+pS0, pE0 = dm.getDepthStratum(0)
+nv = pE0 - pS0
+
+r0 = uw.discretisation.MeshVariable(
+    "r0s", mesh, vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+X0 = np.asarray(mesh.X.coords)
+rad0 = np.sqrt((X0 ** 2).sum(axis=1))
+r0.data[:, 0] = rad0
+metric = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / W) ** 2)
+
+# --- 1. pinning ---
+pinned_labels = S._auto_pinned_labels(mesh)
+# re-fetch dm AFTER MeshVariable creation (stale-DM footgun)
+dm = mesh.dm
+is_pinned = S._pinned_mask(dm, pinned_labels)
+print(f"boundaries on mesh : "
+      f"{[getattr(b,'name',None) for b in mesh.boundaries]}")
+print(f"auto-pinned labels : {pinned_labels}")
+print(f"vertices           : {nv}")
+print(f"pinned             : {is_pinned.sum()} "
+      f"({100.0*is_pinned.sum()/nv:.1f}%)   "
+      f"free: {(~is_pinned).sum()}")
+# how many pinned are actually ON a boundary ring (r≈R_I or R_O)?
+on_ring = (np.abs(rad0 - R_O) < 1e-6) | (np.abs(rad0 - R_I) < 1e-6)
+print(f"verts on a ring    : {on_ring.sum()}   "
+      f"pinned-but-interior: "
+      f"{int((is_pinned & ~on_ring).sum())}")
+
+# --- 2. metric actually seen by the smoother ---
+coords = np.asarray(mesh.X.coords)
+rho = np.asarray(uw.function.evaluate(metric, coords)).reshape(-1)
+print(f"\nrho via uw.function.evaluate(metric, coords):")
+print(f"  min={rho.min():.4f}  max={rho.max():.4f}  "
+      f"mean={rho.mean():.4f}")
+# radial profile
+for lo in np.linspace(R_I, R_O, 6)[:-1]:
+    hi = lo + (R_O - R_I) / 5
+    m = (rad0 >= lo) & (rad0 < hi)
+    if m.any():
+        print(f"  r∈[{lo:.2f},{hi:.2f})  rho mean={rho[m].mean():.3f}"
+              f"  (analytic f={1+AMP*np.exp(-(((lo+hi)/2-R_O)/W)**2):.3f})")
+
+# --- 3. rest-length field L0 ---
+edges = S._edge_pairs(dm)
+v0, v1 = edges[:, 0], edges[:, 1]
+w = np.maximum(rho, 1e-30) ** (-0.5)
+w_edge = 0.5 * (w[v0] + w[v1])
+e = coords[v1] - coords[v0]
+Lc = np.linalg.norm(e, axis=1)
+L0 = (Lc.sum() / w_edge.sum()) * w_edge
+emid = 0.5 * (rad0[v0] + rad0[v1])
+near = emid > (R_O - W)
+deep = emid < (R_O - 0.30)
+rho_edge = 0.5 * (rho[v0] + rho[v1])
+print(f"\nspring rest length L0:")
+print(f"  near-band mean L0={L0[near].mean():.4f}  "
+      f"deep mean L0={L0[deep].mean():.4f}  "
+      f"deep/near={L0[deep].mean()/L0[near].mean():.2f}  "
+      f"(want (rho_near/rho_deep)^0.5 = "
+      f"{(rho_edge[near].mean()/rho_edge[deep].mean())**0.5:.2f})")
+
+# --- 4. actual node motion after smoothing ---
+S.smooth_mesh_interior(mesh, metric=metric, verbose=False)
+c1 = np.asarray(mesh.X.coords)
+dr = c1 - X0
+dmag = np.linalg.norm(dr, axis=1)
+print(f"\nnode displacement after smooth_mesh_interior:")
+print(f"  max|dx|={dmag.max():.4f}  mean|dx|={dmag.mean():.4f}  "
+      f"moved(>1e-9): {(dmag>1e-9).sum()}/{nv}")
+for lo in np.linspace(R_I, R_O, 6)[:-1]:
+    hi = lo + (R_O - R_I) / 5
+    m = (rad0 >= lo) & (rad0 < hi)
+    if m.any():
+        print(f"  r0∈[{lo:.2f},{hi:.2f})  mean|dx|={dmag[m].mean():.4f}"
+              f"  mean Δr={(np.sqrt((c1[m]**2).sum(1))-rad0[m]).mean():+.4f}")
+
+# --- 5. HONEST per-node radial-spacing metric (not centroid-band) ---
+# For each FINAL node, its mean incident edge length, binned by its
+# FINAL radius. Strong grading ⇒ near-surface mean edge ≪ deep.
+rad1 = np.sqrt((c1 ** 2).sum(axis=1))
+ev1 = c1[v1] - c1[v0]
+Le1 = np.linalg.norm(ev1, axis=1)
+node_edgelen = np.zeros(nv)
+cnt = np.zeros(nv)
+for a in (v0, v1):
+    np.add.at(node_edgelen, a, Le1)
+    np.add.at(cnt, a, 1.0)
+node_edgelen /= np.maximum(cnt, 1.0)
+
+# initial (uniform-ish) reference
+ev0 = X0[v1] - X0[v0]
+Le0 = np.linalg.norm(ev0, axis=1)
+node_edgelen0 = np.zeros(nv)
+cnt0 = np.zeros(nv)
+for a in (v0, v1):
+    np.add.at(node_edgelen0, a, Le0)
+    np.add.at(cnt0, a, 1.0)
+node_edgelen0 /= np.maximum(cnt0, 1.0)
+
+print("\nHONEST metric — mean incident edge length by FINAL radius:")
+print(f"{'radius band':>14} {'before':>9} {'after':>9} {'after/bef':>10}")
+bands = [(0.50, 0.70), (0.70, 0.85), (0.85, 0.95), (0.95, 1.00)]
+res = {}
+for lo, hi in bands:
+    mb = (rad1 >= lo) & (rad1 < hi)
+    m0 = (rad0 >= lo) & (rad0 < hi)
+    if mb.any() and m0.any():
+        a_ = node_edgelen[mb].mean()
+        b_ = node_edgelen0[m0].mean()
+        res[(lo, hi)] = a_
+        print(f"  [{lo:.2f},{hi:.2f})  {b_:9.4f} {a_:9.4f} "
+              f"{a_/b_:10.3f}")
+if (0.50, 0.70) in res and (0.95, 1.00) in res:
+    print(f"\n  >>> deep/near edge ratio AFTER = "
+          f"{res[(0.50,0.70)]/res[(0.95,1.00)]:.2f}  "
+          f"(uniform start ≈ 1.0; exact equidistribution ~10)")
diff --git a/scripts/show_metric_mesh.py b/scripts/show_metric_mesh.py
new file mode 100644
index 00000000..dfdebc64
--- /dev/null
+++ b/scripts/show_metric_mesh.py
@@ -0,0 +1,152 @@
+"""Visual + honest-metric comparison of the metric-driven mesh
+grading on an UNDEFORMED uniform Annulus.
+
+Two methods on identical setups, side by side:
+  row 1 — elastic-spring equilibrium  (smooth_mesh_interior default)
+  row 2 — Monge–Ampère / BFO          (_winslow_elliptic, preserved)
+columns — AMP = 0, 2, 8, 20  in  f(r0)=1+AMP·exp(-((r0-R_O)/W)^2)
+
+r0 is a degree-1 scalar set ONCE to the initial radius (Lagrangian).
+
+The grading number printed/annotated is the HONEST metric:
+per-node mean incident edge length binned by the node's FINAL
+radius, deep/near. (The old centroid-band edge_ratio averaged the
+thin strong near-surface compression with the bulk Lagrangian
+shift and understated grading by ~40% — that was a validation
+bug, not a method failure.)
+
+Outputs:
+  /tmp/metric_mesh/meshes.png   (mesh pictures — judge visually)
+  /tmp/metric_mesh/case_*.npz
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import matplotlib.pyplot as plt
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_elliptic, _edge_pairs)
+
+OUT = "/tmp/metric_mesh"
+os.makedirs(OUT, exist_ok=True)
+
+R_O, R_I = 1.0, 0.5
+RES = 16
+WIDTH = 0.12
+AMPS = [0.0, 2.0, 8.0, 20.0]
+
+
+def mesh_triangles(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    tris = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            tris.append(vs)
+    return np.asarray(tris, dtype=np.int64)
+
+
+def honest_ratio(coords, edges):
+    """deep/near ratio of per-node mean incident edge length,
+    binned by each node's FINAL radius. ~1 = no grading;
+    >1 = refined near surface (the design intent)."""
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv)
+    c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le)
+        np.add.at(c, a, 1.0)
+    nodelen = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)            # r∈[0.50,0.70)
+    near = (r > R_O - 0.05) & (r <= R_O + 1e-9)     # r∈(0.95,1.00]
+    if not deep.any() or not near.any():
+        return float("nan")
+    return float(nodelen[deep].mean() / nodelen[near].mean())
+
+
+def build_case(amp, idx):
+    mesh = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                              cellSize=1.0 / RES, qdegree=3)
+    TRI = mesh_triangles(mesh)
+    c0 = np.asarray(mesh.X.coords).copy()
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{idx}", mesh, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(mesh.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return mesh, TRI, c0, f
+
+
+results = {"spring": [], "ma": []}
+for k, amp in enumerate(AMPS):
+    # --- elastic-spring (public API default metric path) ---
+    mesh, TRI, c0, f = build_case(amp, f"sp{k}")
+    edges = _edge_pairs(mesh.dm)
+    uw.meshing.smooth_mesh_interior(mesh, metric=f, verbose=False)
+    c1 = np.asarray(mesh.X.coords).copy()
+    rsp = honest_ratio(c1, edges)
+    r0sp = honest_ratio(c0, edges)
+    results["spring"].append((amp, c0, c1, TRI, r0sp, rsp))
+    np.savez(os.path.join(OUT, f"case_spring_amp{int(amp)}.npz"),
+             coords0=c0, coords1=c1, tri=TRI)
+    uw.pprint(f"[spring] AMP={amp:5.1f}  honest deep/near "
+              f"{r0sp:.2f} -> {rsp:.2f}")
+
+    # --- Monge–Ampère / BFO (preserved, called directly) ---
+    mesh, TRI, c0, f = build_case(amp, f"ma{k}")
+    edges = _edge_pairs(mesh.dm)
+    pinned = uw.meshing.smoothing._auto_pinned_labels(mesh)
+    _winslow_elliptic(mesh, f, pinned, False)
+    c1 = np.asarray(mesh.X.coords).copy()
+    rma = honest_ratio(c1, edges)
+    r0ma = honest_ratio(c0, edges)
+    results["ma"].append((amp, c0, c1, TRI, r0ma, rma))
+    np.savez(os.path.join(OUT, f"case_ma_amp{int(amp)}.npz"),
+             coords0=c0, coords1=c1, tri=TRI)
+    uw.pprint(f"[MA]     AMP={amp:5.1f}  honest deep/near "
+              f"{r0ma:.2f} -> {rma:.2f}")
+
+if uw.mpi.rank == 0:
+    n = len(AMPS)
+    fig, axes = plt.subplots(2, n, figsize=(3.6 * n, 7.4),
+                             facecolor="white")
+    th = np.linspace(0, 2 * np.pi, 240)
+    rows = [("Elastic-spring equilibrium", results["spring"]),
+            ("Monge–Ampère (BFO, preserved)", results["ma"])]
+    for ri, (label, rows_data) in enumerate(rows):
+        for ci, (amp, c0, c1, TRI, r0v, r1v) in enumerate(
+                rows_data):
+            ax = axes[ri, ci]
+            ax.set_facecolor("white")
+            ax.triplot(c1[:, 0], c1[:, 1], TRI,
+                       color="black", lw=0.4)
+            ax.plot(R_O * np.cos(th), R_O * np.sin(th),
+                    color="tab:red", lw=1.0)
+            ax.plot(R_I * np.cos(th), R_I * np.sin(th),
+                    color="tab:blue", lw=1.0)
+            ttl = ("uniform (AMP=0)" if amp == 0
+                   else f"AMP={amp:g}")
+            ax.set_title(f"{ttl}\nhonest deep/near = {r1v:.2f}",
+                         fontsize=10)
+            ax.set_aspect("equal")
+            ax.set_xticks([])
+            ax.set_yticks([])
+        axes[ri, 0].set_ylabel(label, fontsize=11)
+    fig.suptitle(
+        f"Metric-driven grading, undeformed Annulus (res={RES})  "
+        f"f(r0)=1+AMP·exp(-((r0-{R_O:g})/{WIDTH:g})^2)\n"
+        f"honest metric = per-node mean incident edge by FINAL "
+        f"radius, deep/near  (1.0 = none; exact OT ≈ 10 at AMP=8)",
+        fontsize=11)
+    fig.tight_layout(rect=[0, 0, 1, 0.96])
+    out_png = os.path.join(OUT, "meshes.png")
+    fig.savefig(out_png, dpi=140, bbox_inches="tight")
+    print(f"\nSaved {out_png}")
diff --git a/scripts/slip_test.py b/scripts/slip_test.py
new file mode 100644
index 00000000..f19f6ce3
--- /dev/null
+++ b/scripts/slip_test.py
@@ -0,0 +1,134 @@
+"""Does boundary tangential slip relieve the volumetric spring's
+touchy/anisotropic refinement? Volumetric spring, slip OFF vs ON,
+on the localised INTERIOR blob (where the streakiness showed) and
+the surface band. Plots a zoom for visual judgement.
+
+SAFETY CHECK (user concern: nodes drifting off the surface): we
+report max |r_final − r_orig| over boundary nodes — with the
+per-ring radius projection this MUST be ~0 (slip is purely
+tangential; the radial DOF is removed).
+"""
+from __future__ import annotations
+import os
+import time
+import numpy as np
+import matplotlib.pyplot as plt
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _edge_pairs, _auto_pinned_labels,
+    _pinned_mask, _tri_cells, _signed_areas)
+
+OUT = "/tmp/metric_mesh"; os.makedirs(OUT, exist_ok=True)
+R_O, R_I, RES = 1.0, 0.5, 16
+CX, CY, W = 0.78, 0.0, 0.12
+AMPS = [8.0, 20.0]
+
+
+def mesh_tris(m):
+    dm = m.dm
+    cS, cE = dm.getHeightStratum(0)
+    pS, pE = dm.getDepthStratum(0)
+    o = []
+    for c in range(cS, cE):
+        cl = dm.getTransitiveClosure(c)[0]
+        vs = [p - pS for p in cl if pS <= p < pE]
+        if len(vs) == 3:
+            o.append(vs)
+    return np.asarray(o, np.int64)
+
+
+def far_near(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); cnt = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(cnt, a, 1.0)
+    nl = s / np.maximum(cnt, 1.0)
+    d = np.hypot(coords[:, 0] - CX, coords[:, 1] - CY)
+    return float(nl[d > 4 * W].mean() / nl[d < W].mean())
+
+
+def quality(coords, TRI):
+    a = np.abs(_signed_areas(coords, TRI))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    X0v = uw.discretisation.MeshVariable(
+        f"X0_{tag}", m, vtype=uw.VarType.VECTOR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    X0v.data[:, 0] = X0[:, 0]
+    X0v.data[:, 1] = X0[:, 1]
+    d2 = (X0v.sym[0] - CX) ** 2 + (X0v.sym[1] - CY) ** 2
+    f = 1.0 + amp * sympy.exp(-d2 / W ** 2)
+    return m, f, X0
+
+
+def bnd_drift(m, X0):
+    """max radial change of boundary nodes about the origin."""
+    isb = _pinned_mask(m.dm, _auto_pinned_labels(m))
+    c1 = np.asarray(m.X.coords)
+    r0 = np.linalg.norm(X0[isb], axis=1)
+    r1 = np.linalg.norm(c1[isb], axis=1)
+    return float(np.abs(r1 - r0).max())
+
+
+res = []
+for amp in AMPS:
+    row = []
+    for slip in (False, True):
+        m, f, X0 = case(amp, f"{int(amp)}{int(slip)}")
+        e = _edge_pairs(m.dm); TRI = mesh_tris(m)
+        p = _auto_pinned_labels(m)
+        t = time.perf_counter()
+        _winslow_spring(m, f, p, False, boundary_slip=slip)
+        dt = time.perf_counter() - t
+        c1 = np.asarray(m.X.coords).copy()
+        print(f"AMP={amp:4.0f} slip={str(slip):5} "
+              f"far/near={far_near(c1, e):.2f} "
+              f"q={quality(c1, TRI):.3f} "
+              f"bnd_drift={bnd_drift(m, X0):.2e} "
+              f"{dt:.2f}s", flush=True)
+        row.append((slip, c1, TRI))
+    res.append((amp, row))
+
+if uw.mpi.rank == 0:
+    th = np.linspace(0, 2 * np.pi, 240)
+    fig, ax = plt.subplots(2, 4, figsize=(24, 12),
+                           facecolor="white")
+    for ri, (amp, row) in enumerate(res):
+        for ci, (slip, c1, TRI) in enumerate(row):
+            # full
+            a = ax[ri, 2 * ci]
+            a.triplot(c1[:, 0], c1[:, 1], TRI, "k-", lw=0.5)
+            a.plot(R_O * np.cos(th), R_O * np.sin(th),
+                   color="tab:red", lw=1.2)
+            a.add_patch(plt.Circle((CX, CY), W, fill=False,
+                        ec="tab:green", lw=2))
+            a.set_title(f"AMP={amp:g} slip={slip} (full)",
+                        fontsize=13)
+            a.set_aspect("equal"); a.set_xticks([])
+            a.set_yticks([])
+            # zoom on blob
+            a = ax[ri, 2 * ci + 1]
+            a.triplot(c1[:, 0], c1[:, 1], TRI, "k-", lw=0.9)
+            a.add_patch(plt.Circle((CX, CY), W, fill=False,
+                        ec="tab:green", lw=2))
+            a.set_xlim(CX - 3 * W, CX + 3 * W)
+            a.set_ylim(CY - 3 * W, CY + 3 * W)
+            a.set_title(f"AMP={amp:g} slip={slip} (blob zoom)",
+                        fontsize=13)
+            a.set_aspect("equal"); a.set_xticks([])
+            a.set_yticks([])
+    fig.suptitle("Volumetric spring: boundary slip OFF vs ON — "
+                 "does tangential rim motion relieve the "
+                 "anisotropy at the interior blob?", fontsize=14)
+    fig.tight_layout(rect=[0, 0, 1, 0.96])
+    pth = os.path.join(OUT, "slip.png")
+    fig.savefig(pth, dpi=130, bbox_inches="tight")
+    print("Saved", pth)
diff --git a/scripts/spring_converge.py b/scripts/spring_converge.py
new file mode 100644
index 00000000..1b43f5b7
--- /dev/null
+++ b/scripts/spring_converge.py
@@ -0,0 +1,53 @@
+"""Did the spring PCG actually converge — and does converging it
+properly grade more? Run AMP=8 with increasing iteration budgets,
+report final |g| (→0 ⇒ true equilibrium reached), rms(L-L0)/L0,
+the HONEST deep/near grading, and wall time.
+
+If |g|→~1e-6 and grading plateaus ⇒ the converged spring
+equilibrium IS that weak (fixed-topology frustration). If grading
+keeps rising as |g| falls ⇒ 300 iters was just too few (cheap to
+fix — more PCG iters).
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _edge_pairs, _auto_pinned_labels)
+
+R_O, R_I, WIDTH, RES, AMP = 1.0, 0.5, 0.12, 16, 8.0
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+for n in (300, 1000, 3000, 10000):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{n}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + AMP * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    e = _edge_pairs(m.dm)
+    pin = _auto_pinned_labels(m)
+    t = time.perf_counter()
+    # verbose prints the final "spring PCG iter N/N: ... |g|=..."
+    _winslow_spring(m, f, pin, True, n_sweeps=n)
+    dt = time.perf_counter() - t
+    rr = honest_ratio(np.asarray(m.X.coords), e)
+    print(f"==> n_sweeps={n:>6}  time={dt:7.2f}s  "
+          f"honest deep/near={rr:.3f}", flush=True)
diff --git a/scripts/stagnant_lid_R_compare_plot.py b/scripts/stagnant_lid_R_compare_plot.py
new file mode 100644
index 00000000..4a77a512
--- /dev/null
+++ b/scripts/stagnant_lid_R_compare_plot.py
@@ -0,0 +1,126 @@
+"""Adapt the step-125 T field at multiple R values and render
+all the resulting meshes side by side over the |∇T| field.
+"""
+from __future__ import annotations
+import os
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+
+
+SRC_DIR = os.path.expanduser(
+    '~/+Simulations/StagnantLid/uniform_res16_Ra1e7_dEta1e4')
+SRC_STEM = "sl_uniform_res16_Ra1e7_dEta1e4_step00125"
+OUT_BASE = os.path.expanduser(
+    '~/+Simulations/StagnantLid/R_compare')
+R_LIST = [1.0, 1.2, 1.5, 2.0, 3.0, 4.0, 6.0, 10.0]
+
+os.makedirs(OUT_BASE, exist_ok=True)
+
+
+def load_uniform():
+    m = uw.discretisation.Mesh(os.path.join(
+        SRC_DIR, f"{SRC_STEM}.mesh.00000.h5"))
+    T = uw.discretisation.MeshVariable(
+        "T_v2p1", m, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T.read_timestep(SRC_STEM, "T_v2p1", 0, outputPath=SRC_DIR)
+    return m, T
+
+
+def gradT_mag(mesh, T):
+    X = mesh.CoordinateSystem.X
+    return sympy.sqrt(T.sym[0].diff(X[0]) ** 2
+                      + T.sym[0].diff(X[1]) ** 2)
+
+
+# ---- 1. Adapt + save snapshots for each R ----------------------
+
+for R in R_LIST:
+    out_dir = os.path.join(OUT_BASE, f"R{R}")
+    os.makedirs(out_dir, exist_ok=True)
+    snap = os.path.join(out_dir, "adapted.mesh.00000.h5")
+    if os.path.exists(snap):
+        print(f"R={R}: already adapted, skipping")
+        continue
+    print(f"R={R}: adapting...")
+    m, T = load_uniform()
+    if R > 1.0:
+        rho = uw.meshing.metric_density_from_gradient(
+            m, T, amp=8.0, lo_percentile=50.0,
+            hi_percentile=97.0, name=f"R{R}")
+        t0 = time.time()
+        uw.meshing.smooth_mesh_interior(
+            m, metric=rho, method="anisotropic",
+            method_kwargs=dict(resolution_ratio=R,
+                               relax=0.2, n_outer=12))
+        print(f"  adapted in {time.time() - t0:.1f}s")
+    m.write_timestep(
+        filename="adapted", index=0, outputPath=out_dir,
+        meshVars=[T], meshUpdates=True, create_xdmf=True)
+
+
+# ---- 2. Render all R in one plot -------------------------------
+
+# Pre-pass: shared |∇T| color range
+g_max = 0.0
+loaded = []
+for R in R_LIST:
+    m_path = os.path.join(OUT_BASE, f"R{R}",
+                          "adapted.mesh.00000.h5")
+    m = uw.discretisation.Mesh(m_path)
+    T = uw.discretisation.MeshVariable(
+        "T_v2p1", m, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T.read_timestep("adapted", "T_v2p1", 0,
+                    outputPath=os.path.join(OUT_BASE, f"R{R}"))
+    loaded.append((R, m, T))
+    pv_T = vis.meshVariable_to_pv_mesh_object(T)
+    g = vis.scalar_fn_to_pv_points(pv_T, gradT_mag(m, T))
+    g_max = max(g_max, float(np.nanmax(g)))
+print(f"global |∇T|max = {g_max:.3e}")
+
+ncols = len(R_LIST)
+pl = pv.Plotter(shape=(1, ncols), off_screen=True,
+                window_size=(900 * ncols, 900),
+                border=False)
+pl.set_background("white")
+
+for col, (R, m, T) in enumerate(loaded):
+    pv_g = vis.meshVariable_to_pv_mesh_object(T)
+    pv_g.point_data["gradT"] = vis.scalar_fn_to_pv_points(
+        pv_g, gradT_mag(m, T))
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+
+    # Compute alignment for the title
+    rho = uw.meshing.metric_density_from_gradient(
+        m, T, amp=8.0, lo_percentile=50.0, hi_percentile=97.0,
+        name=f"R{R}_plot")
+    mm = uw.meshing.mesh_metric_mismatch(m, rho,
+                                          resolution_ratio=R)
+
+    pl.subplot(0, col)
+    title = (f"R={R}\n"
+             f"alignment r={mm['alignment']:+.2f}\n"
+             f"misalign={mm['misalignment']:.2f}")
+    pl.add_text(title, font_size=14, color="black")
+    pl.add_mesh(pv_g, scalars="gradT", cmap="Greens",
+                clim=(0.0, g_max), show_edges=False,
+                lighting=False,
+                show_scalar_bar=(col == ncols - 1),
+                scalar_bar_args=dict(title="|∇T|",
+                                     color="black"))
+    pl.add_mesh(edges, color="black", line_width=0.8,
+                lighting=False, opacity=0.65)
+    pl.view_xy()
+    pl.camera.zoom(1.3)
+
+out = os.path.join(OUT_BASE, "plot_R_compare.png")
+pl.screenshot(out)
+pl.close()
+print(f"wrote {out}")
diff --git a/scripts/stagnant_lid_R_sweep.py b/scripts/stagnant_lid_R_sweep.py
new file mode 100644
index 00000000..8f6df542
--- /dev/null
+++ b/scripts/stagnant_lid_R_sweep.py
@@ -0,0 +1,182 @@
+"""Adapt-strength sweep: how does resolution_ratio affect both
+mesh quality and Stokes-solve cost on the adapted mesh?
+
+For each R in {1.0=no-op, 1.2, 1.5, 2.0, 3.0}:
+  - adapt the step-125 T snapshot
+  - report mesh-quality stats (minA/meanA, area max/min)
+  - run a cold+warm Stokes solve with default GAMG, record wall+iters
+"""
+from __future__ import annotations
+import os
+import time
+import argparse
+import numpy as np
+import sympy
+
+import underworld3 as uw
+
+
+SRC = os.path.expanduser(
+    '~/+Simulations/StagnantLid/uniform_res16_Ra1e7_dEta1e4')
+STEM = "sl_uniform_res16_Ra1e7_dEta1e4_step00125"
+Ra = 1.0e7
+theta_FK = float(np.log(1.0e4))
+
+p = argparse.ArgumentParser()
+p.add_argument('--R-list', type=str,
+               default="1.0,1.2,1.5,2.0,3.0")
+args = p.parse_args()
+
+R_list = [float(x) for x in args.R_list.split(',')]
+
+
+def build_problem(mesh, T, V, P):
+    X = mesh.CoordinateSystem.X
+    r_sym = sympy.sqrt(X[0] ** 2 + X[1] ** 2)
+    unit_r = mesh.CoordinateSystem.unit_e_0
+    s = uw.systems.Stokes(mesh, velocityField=V, pressureField=P)
+    s.constitutive_model = uw.constitutive_models.ViscousFlowModel
+    s.constitutive_model.Parameters.shear_viscosity_0 = (
+        sympy.exp(theta_FK * (1 - T.sym[0])))
+    s.tolerance = 1.0e-5
+    s.penalty = 0.0
+    s.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+    KFS = 1.0e6
+    fs_term = (KFS * V.sym.dot(unit_r) * unit_r)
+    s.add_natural_bc(fs_term, mesh.boundaries.Upper.name)
+    T_cond = sympy.log(r_sym / 1.0) / sympy.log(0.5 / 1.0)
+    s.bodyforce = Ra * (T.sym[0] - T_cond) * unit_r
+    return s
+
+
+def mesh_stats(mesh):
+    from underworld3.meshing.smoothing import (
+        _tri_cells, _signed_areas)
+    tris = _tri_cells(mesh.dm)
+    A = np.abs(_signed_areas(np.asarray(mesh.X.coords), tris))
+    return dict(minA=A.min(), meanA=A.mean(), maxA=A.max(),
+                minA_meanA=A.min() / A.mean(),
+                area_ratio=A.max() / A.min())
+
+
+def run_solve(stokes, V, P, mode):
+    if mode == 'cold':
+        V.data[...] = 0.0
+        P.data[...] = 0.0
+        zero = True
+    else:
+        zero = False
+    t0 = time.time()
+    try:
+        stokes.solve(zero_init_guess=zero)
+        wall = time.time() - t0
+        reason = int(stokes.snes.getConvergedReason())
+        its = int(stokes.snes.getIterationNumber())
+    except Exception as e:
+        return dict(mode=mode, wall=float('nan'), reason=None,
+                    its=-1, vmax=float('nan'), err=str(e))
+    vmax = float(np.sqrt(V.data[:, 0] ** 2
+                         + V.data[:, 1] ** 2).max())
+    return dict(mode=mode, wall=wall, reason=reason,
+                its=its, vmax=vmax)
+
+
+print(f"adapt-strength sweep: R in {R_list}")
+print(f"{'R':>5} | {'minA/meanA':>10} {'A max/min':>10} | "
+      f"{'cold its':>8} {'cold wall':>10} "
+      f"{'warm its':>8} {'warm wall':>10}  {'|v|max':>10}")
+print("-" * 90)
+
+results = []
+for R in R_list:
+    # Fresh mesh load each round (avoid cross-R contamination)
+    mesh = uw.discretisation.Mesh(
+        os.path.join(SRC, f"{STEM}.mesh.00000.h5"))
+    T = uw.discretisation.MeshVariable(
+        "T_v2p1", mesh, vtype=uw.VarType.SCALAR, degree=3,
+        continuous=True)
+    V = uw.discretisation.MeshVariable(
+        "V_v2p1", mesh, vtype=uw.VarType.VECTOR, degree=2,
+        continuous=True)
+    P = uw.discretisation.MeshVariable(
+        "P_v2p1", mesh, vtype=uw.VarType.SCALAR, degree=1,
+        continuous=True)
+    T.read_timestep(STEM, "T_v2p1", 0, outputPath=SRC)
+    V.read_timestep(STEM, "V_v2p1", 0, outputPath=SRC)
+    P.read_timestep(STEM, "P_v2p1", 0, outputPath=SRC)
+
+    # Adapt (skip when R<=1 = no-op, just take uniform mesh stats)
+    if R > 1.0:
+        rho = uw.meshing.metric_density_from_gradient(
+            mesh, T, amp=8.0,
+            lo_percentile=50.0, hi_percentile=97.0,
+            name=f"R{R:.1f}")
+        old_X = np.asarray(mesh.X.coords).copy()
+        old_T = np.asarray(T.data).copy()
+        old_V = np.asarray(V.data).copy()
+        old_P = np.asarray(P.data).copy()
+        t_adapt0 = time.time()
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="anisotropic",
+            method_kwargs=dict(resolution_ratio=R,
+                               relax=0.2, n_outer=12))
+        t_adapt = time.time() - t_adapt0
+        # Remap
+        new_X = np.asarray(mesh.X.coords).copy()
+        new_Tx = np.asarray(T.coords).copy()
+        new_Vx = np.asarray(V.coords).copy()
+        new_Px = np.asarray(P.coords).copy()
+        mesh._deform_mesh(old_X)
+        T.data[...] = old_T
+        V.data[...] = old_V
+        P.data[...] = old_P
+        rT = np.asarray(uw.function.evaluate(
+            T.sym[0], new_Tx)).reshape(-1)
+        rV = np.asarray(uw.function.evaluate(V.sym, new_Vx))
+        rP = np.asarray(uw.function.evaluate(
+            P.sym[0], new_Px)).reshape(-1)
+        mesh._deform_mesh(new_X)
+        T.data[:, 0] = rT
+        V.data[...] = rV.reshape(V.data.shape)
+        P.data[:, 0] = rP
+        V_remap = np.asarray(V.data).copy()
+        P_remap = np.asarray(P.data).copy()
+    else:
+        t_adapt = 0.0
+        V_remap = np.asarray(V.data).copy()
+        P_remap = np.asarray(P.data).copy()
+
+    stats = mesh_stats(mesh)
+    stokes = build_problem(mesh, T, V, P)
+
+    cold = run_solve(stokes, V, P, 'cold')
+    # Warm: feed the remap V,P back
+    V.data[...] = V_remap
+    P.data[...] = P_remap
+    warm = run_solve(stokes, V, P, 'warm')
+
+    line = (f"{R:>5.2f} | {stats['minA_meanA']:>10.4f} "
+            f"{stats['area_ratio']:>10.2f} | "
+            f"{cold['its']:>8d} {cold['wall']:>9.2f}s "
+            f"{warm['its']:>8d} {warm['wall']:>9.2f}s  "
+            f"{cold['vmax']:>10.2e}")
+    print(line, flush=True)
+    results.append(dict(R=R, stats=stats, cold=cold, warm=warm,
+                        adapt_wall=t_adapt))
+
+# Save
+out = os.path.expanduser(
+    '~/+Simulations/StagnantLid/R_sweep_summary.npz')
+np.savez(out,
+         R=np.asarray([r['R'] for r in results]),
+         minA_meanA=np.asarray([r['stats']['minA_meanA']
+                                for r in results]),
+         area_ratio=np.asarray([r['stats']['area_ratio']
+                                for r in results]),
+         cold_its=np.asarray([r['cold']['its'] for r in results]),
+         cold_wall=np.asarray([r['cold']['wall'] for r in results]),
+         warm_its=np.asarray([r['warm']['its'] for r in results]),
+         warm_wall=np.asarray([r['warm']['wall'] for r in results]),
+         adapt_wall=np.asarray([r['adapt_wall'] for r in results]),
+         vmax=np.asarray([r['cold']['vmax'] for r in results]))
+print(f"\nsaved {out}")
diff --git a/scripts/stagnant_lid_adapt.py b/scripts/stagnant_lid_adapt.py
new file mode 100644
index 00000000..a70d61da
--- /dev/null
+++ b/scripts/stagnant_lid_adapt.py
@@ -0,0 +1,241 @@
+"""Take the step-125 settled state of the Ra=1e7 Δη=1e4 stagnant
+lid uniform-mesh run and adapt the mesh to the |∇T| metric: the
+catalogue's equidist mover at resolution_ratio=1.5 (validated
+production setting). Remap T/V/P onto the adapted nodes via FE
+evaluate (the validated local-FE remap from
+adaptive_saturation.py:adapt_local_fe_interp). Save the adapted
+state so we can use it as side (b) of the Stokes-solver preset sweep.
+
+Output: ~/+Simulations/StagnantLid/adapted_R15_Ra1e7_dEta1e4/
+  - adapted.mesh.{T,V,P}.00000.h5   — fields on the graded mesh
+  - plot_adapt_compare.png          — uniform vs adapted side-by-side
+"""
+from __future__ import annotations
+import os
+import sys
+import time
+import argparse
+import numpy as np
+import sympy
+
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+
+
+_t0 = time.time()
+_tprev = _t0
+
+
+def _stage(label):
+    global _tprev
+    t = time.time()
+    dt = t - _tprev
+    tot = t - _t0
+    print(f"  [{tot:6.2f}s | +{dt:5.2f}s]  {label}", flush=True)
+    _tprev = t
+
+
+p = argparse.ArgumentParser()
+p.add_argument('--src-dir', type=str,
+               default=os.path.expanduser(
+                   '~/+Simulations/StagnantLid/'
+                   'uniform_res16_Ra1e7_dEta1e4'))
+p.add_argument('--src-step', type=int, default=125)
+p.add_argument('--out-dir', type=str,
+               default=os.path.expanduser(
+                   '~/+Simulations/StagnantLid/'
+                   'adapted_R15_Ra1e7_dEta1e4'))
+p.add_argument('--resolution-ratio', type=float, default=1.5)
+p.add_argument('--amp', type=float, default=8.0)
+p.add_argument('--lo-pct', type=float, default=50.0)
+p.add_argument('--hi-pct', type=float, default=97.0)
+p.add_argument('--n-outer', type=int, default=12)
+p.add_argument('--relax', type=float, default=0.2)
+args = p.parse_args()
+
+
+# ---------------------- locate the source snapshot --------------
+
+src_dir = args.src_dir
+src_tag = os.path.basename(src_dir.rstrip('/'))
+stem = f"sl_{src_tag}_step{args.src_step:05d}"
+mesh_path = os.path.join(src_dir, f"{stem}.mesh.00000.h5")
+if not os.path.exists(mesh_path):
+    sys.exit(f"missing snapshot mesh: {mesh_path}")
+print(f"loading {stem} from {src_dir}")
+
+mesh = uw.discretisation.Mesh(mesh_path)
+_stage("mesh load")
+
+# Re-create MeshVariables matching the saved layout
+# (T degree=3 scalar; V degree=2 vector; P degree=1 scalar)
+T = uw.discretisation.MeshVariable(
+    "T_v2p1", mesh, vtype=uw.VarType.SCALAR,
+    degree=3, continuous=True, varsymbol="T")
+V = uw.discretisation.MeshVariable(
+    "V_v2p1", mesh, vtype=uw.VarType.VECTOR,
+    degree=2, continuous=True, varsymbol=r"\mathbf{v}")
+P = uw.discretisation.MeshVariable(
+    "P_v2p1", mesh, vtype=uw.VarType.SCALAR,
+    degree=1, continuous=True, varsymbol="p")
+
+T.read_timestep(stem, "T_v2p1", 0, outputPath=src_dir)
+V.read_timestep(stem, "V_v2p1", 0, outputPath=src_dir)
+P.read_timestep(stem, "P_v2p1", 0, outputPath=src_dir)
+_stage("field load (T,V,P)")
+
+print(f"  loaded: T=[{T.data.min():.3f},{T.data.max():.3f}], "
+      f"|v|max={float(np.sqrt(V.data[:,0]**2 + V.data[:,1]**2).max()):.2e}")
+
+
+# ---------------------- capture uniform mesh for the compare ----
+
+uniform_X = np.asarray(mesh.X.coords).copy()
+uniform_T = np.asarray(T.data).copy()
+uniform_V = np.asarray(V.data).copy()
+uniform_P = np.asarray(P.data).copy()
+
+
+# ---------------------- build metric + adapt --------------------
+
+print(f"building ρ ∝ |∇T| metric "
+      f"(amp={args.amp}, pct=[{args.lo_pct:.0f},{args.hi_pct:.0f}])")
+rho = uw.meshing.metric_density_from_gradient(
+    mesh, T, amp=args.amp,
+    lo_percentile=args.lo_pct, hi_percentile=args.hi_pct,
+    name="sl_adapt")
+_stage("metric build (|∇T| projection + percentile-normalise)")
+
+print(f"adapting (anisotropic, resolution_ratio={args.resolution_ratio}, "
+      f"n_outer={args.n_outer}, relax={args.relax})")
+old_X = uniform_X.copy()
+old_T = uniform_T.copy()
+old_V = uniform_V.copy()
+old_P = uniform_P.copy()
+
+uw.meshing.smooth_mesh_interior(
+    mesh, metric=rho, method="anisotropic",
+    method_kwargs=dict(
+        resolution_ratio=args.resolution_ratio,
+        relax=args.relax, n_outer=args.n_outer))
+_stage("mover (Winslow anisotropic MMPDE)")
+
+new_X = np.asarray(mesh.X.coords).copy()
+new_Tx = np.asarray(T.coords).copy()
+new_Vx = np.asarray(V.coords).copy()
+new_Px = np.asarray(P.coords).copy()
+
+
+# ---------------------- local-FE remap T, V, P ------------------
+# Per the catalogue's adapt_local_fe_interp: restore field on the
+# OLD geometry, FE-evaluate at NEW dof positions, write onto the
+# NEW geometry. (Topology-preserving ⇒ FE basis is the same; we
+# just sample the old solution at the new physical points.)
+
+mesh._deform_mesh(old_X)
+T.data[...] = old_T
+V.data[...] = old_V
+P.data[...] = old_P
+
+remap_T = np.asarray(uw.function.evaluate(
+    T.sym[0], new_Tx)).reshape(-1)
+remap_V = np.asarray(uw.function.evaluate(V.sym, new_Vx))
+remap_P = np.asarray(uw.function.evaluate(
+    P.sym[0], new_Px)).reshape(-1)
+
+mesh._deform_mesh(new_X)
+T.data[:, 0] = remap_T
+V.data[...] = remap_V.reshape(V.data.shape)
+P.data[:, 0] = remap_P
+_stage("FE remap T,V,P → new DOF coords")
+
+print(f"  after remap: T=[{T.data.min():.3f},{T.data.max():.3f}], "
+      f"|v|max={float(np.sqrt(V.data[:,0]**2 + V.data[:,1]**2).max()):.2e}")
+
+
+# ---------------------- save adapted snapshot -------------------
+
+os.makedirs(args.out_dir, exist_ok=True)
+mesh.write_timestep(
+    filename="adapted", index=0, outputPath=args.out_dir,
+    meshVars=[T, V, P], meshUpdates=True, create_xdmf=True)
+_stage("write adapted snapshot to disk")
+print(f"saved adapted snapshot to {args.out_dir}")
+
+
+# ---------------------- side-by-side comparison plot ------------
+
+print("rendering uniform vs adapted comparison...")
+pl = pv.Plotter(shape=(1, 2), off_screen=True,
+                window_size=(1800, 900), border=False)
+pl.set_background("white")
+
+# Compute |v|max once (same field on both panels)
+Vmax = float(np.sqrt(V.data[:, 0] ** 2
+                     + V.data[:, 1] ** 2).max())
+
+# Subplot 0: uniform mesh + T (restore uniform geometry to render)
+mesh._deform_mesh(old_X)
+T.data[...] = old_T
+V.data[...] = old_V
+
+pv_T_unif = vis.meshVariable_to_pv_mesh_object(T)
+pv_T_unif.point_data["T"] = np.asarray(T.data[:, 0])
+edges_unif = vis.mesh_to_pv_mesh(mesh).extract_all_edges()
+
+pl.subplot(0, 0)
+pl.add_text(f"uniform res-16  (step {args.src_step})",
+            font_size=14, color="black")
+pl.add_mesh(pv_T_unif, scalars="T", cmap="RdBu_r",
+            clim=(0.0, 1.0), show_edges=False, lighting=False,
+            show_scalar_bar=False)
+pl.add_mesh(edges_unif, color="#202020", line_width=0.7,
+            lighting=False, opacity=0.55)
+pl.view_xy()
+pl.camera.zoom(1.25)
+
+# Subplot 1: adapted mesh + remapped T
+mesh._deform_mesh(new_X)
+T.data[:, 0] = remap_T
+V.data[...] = remap_V.reshape(V.data.shape)
+
+pv_T_adapt = vis.meshVariable_to_pv_mesh_object(T)
+pv_T_adapt.point_data["T"] = np.asarray(T.data[:, 0])
+edges_adapt = vis.mesh_to_pv_mesh(mesh).extract_all_edges()
+
+pl.subplot(0, 1)
+pl.add_text(f"adapted (R={args.resolution_ratio}, "
+            f"|∇T| metric, amp={args.amp}, "
+            f"pct={args.lo_pct:.0f}/{args.hi_pct:.0f})",
+            font_size=14, color="black")
+pl.add_mesh(pv_T_adapt, scalars="T", cmap="RdBu_r",
+            clim=(0.0, 1.0), show_edges=False, lighting=False,
+            show_scalar_bar=True,
+            scalar_bar_args=dict(title="T", color="black"))
+pl.add_mesh(edges_adapt, color="#202020", line_width=0.7,
+            lighting=False, opacity=0.55)
+pl.view_xy()
+pl.camera.zoom(1.25)
+
+out_png = os.path.join(args.out_dir, "plot_adapt_compare.png")
+pl.screenshot(out_png)
+pl.close()
+print(f"saved {out_png}")
+
+
+# ---------------------- mesh stats ------------------------------
+
+# Quick mesh-quality numbers
+from underworld3.meshing.smoothing import _tri_cells, _signed_areas
+tris = _tri_cells(mesh.dm)
+A = np.abs(_signed_areas(np.asarray(mesh.X.coords), tris))
+A_uniform = np.abs(_signed_areas(uniform_X, tris))
+print("mesh stats:")
+print(f"  uniform : minA/meanA = {A_uniform.min()/A_uniform.mean():.3f}, "
+      f"A range [{A_uniform.min():.2e}, {A_uniform.max():.2e}]")
+print(f"  adapted : minA/meanA = {A.min()/A.mean():.3f}, "
+      f"A range [{A.min():.2e}, {A.max():.2e}], "
+      f"max/min = {A.max()/A.min():.2f}")
diff --git a/scripts/stagnant_lid_adapt_compare_plot.py b/scripts/stagnant_lid_adapt_compare_plot.py
new file mode 100644
index 00000000..a327c701
--- /dev/null
+++ b/scripts/stagnant_lid_adapt_compare_plot.py
@@ -0,0 +1,98 @@
+"""Clean comparison: uniform vs adapted mesh.
+  Row 1 — T field (no edges)
+  Row 2 — |∇T| field + mesh edges overlaid (the diagnostic — is
+           the refinement going where the metric demands?)
+Loads existing snapshots — no recomputation.
+"""
+from __future__ import annotations
+import os
+import numpy as np
+import sympy
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+pv.OFF_SCREEN = True
+
+
+def load(src_dir, stem):
+    m = uw.discretisation.Mesh(os.path.join(
+        src_dir, f"{stem}.mesh.00000.h5"))
+    T = uw.discretisation.MeshVariable(
+        f"T_view_{id(m)}", m, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    T.read_timestep(stem, "T_v2p1", 0, outputPath=src_dir)
+    return m, T
+
+
+U_DIR = os.path.expanduser(
+    '~/+Simulations/StagnantLid/uniform_res16_Ra1e7_dEta1e4')
+U_STEM = "sl_uniform_res16_Ra1e7_dEta1e4_step00125"
+A_DIR = os.path.expanduser(
+    '~/+Simulations/StagnantLid/adapted_R15_Ra1e7_dEta1e4')
+A_STEM = "adapted"
+
+mu, Tu = load(U_DIR, U_STEM)
+ma, Ta = load(A_DIR, A_STEM)
+
+
+def gradT_mag_sym(mesh, T):
+    X = mesh.CoordinateSystem.X
+    return sympy.sqrt(T.sym[0].diff(X[0]) ** 2
+                      + T.sym[0].diff(X[1]) ** 2)
+
+
+# Pre-pass: shared color limit for |∇T| so both panels are
+# comparable on the same scale.
+g_max = 0.0
+for (m, T) in [(mu, Tu), (ma, Ta)]:
+    pv_T = vis.meshVariable_to_pv_mesh_object(T)
+    g = vis.scalar_fn_to_pv_points(pv_T, gradT_mag_sym(m, T))
+    g_max = max(g_max, float(np.nanmax(g)))
+print(f"|∇T|max (shared clim) = {g_max:.3e}")
+
+
+pl = pv.Plotter(shape=(2, 2), off_screen=True,
+                window_size=(1600, 1600), border=False)
+pl.set_background("white")
+
+for col, (label, m, T) in enumerate([
+    ("uniform res-16", mu, Tu),
+    ("adapted R=1.5  ρ ∝ |∇T|", ma, Ta),
+]):
+    # Top row: T field (no edges)
+    pv_T = vis.meshVariable_to_pv_mesh_object(T)
+    pv_T.point_data["T"] = np.asarray(T.data[:, 0])
+    pl.subplot(0, col)
+    pl.add_text(label, font_size=14, color="black")
+    pl.add_mesh(pv_T, scalars="T", cmap="RdBu_r",
+                clim=(0.0, 1.0), show_edges=False, lighting=False,
+                show_scalar_bar=(col == 1),
+                scalar_bar_args=dict(title="T", color="black"))
+    pl.view_xy()
+    pl.camera.zoom(1.3)
+
+    # Bottom row: |∇T| + mesh edges (the diagnostic)
+    pv_g = vis.meshVariable_to_pv_mesh_object(T)
+    pv_g.point_data["gradT"] = vis.scalar_fn_to_pv_points(
+        pv_g, gradT_mag_sym(m, T))
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+    pl.subplot(1, col)
+    pl.add_text(f"|∇T| + mesh edges  ({label})",
+                font_size=14, color="black")
+    pl.add_mesh(pv_g, scalars="gradT", cmap="Greens",
+                clim=(0.0, g_max), show_edges=False,
+                lighting=False,
+                show_scalar_bar=(col == 1),
+                scalar_bar_args=dict(title="|∇T|",
+                                     color="black"))
+    pl.add_mesh(edges, color="black", line_width=0.7,
+                lighting=False, opacity=0.55)
+    pl.view_xy()
+    pl.camera.zoom(1.3)
+
+out = os.path.join(
+    A_DIR, "plot_adapt_compare_clean.png")
+pl.screenshot(out)
+pl.close()
+print(f"wrote {out}")
diff --git a/scripts/stagnant_lid_adapt_loop.py b/scripts/stagnant_lid_adapt_loop.py
new file mode 100644
index 00000000..fbe01ad1
--- /dev/null
+++ b/scripts/stagnant_lid_adapt_loop.py
@@ -0,0 +1,473 @@
+"""Stagnant-lid convection with PERIODIC adaptation in the time
+loop. Adapts every K steps using the named strategy
+(``--strategy med`` by default); the mover's
+``skip_threshold`` is active so adapts get skipped when the
+mesh is already aligned with the current metric.
+
+Loop pattern per step:
+  1. estimate dt
+  2. if (step % adapt_every == 0): build metric, call mover
+     with skip_threshold; FE-remap T, zero V,P if mesh moved
+  3. solve advdiff
+  4. solve Stokes (warm if no adapt this step; cold if adapt)
+  5. log Nu, vrms, T extents
+"""
+from __future__ import annotations
+import os
+import sys
+import time
+import argparse
+import numpy as np
+import sympy
+import underworld3 as uw
+
+
+p = argparse.ArgumentParser()
+p.add_argument('--src-dir', type=str,
+               default=os.path.expanduser(
+                   '~/+Simulations/StagnantLid/'
+                   'uniform_res16_Ra1e7_dEta1e4'))
+p.add_argument('--src-stem', type=str,
+               default='sl_uniform_res16_Ra1e7_dEta1e4_step00125')
+p.add_argument('--strategy', type=str, default='med',
+               choices=list(uw.meshing.ADAPT_STRATEGIES.keys()))
+p.add_argument('--adapt-every', type=int, default=5)
+p.add_argument('--n-steps', type=int, default=100)
+p.add_argument('--log-every', type=int, default=2)
+p.add_argument('--snapshot-every', type=int, default=20)
+p.add_argument('--out-tag', type=str, default=None)
+p.add_argument('--resume', action='store_true')
+p.add_argument('--grad-smooth-h0', type=float, default=0.0,
+               help='gradient_smoothing_length expressed as a '
+                    'multiple of mean h0 (background cell size). '
+                    '0 = no smoothing; 2.0 = L = 2·h0 (the '
+                    "production gradient-side de-noising).")
+p.add_argument('--refinement', type=float, default=0.0,
+               help='If > 0, use uw.meshing.follow_metric() with '
+                    'this refinement value instead of the legacy '
+                    'strategy-based path. coarsening="auto" '
+                    '(= refinement^(1/d)) and metric='
+                    '"front-following" are used. 0 = use the '
+                    'legacy --strategy path.')
+p.add_argument('--max-t', type=float, default=0.0,
+               help='If > 0, stop the loop as soon as t_sim '
+                    'reaches this value (in addition to the '
+                    '--n-steps cap).')
+p.add_argument('--from-perturbation', action='store_true',
+               help='Start from the near-conductive initial '
+                    'state (T_cond + small mode-5 perturbation, '
+                    'V=P=0) instead of loading from --src-stem. '
+                    'Builds a fresh Annulus(0.5, 1.0, '
+                    'cellSize=1/16, qdegree=3) to match the '
+                    'uniform-res16 setup.')
+p.add_argument('--skip-threshold', type=float, default=-1.0,
+               help='Override the adapt skip threshold. -1 (the '
+                    'default) means use the strategy default '
+                    '(typically 0.9). A value above 10 (e.g. 99) '
+                    'disables skipping: adapt every --adapt-every '
+                    'steps. 0 means always skip.')
+p.add_argument('--dt-mult', type=float, default=1.0,
+               help='Multiplier on estimate_dt (which returns '
+                    'the single-cell crossing time, CFL=1). SLCN '
+                    'is unconditionally stable, so multipliers '
+                    '> 1 (e.g. 3-5) give larger physical-time '
+                    'steps at modest accuracy cost. 1.0 is the '
+                    'historic default.')
+p.add_argument('--Ra', type=float, default=1.0e7,
+               help='Rayleigh number (default 1e7).')
+p.add_argument('--delta-eta', type=float, default=1.0e4,
+               help='Frank-Kamenetskii viscosity contrast '
+                    'eta(cold)/eta(hot). Default 1e4 (stiff '
+                    'stagnant lid). 100 = much softer lid, more '
+                    'dynamic flow.')
+p.add_argument('--pert-mode', type=int, default=5,
+               help='Azimuthal wavenumber of the initial T '
+                    'perturbation. Mode 5 gives the classic '
+                    'five-cell symmetric pattern; mode 1 breaks '
+                    'symmetry, drives drifting / time-varying '
+                    'convection.')
+p.add_argument('--pert-amplitude', type=float, default=0.01,
+               help='Amplitude of the initial T perturbation '
+                    '(relative to T_cond ~ 1).')
+args = p.parse_args()
+
+
+tag = args.out_tag or f"adapt_loop_{args.strategy}_every{args.adapt_every}"
+OUT_DIR = os.path.expanduser(
+    f'~/+Simulations/StagnantLid/{tag}')
+os.makedirs(OUT_DIR, exist_ok=True)
+
+Ra = float(args.Ra)
+theta_FK = float(np.log(float(args.delta_eta)))
+STRAT = uw.meshing.ADAPT_STRATEGIES[args.strategy]
+print(f"=== adaptive convection: strategy={args.strategy} "
+      f"({STRAT['description']}) ===")
+print(f"  every {args.adapt_every} steps, "
+      f"skip_threshold={STRAT['skip_threshold']}, "
+      f"R={STRAT['resolution_ratio']}")
+print(f"  out: {OUT_DIR}")
+
+
+# --- resume / fresh-start logic ---
+def _latest_snapshot():
+    import glob, re
+    fs = glob.glob(os.path.join(OUT_DIR, "step*.mesh.00000.h5"))
+    idxs = []
+    for f in fs:
+        m = re.search(r"step(\d+)\.mesh\.00000\.h5$",
+                      os.path.basename(f))
+        if m:
+            idxs.append(int(m.group(1)))
+    if not idxs:
+        return None
+    s_max = max(idxs)
+    return s_max, f"step{s_max:04d}"
+
+
+resume_info = _latest_snapshot() if args.resume else None
+if resume_info is not None:
+    resume_step, resume_label = resume_info
+    print(f"  resuming from {resume_label}")
+    mesh = uw.discretisation.Mesh(
+        os.path.join(OUT_DIR, f"{resume_label}.mesh.00000.h5"))
+elif args.from_perturbation:
+    resume_step = 0
+    resume_label = None
+    # Fresh Annulus matching the uniform-res16 setup.
+    mesh = uw.meshing.Annulus(
+        radiusOuter=1.0, radiusInner=0.5,
+        cellSize=1.0/16, qdegree=3)
+else:
+    resume_step = 0
+    resume_label = None
+    mesh = uw.discretisation.Mesh(
+        os.path.join(args.src_dir,
+                     f"{args.src_stem}.mesh.00000.h5"))
+
+T = uw.discretisation.MeshVariable(
+    "T_v2p1", mesh, vtype=uw.VarType.SCALAR, degree=3,
+    continuous=True, varsymbol="T")
+V = uw.discretisation.MeshVariable(
+    "V_v2p1", mesh, vtype=uw.VarType.VECTOR, degree=2,
+    continuous=True, varsymbol=r"\mathbf{v}")
+P = uw.discretisation.MeshVariable(
+    "P_v2p1", mesh, vtype=uw.VarType.SCALAR, degree=1,
+    continuous=True, varsymbol="p")
+
+if resume_label:
+    T.read_timestep(resume_label, "T_v2p1", 0, outputPath=OUT_DIR)
+    V.read_timestep(resume_label, "V_v2p1", 0, outputPath=OUT_DIR)
+    try:
+        P.read_timestep(resume_label, "P_v2p1", 0,
+                        outputPath=OUT_DIR)
+    except Exception:  # no readable P in the snapshot: restart P from zero
+        P.data[...] = 0.0
+elif args.from_perturbation:
+    # T_cond + amp · sin(m·θ) · sin(π(r-r_i)/(r_o-r_i))
+    r_inner, r_o = 0.5, 1.0
+    X = mesh.CoordinateSystem.X
+    r_sym = sympy.sqrt(X[0]**2 + X[1]**2)
+    th_sym = sympy.atan2(X[1], X[0])
+    T_cond = sympy.log(r_sym/r_o) / sympy.log(r_inner/r_o)
+    init_T = (float(args.pert_amplitude)
+              * sympy.sin(float(args.pert_mode) * th_sym)
+              * sympy.sin(np.pi * (r_sym - r_inner)
+                          / (r_o - r_inner))
+              + T_cond)
+    T.data[...] = np.asarray(uw.function.evaluate(
+        init_T, T.coords)).reshape(-1, 1)
+    V.data[...] = 0.0
+    P.data[...] = 0.0
+else:
+    T.read_timestep(args.src_stem, "T_v2p1", 0,
+                    outputPath=args.src_dir)
+    V.read_timestep(args.src_stem, "V_v2p1", 0,
+                    outputPath=args.src_dir)
+    P.read_timestep(args.src_stem, "P_v2p1", 0,
+                    outputPath=args.src_dir)
+print(f"  loaded T=[{T.data.min():.3f},{T.data.max():.3f}]  "
+      f"|v|max={float(np.sqrt(V.data[:,0]**2+V.data[:,1]**2).max()):.2e}")
+
+
+X = mesh.CoordinateSystem.X
+r_sym = sympy.sqrt(X[0] ** 2 + X[1] ** 2)
+unit_r = mesh.CoordinateSystem.unit_e_0
+
+stokes = uw.systems.Stokes(mesh, velocityField=V, pressureField=P)
+stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+stokes.constitutive_model.Parameters.shear_viscosity_0 = (
+    sympy.exp(theta_FK * (1 - T.sym[0])))
+stokes.tolerance = 1.0e-5
+stokes.penalty = 0.0
+stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+KFS = 1.0e6
+fs = (KFS * V.sym.dot(unit_r) * unit_r)
+stokes.add_natural_bc(fs, mesh.boundaries.Upper.name)
+T_cond = sympy.log(r_sym / 1.0) / sympy.log(0.5 / 1.0)
+stokes.bodyforce = Ra * (T.sym[0] - T_cond) * unit_r
+
+adv = uw.systems.AdvDiffusionSLCN(
+    mesh, u_Field=T, V_fn=V.sym, verbose=False,
+    theta=1.0, monotone_mode='clamp')
+adv.constitutive_model = uw.constitutive_models.DiffusionModel
+adv.constitutive_model.Parameters.diffusivity = 1.0
+adv.tolerance = 1.0e-4
+adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+
+
+# --- Nu evaluators ---
+# Two variants:
+#   _nu_surface()  surface heat flux on the cold (Upper) boundary
+#                  via uw.maths.BdIntegral — Nu = 1 at conduction.
+#                  The canonical thermal-convection Nusselt number.
+#   _nu_midshell() advective+conductive radial flux integrated on a
+#                  circle at r = R_EVAL via Projection + point sample.
+#                  Cheaper, but susceptible to projection noise.
+#
+# Q_COND is the analytic ΔT=1 conductive surface flux on the annulus:
+#   Q_COND = 2π / ln(R_o/R_i),  so Nu_conduction = 1.
+Q_COND = 2.0 * np.pi / np.log(1.0 / 0.5)
+_X = mesh.CoordinateSystem.X
+_n = mesh.Gamma_N
+# Outward conductive flux on the outer (Upper) cold boundary:
+#   q_n = -∇T · n̂      (n̂ outward, T decreasing outward ⇒ q_n > 0)
+_qn_outer = -(T.sym[0].diff(_X[0]) * _n[0]
+              + T.sym[0].diff(_X[1]) * _n[1])
+_bd_qn_upper = uw.maths.BdIntegral(
+    mesh=mesh, fn=_qn_outer,
+    boundary=mesh.boundaries.Upper.name)
+
+
+def _nu_surface():
+    """Surface Nusselt number via BdIntegral on the cold boundary."""
+    return float(_bd_qn_upper.evaluate()) / Q_COND
+
+
+# Legacy mid-shell variant, kept for cross-checking
+_qf = uw.discretisation.MeshVariable(
+    "qr_flux", mesh, vtype=uw.VarType.SCALAR,
+    degree=2, continuous=True)
+_qproj = uw.systems.Projection(mesh, _qf)
+_qproj.smoothing = 0.0
+_er = mesh.CoordinateSystem.unit_e_0
+_qproj.uw_function = (
+    (V.sym[0] * _er[0] + V.sym[1] * _er[1]) * T.sym[0]
+    - (T.sym[0].diff(X[0]) * _er[0]
+       + T.sym[0].diff(X[1]) * _er[1]))
+_TH_EVAL = np.linspace(0, 2 * np.pi, 720, endpoint=False)
+_R_EVAL = 0.75
+_PTS_EVAL = np.column_stack([_R_EVAL * np.cos(_TH_EVAL),
+                              _R_EVAL * np.sin(_TH_EVAL)])
+
+
+def _nu_midshell():
+    _qproj.solve()
+    q = np.asarray(uw.function.evaluate(
+        _qf.sym[0], _PTS_EVAL)).reshape(-1)
+    return float(q.mean() * _R_EVAL * 2.0 * np.pi) / Q_COND
+
+
+# Default Nu reported in the history is now the surface variant.
+_nu = _nu_surface
+
+
+def snapshot(step):
+    label = "init" if step == 0 else f"step{step:04d}"
+    mesh.write_timestep(filename=label, index=0,
+                        outputPath=OUT_DIR,
+                        meshVars=[T, V, P], meshUpdates=True,
+                        create_xdmf=True)
+
+
+def _adapt_step():
+    """Build metric + invoke mover with skip_threshold; FE-remap
+    T (V,P zeroed) if the mover actually moved nodes.
+    Returns (moved, misalignment) tuple — misalignment is the
+    current-mesh alignment score against the target metric BEFORE
+    the adapt fires."""
+    old_X = np.asarray(mesh.X.coords).copy()
+    old_T = np.asarray(T.data).copy()
+    h0 = float(mesh._radii.mean())
+    grad_L = (args.grad_smooth_h0 * h0
+              if args.grad_smooth_h0 > 0 else None)
+    # Resolve the effective skip threshold for THIS adapt
+    if args.skip_threshold >= 0:
+        sk = (None if args.skip_threshold > 10.0
+              else args.skip_threshold)
+    else:
+        sk = STRAT["skip_threshold"]
+    # Diagnostic: measure misalignment BEFORE adapting (logged either
+    # way). coar_val mirrors coarsening="auto" = refinement^(1/d), d=2.
+    coar_val = float(args.refinement) ** 0.5 if args.refinement > 0 else 1.0
+    R = max(float(args.refinement), coar_val) if args.refinement > 0 else 1.0
+    if args.refinement > 0:
+        rho_diag = uw.meshing.metric_density_from_gradient(
+            mesh, T, refinement=float(args.refinement),
+            coarsening="auto", metric_choice="front-following",
+            gradient_smoothing_length=grad_L, name="diag")
+    else:
+        rho_diag = uw.meshing.metric_density_from_gradient(
+            mesh, T, strategy=args.strategy, name="diag",
+            gradient_smoothing_length=grad_L)
+    mm = uw.meshing.mesh_metric_mismatch(
+        mesh, rho_diag, resolution_ratio=R)
+    misalign = float(mm["misalignment"])
+    print(f"  mismatch before adapt: misalignment={misalign:.3f} "
+          f"(skip threshold {sk})", flush=True)
+    if args.refinement > 0:
+        moved = uw.meshing.follow_metric(
+            mesh, T,
+            refinement=args.refinement,
+            coarsening="auto",
+            metric="front-following",
+            skip_threshold=sk,
+            gradient_smoothing_length=grad_L,
+            verbose=True,
+        )
+        new_X = np.asarray(mesh.X.coords).copy()
+        if not moved:
+            return False, misalign
+    else:
+        rho = uw.meshing.metric_density_from_gradient(
+            mesh, T, strategy=args.strategy, name="loop",
+            gradient_smoothing_length=grad_L)
+        uw.meshing.smooth_mesh_interior(
+            mesh, metric=rho, method="anisotropic",
+            strategy=args.strategy,
+            method_kwargs=dict(relax=0.2, n_outer=12),
+            verbose=True)
+        new_X = np.asarray(mesh.X.coords).copy()
+        if np.allclose(new_X, old_X):
+            return False, misalign
+    # FE-remap T (evaluate old T on the old mesh at the new nodes); zero V,P
+    new_Tx = np.asarray(T.coords).copy()
+    mesh._deform_mesh(old_X)
+    T.data[...] = old_T
+    rT = np.asarray(uw.function.evaluate(
+        T.sym[0], new_Tx)).reshape(-1)
+    mesh._deform_mesh(new_X)
+    T.data[:, 0] = rT
+    V.data[...] = 0.0
+    P.data[...] = 0.0
+    return True, misalign
+
+
+# Initial Stokes solve
+print("  initial Stokes solve...", flush=True)
+t0 = time.time()
+stokes.solve(zero_init_guess=False)
+print(f"  init done {time.time()-t0:.1f}s "
+      f"|v|max={float(np.sqrt(V.data[:,0]**2+V.data[:,1]**2).max()):.2e}",
+      flush=True)
+
+
+hist = []
+t_sim = 0.0
+if resume_label:
+    hpath = os.path.join(OUT_DIR, "history.npz")
+    if os.path.exists(hpath):
+        z = np.load(hpath)
+        for i in range(len(z['step'])):
+            if int(z['step'][i]) > resume_step:
+                continue
+            _mis = (float(z['misalignment'][i])
+                    if 'misalignment' in z.files else float('nan'))
+            hist.append((int(z['step'][i]),
+                         float(z['t'][i]),
+                         float(z['dt'][i]),
+                         float(z['wall'][i]),
+                         float(z['vrms'][i]),
+                         float(z['Nu'][i]),
+                         float(z['Tmin'][i]),
+                         float(z['Tmax'][i]),
+                         int(z['adapted'][i]),
+                         _mis))
+        if hist:
+            t_sim = hist[-1][1]
+            print(f"  resumed history: {len(hist)} entries, "
+                  f"t={t_sim:.5f}")
+else:
+    snapshot(0)
+
+START_STEP = resume_step + 1 if resume_label else 1
+END_STEP = (resume_step if resume_label else 0) + args.n_steps + 1
+
+print(f"  running steps {START_STEP}..{END_STEP - 1} "
+      f"(snapshot every {args.snapshot_every}, "
+      f"log every {args.log_every})")
+print(f"{'step':>5} {'t':>9} {'dt':>10} {'wall':>7} "
+      f"{'vrms':>10} {'Nu':>8} {'T[min,max]':>22} {'adapt'}")
+
+n_adapt_skipped = 0
+n_adapt_done = 0
+for s in range(START_STEP, END_STEP):
+    t_step_0 = time.time()
+    did_adapt = False
+    misalign = float('nan')
+    if args.strategy != "off" and (s % args.adapt_every == 0):
+        did_adapt, misalign = _adapt_step()
+        if did_adapt:
+            n_adapt_done += 1
+        else:
+            n_adapt_skipped += 1
+    # Stokes BEFORE AdvDiff. Otherwise the AdvDiff step right
+    # after an adapt uses V=0 (cold restart inside _adapt_step),
+    # which causes a one-step pure-diffusion smearing of T at
+    # the BL and a visible Nu dip (the artifact at t≈0.011 in
+    # the previous run). With Stokes first, V is freshly
+    # computed from the just-remapped T before AdvDiff uses it,
+    # and the SLCN trace-back history stays consistent.
+    try:
+        stokes.solve(zero_init_guess=did_adapt)
+        dt = adv.estimate_dt(direction_aware=True) * float(args.dt_mult)
+        adv.solve(timestep=dt, zero_init_guess=False)
+    except Exception as e:
+        print(f"  EXCEPTION at step {s}: {e}", flush=True)
+        break
+    t_sim += dt
+    wall = time.time() - t_step_0
+
+    T_arr = T.data[:, 0]
+    if np.isnan(T_arr).any() or np.isinf(T_arr).any():
+        print(f"  step {s}: NaN/Inf in T — ABORT", flush=True)
+        break
+    Tmin, Tmax = float(T_arr.min()), float(T_arr.max())
+    if Tmax > 1.1 or Tmin < -0.1:
+        print(f"  step {s}: T overshoot [{Tmin:+.4f},{Tmax:+.4f}]"
+              f" — ABORT", flush=True)
+        break
+
+    v_sq = np.asarray(uw.function.evaluate(
+        V.sym.dot(V.sym), mesh.X.coords))
+    vrms = float(np.sqrt(np.mean(v_sq)))
+    Nu_val = _nu()
+
+    hist.append((s, t_sim, dt, wall, vrms, Nu_val,
+                 Tmin, Tmax, int(did_adapt), misalign))
+    _h = np.asarray(hist)
+    np.savez(os.path.join(OUT_DIR, "history.npz"),
+             step=_h[:, 0], t=_h[:, 1], dt=_h[:, 2],
+             wall=_h[:, 3], vrms=_h[:, 4], Nu=_h[:, 5],
+             Tmin=_h[:, 6], Tmax=_h[:, 7], adapted=_h[:, 8],
+             misalignment=_h[:, 9])
+    if s % args.snapshot_every == 0:
+        snapshot(s)
+    if s % args.log_every == 0:
+        print(f"{s:>5d} {t_sim:>9.5f} {dt:>10.3e} "
+              f"{wall:>6.2f}s {vrms:>10.3e} {Nu_val:>+8.3f} "
+              f"[{Tmin:+.3f},{Tmax:+.3f}]  "
+              f"{'ADAPT' if did_adapt else ''}",
+              flush=True)
+    if args.max_t > 0 and t_sim >= args.max_t:
+        print(f"  reached max_t={args.max_t} at step {s} "
+              f"(t_sim={t_sim:.5f}) — STOPPING", flush=True)
+        # Final snapshot for the movie
+        if s % args.snapshot_every != 0:
+            snapshot(s)
+        break
+
+print(f"=== done; adapts done={n_adapt_done}, "
+      f"skipped={n_adapt_skipped} ===", flush=True)
+if hist:
+    snapshot(int(hist[-1][0]))
diff --git a/scripts/stagnant_lid_plot.py b/scripts/stagnant_lid_plot.py
new file mode 100644
index 00000000..ec2d16d7
--- /dev/null
+++ b/scripts/stagnant_lid_plot.py
@@ -0,0 +1,232 @@
+"""Plot T-field snapshots + V streamlines + Nu/vrms time series
+for a stagnant-lid run produced by stagnant_lid_uniform.py.
+
+Rendering: UW3 pyvista helpers (the project's standard for
+high-order field viz): high-order T on its own DOF cloud +
+deformed-mesh edges overlay + streamlines of V on a uniform grid,
+white background, lighting off (per repo memory:
+feedback_pyvista_viz_pattern.md).
+
+Time-series scalars (Nu, vrms, η range) stay in matplotlib —
+no benefit from pyvista there.
+"""
+from __future__ import annotations
+import os
+import glob
+import re
+import math
+import argparse
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+
+import underworld3 as uw
+import underworld3.visualisation as vis
+import pyvista as pv
+
+
+p = argparse.ArgumentParser()
+p.add_argument('--run-dir', type=str,
+               default=os.path.expanduser(
+                   '~/+Simulations/StagnantLid/'
+                   'uniform_res32_Ra1e6_dEta1e4'))
+p.add_argument('--max-snapshots', type=int, default=8)
+args = p.parse_args()
+
+pv.OFF_SCREEN = True
+
+run_dir = args.run_dir
+tag = os.path.basename(run_dir.rstrip('/'))
+
+hist_path = os.path.join(run_dir, f"sl_{tag}_history.npz")
+if not os.path.exists(hist_path):
+    raise SystemExit(f"history not found: {hist_path}")
+H = np.load(hist_path)
+print(f"history: {len(H['step'])} log entries, "
+      f"step range {H['step'].min()}..{H['step'].max()}, "
+      f"t_sim {H['t_sim'].min():.3f}..{H['t_sim'].max():.3f}")
+
+
+# ---- Nu, vrms time series + η range (matplotlib) ---------------
+
+fig, ax = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
+ax[0].plot(H['step'], H['Nu'], '-o', ms=3, lw=1.0)
+ax[0].axhline(1.0, color='gray', ls=':', lw=0.7,
+              label='pure conduction')
+ax[0].set_ylabel('Nu  (mid-shell total flux)')
+ax[0].grid(alpha=0.3)
+ax[0].legend(loc='best', fontsize=9)
+
+ax[1].semilogy(H['step'], H['vrms'], '-o', ms=3, lw=1.0)
+ax[1].set_ylabel(r'$v_\mathrm{rms}$')
+ax[1].grid(alpha=0.3, which='both')
+
+ax[2].semilogy(H['step'], H['eta_max'], '-o', ms=3, lw=1.0,
+               label=r'$\eta_\max$ (cold, lid)')
+ax[2].semilogy(H['step'], H['eta_min'], '-s', ms=3, lw=1.0,
+               label=r'$\eta_\min$ (hot, base)')
+ax[2].set_ylabel(r'realised $\eta$ range')
+ax[2].set_xlabel('step')
+ax[2].grid(alpha=0.3, which='both')
+ax[2].legend(loc='best', fontsize=9)
+fig.suptitle(f"{tag}  —  t={H['t_sim'][-1]:.4f}", fontsize=11)
+fig.tight_layout()
+out_ts = os.path.join(run_dir, f"plot_{tag}_timeseries.png")
+fig.savefig(out_ts, dpi=130, bbox_inches='tight')
+print(f"  wrote {out_ts}")
+plt.close(fig)
+
+
+# ---- T snapshots + V arrows (pyvista) --------------------------
+
+mesh_files = sorted(glob.glob(os.path.join(
+    run_dir, f"sl_{tag}_step*.mesh.00000.h5")))
+init_file = sorted(glob.glob(os.path.join(
+    run_dir, f"sl_{tag}_init.mesh.00000.h5")))
+pat = re.compile(r"sl_.+_step(\d+)\.mesh\.00000\.h5$")
+entries = []
+for f in init_file:
+    entries.append((0, f))
+for f in mesh_files:
+    m = pat.search(os.path.basename(f))
+    if m:
+        entries.append((int(m.group(1)), f))
+entries.sort(key=lambda e: e[0])
+
+if not entries:
+    print("no snapshots found, skipping field plots")
+    raise SystemExit(0)
+
+if len(entries) > args.max_snapshots:
+    idx = np.linspace(0, len(entries) - 1,
+                      args.max_snapshots).round().astype(int)
+    entries = [entries[i] for i in idx]
+
+
+# Pre-pass: find the global |v|max across all snapshots so the
+# panels can be compared. It is reported for reference only: the
+# streamlines below are drawn with a fixed line width, so nothing
+# is scaled by it yet.
+print("  scanning |v|max across snapshots...", flush=True)
+global_Vmax = 0.0
+for (step, mfile) in entries:
+    m_ = uw.discretisation.Mesh(mfile)
+    V_ = uw.discretisation.MeshVariable(
+        f"V_scan_{step}", m_, vtype=uw.VarType.VECTOR,
+        degree=2, continuous=True)
+    stem_ = os.path.basename(mfile)[:-len(".mesh.00000.h5")]
+    V_.read_timestep(stem_, "V_v2p1", 0, outputPath=run_dir)
+    vm_ = float(np.sqrt(V_.data[:, 0] ** 2
+                        + V_.data[:, 1] ** 2).max())
+    if vm_ > global_Vmax:
+        global_Vmax = vm_
+print(f"  global |v|max = {global_Vmax:.3e}", flush=True)
+if global_Vmax <= 0:
+    global_Vmax = 1.0
+
+# Plot grid
+n = len(entries)
+ncol = min(3, n)
+nrow = math.ceil(n / ncol)
+pl = pv.Plotter(shape=(nrow, ncol), off_screen=True,
+                window_size=(900 * ncol, 900 * nrow),
+                border=False)
+pl.set_background("white")
+
+for k, (step, mfile) in enumerate(entries):
+    rr, cc = divmod(k, ncol)
+    m = uw.discretisation.Mesh(mfile)
+    T = uw.discretisation.MeshVariable(
+        f"T_view_{step}", m, vtype=uw.VarType.SCALAR,
+        degree=3, continuous=True)
+    V = uw.discretisation.MeshVariable(
+        f"V_view_{step}", m, vtype=uw.VarType.VECTOR,
+        degree=2, continuous=True)
+    stem = os.path.basename(mfile)[:-len(".mesh.00000.h5")]
+    T.read_timestep(stem, "T_v2p1", 0, outputPath=run_dir)
+    V.read_timestep(stem, "V_v2p1", 0, outputPath=run_dir)
+
+    # T on its DOF cloud (P3) — Delaunay-triangulated viz mesh
+    pv_T = vis.meshVariable_to_pv_mesh_object(T)
+    pv_T.point_data["T"] = np.asarray(T.data[:, 0])
+
+    # Mesh edges for context
+    edges = vis.mesh_to_pv_mesh(m).extract_all_edges()
+
+    # Streamlines on a 2-D Cartesian ImageData (uniform grid),
+    # with V zeroed outside the annulus. Pyvista's
+    # `streamlines_from_source()` does the integration; the lines
+    # are drawn below at a fixed width.
+    n_cart = 220
+    extent = 1.05
+    xs = np.linspace(-extent, extent, n_cart)
+    ys = np.linspace(-extent, extent, n_cart)
+    Xc, Yc = np.meshgrid(xs, ys, indexing='xy')
+    Rc = np.sqrt(Xc ** 2 + Yc ** 2)
+    in_ann = (Rc > 0.51) & (Rc < 0.99)
+    pts_eval = np.column_stack([Xc.ravel(), Yc.ravel()])
+    bad = ~in_ann.ravel()
+    if bad.any():
+        th_proj = np.arctan2(pts_eval[bad, 1], pts_eval[bad, 0])
+        pts_eval[bad, 0] = 0.75 * np.cos(th_proj)
+        pts_eval[bad, 1] = 0.75 * np.sin(th_proj)
+    Vx = np.asarray(uw.function.evaluate(
+        V.sym[0], pts_eval)).reshape(-1)
+    Vy = np.asarray(uw.function.evaluate(
+        V.sym[1], pts_eval)).reshape(-1)
+    Vx[bad] = 0.0
+    Vy[bad] = 0.0
+    # Build pyvista ImageData carrying the velocity
+    img = pv.ImageData(
+        dimensions=(n_cart, n_cart, 1),
+        spacing=((2 * extent) / (n_cart - 1),
+                 (2 * extent) / (n_cart - 1), 1.0),
+        origin=(-extent, -extent, 0.0))
+    Vvec3 = np.zeros((n_cart * n_cart, 3))
+    Vvec3[:, 0] = Vx
+    Vvec3[:, 1] = Vy
+    img.point_data["V"] = Vvec3
+    img.point_data["Vmag"] = np.sqrt(Vx ** 2 + Vy ** 2)
+    img.set_active_vectors("V")
+    # Seed points: a polar tile inside the annulus so every
+    # azimuth gets coverage but seeds avoid the lid (where v≈0
+    # gives stranded short streamlines).
+    seed_r = np.linspace(0.55, 0.78, 4)
+    seed_th = np.linspace(0, 2 * np.pi, 60, endpoint=False)
+    sR, sT = np.meshgrid(seed_r, seed_th, indexing='ij')
+    seed_pts = np.column_stack([
+        (sR * np.cos(sT)).ravel(),
+        (sR * np.sin(sT)).ravel(),
+        np.zeros(sR.size)])
+    seeds = pv.PolyData(seed_pts)
+    streams = img.streamlines_from_source(
+        seeds, vectors="V",
+        integration_direction="both",
+        max_step_length=0.02,
+        compute_vorticity=False)
+    Vmax_step = float(np.sqrt(V.data[:, 0] ** 2
+                              + V.data[:, 1] ** 2).max())
+
+    pl.subplot(rr, cc)
+    pl.add_text(f"step {step}    "
+                f"|v|max = {Vmax_step:.2e}",
+                font_size=12, color="black")
+    pl.add_mesh(pv_T, scalars="T", cmap="RdBu_r",
+                clim=(0.0, 1.0), show_edges=False,
+                lighting=False,
+                show_scalar_bar=(k == n - 1),
+                scalar_bar_args=dict(title="T", color="black"))
+    pl.add_mesh(edges, color="#202020", line_width=0.6,
+                lighting=False, opacity=0.4)
+    if streams is not None and streams.n_points > 0:
+        pl.add_mesh(streams, color="black",
+                    line_width=2.0, opacity=0.5,
+                    lighting=False)
+    pl.view_xy()
+    pl.camera.zoom(1.25)
+
+out_T = os.path.join(run_dir, f"plot_{tag}_T_snapshots.png")
+pl.screenshot(out_T)
+pl.close()
+print(f"  wrote {out_T}")
diff --git a/scripts/stagnant_lid_uniform.py b/scripts/stagnant_lid_uniform.py
new file mode 100644
index 00000000..19da7319
--- /dev/null
+++ b/scripts/stagnant_lid_uniform.py
@@ -0,0 +1,312 @@
+"""Frank-Kamenetskii stagnant-lid convection, uniform mesh.
+
+Stage-1 probe for the parallel-safe + scalable error-estimator
+arc (B in the catalogue follow-ups). Purpose: characterise the
+realised T- and η-gradient fields on a baseline uniform mesh,
+so we can choose mover bunching strategy + GAMG-stress test
+parameters for the adaptive run that follows.
+
+Viscosity law (FK):
+    η(T) = exp(θ · (1 - T)),  θ = ln(Δη)
+With Δη = 1e4 ⇒ θ ≈ 9.21:
+    T=0 (cold, outer) → η = Δη = 1e4     (the lid)
+    T=1 (hot,  inner) → η = 1             (active layer reference)
+This normalisation pins the active-layer viscosity at the
+standard value of unity, so velocities and the pressure-velocity
+coupling have a sensible scale (effective Ra ≈ Ra_input in the
+active layer, with stiffness contrast Δη on top).
+
+Annulus geometry, T_inner=1 / T_outer=0, free-slip outer +
+no-slip inner (the existing trusted BC pair).  Ra=1e6 default.
+
+Output: ~/+Simulations/StagnantLid/<tag>/ (XDMF + UW h5; viewable
+in Finder / ParaView).  History saved as a single .npz alongside
+the checkpoints, rewritten at every logged step.
+"""
+from __future__ import annotations
+import os
+import sys
+import argparse
+import time
+import numpy as np
+import sympy
+
+import underworld3 as uw
+
+
+# ---------------------- CLI -------------------------------------
+
+p = argparse.ArgumentParser()
+p.add_argument('--res', type=int, default=32,
+               help='cellSize = 1/res')
+p.add_argument('--n-steps', type=int, default=1500)
+p.add_argument('--Ra', type=float, default=1.0e6)
+p.add_argument('--delta-eta', type=float, default=1.0e4,
+               help='η contrast: FK θ = ln(Δη)')
+p.add_argument('--diffusion-theta', type=float, default=1.0,
+               help='AdvDiff timestepping θ; 1.0 = backward Euler '
+                    '(most stable; recommended at high Ra)')
+p.add_argument('--monotone-mode', type=str, default='clamp')
+p.add_argument('--t-degree', type=int, default=3)
+p.add_argument('--v-degree', type=int, default=2)
+p.add_argument('--p-degree', type=int, default=1)
+p.add_argument('--stokes-tol', type=float, default=1.0e-5)
+p.add_argument('--stokes-snes-opt', type=str, default='direct',
+               choices=['default', 'direct', 'gamg-noagr',
+                        'gamg-noagrsor', 'gamg-full'],
+               help='Stokes solver preset. Catalogue: at Δη=1e4 '
+                    'default GAMG aggregation fails (DIVERGED_LINE_'
+                    'SEARCH on every step); "direct" (MUMPS LU) is '
+                    'the gold-standard probe solver. GAMG variants '
+                    'available for the parallel-scaling sweep.')
+p.add_argument('--capture-every', type=int, default=50)
+p.add_argument('--log-every', type=int, default=10)
+p.add_argument('--tag', type=str, default=None,
+               help='run tag (defaults to res/Ra/dEta string)')
+p.add_argument('--outdir', type=str,
+               default=os.path.expanduser('~/+Simulations/StagnantLid'))
+args = p.parse_args()
+
+
+tag = (args.tag if args.tag else
+       f"uniform_res{args.res}_Ra{args.Ra:.0e}_dEta{args.delta_eta:.0e}"
+       .replace('+0', '').replace('-0', '-'))
+run_dir = os.path.join(args.outdir, tag)
+os.makedirs(run_dir, exist_ok=True)
+
+theta_FK = float(np.log(args.delta_eta))
+
+
+# ---------------------- mesh + fields ---------------------------
+
+r_inner, r_o = 0.5, 1.0
+cellsize = 1.0 / args.res
+qdeg = max(3, args.v_degree + 1)
+
+mesh = uw.meshing.Annulus(
+    radiusOuter=r_o, radiusInner=r_inner,
+    cellSize=cellsize, qdegree=qdeg,
+)
+r, th = mesh.CoordinateSystem.R
+unit_r = mesh.CoordinateSystem.unit_e_0
+
+pair_tag = f"v{args.v_degree}p{args.p_degree}"
+v = uw.discretisation.MeshVariable(
+    f"V_{pair_tag}", mesh, vtype=uw.VarType.VECTOR,
+    degree=args.v_degree, continuous=True,
+    varsymbol=r"\mathbf{v}")
+P = uw.discretisation.MeshVariable(
+    f"P_{pair_tag}", mesh, vtype=uw.VarType.SCALAR,
+    degree=args.p_degree, continuous=True, varsymbol="p")
+T = uw.discretisation.MeshVariable(
+    f"T_{pair_tag}", mesh, vtype=uw.VarType.SCALAR,
+    degree=args.t_degree, continuous=True, varsymbol="T")
+
+
+# ---------------------- Stokes with FK viscosity ----------------
+
+stokes = uw.systems.Stokes(mesh, velocityField=v, pressureField=P)
+stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
+# Direct symbolic FK law — the c-tensor build path no longer
+# calls simplify() (per docs/historical-notes), so the JIT
+# compiler differentiates this correctly for Newton Jacobians.
+stokes.constitutive_model.Parameters.shear_viscosity_0 = (
+    sympy.exp(theta_FK * (1 - T.sym[0])))
+stokes.tolerance = args.stokes_tol
+stokes.penalty = 0.0
+
+# Stokes solver preset (catalogue: solver-strategies-catalogue.md)
+_SNES_OPT = {
+    'default':      {},                                  # newtonls+bt+GAMG
+    'direct':       {'ksp_type': 'preonly',
+                     'pc_type': 'lu',
+                     'pc_factor_mat_solver_type': 'mumps',
+                     'mat_mumps_icntl_24': 1},
+    'gamg-noagr':   {'pc_gamg_aggressive_coarsening': 0},
+    'gamg-noagrsor':{'pc_gamg_aggressive_coarsening': 0,
+                     'mg_levels_ksp_type': 'richardson',
+                     'mg_levels_pc_type': 'sor',
+                     'mg_levels_ksp_max_it': 2},
+    'gamg-full':    {'pc_gamg_agg_nsmooths': 1,
+                     'pc_gamg_threshold': 0.02,
+                     'pc_gamg_threshold_scale': 0.5,
+                     'pc_gamg_aggressive_coarsening': 0,
+                     'mg_levels_ksp_type': 'richardson',
+                     'mg_levels_pc_type': 'sor',
+                     'mg_levels_ksp_max_it': 2},
+}
+for _k, _vopt in _SNES_OPT[args.stokes_snes_opt].items():
+    stokes.petsc_options[_k] = _vopt
+
+# No-slip inner, free-slip outer (trusted BC pair, no Stokes
+# nullspace). Lid forms at the outer (cold) boundary.
+stokes.add_essential_bc((0.0, 0.0), mesh.boundaries.Lower.name)
+KFS = 1.0e6
+fs_term = (KFS * v.sym.dot(unit_r) * unit_r)
+stokes.add_natural_bc(fs_term, mesh.boundaries.Upper.name)
+
+# Buoyancy: Ra · (T - T_cond) · r̂ (T_cond = the logarithmic
+# annular conduction reference so a still-conductive state is
+# force-balanced)
+T_cond = sympy.log(r / r_o) / sympy.log(r_inner / r_o)
+stokes.bodyforce = args.Ra * (T.sym[0] - T_cond) * unit_r
+
+
+# ---------------------- AdvDiff ---------------------------------
+
+adv = uw.systems.AdvDiffusionSLCN(
+    mesh, u_Field=T, V_fn=v.sym, verbose=False,
+    theta=args.diffusion_theta,
+    monotone_mode=(None if args.monotone_mode in ('None', 'none')
+                   else args.monotone_mode),
+)
+adv.constitutive_model = uw.constitutive_models.DiffusionModel
+adv.constitutive_model.Parameters.diffusivity = 1.0
+adv.tolerance = 1.0e-4
+adv.add_dirichlet_bc(1.0, mesh.boundaries.Lower.name)
+adv.add_dirichlet_bc(0.0, mesh.boundaries.Upper.name)
+
+
+# ---------------------- IC --------------------------------------
+
+# Logarithmic conduction + mode-5 perturbation
+init_T = (
+    0.01 * sympy.sin(5.0 * th)
+    * sympy.sin(np.pi * (r - r_inner) / (r_o - r_inner))
+    + T_cond
+)
+T.data[...] = np.asarray(uw.function.evaluate(
+    init_T, T.coords)).reshape(-1, 1)
+
+
+# ---------------------- diagnostics -----------------------------
+
+Q_COND = 2.0 * np.pi / np.log(r_o / r_inner)  # total cond flow
+
+
+def _shell_flow(T_field, V_field, r_eval, n_th=720):
+    """Total radial heat flow through circle r=r_eval:
+        ∮ (v_r T - ∂T/∂r) r dθ
+    Projects the symbolic flux to a nodal field so the integrand
+    is FE-consistent (not raw ∂T)."""
+    X = mesh.CoordinateSystem.X
+    er = mesh.CoordinateSystem.unit_e_0
+    grad_T_r = (T_field.sym[0].diff(X[0]) * er[0]
+                + T_field.sym[0].diff(X[1]) * er[1])
+    vr = (V_field.sym[0] * er[0] + V_field.sym[1] * er[1])
+    qsym = vr * T_field.sym[0] - grad_T_r
+    qf = uw.discretisation.MeshVariable(
+        f"_qr_{int(time.time() * 1e6) % 10 ** 8}", mesh,
+        vtype=uw.VarType.SCALAR, degree=2, continuous=True)
+    proj = uw.systems.Projection(mesh, qf)
+    proj.uw_function = qsym
+    proj.smoothing = 0.0
+    proj.solve()
+    th_eval = np.linspace(0, 2 * np.pi, n_th, endpoint=False)
+    pts = np.column_stack(
+        [r_eval * np.cos(th_eval), r_eval * np.sin(th_eval)])
+    q = np.asarray(uw.function.evaluate(qf.sym[0], pts)).reshape(-1)
+    return float(q.mean() * r_eval * 2.0 * np.pi)
+
+
+def _diagnostics():
+    """vrms, mid-shell Nu, T extents, realised η extents."""
+    # vrms from an unweighted mean of v·v at the mesh nodes
+    v_sq = np.asarray(uw.function.evaluate(
+        v.sym.dot(v.sym), mesh.X.coords))
+    vrms = float(np.sqrt(np.mean(v_sq)))
+    # mid-shell Nu (avoids near-BL stencil under-resolution)
+    Nu = _shell_flow(T, v, 0.5 * (r_inner + r_o)) / Q_COND
+    T_arr = T.data[:, 0]
+    eta_arr = np.exp(theta_FK * (1 - T_arr))
+    return {
+        'vrms': vrms, 'Nu': Nu,
+        'Tmin': float(T_arr.min()), 'Tmax': float(T_arr.max()),
+        'eta_min': float(eta_arr.min()),
+        'eta_max': float(eta_arr.max()),
+    }
+
+
+def _capture(step, t_sim):
+    label = f"step{step:05d}" if step > 0 else 'init'
+    mesh.write_timestep(
+        filename=f"sl_{tag}_{label}", index=0, outputPath=run_dir,
+        meshVars=[T, v, P], meshUpdates=True,
+        create_xdmf=True)
+
+
+# ---------------------- main loop -------------------------------
+
+print("=== stagnant lid uniform-mesh probe ===", flush=True)
+print(f"  tag           : {tag}", flush=True)
+print(f"  out           : {run_dir}", flush=True)
+print(f"  Ra            : {args.Ra:.2e}", flush=True)
+print(f"  Δη (target)   : {args.delta_eta:.2e} "
+      f"⇒ FK θ = {theta_FK:.4f}", flush=True)
+print(f"  res / cellsz  : {args.res} / {cellsize:.4f}", flush=True)
+print(f"  T/V/P degree  : {args.t_degree}/{args.v_degree}/"
+      f"{args.p_degree}", flush=True)
+print(f"  AdvDiff θ     : {args.diffusion_theta} "
+      f"({'BE' if args.diffusion_theta == 1.0 else 'CN' if args.diffusion_theta == 0.5 else 'mixed'})",
+      flush=True)
+print(f"  monotone mode : {args.monotone_mode}", flush=True)
+print(f"  Stokes preset : {args.stokes_snes_opt}", flush=True)
+
+t0_wall = time.time()
+stokes.solve(zero_init_guess=True)
+print(f"  initial Stokes solve done "
+      f"({time.time() - t0_wall:.1f}s wall)", flush=True)
+
+t_sim = 0.0
+history = {
+    'step': [], 't_sim': [], 'dt': [], 'wall': [],
+    'vrms': [], 'Nu': [], 'Tmin': [], 'Tmax': [],
+    'eta_min': [], 'eta_max': [],
+}
+
+
+def _save_history():
+    np.savez(os.path.join(run_dir, f"sl_{tag}_history.npz"),
+             **{k: np.asarray(v) for k, v in history.items()})
+
+
+# Initial snapshot
+_capture(0, t_sim)
+d0 = _diagnostics()
+print(f"  IC: vrms={d0['vrms']:.3e}  Nu={d0['Nu']:+.3f}  "
+      f"T=[{d0['Tmin']:+.3f},{d0['Tmax']:+.3f}]  "
+      f"η=[{d0['eta_min']:.2e},{d0['eta_max']:.2e}]",
+      flush=True)
+
+for s in range(1, args.n_steps + 1):
+    t_step_0 = time.time()
+    dt = adv.estimate_dt()
+    adv.solve(timestep=dt, zero_init_guess=False)
+    stokes.solve(zero_init_guess=False)
+    t_sim += dt
+    wall = time.time() - t_step_0
+
+    if s % args.log_every == 0 or s == 1:
+        d = _diagnostics()
+        history['step'].append(s)
+        history['t_sim'].append(t_sim)
+        history['dt'].append(dt)
+        history['wall'].append(wall)
+        for k in ('vrms', 'Nu', 'Tmin', 'Tmax',
+                  'eta_min', 'eta_max'):
+            history[k].append(d[k])
+        _save_history()
+        print(f"  step {s:5d}  t={t_sim:7.4f}  Δt={dt:.2e}  "
+              f"wall={wall:5.2f}s  vrms={d['vrms']:.2e}  "
+              f"Nu={d['Nu']:+5.2f}  T=[{d['Tmin']:+.3f},"
+              f"{d['Tmax']:+.3f}]  η=[{d['eta_min']:.1e},"
+              f"{d['eta_max']:.1e}]",
+              flush=True)
+
+    if args.capture_every > 0 and s % args.capture_every == 0:
+        _capture(s, t_sim)
+
+print(f"=== done ({time.time() - t0_wall:.1f}s wall, "
+      f"sim t={t_sim:.4f}) ===", flush=True)
+_capture(args.n_steps, t_sim)
+_save_history()
diff --git a/scripts/vol_spring_test.py b/scripts/vol_spring_test.py
new file mode 100644
index 00000000..9c6a7c97
--- /dev/null
+++ b/scripts/vol_spring_test.py
@@ -0,0 +1,69 @@
+"""Volumetric spring: equal edge springs (shape) + per-cell area
+constraint (size). Compare to the known MA-only baseline
+(AMP8: d/n 1.71, minA/meanA 0.026, ~11s ; AMP20: 1.54, 0.281, ~18s).
+
+Sweep the size/shape weight ratio. Want: strong deep/near AND
+healthy minA/meanA (no slivers — the whole point of equal edge
+springs) at low cost.
+"""
+from __future__ import annotations
+import time
+import numpy as np
+import sympy
+import underworld3 as uw
+from underworld3.meshing.smoothing import (
+    _winslow_spring, _edge_pairs, _auto_pinned_labels,
+    _tri_cells, _signed_areas)
+
+R_O, R_I, WIDTH, RES = 1.0, 0.5, 0.12, 16
+
+
+def honest_ratio(coords, edges):
+    v0, v1 = edges[:, 0], edges[:, 1]
+    Le = np.linalg.norm(coords[v1] - coords[v0], axis=1)
+    nv = coords.shape[0]
+    s = np.zeros(nv); c = np.zeros(nv)
+    for a in (v0, v1):
+        np.add.at(s, a, Le); np.add.at(c, a, 1.0)
+    nl = s / np.maximum(c, 1.0)
+    r = np.sqrt((coords ** 2).sum(axis=1))
+    deep = (r >= R_I) & (r < R_I + 0.20)
+    near = (r > R_O - 0.05)
+    return float(nl[deep].mean() / nl[near].mean())
+
+
+def quality(mesh):
+    tris = _tri_cells(mesh.dm)
+    a = np.abs(_signed_areas(np.asarray(mesh.X.coords), tris))
+    return float(a.min() / a.mean())
+
+
+def case(amp, tag):
+    m = uw.meshing.Annulus(radiusOuter=R_O, radiusInner=R_I,
+                           cellSize=1.0 / RES, qdegree=3)
+    r0 = uw.discretisation.MeshVariable(
+        f"r0_{tag}", m, vtype=uw.VarType.SCALAR,
+        degree=1, continuous=True)
+    X0 = np.asarray(m.X.coords)
+    r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))
+    f = 1.0 + amp * sympy.exp(-(((r0.sym[0]) - R_O) / WIDTH) ** 2)
+    return m, f
+
+
+print("MA-only baseline: AMP8 d/n=1.71 q=0.026 ~11s | "
+      "AMP20 d/n=1.54 q=0.281 ~18s")
+print(f"{'AMP':>4} {'shape_w':>7} {'size_w':>6} {'time/s':>7} "
+      f"{'deep/near':>10} {'minA/meanA':>11}")
+print("-" * 56)
+for amp in (2.0, 8.0, 20.0):
+    for sw, zw in ((1.0, 4.0), (1.0, 8.0), (1.0, 20.0),
+                   (0.3, 8.0)):
+        m, f = case(amp, f"v{int(amp)}_{int(sw*10)}_{int(zw)}")
+        e = _edge_pairs(m.dm)
+        p = _auto_pinned_labels(m)
+        t = time.perf_counter()
+        _winslow_spring(m, f, p, False, shape_w=sw, size_w=zw)
+        dt = time.perf_counter() - t
+        print(f"{amp:4.0f} {sw:7.1f} {zw:6.1f} {dt:7.2f} "
+              f"{honest_ratio(np.asarray(m.X.coords), e):10.3f} "
+              f"{quality(m):11.4f}", flush=True)
diff --git a/src/underworld3/cython/petsc_generic_snes_solvers.pyx b/src/underworld3/cython/petsc_generic_snes_solvers.pyx
index 6bb6b71b..d005a372 100644
--- a/src/underworld3/cython/petsc_generic_snes_solvers.pyx
+++ b/src/underworld3/cython/petsc_generic_snes_solvers.pyx
@@ -704,6 +704,8 @@ class SolverBaseClass(uw_object):
                 self._stokes_nullspace = None
             if hasattr(self, "_stokes_nullspace_basis"):
                 self._stokes_nullspace_basis = ()
+            if hasattr(self, "_constant_nullspace_obj"):
+                self._constant_nullspace_obj = None
 
         # This is a workaround for some problem in the PETSc machinery
         # where we need a surface integral term somewhere on every process
@@ -1656,6 +1658,88 @@ class SNES_Scalar(SolverBaseClass):
 
         self.is_setup = False
 
+        # Optional constant nullspace for pure-Neumann scalar
+        # problems (e.g. an equidistribution / mesh-motion
+        # potential). Off by default; see ``constant_nullspace``.
+        self._constant_nullspace = False
+        # Cached constant PETSc.NullSpace — built lazily in
+        # _attach_constant_nullspace and invalidated on DM/SNES rebuild.
+        self._constant_nullspace_obj = None
+
+    @property
+    def constant_nullspace(self):
+        """Attach a constant nullspace to the Jacobian before solve.
+
+        For a scalar problem with only natural (Neumann) boundary
+        conditions the operator is singular up to an additive
+        constant. Set ``True`` to attach a constant
+        ``PETSc.NullSpace`` to the Jacobian (and its transpose /
+        the preconditioner), which both projects the constant mode
+        out of the Krylov solve and makes PETSc remove the
+        inconsistent (constant) component of the RHS; this is the
+        scalar analogue of the Stokes pressure-nullspace handling.
+        The RHS must be compatible (zero mean) for the Neumann
+        problem to be solvable.
+        """
+        return self._constant_nullspace
+
+    @constant_nullspace.setter
+    def constant_nullspace(self, value):
+        self._constant_nullspace = bool(value)
+
+    def _attach_constant_nullspace(self):
+        """Attach a constant nullspace to the (already set-up) SNES
+        Jacobian. Scalar analogue of ``_attach_stokes_nullspace``."""
+        if not self._constant_nullspace:
+            return
+
+        # A constant nullspace is only valid for a pure-Neumann operator.
+        # If essential (Dirichlet) BCs pin the field, the constant mode is
+        # NOT in the operator's nullspace and projecting it out would
+        # corrupt the solution — guard exactly as the Stokes pressure
+        # nullspace guards against pressure Dirichlet BCs.
+        if len(self.essential_bcs) > 0:
+            boundaries = ", ".join(sorted(
+                {str(getattr(bc, "boundary", "?")) for bc in self.essential_bcs}))
+            raise ValueError(
+                "constant_nullspace=True is only valid for pure-Neumann "
+                "scalar problems, but essential (Dirichlet) boundary "
+                f"conditions are present on: {boundaries}. Remove them or "
+                "set constant_nullspace=False."
+            )
+
+        self.snes.setUp()
+
+        jacobian = self.snes.getJacobian()
+        operator_matrix = jacobian[0]
+        preconditioner_matrix = jacobian[1] if len(jacobian) > 1 else None
+
+        # Cache the constant nullspace on the instance and reuse it across
+        # solves (it depends only on the comm); invalidated to None when the
+        # DM/SNES is rebuilt (see _build).
+        nullspace = self._constant_nullspace_obj
+        if nullspace is None:
+            nullspace = PETSc.NullSpace().create(
+                constant=True, comm=self.dm.comm)
+            self._constant_nullspace_obj = nullspace
+
+        operator_matrix.setNullSpace(nullspace)
+        operator_matrix.setTransposeNullSpace(nullspace)
+        # GAMG (the default PC) builds its coarse hierarchy from the
+        # near-nullspace; for a pure-Neumann operator the constant
+        # mode MUST be supplied here or the PC setup fails
+        # (DIVERGED_LINEAR_SOLVE at 0 iterations on re-solves).
+        operator_matrix.setNearNullSpace(nullspace)
+
+        if preconditioner_matrix is not None:
+            preconditioner_matrix.setNullSpace(nullspace)
+            preconditioner_matrix.setTransposeNullSpace(nullspace)
+            preconditioner_matrix.setNearNullSpace(nullspace)
+
+        if self.verbose and uw.mpi.rank == 0:
+            print(f"SNES_Scalar ({self.name}): attached constant "
+                  f"nullspace", flush=True)
+
     @property
     def tolerance(self):
         """
@@ -2201,6 +2285,11 @@ class SNES_Scalar(SolverBaseClass):
         # Update constants (e.g. changed material params) before solve
         self._update_constants()
 
+        # Pure-Neumann scalar problems: attach a constant nullspace
+        # to the (now set-up) Jacobian. No-op unless
+        # ``constant_nullspace`` was set.
+        self._attach_constant_nullspace()
+
         # solve
         self._snes_solve_with_retries(gvec, divergence_retries, verbose)
 
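Reviewer note: the effect of attaching a constant nullspace can be illustrated without PETSc. The sketch below is a plain-NumPy analogue, not the solver's code path. It assumes a 1-D pure-Neumann finite-difference Laplacian, and the helper names `neumann_laplacian` and `solve_with_constant_nullspace` are hypothetical. It shows that the constant vector lies in the operator's nullspace, that the incompatible part of the RHS has to be removed, and that the zero-mean solution is then well defined.

```python
import numpy as np


def neumann_laplacian(n):
    """1-D graph Laplacian with natural (Neumann) ends: every row sums to zero."""
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K


def solve_with_constant_nullspace(K, f):
    """Solve K u = f for a singular K whose nullspace is the constants."""
    ones = np.ones(K.shape[0]) / np.sqrt(K.shape[0])  # unit constant mode
    f_consistent = f - ones * (ones @ f)   # drop the incompatible RHS part
    u, *_ = np.linalg.lstsq(K, f_consistent, rcond=None)
    u -= ones * (ones @ u)                 # project the constant mode out of u
    return u, f_consistent


K = neumann_laplacian(8)
f = np.arange(8, dtype=float)              # deliberately not zero-mean
u, f_c = solve_with_constant_nullspace(K, f)
residual = np.linalg.norm(K @ u - f_c)
```

In the dense analogue the two projections are done by hand; in the patch they are delegated to PETSc through `setNullSpace` and `setTransposeNullSpace`.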
diff --git a/src/underworld3/discretisation/discretisation_mesh.py b/src/underworld3/discretisation/discretisation_mesh.py
index 1566d309..58a93f5e 100644
--- a/src/underworld3/discretisation/discretisation_mesh.py
+++ b/src/underworld3/discretisation/discretisation_mesh.py
@@ -879,6 +879,156 @@ def length_units(self) -> str:
         """
         return self._length_units
 
+    def quality(self, per_cell=False):
+        r"""Cell-quality diagnostics relevant to FE / solver conditioning.
+
+        Bulk volume ratios (min/mean) hide the handful of
+        near-degenerate cells that nonetheless dominate
+        stiffness-matrix conditioning: a Stokes / saddle-point
+        solve fails its line search on the *worst* element, not
+        the mean. This method reports the tail metrics that predict
+        such failures. For a 2-D simplex (triangle) mesh, per cell:
+
+        * shape quality ``q = 4√3·A / Σℓ²``  (1 = equilateral,
+          → 0 = sliver; folds skew + stretch into one number)
+        * largest interior angle (→ 180° is the conditioning killer)
+        * aspect ratio ``ℓ_max² / (2A)``  (longest edge / shortest
+          altitude)
+        * neighbour size-jump (adjacent-cell area ratio — the mesh
+          gradation the solver actually sees)
+
+        The conditioning-relevant numbers are the *worst* cell
+        (``q_min``, ``angle_max_deg``, ``aspect_max``) and the
+        poor-cell counts, not the means. Non-2-D-simplex meshes get
+        the dimension-agnostic cell-volume-spread subset only.
+
+        Parameters
+        ----------
+        per_cell : bool, default False
+            Also return per-cell arrays (``q``, ``angle_deg``,
+            ``aspect``, ``volume``) under ``"per_cell"`` — for
+            plotting or locating the bad cells.
+
+        Returns
+        -------
+        dict
+            Aggregate + tail stats. Headline scalars (min/max/counts)
+            are MPI-reduced so they are correct in parallel;
+            percentiles and the neighbour size-jump are rank-local
+            estimates (exact in serial — the convention for the
+            mesh-redistribution tooling).
+
+        Examples
+        --------
+        >>> q = mesh.quality()
+        >>> q["q_min"], q["n_q_lt_0p3"], q["aspect_max"]
+        >>> mesh.quality(per_cell=True)["per_cell"]["q"]  # to plot
+        """
+        import numpy as np
+
+        dm = self.dm
+        cdim = self.cdim
+        cStart, cEnd = dm.getHeightStratum(0)
+        pStart, pEnd = dm.getDepthStratum(0)
+        X = np.asarray(
+            dm.getCoordinatesLocal().array).reshape(-1, cdim)
+
+        def _reduce(val, op):
+            if uw.mpi.size > 1:
+                from mpi4py import MPI as _MPI
+                return uw.mpi.comm.allreduce(
+                    val, op=getattr(_MPI, op))
+            return val
+
+        tris = []
+        is_simplex2d = cdim == 2
+        if is_simplex2d:
+            for cc in range(cStart, cEnd):
+                cl = dm.getTransitiveClosure(cc)[0]
+                vs = [p - pStart for p in cl
+                      if pStart <= p < pEnd]
+                if len(vs) != 3:
+                    is_simplex2d = False
+                    break
+                tris.append(vs)
+
+        if not is_simplex2d or not tris:
+            try:
+                vol = np.abs(np.array(
+                    [dm.computeCellGeometryFVM(cc)[0]
+                     for cc in range(cStart, cEnd)]))
+            except Exception:
+                vol = np.array([1.0])
+            if not vol.size:
+                vol = np.array([1.0])
+            n = _reduce(int(vol.size), "SUM")
+            vmin = _reduce(float(vol.min()), "MIN")
+            vsum = _reduce(float(vol.sum()), "SUM")
+            res = dict(
+                n_cells=n, element="non-2D-simplex",
+                vol_min_over_mean=vmin / (vsum / max(n, 1)),
+                shape_metrics=None,
+                note="shape quality / angle / aspect need a 2-D "
+                     "triangle mesh; only volume spread reported")
+            if per_cell:
+                res["per_cell"] = dict(volume=vol)
+            return res
+
+        tri = np.asarray(tris, dtype=np.int64)
+        v0, v1, v2 = X[tri[:, 0]], X[tri[:, 1]], X[tri[:, 2]]
+        a = np.linalg.norm(v1 - v2, axis=1)
+        b = np.linalg.norm(v2 - v0, axis=1)
+        cl_ = np.linalg.norm(v0 - v1, axis=1)
+        d1, d2 = v1 - v0, v2 - v0  # 2-D np.cross is deprecated in NumPy 2
+        A = np.maximum(0.5 * np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]), 1.0e-300)
+        q = 4.0 * np.sqrt(3.0) * A / (a * a + b * b + cl_ * cl_)
+
+        def _ang(o, p, r):
+            return np.degrees(np.arccos(np.clip(
+                (p * p + r * r - o * o) / (2.0 * p * r),
+                -1.0, 1.0)))
+        ang = np.maximum.reduce(
+            [_ang(a, b, cl_), _ang(b, cl_, a), _ang(cl_, a, b)])
+        Lmax = np.maximum.reduce([a, b, cl_])
+        aspect = Lmax * Lmax / (2.0 * A)
+        rel = A / A.mean()
+
+        et = {}
+        for ti, (i, j, k) in enumerate(tri):
+            for u, w in ((i, j), (j, k), (k, i)):
+                et.setdefault((min(u, w), max(u, w)),
+                              []).append(ti)
+        jr = np.array([max(A[t]) / min(A[t])
+                       for t in et.values() if len(t) == 2]
+                      or [1.0])
+
+        n = _reduce(int(tri.shape[0]), "SUM")
+        qsum = _reduce(float(q.sum()), "SUM")
+        Asum = _reduce(float(A.sum()), "SUM")
+        res = dict(
+            n_cells=n, element="2D-simplex",
+            q_min=_reduce(float(q.min()), "MIN"),
+            q_mean=qsum / max(n, 1),
+            q_p01=float(np.percentile(q, 1)),
+            q_p05=float(np.percentile(q, 5)),
+            n_q_lt_0p3=_reduce(int((q < 0.3).sum()), "SUM"),
+            n_q_lt_0p2=_reduce(int((q < 0.2).sum()), "SUM"),
+            angle_max_deg=_reduce(float(ang.max()), "MAX"),
+            n_angle_gt_150=_reduce(int((ang > 150).sum()), "SUM"),
+            n_angle_gt_165=_reduce(int((ang > 165).sum()), "SUM"),
+            aspect_max=_reduce(float(aspect.max()), "MAX"),
+            aspect_p99=float(np.percentile(aspect, 99)),
+            sizejump_max=float(jr.max()),
+            sizejump_p99=float(np.percentile(jr, 99)),
+            n_big_thin=_reduce(
+                int(((rel > 2.0) & (aspect > 4.0)).sum()), "SUM"),
+            vol_min_over_mean=(_reduce(float(A.min()), "MIN")
+                               / (Asum / max(n, 1))))
+        if per_cell:
+            res["per_cell"] = dict(
+                q=q, angle_deg=ang, aspect=aspect, volume=A)
+        return res
+
     def view(self, level=0):
         """
         Displays mesh information at different levels.
@@ -940,6 +1090,32 @@ def view(self, level=0):
 
             uw.pprint(f"Number of cells: {num_cells}\n")
 
+            # Cell-quality summary (the conditioning-relevant tail;
+            # full metrics + per-cell arrays via mesh.quality()).
+            try:
+                Q = self.quality()
+                if Q.get("element") == "2D-simplex":
+                    uw.pprint(
+                        f"Cell quality: q_min={Q['q_min']:.3f} "
+                        f"mean={Q['q_mean']:.2f} | poor(q<0.3): "
+                        f"{Q['n_q_lt_0p3']} | worst aspect "
+                        f"{Q['aspect_max']:.1f} | max size-jump "
+                        f"{Q['sizejump_max']:.1f}\n")
+                    if Q["n_q_lt_0p2"] > 0:
+                        uw.pprint(
+                            f"  ! {Q['n_q_lt_0p2']} cell(s) "
+                            f"q<0.2 (near-degenerate — solver "
+                            f"conditioning hazard)\n")
+                else:
+                    uw.pprint(
+                        f"Cell quality: vol_min/mean="
+                        f"{Q['vol_min_over_mean']:.3f} "
+                        f"(2-D triangle mesh needed for shape "
+                        f"metrics)\n")
+                uw.pprint("  (full metrics: mesh.quality())\n")
+            except Exception:
+                pass
+
             if len(self.vars) > 0:
                 uw.pprint(f"| Variable Name       | component | degree |     type        |")
                 uw.pprint(f"| ---------------------------------------------------------- |")
@@ -1710,9 +1886,16 @@ def _update_projected_normals(self):
         Gamma = self.Gamma
 
         if not hasattr(self, '_projected_normals') or self._projected_normals is None:
-            self._projected_normals = uw.discretisation.MeshVariable(
-                "_n_proj", self, self.cdim, degree=1,
-            )
+            # nuke_coords_and_rebuild() clears this attribute on every deform,
+            # but the underlying MeshVariable persists on the mesh. Recover it
+            # quietly rather than re-creating (which logs a name collision).
+            existing = self.vars.get("_n_proj")
+            if existing is not None:
+                self._projected_normals = existing
+            else:
+                self._projected_normals = uw.discretisation.MeshVariable(
+                    "_n_proj", self, self.cdim, degree=1,
+                )
 
         n = self._projected_normals
         for i in range(self.cdim):
@@ -3645,6 +3828,161 @@ def _increment_mesh_version(self):
             self._mesh_version += 1
             print(f"Mesh version manually incremented to {self._mesh_version}")
 
+    @timing.routine_timer_decorator
+    def OT_adapt(
+        self,
+        field,
+        *,
+        refinement=3.0,
+        coarsening="auto",
+        grad_smoothing_length="auto",
+        metric_choice="front-following",
+        fields_to_remap=None,
+        fields_to_zero=None,
+        skip_threshold=None,
+        reference_coords=None,
+        verbose=False,
+    ):
+        r"""Adapt the mesh in place so cell sizes track ``|∇field|``, using
+        the validated optimal-transport reset pattern.
+
+        Each call resets the mesh to a cached reference (the initial uniform
+        coordinates), FE-remaps ``field`` onto that clean canvas, builds a
+        gradient-density metric, runs the OT mover, and FE-remaps the
+        requested fields onto the adapted positions. Resetting every event
+        (rather than composing adaptations across time steps) is what keeps
+        the mover sliver-free over long runs. The "reset" is internal — from
+        the caller's point of view this just tracks the moving feature.
+
+        Topology is preserved (vertex count, DOF maps, rank partition
+        unchanged); only coordinates move. Registered solvers are marked for
+        rebuild via ``_deform_mesh``.
+
+        Reference coordinates
+        ---------------------
+        The reset target is snapshotted lazily on the **first** call as
+        ``self._ot_adapt_reference_coords`` (a copy of the current
+        ``mesh.X.coords``) and reused thereafter.
+
+        .. warning::
+           If the mesh is deformed by something other than ``OT_adapt``
+           between calls (e.g. a manual ``mesh._deform_mesh(...)`` or a
+           resume that loads a *deformed* snapshot), the cached reference no
+           longer matches the intended pristine state. Use
+           :meth:`OT_adapt_reset_reference` to re-baseline, or pass an
+           explicit ``reference_coords`` for a one-off override.
+
+        Parameters
+        ----------
+        field : MeshVariable
+            Scalar field whose gradient drives refinement (typically ``T``).
+            Always FE-remapped onto the adapted mesh.
+        refinement : float, default 3.0
+            Cell-size envelope ``h0/refinement`` for the densest cells.
+            Validated range 1.5–5; 3 is the Nusselt-number (Nu) sweet spot.
+        coarsening : float or "auto", default "auto"
+            ``"auto"`` = budget-conserving ``refinement**(1/d)``.
+        grad_smoothing_length : "auto", None, float, or Pint Quantity, default "auto"
+            Screened-Poisson de-noising length for ``|∇field|`` before the
+            metric is built — the most effective sliver lever; without it,
+            production refinement chases sub-cell gradient noise.
+            ``"auto"`` (default) ≈ the mesh's uniform cell size (mean edge
+            length) — the validated setting. ``None`` turns it off. A number
+            or Pint length sets ``L`` explicitly; **user-supplied lengths are
+            unit-aware** (non-dimensionalised via the projection), so pass a
+            Pint quantity (or a non-dimensional number) — ``≈ h0`` is mild,
+            ``≈ 2·h0`` stronger.
+        metric_choice : {"front-following", "gradient-uniform"}, default "front-following"
+        fields_to_remap : list of MeshVariable, optional
+            Extra fields to FE-remap onto the adapted positions (``field`` is
+            always remapped). ``None`` ⇒ just ``field``.
+        fields_to_zero : list of MeshVariable, optional
+            Fields to zero after the adapt (e.g. ``[V, P]`` for a cold
+            restart of the flow solve).
+        skip_threshold : float, optional
+            If the mesh is already aligned with the metric (misalignment
+            below this; see :func:`~underworld3.meshing.mesh_metric_mismatch`),
+            skip the adapt and return ``False``. ``None`` ⇒ always adapt.
+        reference_coords : array, optional
+            One-off override of the reset target (does not update the cache).
+        verbose : bool, default False
+
+        Returns
+        -------
+        bool
+            ``True`` if the mesh was adapted, ``False`` if the
+            ``skip_threshold`` check short-circuited it.
+
+        Notes
+        -----
+        Boundary nodes slide tangentially and stay on the boundary for
+        radial coordinate systems (Annulus / shell), using the projected
+        boundary normal ``mesh.Gamma_P1``. Cartesian boundaries are pinned
+        (the vertex-evaluated normal is degenerate there).
+
+        Constrained-manifold meshes (``mesh.dim != mesh.cdim``, e.g. a 2D
+        spherical surface in 3D) are **not supported**: the OT mover would
+        have to constrain *every* node to the surface, not just boundary
+        nodes. See ``docs/developer/design/ot-adapt-api-proposal.md``.
+
+        Examples
+        --------
+        >>> mesh = uw.meshing.Annulus(radiusOuter=1.0, radiusInner=0.5,
+        ...                           cellSize=1/16, qdegree=3)
+        >>> T = uw.discretisation.MeshVariable("T", mesh, 1, degree=3)
+        >>> # ... initialise T ...
+        >>> mesh.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+
+        See Also
+        --------
+        OT_adapt_reset_reference : Re-baseline the reset reference coords.
+        underworld3.meshing.follow_metric : The single-shot anisotropic mover.
+        adapt : Topology-changing MMG remeshing (different mechanism).
+        """
+        if self.dim != self.cdim:
+            raise NotImplementedError(
+                "OT_adapt is not supported on constrained-manifold meshes "
+                f"(mesh.dim={self.dim} != mesh.cdim={self.cdim}). The OT "
+                "mover would need to constrain every node to the surface, "
+                "not just boundary nodes — see "
+                "docs/developer/design/ot-adapt-api-proposal.md."
+            )
+        if (not hasattr(self, "_ot_adapt_reference_coords")
+                or self._ot_adapt_reference_coords is None):
+            # Lazy snapshot of the reset target on first call.
+            self._ot_adapt_reference_coords = numpy.asarray(
+                self.X.coords).copy()
+
+        from underworld3.meshing._ot_adapt import _ot_adapt_step
+
+        return _ot_adapt_step(
+            self, field,
+            refinement=refinement,
+            coarsening=coarsening,
+            grad_smoothing_length=grad_smoothing_length,
+            metric_choice=metric_choice,
+            fields_to_remap=fields_to_remap,
+            fields_to_zero=fields_to_zero,
+            skip_threshold=skip_threshold,
+            reference_coords=reference_coords,
+            verbose=verbose,
+        )
+
+    def OT_adapt_reset_reference(self, coords=None):
+        r"""Re-baseline the reference coordinates used by :meth:`OT_adapt`.
+
+        ``coords=None`` re-snapshots the current ``mesh.X.coords`` as the new
+        reset target; passing explicit ``coords`` (e.g. the initial uniform
+        mesh loaded from a checkpoint) sets those instead. Use on resume,
+        when the loaded mesh is in a deformed state and the cache would
+        otherwise lazily initialise from it.
+        """
+        if coords is None:
+            self._ot_adapt_reference_coords = numpy.asarray(
+                self.X.coords).copy()
+        else:
+            self._ot_adapt_reference_coords = numpy.asarray(coords).copy()
+
     @timing.routine_timer_decorator
     def adapt(self, metric_field, verbose=False):
         r"""
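Reviewer note: the per-cell formulas documented in `Mesh.quality` (shape quality `q`, largest interior angle, aspect ratio) can be checked on a single triangle. The sketch below is a standalone NumPy illustration of those three formulas; the function name `triangle_quality` is hypothetical and is not part of the patch. It evaluates an equilateral triangle and a deliberately flat sliver.

```python
import numpy as np


def triangle_quality(v0, v1, v2):
    """Return (q, largest angle in degrees, aspect ratio) for one triangle."""
    v0, v1, v2 = (np.asarray(v, dtype=float) for v in (v0, v1, v2))
    a = np.linalg.norm(v1 - v2)  # edge opposite v0
    b = np.linalg.norm(v2 - v0)  # edge opposite v1
    c = np.linalg.norm(v0 - v1)  # edge opposite v2
    d1, d2 = v1 - v0, v2 - v0
    area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])
    # q = 4*sqrt(3)*A / sum(l^2): 1 for equilateral, tends to 0 for a sliver.
    q = 4.0 * np.sqrt(3.0) * area / (a * a + b * b + c * c)

    def angle(opposite, s1, s2):
        # Law of cosines for the angle facing `opposite`.
        cos = (s1 * s1 + s2 * s2 - opposite * opposite) / (2.0 * s1 * s2)
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    largest = max(angle(a, b, c), angle(b, c, a), angle(c, a, b))
    # Longest edge divided by shortest altitude (2A / l_max).
    aspect = max(a, b, c) ** 2 / (2.0 * area)
    return q, largest, aspect


equilateral = triangle_quality((0, 0), (1, 0), (0.5, np.sqrt(3) / 2))
sliver = triangle_quality((0, 0), (1, 0), (0.5, 0.01))
```

The sliver falls under both thresholds that `quality()` counts (`q < 0.2` and largest angle above 165 degrees), which is the kind of cell the tail statistics are meant to expose.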
diff --git a/src/underworld3/meshing/__init__.py b/src/underworld3/meshing/__init__.py
index d64cdc44..0fa888d1 100644
--- a/src/underworld3/meshing/__init__.py
+++ b/src/underworld3/meshing/__init__.py
@@ -52,6 +52,10 @@
 
 from .smoothing import (
     smooth_mesh_interior,
+    metric_density_from_gradient,
+    mesh_metric_mismatch,
+    follow_metric,
+    ADAPT_STRATEGIES,
 )
 
 # Make all functions available at module level for backward compatibility
@@ -88,4 +92,8 @@
     "FaultCollection",
     # Mesh smoothing
     "smooth_mesh_interior",
+    "metric_density_from_gradient",
+    "mesh_metric_mismatch",
+    "follow_metric",
+    "ADAPT_STRATEGIES",
 ]
diff --git a/src/underworld3/meshing/_ot_adapt.py b/src/underworld3/meshing/_ot_adapt.py
new file mode 100644
index 00000000..47a845f1
--- /dev/null
+++ b/src/underworld3/meshing/_ot_adapt.py
@@ -0,0 +1,261 @@
+r"""Optimal-transport mesh adaptation — the validated reset-to-uniform step.
+
+This module factors the production pattern that was inlined in
+``scripts/stagnant_lid_adapt_loop.py`` (the ``ot-reset`` branch) into a
+reusable library function. The public entry point is :meth:`Mesh.OT_adapt`
+(see ``discretisation/discretisation_mesh.py``); this module holds the
+algorithm and the boundary-slip helpers it shares with the OT mover
+(``_winslow_equidistribute`` in ``smoothing.py``).
+
+The algorithm, per adapt event:
+
+1. Reset the mesh to its reference (IC uniform) coordinates.
+2. FE-remap the driving ``field`` onto the reference-mesh DOFs.
+3. Build the gradient density metric ``ρ`` on that clean canvas.
+4. Run the OT mover from the uniform canvas (``smooth_mesh_interior``,
+   ``method="ot"``).
+5. FE-remap the requested fields onto the adapted positions and zero any
+   fields flagged for a cold restart.
+
+The "reset every event" discipline is load-bearing: carrying mesh state
+*across* time steps is the broken incremental pattern (slivers lock in).
+Composition *within* an adapt is fine. See
+``docs/developer/design/ot-adapt-api-proposal.md`` and the
+``project_ot_reset_validated`` memory note.
+
+Boundary slip uses the mesh's **projected boundary normals**
+(``mesh.Gamma_P1`` / ``mesh._update_projected_normals``) — the symbolic
+``mesh.Gamma`` projected to a P1 vector field and normalised. This is the
+general, free-surface-ready normal source: it is re-projected on demand here
+because the projected field goes stale every time the mesh deforms. No
+per-mesh-class normal code is used. Nodes whose projected normal is
+degenerate (box corners, or an occasional unlocatable vertex) are pinned
+rather than slipped.
+"""
+
+import numpy as np
+
+import underworld3 as uw
+
+# Validated OT-mover constants (2026-05-23/24 investigation). These are
+# deliberately *not* exposed on the public OT_adapt signature — they are the
+# settled production point, not user dials.
+_OT_N_OUTER = 5
+_OT_RELAX = 0.1
+_OT_STEP_FRAC = 0.3
+
+
+def _is_radial_coords(mesh) -> bool:
+    """True for coordinate systems with a radial boundary (the snap-back
+    target is a fixed ``|r|``). Cartesian boundaries are flat — zeroing the
+    normal displacement keeps nodes on the face, so no snap-back is needed."""
+    from underworld3.coordinates import CoordinateSystemType as CT
+
+    return mesh.CoordinateSystem.coordinate_type in (
+        CT.CYLINDRICAL2D,
+        CT.CYLINDRICAL3D,
+        CT.SPHERICAL,
+        CT.GEOGRAPHIC,
+    )
+
+
+def _auto_grad_smoothing_length(mesh):
+    """The mesh's characteristic (uniform) cell size — mean edge length,
+    parallel-safe — returned as a unit-aware length when the mesh carries
+    coordinate units, else a bare (non-dimensional) float. Used as the
+    default ``grad_smoothing_length`` so gradient de-noising is on by
+    default at a scale comparable to the grid (the validated production
+    setting); ``None`` turns it off."""
+    from underworld3.meshing.smoothing import _edge_pairs
+
+    ep = _edge_pairs(mesh.dm)
+    X = np.asarray(mesh.X.coords)
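+    lengths = (np.linalg.norm(X[ep[:, 1]] - X[ep[:, 0]], axis=1)
+               if ep.shape[0] else np.zeros(0))
+    s, n = float(lengths.sum()), int(lengths.size)
+    if uw.mpi.size > 1:
+        # Edge-weighted global mean (ranks with no edges add nothing).
+        s, n = uw.mpi.comm.allreduce(s), uw.mpi.comm.allreduce(n)
+    h0 = s / n if n else 1.0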
+    units = getattr(mesh.X, "units", None)
+    return h0 if units is None else h0 * units
+
+
+def _boundary_centre(mesh, boundary_coords: np.ndarray) -> np.ndarray:
+    """Parallel-safe centroid of the boundary node coordinates (the centre
+    used for the radial snap-back)."""
+    n_loc = int(boundary_coords.shape[0])
+    s_loc = (boundary_coords.sum(axis=0)
+             if n_loc else np.zeros(mesh.cdim))
+    if uw.mpi.size > 1:
+        from mpi4py import MPI as _MPI
+
+        s = uw.mpi.comm.allreduce(s_loc, op=_MPI.SUM)
+        n = uw.mpi.comm.allreduce(n_loc, op=_MPI.SUM)
+    else:
+        s, n = s_loc, n_loc
+    return s / max(n, 1)
+
+
+def _slip_normals(mesh, boundary_coords: np.ndarray):
+    """Unit outward normals at ``boundary_coords`` from the projected
+    boundary-normal field.
+
+    Re-projects ``mesh._projected_normals`` (``mesh.Gamma_P1``) first so the
+    normals reflect the mesh's *current* coordinates — the projected field is
+    stale after any deform. Returns ``(normals, valid)`` where ``normals`` is
+    ``(k, cdim)`` and ``valid`` is a boolean mask; ``valid`` is ``False`` for
+    nodes with a degenerate (zero / non-finite) normal (e.g. box corners
+    where opposing face normals cancel, or an occasional unlocatable vertex).
+    Such nodes should be pinned, not slipped.
+    """
+    cdim = mesh.cdim
+    n = np.zeros((boundary_coords.shape[0], cdim))
+    try:
+        mesh._update_projected_normals()
+        n = np.asarray(
+            uw.function.evaluate(mesh.Gamma_P1, boundary_coords)
+        ).reshape(-1, cdim)
+    except Exception:
+        # Projection unavailable / degenerate on this mesh — fall back to
+        # all-pinned boundaries (valid stays all-False below).
+        n = np.zeros((boundary_coords.shape[0], cdim))
+    mag = np.linalg.norm(n, axis=1)
+    valid = np.isfinite(mag) & (mag > 0.5)
+    out = np.zeros_like(n)
+    out[valid] = n[valid] / mag[valid, None]
+    return out, valid
+
+
+def _ot_adapt_step(
+    mesh,
+    field,
+    *,
+    refinement=3.0,
+    coarsening="auto",
+    grad_smoothing_length="auto",
+    metric_choice="front-following",
+    fields_to_remap=None,
+    fields_to_zero=None,
+    skip_threshold=None,
+    reference_coords=None,
+    verbose=False,
+) -> bool:
+    r"""Run one OT-reset adapt event. Returns ``True`` if the mesh moved,
+    ``False`` if the skip-on-aligned check short-circuited.
+
+    See the module docstring for the algorithm. ``field`` is the scalar
+    MeshVariable whose gradient drives refinement; it is always FE-remapped
+    onto the adapted mesh. ``reference_coords`` overrides the reset target
+    for this call only (defaults to ``mesh._ot_adapt_reference_coords``).
+
+    ``grad_smoothing_length`` de-noises ``|∇field|`` before the metric is
+    built: ``"auto"`` (default) ≈ the mesh's uniform cell size — the
+    validated setting that keeps the metric clean at production refinement;
+    ``None`` turns it off; a number or Pint length sets it explicitly
+    (user-supplied lengths are unit-aware via the projection's
+    non-dimensionalisation).
+    """
+    cdim = mesh.cdim
+    ref_R = float(refinement)
+    coar = coarsening
+    if coar != "auto":
+        coar = float(coar)
+    # Resolve the gradient de-noising length: "auto" ≈ uniform grid size.
+    if isinstance(grad_smoothing_length, str):
+        if grad_smoothing_length.strip().lower() != "auto":
+            raise ValueError(
+                "grad_smoothing_length string must be 'auto'; got "
+                f"{grad_smoothing_length!r}. Pass None (off) or a "
+                "unit-aware length.")
+        grad_smoothing_length = _auto_grad_smoothing_length(mesh)
+    # R for the alignment clamp matches follow_metric: max(refine, coarsen).
+    coar_val = (ref_R ** (1.0 / cdim)) if coar == "auto" else float(coar)
+    R_clamp = max(ref_R, coar_val)
+
+    if reference_coords is not None:
+        ref_X = np.asarray(reference_coords)
+    else:
+        ref_X = np.asarray(mesh._ot_adapt_reference_coords)
+
+    old_X = np.asarray(mesh.X.coords).copy()
+
+    # Fields to FE-remap: `field` is always remapped; append extras (deduped).
+    remap = [field]
+    for f in (fields_to_remap or []):
+        if f is not field and f not in remap:
+            remap.append(f)
+    old_data = {f: np.asarray(f.data).copy() for f in remap}
+
+    # For radial coordinate systems (where boundary slip is used), create the
+    # projected-normal field up front — before the metric builder / OT mover
+    # set up any solver DM. Creating that MeshVariable mid-mover would stale
+    # those DM handles (see project_uw3_smoother_footguns). Cartesian meshes
+    # pin their boundary (no slip), so no normal field is needed there.
+    if _is_radial_coords(mesh):
+        try:
+            mesh._update_projected_normals()
+        except Exception:
+            pass
+
+    # --- skip-on-aligned -------------------------------------------------
+    if skip_threshold is not None:
+        rho_now = uw.meshing.metric_density_from_gradient(
+            mesh, field, refinement=ref_R, coarsening=coar,
+            metric_choice=metric_choice,
+            gradient_smoothing_length=grad_smoothing_length,
+            degree=1, name="ot_adapt_skip")
+        mm = uw.meshing.mesh_metric_mismatch(
+            mesh, rho_now, resolution_ratio=R_clamp)
+        if mm["misalignment"] < float(skip_threshold):
+            if verbose:
+                uw.pprint(
+                    f"  OT_adapt: skip — misalignment "
+                    f"{mm['misalignment']:.3f} < {float(skip_threshold):.3f}")
+            return False
+
+    # --- step 1: capture `field` at the reference-mesh DOF positions -----
+    mesh._deform_mesh(ref_X)
+    ref_field_coords = np.asarray(field.coords).copy()
+    mesh._deform_mesh(old_X)
+    field.data[...] = old_data[field]
+    field_at_ref = np.asarray(
+        uw.function.evaluate(field.sym[0], ref_field_coords)).reshape(-1)
+
+    # --- step 2: load the reference (clean) mesh with the remapped field -
+    mesh._deform_mesh(ref_X)
+    field.data[:, 0] = field_at_ref
+
+    # --- step 3: build the gradient metric + run the OT mover ------------
+    rho = uw.meshing.metric_density_from_gradient(
+        mesh, field, refinement=ref_R, coarsening=coar,
+        metric_choice=metric_choice,
+        gradient_smoothing_length=grad_smoothing_length,
+        degree=1, name="ot_adapt")
+    uw.meshing.smooth_mesh_interior(
+        mesh, metric=rho, method="ot", boundary_slip=True,
+        method_kwargs=dict(n_outer=_OT_N_OUTER, relax=_OT_RELAX,
+                           step_frac=_OT_STEP_FRAC),
+        verbose=verbose)
+    new_X = np.asarray(mesh.X.coords).copy()
+
+    # --- step 4: FE-remap all fields from old_X onto the adapted mesh ----
+    # The metric-canvas write to `field` (step 2) is discarded here by
+    # design: every remapped field is re-derived from its *original*
+    # (old_X) data, so the final field is the true physical field carried
+    # onto the new positions.
+    new_coords = {f: np.asarray(f.coords).copy() for f in remap}
+    mesh._deform_mesh(old_X)
+    for f in remap:
+        f.data[...] = old_data[f]
+    remapped = {}
+    for f in remap:
+        val = np.asarray(uw.function.evaluate(f.sym, new_coords[f]))
+        remapped[f] = val.reshape(np.asarray(f.data).shape)
+    mesh._deform_mesh(new_X)
+    for f in remap:
+        f.data[...] = remapped[f]
+    for f in (fields_to_zero or []):
+        f.data[...] = 0.0
+
+    return True
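Reviewer note: the reset-then-redistribute discipline of `_ot_adapt_step` has a simple 1-D analogue. The sketch below is not the OT mover and does not call `metric_density_from_gradient`; it is an assumed illustration in which the density formula, `equidistribute_1d`, and `front` are all hypothetical. It redistributes a fixed number of nodes so that each cell carries equal density mass, and every adapt event starts from the cached uniform reference rather than from the previous adapted positions.

```python
import numpy as np


def equidistribute_1d(x_ref, field_fn, refinement=3.0):
    """Place x_ref.size nodes so each cell holds equal density mass."""
    fine = np.linspace(x_ref[0], x_ref[-1], 2001)   # quadrature grid
    grad = np.abs(np.gradient(field_fn(fine), fine))
    # Illustrative density only: 1 away from the front, `refinement` on it.
    rho = 1.0 + (refinement - 1.0) * grad / grad.max()
    # Cumulative mass by the trapezoid rule, normalised to [0, 1].
    mass = np.concatenate(
        [[0.0], np.cumsum(0.5 * (rho[1:] + rho[:-1]) * np.diff(fine))])
    mass /= mass[-1]
    # Invert the mass map at equally spaced targets; end nodes stay put.
    targets = np.linspace(0.0, 1.0, x_ref.size)
    return np.interp(targets, mass, fine)


def front(centre):
    """A sharp tanh front located at `centre`."""
    return lambda x: np.tanh((x - centre) / 0.02)


x_ref = np.linspace(0.0, 1.0, 33)           # cached uniform reference
# Each event restarts from x_ref, never from the previous adapted mesh.
x_a = equidistribute_1d(x_ref, front(0.3))
x_b = equidistribute_1d(x_ref, front(0.7))
```

Because each call depends only on the reference and the current field, the node count stays fixed and nothing from the first event leaks into the second, which is the property the module docstring calls load-bearing.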
diff --git a/src/underworld3/meshing/smoothing.py b/src/underworld3/meshing/smoothing.py
index c5038132..64aa2a96 100644
--- a/src/underworld3/meshing/smoothing.py
+++ b/src/underworld3/meshing/smoothing.py
@@ -17,11 +17,45 @@
 neighbours for vertices on the rank partition boundary, producing
 visibly wrong updates along the rank cut.
 
+Two operators:
+  - ``metric=None`` (default): the graph-Laplacian Jacobi sweeps
+    described above — equalises connectivity, makes cells equant.
+  - ``metric=<expr>``: an **elastic-spring network** relaxed
+    toward equilibrium (the default metric path). Every edge is a
+    spring whose *rest length* is ``∝ ρ_tgt^(-1/dim)`` (finer where
+    ``ρ_tgt = metric`` is large), normalised so the mean rest
+    length equals the current mean edge length (scale preserved —
+    pure redistribution). A position-based Jacobi relaxation moves
+    interior nodes toward rest-length-consistent positions; a
+    coherent global signed-area backtrack prevents inversion. A
+    Lagrangian density (``r0`` set once to the original radius,
+    then ``f(r0.sym)``) keeps the rest lengths fixed per material
+    point. **Status: under development** — the fixed-topology
+    Jacobi relaxation currently reaches only weak grading
+    (deep/near ≈ 1.03 for an 8× target) and can stall against the
+    tangle guard; a proper equilibrium solve / preconditioning is
+    being investigated.
+
+The optimal-transport / Monge–Ampère mesh-potential approach
+(``_winslow_elliptic``, preserved, not the default) was
+exhaustively investigated 2026-05-16 and found to cap at
+deep/near ≈ 1.07 for every variant (linear / recovered-Hessian /
+convex-branch BFO / outer composition). *Every* dissimilar method
+(graph-Laplacian, weighted-Laplacian, MA-all-variants, elastic
+spring) converges to deep/near ≈ 1.03–1.07, while the *exact*
+equidistribution at the same fixed topology is ~10×. This points
+to a common missing ingredient (large coherent long-range node
+transport is throttled by pinned-boundary + tangle-guard local
+relaxation). Open investigation: elastic-spring redistribution as
+a *preconditioner* for the MA solve. See ``scripts/ma_*.py`` and
+the project memory.
+
 Future extensions (separate PRs):
   - PR B: nicer pinning API (per-boundary explicit lists, callable
     masks)
-  - PR C: non-uniform metric (swarm-anchored target spacing,
-    mirroring ``mesh.adapt`` semantics)
+  - parallel-exact spring forces (cross-rank edge-force assembly,
+    mirroring the Jacobi-path adjacency Mat); currently the spring
+    path is serial-exact (rank-boundary nodes under-count forces)
 """
 
 from typing import Optional, Sequence
@@ -35,6 +69,68 @@
 # Rebuilt automatically when the mesh topology changes.
 _ADJ_CACHE: dict = {}
 
+# Cache of the **original** (undeformed) state per mesh,
+# captured the first time follow_metric is called on that mesh:
+#   h0           — mean edge length
+#   rest_coords  — vertex positions (the spring's pull-back target)
+# Subsequent calls reuse these references instead of measuring the
+# (already-refined) current mesh; otherwise the spring's reference
+# state shrinks at every adapt and the refinement compounds,
+# crashing the CFL-bound dt by 2× per adapt step.
+# Keyed by id(mesh).
+_FOLLOW_METRIC_H0_CACHE: dict = {}
+_FOLLOW_METRIC_REST_CACHE: dict = {}
+
+
+# Named adaptation strategies (off / vlow / low / med / high /
+# extreme). Each maps to a coherent set of (amp, percentile
+# window, power, R, skip_threshold) values. Use the
+# ``strategy=`` kwarg on :func:`metric_density_from_gradient`
+# and :func:`smooth_mesh_interior` to dial intensity; individual
+# kwargs still work and override the strategy choice where given.
+ADAPT_STRATEGIES = {
+    "off":     dict(amp=0.0, lo_percentile=0.0,
+                    hi_percentile=100.0, power=1.0,
+                    resolution_ratio=1.0,
+                    skip_threshold=None,
+                    description="no adaptation (no-op)"),
+    "vlow":    dict(amp=4.0, lo_percentile=80.0,
+                    hi_percentile=99.0, power=1.0,
+                    resolution_ratio=1.2,
+                    skip_threshold=0.9,
+                    description="hardly any refinement; "
+                                "top 20% gradient cells only"),
+    "low":     dict(amp=6.0, lo_percentile=70.0,
+                    hi_percentile=97.0, power=1.0,
+                    resolution_ratio=1.3,
+                    skip_threshold=0.9,
+                    description="gentle front bunching"),
+    "med":     dict(amp=7.0, lo_percentile=60.0,
+                    hi_percentile=97.0, power=1.0,
+                    resolution_ratio=1.4,
+                    skip_threshold=0.9,
+                    description="moderate front bunching "
+                                "(default)"),
+    "high":    dict(amp=8.0, lo_percentile=50.0,
+                    hi_percentile=97.0, power=1.0,
+                    resolution_ratio=1.5,
+                    skip_threshold=0.9,
+                    description="front-following — historical "
+                                "production point"),
+    "extreme": dict(amp=8.0, lo_percentile=50.0,
+                    hi_percentile=97.0, power=1.5,
+                    resolution_ratio=2.0,
+                    skip_threshold=0.9,
+                    description="midway to gradient-uniform; "
+                                "near the danger zone for the "
+                                "mover — use deliberately"),
+}
+
+# Sentinel used to detect whether a kwarg was explicitly set by
+# the caller versus left at the function default. Lets us layer
+# strategy defaults beneath explicit user overrides cleanly.
+_UNSET = object()
+
 
 def _auto_pinned_labels(mesh) -> tuple:
     """All non-sentinel geometric boundary labels on the mesh.
@@ -165,11 +261,8 @@ def _build_adjacency_matrix(mesh):
         Clone of ``mesh.dm`` with a 1-dof-per-vertex section. Owns
         the parallel layout for the Mat and any vectors of the same
         shape.
-    local_to_global_owned : numpy.ndarray, shape (n_owned,)
-        ``local_to_global_owned[i]`` is the offset (in the *local*
-        owned portion of the global Vec) at which the ``i``-th
-        OWNED local vertex appears. Use this to pack/unpack between
-        ``coords[is_owned, d]`` and ``vec.array``.
+    gsection : PETSc.Section
+        Global section of ``dm_scalar`` — the owned-vertex numbering.
     """
     from petsc4py import PETSc
     dm = mesh.dm
@@ -201,6 +294,2260 @@ def gidx(p):
     return A, dm_scalar, gsection
 
 
+# Cached spring-smoother topology state keyed by (mesh-id,
+# pinned-labels, topology): the edge vertex-index pairs and per-node
+# incident-edge degree. Rebuilt automatically on a topology change
+# (remesh / adapt / repartition), which produces a new cache key.
+_SPRING_CACHE: dict = {}
+
+
+def _min_incident_edge(dm, coords):
+    """Per-vertex minimum incident edge length (local-chart
+    v-pStart order). Used as an optional secondary per-node cap on
+    the spring step (the primary tangle guard is the coherent global
+    signed-area backtrack in ``_winslow_spring``)."""
+    pStart, pEnd = dm.getDepthStratum(0)
+    eStart, eEnd = dm.getDepthStratum(1)
+    h = np.full(pEnd - pStart, np.inf)
+    for e in range(eStart, eEnd):
+        cone = dm.getCone(e)
+        if len(cone) != 2:
+            continue
+        v0, v1 = cone[0], cone[1]
+        if not (pStart <= v0 < pEnd and pStart <= v1 < pEnd):
+            continue
+        i0, i1 = v0 - pStart, v1 - pStart
+        L = float(np.linalg.norm(coords[i0] - coords[i1]))
+        if L < h[i0]:
+            h[i0] = L
+        if L < h[i1]:
+            h[i1] = L
+    return h
+
+
+def _tri_cells(dm):
+    """Triangle vertex-index triples (local-chart, v-pStart order).
+
+    Returns an ``(n_tri, 3)`` int array, or ``None`` if the mesh is
+    not all-triangle (then the global signed-area backtrack is
+    skipped and only the optional per-node edge cap guards against
+    tangling).
+    """
+    cStart, cEnd = dm.getHeightStratum(0)
+    pStart, pEnd = dm.getDepthStratum(0)
+    tris = []
+    for c in range(cStart, cEnd):
+        closure = dm.getTransitiveClosure(c)[0]
+        vs = [p - pStart for p in closure if pStart <= p < pEnd]
+        if len(vs) != 3:
+            return None
+        tris.append(vs)
+    if not tris:
+        return None
+    return np.asarray(tris, dtype=np.int64)
+
+
+def _signed_areas(coords, tris):
+    """Signed area of each triangle (sign = orientation)."""
+    a = coords[tris[:, 0]]
+    b = coords[tris[:, 1]]
+    c = coords[tris[:, 2]]
+    return 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                  - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+
+
+def mesh_metric_mismatch(mesh, metric, resolution_ratio=None):
+    r"""Geometric mismatch between the current mesh and what the
+    equidistribution rule would prescribe from ``metric``.
+
+    Per cell compute the equidistribution-prescribed area
+    ``A_target = A_total · (1/ρ_c) / Σ(1/ρ)`` (the conservation
+    law of §1 in ``mesh-adaptation-formulation.md``). When the
+    mover's eigen-clamp ``[h0/R, h0·R]`` is in play, clip the
+    target so it represents what the mover can *actually*
+    achieve, not the unbounded ideal. Then
+
+    .. math::
+
+        \delta_c = \tfrac12\,\log\!\Big(
+            \frac{A_{\mathrm{actual},c}}{A_{\mathrm{target},c}}\Big)
+
+    (signed, log-space symmetric: a 2× refine needed = +0.35;
+    a 2× coarsen needed = -0.35). Scale-invariant under
+    ``ρ → αρ``.
+
+    Returns ``rms``, ``max`` and ``median_abs`` of ``|δ|`` over
+    cells, plus ``alignment`` / ``misalignment`` of cell density
+    with ρ. A mesh already at the mover's achievable
+    equidistribution gives δ ≈ 0; an unadapted mesh against a
+    strongly-peaked metric gives O(1) or larger.
+
+    Cheap: one evaluation of ``metric`` at the cell centroids
+    plus a few NumPy reductions. Used by
+    :func:`smooth_mesh_interior` (``skip_threshold=...``) to skip
+    adapting when the mesh is already aligned with the target.
+
+    Parameters
+    ----------
+    mesh : underworld3.discretisation.Mesh
+        Triangle mesh (only 2-D for now).
+    metric : sympy / UW expression
+        The target *density* ρ (larger ⇒ finer cells) — same
+        object you would pass to ``smooth_mesh_interior``.
+    resolution_ratio : float, optional
+        The mover's eigen-clamp ``R``. When given, the
+        equidistribution target areas are clipped to
+        ``[A_mean / R², A_mean · R²]`` — the achievable band the
+        mover honours — so a perfectly-adapted mesh measures
+        ``δ ≈ 0``. Without it, mismatch is measured against the
+        unbounded equidistribution target (so even a
+        perfectly-adapted mesh has ``δ ≠ 0`` against the
+        unreachable ideal).
+
+    Returns
+    -------
+    dict of float
+        Keys ``rms``, ``max``, ``median_abs``, ``alignment``, ``misalignment``.
+    """
+    import underworld3 as _uw
+
+    coords = np.asarray(mesh.X.coords)
+    tris = _tri_cells(mesh.dm)
+    if tris is None:
+        raise NotImplementedError(
+            "mesh_metric_mismatch: triangle mesh required")
+    A_actual = np.abs(_signed_areas(coords, tris))
+    centroids = coords[tris].mean(axis=1)
+    rho = np.asarray(_uw.function.evaluate(
+        metric, centroids)).reshape(-1)
+    rho = np.maximum(rho, 1.0e-12)   # guard
+    inv_rho = 1.0 / rho
+    A_target = A_actual.sum() * inv_rho / inv_rho.sum()
+    if resolution_ratio is not None:
+        R = float(resolution_ratio)
+        A_mean = A_actual.sum() / len(A_actual)
+        # Clip target areas to the mover's achievable band
+        # [A_mean/R², A_mean·R²] (h in [h0/R, h0·R] ⇒
+        # A in [h0²/R², h0²·R²] = [A_mean/R², A_mean·R²]).
+        A_target = np.clip(A_target, A_mean / R ** 2,
+                           A_mean * R ** 2)
+    delta = 0.5 * np.log(A_actual / A_target)
+    abs_delta = np.abs(delta)
+
+    # Alignment — Pearson r of log(1/A_c) with log(ρ_c).
+    # Equidistribution gives log(1/A) ∝ (1/d)·log(ρ) ⇒ r → 1.
+    # Uniform mesh has nearly-zero sd(log A) ⇒ r ≈ 0.
+    # An over-aggressive mover that overshoots in proportional
+    # fashion still has r ≈ 1 (just with the wrong slope), so r
+    # measures whether cell density is *aligned with* the metric,
+    # independent of grading magnitude. This is the right signal
+    # for "is this mesh built around this metric?" — and the
+    # appropriate skip-or-adapt criterion in a dynamic loop.
+    log_density = -np.log(A_actual)
+    log_rho = np.log(rho)
+    if log_density.std() > 1.0e-12 and log_rho.std() > 1.0e-12:
+        alignment = float(np.corrcoef(log_density, log_rho)[0, 1])
+    else:
+        alignment = 0.0
+    # Misalignment: 0 = perfectly aligned, 1 = orthogonal.
+    misalignment = float(
+        np.sqrt(max(0.0, 1.0 - max(0.0, alignment) ** 2)))
+
+    return dict(rms=float(np.sqrt(np.mean(delta ** 2))),
+                max=float(abs_delta.max()),
+                median_abs=float(np.median(abs_delta)),
+                alignment=alignment,
+                misalignment=misalignment)
+
+
+def _edge_pairs(dm):
+    """``(n_edge, 2)`` int array of edge endpoint vertex indices in
+    local-chart (v - pStart) order — the spring network's bars.
+
+    Skips edges whose endpoints are not both in the local vertex
+    stratum (rank-ghost incomplete edges); the spring path is
+    serial-exact (see module docstring)."""
+    pStart, pEnd = dm.getDepthStratum(0)
+    eStart, eEnd = dm.getDepthStratum(1)
+    pairs = []
+    for e in range(eStart, eEnd):
+        cone = dm.getCone(e)
+        if len(cone) != 2:
+            continue
+        v0, v1 = cone[0], cone[1]
+        if not (pStart <= v0 < pEnd and pStart <= v1 < pEnd):
+            continue
+        pairs.append((v0 - pStart, v1 - pStart))
+    if not pairs:
+        return np.zeros((0, 2), dtype=np.int64)
+    return np.asarray(pairs, dtype=np.int64)
+
+
+def _winslow_spring(mesh, metric, pinned_labels, verbose,
+                    n_sweeps=300, relax=None, step_frac=None,
+                    boundary_slip=False, shape_w=1.0, size_w=8.0):
+    r"""Metric-driven mesh grading by elastic-spring equilibrium.
+
+    Shape and size are decoupled: equal edge springs (rest length
+    ``L̄``, the current mean edge length) keep cells equant, while a
+    per-cell target area carries the grading from the target density,
+
+    .. math::
+
+        A^0_c \;\propto\; 1/\rho_{\mathrm{tgt}},
+
+    scaled once so the total target area equals the total current
+    area (overall scale preserved; pure redistribution). The nodes
+    are moved to the **mechanical equilibrium** by *minimising*
+
+    .. math::
+
+        E(\mathbf{x}) \;=\; w_s \sum_{e} \big(L_e/\bar L - 1\big)^2
+        \;+\; w_a \sum_{c} \big(A_c/A^0_c - 1\big)^2
+
+    (``w_s = shape_w``, ``w_a = size_w``; the area term applies to
+    all-triangle 2-D meshes only) over the free (non-pinned) nodes
+    with Jacobi-preconditioned **nonlinear conjugate gradients**
+    (Polak–Ribière⁺) and an Armijo line search whose trial step is
+    rejected if any cell would invert. Solving the equilibrium,
+    rather than creeping with damped Jacobi sweeps that stall at a
+    per-sweep tangle freeze, lets the area target grade the mesh.
+
+    ``ρ_tgt`` is Lagrangian (``metric = f(r0)`` with ``r0`` a frozen
+    mesh variable), so the target areas are fixed per material cell
+    (computed once) and the *design* grading is restored even after
+    the mesh has deformed. Uniform ``ρ_tgt`` ⇒ all target areas equal
+    the mean cell area ⇒ only a benign mild regularisation toward
+    uniform spacing (no grading change).
+
+    ``n_sweeps`` caps the CG iterations (CG converges far faster
+    than the old Jacobi sweep budget). ``relax`` / ``step_frac`` are
+    unused on the equilibrium path (the CG line search controls the
+    step and the inversion guard) and are kept only for signature
+    stability. ``n_iters`` / ``alpha`` do not apply.
+    """
+    pinned_labels = tuple(pinned_labels)
+    dm = mesh.dm
+    pStart, pEnd = dm.getDepthStratum(0)
+    cStart, cEnd = dm.getHeightStratum(0)
+    cone_size = dm.getConeSize(cStart) if cEnd > cStart else 0
+    n_verts = pEnd - pStart
+    key = (id(mesh), pinned_labels,
+           n_verts, cEnd - cStart, cone_size)
+
+    cache = _SPRING_CACHE.get(key)
+    if cache is None:
+        edges = _edge_pairs(dm)
+        if edges.shape[0] == 0:
+            return
+        deg = np.bincount(
+            edges.ravel(), minlength=n_verts).astype(np.double)
+        deg[deg == 0.0] = 1.0
+        _SPRING_CACHE[key] = (edges, deg)
+    else:
+        edges, deg = cache
+
+    is_bnd = _pinned_mask(dm, pinned_labels)
+    tris = _tri_cells(dm)
+    cdim = mesh.cdim
+    v0 = edges[:, 0]
+    v1 = edges[:, 1]
+
+    coords = np.asarray(mesh.X.coords, dtype=np.double).copy()
+
+    # Boundary tangential slip. Fully locking every boundary node
+    # freezes the rim's angular distribution, so near a feature the
+    # interior must distort (the "touchy"/anisotropic refinement).
+    # Instead let boundary nodes SLIDE ALONG the boundary while
+    # staying EXACTLY ON it: each ring gets its OWN centre (robust
+    # if rings are not perfectly concentric) and every slip node is
+    # snapped back to its original distance from that centre after
+    # each step — so a slip node can change θ but can NEVER move
+    # off / away from the surface (the radial DOF is removed, not
+    # just penalised). One node per ring is a hard anchor (kills
+    # the ring's rigid-rotation gauge). The global inversion guard
+    # also blocks a slip node overtaking a neighbour (boundary
+    # self-tangle). TODO: a general deformed / free-surface
+    # boundary needs projection onto the boundary polyline, not a
+    # per-ring radius — circular form is exact for the Annulus.
+    if boundary_slip and is_bnd.any():
+        bc = np.nonzero(is_bnd)[0]
+        c0 = coords[bc].mean(axis=0)
+        rg = np.round(np.linalg.norm(coords[bc] - c0, axis=1), 6)
+        is_anchor = np.zeros(n_verts, dtype=bool)
+        slip_center = np.zeros((n_verts, cdim))
+        slip_rtarget = np.zeros(n_verts)
+        for rv in np.unique(rg):
+            grp = bc[rg == rv]
+            rc = coords[grp].mean(axis=0)        # this ring's centre
+            is_anchor[grp[np.argmax(
+                (coords[grp] - rc)[:, 0])]] = True
+            slip_center[grp] = rc
+            slip_rtarget[grp] = np.linalg.norm(
+                coords[grp] - rc, axis=1)
+        is_slip = is_bnd & ~is_anchor
+        is_pinned = is_anchor
+        sidx = np.nonzero(is_slip)[0]
+        s_ctr = slip_center[sidx]
+        s_rad = slip_rtarget[sidx]
+
+        def _project(Y):
+            v = Y[sidx] - s_ctr
+            nrm = np.linalg.norm(v, axis=1)
+            nrm = np.where(nrm > 1.0e-30, nrm, 1.0)
+            Y[sidx] = s_ctr + v * (s_rad / nrm)[:, None]
+            return Y
+    else:
+        is_pinned = is_bnd
+        is_slip = np.zeros(n_verts, dtype=bool)
+
+        def _project(Y):
+            return Y
+
+    free = ~is_pinned
+
+    # ===== Volumetric spring network (shape ⟂ size, decoupled) ====
+    # EQUAL edge springs (uniform rest length L̄ = current mean
+    # edge) are a pure SHAPE regulariser → equant cells, resists
+    # the slivers/degeneracy the graded-edge form produced. The
+    # SIZE grading lives entirely in a per-CELL area ("volumetric")
+    # constraint: each triangle's area is driven to a target
+    # A0 ∝ 1/ρ_tgt (scaled so ΣA0 = Σ(initial area) ⇒ total area
+    # conserved, pure redistribution). Both energy terms are
+    # written as *relative* squared errors so the shape/size
+    # weights (shape_w, size_w) are pure dimensionless knobs.
+    e_vec = coords[v1] - coords[v0]
+    L_cur = np.linalg.norm(e_vec, axis=1)
+    sum_L = float(L_cur.sum())
+    n_e = float(L_cur.size)
+    if uw.mpi.size > 1:
+        sum_L = uw.mpi.comm.allreduce(sum_L)
+        n_e = uw.mpi.comm.allreduce(n_e)
+    Lbar = sum_L / max(n_e, 1.0)          # uniform edge rest length
+    L0 = np.full_like(L_cur, Lbar)
+    L0_mean = Lbar
+
+    # Per-cell target area from ρ_tgt at the (initial) centroid.
+    # Lagrangian metric ⇒ computed ONCE (rides material points).
+    if tris is not None:
+        ca = coords[tris[:, 0]]
+        cb = coords[tris[:, 1]]
+        cc = coords[tris[:, 2]]
+        cent = (ca + cb + cc) / 3.0
+        rho_c = np.asarray(
+            uw.function.evaluate(metric, cent)).reshape(-1)
+        rho_c = np.maximum(rho_c, 1.0e-30)
+        a_init = np.abs(_signed_areas(coords, tris))
+        inv = 1.0 / rho_c
+        sA = float(a_init.sum())
+        sI = float(inv.sum())
+        if uw.mpi.size > 1:
+            sA = uw.mpi.comm.allreduce(sA)
+            sI = uw.mpi.comm.allreduce(sI)
+        A0 = (sA / max(sI, 1.0e-30)) * inv     # ΣA0 = Σa_init
+        A0 = np.maximum(A0, 1.0e-30)
+        ti0, ti1, ti2 = tris[:, 0], tris[:, 1], tris[:, 2]
+    else:
+        A0 = None
+
+    # ---- Solve the truss EQUILIBRIUM, not Jacobi creep ----------
+    # Minimise the relative-error energy
+    #   E(x) = shape_w Σ_e (L_e/L̄ − 1)² + size_w Σ_c (A_c/A0_c − 1)²
+    # over the free nodes (pinned nodes fixed) by nonlinear
+    # conjugate gradients (Polak–Ribière⁺) with an Armijo line
+    # search whose trial step is REJECTED if any cell would invert.
+    # The tangle guard is inside the optimiser, so it does not stall
+    # against a per-sweep global freeze (the Jacobi failure mode).
+    free_idx = np.nonzero(free)[0]
+    n_free = int(free_idx.size)
+    if n_free == 0:
+        mesh._deform_mesh(coords)
+        return
+
+    if tris is not None:
+        orient = np.sign(np.median(_signed_areas(coords, tris)))
+        orient = orient if orient != 0.0 else 1.0
+
+    def _allsum(s):
+        if uw.mpi.size > 1:
+            return uw.mpi.comm.allreduce(float(s))
+        return float(s)
+
+    def _feasible(X):
+        if tris is None:
+            return True
+        amin = float((_signed_areas(X, tris) * orient).min())
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            amin = uw.mpi.comm.allreduce(amin, op=_MPI.MIN)
+        return amin > 0.0
+
+    have_area = (A0 is not None) and (cdim == 2)
+
+    def _tri_signed(X):
+        a, b, c = X[ti0], X[ti1], X[ti2]
+        return 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                      - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+
+    def _energy(X):
+        ev = X[v1] - X[v0]
+        L = np.sqrt((ev * ev).sum(axis=1))
+        re = (L - Lbar) / Lbar               # relative edge error
+        E = shape_w * _allsum((re * re).sum())
+        if have_area:
+            area = orient * _tri_signed(X)
+            ra = (area - A0) / A0            # relative area error
+            E += size_w * _allsum((ra * ra).sum())
+        return E
+
+    def _energy_grad(X):
+        ev = X[v1] - X[v0]
+        L = np.sqrt((ev * ev).sum(axis=1))
+        Ls = np.maximum(L, 1.0e-30)
+        re = (L - Lbar) / Lbar
+        E = shape_w * _allsum((re * re).sum())
+        G = np.zeros_like(X)
+        # equal-spring shape term: 2·shape_w·re/(Lbar·L)·ev
+        ce = (2.0 * shape_w * re / (Lbar * Ls))[:, None]
+        np.add.at(G, v1, ce * ev)
+        np.add.at(G, v0, -ce * ev)
+        if have_area:
+            a, b, c = X[ti0], X[ti1], X[ti2]
+            S = 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
+                       - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
+            area = orient * S
+            ra = (area - A0) / A0
+            E += size_w * _allsum((ra * ra).sum())
+            # ∂(area)/∂· = orient · ∂S/∂· (signed-area vertex grads)
+            fac = (2.0 * size_w * ra / A0 * orient)[:, None]
+            gA = np.empty_like(a)
+            gB = np.empty_like(a)
+            gC = np.empty_like(a)
+            gA[:, 0] = 0.5 * (b[:, 1] - c[:, 1])
+            gA[:, 1] = 0.5 * (c[:, 0] - b[:, 0])
+            gB[:, 0] = 0.5 * (c[:, 1] - a[:, 1])
+            gB[:, 1] = 0.5 * (a[:, 0] - c[:, 0])
+            gC[:, 0] = 0.5 * (a[:, 1] - b[:, 1])
+            gC[:, 1] = 0.5 * (b[:, 0] - a[:, 0])
+            np.add.at(G, ti0, fac * gA)
+            np.add.at(G, ti1, fac * gB)
+            np.add.at(G, ti2, fac * gC)
+        G[~free] = 0.0
+        return E, G
+
+    # Jacobi (diagonal) preconditioner: the truss Hessian is
+    # graph-Laplacian-structured (cond ~ (1/h)²), so plain CG crawls
+    # for fine meshes. M⁻¹ = diag(1/deg) — the Laplacian diagonal
+    # scale, free here since `deg` is already cached — clusters the
+    # spectrum and gives the order-of-magnitude convergence speed-up
+    # that turns "stuck at ~1.04" into the true graded minimum.
+    invdeg = (1.0 / deg)[:, None]
+
+    X = _project(coords.copy())
+    E, G = _energy_grad(X)
+    g0 = max(_allsum((G * G).sum()) ** 0.5, 1.0e-30)
+    r = -G
+    s = r * invdeg
+    s[~free] = 0.0
+    d = s.copy()
+    delta_new = _allsum((r * s).sum())
+    dmax = max(float(np.linalg.norm(d[free_idx], axis=1).max()),
+               1.0e-30)
+    if uw.mpi.size > 1:
+        from mpi4py import MPI as _MPI
+        dmax = uw.mpi.comm.allreduce(dmax, op=_MPI.MAX)
+    t0 = 0.5 * L0_mean / dmax
+    c_arm = 1.0e-4
+    max_iter = int(n_sweeps)
+    for it in range(max_iter):
+        gnorm = _allsum((G * G).sum()) ** 0.5
+        if gnorm <= 1.0e-8 * g0:
+            break
+        slope = _allsum((G * d).sum())       # = −(r·d)
+        if slope >= 0.0:                     # not descent → restart
+            d = s.copy()
+            slope = _allsum((G * d).sum())
+            if slope >= 0.0:
+                break
+        t = t0
+        accepted = False
+        for _ls in range(50):
+            Xt = X.copy()
+            Xt[free_idx] += t * d[free_idx]
+            Xt = _project(Xt)                # slip nodes → boundary
+            if _feasible(Xt):
+                Et = _energy(Xt)
+                if Et <= E + c_arm * t * slope:
+                    accepted = True
+                    break
+            t *= 0.5
+        if not accepted:
+            break                            # at equilibrium / stuck
+        Et, Gt = _energy_grad(Xt)
+        r_new = -Gt
+        s_new = r_new * invdeg
+        s_new[~free] = 0.0
+        delta_old = delta_new
+        delta_mid = _allsum((r_new * s).sum())
+        delta_new = _allsum((r_new * s_new).sum())
+        beta = max(0.0, (delta_new - delta_mid)
+                   / max(delta_old, 1.0e-30))   # preconditioned PR⁺
+        X, E, G = Xt, Et, Gt
+        d = s_new + beta * d
+        s = s_new
+        t0 = min(2.0 * t, 100.0 * t0)        # grow but stay sane
+
+        if verbose and (it % 25 == 0 or it == max_iter - 1):
+            ev = X[v1] - X[v0]
+            L = np.sqrt((ev * ev).sum(axis=1))
+            rms = (_allsum(((L - L0) ** 2).sum())
+                   / max(_allsum(L0.size), 1.0)) ** 0.5
+            uw.pprint(
+                f"  spring PCG iter {it+1}/{max_iter}: "
+                f"E={E:.4e}  rms(L-L0)/L0="
+                f"{rms / max(L0_mean, 1e-30):.3e}  |g|={gnorm:.2e}")
+
+    coords = X
+    mesh._deform_mesh(coords)
+
+
+# ======================================================================
+#  Monge–Ampère mesh-equidistribution machinery (PRESERVED, not the
+#  default metric path). Exhaustively investigated 2026-05-16: every
+#  FE-MA-potential variant (linear / recovered-Hessian smoothed &
+#  variational / BFO convex-branch + damping / outer composition)
+#  caps at deep/near ≈ 1.07 for an 8× target vs an exact ~10× — see
+#  the project memory and scripts/ma_*.py. Kept because (a) the
+#  "bit-identical across variants" result suggests a common missing
+#  ingredient worth understanding, and (b) the elastic-spring
+#  redistribution may work as a *preconditioner* for the MA solve
+#  (a graded starting mesh might let MA escape the weak branch) —
+#  an open investigation. Call _winslow_elliptic() directly to use.
+# ======================================================================
+
+# Cached MA solver state keyed by (mesh-id, pinned-labels, topology):
+# the φ Poisson, the variational Hessian-recovery solver, ∇φ
+# projector, the ρ_cur proxy field. Rebuilt on a topology change.
+_WINSLOW_CACHE: dict = {}
+
+# Sign of the BFO source vs UW3's SNES_Poisson convention
+# (SNES_Poisson F0 = -f, strong form Δφ = -ps.f). With this sign the
+# validated linear first iterate Δφ = (c/ρ_tgt - 1) grades the right
+# way (nodes toward high target density).
+_EQUIDIST_SIGN = -1.0
+
+_HESSIAN_CLASS = None
+
+# Cached anisotropic-mover state keyed by (mesh-id, pinned-labels,
+# topology, solver, φ-order, slip): the ∇ρ projector, the
+# eigen-clamped metric-tensor field D, and the cdim displacement
+# Poisson solvers (all sharing the tensor operator _c = D). Rebuilt
+# on a topology change (a new key).
+_ANISO_CACHE: dict = {}
+
+# Cached state for the OT-improvement-step path (one weighted-
+# Poisson per call). Keyed like the other movers; same lifetime.
+_OT_CACHE: dict = {}
+
+# Per-(mesh,config) running state for the equidistribution
+# normaliser's temporal damping: the EMA of ln G carried across
+# adaptation events (same key as _ANISO_CACHE). Empty ⇒ first
+# event seeds it. Only touched in the resolution_ratio>1 regime.
+_GEMA_STATE: dict = {}
+
+
+def _use_direct_solver(solver, singular=False):
+    r"""Force a cached MA sub-solver onto a sparse **direct** factorisation
+    (MUMPS LU) instead of the UW3 default GMRES + GAMG.
+
+    Why this is the dominant MA-efficiency lever (profiled 2026-05-17,
+    res-16 Annulus, AMP=8, warm re-call): the Picard loop fixes the
+    mesh, so the φ-Poisson Laplacian and the Hessian-recovery SPD mass
+    matrix are *constant operators* re-solved ~40× with only the RHS
+    changing. With GAMG, every ``solve()`` pays a full multigrid
+    **setup** (the constant near-nullspace re-attach forces it) — the
+    Hessian solve alone was ~0.93 s/iter ≈ 37 s. These problems are
+    tiny (≲10⁴ DOF); MUMPS factorises in milliseconds and the per-iter
+    cost collapses to a back-substitution. A direct solve is also
+    *exact* (machine precision, tighter than the GMRES rtol), so the
+    Picard fixed point — hence the grading/quality — is unchanged.
+
+    ``singular=True`` (the pure-Neumann φ Poisson): MUMPS null-pivot
+    detection (ICNTL(24)=1) handles the rank-1-deficient operator; the
+    ``constant_nullspace`` hook still removes the constant mode from
+    the RHS/solution, so the result is the same consistent solution
+    the iterative path produced — but it also eliminates the
+    GAMG-on-pure-Neumann ``DIVERGED_LINEAR_SOLVE`` re-solve pathology.
+    """
+    o = solver.petsc_options
+    # These three sub-problems are *linear* (φ Poisson with the Hessian
+    # source frozen; the SPD Hessian-recovery mass system; the ∇φ
+    # projection) → one KSP solve, no Newton line-search / 2nd iterate
+    # (which was doubling work and emitting spurious
+    # ``DIVERGED_LINEAR_SOLVE`` after 2 iters).
+    o["snes_type"] = "ksponly"
+    # ksponly does exactly ONE linear KSP solve (no Newton). Default
+    # snes_max_it leaves snes->iter=0, so if a converged-reason
+    # viewer is on (a user's global -snes_converged_reason, an outer
+    # debug flag, …) PETSc mislabels the *successful* linear solve
+    # as "DIVERGED_MAX_IT iterations 0" and floods the log with
+    # phantom failures. snes_max_it=1 ⇒ the single solve counts as
+    # one converged iteration ⇒ reason = CONVERGED, not a fake
+    # DIVERGED. Numerically inert (the KSP solve is identical) —
+    # purely stops these linear sub-solves masquerading as failures.
+    o["snes_max_it"] = 1
+    # The Picard loop fixes the mesh, so the operator is **constant**
+    # across the ~40 inner solves — only the RHS changes. Lag the
+    # Jacobian (compute once, reuse) and the preconditioner (factorise
+    # once, reuse): every subsequent inner solve collapses to a MUMPS
+    # back-substitution. A fresh ``solver.solve()`` after
+    # ``_deform_mesh`` rebuilds the SNES (is_setup=False) so the lag
+    # counter resets and the operator is correctly re-factorised on the
+    # first solve of the next call — the reuse is confined to the loop
+    # where the mesh genuinely does not move.
+    o["snes_lag_jacobian"] = -2
+    o["snes_lag_preconditioner"] = -2
+    o["ksp_type"] = "preonly"
+    o["pc_type"] = "lu"
+    o["pc_factor_mat_solver_type"] = "mumps"
+    if singular:
+        o["mat_mumps_icntl_24"] = 1   # null-pivot detection
+        o["mat_mumps_icntl_25"] = 0   # one solution of the singular sys
+    # GAMG-only keys are inert once pc_type≠gamg; drop them so the
+    # effective option set is exactly what is documented.
+    for k in ("pc_gamg_type", "pc_gamg_repartition", "pc_mg_type",
+              "pc_gamg_agg_nsmooths", "mg_levels_ksp_max_it",
+              "mg_levels_ksp_converged_maxits"):
+        try:
+            o.delValue(k)
+        except Exception:
+            pass
+
+
+def _use_iterative_solver(solver, singular=False, elliptic=True):
+    r"""Parallel-scalable alternative to ``_use_direct_solver``: keep
+    the *same factor/setup-once-reuse pattern* (the real efficiency
+    lever) but with an **iterative** PC so it scales beyond the
+    serial / modest-size regime where sparse direct factorisation is
+    viable (this PETSc build has only MUMPS + serial builtin LU — no
+    hypre / SuperLU_DIST).
+
+    The Picard loop fixes the mesh ⇒ the operator is constant across
+    the ~25 inner solves; ``snes_lag_jacobian=-2`` /
+    ``snes_lag_preconditioner=-2`` build the PC **once per
+    ``_winslow_elliptic`` call** and reuse it for every inner solve
+    (the GAMG hierarchy / Jacobi diagonal is *not* rebuilt per
+    iteration — that per-iter GAMG re-setup was the original ~0.9 s
+    Hessian cost). ``_deform_mesh`` resets ``is_setup`` so the lag
+    counter resets and the PC is correctly rebuilt on the next call's
+    first solve. Combined with a Krylov **warm start** from the
+    previous Picard φ (caller passes ``zero_init_guess=False``), the
+    inner solves are Krylov iterations on an already-built
+    hierarchy.
+
+    ``elliptic=True`` (the φ-Poisson Laplacian): FGMRES + GAMG (SOR
+    smoother) with the constant near-nullspace (already attached via
+    ``constant_nullspace``; GAMG needs it for the pure-Neumann
+    operator). FGMRES rather than CG because the SOR-smoothed
+    preconditioner is non-symmetric. ``elliptic=False`` (the SPD
+    Hessian-recovery / ∇φ mass systems): a mass matrix is spectrally
+    trivial, so CG + Jacobi converges in a few iterations with
+    **no** hierarchy setup, fully parallel; GAMG there would be
+    wasted setup.
+
+    Numerics: an iterative solve to a tight ``ksp_rtol`` reproduces
+    the BFO Picard fixed point — hence the grading — to well within
+    its 4-dp precision (validated against the direct path); it is a
+    *cost/parallelism* change, not a formulation change.
+    """
+    o = solver.petsc_options
+    o["snes_type"] = "ksponly"
+    # See _use_direct_solver: snes_max_it=1 stops a converged-reason
+    # viewer mislabelling these linear ksponly sub-solves as
+    # "DIVERGED_MAX_IT iterations 0". Numerically inert.
+    o["snes_max_it"] = 1
+    o["snes_lag_jacobian"] = -2
+    o["snes_lag_preconditioner"] = -2
+    # Krylov choice is per-operator (set in the branches below):
+    #  * elliptic φ-Poisson → FGMRES. The UW3 DMPlex-FEM assembly +
+    #    Neumann/nullspace handling does not guarantee an *exactly*
+    #    symmetric operator, and the GAMG **SOR smoother is
+    #    non-symmetric**, so the preconditioner is non-SPD — CG's
+    #    assumptions are violated (it only "worked" here by
+    #    robustness margin). FGMRES tolerates a non-symmetric
+    #    operator *and* a varying/non-symmetric preconditioner.
+    #  * mass systems (Hessian recovery, ∇φ projection) → CG: a
+    #    consistent mass matrix is SPD and the Jacobi PC is
+    #    symmetric, so CG is correct and the cheapest option.
+    # Inner solve inside an outer BFO Picard — it tolerates inexact
+    # inner solves (inexact-Picard); 1e-7 is far tighter than the
+    # Picard increment near convergence (~1e-4) so the fixed point —
+    # hence the grading — is unchanged, at a fraction of the iters a
+    # direct-path-matching 1e-10 would need.
+    o["ksp_rtol"] = 1.0e-7
+    o["ksp_atol"] = 1.0e-12
+    # Not a direct solve: drop the direct-solver keys. Each key gets
+    # its own try so one failed delete cannot skip the rest.
+    for k in ("pc_factor_mat_solver_type", "mat_mumps_icntl_24",
+              "mat_mumps_icntl_25"):
+        try:
+            o.delValue(k)
+        except Exception:
+            pass
+    if elliptic:
+        # P3 pure-Neumann Laplacian: plain agg-GAMG with a weak
+        # Jacobi/Chebyshev smoother needs ~280 iters here. A stronger
+        # SOR smoother with more sweeps + smoothed aggregation cuts
+        # that ~4×; the hierarchy is still built only once per call
+        # (lagged), so the extra setup is amortised over the ~25
+        # reused inner solves. SOR ⇒ non-symmetric PC ⇒ FGMRES.
+        o["ksp_type"] = "fgmres"
+        o["ksp_gmres_restart"] = 100      # > the ~75-iter solve
+        o["pc_type"] = "gamg"
+        o["pc_gamg_type"] = "agg"
+        o["pc_gamg_agg_nsmooths"] = 1
+        o["pc_gamg_threshold"] = 0.02
+        o["mg_levels_ksp_type"] = "richardson"
+        o["mg_levels_pc_type"] = "sor"
+        o["mg_levels_ksp_max_it"] = 4
+        o["mg_coarse_pc_type"] = "lu"
+        o["mg_coarse_pc_factor_mat_solver_type"] = "mumps"
+    else:
+        o["ksp_type"] = "cg"              # consistent mass = SPD
+        o["pc_type"] = "jacobi"           # mass matrix → trivial
+        for k in ("ksp_gmres_restart", "pc_gamg_type",
+                  "pc_gamg_agg_nsmooths", "pc_gamg_threshold",
+                  "mg_levels_ksp_type", "mg_levels_pc_type",
+                  "mg_levels_ksp_max_it", "mg_coarse_pc_type",
+                  "mg_coarse_pc_factor_mat_solver_type"):
+            try:
+                o.delValue(k)
+            except Exception:
+                pass
+
+
+def _patch_volumes(tris, coords, n_verts):
+    """Per-vertex dual-patch area: a node's share (1/3) of every
+    incident triangle's |area|. ρ_cur ∝ 1/patch for the (opt-in,
+    n_outer>1) outer MA composition; at equidistribution
+    ``patch · ρ_tgt`` is uniform. Serial-exact (parallel under-counts
+    at rank-partition boundaries — acceptable for serial validation).
+    """
+    area = np.abs(_signed_areas(coords, tris)) / 3.0
+    patch = np.zeros(n_verts, dtype=np.double)
+    for k in range(3):
+        np.add.at(patch, tris[:, k], area)
+    patch[patch <= 0.0] = patch[patch > 0.0].mean()
+    return patch
+
+
+def _hessian_recovery_class():
+    r"""Lazily build (and memoise) the variationally-consistent
+    Hessian-recovery solver class.
+
+    Recovers ``H_ij ≈ ∂²φ/∂x_i∂x_j`` from an external scalar field
+    ``φ`` by the *weak* (integrated-by-parts) form — the plan's
+    :math:`R_H`: ``∫H_ij τ_ij + ∫(∂φ/∂x_i)(∂τ_ij/∂x_j) = 0`` ⇒
+    ``H_ij = ∂²φ/∂x_i∂x_j``. Only **first** derivatives of ``φ``
+    appear (UW3 forbids second derivatives of mesh-variable
+    functions); the operator is the SPD mass matrix (no nullspace).
+    Defined lazily to avoid an import cycle (meshing→systems/cython).
+    """
+    global _HESSIAN_CLASS
+    if _HESSIAN_CLASS is not None:
+        return _HESSIAN_CLASS
+
+    import sympy
+    from underworld3.cython.generic_solvers import SNES_MultiComponent
+    from underworld3.utilities._api_tools import Template
+
+    class _HessianRecovery(SNES_MultiComponent):
+        def __init__(self, mesh, phi_field, degree=2, verbose=False):
+            self._phi = phi_field
+            super().__init__(
+                mesh, n_components=mesh.cdim * mesh.cdim,
+                degree=degree, verbose=verbose)
+            self._smoothing = sympy.sympify(0)
+            self._constitutive_model = (
+                uw.constitutive_models.Constitutive_Model(
+                    self.Unknowns))
+
+        def _hessian_source(self):
+            cdim = self.mesh.cdim
+            X = self.mesh.CoordinateSystem.X
+            phi = self._phi.sym[0]
+            rows = []
+            for i in range(cdim):
+                for j in range(cdim):
+                    row = [sympy.Integer(0)] * cdim
+                    row[j] = phi.diff(X[i])
+                    rows.append(row)
+            return sympy.Matrix(rows)
+
+        F0 = Template(
+            r"f_0\left(\mathbf{u}\right)",
+            lambda self: self.u.sym,
+            "Hessian-recovery mass term: f_0 = H.")
+
+        F1 = Template(
+            r"\mathbf{F}_1\left(\mathbf{u}\right)",
+            lambda self: self._hessian_source(),
+            "Hessian-recovery weak source: F_1 = e_j ∂φ/∂x_i.")
+
+    _HESSIAN_CLASS = _HessianRecovery
+    return _HESSIAN_CLASS
+
+
+def _winslow_elliptic(mesh, metric, pinned_labels, verbose,
+                      n_outer=1, n_picard=25, relax=1.0,
+                      step_frac=None, picard_relax=0.4,
+                      outer_tol=1.0e-3, boundary_slip=False,
+                      linear_solver="direct", phi_degree=2,
+                      move_anisotropy=None,
+                      target_side_rho=False):
+    r"""Metric-driven mesh equidistribution — Benamou–Froese–Oberman
+    convex-branch Monge–Ampère (PRESERVED; not the default path).
+
+    Solves ``det(I+D²φ)=g``, ``g=c·ρ_cur/ρ_tgt``, by a damped Picard
+    on the convex-branch source
+    ``Δφ = √((φxx−φyy)²+4φxy²+4g) − 2`` (the +√ selects the Brenier
+    branch), with the variationally-consistent recovered Hessian
+    (``_hessian_recovery_class``) and the pure-Neumann
+    ``constant_nullspace`` φ Poisson. ``n_outer>1`` composes maps
+    (recompute ρ_cur from patch volumes each step). Moves nodes by
+    ∇φ with a coherent global signed-area backtrack.
+
+    Efficiency (2026-05-17): the φ Poisson and the SPD Hessian-recovery
+    mass system are *constant operators* within the Picard loop (the
+    mesh is fixed; only the RHS changes). ``_use_direct_solver`` puts
+    both on MUMPS LU with a lagged (compute-once) factorisation, so the
+    inner iterations are back-substitutions — see that function's
+    docstring. ``n_picard`` defaults to 25: the deep/near grading is
+    flat from iter ≈20 (4-dp identical at AMP 8 & 20), so 40 was pure
+    overhead. Net: ~10× faster, grading/quality unchanged to 4 dp.
+
+    ``phi_degree`` defaults to **2** (was 3). The deep/near grading
+    is set by the φ *order*, not the solver: P2 ≡ P3 to ~3 dp across
+    AMP 0/2/8/20 (matches the recorded baseline; AMP=0 no-op exact;
+    no tangle) while P2 halves the cost (smaller matrices — also
+    helps the direct factorisation scale). P1 is **not**
+    grading-equivalent (≈1.40 vs 1.71 at AMP=8 — ~18 % weaker); P2
+    is the floor. ``linear_solver="gamg"`` is an experimental,
+    documented-fragile parallel prototype (P3 was a major GAMG
+    confound; even at P2 GAMG re-solve is erratic — see the design
+    doc); ``"direct"`` (MUMPS, MPI-parallel) is the validated path.
+
+    Grading: redistribution with a fixed node count reaches deep/near
+    ≈1.5–1.8× for an 8–20× density target (the exact OT ~10× needs
+    *more nodes* — a topology change, not this smoother). ``n_outer=1``
+    is the safe default (AMP=0 exact no-op, never tangles). See the
+    project memory + scripts/ma_*.py / ma_cost_grading.py.
+    """
+    import sympy
+
+    pinned_labels = tuple(pinned_labels)
+    dm = mesh.dm
+    pStart, pEnd = dm.getDepthStratum(0)
+    cStart, cEnd = dm.getHeightStratum(0)
+    cone_size = dm.getConeSize(cStart) if cEnd > cStart else 0
+    if linear_solver not in ("direct", "gamg"):
+        raise ValueError(
+            f"linear_solver must be 'direct' or 'gamg', "
+            f"got {linear_solver!r}")
+    phi_degree = int(phi_degree)
+    aux_degree = max(1, phi_degree - 1)   # ∇φ / recovered-Hessian
+    key = (id(mesh), pinned_labels,
+           pEnd - pStart, cEnd - cStart, cone_size,
+           linear_solver, phi_degree)
+
+    cdim = mesh.cdim
+
+    cache = _WINSLOW_CACHE.get(key)
+    if cache is None:
+        if linear_solver == "gamg":
+            def _wire(s, singular=False, elliptic=True):
+                _use_iterative_solver(s, singular, elliptic)
+        else:
+            def _wire(s, singular=False, elliptic=True):
+                _use_direct_solver(s, singular)
+        phi = uw.discretisation.MeshVariable(
+            f"winslow_phi_{id(mesh)}", mesh,
+            vtype=uw.VarType.SCALAR, degree=phi_degree,
+            continuous=True)
+        ps = uw.systems.Poisson(mesh, phi)
+        ps.constitutive_model = uw.constitutive_models.DiffusionModel
+        ps.constitutive_model.Parameters.diffusivity = 1.0
+        ps.constant_nullspace = True
+        _wire(ps, singular=True, elliptic=True)
+        hsolver = _hessian_recovery_class()(
+            mesh, phi, degree=aux_degree, verbose=False)
+        hsolver.tolerance = 1.0e-6
+        _wire(hsolver, elliptic=False)
+        vol_field = uw.discretisation.MeshVariable(
+            f"winslow_vol_{id(mesh)}", mesh,
+            vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+        gradphi = uw.discretisation.MeshVariable(
+            f"winslow_gphi_{id(mesh)}", mesh,
+            vtype=uw.VarType.VECTOR, degree=aux_degree,
+            continuous=True)
+        gproj = uw.systems.Vector_Projection(mesh, gradphi)
+        gproj.smoothing = 0.0
+        _wire(gproj, elliptic=False)
+        _WINSLOW_CACHE[key] = (
+            phi, ps, gradphi, gproj, hsolver, vol_field)
+    else:
+        phi, ps, gradphi, gproj, hsolver, vol_field = cache
+
+    X = mesh.CoordinateSystem.X
+    grad_phi = sympy.Matrix(
+        [phi.sym[0].diff(X[i]) for i in range(cdim)]).T
+    Hf = hsolver.u.sym
+    Hmat = sympy.Matrix(cdim, cdim,
+                        lambda i, j: Hf[i * cdim + j])
+    gproj.uw_function = grad_phi
+    omega = float(picard_relax)
+
+    for outer in range(n_outer):
+        dm = mesh.dm
+        is_bnd = _pinned_mask(dm, pinned_labels)
+        tris = _tri_cells(dm)
+        pStart, pEnd = dm.getDepthStratum(0)
+        n_verts = pEnd - pStart
+        old_coords = np.asarray(mesh.X.coords).copy()
+        _cdim = mesh.cdim
+
+        # Boundary tangential slip (same per-ring radius projection
+        # as the spring). MA's natural Neumann BC (∇φ·n̂=0) already
+        # makes ∇φ tangential at the boundary, so letting boundary
+        # nodes move by ∇φ then snapping back to their ring radius
+        # is the redistribution the formulation naturally wants —
+        # fully pinning them discards it. Nodes provably stay on
+        # the surface (radial DOF removed; drift ~machine ε). One
+        # node/ring anchors the rotation gauge.
+        _slip_mode = boundary_slip
+        if isinstance(_slip_mode, str):
+            _slip_mode = _slip_mode.lower()
+            if _slip_mode not in ("ring", "box", "axes", "axis"):
+                raise ValueError(
+                    "boundary_slip must be False/True/'ring'/'box'/"
+                    f"'axes'/'axis', got {boundary_slip!r}")
+            if _slip_mode in ("axes", "axis"):
+                _slip_mode = "box"
+        elif _slip_mode is True:
+            _slip_mode = "ring"
+        if _slip_mode and is_bnd.any():
+            bc = np.nonzero(is_bnd)[0]
+            if _slip_mode == "ring":
+                c0 = old_coords[bc].mean(axis=0)
+                rg = np.round(
+                    np.linalg.norm(old_coords[bc] - c0, axis=1),
+                    6)
+                is_anchor = np.zeros(n_verts, dtype=bool)
+                slip_center = np.zeros((n_verts, _cdim))
+                slip_rtarget = np.zeros(n_verts)
+                for rv in np.unique(rg):
+                    grp = bc[rg == rv]
+                    rc = old_coords[grp].mean(axis=0)
+                    is_anchor[grp[np.argmax(
+                        (old_coords[grp] - rc)[:, 0])]] = True
+                    slip_center[grp] = rc
+                    slip_rtarget[grp] = np.linalg.norm(
+                        old_coords[grp] - rc, axis=1)
+                is_slip = is_bnd & ~is_anchor
+                is_pinned = is_anchor
+                _sidx = np.nonzero(is_slip)[0]
+                _sctr = slip_center[_sidx]
+                _srad = slip_rtarget[_sidx]
+
+                def _project(Y):
+                    v = Y[_sidx] - _sctr
+                    nrm = np.linalg.norm(v, axis=1)
+                    nrm = np.where(nrm > 1.0e-30, nrm, 1.0)
+                    Y[_sidx] = _sctr + v * (_srad / nrm)[:, None]
+                    return Y
+            else:  # "box" — axis-aligned edge slip
+                # Pin corners (on 2 box edges); allow other
+                # boundary nodes to slide along their single
+                # edge. Detect edges from boundary coord extents.
+                bc_coords = old_coords[bc]
+                xmin = bc_coords[:, 0].min()
+                xmax = bc_coords[:, 0].max()
+                ymin = bc_coords[:, 1].min()
+                ymax = bc_coords[:, 1].max()
+                if uw.mpi.size > 1:
+                    from mpi4py import MPI as _MPI
+                    xmin = uw.mpi.comm.allreduce(
+                        float(xmin), op=_MPI.MIN)
+                    xmax = uw.mpi.comm.allreduce(
+                        float(xmax), op=_MPI.MAX)
+                    ymin = uw.mpi.comm.allreduce(
+                        float(ymin), op=_MPI.MIN)
+                    ymax = uw.mpi.comm.allreduce(
+                        float(ymax), op=_MPI.MAX)
+                tol = 1.0e-9 * max(xmax - xmin, ymax - ymin, 1.0)
+                on_xmin = np.abs(bc_coords[:, 0] - xmin) < tol
+                on_xmax = np.abs(bc_coords[:, 0] - xmax) < tol
+                on_ymin = np.abs(bc_coords[:, 1] - ymin) < tol
+                on_ymax = np.abs(bc_coords[:, 1] - ymax) < tol
+                on_x_edge = on_xmin | on_xmax
+                on_y_edge = on_ymin | on_ymax
+                is_corner_loc = on_x_edge & on_y_edge
+                is_anchor = np.zeros(n_verts, dtype=bool)
+                is_anchor[bc[is_corner_loc]] = True
+                is_slip = is_bnd & ~is_anchor
+                is_pinned = is_anchor
+                # For each slip node, record which axis is fixed
+                # and the target value on that axis.
+                fixed_axis = np.full(n_verts, -1, dtype=np.int8)
+                fixed_val = np.zeros(n_verts)
+                xfix = on_x_edge & ~is_corner_loc
+                yfix = on_y_edge & ~is_corner_loc
+                fixed_axis[bc[xfix]] = 0
+                fixed_val[bc[xfix]] = bc_coords[xfix, 0]
+                fixed_axis[bc[yfix]] = 1
+                fixed_val[bc[yfix]] = bc_coords[yfix, 1]
+                _sidx = np.nonzero(is_slip)[0]
+                _sax = fixed_axis[_sidx]
+                _sval = fixed_val[_sidx]
+                _ix0 = _sidx[_sax == 0]
+                _ix1 = _sidx[_sax == 1]
+                _v0 = _sval[_sax == 0]
+                _v1 = _sval[_sax == 1]
+
+                def _project(Y):
+                    Y[_ix0, 0] = _v0
+                    Y[_ix1, 1] = _v1
+                    return Y
+        else:
+            is_pinned = is_bnd
+
+            def _project(Y):
+                return Y
+
+        if tris is not None and n_outer > 1:
+            patch = _patch_volumes(tris, old_coords, n_verts)
+            patch /= float(np.mean(patch))
+        else:
+            patch = np.ones(n_verts, dtype=np.double)
+        _va = vol_field.array
+        _va[...] = patch.reshape(_va.shape)
+
+        rho_t = np.asarray(
+            uw.function.evaluate(metric, old_coords)).reshape(-1)
+        b = rho_t * patch
+        inv_sqrt_b_mean = float(np.mean(1.0 / np.sqrt(b)))
+        if uw.mpi.size > 1:
+            inv_sqrt_b_mean = uw.mpi.comm.allreduce(
+                inv_sqrt_b_mean) / uw.mpi.size
+        c = 1.0 / (inv_sqrt_b_mean ** 2)
+
+        # Target-side ρ evaluation: substitute X[i] → X[i] +
+        # gradphi.sym[i] so ρ is queried at the moving target
+        # x + ∇φ(x), not the source x. Removes the phase error
+        # where refinement-by-size is transported away from the
+        # feature location by ∇φ. gradphi.sym values are updated
+        # each Picard iter (gproj.solve below) so the source self-
+        # consistently tracks the current map estimate.
+        if target_side_rho:
+            metric_target = metric.subs(
+                [(X[i], X[i] + gradphi.sym[i])
+                 for i in range(cdim)])
+        else:
+            metric_target = metric
+        g = c / (metric_target * vol_field.sym[0])
+        if cdim == 2:
+            Hxx = Hf[0]
+            Hxy = (Hf[1] + Hf[2]) / 2
+            Hyy = Hf[3]
+            f_src = sympy.sqrt(
+                (Hxx - Hyy) ** 2 + 4 * Hxy ** 2 + 4 * g) - 2
+        else:
+            f_src = (g - 1.0) - Hmat.det()
+        ps.f = sympy.Matrix([[_EQUIDIST_SIGN * f_src]])
+
+        hsolver.u.array[...] = 0.0
+
+        # The GAMG path warm-starts the Krylov solve from the previous
+        # Picard φ (it changes slowly under ω-relaxation) → fewer
+        # FGMRES iters on the once-built hierarchy. The exact direct
+        # path is indifferent to the initial guess.
+        _zig = (linear_solver != "gamg")
+        prev_change = None
+        # If target-side ρ is on, gradphi needs to be tracking the
+        # current φ inside the Picard loop (it's used by ps.f via
+        # the X→X+gradphi substitution). Initialise to zero so the
+        # first ps.solve sees ρ at source (= identity map estimate).
+        if target_side_rho:
+            gradphi.array[...] = 0.0
+        for it in range(n_picard):
+            phi_prev = np.asarray(phi.array).copy()
+            ps.solve(zero_init_guess=_zig)
+            phi.array[...] = ((1.0 - omega) * phi_prev
+                              + omega * np.asarray(phi.array))
+            hsolver.solve()
+            if target_side_rho:
+                gproj.solve()   # update target-side ρ for next iter
+            change = float(np.abs(
+                np.asarray(phi.array) - phi_prev).max())
+            if uw.mpi.size > 1:
+                from mpi4py import MPI as _MPI
+                change = uw.mpi.comm.allreduce(
+                    change, op=_MPI.MAX)
+            if prev_change is not None and change < 1.0e-6:
+                break
+            prev_change = change
+
+        if not target_side_rho:
+            gproj.solve()
+        disp = np.asarray(
+            uw.function.evaluate(gradphi.sym, old_coords)
+        ).reshape(old_coords.shape)
+
+        # Directional move-weighting (approach (2), opt-in): the
+        # annulus node budget is anisotropic — radial is scarce and
+        # pinned, tangential is abundant and free ("spare" angular
+        # nodes). A scalar equidistribution is isotropic and cannot
+        # express "prefer tangential"; here we rescale the realised
+        # displacement in the local radial/tangential frame
+        # (move_anisotropy=(w_r, w_θ)) so the same metric is met
+        # mostly by sliding nodes around rather than crushing
+        # radially. This is the BFO-consistent lightweight version
+        # (the φ-Poisson operator / BFO branch algebra is untouched
+        # — only the move is reweighted). Centre = mesh centroid
+        # (origin for a centred annulus). Default None ⇒ unchanged.
+        if move_anisotropy is not None and cdim == 2:
+            w_r, w_t = (float(move_anisotropy[0]),
+                        float(move_anisotropy[1]))
+            ctr = old_coords.mean(axis=0)
+            rv = old_coords - ctr
+            rn = np.linalg.norm(rv, axis=1)
+            ok = rn > 1.0e-30
+            rhat = np.zeros_like(rv)
+            rhat[ok] = rv[ok] / rn[ok, None]
+            that = np.stack([-rhat[:, 1], rhat[:, 0]], axis=1)
+            d_r = (disp * rhat).sum(axis=1)
+            d_t = (disp * that).sum(axis=1)
+            disp = (w_r * d_r[:, None] * rhat
+                    + w_t * d_t[:, None] * that)
+
+        step = relax * disp
+        if step_frac is not None and np.isfinite(step_frac):
+            h = _min_incident_edge(dm, old_coords)
+            mag = np.linalg.norm(step, axis=1)
+            cap = step_frac * h
+            clip = np.isfinite(cap) & (mag > cap) & (mag > 0.0)
+            sc = np.ones_like(mag)
+            sc[clip] = cap[clip] / mag[clip]
+            step = step * sc[:, None]
+
+        free = ~is_pinned
+        scale = 1.0
+        new_coords = old_coords.copy()
+        if tris is not None:
+            a0 = _signed_areas(old_coords, tris)
+            orient = np.sign(np.median(a0)) or 1.0
+            for _bt in range(10):
+                trial = old_coords.copy()
+                trial[free] += scale * step[free]
+                trial = _project(trial)      # slip → ring (∥ only)
+                a1min = float(
+                    (_signed_areas(trial, tris) * orient).min())
+                if uw.mpi.size > 1:
+                    from mpi4py import MPI as _MPI
+                    a1min = uw.mpi.comm.allreduce(
+                        a1min, op=_MPI.MIN)
+                if a1min > 0.0:
+                    new_coords = trial
+                    break
+                scale *= 0.5
+            else:
+                scale = 0.0
+                new_coords = old_coords.copy()
+        else:
+            new_coords[free] += step[free]
+            new_coords = _project(new_coords)
+
+        mesh._deform_mesh(new_coords)
+
+        d = float(np.linalg.norm(
+            new_coords - old_coords, axis=1).max())
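+        if uw.mpi.size > 1:
+            # d is a per-rank max |Δx|, so reduce with MAX (matches the
+            # "max|Δx|" label below and the outer_tol test).
+            from mpi4py import MPI as _MPI
+            d = uw.mpi.comm.allreduce(d, op=_MPI.MAX)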
+        if verbose:
+            uw.pprint(
+                f"  equidistribute MA outer {outer+1}/{n_outer}: "
+                f"c={c:.4f}  scale={scale:.3f}  max|Δx|={d:.3e}")
+        if d < outer_tol:
+            break
+
+
+def _winslow_equidistribute(mesh, metric, pinned_labels, verbose,
+                             n_outer=1, relax=1.0,
+                             step_frac=0.3,
+                             outer_tol=1.0e-4,
+                             boundary_slip=False,
+                             linear_solver="direct", phi_degree=2):
+    r"""OT-improvement step: one (or a few) weighted-Poisson
+    equidistribution flow iterations.
+
+    Solves on the *current* mesh
+
+    .. math::
+
+        \nabla\!\cdot(\rho\,\nabla\phi)
+            \;=\;-\,\rho\,\log\!\bigl(V\rho/K\bigr),
+        \quad K=\exp(\langle\rho\log(V\rho)\rangle/\langle\rho\rangle),
+        \quad \nabla\phi\cdot\hat{n}=0,
+
+    and moves nodes by ``relax · ∇φ``. ``V_i`` is the dual patch
+    area at vertex ``i``; the source vanishes identically at
+    equidistribution :math:`V_i\,\rho_i\equiv K`.
+
+    Semantics: this is a *single OT improvement step* w.r.t. the
+    current mesh — the input mesh has no special status (it is
+    whatever you currently have). Calling it again from the
+    deformed mesh applies another improvement step. Compose
+    freely with spring / smoothing / anisotropic.
+
+    Differences from ``_winslow_elliptic`` (the convex-branch
+    BFO Picard):
+
+    * Linear: one weighted-Poisson per outer iter, no inner
+      Picard, no Hessian recovery, no convex-branch radical.
+    * The source uses the *current* mesh's patch volumes; the
+      formulation is identically zero at equidistribution, so
+      iterations are self-stabilising (no over-correction).
+    * ρ at the current node positions (no source-vs-target
+      asymmetry; the iteration is on the current mesh, ρ is at
+      its physical position).
+
+    Parameters mirror ``_winslow_elliptic`` where they apply.
+    ``n_outer`` composes outer improvement steps; the source
+    drives toward zero so the per-iter motion naturally
+    diminishes.
+    """
+    import sympy
+
+    pinned_labels = tuple(pinned_labels)
+    dm = mesh.dm
+    pStart, pEnd = dm.getDepthStratum(0)
+    cStart, cEnd = dm.getHeightStratum(0)
+    cone_size = dm.getConeSize(cStart) if cEnd > cStart else 0
+    if linear_solver not in ("direct", "gamg"):
+        raise ValueError(
+            f"linear_solver must be 'direct' or 'gamg', "
+            f"got {linear_solver!r}")
+    phi_degree = int(phi_degree)
+    aux_degree = max(1, phi_degree - 1)
+    cdim = mesh.cdim
+    if cdim != 2:
+        raise NotImplementedError(
+            "_winslow_equidistribute: 2D meshes only for now.")
+
+    # Boundary slip uses the projected boundary-normal field
+    # (mesh.Gamma_P1). This is reliable only for *radial* coordinate
+    # systems (cylindrical / spherical / geographic), where mesh.Gamma is
+    # the coordinate-derived radial field and evaluates cleanly at vertices.
+    # For Cartesian boundaries the vertex-evaluated facet normal is
+    # degenerate (0/0), so we pin the boundary instead of slipping with a
+    # garbage normal. 'ring'/'box'/'axes' are legacy aliases for slip-on.
+    from underworld3.meshing._ot_adapt import _is_radial_coords as _isr
+    if isinstance(boundary_slip, str):
+        _slip_req = boundary_slip.strip().lower() in (
+            "ring", "box", "axes", "axis", "true", "on", "1")
+    else:
+        _slip_req = bool(boundary_slip)
+    _slip_on = _slip_req and _isr(mesh)
+    if _slip_on:
+        # Create / refresh the projected normals ONCE here, before the OT
+        # Poisson solver's DM is built — creating the _n_proj MeshVariable
+        # mid-mover would stale that DM handle (project_uw3_smoother_footguns).
+        try:
+            mesh._update_projected_normals()
+        except Exception:
+            _slip_on = False
+
+    key = (id(mesh), pinned_labels,
+           pEnd - pStart, cEnd - cStart, cone_size,
+           linear_solver, phi_degree)
+
+    cache = _OT_CACHE.get(key)
+    if cache is None:
+        if linear_solver == "gamg":
+            def _wire(s, singular=False, elliptic=True):
+                _use_iterative_solver(s, singular, elliptic)
+        else:
+            def _wire(s, singular=False, elliptic=True):
+                _use_direct_solver(s, singular)
+        phi = uw.discretisation.MeshVariable(
+            f"ot_phi_{id(mesh)}", mesh,
+            vtype=uw.VarType.SCALAR, degree=phi_degree,
+            continuous=True)
+        ps = uw.systems.Poisson(mesh, phi)
+        ps.constitutive_model = uw.constitutive_models.DiffusionModel
+        # weighted diffusion: D(x) = ρ(x). Updated each outer iter
+        # via the symbolic metric expression (evaluated at the
+        # current mesh's quad pts).
+        ps.constitutive_model.Parameters.diffusivity = metric
+        ps.constant_nullspace = True
+        _wire(ps, singular=True, elliptic=True)
+        vol_field = uw.discretisation.MeshVariable(
+            f"ot_vol_{id(mesh)}", mesh,
+            vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+        gradphi = uw.discretisation.MeshVariable(
+            f"ot_gphi_{id(mesh)}", mesh,
+            vtype=uw.VarType.VECTOR, degree=aux_degree,
+            continuous=True)
+        gproj = uw.systems.Vector_Projection(mesh, gradphi)
+        gproj.smoothing = 0.0
+        _wire(gproj, elliptic=False)
+        X = mesh.CoordinateSystem.X
+        gradphi_sym = sympy.Matrix(
+            [phi.sym[0].diff(X[i]) for i in range(cdim)]).T
+        gproj.uw_function = gradphi_sym
+        _OT_CACHE[key] = (phi, ps, gradphi, gproj, vol_field)
+    else:
+        phi, ps, gradphi, gproj, vol_field = cache
+
+    _zig = (linear_solver != "gamg")
+
+    for outer in range(n_outer):
+        dm = mesh.dm
+        is_bnd = _pinned_mask(dm, pinned_labels)
+        tris = _tri_cells(dm)
+        pStart, pEnd = dm.getDepthStratum(0)
+        n_verts = pEnd - pStart
+        old_coords = np.asarray(mesh.X.coords).copy()
+        _cdim = mesh.cdim
+
+        # --- boundary slip via projected normals (mesh.Gamma_P1) ------
+        # Unified slip (replaces the old box/ring special cases), active
+        # only when ``_slip_on`` was set above, i.e. for radial coordinate
+        # systems; Cartesian boundaries are pinned instead. Boundary nodes
+        # slide tangentially (we zero the projected-normal component of
+        # their displacement) and then snap back to their reference |r|
+        # so they stay on the surface. The normal comes from
+        # mesh.Gamma_P1 (the symbolic mesh.Gamma projected to a P1 field),
+        # the same source used for free surfaces. Nodes with a degenerate
+        # projected normal (e.g. an occasional unlocatable vertex) are
+        # pinned rather than slipped. `boundary_slip` is a bool; legacy
+        # 'ring'/'box'/'axes' strings are accepted as aliases for slip-on.
+        from underworld3.meshing._ot_adapt import (
+            _slip_normals, _boundary_centre, _is_radial_coords)
+
+        if _slip_on and is_bnd.any():
+            bidx = np.nonzero(is_bnd)[0]
+            bcoords = old_coords[bidx]
+            n_hat, valid = _slip_normals(mesh, bcoords)
+            slip_b = bidx[valid]
+            is_pinned = np.zeros(n_verts, dtype=bool)
+            is_pinned[bidx[~valid]] = True   # degenerate-normal nodes pinned
+            _n_slip = n_hat[valid]
+            _old_slip = old_coords[slip_b]
+            _radial = _is_radial_coords(mesh)
+            if _radial:
+                _centre = _boundary_centre(mesh, bcoords)
+                _r_target = np.linalg.norm(_old_slip - _centre, axis=1)
+
+            def _project(Y):
+                # tangential slide: remove the normal component of the
+                # boundary-node displacement
+                disp = Y[slip_b] - _old_slip
+                dn = (disp * _n_slip).sum(axis=1, keepdims=True)
+                Y[slip_b] = _old_slip + (disp - dn * _n_slip)
+                # snap curved boundaries back onto the surface (fixed |r|)
+                if _radial:
+                    v = Y[slip_b] - _centre
+                    nrm = np.linalg.norm(v, axis=1)
+                    nrm = np.where(nrm > 1.0e-30, nrm, 1.0)
+                    Y[slip_b] = _centre + v * (_r_target / nrm)[:, None]
+                return Y
+        else:
+            is_pinned = is_bnd
+
+            def _project(Y):
+                return Y
+
+        # --- compute V (patch volumes) on current mesh ---------
+        if tris is None:
+            patch = np.ones(n_verts, dtype=np.double)
+        else:
+            patch = _patch_volumes(tris, old_coords, n_verts)
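+        # Normalise by the mean patch volume so V has unit mean. In
+        # parallel this averages the per-rank means, which is exact only
+        # when every rank holds the same number of vertices.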
+        patch_mean = float(np.mean(patch))
+        if uw.mpi.size > 1:
+            patch_mean = uw.mpi.comm.allreduce(patch_mean) / uw.mpi.size
+        # Write current V values into the MeshVariable.
+        _va = vol_field.array
+        _va[...] = (patch / max(patch_mean, 1e-30)).reshape(_va.shape)
+
+        # --- compute K = exp(<ρ log(Vρ)> / <ρ>) ----------------
+        rho_at_y = np.asarray(uw.function.evaluate(
+            metric, old_coords)).reshape(-1)
+        Vrho = (patch / max(patch_mean, 1e-30)) * rho_at_y
+        # weighted geometric mean (zero-mean Neumann compat
+        # condition) — guard against Vrho≤0:
+        Vrho_pos = np.clip(Vrho, 1e-30, None)
+        wnum = float(np.sum(rho_at_y * np.log(Vrho_pos)))
+        wden = float(np.sum(rho_at_y))
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            wnum = uw.mpi.comm.allreduce(wnum, op=_MPI.SUM)
+            wden = uw.mpi.comm.allreduce(wden, op=_MPI.SUM)
+        ln_K = wnum / max(wden, 1e-30)
+        K_val = float(np.exp(ln_K))
+
+        # --- source: ps.f = ρ · log(V·ρ / K) -------------------
+        # SNES_Poisson convention: F0 = -f, i.e. the strong form is
+        # ∇·(D∇u) = -ps.f. We want ∇·(ρ∇φ) = -ρ·log(V·ρ/K), hence
+        # ps.f = +ρ·log(V·ρ/K).
+        f_src = metric * sympy.log(
+            metric * vol_field.sym[0] / sympy.Float(K_val))
+        ps.f = sympy.Matrix([[f_src]])
+
+        # --- solve weighted Poisson ----------------------------
+        ps.solve(zero_init_guess=_zig)
+        gproj.solve()
+        disp = np.asarray(uw.function.evaluate(
+            gradphi.sym, old_coords)
+        ).reshape(old_coords.shape)
+
+        step = float(relax) * disp
+
+        # Per-vertex displacement cap: |step_i| ≤ step_frac · h_i,
+        # where h_i is the shortest edge incident on vertex i.
+        # This prevents the OT step from creating LOCAL cell folds
+        # near features (where the source is sharp) without killing
+        # the global motion (the way the global signed-area
+        # backtrack does).
+        if step_frac is not None and np.isfinite(step_frac):
+            h = _min_incident_edge(dm, old_coords)
+            mag = np.linalg.norm(step, axis=1)
+            cap = float(step_frac) * h
+            clip = np.isfinite(cap) & (mag > cap) & (mag > 0.0)
+            sc = np.ones_like(mag)
+            sc[clip] = cap[clip] / mag[clip]
+            step = step * sc[:, None]
+
+        # --- coherent global signed-area backtrack -------------
+        free = ~is_pinned
+        scale = 1.0
+        new_coords = old_coords.copy()
+        if tris is not None:
+            a0 = _signed_areas(old_coords, tris)
+            orient = np.sign(np.median(a0)) or 1.0
+            for _bt in range(10):
+                trial = old_coords.copy()
+                trial[free] += scale * step[free]
+                trial = _project(trial)
+                a1min = float(
+                    (_signed_areas(trial, tris) * orient).min())
+                if uw.mpi.size > 1:
+                    from mpi4py import MPI as _MPI
+                    a1min = uw.mpi.comm.allreduce(
+                        a1min, op=_MPI.MIN)
+                if a1min > 0.0:
+                    new_coords = trial
+                    break
+                scale *= 0.5
+            else:
+                scale = 0.0
+                new_coords = old_coords.copy()
+        else:
+            new_coords[free] += step[free]
+            new_coords = _project(new_coords)
+
+        mesh._deform_mesh(new_coords)
+
+        d = float(np.linalg.norm(
+            new_coords - old_coords, axis=1).max())
+        if uw.mpi.size > 1:
+            # global maximum displacement, matching the max|Δx| diagnostic
+            from mpi4py import MPI as _MPI
+            d = uw.mpi.comm.allreduce(d, op=_MPI.MAX)
+
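+        # Per-iter "imbalance" diagnostic: std of log(V·ρ/K) over all
+        # vertices on all ranks, from globally reduced moments rather
+        # than a combination of per-rank stds.
+        _dev = np.log(Vrho_pos) - ln_K
+        _s1 = float(_dev.sum())
+        _s2 = float((_dev ** 2).sum())
+        _cnt = int(_dev.size)
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            _s1 = uw.mpi.comm.allreduce(_s1, op=_MPI.SUM)
+            _s2 = uw.mpi.comm.allreduce(_s2, op=_MPI.SUM)
+            _cnt = uw.mpi.comm.allreduce(_cnt, op=_MPI.SUM)
+        _m = _s1 / max(_cnt, 1)
+        imb = float(np.sqrt(max(_s2 / max(_cnt, 1) - _m * _m, 0.0)))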
+
+        if verbose:
+            uw.pprint(
+                f"  OT-improve outer {outer+1}/{n_outer}: "
+                f"K={K_val:.4f}  imb={imb:.3e}  "
+                f"scale={scale:.3f}  max|Δx|={d:.3e}")
+        if d < outer_tol:
+            break
+
+
+def _winslow_anisotropic(mesh, metric, pinned_labels, verbose,
+                         n_outer=12, relax=0.2, beta=200.0,
+                         resolution_ratio=1.0,
+                         geom_mean_smoothing=0.25,
+                         aniso_to_base=False,
+                         aniso_cap=2.0, coarsen_cap=1.0,
+                         boundary_slip=False,
+                         linear_solver="direct", phi_degree=2,
+                         move_anisotropy=None, metric_role="M",
+                         outer_tol=1.0e-4,
+                         rest_size_cap_max=None,
+                         rest_size_cap_min=None,
+                         rest_spring_K=1.0,
+                         h0_override=None,
+                         rest_coords_override=None,
+                         metric_refresh_per_iter=False):
+    r"""Anisotropic metric-tensor mesh redistribution — approach (3).
+
+    The settled scalar equidistribution paths (``_winslow_spring``,
+    ``_winslow_elliptic``) cannot do coherent *anisotropic* bulk
+    transport on a fixed topology — a scalar potential is isotropic,
+    so an annulus radial feature over-collapses one pinned-boundary
+    sliver layer while the tangential edges sit frozen (see the
+    project memory + the design doc's angular-OT section). This is
+    the **tensor** mover: it solves the M-weighted Laplace smooth of
+    the coordinate map with an *anisotropic* metric tensor, so cells
+    are reshaped (short across the feature, long along it) and the
+    slivers / wasted isotropic resolution are removed.
+
+    Construction (verified — ``scripts/ma_metric_tensor_viz.py``):
+    from a scalar density ``ρ`` (typically Lagrangian
+    ``f(r0.sym)``), the *projected* gradient ``∇ρ`` (a first
+    derivative only — UW3-clean) builds, per node,
+
+    .. math::
+
+        M \;=\; \tfrac1{h_0^2}\!\left[\,I
+              + \beta\,\hat g\hat g^{\mathsf T}
+                (|\nabla\rho|/|\nabla\rho|_{\mathrm{ref}})^2\right],
+
+    eigen-clamped so every eigenvalue lies in ``[1/h0², aniso_cap/h0²]``
+    in the default refine-only regime (default ``aniso_cap=2``). The
+    eigenframe **auto-aligns to the feature** from the Cartesian
+    ``∇ρ`` alone; no ``(r,θ)`` frame is specified.
+
+    Mover: solve, per physical coordinate component ``c``, the
+    displacement form of the M-weighted Laplace (Winslow) map
+
+    .. math::
+
+        \nabla\!\cdot(D\,\nabla u_c) \;=\;
+            -\,\nabla\!\cdot(D\,e_c)
+          \;=\; -\textstyle\sum_j \partial_j D_{jc},
+        \qquad u_c = 0 \text{ on the pinned boundary},
+
+    with ``D = M`` (the eigen-clamped metric). Then
+    ``ψ_c = x_c + u_c`` is exactly the M-harmonic coordinate map
+    ``∇·(D∇ψ_c)=0``, ``ψ=x`` on the boundary; the direct Winslow
+    smoother clusters nodes where ``D`` is large (fine spacing), so
+    ``D = M`` grades the mesh toward the metric. The two components
+    share the **same** tensor operator (``_c = D``, the
+    ``_CofDiff``-style ``DiffusionModel`` pattern) and the
+    factor-once-reuse direct solver. **Linear** — one solve per
+    component per outer step, no Picard (much cheaper than the BFO
+    ``_winslow_elliptic``). Homogeneous Dirichlet ``u=0`` on the
+    pinned boundary makes the per-component operator non-singular —
+    no ``constant_nullspace``, side-stepping the GAMG-pure-Neumann
+    fragility entirely (``boundary_slip=True`` falls back to the
+    pure-Neumann + ring-projection treatment of
+    ``_winslow_elliptic``). ``n_outer`` composes damped steps of the
+    map with ``D`` built once on the entry mesh and held fixed;
+    ``metric_refresh_per_iter=True`` instead re-projects ``∇ρ`` and
+    rebuilds ``D`` on the moved mesh at each outer step after the
+    first. Reuses ``_winslow_elliptic``'s coherent global
+    signed-area backtrack, ``boundary_slip`` and ``move_anisotropy``.
+
+    .. warning::
+
+       (3) improves cell **alignment / quality** and removes the
+       slivers + wasted isotropic resolution; it does **not** beat
+       the fixed node-count grading cap (≈1.5–1.8× for an 8–20×
+       density target — that needs ``mesh.adapt``, a topology
+       change). For a *separable* feature the explicit 1-D OT
+       (``scripts/ma_analytic_check.py`` /
+       ``ma_angular_ot_target.py``) is exact and strictly cheaper;
+       (3) earns its keep on the general **non-separable** case.
+       Validate with anisotropy-aware diagnostics
+       (radial/tangential edge split + minA/meanA, *not* the
+       anisotropy-blind d/n).
+
+    Parameters mirror ``_winslow_elliptic`` where shared.
+
+    The **decoupled direct** Winslow form (each physical coordinate
+    M-harmonic, independently) has no Rado–Kneser–Choquet
+    non-folding guarantee, so its stable regime is bounded by the
+    metric anisotropy/contrast. Empirically (interior radial
+    feature, the validation arc) there is a clean Pareto frontier:
+
+    * ``aniso_cap=2``, ``relax≈0.1–0.2`` → minA/meanA ≈ 0.5 (a
+      near-pristine, valid, feature-aligned mesh — cleaner than the
+      isotropic MA ≈0.18 / spring ≈0.25 which sliver), modest 2:1
+      cell alignment. **The robust default.**
+    * higher ``aniso_cap`` is only stable with a *gentler* ``relax``
+      + more ``n_outer`` (cap 4 needs relax ≈0.05, n_outer ≳25 →
+      minA ≈0.35, sharper alignment). ``aniso_cap ≳ 6`` folds the
+      decoupled map regardless — it would need the coupled / inverse
+      Winslow (the heavy MMPDE, out of this prototype's scope).
+
+    **Single-knob model (`resolution_ratio` R).** The gradient-only
+    metric ``M ⪰ base·I`` is *refine-only* (keeps only ``∇ρ``,
+    discards ρ's magnitude ⇒ flat cells pinned at ``h0``, cannot
+    release nodes, the steepest feature scavenges the budget). The
+    fix makes the isotropic density a genuinely **equidistributed**
+    field ``s = base·ρ/G`` (``G`` = geometric mean of ρ on the
+    near-uniform undeformed D mesh ⇒ ``⟨ln s⟩=ln base``, node budget
+    centred). Refine (``s>base``) and coarsen (``s<base``) are then
+    **complementary by the conservation law itself** — there is no
+    coarsening parameter. ``R`` only sets the safety eigen-clamp
+    ``[base/R², base·R²]`` (cells ∈ ``[h0/R, h0·R]``); M-harmonic
+    scale-invariance makes the normalisation constant irrelevant, so
+    the geometric-mean centring merely places the band symmetrically
+    around the bulk. ``R=1`` ⇒ exact refine-only no-op (every prior
+    result bit-preserved); ``R≈2`` is the validated production
+    point. The legacy two-knob ``aniso_cap``/``coarsen_cap`` clamp
+    is retained only as a bit-for-bit expert override when ``R≤1``.
+    ``G`` is recomputed from the *instantaneous* field every
+    adaptation event; in a violent transient that sloshes the whole
+    ``ρ/G`` distribution across the fixed clamp band → mass
+    clamp-saturation → a visible mesh "wobble".
+    ``geom_mean_smoothing`` (``a``, default 0.25) low-passes ``ln
+    G`` across events (``lnG←a·lnG_now+(1−a)·lnG_prev``; ``a=1`` ⇒
+    off/instantaneous, ``a≈0.25`` ⇒ strongly damped) so the band
+    stays centred — smoothing **only the global intensity scalar**
+    (the spatial ρ pattern still tracks the field every event, so
+    the user-facing API stays single-knob; one scalar is carried in
+    ``_GEMA_STATE`` across events). ``relax`` (default 0.2)
+    under-relaxes the per-step displacement;
+    ``n_outer`` (default 12) composes the damped steps toward the
+    fixed-D M-harmonic map. ``beta`` (default 200) sets how fast the
+    metric saturates the ``aniso_cap`` eigen-clamp (the clamp, not
+    ``beta``, is the binding anisotropy lever). ``metric_role``
+    (``"M"`` default, or ``"Minv"``) is an experimental knob — the
+    overall scale of ``D`` is irrelevant to ``∇·(D∇u)=src`` (both
+    sides scale together); only the anisotropy + spatial variation
+    matter.
+    """
+    import sympy
+
+    pinned_labels = tuple(pinned_labels)
+    cdim = mesh.cdim
+    if cdim != 2:
+        raise NotImplementedError(
+            "_winslow_anisotropic: 2D triangle meshes only "
+            "(the eigen-clamp + Annulus diagnostics are 2D)")
+    if linear_solver not in ("direct", "gamg"):
+        raise ValueError(
+            f"linear_solver must be 'direct' or 'gamg', "
+            f"got {linear_solver!r}")
+    if metric_role not in ("M", "Minv"):
+        raise ValueError(
+            f"metric_role must be 'M' or 'Minv', got {metric_role!r}")
+
+    dm = mesh.dm
+    pStart, pEnd = dm.getDepthStratum(0)
+    cStart, cEnd = dm.getHeightStratum(0)
+    cone_size = dm.getConeSize(cStart) if cEnd > cStart else 0
+    phi_degree = int(phi_degree)
+    aux_degree = max(1, phi_degree - 1)
+    key = (id(mesh), pinned_labels, pEnd - pStart, cEnd - cStart,
+           cone_size, linear_solver, phi_degree, bool(boundary_slip))
+
+    cache = _ANISO_CACHE.get(key)
+    if cache is None:
+        if linear_solver == "gamg":
+            def _wire(s, singular=False, elliptic=True):
+                _use_iterative_solver(s, singular, elliptic)
+        else:
+            def _wire(s, singular=False, elliptic=True):
+                _use_direct_solver(s, singular)
+
+        X = mesh.CoordinateSystem.X
+        # Projected ∇ρ: first derivative only (UW3-clean), the same
+        # construction verified in ma_metric_tensor_viz. ρ may be
+        # Lagrangian f(r0.sym): metric.diff(X) then differentiates
+        # through the frozen r0 field (FE ∂r0/∂x). The projection is
+        # solved once before the outer loop, and again on the moved
+        # mesh only when metric_refresh_per_iter=True.
+        grho = uw.discretisation.MeshVariable(
+            f"aniso_grho_{id(mesh)}", mesh,
+            vtype=uw.VarType.VECTOR, degree=aux_degree,
+            continuous=True)
+        gproj = uw.systems.Vector_Projection(mesh, grho)
+        gproj.smoothing = 0.0
+        gproj.uw_function = sympy.Matrix(
+            [metric.diff(X[i]) for i in range(cdim)]).T
+        _wire(gproj, elliptic=False)
+
+        # Eigen-clamped metric tensor field D (filled numerically
+        # per outer step). Init to the identity so an unsolved D is
+        # a harmless isotropic operator.
+        Df = uw.discretisation.MeshVariable(
+            f"aniso_D_{id(mesh)}", mesh,
+            vtype=uw.VarType.TENSOR, degree=aux_degree,
+            continuous=True)
+        Df.array[:, 0, 0] = 1.0
+        Df.array[:, 1, 1] = 1.0
+        Df.array[:, 0, 1] = 0.0
+        Df.array[:, 1, 0] = 0.0
+        Dsym = Df.sym                      # 2×2 sympy Matrix (stable)
+
+        class _TensorDiff(uw.constitutive_models.DiffusionModel):
+            def _build_c_tensor(self):
+                self._c = Dsym
+
+        # boundary_slip ⇒ pure-Neumann per component (constant
+        # nullspace, ring-projected in the move — exactly the
+        # _winslow_elliptic slip treatment). Default (pinned) ⇒
+        # homogeneous Dirichlet u=0 → non-singular, no nullspace.
+        singular = bool(boundary_slip)
+        usolvers, ufields = [], []
+        for c in range(cdim):
+            uc = uw.discretisation.MeshVariable(
+                f"aniso_u{c}_{id(mesh)}", mesh,
+                vtype=uw.VarType.SCALAR, degree=phi_degree,
+                continuous=True)
+            ps = uw.systems.Poisson(mesh, uc)
+            ps.constitutive_model = _TensorDiff
+            # f_c = div(column c of D) = Σ_j ∂D_{jc}/∂x_j. UW3
+            # SNES_Poisson is F0=-f ⇒ strong form ∇·(D∇u)=-ps.f;
+            # we want ∇·(D∇u_c) = -div_c ⇒ ps.f = +div_c. (First
+            # derivative of the projected D field — UW3-legal.)
+            src = sympy.Integer(0)
+            for j in range(cdim):
+                src = src + Dsym[j, c].diff(X[j])
+            ps.f = sympy.Matrix([[src]])
+            if singular:
+                ps.constant_nullspace = True
+            else:
+                for lbl in pinned_labels:
+                    try:
+                        ps.add_dirichlet_bc(0.0, lbl)
+                    except Exception:
+                        pass
+            _wire(ps, singular=singular, elliptic=True)
+            usolvers.append(ps)
+            ufields.append(uc)
+
+        _ANISO_CACHE[key] = (grho, gproj, Df, usolvers, ufields)
+    else:
+        grho, gproj, Df, usolvers, ufields = cache
+
+    _zig = (linear_solver != "gamg")
+
+    # ---- build the eigen-clamped metric tensor field D ONCE ------
+    # on the *undeformed* mesh (the design metric), then hold it
+    # fixed and Lagrangian (the field rides material points through
+    # _deform_mesh, exactly as _winslow_spring computes its
+    # rest-lengths / A0 once). Re-projecting ∇ρ on the progressively
+    # distorted mesh inside the outer loop is a positive feedback —
+    # D blows up on squashed cells → catastrophic over-collapse
+    # (verified failure mode). With D fixed the outer loop is a
+    # *stable damped fixed-point iteration* of one linear operator
+    # toward the M-harmonic map; no feedback.
+    dm = mesh.dm
+    # `old0` is the SPRING REST reference — vertices get pulled
+    # toward these positions when a cell exceeds the size caps.
+    # If the caller passes `rest_coords_override`, use that
+    # (typically the truly-undeformed mesh coords captured at
+    # the first adapt). Falling back to the entry-state of THIS
+    # call makes the spring "preserve" each successive refined
+    # state instead of pulling back to undeformed — the third
+    # leg of the compounding-refinement bug (2026-05-22).
+    if rest_coords_override is not None:
+        old0 = np.asarray(rest_coords_override).copy()
+    else:
+        old0 = np.asarray(mesh.X.coords).copy()
+    gproj.solve()
+    Dcoords = np.asarray(Df.coords)
+    gvec = np.asarray(
+        uw.function.evaluate(grho.sym, Dcoords)).reshape(-1, cdim)
+    # h0 = undeformed mean edge length. If the caller passes
+    # `h0_override` (e.g. a value cached at the FIRST adapt on
+    # this mesh), use that — re-measuring from a deformed mesh
+    # makes h0 shrink as the mesh refines, which then shifts
+    # the eigenvalue clamps tighter and tighter and compounds
+    # refinement across repeated adapt cycles.
+    if h0_override is not None:
+        h0 = float(h0_override)
+    else:
+        ep = _edge_pairs(dm)
+        if ep.shape[0]:
+            h0 = float(np.linalg.norm(
+                old0[ep[:, 1]] - old0[ep[:, 0]], axis=1).mean())
+        else:
+            h0 = 1.0
+        if uw.mpi.size > 1:
+            h0 = uw.mpi.comm.allreduce(h0) / uw.mpi.size
+    gn = np.linalg.norm(gvec, axis=1)
+    gmax = float(gn.max()) if gn.size else 0.0
+    if uw.mpi.size > 1:
+        from mpi4py import MPI as _MPI
+        gmax = uw.mpi.comm.allreduce(gmax, op=_MPI.MAX)
+    # CRITICAL no-op guard: uniform ρ ⇒ ∇ρ ≡ 0, but the L2
+    # projection of the zero function leaves ~1e-18 round-off.
+    # Normalising by that noisy max would make (|∇ρ|/gref)² ~ O(1)
+    # from pure round-off → a fabricated huge anisotropy and a
+    # spurious move. Any *real* feature gradient is O(AMP/WIDTH)
+    # ~ O(1–100); g_eps=1e-9 is ~9 orders above projection noise
+    # and ~9 below the weakest meaningful feature, so AMP=0 is an
+    # exact isotropic no-op while AMP>0 is bit-identical to the
+    # verified ma_metric_tensor_viz construction.
+    g_eps = 1.0e-9
+    gref = gmax if gmax > g_eps else 1.0
+    base = 1.0 / h0 ** 2
+
+    # --- isotropic density: which redistribution model ------------
+    # Three regimes, in precedence order:
+    #
+    #  (1) ``resolution_ratio > 1`` → SINGLE-KNOB EQUIDISTRIBUTION
+    #      (the primary, documented API). The isotropic density is
+    #      ``s = base·ρ/G`` with ``G`` the geometric mean of ρ on
+    #      the (near-uniform, *undeformed*) D mesh, so
+    #      ``⟨ln s⟩ = ln base``: the node budget is centred and
+    #      refine ⇄ coarsen are **complementary by the conservation
+    #      law itself** — there is no coarsening parameter. The
+    #      eigen-clamp ``[base/R², base·R²]`` (cells ∈
+    #      ``[h0/R, h0·R]``) is a pure safety rail set by the one
+    #      knob ``R``. M-harmonic is scale-invariant, so the
+    #      normalisation *constant* is irrelevant to the realised
+    #      mesh — only ρ's spatial *ratio* and the clamp matter;
+    #      the geometric-mean centring just places the band
+    #      symmetrically so the clamp bites tails, not the bulk.
+    #
+    #  (2) ``coarsen_cap > 1`` (legacy expert override, not the
+    #      documented API) → the earlier ad-hoc
+    #      ``s = base·cc^(q-1)`` law. Preserved **bit-for-bit** so
+    #      every historical ``a16c*`` result still reproduces.
+    #
+    #  (3) otherwise → refine-only metric (``s ≡ base``),
+    #      **bit-identical** to the validated historical default.
+    #      ``resolution_ratio = 1`` (the default) lands here ⇒ an
+    #      exact no-op vs. all prior results.
+    def _build_M_tensor():
+        """Compute the metric tensor field Df from the current
+        metric and mesh state (writes Df.array in place). Mirrors
+        the inline initial build below; called at the start of each
+        outer iteration after the first when
+        metric_refresh_per_iter=True. With 0 < geom_mean_smoothing
+        < 1 in regime (1), every call also advances the
+        _GEMA_STATE EMA of ln G."""
+        nonlocal Dcoords, gvec, gn, gmax, gref
+        Dcoords = np.asarray(Df.coords)  # picks up deformed mesh
+        gproj.solve()
+        gvec = np.asarray(
+            uw.function.evaluate(grho.sym, Dcoords)
+        ).reshape(-1, cdim)
+        gn = np.linalg.norm(gvec, axis=1)
+        gmax = float(gn.max()) if gn.size else 0.0
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            gmax = uw.mpi.comm.allreduce(gmax, op=_MPI.MAX)
+        gref = gmax if gmax > g_eps else 1.0
+        # Density branches (same as legacy code path)
+        if resolution_ratio > 1.0:
+            R_ = float(resolution_ratio)
+            rho_v_ = np.asarray(
+                uw.function.evaluate(metric, Dcoords)
+            ).reshape(-1)
+            s_log_ = np.log(np.clip(rho_v_, 1.0e-12, None))
+            if uw.mpi.size > 1:
+                from mpi4py import MPI as _MPI
+                tot = uw.mpi.comm.allreduce(
+                    float(s_log_.sum()), op=_MPI.SUM)
+                cnt = uw.mpi.comm.allreduce(
+                    int(s_log_.size), op=_MPI.SUM)
+                ln_g_ = tot / max(cnt, 1)
+            else:
+                ln_g_ = float(s_log_.mean())
+            a_ = float(geom_mean_smoothing)
+            if 0.0 < a_ < 1.0:
+                prev = _GEMA_STATE.get(key)
+                if prev is not None:
+                    ln_g_ = a_ * ln_g_ + (1.0 - a_) * prev
+                _GEMA_STATE[key] = ln_g_
+            iso_ = base * np.exp(s_log_ - ln_g_)
+            lam_lo_ = base / R_ ** 2
+            lam_hi_ = base * R_ ** 2
+            aniso_keyed_ = (np.full(Dcoords.shape[0], base)
+                            if aniso_to_base else iso_)
+        elif coarsen_cap > 1.0:
+            rho_v_ = np.asarray(
+                uw.function.evaluate(metric, Dcoords)
+            ).reshape(-1)
+            r_lo_ = float(np.percentile(rho_v_, 10.0))
+            r_hi_ = float(np.percentile(rho_v_, 90.0))
+            if uw.mpi.size > 1:
+                from mpi4py import MPI as _MPI
+                r_lo_ = uw.mpi.comm.allreduce(r_lo_, op=_MPI.MIN)
+                r_hi_ = uw.mpi.comm.allreduce(r_hi_, op=_MPI.MAX)
+            q_ = np.clip(
+                (rho_v_ - r_lo_) / max(r_hi_ - r_lo_, 1e-30),
+                0.0, 1.0)
+            iso_ = base * float(coarsen_cap) ** (q_ - 1.0)
+            lam_lo_ = base / float(coarsen_cap)
+            lam_hi_ = 1.0 / (h0 / np.sqrt(aniso_cap)) ** 2
+            aniso_keyed_ = np.full(Dcoords.shape[0], base)
+        else:
+            iso_ = np.full(Dcoords.shape[0], base)
+            lam_lo_ = base
+            lam_hi_ = 1.0 / (h0 / np.sqrt(aniso_cap)) ** 2
+            aniso_keyed_ = np.full(Dcoords.shape[0], base)
+        # Assemble M tensor and write to Df
+        Dout_ = np.empty((Dcoords.shape[0], 2, 2))
+        eye2_ = np.eye(2)
+        for ii in range(Dcoords.shape[0]):
+            g_ = gvec[ii]
+            gni_ = gn[ii]
+            bi_ = iso_[ii]
+            ai_ = aniso_keyed_[ii]
+            if gni_ > g_eps and gmax > g_eps:
+                gh_ = g_ / gni_
+                M_ = bi_ * eye2_ + ai_ * beta * (gni_ / gref) ** 2 \
+                     * np.outer(gh_, gh_)
+            else:
+                M_ = bi_ * eye2_
+            w_, V_ = np.linalg.eigh(M_)
+            w_ = np.clip(w_, lam_lo_, lam_hi_)
+            if metric_role == "Minv":
+                w_ = 1.0 / w_
+            Dout_[ii] = (V_ * w_) @ V_.T
+        Df.array[:, 0, 0] = Dout_[:, 0, 0]
+        Df.array[:, 0, 1] = Dout_[:, 0, 1]
+        Df.array[:, 1, 0] = Dout_[:, 1, 0]
+        Df.array[:, 1, 1] = Dout_[:, 1, 1]
+
+    if resolution_ratio > 1.0:
+        R = float(resolution_ratio)
+        rho_v = np.asarray(
+            uw.function.evaluate(metric, Dcoords)).reshape(-1)
+        s_log = np.log(np.clip(rho_v, 1.0e-12, None))
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            tot = uw.mpi.comm.allreduce(float(s_log.sum()),
+                                        op=_MPI.SUM)
+            cnt = uw.mpi.comm.allreduce(int(s_log.size),
+                                        op=_MPI.SUM)
+            ln_g = tot / max(cnt, 1)
+        else:
+            ln_g = float(s_log.mean())
+        # --- temporal damping of the normaliser G (EMA in log
+        # space) -------------------------------------------------
+        # G is recomputed from the *instantaneous* field every
+        # adaptation event; during a violent transient that lurches
+        # the whole ρ/G distribution sideways across the *fixed*
+        # eigen-clamp band → mass clamp-saturation → the mesh
+        # visibly "wobbles". Low-pass ln G across events (G is a
+        # geometric quantity ⇒ average in log space) so the band
+        # stays centred. This smooths **only the one global
+        # intensity scalar** — the spatial ρ(x) pattern still
+        # tracks the current field every event, so *where* it
+        # refines stays fully responsive. a=geom_mean_smoothing:
+        # a≥1 ⇒ no damping (instantaneous, the original behaviour);
+        # 0<a<1 ⇒ EMA, lnG_eff = a·lnG_now + (1−a)·lnG_prev (a≈0.25
+        # strong); the first event seeds the state (no history yet).
+        # Carried in _GEMA_STATE under the _ANISO_CACHE key so it
+        # persists across adaptation events but is per-run/per-mesh.
+        a = float(geom_mean_smoothing)
+        if 0.0 < a < 1.0:
+            prev = _GEMA_STATE.get(key)
+            if prev is not None:
+                ln_g = a * ln_g + (1.0 - a) * prev
+            _GEMA_STATE[key] = ln_g
+        # ρ̂ = ρ/G (geometric mean 1 ⇒ ⟨ln ρ̂⟩=0, budget-centred);
+        # iso = base·ρ̂ → refine where ρ̂>1, coarsen where ρ̂<1,
+        # the two complementary by construction (no coarsen knob).
+        iso = base * np.exp(s_log - ln_g)
+        lam_lo = base / R ** 2
+        lam_hi = base * R ** 2
+        # Anisotropic-bump magnitude. Default: ride the local
+        # density (M = iso·(I+β·bump) — the clean scale-invariant
+        # form). aniso_to_base=True keys it to constant `base`
+        # instead (M = iso·I + base·β·bump), matching the legacy
+        # cc=2 regime that produced a markedly solver-friendlier
+        # mesh: it stops a coarsened-near-front cell from being
+        # large AND strongly stretched (the clustered poor cells
+        # the equidist form makes during a violent transient).
+        aniso_keyed = (np.full(Dcoords.shape[0], base)
+                       if aniso_to_base else iso)
+    elif coarsen_cap > 1.0:
+        rho_v = np.asarray(
+            uw.function.evaluate(metric, Dcoords)).reshape(-1)
+        r_lo = float(np.percentile(rho_v, 10.0))
+        r_hi = float(np.percentile(rho_v, 90.0))
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            r_lo = uw.mpi.comm.allreduce(r_lo, op=_MPI.MIN)
+            r_hi = uw.mpi.comm.allreduce(r_hi, op=_MPI.MAX)
+        q = np.clip((rho_v - r_lo) / max(r_hi - r_lo, 1.0e-30),
+                    0.0, 1.0)
+        iso = base * float(coarsen_cap) ** (q - 1.0)   # q=1 → base
+        lam_lo = base / float(coarsen_cap)             # coarsest
+        lam_hi = 1.0 / (h0 / np.sqrt(aniso_cap)) ** 2  # finest
+        aniso_keyed = np.full(Dcoords.shape[0], base)
+    else:
+        iso = np.full(Dcoords.shape[0], base)
+        lam_lo = base                                  # coarsest
+        lam_hi = 1.0 / (h0 / np.sqrt(aniso_cap)) ** 2  # finest
+        aniso_keyed = np.full(Dcoords.shape[0], base)
+
+    Dout = np.empty((Dcoords.shape[0], 2, 2))
+    eye2 = np.eye(2)
+    for i in range(Dcoords.shape[0]):
+        g = gvec[i]
+        gni = gn[i]
+        bi = iso[i]
+        ai = aniso_keyed[i]
+        if gni > g_eps and gmax > g_eps:
+            gh = g / gni
+            # iso·I (equidistribution density) + anisotropic bump
+            # (regime 1: keyed to local iso ⇒ the whole metric is
+            # one scale-invariant density·shape field, clamp = rail;
+            # regimes 2/3: keyed to base ⇒ aniso_cap/beta retain
+            # their exact validated meaning).
+            M = bi * eye2 + ai * beta * (gni / gref) ** 2 \
+                * np.outer(gh, gh)
+        else:
+            M = bi * eye2
+        w, V = np.linalg.eigh(M)
+        w = np.clip(w, lam_lo, lam_hi)
+        if metric_role == "Minv":
+            w = 1.0 / w
+        Dout[i] = (V * w) @ V.T
+    Df.array[:, 0, 0] = Dout[:, 0, 0]
+    Df.array[:, 0, 1] = Dout[:, 0, 1]
+    Df.array[:, 1, 0] = Dout[:, 1, 0]
+    Df.array[:, 1, 1] = Dout[:, 1, 1]
+
+    # Pre-compute the undeformed-mesh median cell area, used by the
+    # backtrack's sliver guard. Captured ONCE before the iteration
+    # loop so the floor doesn't shrink as cells refine — the same
+    # absolute floor is enforced throughout.
+    _tris_for_a0 = _tri_cells(mesh.dm)
+    if _tris_for_a0 is not None and _tris_for_a0.size:
+        _a0_undeformed_med = float(np.median(np.abs(
+            _signed_areas(old0, _tris_for_a0))))
+    else:
+        _a0_undeformed_med = 0.0
+
+    for outer in range(n_outer):
+        dm = mesh.dm
+        pStart, pEnd = dm.getDepthStratum(0)
+        n_verts = pEnd - pStart
+        is_bnd = _pinned_mask(dm, pinned_labels)
+        tris = _tri_cells(dm)
+        old_coords = np.asarray(mesh.X.coords).copy()
+        _cdim = mesh.cdim
+
+        # If requested, re-query the metric at the deformed
+        # mesh state and rebuild M tensor. Default off
+        # preserves the legacy behaviour (M frozen at first
+        # iteration). Used to isolate whether Eulerian
+        # re-querying of the metric changes the outcome.
+        if metric_refresh_per_iter and outer > 0:
+            _build_M_tensor()
+
+        # Boundary tangential slip — identical per-ring radius
+        # projection to _winslow_elliptic (the radial DOF is
+        # removed, so slip nodes provably stay on their ring; one
+        # node/ring anchors the rotation gauge).
+        if boundary_slip and is_bnd.any():
+            bc = np.nonzero(is_bnd)[0]
+            c0 = old_coords[bc].mean(axis=0)
+            rg = np.round(
+                np.linalg.norm(old_coords[bc] - c0, axis=1), 6)
+            is_anchor = np.zeros(n_verts, dtype=bool)
+            slip_center = np.zeros((n_verts, _cdim))
+            slip_rtarget = np.zeros(n_verts)
+            for rv in np.unique(rg):
+                grp = bc[rg == rv]
+                rc = old_coords[grp].mean(axis=0)
+                is_anchor[grp[np.argmax(
+                    (old_coords[grp] - rc)[:, 0])]] = True
+                slip_center[grp] = rc
+                slip_rtarget[grp] = np.linalg.norm(
+                    old_coords[grp] - rc, axis=1)
+            is_slip = is_bnd & ~is_anchor
+            is_pinned = is_anchor
+            _sidx = np.nonzero(is_slip)[0]
+            _sctr = slip_center[_sidx]
+            _srad = slip_rtarget[_sidx]
+
+            def _project(Y):
+                v = Y[_sidx] - _sctr
+                nrm = np.linalg.norm(v, axis=1)
+                nrm = np.where(nrm > 1.0e-30, nrm, 1.0)
+                Y[_sidx] = _sctr + v * (_srad / nrm)[:, None]
+                return Y
+        else:
+            is_pinned = is_bnd
+
+            def _project(Y):
+                return Y
+
+        # D is fixed & Lagrangian (built once, above) — no
+        # re-projection feedback. The outer loop is a damped
+        # fixed-point iteration toward the fixed M-harmonic map.
+
+        # --- solve the cdim displacement components ----------------
+        disp = np.zeros_like(old_coords)
+        for c in range(cdim):
+            usolvers[c].solve(zero_init_guess=_zig)
+            disp[:, c] = np.asarray(
+                uw.function.evaluate(ufields[c].sym, old_coords)
+            ).reshape(-1)
+
+        # Directional move-weighting (opt-in; same frame and default
+        # as _winslow_elliptic: None ⇒ displacement unchanged).
+        if move_anisotropy is not None and cdim == 2:
+            w_r, w_t = (float(move_anisotropy[0]),
+                        float(move_anisotropy[1]))
+            ctr = old_coords.mean(axis=0)
+            rv = old_coords - ctr
+            rn = np.linalg.norm(rv, axis=1)
+            ok = rn > 1.0e-30
+            rhat = np.zeros_like(rv)
+            rhat[ok] = rv[ok] / rn[ok, None]
+            that = np.stack([-rhat[:, 1], rhat[:, 0]], axis=1)
+            d_r = (disp * rhat).sum(axis=1)
+            d_t = (disp * that).sum(axis=1)
+            disp = (w_r * d_r[:, None] * rhat
+                    + w_t * d_t[:, None] * that)
+
+        # --- per-cell Lagrangian rest-size spring -----------------
+        # When `rest_size_cap_max` / `rest_size_cap_min` are set,
+        # add a restoring force to each vertex that pulls it
+        # toward its rest position (`old0`, captured before the
+        # mover started) whenever an incident cell's edge would
+        # overshoot the cap under the proposed move.
+        #
+        # We use **max-edge** for the coarsening cap (a cell
+        # grew in *any* direction beyond `h0·coarsening`) and
+        # **min-edge** for the refinement cap (a cell shrunk
+        # in *any* direction below `h0/refinement`). Both
+        # measures are sliver-aware — they catch anisotropic
+        # cells that mean-edge wouldn't flag.
+        #
+        # Motivation: the metric-mover is a local graph-Laplacian
+        # — nodes cannot transport across high-gradient ridges,
+        # so cells *adjacent* to a refinement zone absorb most
+        # of the freed area while cells topologically isolated
+        # from the refinement stay near rest size. Without a
+        # spring, the adjacent cells over-coarsen by ~2× the cap
+        # and the BL cells over-refine to thin slivers (aspect
+        # ratios > 10). The spring counters both by pulling nodes
+        # back toward their rest positions (`old0`), weighted by
+        # how much the local cell exceeds the cap.
+        if (rest_size_cap_max is not None
+                or rest_size_cap_min is not None):
+            proposed = old_coords + float(relax) * disp
+            p = proposed[tris]
+            e0 = np.linalg.norm(p[:, 1] - p[:, 0], axis=1)
+            e1 = np.linalg.norm(p[:, 2] - p[:, 1], axis=1)
+            e2 = np.linalg.norm(p[:, 0] - p[:, 2], axis=1)
+            # Sliver-aware per-cell extremes:
+            max_h = np.maximum(np.maximum(e0, e1), e2)
+            min_h = np.minimum(np.minimum(e0, e1), e2)
+            # Per-cell fractional excess vs cap. Both ≥ 0.
+            #   over  = max(any edge)/cap_max - 1       (coarsening
+            #     fault: at least one edge too long)
+            #   under = cap_min / min(any edge) - 1     (refinement
+            #     fault: at least one edge too short, i.e. sliver)
+            if rest_size_cap_max is not None:
+                over = np.maximum(
+                    max_h / float(rest_size_cap_max) - 1.0, 0.0)
+            else:
+                over = np.zeros_like(max_h)
+            if rest_size_cap_min is not None:
+                under = np.maximum(
+                    float(rest_size_cap_min)
+                    / np.maximum(min_h, 1.0e-30) - 1.0, 0.0)
+            else:
+                under = np.zeros_like(min_h)
+            # Per-vertex restoring weight ← Σ over incident cells,
+            # CAPPED AT 1. Without the cap, a vertex incident on
+            # several violating cells accumulates restore_w > 1
+            # and the spring overshoots its rest position
+            # (`new = old + restore_w · (rest - old)` lands past
+            # `rest`), pulling two vertices together and creating
+            # degenerate (near-zero-area) triangles. Capping at 1
+            # makes the worst-case per-iteration motion "exactly
+            # back to rest", never further.
+            restore_w = np.zeros(old_coords.shape[0])
+            cell_w = float(rest_spring_K) * (over + under)
+            np.add.at(restore_w, tris[:, 0], cell_w)
+            np.add.at(restore_w, tris[:, 1], cell_w)
+            np.add.at(restore_w, tris[:, 2], cell_w)
+            np.minimum(restore_w, 1.0, out=restore_w)
+            # Add the restoring contribution to disp. (Divide by
+            # relax so the downstream `step = relax · disp` gives
+            # the intended fraction restore_w · (rest - current).)
+            spring_disp = restore_w[:, None] * (old0 - old_coords)
+            disp = disp + spring_disp / max(float(relax), 1.0e-30)
+
+        # Damped MMPDE step. The *direct* Winslow form (physical
+        # coords as M-harmonic functions of themselves) has no
+        # Rado–Kneser–Choquet non-folding guarantee — applied as a
+        # single elliptic jump it overshoots and the signed-area
+        # backtrack thrashes into a degenerate sliver. The standard
+        # remedy is to integrate the mesh PDE as a damped gradient flow:
+        # under-relax the displacement and compose over n_outer steps
+        # (metric frozen unless metric_refresh_per_iter is set). This is
+        # the exact analogue of _winslow_elliptic's picard_relax (the BFO
+        # path needs ω≈0.4 or its Hessian grows unbounded).
+        step = float(relax) * disp
+
+        # --- coherent global signed-area backtrack + slip + move --
+        free = ~is_pinned
+        scale = 1.0
+        new_coords = old_coords.copy()
+        if tris is not None:
+            a0 = _signed_areas(old_coords, tris)
+            orient = np.sign(np.median(a0)) or 1.0
+            # Minimum acceptable cell area for the backtrack. The
+            # original test (`a1min > 0`) only catches *flipped*
+            # cells; near-degenerate cells with three near-collinear
+            # vertices pass it but produce invisible sliver
+            # triangles. Require min area > a fixed fraction of
+            # the **undeformed-mesh** median cell area
+            # (`_a0_undeformed_med`, captured before the iteration
+            # loop). A refinement of 3 in 2D legitimately shrinks
+            # cells by 3²=9× in area, so a floor at 1% of the
+            # undeformed median rejects degenerate slivers (which
+            # are 1000× smaller) without rejecting legitimate
+            # refinement.
+            a_min_floor = 0.01 * _a0_undeformed_med
+            for _bt in range(10):
+                trial = old_coords.copy()
+                trial[free] += scale * step[free]
+                trial = _project(trial)
+                a_signed = _signed_areas(trial, tris) * orient
+                a1min = float(a_signed.min())
+                if uw.mpi.size > 1:
+                    from mpi4py import MPI as _MPI
+                    a1min = uw.mpi.comm.allreduce(
+                        a1min, op=_MPI.MIN)
+                # Accept only if no cell flipped AND no cell
+                # collapsed below the area floor.
+                if a1min > a_min_floor:
+                    new_coords = trial
+                    break
+                scale *= 0.5
+            else:
+                scale = 0.0
+                new_coords = old_coords.copy()
+        else:
+            new_coords[free] += step[free]
+            new_coords = _project(new_coords)
+
+        mesh._deform_mesh(new_coords)
+
+        d = float(np.linalg.norm(new_coords - old_coords, axis=1).max())
+        if uw.mpi.size > 1:
+            from mpi4py import MPI as _MPI
+            d = uw.mpi.comm.allreduce(d, op=_MPI.MAX)
+        if verbose:
+            uw.pprint(
+                f"  anisotropic mover outer {outer+1}/{n_outer}: "
+                f"h0={h0:.3e}  scale={scale:.3f}  "
+                f"max|Δx|={d:.3e}")
+        if d < outer_tol:
+            break
+
+
 def _build_local_to_owned_map(dm, gsection, vec):
     """Compute, for each local owned vertex, its position in the
     rank's slice of the global Vec.
@@ -235,24 +2582,48 @@ def smooth_mesh_interior(
     pinned_labels: Optional[Sequence[str]] = None,
     n_iters: int = 5,
     alpha: float = 0.5,
+    metric=None,
+    method: str = "spring",
+    boundary_slip: bool = False,
+    method_kwargs: Optional[dict] = None,
     verbose: bool = False,
+    skip_threshold=_UNSET,
+    strategy: Optional[str] = None,
 ):
-    r"""Apply Winslow Jacobi smoothing to a mesh's interior vertices.
+    r"""Smooth a mesh's interior vertices, optionally toward a
+    spatially-varying target spacing.
 
-    Each interior vertex is replaced by a blend of its current
-    position and the unweighted mean of its edge-neighbour positions:
+    **Default (``metric=None``)** — graph-Laplacian Jacobi: each
+    interior vertex is blended toward the plain mean of its edge
+    neighbours,
 
     .. math::
 
         x_i^{n+1} = (1 - \alpha)\, x_i^n
                     + \alpha \cdot \frac{1}{|N(i)|}
-                    \sum_{j \in N(i)} x_j^n
+                    \sum_{j \in N(i)} x_j^n ,
+
+    over ``n_iters`` sweeps. Equalises connectivity → equant cells.
+
+    **With a ``metric``** — an elastic-spring network relaxed to
+    equilibrium. Every edge is a linear spring with rest length
+    ``∝ ρ_tgt^{-1/d}`` (``ρ_tgt = metric``), scaled so the mean rest
+    length equals the current mean edge length (overall scale
+    preserved — pure redistribution). Damped Jacobi force iteration
+    relaxes interior nodes to force balance, with a coherent global
+    signed-area backtrack guaranteeing no cell inverts. The rest
+    length is an *absolute* target, so the mesh genuinely grades
+    toward spacing ``∝ ρ_tgt^{-1/d}`` (a regime the weighted
+    Laplacian / Jacobi cannot reach). ``n_iters`` and ``alpha`` are
+    ignored on this path (it has its own internal sweep budget). A
+    Lagrangian density (``f(r0.sym)`` peaked at the original outer
+    radius) keeps the rest lengths fixed per material point, so the
+    *design* boundary-layer grading is restored even after
+    free-surface deformation.
 
     Vertices in any of ``pinned_labels`` are held fixed (preserves
     boundary geometry). The mesh's coordinate vector is updated in
-    place via ``mesh._deform_mesh`` once after all sweeps — so the
-    DM rebuild / cache invalidation cost is paid once rather than
-    per sweep.
+    place via ``mesh._deform_mesh`` (once at the end on the Jacobi path).
 
     Parameters
     ----------
@@ -265,24 +2636,159 @@ def smooth_mesh_interior(
         stays put. Pass an explicit list to release some boundaries.
     n_iters : int, default 5
         Number of Jacobi sweeps. 5-10 is typical for surface-
-        deformation cleanup.
+        deformation cleanup. **Ignored when ``metric`` is given**
+        (the spring path has its own internal sweep budget).
     alpha : float, default 0.5
-        Under-relaxation in ``(0, 1]``. 1.0 is pure Jacobi; smaller
-        is more damped (slower but safer on irregular meshes).
+        Under-relaxation in ``(0, 1]`` for the Jacobi path. 1.0 is
+        pure Jacobi; smaller is more damped. **Ignored when
+        ``metric`` is given.**
+    metric : sympy / UW expression, optional
+        Target *density* :math:`\rho_{\mathrm{tgt}}` (larger ⇒
+        finer cells). Typically ``f(r0.sym)`` for a refinement
+        function ``f`` of a Lagrangian state variable ``r0`` (a
+        degree-1 scalar MeshVariable set once to the original
+        coordinate and never reassigned, so its value rides each
+        material point through deformation). Should be strictly
+        positive and finite. ``None`` (default) ⇒ the
+        graph-Laplacian Jacobi path, unchanged behaviour
+        bit-for-bit.
+    method : {"spring", "ma", "ot", "anisotropic"}, default "spring"
+        Metric-grading solver (ignored when ``metric is None``):
+
+        * ``"spring"`` — *volumetric* elastic-spring equilibrium:
+          equal edge springs (shape regulariser, equant cells, no
+          slivers) + a per-cell area constraint
+          ``A0 ∝ 1/ρ_tgt`` (the size grading), minimised by
+          preconditioned nonlinear CG. **Fast** (~0.3 s on a
+          res-16 Annulus), robust, scales with the metric
+          amplitude; slightly anisotropic at sharp interior
+          features.
+        * ``"ma"`` — Benamou–Froese–Oberman convex-branch
+          **Monge–Ampère** equidistribution. Highest-fidelity
+          *isotropic* refinement and robust to the boundary
+          treatment, but ~60× costlier than the spring.
+        * ``"anisotropic"`` — **tensor** metric mover: an
+          M-weighted Laplace (Winslow) smooth of the coordinate
+          map with an eigen-clamped, gradient-derived *anisotropic*
+          metric tensor. Reshapes cells (short across a feature,
+          long along it) and removes the slivers / wasted isotropic
+          resolution the scalar paths leave near a boundary-peaked
+          feature. Linear (one solve/component/step — cheaper than
+          ``"ma"``). It improves cell **alignment / quality**, not
+          the grading magnitude (see the cap note below); for a
+          *separable* feature the explicit 1-D OT is exact and
+          cheaper — ``"anisotropic"`` earns its keep on the general
+          non-separable case.
+
+        With a fixed node count no method can exceed ≈1.3–1.8×
+        deep/near grading (the optimal-transport ≈10× needs *more
+        nodes* — a topology change, not this smoother). See
+        ``docs/developer/subsystems/mesh-metric-redistribution.md``.
+    boundary_slip : bool, default False
+        Let boundary nodes slide tangentially along their boundary
+        (snapped back to the boundary each step — they cannot leave
+        it; serial circular/spherical boundaries only). Strongly
+        helps the spring (+~10 % grading, faster); near-no-op for
+        ``ma`` (its natural Neumann BC already handles the
+        boundary). Off by default — for a free surface the boundary
+        is the moving surface, so sliding interacts with the
+        free-surface coupling; enable per use-context.
+    method_kwargs : dict, optional
+        Extra tuning forwarded to the chosen metric solver (ignored
+        when ``metric is None``). Keeps the shared signature clean
+        while exposing the per-method knobs. For
+        ``method="anisotropic"`` there is **one primary knob**:
+
+        * ``resolution_ratio`` (``R``, default **1.0 = exact
+          no-op**) — *the* tuneable. Cells may refine to ``h0/R``
+          and coarsen to ``h0·R``; the refine ⇄ coarsen split is
+          **not a parameter** — the isotropic density is
+          equidistribution-normalised (``s = base·ρ/G``, ``G`` the
+          geometric mean of ρ), so flat regions release exactly the
+          budget the fronts consume, *complementary by the
+          conservation law itself*. The eigen-clamp
+          ``[h0/R, h0·R]`` is just a safety rail. ``R=1`` ⇒
+          bit-identical to the refine-only historical default (an
+          exact no-op vs. every prior result). ``R≈2`` is the
+          validated production point (clean mesh through a full
+          convection lifecycle, ``minA/meanA``≈0.2, genuine
+          plume-reaching de-resolution, settled physics intact).
+          One number; complementary coarsening is automatic.
+        * ``geom_mean_smoothing`` (``a``, default 0.25) —
+          *internal* temporal damping of the equidistribution
+          normaliser ``G`` (not a grading knob; only acts when
+          ``R>1``). ``G`` is recomputed from the instantaneous
+          field every adaptation event; in a violent transient
+          that lurches the whole ``ρ/G`` distribution across the
+          fixed clamp band → clamp-saturation → the mesh visibly
+          "wobbles". An EMA in log space
+          (``lnG ← a·lnG_now+(1−a)·lnG_prev``) keeps the band
+          centred: ``a=1`` ⇒ no damping (instantaneous, the
+          original wobbly behaviour); ``a≈0.25`` ⇒ strong damping
+          of the startup over-reaction + steady-state contrast
+          pulse. It smooths **only the one global intensity
+          scalar** — the spatial ρ(x) pattern still tracks the
+          current field every event, so the API stays single-knob
+          (``R``); ``a`` carries one internal scalar across events.
+        * ``relax`` (0.2) / ``n_outer`` (12) — damped-MMPDE
+          under-relaxation + composed steps (early-exit
+          ``outer_tol``). ``linear_solver`` (``"direct"`` | MUMPS |
+          ``"gamg"``, bit-parity, parallel-scalable). ``beta``
+          (200) — anisotropic-bump saturation. ``move_anisotropy``
+          — optional radial/tangential move reweight.
+        * **Expert overrides (not the documented API; only honoured
+          when ``resolution_ratio≤1``):** ``aniso_cap`` (2.0) and
+          ``coarsen_cap`` (1.0) are the legacy two-knob clamp
+          (``h_min=h0/√aniso_cap``, ``h_max=h0·√coarsen_cap``,
+          ad-hoc ``s=base·cc^(q-1)``). Retained **bit-for-bit** so
+          historical scripts reproduce; superseded by
+          ``resolution_ratio``.
+
+        Example::
+
+            smooth_mesh_interior(
+                mesh, metric=rho, method="anisotropic",
+                method_kwargs=dict(resolution_ratio=2.0,
+                                   relax=0.05, n_outer=25))
     verbose : bool, default False
-        Print per-sweep RMS interior displacement.
+        Print per-sweep (Jacobi) or periodic (spring/MA) progress.
+    skip_threshold : float, optional
+        If set, evaluate the *misalignment* between current mesh
+        cell density and the metric (via
+        :func:`mesh_metric_mismatch`) and **skip the adapt** when
+        misalignment is below this threshold. Misalignment is
+        ``√(1 − r²)`` where ``r`` is the Pearson correlation of
+        ``log(1/A_cell)`` with ``log(ρ_cell)`` — a magnitude-free
+        measure of whether cell density is aligned with the
+        metric. 0 ⇒ perfectly aligned; 1 ⇒ uncorrelated
+        (r = 0). Ignored when ``metric is None``. Calibration
+        from one of the R=1.5 stagnant-lid tests: a uniform mesh
+        gives misalignment ≈ 1.00 (r ≈ 0); a freshly-adapted mesh
+        gives misalignment ≈ 0.85 (r ≈ 0.52). So ``0.9`` is a
+        sensible "skip if reasonably aligned" default for an
+        adaptive convection loop; ``0.5`` is strict (only skip
+        when very well aligned); ``0`` ⇒ always adapt
+        (equivalent to ``None``). Cost: one ``metric`` evaluate
+        at cell centroids + a few NumPy reductions.
 
     Notes
     -----
-    **Parallel implementation**: the vertex-vertex adjacency is
-    assembled as a parallel PETSc AIJ matrix; each rank inserts
-    entries for every locally-visible edge using GLOBAL vertex
-    indices and ``mat.assemble()`` routes cross-rank contributions
-    so that owned-vertex rows are complete after assembly. The
-    per-sweep update is then a per-component ``A.mult`` followed by
-    a pointwise divide by the precomputed degree vector. Results
-    are bit-identical (to a single ULP) between serial and parallel
-    runs at any rank count.
+    **Parallel implementation (Jacobi path)**: the vertex-vertex
+    adjacency is assembled as a parallel PETSc AIJ matrix; each rank
+    inserts entries for every locally-visible edge using GLOBAL
+    vertex indices and ``mat.assemble()`` routes cross-rank
+    contributions so that owned-vertex rows are complete after
+    assembly. The per-sweep update is a per-component ``A.mult``
+    followed by a pointwise divide by the precomputed degree vector.
+    Results are bit-identical (to a single ULP) between serial and
+    parallel runs at any rank count.
+
+    **Spring path**: serial-exact. Edge forces are accumulated over
+    locally-visible edges only, so rank-partition-boundary nodes
+    under-count their incident forces in parallel (a future PR can
+    assemble the edge forces cross-rank like the Jacobi adjacency
+    Mat). The edge list and per-node degree are cached against the
+    topology key and rebuilt only on a topology change.
 
     **Topology preservation**: vertex IDs, DOF mappings, and the
     rank partition are unchanged. Only coordinates move. Anything
@@ -308,11 +2814,106 @@ def smooth_mesh_interior(
     Pin nothing (free-floating; rare — boundary will collapse)::
 
         smooth_mesh_interior(mesh, pinned_labels=[])
+
+    Restore a design grading via a Lagrangian refinement metric::
+
+        r0 = uw.discretisation.MeshVariable(
+            "r0", mesh, uw.VarType.SCALAR, degree=1)
+        X0 = np.asarray(mesh.X.coords)
+        r0.data[:, 0] = np.sqrt((X0 ** 2).sum(axis=1))   # set once
+        # ... deformation that crushes near-surface cells ...
+        f = 1 + 8 * sympy.exp(-((r0.sym[0] - 1.0) / 0.12) ** 2)
+        smooth_mesh_interior(mesh, metric=f)
     """
     if pinned_labels is None:
         pinned_labels = _auto_pinned_labels(mesh)
     pinned_labels = tuple(pinned_labels)
 
+    # Resolve strategy defaults; individual kwargs override.
+    # "off" → early exit, mesh left unchanged by this call.
+    if strategy is not None:
+        if strategy not in ADAPT_STRATEGIES:
+            raise ValueError(
+                f"unknown strategy {strategy!r}; choose from "
+                f"{list(ADAPT_STRATEGIES.keys())}")
+        if strategy == "off":
+            if verbose:
+                print("  smooth_mesh_interior: strategy='off' "
+                      "→ skipping", flush=True)
+            return
+        _s = ADAPT_STRATEGIES[strategy]
+        if skip_threshold is _UNSET:
+            skip_threshold = _s["skip_threshold"]
+        # method_kwargs: fill in resolution_ratio from strategy
+        # if caller didn't already set it.
+        if method_kwargs is None:
+            method_kwargs = {}
+        else:
+            method_kwargs = dict(method_kwargs)
+        method_kwargs.setdefault(
+            "resolution_ratio", _s["resolution_ratio"])
+    if skip_threshold is _UNSET:
+        skip_threshold = None
+
+    if metric is not None:
+        mk = dict(method_kwargs or {})
+        # Skip-if-good-enough: compare current cell sizes to what
+        # the metric would prescribe via equidistribution and bail
+        # out early when the mesh is already aligned. Cheap (one
+        # evaluate + a few NumPy reductions) — avoids a redundant
+        # mover call when the mesh hasn't drifted from its target.
+        # Mismatch is measured against the R-clamped achievable
+        # target (when the anisotropic mover's resolution_ratio is
+        # given), so a perfectly-adapted mesh measures ~0.
+        if skip_threshold is not None:
+            _R = mk.get("resolution_ratio", None)
+            mm = mesh_metric_mismatch(
+                mesh, metric, resolution_ratio=_R)
+            # `misalignment` = √(1 - r²) where r is Pearson of
+            # log(1/A_c) vs log(ρ_c). 0 ⇒ mesh density is
+            # perfectly aligned with the metric; 1 ⇒ uncorrelated.
+            # Skip when misalignment is below threshold.
+            if mm["misalignment"] < float(skip_threshold):
+                if verbose:
+                    print(f"  smooth_mesh_interior: skipping "
+                          f"(misalignment {mm['misalignment']:.3f} "
+                          f"< threshold {skip_threshold:.3f}; "
+                          f"alignment r={mm['alignment']:.3f})",
+                          flush=True)
+                return
+            if verbose:
+                print(f"  smooth_mesh_interior: adapting "
+                      f"(misalignment {mm['misalignment']:.3f} ≥ "
+                      f"threshold {skip_threshold:.3f}; "
+                      f"alignment r={mm['alignment']:.3f})",
+                      flush=True)
+        if method == "spring":
+            _winslow_spring(mesh, metric, pinned_labels, verbose,
+                            boundary_slip=boundary_slip, **mk)
+        elif method in ("ma", "monge-ampere", "monge_ampere"):
+            _winslow_elliptic(mesh, metric, pinned_labels, verbose,
+                              boundary_slip=boundary_slip, **mk)
+        elif method in ("ot", "equidistribute", "improve"):
+            _winslow_equidistribute(mesh, metric, pinned_labels,
+                                     verbose,
+                                     boundary_slip=boundary_slip,
+                                     **mk)
+        elif method in ("anisotropic", "aniso", "tensor"):
+            _winslow_anisotropic(mesh, metric, pinned_labels,
+                                 verbose,
+                                 boundary_slip=boundary_slip, **mk)
+        else:
+            raise ValueError(
+                f"smooth_mesh_interior: unknown method {method!r}; "
+                f"use 'spring' (default, fast volumetric), "
+                f"'ma' (Monge–Ampère, isotropic, ~60× costlier), "
+                f"'ot' / 'equidistribute' (linear OT-improvement "
+                f"step, composable) or "
+                f"'anisotropic' (tensor metric — reshapes cells / "
+                f"removes slivers; does not beat the node-count "
+                f"cap).")
+        return
+
     dm = mesh.dm
     pStart, pEnd = dm.getDepthStratum(0)
     cStart, cEnd = dm.getHeightStratum(0)
@@ -402,3 +3003,726 @@ def smooth_mesh_interior(
                 local_vec.array).reshape(-1, cdim)
 
     mesh._deform_mesh(coords)
+
+
+# Cached (∇field projector, |∇field| density) per (mesh, degree,
+# name, topology) so metric_density_from_gradient is cheap and
+# leak-free when called every step in an adaptive loop.
+_MDG_CACHE: dict = {}
+
+
+def metric_density_from_gradient(
+    mesh,
+    field,
+    *,
+    refinement=None,
+    coarsening="auto",
+    metric_choice: str = "front-following",
+    strategy: str = "med",
+    amp=_UNSET,
+    lo_percentile=_UNSET,
+    hi_percentile=_UNSET,
+    power=_UNSET,
+    mode: str = "percentile",
+    smoothing_length=None,
+    gradient_smoothing_length=None,
+    degree: int = 1,
+    name: Optional[str] = None,
+):
+    r"""Build a target-**density** metric ``ρ ∝ normalised |∇field|``
+    for the metric movers — the relative, fixed-node-budget
+    analogue of :func:`underworld3.adaptivity.metric_from_gradient`
+    (which maps ``|∇field|`` to an *absolute* target edge length
+    for the MMG re-mesher; the mover has a fixed node budget so it
+    redistributes *relatively* instead).
+
+    .. math::
+
+        \rho = (1 + \mathrm{amp}\cdot t)^{\mathrm{power}},\qquad
+        t = \mathrm{clip}\!\Big(
+            \frac{|\nabla\mathrm{field}| - g_{lo}}
+                 {g_{hi} - g_{lo}}, 0, 1\Big),
+
+    with ``g_lo, g_hi`` the lo/hi percentiles of ``|∇field|`` (the
+    same percentile-window idea as the adaptation metric).
+
+    **What the power knob does (strategic choice).** The mover
+    equidistributes ``ρ`` (cell area × ρ ≈ const). Combined with
+    ``A_c = h_c^d`` in ``d`` dimensions that gives
+    ``h_c ∝ ρ_c^{-1/d}``. For the linear ramp ``ρ ∝ |∇T|`` (i.e.
+    ``power=1``, the historical default) this means
+    ``h_c ∝ |∇T|^{-1/d}`` and the per-cell temperature change
+    ``ΔT_c ≈ |∇T|·h_c ∝ |∇T|^{1-1/d}`` — strong-gradient cells
+    still carry MORE temperature change than weak-gradient cells.
+    Choosing ``power = d`` (so ``ρ ∝ |∇T|^d``) gives
+    ``h_c ∝ 1/|∇T|`` and ``ΔT_c ≈ const`` — a **gradient-uniform
+    target**: every cell carries the same temperature change.
+    ``power = 1`` (default) targets "refinement of fronts / boundaries"
+    (mild grading concentrated where gradients are strongest); ``power = d``
+    targets "uniform per-cell error in a piecewise-linear T interpolant"
+    (the natural goal for advection-diffusion accuracy). Values in between
+    blend the two; ``power < 1`` softens grading further.
+
+    ``|∇field|`` is L2-projected (a *first* derivative — UW3-clean)
+    and the normalised ``t`` is stored in a **frozen Lagrangian
+    scalar field**, so the returned metric rides material points —
+    required by the movers, which build the metric once on the
+    undeformed mesh. Pass the result straight to
+    :func:`smooth_mesh_interior`::
+
+        rho = metric_density_from_gradient(mesh, T, amp=8.0)
+        smooth_mesh_interior(mesh, metric=rho,
+                             method="anisotropic")
+
+    The projector/fields are cached per ``(mesh, degree, name,
+    topology)``, so calling this **every step** in an adaptive loop
+    is cheap and does not leak MeshVariables. Each call re-projects
+    and re-freezes ``t`` at the *current* field state.
+
+    Parameters
+    ----------
+    mesh : underworld3 mesh
+    field : scalar MeshVariable or sympy scalar expression
+        The field whose gradient drives refinement (e.g. ``T``).
+    amp : float, default 8.0
+        Bunching intensity: ``ρ_max = (1 + amp)^power`` where
+        ``|∇field|`` is strongest. Larger ⇒ stronger
+        redistribution.
+    power : float, default 1.0
+        Exponent applied to the metric. ``1`` (default) =
+        front-following (``ρ ∝ |∇T|``, mild grading).
+        ``d`` (mesh dimension) = gradient-uniform
+        (``ρ ∝ |∇T|^d``, uniform per-cell ΔT). Values in
+        between blend; ``<1`` softens. The strategic choice is
+        between "refine the fronts" and "uniform per-cell
+        error", not a free dial; see the derivation above.
+    mode : {"percentile", "raw"}, default "percentile"
+        How the gradient drives the metric. ``"percentile"``
+        (default): ρ = (1 + amp·t)^power with t the
+        percentile-clipped normalised |∇field| — concentrates
+        budget into the steepest fronts, ignores values below
+        ``lo_percentile``. ``"raw"``: ρ = |∇field|^power
+        directly (no offset, no clipping, no amp). The mover's
+        equidistribution geometric-mean normalisation handles
+        the absolute scale; ``amp`` and ``lo/hi_percentile``
+        are ignored. Use ``"raw"`` to target gradient-uniform
+        per-cell ΔT cleanly; ``"percentile"`` to refine only
+        the top X% of gradient values.
+    lo_percentile, hi_percentile : float, default 50 / 97
+        ``|∇field|`` normalisation window (cf. the 5th/95th of
+        ``adaptivity.metric_from_gradient``). Raise ``lo`` to push
+        refinement only into the steepest fronts.
+    degree : int, default 1
+        Polynomial degree of the projected-gradient / density
+        fields (1 matches the anisotropic mover's default
+        ``aux_degree``).
+    name : str, optional
+        Cache disambiguator. Pass distinct names if you build
+        several independent gradient metrics on the *same* mesh
+        simultaneously (otherwise they share the cache slot).
+    smoothing_length : float or Pint Quantity, optional
+        Length-scale ``L`` for **field-side** screened-Poisson
+        smoothing applied to ``field`` BEFORE the gradient is
+        taken. Useful to suppress sub-grid noise in the source.
+        WARNING: at ``L ≳`` the boundary-layer (BL) width this
+        *erases* the boundary-layer gradient: T's transition is
+        spread over ~L and the gradient peak ``T_active/h``
+        collapses to ``T_active/L``. Prefer
+        ``gradient_smoothing_length`` when targeting features
+        with BL-like sub-h structure.
+    gradient_smoothing_length : float or Pint Quantity, optional
+        Length-scale ``L`` for **gradient-side** screened-Poisson
+        smoothing applied to the projected ``|∇field|`` field
+        (via the L2-projection's ``smoothing_length``). Peak
+        *location* of ``|∇T|`` is preserved (a BL still
+        concentrates near where T transitions); only the
+        spatial distribution / mesh-noise in the projection is
+        smoothed. This is the principled way to break the
+        metric/mesh feedback on adapted meshes without
+        destroying BL features. Set ``L ≈ h0`` (background
+        mean cell size) for mild de-noising;
+        ``L ≈ 2·h0`` for stronger.
+    refinement : float, optional
+        If given (must be ``>= 1.0``), build the cell-size-envelope
+        density instead; ``amp``, ``power``, ``mode`` and the
+        percentile window are then ignored.
+    coarsening : float or "auto", optional
+        Envelope coarsening factor (``>= 1.0``), used only with
+        ``refinement``. ``"auto"`` (or ``None``) uses
+        ``refinement**(1/d)``.
+    metric_choice : {"front-following", "gradient-uniform"}
+        Envelope distribution rule, used only with ``refinement``.
+
+    Returns
+    -------
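+    sympy expression
+        Metric density ``ρ``, Lagrangian and frozen at call time:
+        ``(1 + amp * t.sym[0])**power`` in ``"percentile"`` mode,
+        the floored ``|∇field|**power`` in ``"raw"`` mode, or the
+        envelope density when ``refinement`` is given.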
+    """
+    import sympy
+
+    # Resolve strategy defaults — individual kwargs override.
+    if strategy not in ADAPT_STRATEGIES:
+        raise ValueError(
+            f"unknown strategy {strategy!r}; choose from "
+            f"{list(ADAPT_STRATEGIES.keys())}")
+    s = ADAPT_STRATEGIES[strategy]
+    if amp is _UNSET:
+        amp = s["amp"]
+    if lo_percentile is _UNSET:
+        lo_percentile = s["lo_percentile"]
+    if hi_percentile is _UNSET:
+        hi_percentile = s["hi_percentile"]
+    if power is _UNSET:
+        power = s["power"]
+
+    cdim = mesh.cdim
+    X = mesh.CoordinateSystem.X
+    dm = mesh.dm
+    pStart, pEnd = dm.getDepthStratum(0)
+    cStart, cEnd = dm.getHeightStratum(0)
+    tag = name or "mdg"
+    key = (id(mesh), int(degree), tag,
+           pEnd - pStart, cEnd - cStart)
+
+    cache = _MDG_CACHE.get(key)
+    if cache is None:
+        g = uw.discretisation.MeshVariable(
+            f"mdg_g_{id(mesh):x}_{tag}{degree}", mesh,
+            vtype=uw.VarType.VECTOR, degree=int(degree),
+            continuous=True)
+        gp = uw.systems.Vector_Projection(mesh, g)
+        gp.smoothing = 0.0
+        rho0 = uw.discretisation.MeshVariable(
+            f"mdg_rho_{id(mesh):x}_{tag}{degree}", mesh,
+            vtype=uw.VarType.SCALAR, degree=int(degree),
+            continuous=True)
+        # Optional pre-smoothing of the input field: a scalar
+        # screened-Poisson projection (u − L²∇²u = field) at
+        # smoothing_length L. Decouples the gradient computation
+        # from sub-L mesh structure, breaking the metric/mesh
+        # feedback loop.
+        f_smooth = uw.discretisation.MeshVariable(
+            f"mdg_fs_{id(mesh):x}_{tag}{degree}", mesh,
+            vtype=uw.VarType.SCALAR, degree=int(degree),
+            continuous=True)
+        fp = uw.systems.Projection(mesh, f_smooth)
+        _MDG_CACHE[key] = (g, gp, rho0, f_smooth, fp)
+    else:
+        g, gp, rho0, f_smooth, fp = cache
+
+    f_sym = (field.sym[0] if hasattr(field, "sym")
+             else sympy.sympify(field))
+    if smoothing_length is not None:
+        # Smooth the input field T at length L before computing
+        # ∇T. WARNING: at L ≳ BL width this *erases* the BL
+        # gradient — the screened-Poisson spreads T's transition
+        # layer over ~L and the gradient peak (T_active/h)
+        # collapses to T_active/L. For metric construction
+        # against a boundary-layer feature, prefer
+        # `gradient_smoothing_length` instead (smooths the
+        # projected gradient field rather than T).
+        fp.uw_function = f_sym
+        fp.smoothing_length = smoothing_length
+        fp.solve()
+        f_for_grad = f_smooth.sym[0]
+    else:
+        f_for_grad = f_sym
+    gp.uw_function = sympy.Matrix(
+        [f_for_grad.diff(X[i]) for i in range(cdim)]).T
+    # Apply screened-Poisson smoothing on the *gradient
+    # projection* — keeps peak location intact (where T
+    # transitions, ∇T peaks), just smooths the spatial
+    # distribution. This is the principled way to suppress
+    # mesh-induced noise in |∇T| without erasing BL features.
+    if gradient_smoothing_length is not None:
+        gp.smoothing_length = gradient_smoothing_length
+    else:
+        gp.smoothing = 0.0
+    gp.solve()
+    gmag = np.linalg.norm(np.asarray(uw.function.evaluate(
+        g.sym, rho0.coords)).reshape(-1, cdim), axis=1)
+    # Parallel-correct percentile window. np.percentile on the
+    # rank-LOCAL gmag gives each rank its *own subdomain*
+    # distribution, so the same physical |∇field| maps to a
+    # different density on different ranks — a partition-dependent
+    # metric ("refine the top X%" silently becomes "each rank's own
+    # top X%"). Gather the global gmag so g_lo/g_hi are computed
+    # once from the whole-domain distribution and are identical on
+    # every rank. Serial (size==1) takes the local array unchanged
+    # ⇒ bit-for-bit identical to the previous behaviour. (Partition-
+    # boundary DOFs are shared across ranks, so the gathered array
+    # slightly over-weights them in the percentile value — a
+    # second-order effect vs. the rank-local bug this fixes; exact
+    # owned-only de-duplication is a follow-up if ever needed.)
+    if uw.mpi.size > 1:
+        gmag_global = uw.utilities.gather_data(
+            gmag, bcast=True, dtype="float64")
+    else:
+        gmag_global = gmag
+    g_lo = float(np.percentile(gmag_global, lo_percentile))
+    g_hi = float(np.percentile(gmag_global, hi_percentile))
+    # No-op guard: a uniform field has |∇field| ≡ 0, but the L2
+    # projection leaves ~1e-18 round-off. Percentile-normalising
+    # that noise would fabricate a spurious [0,1] metric (the same
+    # failure the mover's own g_eps floor fixes). Any real field
+    # gradient is many orders above 1e-9 ⇒ a (near-)constant field
+    # yields ρ ≡ 1 (no refinement) exactly.
+
+    # NEW PATH: cell-size-envelope ansatz keyed by
+    # ``refinement`` (+ optional ``coarsening``).
+    #
+    # The mover's eigenvalue → cell-size map is ``h = h₀/√(ρ̂)``
+    # (after the mover's geometric-mean normalisation ρ̂ = ρ/G).
+    # So a literal envelope ``h ∈ [h₀/refinement, h₀·coarsening]``
+    # corresponds to ``ρ̂ ∈ [1/coarsening², refinement²]`` — note
+    # this is **dimension-independent** (the eigenvalue λ has
+    # units of 1/length², not 1/area).
+    #
+    # To make the mover's G normalisation land where we want, we
+    # build ρ with ``geomean(ρ) ≡ 1`` by construction. The cleanest
+    # form is piecewise-log-linear in the percentile rank ``pct``
+    # of |∇field|, with the break ρ=1 placed at
+    #
+    #     p* = log(refinement) / log(refinement · coarsening)
+    #
+    # which is exactly the fraction of cells that need to coarsen
+    # to ``free up`` the requested refinement at fixed node count.
+    #
+    # ``metric_choice`` selects the spatial *distribution*:
+    #
+    # * "front-following" — log(ρ) piecewise linear in pct rank.
+    #   Every 1% of cells contributes the same log(h) increment.
+    #   Mild, monotone grading concentrated on the high-gradient
+    #   tail.
+    # * "gradient-uniform" — ρ ∝ |∇field|², clipped to the
+    #   envelope. Targets uniform per-cell Δfield (the natural
+    #   goal for advection-diffusion accuracy).
+    #
+    # ``coarsening="auto"`` uses the budget-conserving minimum
+    # ``refinement^(1/d)`` — the smallest coarsening that
+    # geometrically "makes room" for the requested refinement at
+    # fixed node count.
+    #
+    # When the caller passes ``refinement=...``, this branch is
+    # taken and amp/lo_percentile/hi_percentile/mode/power are
+    # ignored — the envelope is determined directly.
+    if refinement is not None:
+        ref_val = float(refinement)
+        if ref_val < 1.0:
+            raise ValueError(
+                f"refinement must be >= 1.0, got {ref_val}")
+        # 'auto' coarsening = the budget-conserving minimum
+        if coarsening is None or coarsening == "auto":
+            coar_val = ref_val ** (1.0 / cdim)
+        else:
+            coar_val = float(coarsening)
+            if coar_val < 1.0:
+                raise ValueError(
+                    f"coarsening must be >= 1.0, got {coar_val}")
+        # Trivial-case shortcut: no refinement asked ⇒ ρ ≡ 1
+        if ref_val == 1.0 and coar_val == 1.0:
+            rho0.data[:, 0] = 1.0
+            return rho0.sym[0]
+        # Dimension-independent envelope (eigenvalue space)
+        log_rho_max = 2.0 * np.log(ref_val)   # ρ at the densest cells
+        log_rho_min = -2.0 * np.log(coar_val)  # ρ at the sparsest cells
+        N = max(int(gmag_global.size), 1)
+        if g_hi <= 1.0e-9:
+            # Uniform (or near-uniform) field ⇒ no refinement
+            rho0.data[:, 0] = 1.0
+            return rho0.sym[0]
+        g_sorted = np.sort(gmag_global)
+        ranks = np.linspace(0.0, 1.0, N)
+        pct = np.interp(gmag, g_sorted, ranks)
+        if metric_choice == "front-following":
+            # Piecewise log-linear in pct, with the break (log ρ=0)
+            # at p* = log(ref) / log(ref·coar). This makes
+            # geomean(ρ) = 1 by construction, so the mover's G
+            # normalisation passes ρ through unchanged and the
+            # eigenvalue clamps land on the literal envelope.
+            # Special-case: ref=1 ⇒ no refined half (pure coarsen);
+            #               coar=1 ⇒ no coarsened half (pure refine).
+            if ref_val == 1.0:
+                # Only coarsen
+                log_rho = log_rho_min * (1.0 - pct)
+            elif coar_val == 1.0:
+                # Only refine
+                log_rho = log_rho_max * pct
+            else:
+                p_star = (np.log(ref_val)
+                          / np.log(ref_val * coar_val))
+                log_rho = np.where(
+                    pct < p_star,
+                    log_rho_min * (1.0 - pct / p_star),
+                    log_rho_max * (pct - p_star)
+                    / max(1.0 - p_star, 1.0e-12),
+                )
+            rho0.data[:, 0] = np.exp(log_rho)
+        elif metric_choice == "gradient-uniform":
+            # ρ ∝ |∇field|² (dimension-independent), clipped to
+            # the envelope. The mover's G normalisation then
+            # centres this on whatever cell happens to have the
+            # geomean |∇field|, which is field-dependent (in
+            # contrast to front-following where ρ̄=1 by construction).
+            rho_raw = np.maximum(gmag, 1.0e-30) ** 2
+            rho0.data[:, 0] = np.clip(
+                rho_raw, np.exp(log_rho_min), np.exp(log_rho_max))
+        else:
+            raise ValueError(
+                f"metric_choice must be 'front-following' or "
+                f"'gradient-uniform', got {metric_choice!r}")
+        return rho0.sym[0]
+
+    if mode == "raw":
+        # Raw mode: ρ = |∇field|^power. Skip the percentile
+        # clip + (1+amp·t) wrap. Floor to a small positive so
+        # zero-gradient regions still get ρ > 0 (so the mover's
+        # geometric-mean normaliser doesn't blow up).
+        floor = max(1.0e-12,
+                    float(np.max(gmag_global)) * 1.0e-6)
+        rho0.data[:, 0] = np.maximum(gmag, floor)
+        return rho0.sym[0] ** float(power)
+    if g_hi <= 1.0e-9:
+        rho0.data[:, 0] = 0.0
+    else:
+        rho0.data[:, 0] = np.clip(
+            (gmag - g_lo) / max(g_hi - g_lo, 1.0e-30), 0.0, 1.0)
+    return (1.0 + float(amp) * rho0.sym[0]) ** float(power)
+
+
+# =============================================================================
+# Public node-moving adapter
+# =============================================================================
+def follow_metric(
+    mesh,
+    field,
+    *,
+    refinement: float,
+    coarsening="auto",
+    metric: str = "front-following",
+    skip_threshold: float = 0.9,
+    gradient_smoothing_length=None,
+    polish_max_iters: int = 5,
+    polish_quality_target: float = 0.3,
+    polish_alpha: float = 0.2,
+    method_kwargs: Optional[dict] = None,
+    name: Optional[str] = None,
+    verbose: bool = False,
+) -> bool:
+    r"""Move the mesh's interior nodes so cell sizes follow a target
+    derived from ``|∇field|``.
+
+    Two-knob, cell-size-envelope API for the anisotropic node mover.
+    The user specifies how *fine* the densest cells can get and
+    (optionally) how *coarse* the sparsest can get; the function
+    derives the metric density and invokes the mover.
+
+    Cell-size envelope (approximate)
+    --------------------------------
+
+    The mover's eigenvalue → cell-size map is
+    :math:`h = h_0/\sqrt{\hat\rho}` (after the mover's
+    geometric-mean normalisation :math:`\hat\rho = \rho/G`), so
+    asking for the envelope
+
+    .. math::
+
+        h \;\in\; \bigl[\, h_0/\text{refinement},\;
+                          h_0\cdot\text{coarsening} \,\bigr]
+
+    corresponds to :math:`\hat\rho \in [1/\text{coarsening}^2,
+    \text{refinement}^2]` — note this is **dimension-
+    independent** (the eigenvalue λ has units of 1/length²).
+
+    Validation on a sharp-tanh annulus test problem shows:
+
+    * **Refinement side:** achieved :math:`h_\min` within ~5-10%
+      of :math:`h_0/\text{refinement}` for refinement ∈ [1.5, 3].
+    * **Coarsening side:** achieved :math:`h_\max` typically
+      ~2× the requested :math:`h_0\cdot\text{coarsening}`. The
+      mover's anisotropic cells and iterative deformation map
+      together don't honour the eigenvalue clamp on a per-cell
+      basis as tightly as the refinement side. This is a known
+      limitation of the underlying mover, not of the new API.
+
+    The :func:`mesh_metric_mismatch` diagnostic is the right tool
+    for measuring how close the achieved mesh is to the requested
+    metric in practice.
+
+    Metric ansatz
+    -------------
+
+    Each cell's percentile rank :math:`p \in [0,1]` in the global
+    :math:`|\nabla\text{field}|` distribution maps to the
+    log-density via a piecewise-linear function with the break
+    :math:`\rho = 1` at
+
+    .. math::
+
+        p^{\ast} \;=\; \frac{\log\text{refinement}}
+                            {\log(\text{refinement}\cdot
+                                  \text{coarsening})} .
+
+    This break point makes :math:`\mathrm{geomean}(\rho) = 1`
+    by construction, so the mover's :math:`G`-normalisation
+    leaves :math:`\rho` unshifted and the eigenvalue clamps land
+    on the desired envelope. Concretely:
+
+    * "front-following" (default) — log-:math:`\rho` is linear
+      in percentile rank on each side of :math:`p^{\ast}`. Every
+      1% of cells contributes the same log(h) increment. Mild
+      grading; the budget is spread continuously across the
+      gradient distribution.
+    * "gradient-uniform" — :math:`\rho \propto |\nabla\text{field}|^2`,
+      clipped to the envelope. Targets uniform per-cell
+      :math:`\Delta\text{field}` (the natural goal for advection-
+      diffusion accuracy). The clipping makes the achieved
+      grading regress to the front-following profile when the
+      gradient distribution is concentrated.
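+
+    Written out for the front-following rule, with
+    :math:`r = \text{refinement}` and :math:`c = \text{coarsening}`
+    (shorthand introduced here), and assuming the percentile rank
+    :math:`p` is uniformly distributed on :math:`[0, 1]`:
+
+    .. math::
+
+        \log\rho(p) \;=\;
+        \begin{cases}
+          -2\log c\,\bigl(1 - p/p^{\ast}\bigr), & p < p^{\ast},\\[2pt]
+          \phantom{-}2\log r\,\dfrac{p - p^{\ast}}{1 - p^{\ast}},
+            & p \ge p^{\ast},
+        \end{cases}
+
+        \int_0^1 \log\rho\,\mathrm{d}p
+        \;=\; -p^{\ast}\log c + (1 - p^{\ast})\log r \;=\; 0
+        \quad\Longleftrightarrow\quad
+        p^{\ast} = \frac{\log r}{\log(r\,c)} ,
+
+    so :math:`\rho` runs from :math:`1/c^{2}` at :math:`p = 0` through
+    :math:`1` at :math:`p^{\ast}` to :math:`r^{2}` at :math:`p = 1`.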
+
+    Auto coarsening (the budget-conserving default)
+    -----------------------------------------------
+
+    With a fixed node count (no remeshing), refining one cell to
+    :math:`h_0/\text{refinement}` requires growing others by at
+    least
+
+    .. math::
+
+        \text{coarsening} \;=\; \text{refinement}^{\,1/d}
+
+    to absorb the freed cell area. ``coarsening="auto"`` (default)
+    picks exactly this minimum — anything less would mean the
+    mover can't actually deliver the requested refinement.
+    Pass an explicit ``coarsening`` larger than the auto value to
+    free up more budget for a smoother transition zone.
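+
+    As a worked example (values chosen only for illustration), take
+    ``refinement = 3`` in 2D with the auto rule:
+
+    .. math::
+
+        \text{coarsening} = 3^{1/2} \approx 1.73, \qquad
+        \hat\rho \in \bigl[\tfrac{1}{3},\, 9\bigr], \qquad
+        p^{\ast} = \frac{\log 3}{\log\bigl(3 \cdot 3^{1/2}\bigr)}
+                 = \frac{2}{3} .
+
+    More generally, with auto coarsening the break point reduces to
+    :math:`p^{\ast} = d/(d+1)`, independent of ``refinement``.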
+
+    Adapt-on-demand
+    ---------------
+
+    Before invoking the mover, the current mesh is checked against
+    the requested target via
+    :func:`mesh_metric_mismatch`. If the alignment is already good
+    (alignment at or above ``skip_threshold``), the mesh isn't
+    re-adapted: the function returns ``False`` and the caller can
+    keep stepping. This lets a per-step adapt cadence become
+    "adapt only when needed".
+
+    Parameters
+    ----------
+    mesh : underworld3 mesh
+        Modified in place if adaptation runs.
+    field : MeshVariable or sympy scalar expression
+        The field whose gradient drives refinement.
+    refinement : float, must be >= 1.0
+        Maximum local refinement, expressed as a multiplicative
+        factor on the background cell size:
+        :math:`h_\min = h_0 / \text{refinement}`. ``refinement=1``
+        is a no-op (uniform metric ⇒ background spacing).
+    coarsening : float or "auto", default "auto"
+        Maximum local coarsening,
+        :math:`h_\max = h_0 \cdot \text{coarsening}`. ``"auto"``
+        uses the budget-conserving minimum
+        :math:`\text{refinement}^{1/d}`. Larger values free more
+        budget for smoother grading at the cost of a wider
+        cell-size spread.
+    metric : {"front-following", "gradient-uniform"}, default "front-following"
+        Strategic equidistribution rule. ``"front-following"``
+        concentrates cells where the gradient is steepest (mild
+        grading). ``"gradient-uniform"`` aims for the same
+        per-cell field change everywhere (best for advection-
+        diffusion accuracy).
+    skip_threshold : float, default 0.9
+        Alignment threshold for the adapt-on-demand skip. If the
+        existing mesh's :func:`mesh_metric_mismatch` alignment is
+        ≥ this threshold, no adaptation happens and the function
+        returns ``False``.
+    gradient_smoothing_length : float or Pint Quantity, optional
+        Length scale for screened-Poisson smoothing of the
+        projected ``|∇field|`` before building the metric.
+        Suppresses sub-cell metric-mesh feedback noise without
+        destroying boundary-layer features. A useful default is
+        ``≈ 2 * h_0`` (background cell size).
+    polish_max_iters : int, default 5
+        Maximum Jacobi (graph-Laplacian) polish iterations
+        applied AFTER the anisotropic mover. The polish runs
+        adaptively: each iteration averages every interior
+        vertex toward the mean of its edge neighbours
+        (cell-quality cleanup), and the loop stops as soon as
+        the worst cell-shape quality exceeds
+        ``polish_quality_target``. ``polish_max_iters=0``
+        disables the polish entirely.
+    polish_quality_target : float, default 0.3
+        Adaptive-polish stopping criterion: target minimum
+        cell shape quality
+        :math:`q = 4\sqrt{3}\,A/(e_0^2+e_1^2+e_2^2)`. ``q=1``
+        is equilateral; ``q<0.3`` is the threshold below which
+        cells look like visible slivers. Lower values allow
+        more sliver-y cells through; higher values demand
+        more polish iterations.
+    polish_alpha : float, default 0.2
+        Under-relaxation in ``(0, 1]`` for each Jacobi
+        sweep. Lower = gentler.
+    method_kwargs : dict, optional
+        Extra keyword arguments merged over the defaults passed
+        to the anisotropic mover (``relax``, ``n_outer``, the
+        rest-size spring settings, and so on). ``resolution_ratio``
+        is always set internally and cannot be overridden here.
+    name : str, optional
+        Cache disambiguator. Pass distinct names if you build
+        several independent metrics on the same mesh.
+    verbose : bool, default False
+        Verbose mover diagnostics.
+
+    Returns
+    -------
+    bool
+        ``True`` if the mesh was moved; ``False`` if the
+        adapt-on-demand alignment check skipped adaptation.
+
+    Examples
+    --------
+    Default usage on a stagnant-lid convection T field, with
+    coarsening picked automatically::
+
+        moved = uw.meshing.follow_metric(
+            mesh, T,
+            refinement=3.0,                  # h_min = h0/3
+        )                                    # coarsening = √3 ≈ 1.73 (2D auto)
+
+    Wider grading transition with explicit coarsening, gradient-
+    side smoothing, and the gradient-uniform rule for advection
+    accuracy::
+
+        uw.meshing.follow_metric(
+            mesh, T,
+            refinement=2.0, coarsening=2.0,
+            metric="gradient-uniform",
+            gradient_smoothing_length=2.0 * mesh._radii.mean(),
+        )
+
+    See Also
+    --------
+    metric_density_from_gradient : The underlying metric builder
+        (expert tool — exposes percentile / amp / power dials).
+    smooth_mesh_interior : The underlying mover (expert tool —
+        unaware of refinement/coarsening, takes a pre-built
+        metric expression).
+    mesh_metric_mismatch : The alignment / misalignment metric
+        used by the skip threshold.
+    """
+    rho = metric_density_from_gradient(
+        mesh,
+        field,
+        refinement=float(refinement),
+        coarsening=coarsening,
+        metric_choice=metric,
+        gradient_smoothing_length=gradient_smoothing_length,
+        name=name,
+    )
+    # Resolve auto coarsening
+    if coarsening is None or coarsening == "auto":
+        coar_val = float(refinement) ** (1.0 / mesh.cdim)
+    else:
+        coar_val = float(coarsening)
+    # Mover's `resolution_ratio` is a SYMMETRIC eigenvalue clamp
+    # (h ∈ [h0/R, h0·R]) — too loose for either side on its own.
+    # We pass R = max(refinement, coarsening) so the clamp doesn't
+    # bind tightly, then rely on the per-cell *rest-size spring*
+    # (below) to enforce the literal cell-size envelope.
+    R = max(float(refinement), coar_val)
+
+    # The spring caps refer to h0 — the **undeformed** mean edge
+    # length of the mesh. Critical: this must be captured ONCE
+    # (the first time follow_metric sees this mesh) and reused
+    # thereafter. Re-measuring it from a deformed (already-
+    # refined) mesh causes h0 to shrink each call, the spring
+    # caps to shrink with it, and refinement to compound at
+    # every adapt — the dt-crash bug surfaced 2026-05-22.
+    _key = id(mesh)
+    h0 = _FOLLOW_METRIC_H0_CACHE.get(_key)
+    rest_coords = _FOLLOW_METRIC_REST_CACHE.get(_key)
+    if h0 is None:
+        ep = _edge_pairs(mesh.dm)
+        coords = np.asarray(mesh.X.coords)
+        if ep.shape[0]:
+            h0 = float(np.linalg.norm(
+                coords[ep[:, 1]] - coords[ep[:, 0]],
+                axis=1).mean())
+        else:
+            h0 = 1.0
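+        if uw.mpi.size > 1:
+            # Unweighted mean of the per-rank mean edge lengths
+            # (not weighted by each rank's edge count).
+            h0 = uw.mpi.comm.allreduce(h0) / uw.mpi.size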
+        _FOLLOW_METRIC_H0_CACHE[_key] = h0
+        rest_coords = coords.copy()
+        _FOLLOW_METRIC_REST_CACHE[_key] = rest_coords
+        if verbose:
+            uw.pprint(f"  follow_metric: captured h0={h0:.4e}, "
+                      f"rest_coords (first call on this mesh)")
+
+    mover_kwargs = dict(
+        relax=0.2,
+        n_outer=12,
+        # Per-cell Lagrangian rest-size spring: literal cell-size
+        # cap enforced by pulling vertices back toward their
+        # rest positions when an incident cell exceeds the cap.
+        # h0 is the undeformed mean edge length.
+        rest_size_cap_max=h0 * coar_val,
+        rest_size_cap_min=h0 / float(refinement),
+        rest_spring_K=1.0,
+        # Override the mover's internal h0 measurement (which
+        # would otherwise re-measure on the already-deformed
+        # mesh and shrink each adapt — the second leg of the
+        # dt-crash bug surfaced 2026-05-22).
+        h0_override=h0,
+        # Override the spring's rest-coords (and the area-floor
+        # baseline) so they refer to the **truly-undeformed**
+        # mesh. Otherwise each adapt's "rest" is the previous
+        # adapt's output, the spring "preserves" each successive
+        # refinement, and refinement compounds — third leg of
+        # the dt-crash bug.
+        rest_coords_override=rest_coords,
+    )
+    if method_kwargs:
+        mover_kwargs.update(method_kwargs)
+
+    old_X = np.asarray(mesh.X.coords).copy()
+    smooth_mesh_interior(
+        mesh,
+        metric=rho,
+        method="anisotropic",
+        method_kwargs={**mover_kwargs, "resolution_ratio": R},
+        skip_threshold=skip_threshold,
+        verbose=verbose,
+    )
+    new_X = np.asarray(mesh.X.coords)
+    moved = not np.allclose(new_X, old_X)
+    # ADAPTIVE Jacobi polish: gentle graph-Laplacian smoothing
+    # of interior nodes toward neighbour-centroid average,
+    # repeated until the worst cell-shape quality
+    #
+    #     q = 4√3 · A / (e₀² + e₁² + e₂²)
+    #
+    # exceeds ``polish_quality_target`` (default 0.3 — the
+    # threshold below which cells look like visible slivers; an
+    # equilateral has q=1, a degenerate sliver q→0). Capped at
+    # ``polish_max_iters`` so pathological cases can't run away.
+    #
+    # The polish doesn't significantly undo the metric
+    # distribution (each step is averaging toward neighbours,
+    # not enforcing any spatial target), so the BL refinement
+    # stays intact while sliver cells get rounded out.
+    # `polish_max_iters=0` disables entirely.
+    if moved and polish_max_iters > 0:
+        tris_polish = _tri_cells(mesh.dm)
+        for _polish_iter in range(int(polish_max_iters)):
+            # Check current shape quality
+            p = np.asarray(mesh.X.coords)[tris_polish]
+            e0 = np.linalg.norm(p[:, 1] - p[:, 0], axis=1)
+            e1 = np.linalg.norm(p[:, 2] - p[:, 1], axis=1)
+            e2 = np.linalg.norm(p[:, 0] - p[:, 2], axis=1)
+            A = np.abs(_signed_areas(np.asarray(mesh.X.coords),
+                                       tris_polish))
+            q = (4.0 * np.sqrt(3.0) * A
+                 / (e0 * e0 + e1 * e1 + e2 * e2 + 1.0e-30))
+            q_min = float(q.min())
+            if uw.mpi.size > 1:
+                from mpi4py import MPI as _MPI
+                q_min = uw.mpi.comm.allreduce(
+                    q_min, op=_MPI.MIN)
+            if verbose:
+                uw.pprint(
+                    f"  follow_metric polish iter {_polish_iter}: "
+                    f"q_min={q_min:.3f} (target {polish_quality_target:.2f})")
+            if q_min >= float(polish_quality_target):
+                break
+            smooth_mesh_interior(
+                mesh, n_iters=1, alpha=float(polish_alpha))
+    return moved
+
diff --git a/src/underworld3/systems/solvers.py b/src/underworld3/systems/solvers.py
index 625cc4e9..4e2cf217 100644
--- a/src/underworld3/systems/solvers.py
+++ b/src/underworld3/systems/solvers.py
@@ -1948,15 +1948,75 @@ class SNES_Projection(SNES_Scalar):
     well-defined at mesh nodes (e.g., derivatives or flux components). More
     broadly, it is a projection from one basis to another.
 
-    The projection is implemented by solving:
+    Strong form (the screened-Poisson smoother)
+    -------------------------------------------
+
+    The projection is implemented by solving
 
     .. math::
 
-        -\nabla \cdot \underbrace{\left[ \alpha \nabla u \right]}_{\mathbf{F}}
-        - \underbrace{\left[ u - \tilde{f} \right]}_{\mathbf{h}} = 0
+        u - \nabla \cdot \left( \alpha \nabla u \right) = \tilde{f},
+
+    or equivalently, in the more familiar Helmholtz form
+
+    .. math::
 
-    The term :math:`\mathbf{F}` provides optional smoothing regularization.
-    Setting :math:`\alpha = 0` gives a pure L2 projection.
+        u - \alpha \, \nabla^{2} u \;=\; \tilde{f}.
+
+    With :math:`\alpha = 0` this is a pure L2 projection of
+    :math:`\tilde f` onto the discrete space of :math:`u`. With
+    :math:`\alpha > 0` it is a *screened-Poisson smoother*: the equation
+    enforces a balance between fidelity to :math:`\tilde f` and curvature of
+    :math:`u`. The natural length scale that emerges from this balance is
+
+    .. math::
+
+        L \;=\; \sqrt{\alpha},
+
+    so :math:`\alpha` has dimensions of **length squared**. The free-space
+    Green's function of the operator decays as
+    :math:`\exp(-r/L)` (in 2D it is
+    :math:`G(r) \propto K_{0}(r/L)/L^{2}`), so the solution behaves like a
+    Gaussian-like convolution of :math:`\tilde f` of width :math:`L` —
+    obtained implicitly by one elliptic solve, without ever assembling
+    the kernel. Features in :math:`\tilde f` of scale much smaller than
+    :math:`L` are attenuated; features much larger than :math:`L` pass
+    through essentially unchanged.
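+
+    A single Fourier mode makes the filtering explicit. As a sketch
+    that ignores boundaries and assumes constant :math:`\alpha`, take
+    :math:`\tilde f = \sin(kx)`; substituting :math:`u = A\sin(kx)`
+    into the Helmholtz form gives
+
+    .. math::
+
+        A \;=\; \frac{1}{1 + \alpha k^{2}}
+          \;=\; \frac{1}{1 + (kL)^{2}},
+
+    so a mode with :math:`kL = 1` is halved, :math:`kL = 3` is reduced
+    to one tenth, and :math:`kL = 0.1` keeps about 99% of its amplitude.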
+
+    Weak form
+    ---------
+
+    Multiplying by a test function :math:`v` and integrating by parts gives
+    the symmetric weak form actually assembled by PETSc:
+
+    .. math::
+
+        \int_\Omega (u - \tilde f)\, v \; + \;
+        \int_\Omega \alpha \, \nabla u \cdot \nabla v
+        \;=\; 0,
+
+    which is exactly the stationarity condition for minimising
+    :math:`\tfrac{1}{2}\!\int (u-\tilde f)^2 + \tfrac{\alpha}{2}\!\int
+    |\nabla u|^2` — a Tikhonov-regularised L2 projection.
+
+    Setting the smoothing length
+    ----------------------------
+
+    Two equivalent accessors are provided:
+
+    * :attr:`smoothing` — set the coefficient :math:`\alpha` directly
+      (units of length²). Historically used with tiny values (e.g.
+      :math:`10^{-6}`) as a *numerical* regulariser, which corresponds to
+      a sub-grid :math:`L` and produces no physical smoothing.
+    * :attr:`smoothing_length` — set :math:`L` directly (length units,
+      unit-aware via :func:`underworld3.non_dimensionalise`). This is the
+      recommended path when you actually want the projection to act as
+      a low-pass filter of a chosen physical scale.
+
+    See Also
+    --------
+    SNES_Vector_Projection : Vector field projection.
+    SNES_Tensor_Projection : Tensor field projection.
 
     Parameters
     ----------
@@ -1972,11 +2032,6 @@ class SNES_Projection(SNES_Scalar):
         Name for the solver instance.
     verbose : bool, default=False
         Enable verbose output.
-
-    See Also
-    --------
-    SNES_Vector_Projection : Vector field projection.
-    SNES_Tensor_Projection : Tensor field projection.
     """
 
     @timing.routine_timer_decorator
@@ -2033,7 +2088,40 @@ def __init__(
 
     @property
     def smoothing(self):
-        """Smoothing regularization parameter for the projection."""
+        r"""Smoothing coefficient :math:`\alpha` of the screened-Poisson form.
+
+        The projection solves
+        :math:`u - \nabla\!\cdot\!(\alpha\,\nabla u) = \tilde f`, so
+        :math:`\alpha` has dimensions of **length²** and the implied
+        smoothing length is :math:`L = \sqrt{\alpha}`. The free-space
+        Green's function decays as :math:`\exp(-r/L)`, giving the
+        projection the action of a Gaussian-like convolution of width
+        :math:`L` — without ever forming the kernel.
+
+        See the class docstring for the full derivation.
+
+        Two usage patterns
+        ~~~~~~~~~~~~~~~~~~
+
+        * **Pure L2 projection** — ``smoothing = 0`` (or omit). No
+          regularisation; ``u`` is the best L2 fit to :math:`\tilde f`.
+        * **Genuine length-scale smoother** — set
+          :attr:`smoothing_length` to the desired physical length
+          (unit-aware), or equivalently
+          ``smoothing = L**2``. The output is :math:`\tilde f` smoothed at
+          scale ``L``.
+
+        .. note::
+
+           Historical UW3 code occasionally sets ``smoothing`` to a
+           tiny number (e.g. ``1e-6``) as a *numerical* regulariser
+           against rank deficiency. That value corresponds to
+           :math:`L \approx 10^{-3}` in the problem's ND length
+           units — almost always well below the cell size, so it
+           does no useful smoothing. Use a true sub-grid value only
+           if you need that regularisation; for filtering, use
+           :attr:`smoothing_length`.
+        """
         return self._smoothing
 
     @smoothing.setter
@@ -2042,6 +2130,98 @@ def smoothing(self, smoothing_factor):
         self._needs_function_rewire = True
         self._smoothing = sympify(smoothing_factor)
 
+    @property
+    def smoothing_length(self):
+        r"""Smoothing length scale :math:`L` (length units, **unit-aware**).
+
+        L-valued view on :attr:`smoothing` with the convention
+        :math:`L = \sqrt{\alpha}`. Setting ``smoothing_length = L``
+        is equivalent to setting ``smoothing = L**2``, but is the
+        natural physical knob because :math:`L` is what the
+        smoother actually filters by.
+
+        Mathematical meaning
+        --------------------
+
+        The projection then solves the screened-Poisson equation
+
+        .. math::
+
+            u \;-\; L^{2}\,\nabla^{2} u \;=\; \tilde f,
+
+        whose Green's function decays as :math:`\exp(-r/L)`. In
+        practice this acts as a Gaussian-like convolution of width
+        :math:`L`:
+
+        * features in :math:`\tilde f` with spatial scale
+          :math:`\ll L` are damped (roughly as
+          :math:`1/(1+k^{2}L^{2})` for a wavenumber :math:`k`);
+        * features with scale :math:`\gg L` are preserved;
+        * features at :math:`\sim L` are attenuated by a factor
+          near ``1/2``.
+
+        Choosing :math:`L` smaller than the local mesh size has
+        essentially no effect (the field is already band-limited
+        by the discretisation). A useful default for *light*
+        de-noising is :math:`L \approx 1\!-\!2\,h`, where
+        :math:`h` is a representative cell size.
+
+        Units
+        -----
+
+        The setter accepts a plain number (assumed already
+        non-dimensional), a Pint ``Quantity`` with length units,
+        or any unit-aware object understood by
+        :func:`underworld3.non_dimensionalise`. Internally the
+        squared non-dimensional value is stored in
+        ``self._smoothing`` (so ``smoothing`` and
+        ``smoothing_length`` stay consistent).
+
+        The getter returns a Pint ``Quantity`` (length units) when a scaling
+        context is configured, else the float :math:`\sqrt{\alpha}`. Symbolic
+        :math:`\alpha` returns ``sympy.sqrt(alpha)``; negative returns ``None``.
+        """
+        import sympy
+        s = self._smoothing
+        try:
+            sval = float(s)
+        except (TypeError, ValueError):
+            return sympy.sqrt(s)
+        if sval < 0:
+            return None
+        L_nd = sval ** 0.5
+        # Re-dimensionalise to length units if a scaling context
+        # is set; fall back to the plain ND float otherwise.
+        try:
+            return uw.scaling.dimensionalise(
+                L_nd, uw.scaling.units.meter)
+        except Exception:
+            return L_nd
+
+    @smoothing_length.setter
+    def smoothing_length(self, L):
+        """Set the smoothing length scale.
+
+        Accepts a Pint Quantity (with length units), a UnitAware
+        scalar, or a plain non-dimensional number. The value is
+        non-dimensionalised through the active scaling context
+        before being squared and stored as ``self._smoothing``.
+        """
+        self._needs_function_rewire = True
+        # Unit-aware: route through non_dimensionalise so the
+        # caller can pass `2.0 * uw.scaling.units.meter` or a
+        # plain float interchangeably.
+        try:
+            L_nd = uw.non_dimensionalise(L)
+        except Exception:
+            # Fall back to magnitude-or-float coercion if the
+            # value doesn't carry/expect units.
+            if hasattr(L, "magnitude"):
+                L_nd = L.magnitude
+            else:
+                L_nd = L
+        self._smoothing = sympify(L_nd) ** 2
+
     @property
     def uw_weighting_function(self):
         """Weighting function applied during projection."""
@@ -2068,23 +2248,39 @@ class SNES_Vector_Projection(SNES_Vector):
     r"""
     Vector projection solver for mapping vector functions to mesh variables.
 
-    Solves :math:`\mathbf{u} = \tilde{\mathbf{f}}` where :math:`\tilde{\mathbf{f}}`
-    is a vector function that can be evaluated within an element and
-    :math:`\mathbf{u}` is a vector mesh variable with associated shape functions.
+    Solves :math:`\mathbf{u} = \tilde{\mathbf{f}}` where
+    :math:`\tilde{\mathbf{f}}` is a vector function that can be evaluated
+    within an element and :math:`\mathbf{u}` is a vector mesh variable
+    with associated shape functions.
 
-    Typically used to obtain a continuous representation of a vector function
-    not well-defined at mesh nodes (e.g., gradient or flux vectors).
+    Typically used to obtain a continuous representation of a vector
+    function not well-defined at mesh nodes (e.g., gradient or flux
+    vectors), or as a length-scale-aware smoother of an existing vector
+    field.
 
-    The projection is implemented by solving:
+    Strong form (screened-Poisson, vector-valued)
+    ---------------------------------------------
 
     .. math::
 
-        -\nabla \cdot \underbrace{\left[ \alpha \nabla \mathbf{u}
-        \right]}_{\mathbf{F}} - \underbrace{\left[ \mathbf{u}
-        - \tilde{\mathbf{f}} \right]}_{\mathbf{h}} = 0
-
-    The term :math:`\mathbf{F}` provides optional smoothing regularization.
-    Setting :math:`\alpha = 0` gives a pure L2 projection.
+        \mathbf{u} \;-\; \nabla \cdot \left[ \alpha\, \nabla \mathbf{u}
+        \;+\; \lambda \left( \nabla \cdot \mathbf{u} \right) \mathbf{I}
+        \right]
+        \;=\; \tilde{\mathbf{f}} .
+
+    The :math:`\alpha`-term is the same screened-Poisson smoother as
+    in :class:`SNES_Projection`, applied component-wise: it has the same
+    :math:`L = \sqrt{\alpha}` smoothing-length interpretation and the
+    same :math:`\exp(-r/L)` Green's function. The extra :math:`\lambda`
+    term is a divergence penalty (see :attr:`penalty`) — set it nonzero
+    when you want an approximately solenoidal projection of
+    :math:`\tilde{\mathbf{f}}`.
+
+    Setting :math:`\alpha = 0` (and :math:`\lambda = 0`) gives a pure
+    pointwise L2 projection. See :class:`SNES_Projection` for the full
+    mathematical context; the relationship between :attr:`smoothing`
+    (units length²) and :attr:`smoothing_length` (units length) is the
+    same as for the scalar projection.
 
     Parameters
     ----------
@@ -2099,7 +2295,7 @@ class SNES_Vector_Projection(SNES_Vector):
 
     See Also
     --------
-    SNES_Projection : Scalar field projection.
+    SNES_Projection : Scalar field projection (full mathematical detail).
     SNES_Tensor_Projection : Tensor field projection.
     """
 
@@ -2169,7 +2365,18 @@ def projection_problem_description(self):
 
     @property
     def smoothing(self):
-        """Smoothing regularization parameter for the projection."""
+        r"""Smoothing coefficient :math:`\alpha` (units **length²**).
+
+        Coefficient of the :math:`\nabla\!\cdot\!(\alpha\,\nabla
+        \mathbf u)` term in the vector screened-Poisson equation
+        (see the class docstring). Acts component-wise; the
+        smoothing length is :math:`L = \sqrt{\alpha}` and the
+        Green's function decays as :math:`\exp(-r/L)`.
+
+        Use :attr:`smoothing_length` for the L-valued, unit-aware
+        knob. See :attr:`SNES_Projection.smoothing` for the full
+        derivation and usage patterns.
+        """
         return self._smoothing
 
     @smoothing.setter
@@ -2178,9 +2385,64 @@ def smoothing(self, smoothing_factor):
         self._needs_function_rewire = True
         self._smoothing = sympify(smoothing_factor)
 
+    @property
+    def smoothing_length(self):
+        r"""Smoothing length :math:`L` (length units, **unit-aware**).
+
+        L-valued view on :attr:`smoothing`, with
+        :math:`L = \sqrt{\alpha}`. The projection then acts as a
+        component-wise screened-Poisson smoother
+        :math:`\mathbf u - L^{2}\,\nabla^{2}\mathbf u =
+        \tilde{\mathbf f}`, i.e. a Gaussian-like convolution of
+        width :math:`L` applied to each Cartesian component of
+        the input vector field.
+
+        Choose :math:`L \gtrsim h` (a cell size) for noticeable
+        smoothing; :math:`L < h` does essentially nothing because
+        the discretisation already band-limits at that scale.
+        See :attr:`SNES_Projection.smoothing_length` for the full
+        mathematical and units discussion.
+        """
+        import sympy
+        s = self._smoothing
+        try:
+            sval = float(s)
+        except (TypeError, ValueError):
+            return sympy.sqrt(s)
+        if sval < 0:
+            return None
+        L_nd = sval ** 0.5
+        try:
+            return uw.scaling.dimensionalise(
+                L_nd, uw.scaling.units.meter)
+        except Exception:
+            return L_nd
+
+    @smoothing_length.setter
+    def smoothing_length(self, L):
+        """Set the smoothing length scale (unit-aware)."""
+        self._needs_function_rewire = True
+        try:
+            L_nd = uw.non_dimensionalise(L)
+        except Exception:
+            if hasattr(L, "magnitude"):
+                L_nd = L.magnitude
+            else:
+                L_nd = L
+        self._smoothing = sympify(L_nd) ** 2
+
     @property
     def penalty(self):
-        """Divergence penalty parameter for incompressibility."""
+        r"""Divergence penalty :math:`\lambda` for (approx.) incompressibility.
+
+        Coefficient of the :math:`\lambda (\nabla\!\cdot\!\mathbf u)
+        \mathbf I` term in the vector projection. Large positive
+        values bias the projection toward
+        :math:`\nabla\!\cdot\!\mathbf u = 0`, i.e. a solenoidal
+        approximation of :math:`\tilde{\mathbf f}`. Has no length
+        interpretation — unlike :attr:`smoothing` it does not
+        introduce a filter scale.
+        """
         return self._penalty
 
     @penalty.setter
@@ -2206,23 +2468,33 @@ class SNES_Tensor_Projection(SNES_Projection):
     r"""
     Tensor projection solver for mapping tensor functions to mesh variables.
 
-    Solves :math:`\mathbf{u} = \tilde{\mathbf{f}}` where :math:`\tilde{\mathbf{f}}`
-    is a tensor-valued function that can be evaluated within an element and
-    :math:`\mathbf{u}` is a tensor mesh variable with associated shape functions.
+    Solves :math:`\mathbf{u} = \tilde{\mathbf{f}}` where
+    :math:`\tilde{\mathbf{f}}` is a tensor-valued function that can be
+    evaluated within an element and :math:`\mathbf{u}` is a tensor mesh
+    variable with associated shape functions.
+
+    Typically used to obtain a continuous representation of a tensor
+    function not well-defined at mesh nodes (e.g., stress or strain
+    tensors), with optional length-scale smoothing.
 
-    Typically used to obtain a continuous representation of a tensor function
-    not well-defined at mesh nodes (e.g., stress or strain tensors).
+    Strong form (screened-Poisson, applied component-wise)
+    ------------------------------------------------------
 
-    The projection is implemented by solving:
+    Internally the solve is decomposed into scalar sub-problems, one per
+    tensor component :math:`u_{ij}`; each sub-problem is
 
     .. math::
 
-        -\nabla \cdot \underbrace{\left[ \alpha \nabla \mathbf{u}
-        \right]}_{\mathbf{F}} - \underbrace{\left[ \mathbf{u}
-        - \tilde{\mathbf{f}} \right]}_{\mathbf{h}} = 0
+        u_{ij} \;-\; \nabla \cdot \left( \alpha\, \nabla u_{ij}
+        \right) \;=\; \tilde{f}_{ij},
 
-    The term :math:`\mathbf{F}` provides optional smoothing regularization.
-    Setting :math:`\alpha = 0` gives a pure L2 projection.
+    identical to :class:`SNES_Projection`. The smoothing length is
+    :math:`L = \sqrt{\alpha}` and the Green's function decays as
+    :math:`\exp(-r/L)`; setting :math:`\alpha = 0` gives a pure
+    pointwise L2 projection. See :class:`SNES_Projection` for the full
+    derivation, the choice between :attr:`smoothing` (length²) and
+    :attr:`smoothing_length` (length, unit-aware), and guidance on
+    picking :math:`L` relative to the cell size.
 
     Parameters
     ----------
@@ -2240,11 +2512,13 @@ class SNES_Tensor_Projection(SNES_Projection):
     Notes
     -----
     Currently implemented component-wise as there is no native solver
-    for tensor unknowns.
+    for tensor unknowns. Each component sees the same :math:`\alpha`,
+    so the effective smoothing length :math:`L` is uniform across the
+    tensor entries.
 
     See Also
     --------
-    SNES_Projection : Scalar field projection.
+    SNES_Projection : Scalar field projection (full mathematical detail).
     SNES_Vector_Projection : Vector field projection.
     """
 
@@ -2360,16 +2634,27 @@ class SNES_MultiComponent_Projection(SNES_MultiComponent):
     :class:`SNES_Tensor_Projection`, which tears down and rebuilds the
     PETSc DM on every inner iteration.
 
-    The projection is block-diagonal across components: each component
-    satisfies the scalar problem
+    Strong form (block-diagonal screened Poisson)
+    ---------------------------------------------
+
+    There is no cross-component coupling, so each component
+    :math:`u_k,\ k=1,\dots,N_c` satisfies the same scalar
+    screened-Poisson equation as :class:`SNES_Projection`,
 
     .. math::
 
-        -\nabla \cdot \left[ \alpha \nabla u_k \right]
-        - \left[ u_k - \tilde f_k \right] = 0
+        u_k \;-\; \nabla \cdot \left( \alpha\, \nabla u_k \right)
+        \;=\; \tilde f_k.
 
-    with no cross-component coupling. Setting :math:`\alpha = 0` gives a
-    pure L2 projection per component.
+    Setting :math:`\alpha = 0` gives a pure pointwise L2 projection per
+    component. The smoothing length is :math:`L = \sqrt{\alpha}` and the
+    Green's function decays as :math:`\exp(-r/L)` — the same Gaussian-like
+    convolution interpretation as the scalar case. All components share a
+    single :math:`\alpha`, so the smoothing scale :math:`L` is uniform
+    across the multi-component target. Use :attr:`smoothing` to set
+    :math:`\alpha` (units length²) or :attr:`smoothing_length` for the
+    L-valued, unit-aware knob. See :class:`SNES_Projection` for the full
+    derivation and guidance.
 
     Parameters
     ----------
@@ -2384,7 +2669,7 @@ class SNES_MultiComponent_Projection(SNES_MultiComponent):
 
     See Also
     --------
-    SNES_Projection : Scalar projection (Nc=1).
+    SNES_Projection : Scalar projection (Nc=1) — full mathematical detail.
     SNES_Tensor_Projection : Legacy per-component cycling projector.
     """
 
@@ -2430,7 +2715,18 @@ def __init__(
 
     @property
     def smoothing(self):
-        """Smoothing regularisation parameter."""
+        r"""Smoothing coefficient :math:`\alpha` (units **length²**).
+
+        Coefficient of the :math:`\nabla\!\cdot\!(\alpha\,\nabla
+        u_k)` term in each component's screened-Poisson sub-problem
+        (see the class docstring). One :math:`\alpha` is shared
+        across all :math:`N_c` components, so the implied smoothing
+        length :math:`L = \sqrt{\alpha}` is uniform.
+
+        Use :attr:`smoothing_length` for the L-valued, unit-aware
+        knob. See :attr:`SNES_Projection.smoothing` for the full
+        derivation and the Gaussian-like convolution interpretation.
+        """
         return self._smoothing
 
     @smoothing.setter
@@ -2438,6 +2734,46 @@ def smoothing(self, value):
         self._needs_function_rewire = True
         self._smoothing = sympify(value)
 
+    @property
+    def smoothing_length(self):
+        r"""Smoothing length :math:`L` (length units, **unit-aware**).
+
+        L-valued view on :attr:`smoothing`, with
+        :math:`L = \sqrt{\alpha}`. Each component then satisfies
+        :math:`u_k - L^{2}\,\nabla^{2} u_k = \tilde f_k`, i.e. a
+        Gaussian-like convolution of width :math:`L` applied
+        independently to each component of the multi-component
+        target. See :attr:`SNES_Projection.smoothing_length` for the
+        full mathematical and units discussion.
+        """
+        import sympy
+        s = self._smoothing
+        try:
+            sval = float(s)
+        except (TypeError, ValueError):
+            return sympy.sqrt(s)
+        if sval < 0:
+            return None
+        L_nd = sval ** 0.5
+        try:
+            return uw.scaling.dimensionalise(
+                L_nd, uw.scaling.units.meter)
+        except Exception:
+            return L_nd
+
+    @smoothing_length.setter
+    def smoothing_length(self, L):
+        """Set the smoothing length scale (unit-aware)."""
+        self._needs_function_rewire = True
+        try:
+            L_nd = uw.non_dimensionalise(L)
+        except Exception:
+            if hasattr(L, "magnitude"):
+                L_nd = L.magnitude
+            else:
+                L_nd = L
+        self._smoothing = sympify(L_nd) ** 2
+
     @property
     def uw_weighting_function(self):
         """Weighting function applied during projection."""
@@ -2829,7 +3165,7 @@ def delta_t(self, value):
         self._delta_t.sym = value
 
     @timing.routine_timer_decorator
-    def estimate_dt(self):
+    def estimate_dt(self, direction_aware: bool = False):
         r"""
         Estimate an appropriate timestep for the advection-diffusion solver.
 
@@ -2839,6 +3175,23 @@ def estimate_dt(self):
         - :math:`\delta t_{\textrm{diff}}`: typical time for diffusion across an element
         - :math:`\delta t_{\textrm{adv}}`: typical element-crossing time for a fluid parcel
 
+        Parameters
+        ----------
+        direction_aware : bool, default False
+            If True, the advective dt uses the per-cell extent
+            *along the local velocity direction*:
+            ``h_eff_c = max_i(s_i) - min_i(s_i)``, where
+            ``s_i = (x_i - centroid) · v̂`` over the cell vertices.
+            This is the distance material actually traverses in
+            crossing the cell, and it is always at least the isotropic
+            ``mesh._radii`` estimate: larger by 1.5–3× for equant cells
+            (a geometric factor) and by up to ~10× for cells that the
+            mover has stretched along the flow direction. On adapted
+            meshes the gain is substantial; on uniform meshes it is
+            the geometric factor only. Off by default to preserve
+            historical behaviour; safe to enable everywhere once
+            validated.
+
         Returns
         -------
         pint.Quantity or float
@@ -2929,11 +3282,40 @@ def estimate_dt(self):
         else:
             min_dt_diff_local = np.inf
 
-        # Per-element advective timestep
+        # Per-element advective timestep — either isotropic
+        # (mesh._radii / |v|) or direction-aware (v-aligned cell
+        # extent / |v|).
+        if direction_aware:
+            # Per-cell vertex indices (triangle / tet).
+            from underworld3.meshing.smoothing import _tri_cells
+            tris = _tri_cells(self.mesh.dm)
+            if tris is None:
+                # Fall back to isotropic for non-triangle meshes.
+                h_per_element = element_radii
+            else:
+                coords = np.asarray(self.mesh.X.coords)
+                centroids = coords[tris].mean(axis=1)
+                # v-hat per cell (use centroid v we already have)
+                vhat = np.where(
+                    vel_magnitudes[:, None] > 0,
+                    vel / np.maximum(vel_magnitudes[:, None],
+                                      1.0e-30),
+                    0.0)
+                D = coords[tris] - centroids[:, None, :]
+                # Signed projections along v̂ per cell vertex
+                s = np.einsum('cvd,cd->cv', D, vhat)
+                h_per_element = s.max(axis=1) - s.min(axis=1)
+                # Sanity-floor — for zero-velocity cells s=0
+                # ⇒ h_eff=0 ⇒ dt_adv=inf via the where below
+                h_per_element = np.maximum(
+                    h_per_element, 0.0)
+        else:
+            h_per_element = element_radii
+
         with np.errstate(divide='ignore', invalid='ignore'):
             dt_adv_per_element = np.where(
                 vel_magnitudes > 0,
-                element_radii / vel_magnitudes,
+                h_per_element / vel_magnitudes,
                 np.inf
             )
         min_dt_adv_local = np.min(dt_adv_per_element) if len(dt_adv_per_element) > 0 else np.inf
diff --git a/tests/test_0750_meshing_follow_metric.py b/tests/test_0750_meshing_follow_metric.py
new file mode 100644
index 00000000..90794f56
--- /dev/null
+++ b/tests/test_0750_meshing_follow_metric.py
@@ -0,0 +1,282 @@
+"""Locks the uw.meshing.follow_metric() public API.
+
+Two-knob, cell-size-envelope adapter for the anisotropic node mover.
+The user passes:
+
+    refinement  : factor by which the finest cells are smaller than h0
+    coarsening  : factor by which the coarsest cells are bigger than h0
+                  ("auto" = refinement**(1/d), the budget-conserving
+                  minimum)
+    metric      : "front-following" or "gradient-uniform"
+
+These tests pin the metric construction (refinement=1 is a no-op,
+ρ_min/ρ_max envelope is correct, geomean(ρ) ≈ 1 by construction so
+the mover's G normalisation is bypassed) and end-to-end that the
+mover actually moves the mesh and lands an approximate cell-size
+envelope.
+
+The mover's achieved cell-size envelope is APPROXIMATE — anisotropic
+cells + iterative deformation map mean the eigenvalue clamp doesn't
+literally bound the achieved h_max on a per-cell basis. Validation
+on a sharp-tanh annulus shows:
+
+* refinement side: achieved h_min within ~5-10% of h0/refinement
+* coarsening side: achieved h_max within ~2× of h0·coarsening
+
+The tests here use loose tolerances reflecting that empirical reality.
+"""
+import numpy as np
+import pytest
+import sympy
+
+import underworld3 as uw
+from underworld3.meshing import smoothing as _sm
+
+
+def _build_annulus_with_field():
+    """Return a fresh annulus + a sharp tanh T field that mimics a
+    thermal boundary layer near r=0.7."""
+    m = uw.meshing.Annulus(radiusInner=0.5, radiusOuter=1.0,
+                           cellSize=0.08, qdegree=3)
+    T = uw.discretisation.MeshVariable(
+        "T", m, vtype=uw.VarType.SCALAR, degree=2, continuous=True)
+    proj = uw.systems.Projection(m, T)
+    proj.smoothing = 0.0
+    x, y = m.X
+    proj.uw_function = 0.5 * (
+        1.0 + sympy.tanh(40.0 * (0.7 - sympy.sqrt(x*x + y*y))))
+    proj.solve()
+    return m, T
+
+
+def _cell_h_stats(mesh):
+    """Per-cell mean-edge h, returns (min, max, mean)."""
+    tris = _sm._tri_cells(mesh.dm)
+    p = np.asarray(mesh.X.coords)[tris]
+    h = (np.linalg.norm(p[:, 1] - p[:, 0], axis=1)
+         + np.linalg.norm(p[:, 2] - p[:, 1], axis=1)
+         + np.linalg.norm(p[:, 0] - p[:, 2], axis=1)) / 3.0
+    return float(h.min()), float(h.max()), float(h.mean())
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_refinement_one_is_noop_metric():
+    """refinement=1, coarsening=1: metric is uniformly 1 (no adapt)."""
+    m, T = _build_annulus_with_field()
+    rho = uw.meshing.metric_density_from_gradient(
+        m, T, refinement=1.0, coarsening=1.0, name="check_ref1")
+    rho_vals = np.asarray(
+        uw.function.evaluate(rho, m.X.coords)).reshape(-1)
+    assert np.allclose(rho_vals, 1.0)
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_envelope_in_rho_space_d_independent():
+    """For any (refinement, coarsening), the metric ρ has the
+    envelope [1/coarsening², refinement²] independent of d."""
+    m, T = _build_annulus_with_field()
+    for ref, coar in [(2.0, "auto"), (2.0, 1.5), (3.0, 2.0)]:
+        rho = uw.meshing.metric_density_from_gradient(
+            m, T, refinement=ref, coarsening=coar,
+            name=f"env_{ref}_{coar}")
+        rho_vals = np.asarray(
+            uw.function.evaluate(rho, m.X.coords)).reshape(-1)
+        coar_val = (ref ** (1.0 / m.cdim)) if coar == "auto" else float(coar)
+        expected_max = ref ** 2
+        expected_min = 1.0 / coar_val ** 2
+        # Tolerance: percentile clipping + FE interpolation noise
+        assert rho_vals.max() == pytest.approx(expected_max, rel=0.05), (
+            f"({ref}, {coar}): ρ_max = {rho_vals.max()}, "
+            f"expected ≈ {expected_max}")
+        assert rho_vals.min() == pytest.approx(expected_min, rel=0.05), (
+            f"({ref}, {coar}): ρ_min = {rho_vals.min()}, "
+            f"expected ≈ {expected_min}")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_geomean_rho_is_unity_front_following():
+    """The piecewise-log-linear ansatz has geomean(ρ) = 1 by
+    construction so the mover's G normalisation passes through."""
+    m, T = _build_annulus_with_field()
+    rho = uw.meshing.metric_density_from_gradient(
+        m, T, refinement=2.0, coarsening="auto",
+        metric_choice="front-following", name="g1")
+    rho_vals = np.asarray(
+        uw.function.evaluate(rho, m.X.coords)).reshape(-1)
+    log_geomean = float(np.mean(np.log(rho_vals)))
+    # Tight: the analytic break-point makes this exactly 0 in the
+    # absence of FE interpolation noise.
+    assert abs(log_geomean) < 0.05, (
+        f"geomean(ρ) log = {log_geomean}, expected ≈ 0")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_auto_coarsening_is_d_th_root_of_refinement():
+    """coarsening='auto' uses the budget-conserving minimum
+    refinement**(1/d) — verified by setting the equivalent
+    explicit coarsening and checking ρ is identical."""
+    m, T = _build_annulus_with_field()
+    d = m.cdim
+    ref = 3.0
+    rho_auto = uw.meshing.metric_density_from_gradient(
+        m, T, refinement=ref, coarsening="auto", name="auto")
+    rho_explicit = uw.meshing.metric_density_from_gradient(
+        m, T, refinement=ref, coarsening=ref ** (1.0/d),
+        name="explicit")
+    v_a = np.asarray(uw.function.evaluate(rho_auto, m.X.coords)).reshape(-1)
+    v_e = np.asarray(uw.function.evaluate(rho_explicit, m.X.coords)).reshape(-1)
+    assert np.allclose(v_a, v_e)
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_invalid_args():
+    """refinement<1, coarsening<1 and an unknown metric_choice must raise."""
+    m, T = _build_annulus_with_field()
+    with pytest.raises(ValueError, match="refinement must be >= 1.0"):
+        uw.meshing.metric_density_from_gradient(
+            m, T, refinement=0.5, name="badref")
+    with pytest.raises(ValueError, match="coarsening must be >= 1.0"):
+        uw.meshing.metric_density_from_gradient(
+            m, T, refinement=2.0, coarsening=0.5, name="badcoar")
+    with pytest.raises(ValueError, match="metric_choice must be"):
+        uw.meshing.metric_density_from_gradient(
+            m, T, refinement=2.0, metric_choice="unknown",
+            name="badmc")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_follow_metric_moves_mesh():
+    """follow_metric returns True when the mesh is actually moved."""
+    m, T = _build_annulus_with_field()
+    old_X = np.asarray(m.X.coords).copy()
+    moved = uw.meshing.follow_metric(
+        m, T, refinement=2.0, skip_threshold=None)
+    new_X = np.asarray(m.X.coords)
+    assert moved is True
+    assert not np.allclose(new_X, old_X)
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_follow_metric_refinement_envelope_approximate():
+    """The per-cell mean-edge minimum stays within about -15%/+30%
+    of the requested h_min = h0/refinement.
+
+    The min-edge spring caps the SHORTEST EDGE of any cell at
+    h0/refinement; that forces the cell's MEAN edge ≥ h0/refinement
+    (since mean ≥ min). On most cells this means a slight
+    over-spec (mean-edge slightly larger than asked), which is
+    the right side of the trade — the user asked for caps on
+    *extreme* refinement, accepting mild under-refinement.
+
+    Occasional slivers can still occur on cells with very strong
+    gradients (the spring can't perfectly counteract the metric
+    pull at every cell); the *mean-edge* guarantee is the robust
+    one.
+    """
+    m0, _ = _build_annulus_with_field()
+    _, _, h0 = _cell_h_stats(m0)
+
+    for ref in [1.5, 2.0]:
+        m, T = _build_annulus_with_field()
+        uw.meshing.follow_metric(
+            m, T, refinement=ref, skip_threshold=None)
+        h_min_cell, _, _ = _cell_h_stats(m)
+        target_h_min = h0 / ref
+        # The spring keeps mean-edge min at or close to target.
+        # Allow up to ~30% over-spec (under-refinement, the safe
+        # side) and ~15% under-spec (over-refinement, the unsafe
+        # side).
+        assert h_min_cell / target_h_min > 0.85, (
+            f"refinement={ref}: mean-edge h_min/target = "
+            f"{h_min_cell/target_h_min:.3f}, want > 0.85 "
+            f"(should not over-refine past spec)")
+        assert h_min_cell / target_h_min < 1.30, (
+            f"refinement={ref}: mean-edge h_min/target = "
+            f"{h_min_cell/target_h_min:.3f}, want < 1.30 "
+            f"(should not under-refine far past spec)")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_follow_metric_coarsening_envelope_approximate():
+    """Achieved h_max is within ~30% of h0·coarsening on the
+    sharp-tanh annulus problem.
+
+    Before the per-cell rest-size spring was added to the mover,
+    h_max overshot the spec by 50-130%. The spring restores
+    over-coarsened cells by pulling their nodes back toward the
+    rest positions, tightening the cap. The remaining ~15-30%
+    overshoot reflects the unavoidable trade-off between metric
+    fidelity (which the SNES wants) and cap enforcement (which
+    the spring wants); 30% is a wide enough tolerance that it's
+    not flaky on this fixture problem.
+    """
+    m0, _ = _build_annulus_with_field()
+    _, _, h0 = _cell_h_stats(m0)
+    for ref, coar in [(1.5, "auto"), (2.0, "auto"), (2.0, 2.0),
+                       (3.0, "auto")]:
+        m, T = _build_annulus_with_field()
+        uw.meshing.follow_metric(
+            m, T, refinement=ref, coarsening=coar,
+            skip_threshold=None)
+        _, h_max, _ = _cell_h_stats(m)
+        coar_val = (ref**0.5) if coar == "auto" else float(coar)
+        target_h_max = h0 * coar_val
+        # Allow up to 30% over spec
+        assert h_max / target_h_max < 1.30, (
+            f"ref={ref}, coar={coar}: achieved h_max/target = "
+            f"{h_max/target_h_max:.3f}, want < 1.30")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_follow_metric_no_slivers_after_adaptive_polish():
+    """The adaptive Jacobi polish loop should drive the worst
+    cell-shape quality above `polish_quality_target` (default 0.3)
+    on the sharp-tanh annulus problem. Cells with q < 0.3 are
+    visible slivers; q=1 is equilateral. This locks the contract
+    that follow_metric's built-in polish eliminates the slivers
+    its mover would otherwise leave behind."""
+    import numpy as np
+    for ref in [2.0, 3.0]:
+        m, T = _build_annulus_with_field()
+        uw.meshing.follow_metric(
+            m, T, refinement=ref, skip_threshold=None)
+        tris = _sm._tri_cells(m.dm)
+        p = np.asarray(m.X.coords)[tris]
+        e0 = np.linalg.norm(p[:, 1] - p[:, 0], axis=1)
+        e1 = np.linalg.norm(p[:, 2] - p[:, 1], axis=1)
+        e2 = np.linalg.norm(p[:, 0] - p[:, 2], axis=1)
+        A = np.abs(_sm._signed_areas(np.asarray(m.X.coords), tris))
+        q = 4.0 * np.sqrt(3.0) * A / (e0**2 + e1**2 + e2**2 + 1e-30)
+        # No slivers after polish: q_min need not approach the
+        # equilateral q=1, but must reach the sliver threshold q=0.3.
+        assert q.min() >= 0.3, (
+            f"refinement={ref}: q_min={q.min():.3f} "
+            f"(want ≥ 0.3 — adaptive polish should have run "
+            f"more iterations)")
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_follow_metric_skip_threshold_skips_aligned_mesh():
+    """A mesh that's already aligned (here: one that has just been
+    adapted to the same, unchanged field) gets the skip-on-align
+    short-circuit when skip_threshold is permissive."""
+    m, T = _build_annulus_with_field()
+    # First adapt — mesh moves
+    moved1 = uw.meshing.follow_metric(
+        m, T, refinement=2.0, skip_threshold=None)
+    assert moved1 is True
+    # Second adapt with a permissive threshold: should skip
+    # because the mesh is already well-aligned to T.
+    moved2 = uw.meshing.follow_metric(
+        m, T, refinement=2.0, skip_threshold=0.9)
+    assert moved2 is False
diff --git a/tests/test_0760_mesh_ot_adapt.py b/tests/test_0760_mesh_ot_adapt.py
new file mode 100644
index 00000000..3c248ab0
--- /dev/null
+++ b/tests/test_0760_mesh_ot_adapt.py
@@ -0,0 +1,227 @@
+"""Locks the mesh.OT_adapt() public API.
+
+OT_adapt() runs the validated optimal-transport reset adapt as a method
+on the mesh: reset to a cached reference, FE-remap the driving field onto
+the clean canvas, build a gradient metric, run the OT mover, then FE-remap
+fields and zero a cold-restart set. These tests pin:
+
+* the mesh actually moves and the driving field's spatial pattern survives
+  the FE-remap (Annulus and Box);
+* boundary behaviour — radial (Annulus) boundary nodes slide tangentially
+  but stay on the circle; Cartesian (Box) boundaries are pinned (their
+  vertex-evaluated normal is degenerate, so we do not slip with it);
+* the lazy reference-coordinate cache and OT_adapt_reset_reference();
+* skip_threshold short-circuits an already-aligned mesh;
+* constrained-manifold meshes (dim != cdim) raise NotImplementedError.
+
+Behaviour is validated, not bit-for-bit: the unified Gamma_N-based slip
+is a different algorithm from the historical hardcoded box/ring slip.
+"""
+import numpy as np
+import pytest
+
+import underworld3 as uw
+from underworld3.meshing import smoothing as _sm
+
+
+def _annulus_with_field(slope=15.0, r_bl=0.75):
+    """Annulus + a tanh boundary-layer T field at radius r_bl (set
+    directly on the DOFs — no projection solve needed)."""
+    m = uw.meshing.Annulus(radiusInner=0.5, radiusOuter=1.0,
+                           cellSize=0.08, qdegree=3)
+    T = uw.discretisation.MeshVariable(
+        "T", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    feat = lambda c: np.tanh((np.linalg.norm(c, axis=1) - r_bl) * slope)
+    T.data[:, 0] = feat(np.asarray(T.coords))
+    return m, T, feat
+
+
+def _box_with_field(slope=15.0, y_bl=0.5):
+    m = uw.meshing.UnstructuredSimplexBox(
+        minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0),
+        cellSize=0.08, qdegree=3)
+    T = uw.discretisation.MeshVariable(
+        "Tb", m, vtype=uw.VarType.SCALAR, degree=3, continuous=True)
+    feat = lambda c: np.tanh((c[:, 1] - y_bl) * slope)
+    T.data[:, 0] = feat(np.asarray(T.coords))
+    return m, T, feat
+
+
+def _boundary_mask(mesh):
+    from underworld3.meshing.smoothing import (
+        _pinned_mask, _auto_pinned_labels)
+    return _pinned_mask(mesh.dm, tuple(_auto_pinned_labels(mesh)))
+
+
+# ---------------------------------------------------------------------------
+# Annulus: moves, preserves field, boundary stays on the circle
+# ---------------------------------------------------------------------------
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_moves_mesh_annulus():
+    m, T, _ = _annulus_with_field()
+    X0 = np.asarray(m.X.coords).copy()
+    moved = m.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+    assert moved is True
+    X1 = np.asarray(m.X.coords)
+    # at least one node moved appreciably (max over all nodes)
+    assert float(np.linalg.norm(X1 - X0, axis=1).max()) > 1.0e-3
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_annulus_boundary_stays_on_circle():
+    m, T, _ = _annulus_with_field()
+    is_bnd = _boundary_mask(m)
+    X0 = np.asarray(m.X.coords).copy()
+    r0 = np.linalg.norm(X0[is_bnd], axis=1)
+    m.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+    X1 = np.asarray(m.X.coords)
+    r1 = np.linalg.norm(X1[is_bnd], axis=1)
+    # boundary nodes slid tangentially ...
+    bnd_disp = float(np.linalg.norm(X1[is_bnd] - X0[is_bnd], axis=1).max())
+    assert bnd_disp > 1.0e-4
+    # ... but stayed on their original radii (radial drift < 1e-3)
+    assert float(np.abs(r1 - r0).max()) < 1.0e-3
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_preserves_field_pattern_annulus():
+    m, T, feat = _annulus_with_field()
+    m.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+    # T at the adapted DOFs should match the analytic feature there
+    err = np.abs(np.asarray(T.data)[:, 0] - feat(np.asarray(T.coords))).max()
+    assert float(err) < 5.0e-2
+
+
+# ---------------------------------------------------------------------------
+# Box: interior moves, Cartesian boundary pinned, field preserved
+# ---------------------------------------------------------------------------
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_box_moves_interior_pins_boundary():
+    m, T, feat = _box_with_field()
+    is_bnd = _boundary_mask(m)
+    X0 = np.asarray(m.X.coords).copy()
+    moved = m.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+    assert moved is True
+    X1 = np.asarray(m.X.coords)
+    # Cartesian boundary is pinned (no slip): boundary nodes do not move
+    assert float(np.linalg.norm(X1[is_bnd] - X0[is_bnd], axis=1).max()) < 1.0e-12
+    # interior nodes moved appreciably
+    assert float(np.linalg.norm(X1[~is_bnd] - X0[~is_bnd], axis=1).max()) > 1.0e-3
+    # field pattern preserved within FE-remap tolerance
+    err = np.abs(np.asarray(T.data)[:, 0] - feat(np.asarray(T.coords))).max()
+    assert float(err) < 5.0e-2
+
+
+# ---------------------------------------------------------------------------
+# Reference-coordinate cache + reset
+# ---------------------------------------------------------------------------
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_reference_cache_and_reset():
+    m, T, _ = _annulus_with_field()
+    X0 = np.asarray(m.X.coords).copy()
+    # first call snapshots the (pristine) coords as the reset reference
+    m.OT_adapt(T, refinement=3.0, fields_to_remap=[T])
+    assert np.allclose(m._ot_adapt_reference_coords, X0)
+    # the mesh itself has moved away from the reference
+    assert not np.allclose(np.asarray(m.X.coords), X0)
+    # OT_adapt_reset_reference() with no coords re-baselines to the current (moved) coords
+    m.OT_adapt_reset_reference()
+    assert np.allclose(m._ot_adapt_reference_coords,
+                       np.asarray(m.X.coords))
+    # explicit coords override
+    m.OT_adapt_reset_reference(coords=X0)
+    assert np.allclose(m._ot_adapt_reference_coords, X0)
+
+
+# ---------------------------------------------------------------------------
+# skip_threshold short-circuits an already-aligned mesh
+# ---------------------------------------------------------------------------
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_skip_threshold_returns_false():
+    m, T, _ = _annulus_with_field()
+    X0 = np.asarray(m.X.coords).copy()
+    # misalignment is in [0, 1]; threshold > 1 always skips
+    moved = m.OT_adapt(T, refinement=3.0, fields_to_remap=[T],
+                       skip_threshold=2.0)
+    assert moved is False
+    # mesh untouched
+    assert np.allclose(np.asarray(m.X.coords), X0)
+
+
+# ---------------------------------------------------------------------------
+# Constrained-manifold mesh (dim != cdim) is not supported
+# ---------------------------------------------------------------------------
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_manifold_raises():
+    try:
+        sm = uw.meshing.SegmentedSphericalSurface2D(
+            radius=1.0, cellSize=0.2, numSegments=6)
+    except Exception:
+        pytest.skip("manifold surface mesh not constructible in this build")
+    assert sm.dim != sm.cdim
+    T = uw.discretisation.MeshVariable(
+        "Tm", sm, vtype=uw.VarType.SCALAR, degree=1, continuous=True)
+    with pytest.raises(NotImplementedError):
+        sm.OT_adapt(T)
+
+
+# ---------------------------------------------------------------------------
+# grad_smoothing_length default ("auto" ≈ grid size) de-noises the metric so
+# production refinement stays sliver-free; "off" (None) does not.
+# ---------------------------------------------------------------------------
+def _qmin(mesh):
+    tris = _sm._tri_cells(mesh.dm)
+    X = np.asarray(mesh.X.coords); p = X[tris]
+    e = (np.linalg.norm(p[:, 1]-p[:, 0], axis=1)**2
+         + np.linalg.norm(p[:, 2]-p[:, 1], axis=1)**2
+         + np.linalg.norm(p[:, 0]-p[:, 2], axis=1)**2)
+    A = _sm._signed_areas(X, tris)
+    return float((4*np.sqrt(3)*np.abs(A)/(e+1e-30)).min())
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_auto_smoothing_default_avoids_slivers():
+    # Sharp boundary layer: with smoothing OFF the metric chases sub-cell
+    # noise and slivers form; the "auto" default de-noises it.
+    m_off, T_off, _ = _annulus_with_field(slope=28.0)
+    m_off.OT_adapt(T_off, refinement=3.0, fields_to_remap=[T_off],
+                   grad_smoothing_length=None)
+    q_off = _qmin(m_off)
+
+    m_auto, T_auto, _ = _annulus_with_field(slope=28.0)
+    m_auto.OT_adapt(T_auto, refinement=3.0, fields_to_remap=[T_auto])  # default "auto"
+    q_auto = _qmin(m_auto)
+
+    assert q_auto > q_off            # auto is strictly better
+    assert q_auto > 0.2              # and sliver-free
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_smoothing_accepts_explicit_and_unit_aware_length():
+    # explicit non-dimensional length
+    m, T, _ = _annulus_with_field()
+    assert m.OT_adapt(T, refinement=3.0, fields_to_remap=[T],
+                      grad_smoothing_length=0.0625) is True
+    # unit-aware (Pint) length is accepted (routed through the projection's
+    # non-dimensionalisation)
+    m2, T2, _ = _annulus_with_field()
+    L = 0.0625 * uw.scaling.units.meter
+    assert m2.OT_adapt(T2, refinement=3.0, fields_to_remap=[T2],
+                       grad_smoothing_length=L) is True
+
+
+@pytest.mark.tier_a
+@pytest.mark.level_1
+def test_ot_adapt_invalid_smoothing_string_raises():
+    m, T, _ = _annulus_with_field()
+    with pytest.raises(ValueError):
+        m.OT_adapt(T, refinement=3.0, grad_smoothing_length="bogus")