
Fix vector DMInterpolation overshoot at cell-shared boundaries#164

Merged
lmoresi merged 1 commit into development from
bugfix/dminterp-vector-multifield
May 5, 2026
Conversation

@lmoresi
Member

@lmoresi lmoresi commented May 3, 2026

Summary

uw.function.evaluate(vec_var.sym, coords) silently returned values
far outside the data range at a small handful of degree-2
cell-interior nodes whose coordinates floating-point-coincide with
an inter-cell face. Scalars were unaffected; vectors lost their
y-component to either uninitialised PETSc memory (values up to
~1e+246) or Q2 basis-polynomial overshoot.

This bug silently corrupted any vector field evaluated through
uw.function.evaluate — initial conditions, BC enforcement checks,
post-processing, visualisation, anything downstream of the canonical
evaluator on certain mesh resolutions. Found while diagnosing a
visualisation snapshot that showed max |u| = 70 instead of
V_top = 0.227.

Three layered defects in src/underworld3/function/petsc_tools.c

  1. Uninitialised output for unlocated points
    DMInterpolationEvaluate_UW skipped the FE evaluation for
    points with cells[p] = -1 but never zeroed
    interpolant[p*dof + ...]. Now zeroed explicitly.

  2. Lost upstream cell hint
    DMInterpolationSetUp_UW discarded
    the owning_cell hint passed in by mesh.get_closest_cells().
    recovery_cells map now combines DMLocatePoints results with
    the hint; rank 0 claims ownership of hint-recovered points.
    Justification: callers already filter for in-domain points via
    mesh.points_in_domain() upstream, so any point unlocated here
    is genuinely inside the mesh — just sitting on a shared face
    PETSc's strict containment test rejected.

  3. Reference-coord overshoot
    DMPlexCoordinatesToReference returns ξ slightly outside
    [-1, 1] for boundary points evaluated
    against an adjacent cell, and the Q2 basis polynomial
    extrapolates wildly. ξ now clamped to the cell's valid range —
    correct boundary value because the basis is continuous at faces.
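Taken together, the three fixes can be sketched in toy form. The following is a hypothetical Python analogue on a 1D mesh of unit cells with a Q2 field (none of these names exist in the codebase; the real implementation is the C code in petsc_tools.c):

```python
def q2_basis(xi):
    """Quadratic Lagrange basis on [-1, 1], nodes at -1, 0, +1."""
    return (0.5 * xi * (xi - 1.0), (1.0 - xi) * (1.0 + xi), 0.5 * xi * (xi + 1.0))

def interpolate(points, located_cells, hint_cells, nodal_values):
    """Toy analogue of the fixed evaluation loop on 1D unit cells [c, c+1]."""
    out = []
    for p, x in enumerate(points):
        # Fix 2: fall back to the upstream hint cell when strict point
        # location rejected a face-coincident point.
        cell = located_cells[p]
        if cell < 0:
            cell = hint_cells[p]
        # Fix 1: emit an explicit zero, never uninitialised memory,
        # for points that remain unlocated.
        if cell < 0:
            out.append(0.0)
            continue
        # Map x into the cell's reference coordinate ...
        xi = 2.0 * (x - cell) - 1.0
        # Fix 3: ... and clamp to [-1, 1] so a shared-face point seen
        # from the adjacent cell gets the (continuous) boundary value.
        xi = max(-1.0, min(1.0, xi))
        out.append(sum(b * v for b, v in zip(q2_basis(xi), nodal_values[cell])))
    return out
```

For a face point x = 1.0 that strict location rejects, the hint recovers cell 0 and the clamp returns that cell's boundary nodal value; a point with no located cell and no hint yields an explicit 0.0 rather than garbage.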

MeshVariable.read_timestep hardening

Adds a local→global vector sync + mesh-lvec invalidation at the end
of read_timestep. Not part of the visible bug, but makes
read_timestep symmetric with sibling load_from_h5_plex_vector
and removes a foot-gun for any caller reading _gvec directly
after a load.
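The foot-gun can be illustrated with a pure-Python toy (hypothetical names; the real code invalidates a cached PETSc vector, not a list): without the invalidation step, a consumer of the mesh-level cache keeps seeing pre-load data.

```python
class ToyMesh:
    """Stand-in for the mesh with its cached local vector."""
    def __init__(self):
        self._lvec = None
        self._stale_lvec = True

    def lvec(self, variables):
        # Rebuild the cached vector only when marked stale.
        if self._stale_lvec or self._lvec is None:
            self._lvec = [v for var in variables for v in var.data]
            self._stale_lvec = False
        return self._lvec

class ToyVar:
    def __init__(self, mesh, data):
        self.mesh, self.data = mesh, list(data)

    def read_timestep(self, loaded):
        self.data = list(loaded)
        # The hardening step: drop the mesh-level cache so the next
        # lvec() access rebuilds from the freshly loaded data.
        self.mesh._lvec = None
        self.mesh._stale_lvec = True
```

Without the last two lines of read_timestep, a post-load call to mesh.lvec() would still serve the stale pre-load values.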

Test plan

  • New regression gate docs/developer/design/_repro_dminterp_multifield_bug.py
    builds a fresh mesh, assigns known constants to a vector +
    multiple scalars, evaluates each at its own DOF nodes,
    reports max abs error. Both fresh-build and save+load.
    - Pre-fix: vector err = 7.0 (full y-component loss);
    scalar err = 2.8e-14.
    - Post-fix: every variable round-trips at machine precision
    (1.4e-14).
  • Check Tier-A regression suite still passes
  • Check Tier-B regression suite still passes
  • Confirm fix is parallel-correct (currently tested single-rank only)

Underworld development team with AI support from Claude Code

Copilot AI review requested due to automatic review settings May 3, 2026 11:42
Contributor

Copilot AI left a comment


Pull request overview

Fixes incorrect/garbage values produced by the PETSc-based evaluator for vector-valued MeshVariables at a small set of boundary-coincident Q2 DOF locations, and hardens post-load vector visibility by forcing proper local↔global sync / mesh-lvec invalidation after read_timestep.

Changes:

  • Add “hint cell” recovery + output zeroing + reference-coordinate clamping in DMInterpolationSetUp_UW / DMInterpolationEvaluate_UW to avoid uninitialised/overshoot results at cell-shared boundaries.
  • Harden MeshVariable.read_timestep() by syncing var local→global and invalidating the cached mesh-level local vector.
  • Add a developer reproducer script documenting the bug and the expected round-trip behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

  • src/underworld3/function/petsc_tools.c: Changes PETSc interpolation setup/evaluation to prevent uninitialised outputs and mitigate boundary/reference-coordinate issues for vectors.
  • src/underworld3/discretisation/discretisation_mesh_variables.py: Ensures read_timestep() results are visible through mesh-level vectors by syncing and invalidating the cached mesh _lvec.
  • docs/developer/design/_repro_dminterp_multifield_bug.py: Adds a minimal reproducer/regression-gate script for the multi-field/vector interpolation failure mode.


Comment on lines +1317 to +1321
        self._sync_lvec_to_gvec()
        if self.mesh._lvec is not None:
            self.mesh._lvec.destroy()
            self.mesh._lvec = None
        self.mesh._stale_lvec = True
"""

import numpy as np
import sympy
Comment on lines +106 to +120
/*
Build a per-point cell map combining DMLocatePoints results with
the upstream `owning_cell` hint. PETSc's _REMOVE flag silently
drops points sitting exactly on inter-cell boundaries (e.g. the
degree-2 cell-interior nodes of a Q2 vector field that happen to
floating-point-coincide with a face). The upstream caller already
filters out genuinely-out-of-domain points via
`mesh.points_in_domain`, so any unlocated point here is in the
mesh; the hint from `mesh.get_closest_cells` gives a valid
adjacent cell. Using the hint to recover those points avoids
silent data loss in the interpolant — the symptom in Phase H was
`uw.function.evaluate(u.sym, u.coords)` returning ~1e+246 at five
isolated cell-midpoint nodes (uninitialised PETSc memory in the
output buffer).
*/
Comment on lines +121 to +138
    PetscInt *recovery_cells = NULL;
    PetscCall(PetscMalloc1(N, &recovery_cells));
    for (PetscInt k = 0; k < N; ++k)
      recovery_cells[k] = owning_cell ? (PetscInt)owning_cell[k] : -1;
    for (p = 0; p < numFound; ++p) {
      if (foundCells[p].index >= 0) {
        PetscInt orig_p = foundPoints ? foundPoints[p] : p;
        recovery_cells[orig_p] = foundCells[p].index;
        foundProcs[orig_p] = rank;
      }
    }
    /* Recover unlocated points on rank 0 using the hint. */
    if (owning_cell && rank == 0) {
      for (PetscInt k = 0; k < N; ++k) {
        if (foundProcs[k] == size && recovery_cells[k] >= 0)
          foundProcs[k] = rank;
      }
    }
Comment on lines +263 to +276
    // Clamp reference coords to the cell's valid range. When a
    // hint cell was used (via SetUp's owning_cell fallback) for a
    // point sitting on a shared face, the physical-to-reference
    // map can return ξ slightly outside [-1, 1] (e.g. 1+1e-16),
    // and the Q2 / higher-order basis polynomial evaluates to
    // large numbers when extrapolated. For shared-boundary points
    // the correct answer is the cell-boundary value, which the
    // basis returns when ξ is clamped to the cell's valid range.
    // [-1, 1] covers quad / hex elements; triangle / tet maps
    // already return ξ inside their reference simplex so this
    // clamp is a no-op for them at well-behaved coordinates.
    for (d = 0; d < cdim; ++d) {
      if (xi[d] < -1.0) xi[d] = -1.0;
      else if (xi[d] > 1.0) xi[d] = 1.0;
    }
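A quick numeric illustration (plain Python, values chosen purely for the example) of how far a quadratic Lagrange basis extrapolates when ξ lands outside [-1, 1], for instance when a point one cell away is mapped through the wrong cell:

```python
def q2_basis(xi):
    # Quadratic Lagrange basis on [-1, 1], nodes at -1, 0, +1.
    return (0.5 * xi * (xi - 1.0), (1.0 - xi) * (1.0 + xi), 0.5 * xi * (xi + 1.0))

def q2_eval(nodal, xi):
    # Interpolate the per-cell nodal values at reference coordinate xi.
    return sum(b * v for b, v in zip(q2_basis(xi), nodal))

nodal = (0.0, 0.0, 1.0)                               # nodal values in one cell
at_face   = q2_eval(nodal, 1.0)                       # boundary value: 1.0
overshoot = q2_eval(nodal, 3.0)                       # extrapolated: 6.0, outside the data range
clamped   = q2_eval(nodal, max(-1.0, min(1.0, 3.0)))  # clamp restores 1.0
```

The extrapolated value grows quadratically with the distance outside the reference cell, while clamping ξ to the face returns the boundary value the continuous basis agrees on from either side.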
@lmoresi lmoresi force-pushed the bugfix/dminterp-vector-multifield branch from 346b1a0 to 8a56a80 Compare May 4, 2026 12:03
@lmoresi lmoresi merged commit 259958b into development May 5, 2026
1 check passed
@lmoresi lmoresi deleted the bugfix/dminterp-vector-multifield branch May 5, 2026 04:09