[BUG] - Three stale-cache issues after swarm particle addition

## Summary

Three independent stale-cache issues were found when a swarm is modified via `add_particles_with_coordinates()` and then used for interpolation or projection. All three cause silent data corruption or MPI deadlock (PETSc < 3.24) and affect the common pattern of re-adding particles to empty cells each timestep.

---

### Bug 1 — Stale kd-tree in `add_particles_with_coordinates`

**File:** `src/underworld3/swarm.py:3571-3577`

`add_particles_with_coordinates()` calls `self.dm.migrate()` (the raw PETSc DMSwarm migration) rather than `self.migrate()` (the UW3 wrapper). The manual cache invalidation at lines 3574-3577 clears `_particle_coordinates._canonical_data` and each variable's `_canonical_data`, but **misses `self._kdtree`**:

```python
# Lines 3574-3577 (BEFORE fix)
self._particle_coordinates._canonical_data = None
for var in self._vars.values():
    if hasattr(var, "_canonical_data"):
        var._canonical_data = None
# missing: self._kdtree = None
```

By contrast, `Swarm._invalidate_canonical_data()` at line 2648 correctly sets `self._kdtree = None`. It is called by `self.migrate()` (line 3463), but `add_particles_with_coordinates` bypasses that path.

**Effect:** After adding particles, `swarm._get_kdtree()` returns a kd-tree built from OLD particle coordinates. RBF interpolation (both for proxy mesh variables and for `uw.function.evaluate()` with `rbf=True`) looks up particle indices from the stale tree, accessing wrong PETSc memory locations. On PETSc 3.22.2 this produces an MPI deadlock inside the kd-tree query → `rbf_evaluate` → `update_lvec` path; on PETSc 3.24.2 it silently returns wrong interpolated values.

**Fix applied:** Added `self._kdtree = None` after line 3573.
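
One way to guard against this kind of omission is to route every mutating method through a single invalidation helper, as `self.migrate()` already does via `_invalidate_canonical_data()`. The sketch below is a toy illustration only, not UW3 code: `ToySwarm`, `_invalidate_caches`, `add_particles`, and `nearest` are hypothetical names, and a sorted list stands in for the kd-tree.

```python
import bisect


class ToySwarm:
    """Toy 1-D stand-in for a swarm with a lazily built spatial index."""

    def __init__(self, coords):
        self._coords = list(coords)
        self._index = None  # plays the role of self._kdtree

    def _invalidate_caches(self):
        # Single place that clears every derived cache, so a new
        # mutating method cannot forget one of them.
        self._index = None

    def _get_index(self):
        # Rebuild lazily; a sorted list stands in for the kd-tree.
        if self._index is None:
            self._index = sorted(self._coords)
        return self._index

    def add_particles(self, new_coords):
        self._coords.extend(new_coords)
        self._invalidate_caches()

    def nearest(self, x):
        # Compare the neighbours on either side of the insertion point.
        index = self._get_index()
        pos = bisect.bisect_left(index, x)
        candidates = index[max(pos - 1, 0):pos + 1]
        return min(candidates, key=lambda c: abs(c - x))


swarm = ToySwarm([0.0, 1.0])
swarm.nearest(0.4)           # builds the index from the old coordinates
swarm.add_particles([0.5])   # invalidates the index
print(swarm.nearest(0.4))    # 0.5, found only because the index was rebuilt
```

If `add_particles` skipped `_invalidate_caches()`, the second query would still return `0.0`, which mirrors the stale kd-tree lookup described above.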

---

### Bug 2 — Stale cached projector in `_project_to_work_variable`

**File:** `src/underworld3/function/_function.pyx:529-642`

`_project_to_work_variable()` caches `Projection` solver instances on the mesh object as `_eval_projector_scalar` (scalar) or `_eval_{shape}_projector` (tensor). The solver is created once and reused across all subsequent `evaluate()` calls on that mesh:

```python
if not hasattr(mesh, "_eval_projector_scalar"):
    mesh._eval_projector_scalar = uw.systems.Projection(mesh, ...)
projector = mesh._eval_projector_scalar
projector.uw_function = scalar_expr
projector.solve(zero_init_guess=False)  # no _force_setup
```

When a Stokes solve (or any other solver that modifies the DM) runs between two `evaluate()` calls, the cached projector's PETSc solver state (SNES/KSP/matrix decomposition) is stale. On PETSc 3.22.2 the `projector.solve()` call deadlocks because the cached matrix doesn't match the current DM state. PETSc 3.24.2 tolerates the mismatch but silently returns wrong results.

**Fix applied:** Changed both the scalar projector (line 640) and the tensor projector (line 613) to pass `_force_setup=True`:
```python
projector.solve(zero_init_guess=False, _force_setup=True)
```

**Note:** The same stale-cached-projector pattern exists in user code that reuses `Projection` solver instances across timesteps or after Stokes solves. Any cached projection solver should either (a) pass `_force_setup=True` on every solve, or (b) track a DM version counter and auto-rebuild when the DM changes.
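
Option (b) could look roughly like the sketch below. It is a toy model under stated assumptions, not the UW3 or PETSc API: `ToyDM`, its `version` attribute, `modify()`, and `CachedSolver` are all hypothetical names introduced for illustration, and the "setup" is just a counter.

```python
class ToyDM:
    """Toy stand-in for a DM; `version` is bumped on every structural change."""

    def __init__(self):
        self.version = 0

    def modify(self):
        self.version += 1


class CachedSolver:
    """Rebuilds its (toy) solver state whenever the DM version has moved on."""

    def __init__(self, dm):
        self._dm = dm
        self._built_version = None  # version the cached state was built for
        self.setup_count = 0

    def _setup(self):
        self.setup_count += 1
        self._built_version = self._dm.version

    def solve(self):
        # Rebuild only when the cached state no longer matches the DM.
        if self._built_version != self._dm.version:
            self._setup()
        return self._built_version


dm = ToyDM()
solver = CachedSolver(dm)
solver.solve()             # first call: setup runs
solver.solve()             # DM unchanged: cached state is reused
dm.modify()                # e.g. another solver touched the DM
solver.solve()             # version mismatch: setup runs again
print(solver.setup_count)  # 2
```

Compared with passing `_force_setup=True` on every solve, a counter of this kind would avoid rebuilding when nothing has changed, at the cost of requiring every DM-modifying path to bump the version.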

---

### Bug 3 — Stale proxy mesh variable data after swarm write

**File:** `src/underworld3/swarm.py:1034-1087` (proxy update pipeline)

When a `SwarmVariable` has `proxy_degree > 0` (the default is `proxy_degree=2`), a proxy `MeshVariable` is created that stores RBF-interpolated values from the swarm. The update is lazy:

1. `swarm.access(var)` modifies the canonical data array
2. On exit, `delay_callbacks_global` fires the data callback
3. The callback calls `pack_raw_data_to_petsc()` (line 478), which writes to PETSc and calls `self._update()` (line 1291), setting `self._proxy_stale = True`
4. The actual re-interpolation (`_rbf_to_meshVar`) happens only when `material.sym` is accessed or `_update_proxy_if_stale()` is called

**The problem:** If code reads the proxy's `MeshVariable` DM directly (e.g., a `Projection` solver that evaluates its `uw_function` at quadrature points), it reads STALE data from the proxy's PETSc DM, because the lazy update hasn't fired yet.

**Concrete scenario:**
```python
# meshMat is a Projection solver created earlier (construction omitted)
material = swarm.add_variable("material", 1, dtype=int, proxy_degree=2)
meshMat.uw_function = material.sym[0]  # triggers proxy update, stores symbol

# ... add particles and set new material values ...
meshMat.solve(_force_setup=True)
# ^ evaluates stored proxy symbol at quadrature points
# ^ proxy DM still contains data from the FIRST sym access: STALE
```

**Why `uw.function.evaluate(material.sym[0], ...)` works:** It re-accesses `material.sym`, which calls `_update_proxy_if_stale()` and re-interpolates from the current swarm.

**Fix needed:** Either:
- (a) Document that `_update_proxy_if_stale()` must be called before using the proxy MeshVariable DM after a swarm write
- (b) Make the evaluation pipeline check for stale proxies and auto-update before reading
- (c) Remove the lazy proxy update pattern and update immediately on data write
- (d) Add proxy update hooks in `add_particles_with_coordinates` and other swarm-mutating methods
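
The difference between the failing direct read and option (b) can be sketched with a toy model. Everything here is hypothetical and only mimics the stale-flag pattern described above: `ToyProxy`, `write`, `read_raw`, and `read` are illustrative names, and a plain list copy stands in for the RBF re-interpolation.

```python
class ToyProxy:
    """Toy model of a lazily updated proxy; a list copy stands in for RBF."""

    def __init__(self, source):
        self._source = list(source)     # stands in for the swarm data
        self._mesh_data = list(source)  # stands in for the proxy's DM data
        self._stale = False

    def write(self, values):
        # Swarm write: mark the proxy stale, defer the re-interpolation.
        self._source = list(values)
        self._stale = True

    def _update_if_stale(self):
        if self._stale:
            self._mesh_data = list(self._source)
            self._stale = False

    def read_raw(self):
        # Direct read of the mesh-side data, bypassing the stale check.
        return list(self._mesh_data)

    def read(self):
        # Guarded read: refresh first, then return the mesh-side data.
        self._update_if_stale()
        return list(self._mesh_data)


proxy = ToyProxy([1, 1])
proxy.write([2, 2])
print(proxy.read_raw())  # [1, 1], the stale values
print(proxy.read())      # [2, 2], refreshed before reading
```

In this model, option (a) corresponds to asking callers to invoke the update themselves before `read_raw()`, while option (c) corresponds to doing the copy inside `write()`.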

---

## Reproduction

The test file `tests/test_0112_swarm_add_particles.py` contains `test_proxy_updates_after_add_particles`, which reproduces Bug 1 (kd-tree) and Bug 3 (proxy staleness). Bug 2 was reproduced on Setonix HPC (PETSc 3.22.2) and confirmed locally on macOS (PETSc 3.24.2).

## Environment

- PETSc 3.22.2 (Setonix HPC): deadlocks on Bugs 1 and 2
- PETSc 3.24.2 (macOS): silently returns wrong values on Bugs 1 and 2
- Underworld3 `development` branch (as of 2026-05-29)

## Related Files

- `src/underworld3/swarm.py` — lines 3501, 3571-3577 (Bug 1), lines 1034-1087 (Bug 3)
- `src/underworld3/function/_function.pyx` — lines 529-642 (Bug 2)
