Summary
The cupy polygonize backend groups float pixels into regions using exact equality (data == v) while the numpy (and therefore dask and dask+cupy) backend uses tolerance-based comparison via _is_close (atol=1e-8, rtol=1e-5). This produces different polygons for the same float input depending on the backend.
Reproducer
import numpy as np
import xarray as xr
import cupy as cp
from xrspatial.polygonize import polygonize
data = np.array([
[1.0, 1.000001, 2.0],
[1.000001, 1.0, 2.0],
[3.0, 3.0, 2.0],
], dtype=np.float64)
vals_np, polys_np = polygonize(xr.DataArray(data))
vals_cp, polys_cp = polygonize(xr.DataArray(cp.asarray(data)))
print(vals_np) # [1.0, 2.0, 3.0] -> 3 polygons
print(vals_cp) # [1.0, 1.000001, 2.0, 1.000001, 1.0, 3.0] -> 6 polygons
Root cause
xrspatial/polygonize.py::_calculate_regions (numba CPU path) calls _is_close for adjacency checks:
matches_W = ... _is_close(values[ij], values[ij-1])
_is_close uses abs(value - reference) <= atol + rtol*abs(reference) for float types.
xrspatial/polygonize.py::_calculate_regions_cupy instead bins pixels by exact value:
unique_vals = cp.unique(data[valid] if valid is not None else data.ravel())
for v in unique_vals:
bin_mask = (data == v)
...
Impact
- HIGH: numerically different results on identical float input across backends.
- The dask and dask+cupy backends both route per-chunk work through
_polygonize_numpy, so only the pure-cupy backend diverges.
- Documented or not, two of the four backends produce a different polygon set on the same float input.
Suggested fix
In _calculate_regions_cupy, group near-equal float values before per-value labeling. For float dtypes, sort unique_vals, walk them, and merge any consecutive pair where the cpu-side _is_close predicate would return True. Then label each merged group with a single (data == merged_values).any(axis=...) style mask. For integer dtypes, keep the existing exact-equality path (it already matches the numba path).
The check should match _is_close exactly so the two backends produce equivalent regions on float input.
Discovered by the /sweep-accuracy audit (Cat 5 - Backend Inconsistency).
Summary
The
cupypolygonize backend groups float pixels into regions using exact equality (data == v) while thenumpy(and therefore dask and dask+cupy) backend uses tolerance-based comparison via_is_close(atol=1e-8,rtol=1e-5). This produces different polygons for the same float input depending on the backend.Reproducer
Root cause
xrspatial/polygonize.py::_calculate_regions(numba CPU path) calls_is_closefor adjacency checks:_is_closeusesabs(value - reference) <= atol + rtol*abs(reference)for float types.xrspatial/polygonize.py::_calculate_regions_cupyinstead bins pixels by exact value:Impact
_polygonize_numpy, so only the pure-cupy backend diverges.Suggested fix
In
_calculate_regions_cupy, group near-equal float values before per-value labeling. For float dtypes, sortunique_vals, walk them, and merge any consecutive pair where the cpu-side_is_closepredicate would return True. Then label each merged group with a single(data == merged_values).any(axis=...)style mask. For integer dtypes, keep the existing exact-equality path (it already matches the numba path).The check should match
_is_closeexactly so the two backends produce equivalent regions on float input.Discovered by the
/sweep-accuracyaudit (Cat 5 - Backend Inconsistency).