Describe the bug
The Dask backend of polygonize does not match the NumPy polygon count under 8-connectivity. The docstring at xrspatial/polygonize.py:1646 already notes that extra polygon splits happen at chunk corners.
This isn't a cosmetic mismatch. Region counts, joins, and area statistics computed downstream will disagree between the two backends.
Reproduction
import numpy as np
import xarray as xr
import dask.array as da
from xrspatial.polygonize import polygonize
arr = np.array([[1, 0], [0, 1]], dtype=np.int32)
# NumPy backend: 2 polygons
raster_np = xr.DataArray(arr)
vals_np, polys_np = polygonize(raster_np, connectivity=8)
print(len(vals_np)) # 2
# Dask backend with chunks (1, 1): 4 polygons
raster_dk = xr.DataArray(da.from_array(arr, chunks=(1, 1)))
vals_dk, polys_dk = polygonize(raster_dk, connectivity=8)
print(len(vals_dk)) # 4
Under connectivity=8, the two 1 cells are diagonally adjacent and should merge into a single polygon. Same for the two 0 cells. NumPy reports 2 polygons. Dask with chunks=(1, 1) reports 4 because each cell sits in its own chunk and the diagonal connection across the chunk corner is not detected.
Expected behavior
Dask should match NumPy's polygon count and geometry for the same input and connectivity value, regardless of chunking.
Fix direction
The Dask chunk-stitching logic handles edges between adjacent chunks today, which covers 4-connectivity. For 8-connectivity it also needs a corner-merge pass over the four cells meeting at every interior chunk corner. The pass should look at the 2x2 corner neighborhood at every interior (chunk_y_boundary, chunk_x_boundary) and merge the two diagonal cells when they share the same value, reusing the existing float tolerance helper.
The pass needs to stay chunked. Materializing the full array would defeat the point of the Dask path on large rasters.
Test coverage to add
Compare polygon count and per-region area between NumPy and Dask (multi-chunk) under connectivity=8 on:
- The repro
[[1, 0], [0, 1]]
- A larger checkerboard
- A diagonal stripe that crosses several chunk corners
Docstring
Update or remove the caveat at xrspatial/polygonize.py:1646 once the fix lands.
Describe the bug
The Dask backend of
polygonizedoes not match the NumPy polygon count under 8-connectivity. The docstring atxrspatial/polygonize.py:1646already notes that extra polygon splits happen at chunk corners.This isn't a cosmetic mismatch. Region counts, joins, and area statistics computed downstream will disagree between the two backends.
Reproduction
Under
connectivity=8, the two1cells are diagonally adjacent and should merge into a single polygon. Same for the two0cells. NumPy reports 2 polygons. Dask withchunks=(1, 1)reports 4 because each cell sits in its own chunk and the diagonal connection across the chunk corner is not detected.Expected behavior
Dask should match NumPy's polygon count and geometry for the same input and
connectivityvalue, regardless of chunking.Fix direction
The Dask chunk-stitching logic handles edges between adjacent chunks today, which covers 4-connectivity. For 8-connectivity it also needs a corner-merge pass over the four cells meeting at every interior chunk corner. The pass should look at the 2x2 corner neighborhood at every interior
(chunk_y_boundary, chunk_x_boundary)and merge the two diagonal cells when they share the same value, reusing the existing float tolerance helper.The pass needs to stay chunked. Materializing the full array would defeat the point of the Dask path on large rasters.
Test coverage to add
Compare polygon count and per-region area between NumPy and Dask (multi-chunk) under
connectivity=8on:[[1, 0], [0, 1]]Docstring
Update or remove the caveat at
xrspatial/polygonize.py:1646once the fix lands.