Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 w
polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
rasterize,2026-05-21,2251,HIGH,1,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean. | Re-audited 2026-05-21 (agent-a645dc07f847ae8ae worktree, branch deep-sweep-metadata-rasterize-2026-05-21). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified: all 4 backends route through the same final xr.DataArray constructor in rasterize(); crs / spatial_ref non-dim coord / coords / dims agree across backends. NEW HIGH finding #2251 (Cat 1): when rasterize(geoms, like=template, bounds=..., width=..., height=..., resolution=...) overrides the grid relative to like, the inherited attrs['transform'] and attrs['res'] from like are propagated unchanged so they describe the template's grid, not the actual output. get_dataarray_resolution() prefers attrs['res'] over calc_res from coords, so downstream slope/aspect/proximity see the wrong cellsize. Same class as #1407 sky_view_factor bug. Fix in rasterize(): out_attrs.pop('res') / out_attrs.pop('transform') when like_attrs is present but reuse_like_coords is False (output grid != template grid). Preserves crs / nodata triplet / spatial_ref handling. 9 new tests in TestLikeStaleGridAttrs2251 cover bounds override, width/height override, resolution override, matching width/height preserves attrs, get_dataarray_resolution consistency, and parity across all 4 backends. Full rasterize test suite (224 passed, 2 skipped) clean."
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.
resample,2026-05-27,2542,MEDIUM,2;4;5,"Audited 2026-05-27 (agent-a8135a6a246ecb93c worktree, branch deep-sweep-metadata-resample-2026-05-27). Cat 2 MEDIUM + Cat 4 MEDIUM + Cat 5 MEDIUM all rolled into issue #2542. (a) 2D non-identity path dropped scalar non-dim coords like rioxarrays spatial_ref and squeezed time/band selectors; identity path (scale==1.0, agg.copy()) and 3D path (per-band xr.concat) preserved them, so the bug was path-inconsistent (Cat 5). (b) _resolve_nodata reads attrs[nodata] as a fallback sentinel but the output post-processing only refreshed _FillValue and nodatavals, leaving attrs[nodata]=-9999 alongside data that was now NaN. Fix in resample(): refresh attrs[nodata] to NaN whenever the input had it, and carry across zero-dim non-dim coords on the 2D non-identity path. 7 new tests in TestMetadataPropagation cover nodata-attr refresh, spatial_ref/scalar coord carry, identity-vs-downsample coord parity, and the explicit choice to drop spatially-shaped extra coords. 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity verified for spatial_ref carry; nodata-attr refresh verified on numpy/cupy/dask+numpy (dask+cupy non-NaN nodata masking hits a pre-existing xarray xr.where + cupy.astype quirk unrelated to this audit). Full resample test suite (175 passed) clean."
31 changes: 25 additions & 6 deletions xrspatial/resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -1384,22 +1384,41 @@ def _new_coords(vals, n_out):
out_res_x, 0.0, left, 0.0, -out_res_y, top,
)

# Resample currently emits float32 with NaN as the missing-data
# sentinel regardless of input dtype. If the input declared a
# different sentinel via `_FillValue` or `nodatavals`, replace the
# value with NaN so the metadata matches the actual data. Leave the
# keys absent when the input did not have them.
# Resample replaces sentinel pixels with NaN regardless of input
# dtype. If the input declared a sentinel via `_FillValue`,
# `nodatavals`, or the rasterio-style `nodata` attr, refresh each
# one to NaN so the metadata matches the actual data. Leave the
# keys absent when the input did not have them. `_resolve_nodata`
# reads `nodata` as a fallback, so we must refresh it too -- a
# stale finite value here would silently mismatch the masked data
# on any downstream consumer that trusts `attrs['nodata']`.
if '_FillValue' in agg.attrs:
new_attrs['_FillValue'] = float('nan')
if 'nodatavals' in agg.attrs:
old = agg.attrs['nodatavals']
new_attrs['nodatavals'] = tuple(float('nan') for _ in old)
if 'nodata' in agg.attrs:
new_attrs['nodata'] = float('nan')

# Carry across scalar (zero-dim) non-dim coords like rioxarray's
# `spatial_ref` or a squeezed `time` / `band` selector. The
# identity path (scale==1.0) preserves these via `agg.copy()`;
# the 2D non-identity path must match so chained rioxarray
# pipelines don't silently lose CRS / spatial_ref / scalar
# selector coords. Spatially-shaped non-dim coords (dims include
# ydim or xdim) are not carried because their length changed.
extra_coords = {}
for coord_name, coord in agg.coords.items():
if coord_name in (ydim, xdim):
continue # spatial dim-coords are rebuilt above
if coord.ndim == 0:
extra_coords[coord_name] = coord

result = xr.DataArray(
result_data,
name=name,
dims=agg.dims,
coords={ydim: new_y, xdim: new_x},
coords={ydim: new_y, xdim: new_x, **extra_coords},
attrs=new_attrs,
)

Expand Down
161 changes: 161 additions & 0 deletions xrspatial/tests/test_resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,167 @@ def test_nodatavals_replaced_with_nan(self):
assert len(nv) == 1
assert np.isnan(nv[0])

def test_nodata_attr_refreshed_to_nan(self):
# `_resolve_nodata` reads `attrs['nodata']` as a fallback sentinel.
# If we mask its pixels to NaN we must also refresh the attr so
# downstream consumers don't trust the now-stale finite value.
data = np.array([[-9999, -9999, 10, 10],
[-9999, -9999, 10, 10],
[20, 20, 30, 30],
[20, 20, 30, 30]], dtype=np.float32)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={'y': np.linspace(3, 0, 4), 'x': np.linspace(0, 3, 4)},
attrs={'res': (1.0, 1.0), 'nodata': -9999},
)
out = resample(raster, scale_factor=0.5, method='nearest')
assert 'nodata' in out.attrs
assert np.isnan(out.attrs['nodata'])
# And the corresponding data pixel is NaN, matching the new sentinel.
assert np.isnan(out.values[0, 0])

def test_nodata_attr_absent_stays_absent(self):
data = np.arange(16, dtype=np.float32).reshape(4, 4)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={'y': np.linspace(3, 0, 4), 'x': np.linspace(0, 3, 4)},
attrs={'res': (1.0, 1.0)},
)
out = resample(raster, scale_factor=0.5)
assert 'nodata' not in out.attrs

def test_spatial_ref_coord_preserved(self):
# rioxarray stores CRS metadata on a scalar `spatial_ref` non-dim
# coord. Dropping it on resample silently breaks chained
# ``out.rio.crs`` calls even when ``attrs['crs']`` round-trips.
data = np.arange(16, dtype=np.float32).reshape(4, 4)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0,
},
attrs={'res': (1.0, 1.0), 'crs': 'EPSG:4326'},
)
out = resample(raster, scale_factor=0.5)
assert 'spatial_ref' in out.coords
assert out.coords['spatial_ref'].values == 0

def test_scalar_band_coord_preserved(self):
# A squeezed scalar band/time selector should round-trip.
data = np.arange(16, dtype=np.float32).reshape(4, 4)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'band': 1,
'time': np.datetime64('2024-01-01'),
},
attrs={'res': (1.0, 1.0)},
)
out = resample(raster, scale_factor=0.5)
assert 'band' in out.coords
assert out.coords['band'].values == 1
assert 'time' in out.coords

def test_identity_and_downsample_coords_consistent(self):
# Cat 5: identity path (scale=1.0, agg.copy()) and the 2D
# non-identity path must agree on which non-dim coords survive.
# Before #2542 only the identity path kept `spatial_ref`.
data = np.arange(16, dtype=np.float32).reshape(4, 4)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0,
},
attrs={'res': (1.0, 1.0)},
)
identity = resample(raster, scale_factor=1.0)
downsampled = resample(raster, scale_factor=0.5)
assert set(identity.coords) - {'y', 'x'} == set(downsampled.coords) - {'y', 'x'}

def test_spatial_ref_preserved_3d(self):
# The 3D per-band path hands each band to the 2D resample(), so
# spatial_ref should ride through xr.concat back onto the stacked
# result.
data = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
raster = xr.DataArray(
data, dims=('band', 'y', 'x'),
coords={
'band': [1, 2, 3],
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0,
},
attrs={'res': (1.0, 1.0)},
)
out = resample(raster, scale_factor=0.5)
assert 'spatial_ref' in out.coords
assert out.coords['spatial_ref'].values == 0

@cuda_and_cupy_available
def test_spatial_ref_preserved_cupy(self):
# The fix lives at the shared 2D output constructor, so all four
# backends should preserve spatial_ref. Pin the cupy path
# explicitly so a backend-divergent regression would surface.
import cupy
data = cupy.arange(16, dtype=cupy.float32).reshape(4, 4)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0,
},
attrs={'res': (1.0, 1.0), 'crs': 'EPSG:4326'},
)
out = resample(raster, scale_factor=0.5)
assert 'spatial_ref' in out.coords
assert out.coords['spatial_ref'].values == 0

@dask_array_available
def test_spatial_ref_preserved_dask(self):
import dask.array as da
data = da.from_array(
np.arange(16, dtype=np.float32).reshape(4, 4), chunks=(2, 2),
)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0,
},
attrs={'res': (1.0, 1.0), 'crs': 'EPSG:4326'},
)
out = resample(raster, scale_factor=0.5)
assert 'spatial_ref' in out.coords
assert out.coords['spatial_ref'].values == 0

def test_2d_spatial_extra_coord_not_carried(self):
# A non-dim coord whose dims include the spatial axes has a
# length tied to the input resolution; it must NOT be carried
# over after a resize. Skipping it cleanly is the right
# behaviour (the user would need to resample it separately).
data = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = np.zeros((4, 4), dtype=np.float32)
raster = xr.DataArray(
data, dims=('y', 'x'),
coords={
'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'mask': (('y', 'x'), mask),
},
attrs={'res': (1.0, 1.0)},
)
out = resample(raster, scale_factor=0.5)
# Spatial extra coords drop (would otherwise shape-mismatch the output).
assert 'mask' not in out.coords

def test_other_attrs_preserved(self):
# crs, units, long_name, scales, offsets should round-trip.
data = np.arange(16, dtype=np.float32).reshape(4, 4)
Expand Down
Loading