Describe the bug
reproject(raster, target_crs, nodata=...) does not check that the explicit nodata value fits the source/output integer dtype. When it does not fit, the cast-back step in the worker wraps or clamps it (usually to 0), so out-of-bounds output pixels look the same as valid zero pixels. attrs["nodata"] still records the original out-of-range value, so the array and the metadata disagree.
Relevant lines:
xrspatial/reproject/_crs_utils.py:146 -- _detect_nodata lets any finite explicit nodata pass through unchanged, even when it does not fit the dtype.
xrspatial/reproject/__init__.py:435 -- the per-chunk worker clips the float result to iinfo.min..iinfo.max and casts to the integer dtype, so -9999 written into a uint8 array lands at 0.
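The clip-then-cast step described above can be illustrated with plain NumPy. This is a minimal sketch of the arithmetic only, not the worker's actual code; the buffer `buf` and its contents are assumptions chosen to mirror the uint8 / -9999 case.

```python
import numpy as np

# Assumed float working buffer: one valid pixel (10.0) and one
# out-of-bounds pixel filled with the requested nodata (-9999.0).
buf = np.array([10.0, -9999.0], dtype=np.float64)

info = np.iinfo(np.uint8)
clipped = np.clip(buf, info.min, info.max)  # -9999.0 is clamped to 0.0
as_uint8 = clipped.astype(np.uint8)

print(as_uint8.tolist())  # [10, 0]
```

After the cast, the nodata pixel holds 0, which is also a legal data value for uint8.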
Reproduction
import numpy as np
import xarray as xr

from xrspatial.reproject import reproject

arr = np.ones((4, 4), dtype=np.uint8) * 10
da = xr.DataArray(
    arr,
    dims=["y", "x"],
    coords={"y": np.linspace(40, 30, 4), "x": np.linspace(-5, 5, 4)},
    attrs={"crs": "EPSG:4326"},
)

out = reproject(da, "EPSG:4326", nodata=-9999, bounds=(-20, 20, 20, 50))
print(out.dtype)            # uint8
print(out.attrs["nodata"])  # -9999.0
print(np.unique(out.values))  # [0 10]
print(out.values[0, 0])     # 0, supposed to be nodata
Both 0 (valid input) and 0 (corrupted nodata) appear in the output with no way to tell them apart, and attrs["nodata"] advertises a value that does not match anything in the array.
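The practical consequence for downstream code can be sketched without xrspatial. The array below is a hand-built stand-in for the reprojected output (an assumption, not real `reproject` output): masking by the advertised nodata selects nothing.

```python
import numpy as np

# Stand-in for the output array: 10 = valid data, 0 = clamped nodata.
values = np.array([[0, 10], [10, 0]], dtype=np.uint8)
nodata = -9999.0  # what attrs["nodata"] reports

# The usual "mask by nodata" idiom finds no nodata pixels at all.
mask = values == nodata
print(bool(mask.any()))  # False
```

Any consumer that trusts `attrs["nodata"]` would therefore treat the out-of-bounds pixels as valid zeros.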
Expected behavior
When an explicit nodata cannot be represented in the resolved dtype, reproject should fail fast with a clear ValueError naming the dtype and the offending value. This matches what xrspatial.rasterize already does when fill=NaN is passed with an integer dtype.
Suggested fix
In _detect_nodata (or at the API boundary in reproject), when the resolved dtype is an integer type, reject any explicit finite nodata that falls outside np.iinfo(dtype).min..np.iinfo(dtype).max. NaN passed for an integer dtype is already swapped for a representable sentinel; this is the symmetric guard for the finite-but-out-of-range case.
Tests should cover:
uint8 with nodata=-9999 raises
uint16 with nodata=70000 raises
uint8 with nodata=255 still works (boundary case)
int16 with nodata=-32768 still works (in-range, lower bound)
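One possible shape for the guard, together with the four cases listed above, is sketched here. The helper name `check_nodata_fits` and its error message are hypothetical and not part of xrspatial; the real fix might live inside _detect_nodata or reproject and could differ in detail.

```python
import math

import numpy as np


def check_nodata_fits(nodata, dtype):
    """Hypothetical guard: raise if a finite nodata cannot be stored in an integer dtype."""
    dtype = np.dtype(dtype)
    # Only integer dtypes have hard bounds; NaN/inf are left to the
    # existing sentinel handling and are not checked here.
    if not np.issubdtype(dtype, np.integer) or not math.isfinite(nodata):
        return
    info = np.iinfo(dtype)
    if not (info.min <= nodata <= info.max):
        raise ValueError(
            f"nodata={nodata!r} is not representable in dtype {dtype.name} "
            f"(valid range {info.min}..{info.max})"
        )


# (dtype, nodata, should_be_accepted) for the cases listed above.
cases = [
    (np.uint8, -9999, False),
    (np.uint16, 70000, False),
    (np.uint8, 255, True),
    (np.int16, -32768, True),
]
for dtype, nodata, should_pass in cases:
    try:
        check_nodata_fits(nodata, dtype)
        accepted = True
    except ValueError:
        accepted = False
    assert accepted == should_pass, (dtype, nodata)
```

The sketch only checks the range; whether a non-integral value such as 1.5 should also be rejected for an integer dtype is a separate decision it does not address.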