reproject: out-of-range explicit nodata corrupts integer output and contradicts attrs #2572

@brendancol

Describe the bug

reproject(raster, target_crs, nodata=...) does not check that an explicit nodata value fits the source/output integer dtype. When it does not fit, the cast-back step in the worker wraps or clamps it (to 0 in the reproduction below), so output pixels that fall outside the source extent look the same as valid zero pixels. attrs["nodata"] still records the original out-of-range value, so the array and the metadata disagree.

Relevant lines:

  • xrspatial/reproject/_crs_utils.py:146 -- _detect_nodata lets any finite explicit nodata pass through unchanged, even when it does not fit the dtype.
  • xrspatial/reproject/__init__.py:435 -- the per-chunk worker clips the float result to iinfo.min..iinfo.max and casts to the integer dtype, so -9999 written into a uint8 array lands at 0.
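The clamping described in the second bullet can be reproduced with NumPy alone. This is a minimal sketch, not the code in xrspatial/reproject/__init__.py: the helper name cast_back is hypothetical and only mirrors the clip-then-cast step as the bullet describes it.

```python
import numpy as np


def cast_back(result, dtype):
    """Hypothetical stand-in for the worker's cast-back step: clip, then cast."""
    info = np.iinfo(dtype)
    return np.clip(result, info.min, info.max).astype(dtype)


# Float working buffer: a valid zero, a valid 10, and a pixel filled with nodata.
nodata = -9999.0
result = np.array([0.0, 10.0, nodata])

out = cast_back(result, np.uint8)
print(out.tolist())      # [0, 10, 0]
print(out[0] == out[2])  # True: the nodata pixel now equals a valid zero
```

After the cast, nothing in the array distinguishes the third pixel from the first, which is the ambiguity this issue reports.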

Reproduction

import numpy as np
import xarray as xr
from xrspatial.reproject import reproject

arr = np.ones((4, 4), dtype=np.uint8) * 10
da = xr.DataArray(
    arr,
    dims=["y", "x"],
    coords={"y": np.linspace(40, 30, 4), "x": np.linspace(-5, 5, 4)},
    attrs={"crs": "EPSG:4326"},
)
out = reproject(da, "EPSG:4326", nodata=-9999, bounds=(-20, 20, 20, 50))

print(out.dtype)                  # uint8
print(out.attrs["nodata"])        # -9999.0
print(np.unique(out.values))      # [0 10]
print(out.values[0, 0])           # 0, supposed to be nodata

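Every 0 in this output is corrupted nodata, since the input contains only 10s. In a raster that also holds genuine zeros, valid 0 and corrupted nodata 0 would appear side by side with no way to tell them apart. In both cases attrs["nodata"] advertises a value that does not match anything in the array.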

Expected behavior

When an explicit nodata cannot be represented in the resolved dtype, reproject should fail fast with a clear ValueError naming the dtype and the offending value. This matches what xrspatial.rasterize already does when fill=NaN is passed with an integer dtype.

Suggested fix

In _detect_nodata (or at the API boundary in reproject), reject explicit integer nodata that falls outside np.iinfo(dtype).min..np.iinfo(dtype).max. NaN passed for an integer dtype is already swapped for a representable sentinel; this is the symmetric guard for the finite-but-out-of-range case.

Tests should cover:

  • uint8 with nodata=-9999 raises
  • uint16 with nodata=70000 raises
  • uint8 with nodata=255 still works (boundary case)
  • int16 with nodata=-32768 still works (lower boundary case)
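A minimal sketch of such a guard, exercised against the four cases listed above. The helper name check_nodata_fits and the wording of its error message are assumptions for illustration. The actual fix may live in _detect_nodata or in reproject itself and look different.

```python
import numpy as np


def check_nodata_fits(nodata, dtype):
    """Hypothetical guard: raise if an explicit nodata cannot be stored in an integer dtype."""
    dtype = np.dtype(dtype)
    if nodata is None or not np.issubdtype(dtype, np.integer):
        return  # no explicit value, or a float dtype: nothing to check here
    if not np.isfinite(nodata):
        return  # non-finite values are left to the existing NaN handling
    info = np.iinfo(dtype)
    if not (info.min <= nodata <= info.max):
        raise ValueError(
            f"nodata={nodata!r} is outside the range of {dtype.name} "
            f"[{info.min}, {info.max}]"
        )


def raises(nodata, dtype):
    """Return True if the guard rejects this nodata/dtype combination."""
    try:
        check_nodata_fits(nodata, dtype)
    except ValueError:
        return True
    return False


# (dtype, nodata, expected to raise) for the four cases in the list above.
cases = [
    (np.uint8, -9999, True),
    (np.uint16, 70000, True),
    (np.uint8, 255, False),
    (np.int16, -32768, False),
]
for dtype, nodata, expected in cases:
    assert raises(nodata, dtype) is expected
```

Doing the check before any chunk is processed keeps the failure at the API boundary, so the error can name the dtype and the offending value instead of surfacing as silently altered pixels.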

Metadata

Labels: bug (Something isn't working), input-validation (Input validation and error messages)
