rasterize: reject NaN fill against integer dtype (#2504) #2512
brendancol wants to merge 3 commits into
When ``rasterize(..., dtype=<int>)`` runs with the default ``fill=np.nan``, the trailing ``out.astype(int_dtype)`` silently coerced NaN to a platform-specific sentinel (``np.iinfo(dtype).min`` on x86, ``0`` on Apple Silicon, ``0`` for unsigned dtypes) and the function emitted no ``_FillValue`` / ``nodata`` / ``nodatavals`` attr. Downstream tooling (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as valid burns.

Guard the cast up front: when the resolved output dtype is integer and the fill is NaN, raise ``ValueError`` with a pointer to ``fill=0`` / ``fill=-9999`` or a floating dtype. The check runs before any host or device allocation, so all four backends (numpy / cupy / dask+numpy / dask+cupy) trip it identically.

The previous ``TestIntegerDtypeNanFill`` test had pinned the silent cast as observed behaviour on 2026-05-17; rewrite it to pin the raise.

Add ``test_rasterize_nan_int_fill_2504.py`` with 18 cases covering every signed/unsigned int width, the ``like=`` integer-dtype branch, all four backends, explicit and default NaN, the numpy-typed NaN edge case, and the unaffected float-dtype path.
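
The guard described above might look roughly like the following. This is a minimal sketch, not the code from the PR: ``check_fill_for_dtype`` is a hypothetical stand-in name, the error text is illustrative, and the real check lives inside ``rasterize()`` where the dtype may also be resolved from ``like=``.

```python
import numpy as np


def check_fill_for_dtype(fill, dtype):
    """Hypothetical stand-in for the NaN/int-dtype guard (not xrspatial source)."""
    resolved = np.dtype(dtype)
    # Only float-typed fills can be NaN, so test the type before calling isnan.
    fill_is_nan = isinstance(fill, (float, np.floating)) and bool(np.isnan(fill))
    if np.issubdtype(resolved, np.integer) and fill_is_nan:
        raise ValueError(
            f"fill=NaN cannot be represented in integer dtype {resolved}; "
            "pass fill=0 or fill=-9999, or use a floating dtype"
        )
    return resolved
```

Because a check of this shape only inspects ``fill`` and ``dtype``, it can run before any array is allocated, which is what lets one guard cover every backend.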
Record the 2026-05-27 metadata sweep of xrspatial.rasterize. Re-verified the previously-fixed attrs / coords / dims propagation across all four backends and surfaced one new HIGH finding (Cat 4): NaN fill against an integer output dtype silently coerced to a platform sentinel with no _FillValue attr emitted. Fix and tests landed in the previous commit (issue #2504).
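
The silent coercion behind this finding can be reproduced with plain NumPy, independent of xrspatial. The snippet below is only an illustration of the cast itself; the printed value is deliberately not stated because, as the finding notes, it depends on the platform.

```python
import warnings

import numpy as np

# A float grid where no cell was ever written, so every cell is still NaN.
out = np.full((2, 2), np.nan, dtype=np.float64)

with warnings.catch_warnings():
    # Recent NumPy releases warn about invalid values in this cast;
    # silence that here so the coercion itself is what gets shown.
    warnings.simplefilter("ignore", RuntimeWarning)
    as_int = out.astype(np.int32)

# The result is a valid int32 array with no marker left for "unwritten".
print(as_int.dtype, as_int[0, 0])
```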
Three review fixes on the NaN/int-dtype guard added in the previous commit:

1. Inline comment lied about ordering. The "and before ``_check_output_dimensions``" line claimed the NaN guard ran before the width/height/max_pixels guard, but the code places it after. Rewrite the comment to match the actual order and note why the dimension check goes first (it produces a more actionable diagnostic for oversized grids; both checks still land before any allocation).

2. Docstring for ``fill`` did not mention the new restriction. Add a note pointing the user at ``fill=0`` / ``fill=-9999`` or a floating dtype when ``dtype`` resolves to an integer type, so the constraint surfaces at read time instead of at runtime.

3. ``isinstance(fill, (int, float, np.integer, np.floating))`` listed the two integer types redundantly: ``np.isnan`` on an integer is always False, so an int-typed fill could never reach the raise branch. Drop ``int`` and ``np.integer`` from the tuple; the semantics are unchanged (verified against the same 18-case regression file plus a manual smoke run with bool / np.int32 / np.float64(nan) fills).
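
The claim in item 3 can be checked in isolation. The sketch below compares the old and new type tuples over a handful of fills; ``is_nan_fill`` and the candidate list are illustrative and not taken from the PR.

```python
import numpy as np

OLD_TYPES = (int, float, np.integer, np.floating)
NEW_TYPES = (float, np.floating)


def is_nan_fill(fill, types):
    # The isinstance test short-circuits, so np.isnan only sees numeric fills.
    return isinstance(fill, types) and bool(np.isnan(fill))


candidates = [
    0, -9999, True, np.int32(7), np.uint8(3), 1.5,
    float("nan"), np.float32("nan"), np.float64("nan"),
]

# Integer and bool fills are never NaN, so both tuples agree on every fill.
for fill in candidates:
    assert is_nan_fill(fill, OLD_TYPES) == is_nan_fill(fill, NEW_TYPES)
```

``bool`` is a subclass of ``int`` in Python, so the old tuple admitted ``True`` / ``False`` to the ``np.isnan`` call while the new one does not; the outcome is the same either way because neither is NaN.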
Review follow-up
Suggestion 1: inline comment contradicted the actual code order
Suggestion 2: docstring didn't mention the new constraint
Nit 3: redundant types in the isinstance tuple
Closes #2504.
Summary
- `rasterize(..., dtype=<int>)` with the default `fill=np.nan` used to silently coerce NaN to a platform-specific sentinel (`INT_MIN` on x86, `0` on Apple Silicon, `0` for unsigned) and emit no `_FillValue` / `nodata` / `nodatavals` attr to mark unwritten pixels. Downstream tooling (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as valid burns. Surfaced by the 2026-05-27 metadata sweep.
- When the resolved output dtype is integer and the fill is NaN, `rasterize()` now raises `ValueError` pointing the caller at `fill=0` / `fill=-9999` or a floating dtype. The check runs before any host or device allocation, so every backend trips it consistently.
- Rewrote `TestIntegerDtypeNanFill` (which had pinned the silent cast on 2026-05-17 as observed-but-unsupported behaviour) to pin the new raise, and added a regression file with 18 cases covering every signed/unsigned int width, the `like=` integer-dtype branch, all four backends, explicit and default NaN, the numpy-typed NaN edge case, and the unaffected float-dtype path.

Backend coverage
numpy / cupy / dask+numpy / dask+cupy. The guard sits in the public `rasterize()` entry before backend dispatch, so the raise is identical across all four. Verified live on this CUDA host.

Test plan
- `pytest xrspatial/tests/test_rasterize_nan_int_fill_2504.py -v` (18 passed)
- `pytest xrspatial/tests/ -k rasterize -q` (476 passed, 2 skipped)
- `like` with `int16` + default `fill=nan` raises; `dtype=int32, fill=-9999` round-trips through the `nodata` / `_FillValue` / `nodatavals` triplet; `float64` + default NaN unaffected.
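
To illustrate why an explicit sentinel matters to downstream consumers, here is a small NumPy-only sketch of an `int32` grid filled with `-9999`. The attr keys are the ones named in this PR, but the dict layout (including `nodatavals` as a one-element tuple) and the burned values are assumptions made for the example; no xrspatial call is involved.

```python
import numpy as np

fill = -9999
out = np.full((3, 4), fill, dtype=np.int32)
out[1, 1:3] = 5  # pretend a geometry burned the value 5 into two cells

# Assumed attr layout: the PR names these keys but not their exact values.
attrs = {"nodata": fill, "_FillValue": fill, "nodatavals": (fill,)}

# With a declared sentinel, unwritten cells can be told apart from burns.
unwritten = out == attrs["_FillValue"]
burned = np.ma.masked_array(out, mask=unwritten)
```

Without the attrs, a consumer has no way to know that `-9999` (or whatever value a NaN cast happened to produce) means "no data" rather than a real burn value.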