rasterize: reject NaN fill against integer dtype (#2504) by brendancol · Pull Request #2512 · xarray-contrib/xarray-spatial

brendancol · 2026-05-27T17:37:05Z

Closes #2504.

Summary

rasterize(..., dtype=<int>) with the default fill=np.nan used to silently coerce NaN to a platform-specific sentinel (INT_MIN on x86, 0 on Apple Silicon, 0 for unsigned) and emit no _FillValue / nodata / nodatavals attr to mark unwritten pixels. Downstream tooling (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as valid burns. Surfaced by the 2026-05-27 metadata sweep.
Guard the cast up front: when the resolved output dtype is integer and the fill is NaN, raise ValueError pointing the caller at fill=0 / fill=-9999 or a floating dtype. The check runs before any host or device allocation so every backend trips it consistently.
Rewrote the prior TestIntegerDtypeNanFill (which had pinned the silent cast on 2026-05-17 as observed-but-unsupported behaviour) to pin the new raise, and added a regression file with 18 cases covering every signed/unsigned int width, the like= integer-dtype branch, all four backends, explicit and default NaN, the numpy-typed NaN edge case, and the unaffected float-dtype path.

Backend coverage

numpy / cupy / dask+numpy / dask+cupy. The guard sits in the public rasterize() entry before backend dispatch, so the raise is identical across all four. Verified live on this CUDA host.

Test plan

pytest xrspatial/tests/test_rasterize_nan_int_fill_2504.py -v (18 passed)
pytest xrspatial/tests/ -k rasterize -q (476 passed, 2 skipped)
Manual smoke: like with int16 + default fill=nan raises; dtype=int32, fill=-9999 round-trips through the nodata / _FillValue / nodatavals triplet; float64 + default NaN unaffected.

When ``rasterize(..., dtype=<int>)`` runs with the default ``fill=np.nan``, the trailing ``out.astype(int_dtype)`` silently coerced NaN to a platform-specific sentinel (``np.iinfo(dtype).min`` on x86, ``0`` on Apple Silicon, ``0`` for unsigned dtypes) and the function emitted no ``_FillValue`` / ``nodata`` / ``nodatavals`` attr. Downstream tooling (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as valid burns. Guard the cast up front: when the resolved output dtype is integer and the fill is NaN, raise ``ValueError`` with a pointer to ``fill=0`` / ``fill=-9999`` or a floating dtype. The check runs before any host or device allocation, so all four backends (numpy / cupy / dask+numpy / dask+cupy) trip it identically. The previous ``TestIntegerDtypeNanFill`` test had pinned the silent cast as observed behaviour on 2026-05-17; rewrite it to pin the raise. Add ``test_rasterize_nan_int_fill_2504.py`` with 18 cases covering every signed/unsigned int width, the ``like=`` integer-dtype branch, all four backends, explicit and default NaN, the numpy-typed NaN edge case, and the unaffected float-dtype path.

Record the 2026-05-27 metadata sweep of xrspatial.rasterize. Re-verified the previously-fixed attrs / coords / dims propagation across all four backends and surfaced one new HIGH finding (Cat 4): NaN fill against an integer output dtype silently coerced to a platform sentinel with no _FillValue attr emitted. Fix and tests landed in the previous commit (issue #2504).

Three review fixes on the NaN/int-dtype guard added in the previous commit: 1. Inline comment lied about ordering. The "and before ``_check_output_dimensions``" line claimed the NaN guard ran before the width/height/max_pixels guard, but the code places it after. Rewrite the comment to match the actual order and note why the dimension check goes first (it produces a more actionable diagnostic for oversized grids; both checks still land before any allocation). 2. Docstring for ``fill`` did not mention the new restriction. Add a note pointing the user at ``fill=0`` / ``fill=-9999`` or a floating dtype when ``dtype`` resolves to an integer type, so the constraint surfaces at read time instead of at runtime. 3. ``isinstance(fill, (int, float, np.integer, np.floating))`` listed the two integer types redundantly: ``np.isnan`` on an integer is always False, so an int-typed fill could never reach the raise branch. Drop ``int`` and ``np.integer`` from the tuple; the semantics are unchanged (verified against the same 18-case regression file plus a manual smoke run with bool / np.int32 / np.float64(nan) fills).

brendancol · 2026-05-28T00:32:39Z

Review follow-up

Ran /review-pr. Three things to address on top of the guard from the previous commit. Pushed as e53f48e.

Suggestion 1: inline comment contradicted the actual code order

The block at rasterize.py:3222-3227 said the NaN guard runs "before _check_output_dimensions". It runs after. Reworded to match what the code actually does, and noted that the dimension check goes first because oversized-grid errors are more actionable. Both still land before any allocation.

Suggestion 2: docstring didn't mention the new constraint

The fill docstring (line 2994) just said "Value for pixels not covered by any geometry." No hint that an integer dtype now rejects the default NaN. Added a note pointing at fill=0 / fill=-9999 or a float dtype, so the constraint shows up at read time instead of biting at runtime.

Nit 3: redundant types in the isinstance tuple

isinstance(fill, (int, float, np.integer, np.floating)) listed both integer types pointlessly. np.isnan on an integer is always False, so an int-typed fill never gets to the raise. Dropped int and np.integer; behaviour unchanged. Re-checked with bool / np.int32 / np.float64(nan) smoke plus the 18-case regression file.

Tests

pytest xrspatial/tests/test_rasterize*.py -- 470 passed, 2 skipped on this CUDA host.

What looks good

Guard sits in the public entry before backend dispatch, so all four backends trip it identically with no per-backend duplication.
Error message names the failure mode and gives the caller both escape hatches (explicit int sentinel, or float dtype).
Rewriting TestIntegerDtypeNanFill was the right move. The old version pinned a platform-specific NaN-cast that was never documented behaviour.
18-case regression covers every signed/unsigned int width, the like=<int> branch, all four backends, explicit vs default NaN, the np.float32(nan) edge case, and the float path that the guard intentionally leaves alone.

brendancol added 2 commits May 27, 2026 10:36

github-actions Bot added the performance PR touches performance-sensitive code label May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rasterize: reject NaN fill against integer dtype (#2504)#2512

rasterize: reject NaN fill against integer dtype (#2504)#2512
brendancol wants to merge 3 commits into
mainfrom
deep-sweep-metadata-rasterize-2026-05-27

brendancol commented May 27, 2026

Uh oh!

brendancol commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 27, 2026

Summary

Backend coverage

Test plan

Uh oh!

brendancol commented May 28, 2026

Review follow-up

Suggestion 1: inline comment contradicted the actual code order

Suggestion 2: docstring didn't mention the new constraint

Nit 3: redundant types in the isinstance tuple

Tests

What looks good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant