geotiff: align eager nodata masking with the lazy/dask path for int sources (#2990)#2994
Conversation
…it (#2990) The eager numpy/GPU mask helper only promoted an integer array to float64 when at least one pixel matched the declared sentinel. The dask path declares float64 up front from the same in-range gate and stamps masked_nodata=True before any chunk decodes, so the two paths returned different dtypes and masked_nodata values for the same file when no sentinel pixel was present. Promote unconditionally in _apply_eager_nodata_mask whenever masking is on and the sentinel is maskable (finite, integer, in-range), matching the dask path and rioxarray's masked=True. The promotion gate is unchanged for out-of-range / non-finite / fractional sentinels, which stay integer with masked_nodata=False on both paths. nodata_pixels_present still records whether a pixel matched; it stays absent on the lazy path per the documented dask contract (no eager compute). Updates the eager no-hit test to the corrected contract and adds cross-backend parity tests.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: align eager nodata masking with the lazy/dask path for int sources (#2990)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None blocking. One thing worth confirming rather than changing: in _apply_eager_nodata_mask (xrspatial/geotiff/_attrs.py:1515-1517) the mask is now computed after the float64 cast (mask = arr == np.float64(nodata_int)), whereas before it ran on the integer buffer. For the in-range integer dtypes that TIFF actually uses (uint8/16, int8/16/32) the cast round-trips exactly through float64, so the equality is unchanged. A sentinel above 2^53 on an int64/uint64 source could in theory lose precision, but the original code had the same exact-cast assumption via arr.dtype.type(nodata_int), and 64-bit integer TIFF rasters with such sentinels are not a path this reader exercises. No action needed unless 64-bit integer sources become a supported tier.
Nits (optional improvements)
None.
What looks good
- The fix is the minimal change: drop the matching-pixel gate on the float promotion, keep the finite/integer/in-range gate that protects against unmatchable sentinels. The out-of-range and non-finite branches are untouched, so those still keep the integer dtype on both paths.
- The direction is correct. rioxarray's
open_rasterio(masked=True)promotes an integer source to float unconditionally, and the dask path already did the same, so aligning the eager path to the lazy path (rather than the reverse) matches the ecosystem and avoids forcing an eager compute on dask. nodata_pixels_presentkeeps its separate meaning and is still computed eagerly, so the did-any-pixel-match signal is preserved whilemasked_nodatabecomes the consistent did-masking-run flag.- The GPU eager path routes through the same helper via
_finalize_eager_read, so cupy picks up the fix without a separate kernel. - Tests cover the three relevant cases (no-hit, hit, out-of-range) and assert eager-vs-dask agreement directly. The stale eager no-hit test was updated to the corrected contract rather than deleted.
- Docstrings and
attrs_contract.rstwere updated so the contract text no longer claims promotion depends on a matching pixel.
Checklist
- Algorithm matches reference (rioxarray masked=True semantics; dask path)
- All implemented backends produce consistent results (eager numpy/GPU now match dask)
- NaN handling is correct (sentinel pixels still rewritten to NaN when present)
- Edge cases are covered by tests (no-hit, hit, out-of-range)
- Dask chunk boundaries handled correctly (dask path unchanged, used as reference)
- No premature materialization (lazy path untouched; no eager compute added)
- Benchmark not needed (no new function, masking cost unchanged in the common hit case)
- README feature matrix not applicable (no new function, no backend change)
- Docstrings present and accurate (updated for the new contract)
Scope note
The PR explicitly leaves the VRT masking helpers (_vrt.py) alone. They carry per-band sentinel handling and their own documented divergence, and the VRT lazy graph dtype comes from _effective_dtype_for_bands rather than the in-range sentinel gate, so the VRT eager/lazy story is self-contained and different from the main reader. Folding it into this PR would expand scope into a separate subsystem. A follow-up issue for VRT parity would be reasonable.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (#2990)
No code changes were needed after the first review. The first pass found no blockers, suggestions, or nits requiring edits.
Disposition of the one confirm-don't-change item (float-cast equality precision in _apply_eager_nodata_mask): dismissed. The mask now runs after the float64 cast, but for the integer dtypes TIFF actually uses (uint8/16, int8/16/32) the cast round-trips exactly, so the equality is unchanged. The prior code carried the same exact-cast assumption via arr.dtype.type(nodata_int), and 64-bit integer sentinels above 2^53 are not a path this reader exercises. No precision regression.
The full geotiff suite passes (6177 passed, 81 skipped, 1 xfailed), including the new cross-backend parity tests and the updated eager no-hit test.
Closes #2990
Problem
open_geotiff(..., masked=True)on an integer source with a declared, in-range nodata sentinel returned different results from the eager path than from the lazy (dask) path when no pixel matched the sentinel:uint16,masked_nodata=Falsefloat64,masked_nodata=TrueThe dask path declares
float64up front from the in-range sentinel gate and stampsmasked_nodata=Truebefore any chunk decodes. The eager helper only promoted tofloat64after finding a matching pixel, so the same file came back two different ways depending onchunks=. This matters becausemasked_nodatais read as semantic state: the writer's NaN-to-sentinel rewrite and the GPU writer both key off it, not just off the dtype.Fix
_apply_eager_nodata_masknow promotes a maskable integer source tofloat64whenever masking is on, independent of whether any pixel matches. This matches the dask path and rioxarray'sopen_rasterio(..., masked=True), both of which promote unconditionally. The promotion gate is unchanged for out-of-range, non-finite, and fractional sentinels, which still keep the source integer dtype and reportmasked_nodata=Falseon both paths.nodata_pixels_presentkeeps its existing meaning (did any pixel match) and stays absent on the lazy path per the documented dask contract. No eager compute is forced on the dask path.Backend coverage
_finalize_eager_read, so it picks up the fixTest plan
TestEagerLazyMaskParity2990asserts eager and dask agree on dtype andmasked_nodatafor the no-hit, hit, and out-of-range casesmasked_nodata=True+nodata_pixels_present=False)_apply_eager_nodata_mask,open_geotiff, module contract) andattrs_contract.rstScope note
The VRT eager/chunked masking helpers (
_vrt.py) carry their own per-band sentinel handling and a separate documented divergence; they are left unchanged here. Worth a follow-up look but out of scope for this fix.