Fix per-band nodata masking for multi-band reproject#2658
Merged
Conversation
Dedupe duplicate module rows (last-write-wins by last_inspected) and collapse multi-line notes to single physical lines. The notes had embedded newlines, which the merge=union .gitattributes strategy splits record-by-record, corrupting the file into a 156-column phantom row on parallel-agent appends. One line per record keeps union merges safe.
`_detect_nodata_raw` read only the first entry of the `nodatavals` tuple, so the 3-D worker masked every band with band 0's sentinel. Bands with a different declared nodata leaked their invalid pixels into the output as valid data. Add `_detect_band_nodata`, which resolves a per-band source sentinel tuple from `nodatavals` (or None when one scalar covers every band). Thread it through the chunk workers so each band is masked with its own sentinel before resampling, across numpy, cupy, dask+numpy, and dask+cupy. The output still uses a single resolved sentinel since every band collapses to it after the resample; `nodatavals` keeps the original band count. Add a regression test class covering all four backends plus the detection helper.
brendancol
commented
May 29, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Fix per-band nodata masking for multi-band reproject
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None blocking. Worth flagging in passing: _detect_band_nodata assumes the nodatavals tuple is in the same order as the band axis (rasterio's positional convention). That's the standard and the rest of the module already assumes it, so this matches existing behavior. No change needed.
Nits (optional improvements)
xrspatial/reproject/_crs_utils.py: the per-band float equality mask (band_data == src_nd) mirrors the existing single-band path, so it stays consistent. Float sentinels that aren't bit-exact wouldn't match, but that's a pre-existing property of the module, not introduced here.
What looks good
- Root cause is fixed at the source: each band is masked with its own sentinel before resampling, so band 1+ sentinels no longer leak through as valid data.
- The fix threads one optional
band_nodataargument through every dispatch path (in-memory numpy/cupy, dask, dask+cupy, streaming) without changing the scalar output-sentinel contract. Out-of-bounds fills and the integer cast-back still use the single resolvednd, so output contents and thenodatavalsband count stay consistent. _detect_band_nodatareturns None for the cases that don't need per-band handling (single band, explicit nodata arg, uniform tuple, length mismatch, bare scalar), so the common path is unchanged.- Test coverage hits all four backends plus the helper. The regression test was confirmed to fail on the pre-fix scalar path (band 1's 255 leaks) and pass with the fix.
Checklist
- Algorithm matches intended behavior (per-band source masking)
- All four backends route through the same per-band logic
- NaN handling correct (None entries become NaN, leaving that band unmasked)
- Edge cases covered (single-entry tuple, length mismatch, None entry, bare scalar)
- Dask chunk boundaries unchanged; band axis stays one chunk
- No premature materialization introduced; lazy paths preserved
- Benchmark not needed (bug fix, no perf change)
- README feature matrix not applicable (no new function, no backend change)
- Docstrings present on the new helper
# Conflicts: # xrspatial/reproject/__init__.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2647
What changed
_detect_nodata_rawread onlynodatavals[0], so the 3-D reproject worker masked every band with band 0's sentinel. A raster whose bands declared different nodata values leaked the other bands' invalid pixels into the output as valid data._detect_band_nodata, which resolves a per-band source sentinel tuple fromnodatavals(orNonewhen one scalar covers every band: single-band rasters, an explicitnodata=arg, or a uniform tuple).nodatavalsband count stay consistent.Backend coverage
numpy, cupy, dask+numpy, dask+cupy. All four route through
_reproject_chunk_numpy/_reproject_chunk_cupy, which now take an optionalband_nodatasequence.Test plan
TestPerBandNodatacovers numpy, dask+numpy, cupy, and dask+cupy with a 3-band raster carrying distinct per-band sentinels (-9999,255,0), plus the_detect_band_nodatahelper.255leaks) and passes with the fix.test_reproject.pysuite passes (387 tests). CuPy cases skip without a CUDA runtime.