Skip to content

Fix per-band nodata masking for multi-band reproject#2658

Merged
brendancol merged 3 commits into
mainfrom
issue-2647
May 29, 2026
Merged

Fix per-band nodata masking for multi-band reproject#2658
brendancol merged 3 commits into
mainfrom
issue-2647

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2647

What changed

  • _detect_nodata_raw read only nodatavals[0], so the 3-D reproject worker masked every band with band 0's sentinel. A raster whose bands declared different nodata values leaked the other bands' invalid pixels into the output as valid data.
  • Added _detect_band_nodata, which resolves a per-band source sentinel tuple from nodatavals (or None when one scalar covers every band: single-band rasters, an explicit nodata= arg, or a uniform tuple).
  • Threaded the per-band sentinels through the chunk workers so each band is masked with its own value before resampling. The single resolved output sentinel still handles out-of-bounds fills and the integer cast-back, so output contents and the nodatavals band count stay consistent.

Backend coverage

numpy, cupy, dask+numpy, dask+cupy. All four route through _reproject_chunk_numpy / _reproject_chunk_cupy, which now take an optional band_nodata sequence.

Test plan

  • New TestPerBandNodata covers numpy, dask+numpy, cupy, and dask+cupy with a 3-band raster carrying distinct per-band sentinels (-9999, 255, 0), plus the _detect_band_nodata helper.
  • Confirmed the test fails on the pre-fix scalar path (band 1's 255 leaks) and passes with the fix.
  • Full test_reproject.py suite passes (387 tests). CuPy cases skip without a CUDA runtime.

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
`_detect_nodata_raw` read only the first entry of the `nodatavals` tuple, so the 3-D worker masked every band with band 0's sentinel. Bands with a different declared nodata leaked their invalid pixels into the output as valid data.

Add `_detect_band_nodata`, which resolves a per-band source sentinel tuple from `nodatavals` (or None when one scalar covers every band). Thread it through the chunk workers so each band is masked with its own sentinel before resampling, across numpy, cupy, dask+numpy, and dask+cupy.

The output still uses a single resolved sentinel since every band collapses to it after the resample; `nodatavals` keeps the original band count.

Add a regression test class covering all four backends plus the detection helper.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix per-band nodata masking for multi-band reproject

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None blocking. Worth flagging in passing: _detect_band_nodata assumes the nodatavals tuple is in the same order as the band axis (rasterio's positional convention). That's the standard and the rest of the module already assumes it, so this matches existing behavior. No change needed.

Nits (optional improvements)

  • xrspatial/reproject/_crs_utils.py: the per-band float equality mask (band_data == src_nd) mirrors the existing single-band path, so it stays consistent. Float sentinels that aren't bit-exact wouldn't match, but that's a pre-existing property of the module, not introduced here.

What looks good

  • Root cause is fixed at the source: each band is masked with its own sentinel before resampling, so band 1+ sentinels no longer leak through as valid data.
  • The fix threads one optional band_nodata argument through every dispatch path (in-memory numpy/cupy, dask, dask+cupy, streaming) without changing the scalar output-sentinel contract. Out-of-bounds fills and the integer cast-back still use the single resolved nd, so output contents and the nodatavals band count stay consistent.
  • _detect_band_nodata returns None for the cases that don't need per-band handling (single band, explicit nodata arg, uniform tuple, length mismatch, bare scalar), so the common path is unchanged.
  • Test coverage hits all four backends plus the helper. The regression test was confirmed to fail on the pre-fix scalar path (band 1's 255 leaks) and pass with the fix.

Checklist

  • Algorithm matches intended behavior (per-band source masking)
  • All four backends route through the same per-band logic
  • NaN handling correct (None entries become NaN, leaving that band unmasked)
  • Edge cases covered (single-entry tuple, length mismatch, None entry, bare scalar)
  • Dask chunk boundaries unchanged; band axis stays one chunk
  • No premature materialization introduced; lazy paths preserved
  • Benchmark not needed (bug fix, no perf change)
  • README feature matrix not applicable (no new function, no backend change)
  • Docstrings present on the new helper

# Conflicts:
#	xrspatial/reproject/__init__.py
@brendancol brendancol merged commit 3aca1b5 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: multi-band per-band nodata is flattened to one sentinel

1 participant