Apply LERC valid-mask in GPU decode path (depends on #1529) #1535

Merged
brendancol merged 2 commits into xarray-contrib:main from brendancol:fix/lerc-valid-mask-gpu
May 9, 2026
@brendancol
Contributor

Summary

Follow-up to #1529. The CPU LERC reader applies the LERC valid-mask
and writes nodata into masked positions, but the GPU LERC tile-decode
path in _gpu_decode.py was still dropping the mask. A masked pixel
read back as 0 on GPU and as NaN or the sentinel on CPU.

This PR fixes the GPU side:

  • The GPU LERC branch calls lerc_decompress_with_mask per tile and
    keeps any returned mask.
  • After tile assembly, an invalid-mask sized to the output image is
    built on host, copied to the GPU once, and used to write the
    resolved fill value into masked positions.
  • gpu_decode_tiles and gpu_decode_tiles_from_file get a
    masked_fill= kwarg that read_geotiff_gpu populates via
    _resolve_masked_fill(ifd.nodata_str, file_dtype) when compression
    is LERC.
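
The mask assembly and fill described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in _gpu_decode.py: the function names, the row-major tile order, and the "nonzero means valid" mask convention are assumptions, and NumPy stands in for CuPy so the example runs without a GPU.

```python
import numpy as np


def build_invalid_mask(tile_masks, tile_h, tile_w, height, width):
    """Assemble per-tile valid masks into one (height, width) invalid mask.

    tile_masks is a row-major list with one entry per tile: either None
    (tile reported as fully valid) or a (tile_h, tile_w) array in which
    nonzero means valid. Both conventions are assumptions of this sketch.
    """
    tiles_across = -(-width // tile_w)  # ceiling division
    invalid = np.zeros((height, width), dtype=bool)
    for idx, mask in enumerate(tile_masks):
        if mask is None:
            continue  # fully valid tile: nothing to mark
        row0 = (idx // tiles_across) * tile_h
        col0 = (idx % tiles_across) * tile_w
        # Edge tiles are padded; keep only the part inside the image.
        h = min(tile_h, height - row0)
        w = min(tile_w, width - col0)
        invalid[row0:row0 + h, col0:col0 + w] = np.asarray(mask)[:h, :w] == 0
    return invalid


def apply_masked_fill(out, invalid, masked_fill):
    """Write masked_fill into every invalid pixel of a 2-D image, in place."""
    if invalid.any():
        out[invalid] = out.dtype.type(masked_fill)
    return out


# 3x4 image with 2x2 tiles: a 2x2 tile grid whose bottom row is padded.
image = np.ones((3, 4), dtype=np.float32)
mask_tile0 = np.array([[1, 0], [1, 1]], dtype=np.uint8)  # pixel (0, 1) masked
mask_tile3 = np.array([[0, 1], [0, 0]], dtype=np.uint8)  # pixel (2, 2) masked; second row is padding
invalid = build_invalid_mask([mask_tile0, None, None, mask_tile3], 2, 2, 3, 4)
filled = apply_masked_fill(image, invalid, np.nan)
```

In the PR the mask is built on host and transferred to the device once; the sketch keeps everything on host for simplicity.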

Depends on #1529 (uses lerc_decompress_with_mask and
_resolve_masked_fill introduced there). Land #1529 first; this PR
should rebase cleanly off main once it is merged.

Test plan

  • pytest xrspatial/geotiff/tests/test_lerc_valid_mask_gpu.py passes (4 tests).
  • pytest xrspatial/geotiff/tests/test_lerc.py test_lerc_max_z_error.py test_gpu_byteswap_1508.py test_lerc_valid_mask.py passes (38 tests).
  • Manual repro: float32 LERC TIFF with a masked pixel read via read_geotiff_gpu returns NaN at the masked position (matches CPU); without the fix it returned 0.0.
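
A CPU/GPU parity check like the manual repro needs a NaN-aware comparison, since NaN != NaN under ordinary equality. The helper below is a hypothetical sketch (same_pixels is not part of xrspatial) and uses plain NumPy arrays as stand-ins for the two readers' outputs; a CuPy result would have to be copied to host first, for example with cupy.asnumpy.

```python
import numpy as np


def same_pixels(cpu, gpu):
    """Return True if two host arrays match in shape, dtype, and values.

    NaNs in the same position count as equal for floating dtypes.
    """
    cpu, gpu = np.asarray(cpu), np.asarray(gpu)
    if cpu.shape != gpu.shape or cpu.dtype != gpu.dtype:
        return False
    return np.array_equal(cpu, gpu, equal_nan=cpu.dtype.kind == "f")


# Stand-ins for the reader outputs on a file with one masked pixel.
cpu_result = np.array([[1.0, np.nan]], dtype=np.float32)
gpu_before_fix = np.array([[1.0, 0.0]], dtype=np.float32)   # zero-fill leaked through
gpu_after_fix = np.array([[1.0, np.nan]], dtype=np.float32)
```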

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol
Contributor Author

@copilot review

Copilot AI left a comment

Pull request overview

This PR aligns GPU GeoTIFF LERC decoding with the CPU reader by preserving and applying LERC’s per-pixel valid-mask so masked pixels are restored to the file’s nodata value (or NaN fallback for floating dtypes), instead of leaking LERC’s zero-fill into outputs.

Changes:

  • Plumb LERC valid-mask through the GPU tile decode path and apply a post-assembly masked fill on-device.
  • Add a shared _resolve_masked_fill() helper in the CPU reader and use it from read_geotiff_gpu to match CPU nodata semantics for LERC.
  • Add CPU and GPU test coverage for LERC masked pixels across NaN nodata, float sentinel nodata, uint16 sentinel nodata, and no-mask round-trips.
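
For illustration, the nodata resolution summarized above (file nodata if present, NaN fallback for floating dtypes) might look like the sketch below. It is a hypothetical stand-in, not the body of _resolve_masked_fill: the function name, the handling of unparseable strings, and the choice to return None for integer dtypes without a usable nodata value are all assumptions.

```python
import numpy as np


def resolve_masked_fill_sketch(nodata_str, dtype):
    """Pick the value to write into masked pixels (illustrative only)."""
    dtype = np.dtype(dtype)
    value = None
    if nodata_str is not None:
        try:
            value = float(nodata_str)
        except ValueError:
            value = None  # unparseable nodata: treat as absent (assumption)
    if value is not None:
        if dtype.kind == "f":
            return dtype.type(value)
        # Integer dtypes: accept only finite whole numbers that fit the dtype.
        if dtype.kind in "iu" and value.is_integer():
            info = np.iinfo(dtype)
            if info.min <= value <= info.max:
                return dtype.type(int(value))
    # No usable nodata: NaN for floats, nothing to write otherwise (assumption).
    return dtype.type(np.nan) if dtype.kind == "f" else None


fill_nan = resolve_masked_fill_sketch(None, np.float32)          # NaN fallback
fill_float = resolve_masked_fill_sketch("-9999", np.float32)     # float sentinel
fill_uint = resolve_masked_fill_sketch("65535", np.uint16)       # integer sentinel
fill_none = resolve_masked_fill_sketch(None, np.uint16)          # no fill available
```

The three non-None cases mirror the test matrix in the PR: NaN nodata, float sentinel nodata, and uint16 sentinel nodata.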

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_gpu_decode.py | Capture per-tile LERC valid masks and apply masked-fill after GPU tile assembly. |
| xrspatial/geotiff/_reader.py | Add _resolve_masked_fill() and apply LERC valid-mask during CPU decode for tiles/strips/COG HTTP reads. |
| xrspatial/geotiff/_compression.py | Introduce lerc_decompress_with_mask() and keep lerc_decompress() backward compatible (drops mask). |
| xrspatial/geotiff/__init__.py | Pass masked_fill into GPU tile decode when compression is LERC. |
| xrspatial/geotiff/tests/test_lerc_valid_mask.py | New CPU tests validating wrapper behavior and TIFF round-trips with/without masks. |
| xrspatial/geotiff/tests/test_lerc_valid_mask_gpu.py | New GPU tests ensuring read_geotiff_gpu matches CPU output for masked LERC pixels. |

Comment on lines +2008 to +2015
    if not invalid.any():
        return out

    d_invalid = cupy.asarray(invalid)
    if out.ndim == 3:
        # Broadcast (H, W) mask across the sample axis.
        out[d_invalid, ...] = out.dtype.type(masked_fill)
    else:
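
The 3-D branch in the excerpt above relies on boolean-mask indexing: a 2-D mask consumes the first two axes and the scalar fill is broadcast across the remaining one. A small NumPy sketch of that behaviour, assuming an (H, W, samples) layout and using NumPy in place of CuPy so it runs without a GPU:

```python
import numpy as np

height, width, samples = 2, 3, 4
out = np.ones((height, width, samples), dtype=np.uint16)

# One masked pixel at row 1, column 2.
invalid = np.zeros((height, width), dtype=bool)
invalid[1, 2] = True

# The 2-D boolean index selects whole pixels; the scalar on the right-hand
# side is broadcast over every sample of each selected pixel.
out[invalid, ...] = out.dtype.type(65535)
```

Every sample of the masked pixel receives the fill value, and all other pixels are left untouched.
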
brendancol added 2 commits May 9, 2026 14:32

The CPU LERC reader from xarray-contrib#1529 honours the LERC valid-mask and writes
the file's nodata sentinel into masked pixels. The GPU LERC tile-decode
path was still dropping the mask, so masked pixels read back as 0 on
GPU but as NaN or the sentinel on CPU. Same bug, GPU side.

Changes:

_gpu_decode.py: the LERC branch now calls lerc_decompress_with_mask
per tile and keeps any returned valid-mask. After predictor decode and
tile assembly, _apply_lerc_mask_fill builds an invalid mask on host
(matching the GPU assembly kernel's tile-grid layout), copies it to
GPU once, and overwrites masked positions with the resolved fill
value. Tiles that LERC reports as fully valid skip the host work, so
the no-mask path stays zero-copy.

gpu_decode_tiles and gpu_decode_tiles_from_file get a masked_fill
keyword that is forwarded through. read_geotiff_gpu computes it via
_resolve_masked_fill(ifd.nodata_str, file_dtype) for LERC sources.

tests/test_lerc_valid_mask_gpu.py: 4 tests covering float32+NaN,
float32+sentinel, uint16+sentinel, and the no-mask regression. Each
compares read_geotiff_gpu output to read_to_array output for the same
file. Skipped unless cupy + CUDA + lerc are available.

Out of scope: the encode side. The xrspatial writer still hard-codes
hasMask=False; the tests reuse the lerc_compress monkeypatch fixture
from the CPU PR to inject a valid-mask through lerc.encode directly.

_apply_lerc_mask_fill bypassed _check_gpu_memory when allocating the
H*W boolean invalid mask and the cupy temporary backing the boolean
indexing assignment. On near-budget decodes that pushed the GPU into
OOM during the mask transfer rather than failing fast with the usual
xrspatial budget message. Add explicit budget checks for both the
mask buffer and the index temporary (worst-case one int64 per pixel)
so the LERC path stays under the same envelope as the rest of the
GPU decode.
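
The worst-case arithmetic behind those two budget checks can be sketched as below. The function names and the MemoryError message are hypothetical, and the real _check_gpu_memory signature is not shown in this PR, so this only illustrates the sizing: one byte per pixel for the boolean mask plus one int64 per pixel for the index temporary.

```python
def lerc_mask_fill_bytes(height, width):
    """Worst-case extra GPU bytes for the mask fill: (mask, index temporary)."""
    pixels = height * width
    mask_bytes = pixels * 1    # boolean invalid mask, one byte per pixel
    index_bytes = pixels * 8   # worst case: one int64 index per pixel
    return mask_bytes, index_bytes


def check_budget_sketch(required_bytes, available_bytes, what):
    """Fail fast before allocating, instead of hitting an OOM mid-transfer."""
    if required_bytes > available_bytes:
        raise MemoryError(
            f"{what} needs {required_bytes} bytes but only "
            f"{available_bytes} bytes are within the GPU budget"
        )


mask_bytes, index_bytes = lerc_mask_fill_bytes(1024, 1024)
check_budget_sketch(mask_bytes, 64 * 1024 * 1024, "LERC invalid mask")
check_budget_sketch(index_bytes, 64 * 1024 * 1024, "LERC index temporary")
```

For a 1024 x 1024 image this comes to 1 MiB for the mask and 8 MiB for the index temporary.
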
@brendancol brendancol force-pushed the fix/lerc-valid-mask-gpu branch from 0ee0f5a to bf4fdb8 Compare May 9, 2026 21:32
@brendancol brendancol merged commit c41dfa6 into xarray-contrib:main May 9, 2026
11 checks passed