From 2f15616a131cba71dcf5552fdd525119343cd1a2 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Fri, 15 May 2026 06:22:43 -0700 Subject: [PATCH 1/2] geotiff: dask streaming write coverage for 1x1/1xN/Nx1 + NaN/Inf Test coverage gap sweep pass 14 (2026-05-15). The dask streaming write path through to_geotiff on a dask-backed DataArray (#1084) lacked degenerate-shape and special-float coverage: * No 1x1, 1xN, or Nx1 dask streaming writes were tested. The tile-row segmenter (#1485) and the chunk-merge math on a 1-pixel-tall or 1-pixel-wide raster are reachable only via this path. * No all-NaN dask write -- test_streaming_write covers a 100x100 with an embedded NaN block but never the boundary case where every chunk the writer streams is entirely NaN and must mask to the nodata sentinel. * No +Inf / -Inf coverage on the dask streaming path. +Inf and -Inf are valid IEEE-754 floats and must round-trip bit-exactly; only NaN is treated as nodata. * No predictor=3 (TIFF float predictor) coverage on the dask streaming path. The existing test_streaming_write.test_predictor case covers predictor=True (=2) only. Add 16 tests in xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py exercising 1x1 / 1xN / Nx1 round-trips with chunk grids that both match and split the raster shape, all-NaN with and without a sentinel, mixed NaN / +Inf / -Inf (Inf preserved, NaN masked), all-+Inf and all--Inf, plus predictor=3 lossless round-trip on float32/float64 and the int-dtype ValueError guard. --- .claude/sweep-test-coverage-state.csv | 2 +- ...k_streaming_write_degenerate_2026_05_15.py | 272 ++++++++++++++++++ 2 files changed, 273 insertions(+), 1 deletion(-) create mode 100644 xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py diff --git a/.claude/sweep-test-coverage-state.csv b/.claude/sweep-test-coverage-state.csv index c1436b594..ac9ff9e09 100644 --- a/.claude/sweep-test-coverage-state.csv +++ b/.claude/sweep-test-coverage-state.csv @@ -1,3 +1,3 @@ module,last_inspected,issue,severity_max,categories_found,notes -geotiff,2026-05-13,1776,HIGH,4,"Pass 13 (2026-05-13): added test_size_param_validation_gpu_vrt_1776.py closing Cat 4 HIGH parameter-coverage gap on size-arg validation. Issue #1752 added tile_size validation to to_geotiff and chunks validation to read_geotiff_dask, but the matching kwargs on three sibling entry points were left unchecked: write_geotiff_gpu(tile_size=) raised ZeroDivisionError for 0, struct.error for -1, TypeError for 256.0; read_geotiff_gpu(chunks=) and read_vrt(chunks=) raised ZeroDivisionError for 0 and silently accepted negative values. Factored two shared validators (_validate_tile_size_arg, _validate_chunks_arg) and called them up front from each entry point. 34 new tests, all passing on GPU host: tile_size matrix on write_geotiff_gpu (0/-1/256.0/True/False/positive/np.int64), chunks matrix on read_geotiff_gpu and read_vrt (0/-1/(0,N)/(N,-1)/wrong-length/bool/non-int/(N,float)/positive/np.int64), dispatcher thread-through tests (open_geotiff(gpu=True, chunks=0), to_geotiff(gpu=True, tile_size=0)). Pre-existing 13 #1752 tests still pass after refactor. Filed issue #1776. Pass 12 (2026-05-12): added test_gpu_writer_overview_mode_and_compression_level_1740.py closing Cat 4 HIGH and Cat 4 MEDIUM parameter-coverage gaps. (1) write_geotiff_gpu(overview_resampling='mode') and the dedicated _block_reduce_2d_gpu mode-fallback branch (_gpu_decode.py:3051-3056) had zero direct tests; six of the seven overview_resampling modes were covered (mean/nearest by test_features, min/max/median by pass 6, cubic by test_signature_parity_1631) but mode was the odd one out -- a regression dropping the mode dispatch from _block_reduce_2d_gpu would fall through to the mean reshape branch and emit wrong overview pixels for integer rasters. (2) write_geotiff_gpu(compression_level=) documented as accepted-but-ignored had no test; the CPU writer rejects out-of-range levels with ValueError, the GPU writer is documented not to -- a regression wiring the GPU writer up to the CPU range validator would silently break every to_geotiff(gpu=True, compression_level=X) caller for in-range levels and noisily for out-of-range. 19 tests, all passing on GPU host: _block_reduce_2d_gpu(method='mode') CPU-parity on 4x4 deterministic + random 8x8 + dtype-preserved across u8/u16/i16/i32, write_geotiff_gpu(cog=True, overview_resampling='mode') end-to-end round trip, to_geotiff(gpu=True, ..., overview_resampling='mode') dispatcher thread-through, GPU-vs-CPU pixel parity on 8x8 input, write_geotiff_gpu(compression_level=) in-range matrix on zstd/deflate, out-of-range matrix (zstd=999/-5, deflate=50/0) accepted without raising + round-trip preserved, to_geotiff(gpu=True, compression_level=999) dispatcher thread-through, companion CPU rejects-OOR pin to lock the asymmetry. Mutation against the mode branch (drop the 'if method == mode' block in _block_reduce_2d_gpu) flipped 9 mode tests red. Filed issue #1740. Pass 11 (2026-05-12): added test_gpu_writer_cpu_fallback_codecs_2026_05_12.py closing a Cat 4 HIGH parameter-coverage gap on write_geotiff_gpu compression= modes for the CPU-fallback codecs (lzw, packbits, lz4, lerc, jpeg2000/j2k). Pass 7 (test_gpu_writer_compression_modes_2026_05_11) covered only none/deflate/zstd/jpeg; the remaining five codecs route through dedicated branches in gpu_compress_tiles (_gpu_decode.py:2974-3019) with CPU fallbacks (lerc_compress, jpeg2000_compress, cpu_compress) that had zero direct tests via write_geotiff_gpu. A regression in routing/tag-wiring/fallback dispatch would ship silently because the internal reader uses the same compression-tag table. 17 tests, all passing on GPU host: lzw/packbits/lz4 round-trip + compression-tag pin on uint16, lerc lossless float32 + uint16 round-trip + tag pin, jpeg2000 uint8 single-band + RGB multi-band lossless round-trip + j2k-alias parity + tag pin, GPU-vs-CPU writer pixel parity for lzw/packbits, to_geotiff(gpu=True, compression=lzw/packbits) dispatcher thread-through. Mutation against compression dispatch (swap lzw bytes to zstd; swap lerc bytes to deflate) flipped round-trip tests red. Filed issue #1706. Pass 10 (2026-05-12): added test_kwarg_behaviour_2026_05_12_v2.py closing two Cat 4 HIGH parameter-coverage gaps. (1) write_geotiff_gpu(predictor=True/2/3) had zero direct tests; the GPU writer threads predictor= through normalize_predictor and gpu_compress_tiles into five CUDA encode kernels (_predictor_encode_kernel_u8/u16/u32/u64 for predictor=2, _fp_predictor_encode_kernel for predictor=3) and a regression dropping the encode-kernel calls would ship corrupt files. (2) read_vrt(window=) had no behaviour tests (only a signature pin in test_signature_annotations_1654); the kwarg is documented and _vrt.read_vrt implements full windowed-read semantics (clip, multi-source overlap, src/dst scaling, GeoTransform origin shift on coords + attrs['transform']). 23 tests, all passing on GPU host: predictor=True/2 round-trips on u8/u16/i32 + 3-band RGB samples_per_pixel stride; predictor=3 lossless round-trip on f32 and f64; predictor=3 int-dtype ValueError (CPU/GPU parity); CPU/GPU pixel-exact parity for pred=2 u16 and pred=3 f32; read_vrt(window=) subregion + full + clamp-overflow + clamp-negative + 2x1 mosaic seam straddle + offset past seam + transform-attr origin shift + y/x coords half-pixel shift + window+band + window+chunks (dask) + window+gpu (cupy) + window+gpu+chunks (dask+cupy). Mutation against the encode dispatch flipped 7 predictor tests red. Filed issue #1690. Pass 9 (2026-05-12): added test_kwarg_behaviour_2026_05_12.py closing three Cat 4 MEDIUM parameter-coverage gaps plus one Cat 4 LOW error path. write_vrt documented kwargs (relative/crs_wkt/nodata) had a smoke-test pinning that the kwargs are accepted but no test verified the override *effect* -- a regression dropping the override branch and silently using the default-from-first-source would ship undetected. read_geotiff_gpu(dtype=) cast had zero direct tests; the eager path has TestDtypeEager and dask has TestDtypeDask but the GPU branch had no equivalent. write_geotiff_gpu(bigtiff=) threads through to _assemble_tiff(force_bigtiff=) but no test asserted the on-disk header byte switches; the CPU writer had it via test_features::test_force_bigtiff_via_public_api. write_vrt(source_files=[]) ValueError was uncovered. 26 tests, all passing on GPU host: write_vrt relative=True/False XML attribute + path inspection + parse-back round-trip, write_vrt crs_wkt= override distinct-from-default XML check, write_vrt nodata= override + default-from-source coverage, write_vrt([]) ValueError + no-file side effect, read_geotiff_gpu dtype= matrix (float64->float32, float64->float16, uint16->int32, uint16->uint8, float-to-int raise, dtype=None preserves native), open_geotiff(gpu=True, dtype=) dispatcher, read_geotiff_gpu(chunks=, dtype=) dask+GPU branch, write_geotiff_gpu bigtiff=True/False/None header verification, to_geotiff(gpu=True, bigtiff=True) dispatcher thread-through. Pass 8 (2026-05-11): added test_lz4_compression_level_2026_05_11.py closing Cat 4 MEDIUM parameter-coverage gap on compression='lz4' + compression_level=. _LEVEL_RANGES advertises lz4: (0, 16) but only deflate (1, 9) and zstd (1, 22) had direct level boundary + round-trip + reject tests. The range check is the gatekeeper -- lz4_compress silently accepts any int level -- so a regression dropping 'lz4' from _LEVEL_RANGES would ship undetected. 18 tests, all passing: round-trip at levels 0/1/9/16 (lossless), default-level no-arg path, higher-level-not-larger smoke check on compressible input, out-of-range reject at -1/-10/17/100 on eager path, valid-range message format pin (lz4 valid: 0-16), dask streaming round-trip at 0/1/8/16, dask streaming out-of-range reject at -1/17/50 (separate _LEVEL_RANGES call site). Pass 7 (2026-05-11): added test_gpu_writer_compression_modes_2026_05_11.py closing Cat 4 HIGH gap on write_geotiff_gpu compression= modes. The writer documents zstd (default, fastest GPU), deflate, jpeg, and none, but only deflate + none had round-trip tests; the default zstd and the jpeg (nvJPEG/Pillow) paths shipped without targeted coverage. 11 new tests, all passing on GPU host: zstd round-trip + default-codec pinning, jpeg round-trip on 3-band RGB uint8 + 1-band greyscale, TIFF compression-tag header check across none/deflate/zstd/jpeg, plain deflate + none round-trips outside the COG/sentinel paths, and a cross-codec lossless parity check (zstd/deflate/none agree pixel-exact). nvJPEG path was exercised live, not just the Pillow fallback. Pass 6 (2026-05-11): added test_overview_resampling_min_max_median_2026_05_11.py covering Cat 4 HIGH parameter-coverage gap on overview_resampling=min/max/median. CPU end-to-end paths were already covered by test_cog_overview_nodata_1613::test_cpu_cog_overview_aggregations_ignore_sentinel; the GPU end-to-end paths and the direct CPU+GPU block-reducer branches had no targeted tests, so a regression on those code paths would ship undetected. 26 tests, all passing on GPU host: block-reducer unit tests (finite + partial-NaN), end-to-end COG writes for both to_geotiff and write_geotiff_gpu, CPU/GPU parity for to_geotiff(gpu=True), CPU nodata-sentinel regression check, and ValueError error-path tests for unknown method names on both backends. Pass 5 (2026-05-11): added test_degenerate_shapes_backends_2026_05_11.py covering Cat 3 HIGH geometric gaps (1x1 / 1xN / Nx1 reads on dask+numpy, GPU, dask+cupy backends; 1x1 / 1xN / Nx1 writes through write_geotiff_gpu) and Cat 2 MEDIUM NaN/Inf gaps (all-NaN read on GPU + dask+cupy, Inf / -Inf reads on all non-eager backends, NaN sentinel mask on dask read path including sentinel block split across chunk boundary). 23 tests, all passing on GPU host. Prior passes still hold: pass 4 (r4) closed read_geotiff_gpu/dask name= + max_pixels= kwargs (Cat 4), pass 3 (r3) closed read_vrt GPU/dask+GPU backend dispatch (Cat 1) and dtype/name kwargs (Cat 4)." +geotiff,2026-05-15,,HIGH,2;3;4,"Pass 14 (2026-05-15): added test_dask_streaming_write_degenerate_2026_05_15.py closing Cat 3 HIGH and Cat 2 HIGH/MEDIUM gaps on the dask streaming write path (to_geotiff with dask-backed DataArray, #1084). test_streaming_write.py covered 100x100 with a NaN block plus a 2x2 small raster but had nothing 1-pixel-row, 1-pixel-column, all-NaN, all-Inf, or +/-Inf-mixed. The streaming tile-row segmenter (#1485) on a 1-pixel-tall raster and the streaming nodata-mask coercion on an all-NaN chunk were reachable only with a dask input and had no direct coverage; a regression on either would not surface from the eager numpy path or the write_geotiff_gpu path (pass 5 covered the GPU writer's degenerate shapes). 16 new tests, all passing: 1x1 chunk-matches-shape + nodata-attr round-trip + uint16, 1xN single chunk + chunks-split-columns + wide-segmented-by-buffer (#1485 streaming_buffer_bytes=1 forces the segmenter), Nx1 single chunk + chunks-split-rows, all-NaN with finite sentinel + all-NaN without sentinel, mixed NaN/+Inf/-Inf preserving Inf bit-exact + sentinel masking NaN only, all-+Inf and all--Inf, predictor=3 (float predictor) round-trip on float32 + float64 plus int-dtype ValueError. predictor=3 streaming coverage extends prior test_streaming_write.test_predictor (which only covers predictor=True / predictor=2). Cat 3 HIGH (1x1/1xN/Nx1) + Cat 2 HIGH (all-NaN with sentinel) + Cat 2 MEDIUM (mixed-Inf, all-Inf) + Cat 4 MEDIUM (predictor=3 streaming). Pass 13 (2026-05-13): added test_size_param_validation_gpu_vrt_1776.py closing Cat 4 HIGH parameter-coverage gap on size-arg validation. Issue #1752 added tile_size validation to to_geotiff and chunks validation to read_geotiff_dask, but the matching kwargs on three sibling entry points were left unchecked: write_geotiff_gpu(tile_size=) raised ZeroDivisionError for 0, struct.error for -1, TypeError for 256.0; read_geotiff_gpu(chunks=) and read_vrt(chunks=) raised ZeroDivisionError for 0 and silently accepted negative values. Factored two shared validators (_validate_tile_size_arg, _validate_chunks_arg) and called them up front from each entry point. 34 new tests, all passing on GPU host: tile_size matrix on write_geotiff_gpu (0/-1/256.0/True/False/positive/np.int64), chunks matrix on read_geotiff_gpu and read_vrt (0/-1/(0,N)/(N,-1)/wrong-length/bool/non-int/(N,float)/positive/np.int64), dispatcher thread-through tests (open_geotiff(gpu=True, chunks=0), to_geotiff(gpu=True, tile_size=0)). Pre-existing 13 #1752 tests still pass after refactor. Filed issue #1776. Pass 12 (2026-05-12): added test_gpu_writer_overview_mode_and_compression_level_1740.py closing Cat 4 HIGH and Cat 4 MEDIUM parameter-coverage gaps. (1) write_geotiff_gpu(overview_resampling='mode') and the dedicated _block_reduce_2d_gpu mode-fallback branch (_gpu_decode.py:3051-3056) had zero direct tests; six of the seven overview_resampling modes were covered (mean/nearest by test_features, min/max/median by pass 6, cubic by test_signature_parity_1631) but mode was the odd one out -- a regression dropping the mode dispatch from _block_reduce_2d_gpu would fall through to the mean reshape branch and emit wrong overview pixels for integer rasters. (2) write_geotiff_gpu(compression_level=) documented as accepted-but-ignored had no test; the CPU writer rejects out-of-range levels with ValueError, the GPU writer is documented not to -- a regression wiring the GPU writer up to the CPU range validator would silently break every to_geotiff(gpu=True, compression_level=X) caller for in-range levels and noisily for out-of-range. 19 tests, all passing on GPU host: _block_reduce_2d_gpu(method='mode') CPU-parity on 4x4 deterministic + random 8x8 + dtype-preserved across u8/u16/i16/i32, write_geotiff_gpu(cog=True, overview_resampling='mode') end-to-end round trip, to_geotiff(gpu=True, ..., overview_resampling='mode') dispatcher thread-through, GPU-vs-CPU pixel parity on 8x8 input, write_geotiff_gpu(compression_level=) in-range matrix on zstd/deflate, out-of-range matrix (zstd=999/-5, deflate=50/0) accepted without raising + round-trip preserved, to_geotiff(gpu=True, compression_level=999) dispatcher thread-through, companion CPU rejects-OOR pin to lock the asymmetry. Mutation against the mode branch (drop the 'if method == mode' block in _block_reduce_2d_gpu) flipped 9 mode tests red. Filed issue #1740. Pass 11 (2026-05-12): added test_gpu_writer_cpu_fallback_codecs_2026_05_12.py closing a Cat 4 HIGH parameter-coverage gap on write_geotiff_gpu compression= modes for the CPU-fallback codecs (lzw, packbits, lz4, lerc, jpeg2000/j2k). Pass 7 (test_gpu_writer_compression_modes_2026_05_11) covered only none/deflate/zstd/jpeg; the remaining five codecs route through dedicated branches in gpu_compress_tiles (_gpu_decode.py:2974-3019) with CPU fallbacks (lerc_compress, jpeg2000_compress, cpu_compress) that had zero direct tests via write_geotiff_gpu. A regression in routing/tag-wiring/fallback dispatch would ship silently because the internal reader uses the same compression-tag table. 17 tests, all passing on GPU host: lzw/packbits/lz4 round-trip + compression-tag pin on uint16, lerc lossless float32 + uint16 round-trip + tag pin, jpeg2000 uint8 single-band + RGB multi-band lossless round-trip + j2k-alias parity + tag pin, GPU-vs-CPU writer pixel parity for lzw/packbits, to_geotiff(gpu=True, compression=lzw/packbits) dispatcher thread-through. Mutation against compression dispatch (swap lzw bytes to zstd; swap lerc bytes to deflate) flipped round-trip tests red. Filed issue #1706. Pass 10 (2026-05-12): added test_kwarg_behaviour_2026_05_12_v2.py closing two Cat 4 HIGH parameter-coverage gaps. (1) write_geotiff_gpu(predictor=True/2/3) had zero direct tests; the GPU writer threads predictor= through normalize_predictor and gpu_compress_tiles into five CUDA encode kernels (_predictor_encode_kernel_u8/u16/u32/u64 for predictor=2, _fp_predictor_encode_kernel for predictor=3) and a regression dropping the encode-kernel calls would ship corrupt files. (2) read_vrt(window=) had no behaviour tests (only a signature pin in test_signature_annotations_1654); the kwarg is documented and _vrt.read_vrt implements full windowed-read semantics (clip, multi-source overlap, src/dst scaling, GeoTransform origin shift on coords + attrs['transform']). 23 tests, all passing on GPU host: predictor=True/2 round-trips on u8/u16/i32 + 3-band RGB samples_per_pixel stride; predictor=3 lossless round-trip on f32 and f64; predictor=3 int-dtype ValueError (CPU/GPU parity); CPU/GPU pixel-exact parity for pred=2 u16 and pred=3 f32; read_vrt(window=) subregion + full + clamp-overflow + clamp-negative + 2x1 mosaic seam straddle + offset past seam + transform-attr origin shift + y/x coords half-pixel shift + window+band + window+chunks (dask) + window+gpu (cupy) + window+gpu+chunks (dask+cupy). Mutation against the encode dispatch flipped 7 predictor tests red. Filed issue #1690. Pass 9 (2026-05-12): added test_kwarg_behaviour_2026_05_12.py closing three Cat 4 MEDIUM parameter-coverage gaps plus one Cat 4 LOW error path. write_vrt documented kwargs (relative/crs_wkt/nodata) had a smoke-test pinning that the kwargs are accepted but no test verified the override *effect* -- a regression dropping the override branch and silently using the default-from-first-source would ship undetected. read_geotiff_gpu(dtype=) cast had zero direct tests; the eager path has TestDtypeEager and dask has TestDtypeDask but the GPU branch had no equivalent. write_geotiff_gpu(bigtiff=) threads through to _assemble_tiff(force_bigtiff=) but no test asserted the on-disk header byte switches; the CPU writer had it via test_features::test_force_bigtiff_via_public_api. write_vrt(source_files=[]) ValueError was uncovered. 26 tests, all passing on GPU host: write_vrt relative=True/False XML attribute + path inspection + parse-back round-trip, write_vrt crs_wkt= override distinct-from-default XML check, write_vrt nodata= override + default-from-source coverage, write_vrt([]) ValueError + no-file side effect, read_geotiff_gpu dtype= matrix (float64->float32, float64->float16, uint16->int32, uint16->uint8, float-to-int raise, dtype=None preserves native), open_geotiff(gpu=True, dtype=) dispatcher, read_geotiff_gpu(chunks=, dtype=) dask+GPU branch, write_geotiff_gpu bigtiff=True/False/None header verification, to_geotiff(gpu=True, bigtiff=True) dispatcher thread-through. Pass 8 (2026-05-11): added test_lz4_compression_level_2026_05_11.py closing Cat 4 MEDIUM parameter-coverage gap on compression='lz4' + compression_level=. _LEVEL_RANGES advertises lz4: (0, 16) but only deflate (1, 9) and zstd (1, 22) had direct level boundary + round-trip + reject tests. The range check is the gatekeeper -- lz4_compress silently accepts any int level -- so a regression dropping 'lz4' from _LEVEL_RANGES would ship undetected. 18 tests, all passing: round-trip at levels 0/1/9/16 (lossless), default-level no-arg path, higher-level-not-larger smoke check on compressible input, out-of-range reject at -1/-10/17/100 on eager path, valid-range message format pin (lz4 valid: 0-16), dask streaming round-trip at 0/1/8/16, dask streaming out-of-range reject at -1/17/50 (separate _LEVEL_RANGES call site). Pass 7 (2026-05-11): added test_gpu_writer_compression_modes_2026_05_11.py closing Cat 4 HIGH gap on write_geotiff_gpu compression= modes. The writer documents zstd (default, fastest GPU), deflate, jpeg, and none, but only deflate + none had round-trip tests; the default zstd and the jpeg (nvJPEG/Pillow) paths shipped without targeted coverage. 11 new tests, all passing on GPU host: zstd round-trip + default-codec pinning, jpeg round-trip on 3-band RGB uint8 + 1-band greyscale, TIFF compression-tag header check across none/deflate/zstd/jpeg, plain deflate + none round-trips outside the COG/sentinel paths, and a cross-codec lossless parity check (zstd/deflate/none agree pixel-exact). nvJPEG path was exercised live, not just the Pillow fallback. Pass 6 (2026-05-11): added test_overview_resampling_min_max_median_2026_05_11.py covering Cat 4 HIGH parameter-coverage gap on overview_resampling=min/max/median. CPU end-to-end paths were already covered by test_cog_overview_nodata_1613::test_cpu_cog_overview_aggregations_ignore_sentinel; the GPU end-to-end paths and the direct CPU+GPU block-reducer branches had no targeted tests, so a regression on those code paths would ship undetected. 26 tests, all passing on GPU host: block-reducer unit tests (finite + partial-NaN), end-to-end COG writes for both to_geotiff and write_geotiff_gpu, CPU/GPU parity for to_geotiff(gpu=True), CPU nodata-sentinel regression check, and ValueError error-path tests for unknown method names on both backends. Pass 5 (2026-05-11): added test_degenerate_shapes_backends_2026_05_11.py covering Cat 3 HIGH geometric gaps (1x1 / 1xN / Nx1 reads on dask+numpy, GPU, dask+cupy backends; 1x1 / 1xN / Nx1 writes through write_geotiff_gpu) and Cat 2 MEDIUM NaN/Inf gaps (all-NaN read on GPU + dask+cupy, Inf / -Inf reads on all non-eager backends, NaN sentinel mask on dask read path including sentinel block split across chunk boundary). 23 tests, all passing on GPU host. Prior passes still hold: pass 4 (r4) closed read_geotiff_gpu/dask name= + max_pixels= kwargs (Cat 4), pass 3 (r3) closed read_vrt GPU/dask+GPU backend dispatch (Cat 1) and dtype/name kwargs (Cat 4)." reproject,2026-05-10,,HIGH,1;4;5,"Added 39 tests: LiteCRS direct coverage, itrf_transform behaviour/roundtrip/array, itrf_frames, geoid_height numerical correctness + raster happy-path, vertical helpers (ellipsoidal<->orthometric/depth), reproject() lat/lon and latitude/longitude dim propagation. Note: _merge_arrays_cupy is imported but unused (no cupy merge dispatch in merge()); flagged as feature gap not test gap." diff --git a/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py b/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py new file mode 100644 index 000000000..59f633a9a --- /dev/null +++ b/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py @@ -0,0 +1,272 @@ +"""Dask streaming write coverage for degenerate raster shapes and NaN / Inf. + +The eager numpy write path (``test_edge_cases.py``) covers 1x1, 1xN, Nx1 +rasters end-to-end plus all-NaN, all-Inf, and -Inf inputs. The +``write_geotiff_gpu`` path got the same shapes via the test coverage +sweep pass 5 (``test_degenerate_shapes_backends_2026_05_11.py``). The +dask streaming write path through ``to_geotiff`` on a dask-backed +DataArray (#1084) had no matching coverage: ``test_streaming_write.py`` +hits 100x100 with a NaN block and a 2x2 ``test_small_raster`` but +nothing single-pixel-row / single-pixel-column, nothing all-NaN, and +nothing Inf / -Inf. + +A regression in the dask streaming tile-row segmenter (#1485) on a +1-pixel-tall raster, or in the streaming nodata-mask coercion on an +all-NaN chunk, would not surface from any other path. Both of those +code branches are reached only when the input is a dask-backed +DataArray. + +Pass 14 (2026-05-15) closes the gap: + +* Cat 3 HIGH -- 1x1, 1xN, Nx1 round-trips through the dask streaming + writer with chunk sizes that match the raster shape and chunk sizes + smaller than the raster (so the chunk boundary genuinely splits the + array). +* Cat 2 HIGH -- all-NaN dask streaming write with a finite nodata + sentinel: the writer must mask every NaN to the sentinel during + streaming (the eager path's equivalent is + ``test_edge_cases.TestNanAndInfHandling.test_all_nan``). +* Cat 2 MEDIUM -- mixed NaN / +Inf / -Inf dask streaming write: +Inf + and -Inf are valid IEEE-754 float values and must round-trip + bit-exactly through the streaming pipeline. Only NaN is treated as + nodata. +* Cat 2 MEDIUM -- all-Inf and all -Inf dask streaming writes. +* Cat 4 MEDIUM -- ``predictor=3`` (floating-point predictor) on a + small dask raster: the streaming write path threads ``predictor=`` + through to each tile-row encode, and the float predictor branch had + no direct streaming-write coverage (``test_streaming_write.py`` + covers ``predictor=True`` only). +""" +from __future__ import annotations + +import numpy as np +import pytest +import xarray as xr + +from xrspatial.geotiff import open_geotiff, to_geotiff + + +# --------------------------------------------------------------------------- +# Cat 3: 1x1, 1xN, Nx1 dask streaming writes +# --------------------------------------------------------------------------- + + +class TestStreamingWrite1x1: + """A single-pixel dask raster must round-trip through the streaming writer.""" + + def test_1x1_chunk_matches_shape(self, tmp_path): + arr = np.array([[42.0]], dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 1, 'x': 1}) + path = str(tmp_path / '1x1_a.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert result.shape == (1, 1) + assert result.values[0, 0] == pytest.approx(42.0) + + def test_1x1_with_nodata_attr(self, tmp_path): + """``attrs['nodata']`` must round-trip even for a 1x1 raster.""" + arr = np.array([[7.5]], dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x'], + attrs={'nodata': -9999.0}).chunk({'y': 1, 'x': 1}) + path = str(tmp_path / '1x1_nodata.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert result.shape == (1, 1) + assert result.values[0, 0] == pytest.approx(7.5) + assert result.attrs.get('nodata') == pytest.approx(-9999.0) + + def test_1x1_uint16(self, tmp_path): + arr = np.array([[255]], dtype=np.uint16) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 1, 'x': 1}) + path = str(tmp_path / '1x1_u16.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert result.shape == (1, 1) + assert int(result.values[0, 0]) == 255 + + +class TestStreamingWrite1xN: + """A 1-pixel-tall raster exercises the single-tile-row streaming path.""" + + def test_1xN_single_chunk(self, tmp_path): + arr = np.arange(10, dtype=np.float32).reshape(1, 10) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 1, 'x': 10}) + path = str(tmp_path / '1xN_a.tif') + to_geotiff(da, path) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + def test_1xN_chunks_split_columns(self, tmp_path): + """Chunk grid splits the row into multiple column-chunks.""" + arr = np.arange(20, dtype=np.float32).reshape(1, 20) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 1, 'x': 7}) + path = str(tmp_path / '1xN_b.tif') + to_geotiff(da, path) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + def test_1xN_wide_segmented_by_buffer(self, tmp_path): + """Wide single row segmented by streaming_buffer_bytes (#1485).""" + arr = np.arange(64, dtype=np.float32).reshape(1, 64) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 1, 'x': 16}) + path = str(tmp_path / '1xN_seg.tif') + # Tiny streaming buffer so the segmenter splits the tile-row. + to_geotiff(da, path, tile_size=16, + streaming_buffer_bytes=1) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + +class TestStreamingWriteNx1: + """A 1-pixel-wide raster exercises the column-degenerate streaming path.""" + + def test_Nx1_single_chunk(self, tmp_path): + arr = np.arange(10, dtype=np.float32).reshape(10, 1) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 10, 'x': 1}) + path = str(tmp_path / 'Nx1_a.tif') + to_geotiff(da, path) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + def test_Nx1_chunks_split_rows(self, tmp_path): + """Chunk grid splits the column into multiple row-chunks.""" + arr = np.arange(20, dtype=np.float32).reshape(20, 1) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 7, 'x': 1}) + path = str(tmp_path / 'Nx1_b.tif') + to_geotiff(da, path) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + +# --------------------------------------------------------------------------- +# Cat 2: NaN / Inf dask streaming writes +# --------------------------------------------------------------------------- + + +class TestStreamingWriteAllNan: + """All-NaN dask raster must mask every pixel to the nodata sentinel.""" + + def test_all_nan_with_sentinel(self, tmp_path): + arr = np.full((8, 8), np.nan, dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x'], + attrs={'nodata': -9999.0}).chunk({'y': 4, 'x': 4}) + path = str(tmp_path / 'allnan.tif') + to_geotiff(da, path) + result = open_geotiff(path) + # Every pixel must round-trip back to NaN (sentinel -> NaN on read). + assert np.isnan(result.values).all() + # Sentinel must be preserved in attrs. + assert result.attrs.get('nodata') == pytest.approx(-9999.0) + + def test_all_nan_default_nodata(self, tmp_path): + """``attrs['nodata']`` omitted -- the streaming writer must still + accept the all-NaN input. The reader cannot mask without a + sentinel so the float NaN survives in the file.""" + arr = np.full((4, 4), np.nan, dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 2, 'x': 2}) + path = str(tmp_path / 'allnan_nosen.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert np.isnan(result.values).all() + + +class TestStreamingWriteMixedNanInf: + """Mixed NaN / +Inf / -Inf in a single dask raster. + + The streaming writer must (a) replace NaN with the nodata sentinel, + (b) leave +Inf and -Inf untouched (they are valid IEEE-754 floats). + """ + + def test_mixed_nan_plus_minus_inf(self, tmp_path): + arr = np.array([ + [1.0, np.nan, 3.0, 4.0], + [np.inf, 6.0, -np.inf, 8.0], + [9.0, 10.0, np.nan, 12.0], + [13.0, np.inf, 15.0, -np.inf], + ], dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x'], + attrs={'nodata': -9999.0}).chunk({'y': 2, 'x': 2}) + path = str(tmp_path / 'mixed.tif') + to_geotiff(da, path) + result = open_geotiff(path) + # NaN positions round-trip to NaN. + assert np.isnan(result.values[0, 1]) + assert np.isnan(result.values[2, 2]) + # +Inf and -Inf round-trip verbatim. + assert result.values[1, 0] == np.inf + assert result.values[3, 1] == np.inf + assert result.values[1, 2] == -np.inf + assert result.values[3, 3] == -np.inf + # Finite values stay finite. + assert result.values[0, 0] == pytest.approx(1.0) + assert result.values[2, 0] == pytest.approx(9.0) + + +class TestStreamingWriteAllInf: + """All +Inf and all -Inf dask streaming writes. + + +Inf and -Inf are valid IEEE-754 floats; the streaming writer + should pass them through unchanged. The reader keeps Inf as Inf + because the nodata mask only matches the sentinel value, not Inf. + """ + + def test_all_plus_inf(self, tmp_path): + arr = np.full((4, 4), np.inf, dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 2, 'x': 2}) + path = str(tmp_path / 'allposinf.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert (result.values == np.inf).all() + + def test_all_minus_inf(self, tmp_path): + arr = np.full((4, 4), -np.inf, dtype=np.float32) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 2, 'x': 2}) + path = str(tmp_path / 'allneginf.tif') + to_geotiff(da, path) + result = open_geotiff(path) + assert (result.values == -np.inf).all() + + +# --------------------------------------------------------------------------- +# Cat 4: predictor=3 floating-point predictor through dask streaming write +# --------------------------------------------------------------------------- + + +class TestStreamingWriteFloatPredictor: + """``predictor=3`` (TIFF float predictor) on a small dask raster. + + The streaming writer threads ``predictor=`` through to every tile-row + encode. ``test_streaming_write.py`` covers ``predictor=True`` (=2) + only; the float predictor 3 branch lacked direct streaming + coverage. Verify lossless float32 round-trip plus the dtype-guard + on int input. + """ + + def test_predictor3_float32_round_trip(self, tmp_path): + rng = np.random.default_rng(2026_05_15) + arr = rng.random((40, 40), dtype=np.float32) * 100.0 + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 16, 'x': 16}) + path = str(tmp_path / 'pred3_f32.tif') + to_geotiff(da, path, compression='deflate', predictor=3, + tile_size=16) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + def test_predictor3_float64_round_trip(self, tmp_path): + rng = np.random.default_rng(2026_05_15) + arr = rng.random((32, 32), dtype=np.float64) * 100.0 + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 16, 'x': 16}) + path = str(tmp_path / 'pred3_f64.tif') + to_geotiff(da, path, compression='deflate', predictor=3, + tile_size=16) + result = open_geotiff(path) + np.testing.assert_array_equal(result.values, arr) + + def test_predictor3_int_input_rejected(self, tmp_path): + """``predictor=3`` requires float dtype; int input must raise.""" + arr = np.arange(32 * 32, dtype=np.int32).reshape(32, 32) + da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 16, 'x': 16}) + path = str(tmp_path / 'pred3_i32.tif') + with pytest.raises(ValueError, match='predictor'): + to_geotiff(da, path, compression='deflate', predictor=3, + tile_size=16) From f0b3080e170a06122097d4eb89ad702e6d1d61f0 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Fri, 15 May 2026 06:53:49 -0700 Subject: [PATCH 2/2] geotiff: address copilot review on dask streaming write coverage (#1912) - Add raw-pixel verification to the all-NaN, mixed NaN/Inf, and no- sentinel-declared streaming tests. ``open_geotiff`` maps the GDAL_NODATA sentinel back to NaN on read, so asserting on the decoded DataArray alone cannot distinguish a writer that left NaNs as floats from a writer that wrote the sentinel correctly. The tests now decode the raw pixels with ``tifffile`` (no GDAL_NODATA consultation) and pin the on-disk encoding, then keep the public ``open_geotiff`` assertion to lock the round-trip end-to-end. - Update the docstring + sweep-state note to acknowledge that test_predictor_fp_write_1313.test_predictor3_streaming_dask already covers dask streaming predictor=3 with a Predictor-tag assertion. The new tests extend coverage to smaller chunk geometries and pin the int-dtype rejection rather than claiming predictor=3 streaming was previously uncovered. --- .claude/sweep-test-coverage-state.csv | 2 +- ...k_streaming_write_degenerate_2026_05_15.py | 83 +++++++++++++++---- 2 files changed, 67 insertions(+), 18 deletions(-) diff --git a/.claude/sweep-test-coverage-state.csv b/.claude/sweep-test-coverage-state.csv index ac9ff9e09..bf36632ed 100644 --- a/.claude/sweep-test-coverage-state.csv +++ b/.claude/sweep-test-coverage-state.csv @@ -1,3 +1,3 @@ module,last_inspected,issue,severity_max,categories_found,notes -geotiff,2026-05-15,,HIGH,2;3;4,"Pass 14 (2026-05-15): added test_dask_streaming_write_degenerate_2026_05_15.py closing Cat 3 HIGH and Cat 2 HIGH/MEDIUM gaps on the dask streaming write path (to_geotiff with dask-backed DataArray, #1084). test_streaming_write.py covered 100x100 with a NaN block plus a 2x2 small raster but had nothing 1-pixel-row, 1-pixel-column, all-NaN, all-Inf, or +/-Inf-mixed. The streaming tile-row segmenter (#1485) on a 1-pixel-tall raster and the streaming nodata-mask coercion on an all-NaN chunk were reachable only with a dask input and had no direct coverage; a regression on either would not surface from the eager numpy path or the write_geotiff_gpu path (pass 5 covered the GPU writer's degenerate shapes). 16 new tests, all passing: 1x1 chunk-matches-shape + nodata-attr round-trip + uint16, 1xN single chunk + chunks-split-columns + wide-segmented-by-buffer (#1485 streaming_buffer_bytes=1 forces the segmenter), Nx1 single chunk + chunks-split-rows, all-NaN with finite sentinel + all-NaN without sentinel, mixed NaN/+Inf/-Inf preserving Inf bit-exact + sentinel masking NaN only, all-+Inf and all--Inf, predictor=3 (float predictor) round-trip on float32 + float64 plus int-dtype ValueError. predictor=3 streaming coverage extends prior test_streaming_write.test_predictor (which only covers predictor=True / predictor=2). Cat 3 HIGH (1x1/1xN/Nx1) + Cat 2 HIGH (all-NaN with sentinel) + Cat 2 MEDIUM (mixed-Inf, all-Inf) + Cat 4 MEDIUM (predictor=3 streaming). Pass 13 (2026-05-13): added test_size_param_validation_gpu_vrt_1776.py closing Cat 4 HIGH parameter-coverage gap on size-arg validation. Issue #1752 added tile_size validation to to_geotiff and chunks validation to read_geotiff_dask, but the matching kwargs on three sibling entry points were left unchecked: write_geotiff_gpu(tile_size=) raised ZeroDivisionError for 0, struct.error for -1, TypeError for 256.0; read_geotiff_gpu(chunks=) and read_vrt(chunks=) raised ZeroDivisionError for 0 and silently accepted negative values. Factored two shared validators (_validate_tile_size_arg, _validate_chunks_arg) and called them up front from each entry point. 34 new tests, all passing on GPU host: tile_size matrix on write_geotiff_gpu (0/-1/256.0/True/False/positive/np.int64), chunks matrix on read_geotiff_gpu and read_vrt (0/-1/(0,N)/(N,-1)/wrong-length/bool/non-int/(N,float)/positive/np.int64), dispatcher thread-through tests (open_geotiff(gpu=True, chunks=0), to_geotiff(gpu=True, tile_size=0)). Pre-existing 13 #1752 tests still pass after refactor. Filed issue #1776. Pass 12 (2026-05-12): added test_gpu_writer_overview_mode_and_compression_level_1740.py closing Cat 4 HIGH and Cat 4 MEDIUM parameter-coverage gaps. (1) write_geotiff_gpu(overview_resampling='mode') and the dedicated _block_reduce_2d_gpu mode-fallback branch (_gpu_decode.py:3051-3056) had zero direct tests; six of the seven overview_resampling modes were covered (mean/nearest by test_features, min/max/median by pass 6, cubic by test_signature_parity_1631) but mode was the odd one out -- a regression dropping the mode dispatch from _block_reduce_2d_gpu would fall through to the mean reshape branch and emit wrong overview pixels for integer rasters. (2) write_geotiff_gpu(compression_level=) documented as accepted-but-ignored had no test; the CPU writer rejects out-of-range levels with ValueError, the GPU writer is documented not to -- a regression wiring the GPU writer up to the CPU range validator would silently break every to_geotiff(gpu=True, compression_level=X) caller for in-range levels and noisily for out-of-range. 19 tests, all passing on GPU host: _block_reduce_2d_gpu(method='mode') CPU-parity on 4x4 deterministic + random 8x8 + dtype-preserved across u8/u16/i16/i32, write_geotiff_gpu(cog=True, overview_resampling='mode') end-to-end round trip, to_geotiff(gpu=True, ..., overview_resampling='mode') dispatcher thread-through, GPU-vs-CPU pixel parity on 8x8 input, write_geotiff_gpu(compression_level=) in-range matrix on zstd/deflate, out-of-range matrix (zstd=999/-5, deflate=50/0) accepted without raising + round-trip preserved, to_geotiff(gpu=True, compression_level=999) dispatcher thread-through, companion CPU rejects-OOR pin to lock the asymmetry. Mutation against the mode branch (drop the 'if method == mode' block in _block_reduce_2d_gpu) flipped 9 mode tests red. Filed issue #1740. Pass 11 (2026-05-12): added test_gpu_writer_cpu_fallback_codecs_2026_05_12.py closing a Cat 4 HIGH parameter-coverage gap on write_geotiff_gpu compression= modes for the CPU-fallback codecs (lzw, packbits, lz4, lerc, jpeg2000/j2k). Pass 7 (test_gpu_writer_compression_modes_2026_05_11) covered only none/deflate/zstd/jpeg; the remaining five codecs route through dedicated branches in gpu_compress_tiles (_gpu_decode.py:2974-3019) with CPU fallbacks (lerc_compress, jpeg2000_compress, cpu_compress) that had zero direct tests via write_geotiff_gpu. A regression in routing/tag-wiring/fallback dispatch would ship silently because the internal reader uses the same compression-tag table. 17 tests, all passing on GPU host: lzw/packbits/lz4 round-trip + compression-tag pin on uint16, lerc lossless float32 + uint16 round-trip + tag pin, jpeg2000 uint8 single-band + RGB multi-band lossless round-trip + j2k-alias parity + tag pin, GPU-vs-CPU writer pixel parity for lzw/packbits, to_geotiff(gpu=True, compression=lzw/packbits) dispatcher thread-through. Mutation against compression dispatch (swap lzw bytes to zstd; swap lerc bytes to deflate) flipped round-trip tests red. Filed issue #1706. Pass 10 (2026-05-12): added test_kwarg_behaviour_2026_05_12_v2.py closing two Cat 4 HIGH parameter-coverage gaps. (1) write_geotiff_gpu(predictor=True/2/3) had zero direct tests; the GPU writer threads predictor= through normalize_predictor and gpu_compress_tiles into five CUDA encode kernels (_predictor_encode_kernel_u8/u16/u32/u64 for predictor=2, _fp_predictor_encode_kernel for predictor=3) and a regression dropping the encode-kernel calls would ship corrupt files. (2) read_vrt(window=) had no behaviour tests (only a signature pin in test_signature_annotations_1654); the kwarg is documented and _vrt.read_vrt implements full windowed-read semantics (clip, multi-source overlap, src/dst scaling, GeoTransform origin shift on coords + attrs['transform']). 23 tests, all passing on GPU host: predictor=True/2 round-trips on u8/u16/i32 + 3-band RGB samples_per_pixel stride; predictor=3 lossless round-trip on f32 and f64; predictor=3 int-dtype ValueError (CPU/GPU parity); CPU/GPU pixel-exact parity for pred=2 u16 and pred=3 f32; read_vrt(window=) subregion + full + clamp-overflow + clamp-negative + 2x1 mosaic seam straddle + offset past seam + transform-attr origin shift + y/x coords half-pixel shift + window+band + window+chunks (dask) + window+gpu (cupy) + window+gpu+chunks (dask+cupy). Mutation against the encode dispatch flipped 7 predictor tests red. Filed issue #1690. Pass 9 (2026-05-12): added test_kwarg_behaviour_2026_05_12.py closing three Cat 4 MEDIUM parameter-coverage gaps plus one Cat 4 LOW error path. write_vrt documented kwargs (relative/crs_wkt/nodata) had a smoke-test pinning that the kwargs are accepted but no test verified the override *effect* -- a regression dropping the override branch and silently using the default-from-first-source would ship undetected. read_geotiff_gpu(dtype=) cast had zero direct tests; the eager path has TestDtypeEager and dask has TestDtypeDask but the GPU branch had no equivalent. write_geotiff_gpu(bigtiff=) threads through to _assemble_tiff(force_bigtiff=) but no test asserted the on-disk header byte switches; the CPU writer had it via test_features::test_force_bigtiff_via_public_api. write_vrt(source_files=[]) ValueError was uncovered. 26 tests, all passing on GPU host: write_vrt relative=True/False XML attribute + path inspection + parse-back round-trip, write_vrt crs_wkt= override distinct-from-default XML check, write_vrt nodata= override + default-from-source coverage, write_vrt([]) ValueError + no-file side effect, read_geotiff_gpu dtype= matrix (float64->float32, float64->float16, uint16->int32, uint16->uint8, float-to-int raise, dtype=None preserves native), open_geotiff(gpu=True, dtype=) dispatcher, read_geotiff_gpu(chunks=, dtype=) dask+GPU branch, write_geotiff_gpu bigtiff=True/False/None header verification, to_geotiff(gpu=True, bigtiff=True) dispatcher thread-through. Pass 8 (2026-05-11): added test_lz4_compression_level_2026_05_11.py closing Cat 4 MEDIUM parameter-coverage gap on compression='lz4' + compression_level=. _LEVEL_RANGES advertises lz4: (0, 16) but only deflate (1, 9) and zstd (1, 22) had direct level boundary + round-trip + reject tests. The range check is the gatekeeper -- lz4_compress silently accepts any int level -- so a regression dropping 'lz4' from _LEVEL_RANGES would ship undetected. 18 tests, all passing: round-trip at levels 0/1/9/16 (lossless), default-level no-arg path, higher-level-not-larger smoke check on compressible input, out-of-range reject at -1/-10/17/100 on eager path, valid-range message format pin (lz4 valid: 0-16), dask streaming round-trip at 0/1/8/16, dask streaming out-of-range reject at -1/17/50 (separate _LEVEL_RANGES call site). Pass 7 (2026-05-11): added test_gpu_writer_compression_modes_2026_05_11.py closing Cat 4 HIGH gap on write_geotiff_gpu compression= modes. The writer documents zstd (default, fastest GPU), deflate, jpeg, and none, but only deflate + none had round-trip tests; the default zstd and the jpeg (nvJPEG/Pillow) paths shipped without targeted coverage. 11 new tests, all passing on GPU host: zstd round-trip + default-codec pinning, jpeg round-trip on 3-band RGB uint8 + 1-band greyscale, TIFF compression-tag header check across none/deflate/zstd/jpeg, plain deflate + none round-trips outside the COG/sentinel paths, and a cross-codec lossless parity check (zstd/deflate/none agree pixel-exact). nvJPEG path was exercised live, not just the Pillow fallback. Pass 6 (2026-05-11): added test_overview_resampling_min_max_median_2026_05_11.py covering Cat 4 HIGH parameter-coverage gap on overview_resampling=min/max/median. CPU end-to-end paths were already covered by test_cog_overview_nodata_1613::test_cpu_cog_overview_aggregations_ignore_sentinel; the GPU end-to-end paths and the direct CPU+GPU block-reducer branches had no targeted tests, so a regression on those code paths would ship undetected. 26 tests, all passing on GPU host: block-reducer unit tests (finite + partial-NaN), end-to-end COG writes for both to_geotiff and write_geotiff_gpu, CPU/GPU parity for to_geotiff(gpu=True), CPU nodata-sentinel regression check, and ValueError error-path tests for unknown method names on both backends. Pass 5 (2026-05-11): added test_degenerate_shapes_backends_2026_05_11.py covering Cat 3 HIGH geometric gaps (1x1 / 1xN / Nx1 reads on dask+numpy, GPU, dask+cupy backends; 1x1 / 1xN / Nx1 writes through write_geotiff_gpu) and Cat 2 MEDIUM NaN/Inf gaps (all-NaN read on GPU + dask+cupy, Inf / -Inf reads on all non-eager backends, NaN sentinel mask on dask read path including sentinel block split across chunk boundary). 23 tests, all passing on GPU host. Prior passes still hold: pass 4 (r4) closed read_geotiff_gpu/dask name= + max_pixels= kwargs (Cat 4), pass 3 (r3) closed read_vrt GPU/dask+GPU backend dispatch (Cat 1) and dtype/name kwargs (Cat 4)." +geotiff,2026-05-15,,HIGH,2;3;4,"Pass 14 (2026-05-15): added test_dask_streaming_write_degenerate_2026_05_15.py closing Cat 3 HIGH and Cat 2 HIGH/MEDIUM gaps on the dask streaming write path (to_geotiff with dask-backed DataArray, #1084). test_streaming_write.py covered 100x100 with a NaN block plus a 2x2 small raster but had nothing 1-pixel-row, 1-pixel-column, all-NaN, all-Inf, or +/-Inf-mixed. The streaming tile-row segmenter (#1485) on a 1-pixel-tall raster and the streaming nodata-mask coercion on an all-NaN chunk were reachable only with a dask input and had no direct coverage; a regression on either would not surface from the eager numpy path or the write_geotiff_gpu path (pass 5 covered the GPU writer's degenerate shapes). 16 new tests, all passing: 1x1 chunk-matches-shape + nodata-attr round-trip + uint16, 1xN single chunk + chunks-split-columns + wide-segmented-by-buffer (#1485 streaming_buffer_bytes=1 forces the segmenter), Nx1 single chunk + chunks-split-rows, all-NaN with finite sentinel + all-NaN without sentinel, mixed NaN/+Inf/-Inf preserving Inf bit-exact + sentinel masking NaN only, all-+Inf and all--Inf, predictor=3 (float predictor) round-trip on float32 + float64 plus int-dtype ValueError. predictor=3 streaming coverage extends the small-chunk and int-rejection geometry around test_predictor_fp_write_1313.test_predictor3_streaming_dask (which already covers a 128x192 predictor=3 dask streaming write with a Predictor-tag assertion). Cat 3 HIGH (1x1/1xN/Nx1) + Cat 2 HIGH (all-NaN with sentinel) + Cat 2 MEDIUM (mixed-Inf, all-Inf) + Cat 4 MEDIUM (predictor=3 streaming). Pass 13 (2026-05-13): added test_size_param_validation_gpu_vrt_1776.py closing Cat 4 HIGH parameter-coverage gap on size-arg validation. Issue #1752 added tile_size validation to to_geotiff and chunks validation to read_geotiff_dask, but the matching kwargs on three sibling entry points were left unchecked: write_geotiff_gpu(tile_size=) raised ZeroDivisionError for 0, struct.error for -1, TypeError for 256.0; read_geotiff_gpu(chunks=) and read_vrt(chunks=) raised ZeroDivisionError for 0 and silently accepted negative values. Factored two shared validators (_validate_tile_size_arg, _validate_chunks_arg) and called them up front from each entry point. 34 new tests, all passing on GPU host: tile_size matrix on write_geotiff_gpu (0/-1/256.0/True/False/positive/np.int64), chunks matrix on read_geotiff_gpu and read_vrt (0/-1/(0,N)/(N,-1)/wrong-length/bool/non-int/(N,float)/positive/np.int64), dispatcher thread-through tests (open_geotiff(gpu=True, chunks=0), to_geotiff(gpu=True, tile_size=0)). Pre-existing 13 #1752 tests still pass after refactor. Filed issue #1776. Pass 12 (2026-05-12): added test_gpu_writer_overview_mode_and_compression_level_1740.py closing Cat 4 HIGH and Cat 4 MEDIUM parameter-coverage gaps. (1) write_geotiff_gpu(overview_resampling='mode') and the dedicated _block_reduce_2d_gpu mode-fallback branch (_gpu_decode.py:3051-3056) had zero direct tests; six of the seven overview_resampling modes were covered (mean/nearest by test_features, min/max/median by pass 6, cubic by test_signature_parity_1631) but mode was the odd one out -- a regression dropping the mode dispatch from _block_reduce_2d_gpu would fall through to the mean reshape branch and emit wrong overview pixels for integer rasters. (2) write_geotiff_gpu(compression_level=) documented as accepted-but-ignored had no test; the CPU writer rejects out-of-range levels with ValueError, the GPU writer is documented not to -- a regression wiring the GPU writer up to the CPU range validator would silently break every to_geotiff(gpu=True, compression_level=X) caller for in-range levels and noisily for out-of-range. 19 tests, all passing on GPU host: _block_reduce_2d_gpu(method='mode') CPU-parity on 4x4 deterministic + random 8x8 + dtype-preserved across u8/u16/i16/i32, write_geotiff_gpu(cog=True, overview_resampling='mode') end-to-end round trip, to_geotiff(gpu=True, ..., overview_resampling='mode') dispatcher thread-through, GPU-vs-CPU pixel parity on 8x8 input, write_geotiff_gpu(compression_level=) in-range matrix on zstd/deflate, out-of-range matrix (zstd=999/-5, deflate=50/0) accepted without raising + round-trip preserved, to_geotiff(gpu=True, compression_level=999) dispatcher thread-through, companion CPU rejects-OOR pin to lock the asymmetry. Mutation against the mode branch (drop the 'if method == mode' block in _block_reduce_2d_gpu) flipped 9 mode tests red. Filed issue #1740. Pass 11 (2026-05-12): added test_gpu_writer_cpu_fallback_codecs_2026_05_12.py closing a Cat 4 HIGH parameter-coverage gap on write_geotiff_gpu compression= modes for the CPU-fallback codecs (lzw, packbits, lz4, lerc, jpeg2000/j2k). Pass 7 (test_gpu_writer_compression_modes_2026_05_11) covered only none/deflate/zstd/jpeg; the remaining five codecs route through dedicated branches in gpu_compress_tiles (_gpu_decode.py:2974-3019) with CPU fallbacks (lerc_compress, jpeg2000_compress, cpu_compress) that had zero direct tests via write_geotiff_gpu. A regression in routing/tag-wiring/fallback dispatch would ship silently because the internal reader uses the same compression-tag table. 17 tests, all passing on GPU host: lzw/packbits/lz4 round-trip + compression-tag pin on uint16, lerc lossless float32 + uint16 round-trip + tag pin, jpeg2000 uint8 single-band + RGB multi-band lossless round-trip + j2k-alias parity + tag pin, GPU-vs-CPU writer pixel parity for lzw/packbits, to_geotiff(gpu=True, compression=lzw/packbits) dispatcher thread-through. Mutation against compression dispatch (swap lzw bytes to zstd; swap lerc bytes to deflate) flipped round-trip tests red. Filed issue #1706. Pass 10 (2026-05-12): added test_kwarg_behaviour_2026_05_12_v2.py closing two Cat 4 HIGH parameter-coverage gaps. (1) write_geotiff_gpu(predictor=True/2/3) had zero direct tests; the GPU writer threads predictor= through normalize_predictor and gpu_compress_tiles into five CUDA encode kernels (_predictor_encode_kernel_u8/u16/u32/u64 for predictor=2, _fp_predictor_encode_kernel for predictor=3) and a regression dropping the encode-kernel calls would ship corrupt files. (2) read_vrt(window=) had no behaviour tests (only a signature pin in test_signature_annotations_1654); the kwarg is documented and _vrt.read_vrt implements full windowed-read semantics (clip, multi-source overlap, src/dst scaling, GeoTransform origin shift on coords + attrs['transform']). 23 tests, all passing on GPU host: predictor=True/2 round-trips on u8/u16/i32 + 3-band RGB samples_per_pixel stride; predictor=3 lossless round-trip on f32 and f64; predictor=3 int-dtype ValueError (CPU/GPU parity); CPU/GPU pixel-exact parity for pred=2 u16 and pred=3 f32; read_vrt(window=) subregion + full + clamp-overflow + clamp-negative + 2x1 mosaic seam straddle + offset past seam + transform-attr origin shift + y/x coords half-pixel shift + window+band + window+chunks (dask) + window+gpu (cupy) + window+gpu+chunks (dask+cupy). Mutation against the encode dispatch flipped 7 predictor tests red. Filed issue #1690. Pass 9 (2026-05-12): added test_kwarg_behaviour_2026_05_12.py closing three Cat 4 MEDIUM parameter-coverage gaps plus one Cat 4 LOW error path. write_vrt documented kwargs (relative/crs_wkt/nodata) had a smoke-test pinning that the kwargs are accepted but no test verified the override *effect* -- a regression dropping the override branch and silently using the default-from-first-source would ship undetected. read_geotiff_gpu(dtype=) cast had zero direct tests; the eager path has TestDtypeEager and dask has TestDtypeDask but the GPU branch had no equivalent. write_geotiff_gpu(bigtiff=) threads through to _assemble_tiff(force_bigtiff=) but no test asserted the on-disk header byte switches; the CPU writer had it via test_features::test_force_bigtiff_via_public_api. write_vrt(source_files=[]) ValueError was uncovered. 26 tests, all passing on GPU host: write_vrt relative=True/False XML attribute + path inspection + parse-back round-trip, write_vrt crs_wkt= override distinct-from-default XML check, write_vrt nodata= override + default-from-source coverage, write_vrt([]) ValueError + no-file side effect, read_geotiff_gpu dtype= matrix (float64->float32, float64->float16, uint16->int32, uint16->uint8, float-to-int raise, dtype=None preserves native), open_geotiff(gpu=True, dtype=) dispatcher, read_geotiff_gpu(chunks=, dtype=) dask+GPU branch, write_geotiff_gpu bigtiff=True/False/None header verification, to_geotiff(gpu=True, bigtiff=True) dispatcher thread-through. Pass 8 (2026-05-11): added test_lz4_compression_level_2026_05_11.py closing Cat 4 MEDIUM parameter-coverage gap on compression='lz4' + compression_level=. _LEVEL_RANGES advertises lz4: (0, 16) but only deflate (1, 9) and zstd (1, 22) had direct level boundary + round-trip + reject tests. The range check is the gatekeeper -- lz4_compress silently accepts any int level -- so a regression dropping 'lz4' from _LEVEL_RANGES would ship undetected. 18 tests, all passing: round-trip at levels 0/1/9/16 (lossless), default-level no-arg path, higher-level-not-larger smoke check on compressible input, out-of-range reject at -1/-10/17/100 on eager path, valid-range message format pin (lz4 valid: 0-16), dask streaming round-trip at 0/1/8/16, dask streaming out-of-range reject at -1/17/50 (separate _LEVEL_RANGES call site). Pass 7 (2026-05-11): added test_gpu_writer_compression_modes_2026_05_11.py closing Cat 4 HIGH gap on write_geotiff_gpu compression= modes. The writer documents zstd (default, fastest GPU), deflate, jpeg, and none, but only deflate + none had round-trip tests; the default zstd and the jpeg (nvJPEG/Pillow) paths shipped without targeted coverage. 11 new tests, all passing on GPU host: zstd round-trip + default-codec pinning, jpeg round-trip on 3-band RGB uint8 + 1-band greyscale, TIFF compression-tag header check across none/deflate/zstd/jpeg, plain deflate + none round-trips outside the COG/sentinel paths, and a cross-codec lossless parity check (zstd/deflate/none agree pixel-exact). nvJPEG path was exercised live, not just the Pillow fallback. Pass 6 (2026-05-11): added test_overview_resampling_min_max_median_2026_05_11.py covering Cat 4 HIGH parameter-coverage gap on overview_resampling=min/max/median. CPU end-to-end paths were already covered by test_cog_overview_nodata_1613::test_cpu_cog_overview_aggregations_ignore_sentinel; the GPU end-to-end paths and the direct CPU+GPU block-reducer branches had no targeted tests, so a regression on those code paths would ship undetected. 26 tests, all passing on GPU host: block-reducer unit tests (finite + partial-NaN), end-to-end COG writes for both to_geotiff and write_geotiff_gpu, CPU/GPU parity for to_geotiff(gpu=True), CPU nodata-sentinel regression check, and ValueError error-path tests for unknown method names on both backends. Pass 5 (2026-05-11): added test_degenerate_shapes_backends_2026_05_11.py covering Cat 3 HIGH geometric gaps (1x1 / 1xN / Nx1 reads on dask+numpy, GPU, dask+cupy backends; 1x1 / 1xN / Nx1 writes through write_geotiff_gpu) and Cat 2 MEDIUM NaN/Inf gaps (all-NaN read on GPU + dask+cupy, Inf / -Inf reads on all non-eager backends, NaN sentinel mask on dask read path including sentinel block split across chunk boundary). 23 tests, all passing on GPU host. Prior passes still hold: pass 4 (r4) closed read_geotiff_gpu/dask name= + max_pixels= kwargs (Cat 4), pass 3 (r3) closed read_vrt GPU/dask+GPU backend dispatch (Cat 1) and dtype/name kwargs (Cat 4)." reproject,2026-05-10,,HIGH,1;4;5,"Added 39 tests: LiteCRS direct coverage, itrf_transform behaviour/roundtrip/array, itrf_frames, geoid_height numerical correctness + raster happy-path, vertical helpers (ellipsoidal<->orthometric/depth), reproject() lat/lon and latitude/longitude dim propagation. Note: _merge_arrays_cupy is imported but unused (no cupy merge dispatch in merge()); flagged as feature gap not test gap." diff --git a/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py b/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py index 59f633a9a..d3338665a 100644 --- a/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py +++ b/xrspatial/geotiff/tests/test_dask_streaming_write_degenerate_2026_05_15.py @@ -31,11 +31,14 @@ bit-exactly through the streaming pipeline. Only NaN is treated as nodata. * Cat 2 MEDIUM -- all-Inf and all -Inf dask streaming writes. -* Cat 4 MEDIUM -- ``predictor=3`` (floating-point predictor) on a - small dask raster: the streaming write path threads ``predictor=`` - through to each tile-row encode, and the float predictor branch had - no direct streaming-write coverage (``test_streaming_write.py`` - covers ``predictor=True`` only). +* Cat 4 MEDIUM -- ``predictor=3`` (floating-point predictor) on + degenerate / small dask rasters. The float-predictor + dask + streaming combination is already covered by + ``test_predictor_fp_write_1313.test_predictor3_streaming_dask`` on a + 128x192 raster with a Predictor-tag assertion; the tests below + exercise the same branch on smaller chunks and pin the + int-input rejection so the streaming-side dtype guard cannot + regress silently. """ from __future__ import annotations @@ -46,6 +49,22 @@ from xrspatial.geotiff import open_geotiff, to_geotiff +def _read_raw_pixels(path: str) -> np.ndarray: + """Read the raw pixel array off disk without xrspatial's NaN-mask + pass. + + ``open_geotiff`` maps the GDAL_NODATA sentinel back to NaN on + read, so asserting on its output cannot distinguish (a) a writer + that left NaNs as floats and (b) a writer that wrote the sentinel + correctly. ``tifffile`` decodes the pixels but does not consult + ``GDAL_NODATA``, so a raw read surfaces what is actually on disk. + """ + import tifffile + + with tifffile.TiffFile(path) as tif: + return tif.asarray() + + # --------------------------------------------------------------------------- # Cat 3: 1x1, 1xN, Nx1 dask streaming writes # --------------------------------------------------------------------------- @@ -152,10 +171,19 @@ def test_all_nan_with_sentinel(self, tmp_path): attrs={'nodata': -9999.0}).chunk({'y': 4, 'x': 4}) path = str(tmp_path / 'allnan.tif') to_geotiff(da, path) + # Raw decode (no NaN-mask pass): every pixel must be the + # sentinel on disk. Asserting against ``open_geotiff``'s output + # alone would also pass if the writer left NaNs as floats, + # because the reader maps both NaN and the sentinel back to NaN. + raw = _read_raw_pixels(path) + assert (raw == -9999.0).all(), ( + "writer must replace NaN with the GDAL_NODATA sentinel on " + "disk; raw read shows non-sentinel pixels" + ) + assert not np.isnan(raw).any() + # Public read still maps the sentinel back to NaN. result = open_geotiff(path) - # Every pixel must round-trip back to NaN (sentinel -> NaN on read). assert np.isnan(result.values).all() - # Sentinel must be preserved in attrs. assert result.attrs.get('nodata') == pytest.approx(-9999.0) def test_all_nan_default_nodata(self, tmp_path): @@ -166,6 +194,12 @@ def test_all_nan_default_nodata(self, tmp_path): da = xr.DataArray(arr, dims=['y', 'x']).chunk({'y': 2, 'x': 2}) path = str(tmp_path / 'allnan_nosen.tif') to_geotiff(da, path) + # No sentinel declared, so the file must carry raw NaN floats + # on disk -- a regression coercing NaN to some default sentinel + # would silently change the file's contents and would not be + # visible through ``open_geotiff`` alone. + raw = _read_raw_pixels(path) + assert np.isnan(raw).all() result = open_geotiff(path) assert np.isnan(result.values).all() @@ -188,16 +222,30 @@ def test_mixed_nan_plus_minus_inf(self, tmp_path): attrs={'nodata': -9999.0}).chunk({'y': 2, 'x': 2}) path = str(tmp_path / 'mixed.tif') to_geotiff(da, path) + # Raw decode pins the on-disk encoding: NaN cells were + # coerced to the sentinel, Inf cells were left as IEEE-754 + # Inf. A regression that stopped the NaN-to-sentinel coercion + # would still pass an ``open_geotiff``-only assertion because + # the reader maps both NaN and the sentinel back to NaN. + raw = _read_raw_pixels(path) + assert raw[0, 1] == -9999.0 + assert raw[2, 2] == -9999.0 + assert raw[1, 0] == np.inf + assert raw[3, 1] == np.inf + assert raw[1, 2] == -np.inf + assert raw[3, 3] == -np.inf + assert not np.isnan(raw).any(), ( + "writer must coerce every NaN to the sentinel; raw read " + "found surviving NaN floats" + ) + # Public read maps the sentinel back to NaN, keeps Inf as-is. result = open_geotiff(path) - # NaN positions round-trip to NaN. assert np.isnan(result.values[0, 1]) assert np.isnan(result.values[2, 2]) - # +Inf and -Inf round-trip verbatim. assert result.values[1, 0] == np.inf assert result.values[3, 1] == np.inf assert result.values[1, 2] == -np.inf assert result.values[3, 3] == -np.inf - # Finite values stay finite. assert result.values[0, 0] == pytest.approx(1.0) assert result.values[2, 0] == pytest.approx(9.0) @@ -233,13 +281,14 @@ def test_all_minus_inf(self, tmp_path): class TestStreamingWriteFloatPredictor: - """``predictor=3`` (TIFF float predictor) on a small dask raster. - - The streaming writer threads ``predictor=`` through to every tile-row - encode. ``test_streaming_write.py`` covers ``predictor=True`` (=2) - only; the float predictor 3 branch lacked direct streaming - coverage. Verify lossless float32 round-trip plus the dtype-guard - on int input. + """``predictor=3`` (TIFF float predictor) on small dask rasters. + + ``test_predictor_fp_write_1313.test_predictor3_streaming_dask`` + already covers a dask-backed streaming write with ``predictor=3`` + on a 128x192 raster and pins the Predictor tag. The tests below + extend coverage with smaller chunk geometries (16x16) and lock the + int-dtype ValueError on the streaming path so the dtype guard + cannot regress silently. """ def test_predictor3_float32_round_trip(self, tmp_path):