geotiff: golden corpus phase 2.6 - nodata sentinels (#1930) #1994

Open
brendancol wants to merge 2 commits into
xarray-contrib:mainfrom
brendancol:1930-phase2-6-nodata-sentinels
@brendancol
Contributor

Summary

Phase 2.6 of #1930. Adds three nodata-sentinel fixtures to the golden corpus:

  • nodata_int_sentinel_uint16: uint16 with explicit integer nodata=0
  • nodata_nan_float32: float32 with nodata=NaN
  • nodata_miniswhite_uint8: photometric=miniswhite, no nodata tag, white-as-min convention

From the plan on #1930:

PR 6 - Nodata sentinels - int sentinel, NaN, miniswhite (3 fixtures)

Each fixture is 16x16 and under 4 KB. The generator gets a small _stamp_nodata_pixels helper that plants the sentinel value at three fixed positions after pattern generation, so noise / ramp patterns can't dodge the masked-data path. Smoke tests cover:

  • file is a valid TIFF under the size budget,
  • the convention is observable on the rasterio source (int / NaN tag, or IMAGE_STRUCTURE MINISWHITE),
  • compare_to_oracle accepts a hand-built candidate DataArray for each convention, which proves the oracle's NaN-aware equality handles nodata=NaN.
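The NaN-aware equality in the last bullet matters because NaN never compares equal to itself under IEEE 754, so a plain elementwise comparison would reject a candidate that is identical to a nodata=NaN fixture. A minimal sketch of the idea in NumPy follows; nan_aware_equal is a hypothetical name used for illustration and is not the oracle's actual implementation, which this PR does not show.

```python
import numpy as np


def nan_aware_equal(candidate, expected):
    """Return True when shape, dtype and values match, treating NaN as equal to NaN.

    Hypothetical helper for illustration only; not compare_to_oracle itself.
    """
    candidate = np.asarray(candidate)
    expected = np.asarray(expected)
    if candidate.shape != expected.shape or candidate.dtype != expected.dtype:
        return False
    if np.issubdtype(expected.dtype, np.floating):
        # equal_nan=True makes NaN positions compare as equal.
        return bool(np.array_equal(candidate, expected, equal_nan=True))
    return bool(np.array_equal(candidate, expected))


# A 16x16 float32 raster with one NaN pixel, and an identical candidate.
expected = np.arange(256, dtype=np.float32).reshape(16, 16)
expected[0, 0] = np.nan
candidate = expected.copy()

plain = bool((candidate == expected).all())   # NaN != NaN, so this fails
aware = nan_aware_equal(candidate, expected)  # NaN positions are accepted
```

The dtype check is a design choice in this sketch: a float64 candidate for a float32 fixture is treated as a mismatch rather than silently cast.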

No backend wiring; Phase 3 handles that. Per the plan, the TODO(#1988) note documents that today's candidate mirrors whatever shape attrs['nodata'] has on the existing reader; the declared-vs-masked-state split lands later.

Test plan

  • pytest xrspatial/geotiff/tests/golden_corpus/test_nodata_sentinels.py - 9 passed
  • pytest xrspatial/geotiff/tests/golden_corpus/test_oracle.py - 17 passed (unchanged)
  • Re-running the generator produces byte-identical .tif files
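The byte-identical claim in the last bullet can be checked by hashing two independent generator runs. The sketch below uses only the standard library; generate_fixture is a hypothetical stand-in that writes fixed bytes, since the real generator's entry point is not shown in this PR.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def generate_fixture(path):
    # Hypothetical stand-in for the real generator: writes fixed bytes.
    Path(path).write_bytes(b"II*\x00" + bytes(range(16)) * 16)


def is_reproducible(generate, name="fixture.tif"):
    """Run `generate` into two fresh directories and compare the digests."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        a = Path(first) / name
        b = Path(second) / name
        generate(a)
        generate(b)
        return sha256_of(a) == sha256_of(b)
```

Writing into two separate temporary directories also catches generators that accidentally embed the output path or a timestamp in the file.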

Adds three fixtures, one per nodata convention the manifest schema
recognises:

- nodata_int_sentinel_uint16: integer nodata=0, three sentinel pixels.
- nodata_nan_float32: nodata=NaN, three NaN pixels.
- nodata_miniswhite_uint8: photometric=miniswhite, no nodata tag,
  three dtype-max pixels.

Each fixture is 16x16 and under 4 KB. The generator gets a small
_stamp_nodata_pixels helper that plants the sentinel value at three
fixed positions after pixel generation, so the oracle's nodata
handling gets exercised rather than just the tag round-trip. Smoke
tests verify each fixture opens cleanly, the nodata state is visible
on the rasterio source (int / NaN / IMAGE_STRUCTURE MINISWHITE flag),
and compare_to_oracle accepts a hand-built DataArray for each
convention.

No backend wiring; that comes in Phase 3 per the plan on xarray-contrib#1930.
github-actions bot added the performance label (PR touches performance-sensitive code) on May 16, 2026

Self-review pass on PR xarray-contrib#1994:

- _stamp_nodata_pixels: reject bool sentinels explicitly so a manifest
  entry like nodata: true can't slip a 1 into the raster. Matches the
  write-side gate from xarray-contrib#1990.
- test_fixture_is_a_valid_tiff: tighten size budget from 4 KB to 2 KB
  (largest fixture today is 1402 bytes), so silent bloat trips the
  regression rather than drifting toward the documented limit.
- test_int_sentinel_round_trips_through_rasterio: also assert
  src.nodata is not NaN, since rasterio reports nodata as a float.
- manifest description: clarify that the masked-data path is reachable
  once Phase 3 backend wiring lands, not in this PR.
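
The bool rejection in the first item is needed because Python's bool is a subclass of int, so a manifest value of true would otherwise be written into the raster as 1. A minimal sketch of a stamping helper with that guard follows. The function name drops the leading underscore to mark it as illustrative, and the three positions are assumptions, since the PR does not list the actual coordinates or the real helper's signature.

```python
import numpy as np

# Hypothetical fixed positions; the real helper's coordinates are not shown.
SENTINEL_POSITIONS = ((0, 0), (7, 8), (15, 15))


def stamp_nodata_pixels(data, sentinel, positions=SENTINEL_POSITIONS):
    """Return a copy of `data` with `sentinel` written at each position."""
    # bool is a subclass of int, so it must be rejected before any
    # numeric handling, otherwise True would be stamped as 1.
    if isinstance(sentinel, (bool, np.bool_)):
        raise TypeError("nodata sentinel must be a number, not a bool")
    out = data.copy()
    for row, col in positions:
        out[row, col] = sentinel
    return out


# Integer sentinel on uint16 and NaN sentinel on float32.
int_raster = stamp_nodata_pixels(np.ones((16, 16), dtype=np.uint16), 0)
nan_raster = stamp_nodata_pixels(np.ones((16, 16), dtype=np.float32), np.nan)
```

Stamping after pattern generation, as the PR describes, guarantees the sentinel is present regardless of what the pattern happened to produce.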