geotiff: golden corpus phase 3 PR 6, VRT backend (#1930)#2042
Merged
Conversation
This was referenced May 18, 2026
brendancol
added a commit
that referenced
this pull request
May 18, 2026
* geotiff: oracle understands masked-nodata candidates (#1930) Closes the 5-of-7 xfail shared by every phase-3 backend module. xrspatial reads integer GeoTIFFs whose nodata tag carries an integer sentinel by masking sentinel-equal pixels to NaN and upcasting the array to float (issue #1988). It stamps ``attrs['masked_nodata'] = True`` so a write round-trip can put the tag back. The oracle previously did strict dtype + raw-pixel comparison, so the dtype shift and sentinel-to-NaN rewrite both registered as mismatches. This change adds ``_normalise_for_masked_nodata`` to ``_oracle.py``. When the candidate reports the contract, the rasterio reference is cast to the candidate's float dtype and pixels equal to the sentinel are replaced with NaN; the dtype and pixel assertions then run on directly comparable arrays. Fixtures that do not report the contract pass through unchanged, so the 23+ currently passing fixtures and the citation / JPEG xfails are untouched. Drops the five nodata-masking entries from ``_PARITY_GAPS`` in each of the three on-main backend modules (eager numpy, dask numpy, GPU). The matching ``xfail(strict=True)`` flips to a real failure when the test starts passing, so the oracle change and the cleanup must land together to keep main green. Test coverage in ``golden_corpus/test_oracle.py``: * masked_nodata candidate matches the oracle * masked_nodata flag missing -- strict dtype check still fires * candidate that forgot to mask a sentinel pixel -- pixel check fires * candidate with wrong ``attrs['nodata']`` -- nodata check fires * plain float-NaN fixtures are not affected by the new path * masked_nodata flag plus NaN sentinel passes through cleanly Follow-ups: the same five entries will be removed from the ``_PARITY_GAPS`` tables in the still-open phase-3 PRs #2040 (dask+GPU), #2041 (HTTP), and #2042 (VRT) via separate commits on each branch. * geotiff: tighten masked-nodata oracle guards (#1930) Address PR #2046 review: * Reject fractional sentinels via ``float(nd_float).is_integer()``. Without the guard, a sentinel like ``3.5`` would cast to ``3`` and mask every 3-valued pixel. * Reject sentinels outside the source integer dtype's range via ``info.min <= int(nd_float) <= info.max``. Without the guard, ``np.uint16(-1.0)`` wraps to ``65535`` and would mask the dtype-max pixel. Both guards mirror the upstream xrspatial reader's checks in ``xrspatial/geotiff/__init__.py``, so the oracle's interpretation of the masked-nodata contract is now strictly tighter than what the reader can emit. Added two tests: * ``test_masked_nodata_fractional_sentinel_does_not_mask`` -- builds a fractional-nodata fixture, confirms the oracle stays on the raw-pixel path (dtype mismatch fires, not pixel mismatch). * ``test_masked_nodata_out_of_range_sentinel_does_not_mask`` -- rasterio refuses to write an out-of-range nodata at the writer level, so this calls ``_normalise_for_masked_nodata`` directly with synthesised inputs and confirms the inputs pass through unchanged.
Synthesises a two-source horizontal mosaic from a corpus fixture: the source ``.tif`` is rasterio-copied twice into ``tmp_path`` with the second copy's origin shifted east by one image width, then ``write_vrt`` builds the VRT XML and ``open_geotiff`` reads it back. The expected mosaic is ``np.concatenate`` of the source pixels along the x axis. Each cell asserts three things: the rasterio read of the VRT matches the manual concat (sanity check that the VRT itself is well-formed), the xrspatial read matches the same concat (the actual parity assertion), and the xrspatial dtype matches the rasterio dtype. Two source fixtures are parametrised: ``dtype_uint8`` and ``dtype_uint16``. Both are bit-exact comparable and use the manifest default transform. ``_VRT_SOURCE_IDS`` is validated against the manifest so a renamed source fixture surfaces a typo rather than silently letting the test skip.
Address phase 3 PR 6 review: * Replace the hand-rolled pixel + dtype assertion with ``compare_to_oracle(vrt_path, candidate)`` so the VRT cell checks pixels, dtype, transform, CRS, and nodata the same way every other phase-3 backend module does. * Keep the independent ``np.concatenate`` sanity check before the oracle call. The oracle also reads through rasterio; if the VRT writer drifts the oracle would silently agree with itself, so the manual concat verifies the rasterio reference is what we expect before using it. * Use ``_height, width`` to mark the unpacked height as unused.
c7a999d to
fb50742
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 3 PR 6 of the golden corpus plan in #1930. The VRT path is reached when
open_geotiffsees a.vrtsource; it delegates toread_vrtwhich parses the GDAL VRT XML and stitches the listed source GeoTIFFs back into a single DataArray.This module synthesises a two-source horizontal mosaic from a corpus fixture: the source
.tifis rasterio-copied twice intotmp_pathwith the second copy's origin shifted east by one image width, thenwrite_vrtbuilds the VRT XML, andopen_geotiffreads it back. The expected mosaic isnp.concatenateof the source pixels along the x axis.Each cell asserts three things:
Two source fixtures are parametrised:
dtype_uint8anddtype_uint16. Both are bit-exact comparable and use the manifest default transform._VRT_SOURCE_IDSis validated against the manifest so a renamed source fixture surfaces a typo rather than silently letting the test skip.Test plan
pytest xrspatial/geotiff/tests/test_golden_corpus_vrt_1930.py: 3 passedRefs #1930. Sixth and final backend-wiring PR; closes the phase 3 set (eager numpy #2036, dask+numpy #2038, GPU #2039, dask+GPU #2040, HTTP/COG #2041, VRT here).