geotiff: golden corpus phase 3 PR 5, HTTP/COG backend (#1930)#2041
Merged
Conversation
This was referenced May 18, 2026
brendancol
added a commit
that referenced
this pull request
May 18, 2026
* geotiff: oracle understands masked-nodata candidates (#1930) Closes the 5-of-7 xfail shared by every phase-3 backend module. xrspatial reads integer GeoTIFFs whose nodata tag carries an integer sentinel by masking sentinel-equal pixels to NaN and upcasting the array to float (issue #1988). It stamps ``attrs['masked_nodata'] = True`` so a write round-trip can put the tag back. The oracle previously did strict dtype + raw-pixel comparison, so the dtype shift and sentinel-to-NaN rewrite both registered as mismatches. This change adds ``_normalise_for_masked_nodata`` to ``_oracle.py``. When the candidate reports the contract, the rasterio reference is cast to the candidate's float dtype and pixels equal to the sentinel are replaced with NaN; the dtype and pixel assertions then run on directly comparable arrays. Fixtures that do not report the contract pass through unchanged, so the 23+ currently passing fixtures and the citation / JPEG xfails are untouched. Drops the five nodata-masking entries from ``_PARITY_GAPS`` in each of the three on-main backend modules (eager numpy, dask numpy, GPU). The matching ``xfail(strict=True)`` flips to a real failure when the test starts passing, so the oracle change and the cleanup must land together to keep main green. Test coverage in ``golden_corpus/test_oracle.py``: * masked_nodata candidate matches the oracle * masked_nodata flag missing -- strict dtype check still fires * candidate that forgot to mask a sentinel pixel -- pixel check fires * candidate with wrong ``attrs['nodata']`` -- nodata check fires * plain float-NaN fixtures are not affected by the new path * masked_nodata flag plus NaN sentinel passes through cleanly Follow-ups: the same five entries will be removed from the ``_PARITY_GAPS`` tables in the still-open phase-3 PRs #2040 (dask+GPU), #2041 (HTTP), and #2042 (VRT) via separate commits on each branch. * geotiff: tighten masked-nodata oracle guards (#1930) Address PR #2046 review: * Reject fractional sentinels via ``float(nd_float).is_integer()``. Without the guard, a sentinel like ``3.5`` would cast to ``3`` and mask every 3-valued pixel. * Reject sentinels outside the source integer dtype's range via ``info.min <= int(nd_float) <= info.max``. Without the guard, ``np.uint16(-1.0)`` wraps to ``65535`` and would mask the dtype-max pixel. Both guards mirror the upstream xrspatial reader's checks in ``xrspatial/geotiff/__init__.py``, so the oracle's interpretation of the masked-nodata contract is now strictly tighter than what the reader can emit. Added two tests: * ``test_masked_nodata_fractional_sentinel_does_not_mask`` -- builds a fractional-nodata fixture, confirms the oracle stays on the raw-pixel path (dtype mismatch fires, not pixel mismatch). * ``test_masked_nodata_out_of_range_sentinel_does_not_mask`` -- rasterio refuses to write an out-of-range nodata at the writer level, so this calls ``_normalise_for_masked_nodata`` directly with synthesised inputs and confirms the inputs pass through unchanged.
The HTTP path in ``xrspatial.geotiff`` is COG-only:
``open_geotiff('http://...')`` routes to ``_read_cog_http`` which
uses range requests to fetch only metadata and the tiles a window
needs. Plain stripped/tiled fixtures would require a full-object
download, which is a different code path; this module focuses
narrowly on the COG fixture.
The in-process HTTP server reuses the range-supporting
``BaseHTTPRequestHandler`` pattern from
``test_http_meta_buffer_1718.py``. It binds an ephemeral
127.0.0.1 port, serves the COG fixture's bytes, and lets the test
read through ``open_geotiff(url)`` with the SSRF guard opened for
loopback via ``XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1``.
The single corpus COG fixture (``cog_internal_overview_uint16``)
passes the oracle through the HTTP path. A
``test_at_least_one_cog_fixture_exists`` guard surfaces the case
where a future refactor accidentally drops the only COG entry.
Address phase 3 PR 5 review: * Pull the ``lossy`` flag from the manifest entry via a new ``_is_lossy(fixture_id)`` helper. The hardcoded ``lossy=False`` in the oracle call would silently fail bit-exact comparison if a future lossy COG (e.g. JPEG-COG) landed in the corpus. * Add ``_resolved_fixtures()`` so the manifest is parsed once per helper. Mirrors the eager / dask / GPU modules. * Update the module docstring to reflect that the current corpus has no COG fixture in a parity gap; document the path forward if one is added. * Join the HTTP server thread with a 2 s timeout in the test's ``finally`` block so a hung ``httpd.shutdown()`` surfaces cleanly rather than waiting on process exit.
16f6c50 to
e9e66e1
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 3 PR 5 of the golden corpus plan in #1930. The HTTP path in
xrspatial.geotiffis COG-only:open_geotiff('http://...')routes to_reader._read_cog_httpwhich uses range requests to fetch only metadata and the tiles a window needs. Plain stripped or tiled fixtures over HTTP would require a full-object download, which is a different code path; this module focuses narrowly on the COG fixture.The in-process HTTP server reuses the range-supporting
BaseHTTPRequestHandlerpattern fromtest_http_meta_buffer_1718.py. It binds an ephemeral 127.0.0.1 port, serves the COG fixture's bytes, and lets the test read throughopen_geotiff(url)with the SSRF guard opened for loopback viaXRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1(the same hatch every other in-process HTTP test in the repo uses).The corpus has one COG fixture today (
cog_internal_overview_uint16, phase 2 PR 7). It passes the oracle through the HTTP path. Atest_at_least_one_cog_fixture_existsguard catches the case where a future refactor accidentally drops the only COG entry.Test plan
pytest xrspatial/geotiff/tests/test_golden_corpus_http_1930.py: 2 passedhttpd.shutdown()+server_close()in atry/finally)Refs #1930. Fifth of six backend-wiring PRs; VRT remains.