Skip to content

Positive VRT mosaic tests over GeoTIFF sources (#2369)#2373

Open
brendancol wants to merge 6 commits into
mainfrom
issue-2369
Open

Positive VRT mosaic tests over GeoTIFF sources (#2369)#2373
brendancol wants to merge 6 commits into
mainfrom
issue-2369

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2369. Sub-task of #2342.

Positive coverage for the simple VRT mosaic subset that the release contract lists as supported. The other VRT tests touch this subset while exercising codecs, dtypes, or chunk policies; nothing pinned the contract by itself.

Six tests, all driving the public read_vrt:

  • eager 2x1 horizontal mosaic
  • eager 2x2 mosaic with one distinct value per tile (catches placement swaps loudly)
  • windowed read aligned with the source seam
  • dask read of both mosaics with multiple chunks
  • multi-band 2x1 mosaic

Each test asserts values, coords, and key attrs (crs, transform, nodata).

Backend coverage

  • numpy: yes
  • dask + numpy: yes
  • cupy / dask + cupy: not in scope (the release contract only promises eager + dask for VRT)

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py passes locally
  • CI runs green on the same file

Several call sites in xrspatial/geotiff/ classified HTTP(S) sources
with startswith(('http://', 'https://')). That comparison is
case-sensitive, so URLs like HTTP://169.254.169.254/... slipped past
_HTTPSource and the SSRF / DNS-pinning checks in _validate_http_url
and fell through to fsspec. RFC 3986 section 3.1 defines URI schemes
as case-insensitive.

Adds _is_http_source(s) in _sources.py, backed by
urlparse(s).scheme.lower() in ('http', 'https'), and routes every
dispatch site through it: _sources.py (_is_fsspec_uri + _open_source),
_reader.py, _writer.py (_is_fsspec_uri), _backends/dask.py,
_backends/gpu.py (eager-via-CPU branch + GDS opt-in gate), and the
_sidecar.py _is_http_url helper.

New regression tests in test_http_scheme_case_2321.py cover the
helper, _open_source dispatch for HTTP / HTTPS / Http / hTTpS, the
partner _is_fsspec_uri classifiers in both _sources and _writer, the
sidecar helper, and the end-to-end SSRF rejection of uppercase URLs
that resolve to 127.0.0.1, 169.254.169.254, 10.0.0.1, 192.168.1.1,
0.0.0.0, and localhost. socket.getaddrinfo is monkeypatched so the
tests stay offline.

Parent: #2321 sub-task 5.
#2332)

Follow-ups on the /review-pr pass for PR #2337:

* Hoist ``from urllib.parse import urlparse`` to ``_sources.py`` module
  top so ``_is_http_source`` no longer pays the import cost per call.
* Hoist ``from ._sources import _is_http_source`` to module top in
  ``_writer.py`` and ``_sidecar.py`` (no cycle: neither is imported by
  ``_sources``).
* Keep the function-scope imports in ``_backends/dask.py`` and
  ``_backends/gpu.py``: the backend modules deliberately defer the
  reader / sources import so the package can be imported without
  urllib3 in dask-only environments. Add a one-line comment so the
  next reader does not assume it is an oversight.
* Reject HTTP(S) write targets in ``_write_bytes`` with a typed
  ``NotImplementedError`` before the local file write path interprets
  the URL as a filename. The error message lists the supported
  destinations. Covers both lowercase and uppercase scheme casing.
* Lock the behavioural broadening: ``_is_http_source('http:foo')``
  now returns True (urlparse-correct, broader than the old
  ``startswith``). Added a test asserting the classifier matches and
  that ``_open_source('http:foo')`` raises ``UnsafeURLError`` so the
  validator still rejects it as "no hostname".
* Added 7 new test cases (helper + open_source + 5 writer-rejection
  parametrizations); full geotiff suite: 5324 passed, 68 skipped.
Reconciles the case-insensitive HTTP(S) dispatch helper introduced on
both branches:

* main #2326 added ``_is_http_url`` in ``_sources.py`` and routed the
  geotiff dispatch sites through it.
* PR #2337 added the same fix as ``_is_http_source`` with broader test
  coverage, hoisted imports, and a typed ``NotImplementedError`` for
  HTTP(S) write targets in ``_write_bytes``.

Resolution keeps ``_is_http_source`` as the canonical name (used by
this PR's tests and the writer-side guard) and adds an ``_is_http_url``
alias in ``_sources.py``, plus a re-export from ``_reader.py``, so the
test file shipped with #2326 (``test_uppercase_scheme_ssrf_2323.py``)
keeps importing ``_is_http_url`` unchanged. The ``_sidecar.py`` wrapper
``_is_http_url`` now delegates to ``_is_http_source`` directly instead
of through the canonical alias.

Full geotiff suite: 5413 passed, 64 skipped, 1 xfailed, 1 xpassed.
Sub-task of #2342. Covers the supported simple-VRT subset on both
the eager numpy and dask read paths:

- 2x1 horizontal mosaic of two compatible tiles
- 2x2 mosaic of four compatible tiles
- Windowed read aligned with the source seam
- Dask reads of both mosaics with multiple chunks
- Multi-band 2x1 mosaic

Each test asserts values, coords, and key attrs (crs, transform,
nodata) so the contract is checked at the pixel level rather than
on shape alone.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 25, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Positive VRT mosaic tests over GeoTIFF sources (#2369)

Blockers

None.

Suggestions

  • xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:118,146,184: the three fixtures use tempfile.mkdtemp(prefix='tmp_2369_...'), which leaks directories between runs. The implementation notes called for tmp_path, and most newer tests in this directory use the pytest fixture. Switching gets auto-cleanup and keeps the prefix-based collision protection.
  • xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:240-254: _assert_attrs_ok checks the pixel size on the transform but not origin_x / origin_y. A transform with the wrong origin would still pass. Asserting transform[2] and transform[5] against the fixture origin would catch that case.

Nits

  • xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:333: h, w_total = expected.shape -- w_total is unused. Drop it or reference it in the comment.
  • xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:410-431: no dask variant of the multi-band test. The eager case covers the contract; a chunked multi-band read would also pin down per-block band handling.

What looks good

  • Six tests, narrow scope, exactly the supported subset from the issue.
  • 2x2 fixture uses one distinct constant per tile so placement bugs show up in the value diff instead of as numeric drift.
  • Each test asserts values, coords, and key attrs (crs, transform, nodata).
  • Tile builders are short and shared cleanly across fixtures.

Checklist

  • Algorithm matches reference: synthesises mosaics from to_geotiff + internal write_vrt, mirroring test_vrt_lazy_chunks_1814.py.
  • All implemented backends produce consistent results: dask compute compared against eager.
  • NaN handling: not exercised in this PR; mosaics use finite data plus declared nodata sentinel.
  • Edge cases: windowed read on a 2x1 seam covers the documented boundary case.
  • Dask chunk boundaries handled correctly: 2x1 and 2x2 mosaics with sub-tile chunks both match eager.
  • No premature materialization: compute() only fires after the multi-block dask graph is built.
  • [n/a] Benchmark exists or is not needed: test-only PR.
  • [n/a] README feature matrix updated: no public API change.
  • Docstrings present and accurate.

- Switch the three fixtures from tempfile.mkdtemp to tmp_path
  subdirectories so pytest cleans them up between runs while keeping
  per-issue prefix protection.
- _assert_attrs_ok now optionally checks transform[2] (origin_x) and
  transform[5] (origin_y); every call site passes the fixture origin
  so a translation bug would not slip through.
- Drop the unused w_total in the windowed test.
- Add test_dask_multiband_2x1_mosaic_matches_eager so the multi-band
  contract is covered on the dask path too.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (post-fixes)

Disposition for the four findings from the previous review:

  • Fixture tempfile.mkdtemp -> tmp_path subdirs. Fixed.
  • _assert_attrs_ok now checks transform[2] / transform[5]. Every call site passes the fixture origin; the windowed call passes the shifted origin (ox + _PIXEL_W*c0, oy + _PIXEL_H*r0). Fixed.
  • Unused w_total removed; the windowed test now binds only h. Fixed.
  • Dask multi-band test added (test_dask_multiband_2x1_mosaic_matches_eager) so per-block band handling is covered. Fixed.

Total: 7 tests pass locally. No deferrals, no dismissals.

# Conflicts:
#	xrspatial/geotiff/_reader.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VRT tests: positive coverage for simple GeoTIFF mosaics (#2342 work item)

1 participant