VRT metadata parity tests across backends (#2333) by brendancol · Pull Request #2338 · xarray-contrib/xarray-spatial

brendancol · 2026-05-23T02:44:32Z

Closes #2333. Part of #2321 (release hardening sub-task 3 of 6).

Summary

Add xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py locking the cross-backend metadata contract for VRT reads. Each backend's attrs and coord arrays must match the numpy baseline; a regression that fixes pixels but breaks attrs (or vice versa) trips one of the paired assertions, never both silently.
Cover three positive cases (full georef, transform-only, integer with nodata sentinel) parametrized over numpy, dask, and GPU; GPU is gated by pytest.importorskip('cupy').
Cover five negative cases (mixed CRS, mixed per-band nodata, unsupported resampling with size-changing DstRect, malformed SrcRect/DstRect, missing sources under 'raise' and 'warn').
The mixed-CRS case is currently silently flattened to the VRT-declared SRS. It lands as xfail(strict=True) so it flips to XPASS the moment sub-PR 2's validator closes the gap.

Backend coverage

numpy: eager via open_geotiff(vrt) (dispatcher path)
dask: lazy via open_geotiff(vrt, chunks=2) (dispatcher path)
GPU: eager via read_vrt(vrt, gpu=True) (open_geotiff rejects .vrt + gpu=True; the direct entry point owns the gpu kwarg)

pytest.importorskip('cupy') plus a cupy.cuda.is_available() check guards the GPU cases.

Test plan

pytest xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py -v (27 passed, 1 xfailed locally with a working CUDA device)
pytest against the surrounding VRT regression suite (84 passed, 1 xfailed)

Notes

Sub-PR 2's VRTUnsupportedError is not landed yet. The negative tests carry TODO(#2321) comments and assert against the current error types; the upgrade is mechanical when PR 2 ships.
Temp file names include _2321_ per CLAUDE.md.

Locks the cross-backend metadata contract for VRT reads: eager numpy, dask via the chunks= dispatcher, and GPU eager via read_vrt(gpu=True). Each backend's attrs (transform, crs, georef_status, nodata, masked_nodata) and coord arrays must match the numpy baseline for the full-georef, transform-only, and integer-with-nodata cases. Negative tests pin the fail-closed posture for ambiguous VRT input: mixed CRS, mixed per-band nodata, unsupported resampling with a size-changing destination rectangle, malformed SrcRect/DstRect, and missing sources under both 'raise' and 'warn' policies. The mixed-CRS case is currently silently flattened and lands as xfail(strict=True) so it surfaces as XPASS the moment sub-PR 2's validator closes the gap. Part of #2321 (release hardening sub-task 3). Temp file names include '_2321_' per CLAUDE.md.

brendancol

PR Review: VRT metadata parity tests across backends (#2338)

Tests-only PR adding backend parity coverage for VRT reads. The cross-backend assertion shape (key-set diff + value diff + paired pixel test) is solid, and the xfail(strict=True) on the mixed-CRS case is the correct way to pin a known gap. A few findings on tightening the safety net and on de-duplicating against the existing test infrastructure.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:51-62 duplicates the project's shared GPU detection helper. xrspatial/geotiff/tests/conftest.py:14-53 already exports gpu_available() and requires_gpu = pytest.mark.skipif(...). The local _cupy_available / _HAS_GPU block reaches the same answer, but it runs at import time twice (once here, once in conftest) and risks drifting from the canonical implementation. Replace the local _HAS_GPU with from xrspatial.geotiff.tests.conftest import requires_gpu and apply marks=requires_gpu in the _BACKENDS list instead of an inline pytest.mark.skipif(not _HAS_GPU, ...).
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:739-749 and 825-836: test_unsupported_resample_alg_raises and test_negative_dstrect_size_rejected assert against the exception type but not the message. A regression that swaps in an unrelated NotImplementedError (e.g., a code path that hasn't been wired up yet) or an unrelated ValueError (e.g., the source validator firing on a different field) would keep the test green. Add a match= clause naming the offending field: match=r"Bilinear|1751" for the resample test, match=r"DstRect" or similar for the dstrect test, so the test pins the actual rejection path rather than "something raised."
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:896-928: test_missing_sources_warn_records_holes weakens silently if the hole data structure changes. Line 924 reads holes[0].get('source') if isinstance(holes[0], dict) else None and line 925 only asserts when hole_source is not None. If a future refactor changes hole entries from dicts to dataclass instances (or anything else), the path-equality assertion is silently skipped and the test still passes. Make the dict check a hard assertion: assert isinstance(holes[0], dict), f"hole entry type drifted: {type(holes[0])}", then hole_source = holes[0]['source'] and the existing path check. The contract documented in #1734 promises a dict shape, so the test should pin it.
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:574-612: _write_mixed_crs_vrt's docstring (lines 510-518) says "the existing reader rejects the conflicting WKT at parse time," but the xfail block on test_mixed_crs_vrt_does_not_silently_flatten records the opposite: the read succeeds and silently flattens. The two comments contradict each other. The helper docstring should match reality. Either rewrite it to say "today the per-source CRS check does NOT reject this; see the xfail on the consumer test," or drop the misleading sentence. Keeping it makes the next reader doubt which one is current.

Nits (optional improvements)

xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:75-85: _UTM33N_WKT is defined and commented "Used for the mixed-CRS negative" but never referenced anywhere in the file. The mixed-CRS fixture uses attrs={'crs': 32633} on the second source and lets to_geotiff resolve the WKT; the constant is dead. Either wire it into the fixture so the test exercises a literal-WKT mixed-CRS case (not just an EPSG-code one), or drop it.
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:661-667: the dask parametrize for test_mixed_nodata_vrt_fails_closed_by_default uses a bare lambda, so the pytest test id renders as dask_chunks_2-<lambda> instead of a readable name. Define _read_dask_chunks_2(p): return open_geotiff(p, chunks=2) at module scope and pass that, or pass id= explicitly on the pytest.param, so failures render [dask_chunks_2] cleanly.
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:39-40: to_geotiff is imported from xrspatial.geotiff but write is imported from xrspatial.geotiff._writer. The two writers aren't interchangeable (public surface vs internal one). Worth a one-line comment near the import block stating why both are needed. The fixture helpers below use to_geotiff for full-coords DataArray sources and write for raw numpy arrays (the per-band integer fixtures), which is fine, just make the split visible to the next reader.
xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py:611: the # noqa: B017 comment refers to flake8-bugbear's "do not assert blind exception" rule, but the project's lint config may not enable bugbear. If B017 isn't in the active ruleset the noqa pragma is a no-op and can be dropped. Defensive comments naming a non-active rule add noise without protecting anything.
Module docstring at lines 1-29 lists gdal_metadata_xml and extra_tags in the contract list (lines 8-9), but both keys live in _VRT_OMITTED_ATTR_KEYS and are never asserted on (the VRT path documented at _backends/vrt.py intentionally omits them). The docstring overstates the file's coverage. Either trim the contract list at the top to match the keys the file actually asserts on, or add an explicit "this file's coverage stops at <list>; gdal_metadata_xml / extra_tags parity is owned by the non-VRT backend parity suite" note so the next reader doesn't chase a phantom assertion.

What looks good

Paired attrs/pixels/coords tests (one VRT case, three separate test functions): a regression that fixes one and breaks another surfaces on exactly one assertion. Helps triage.
xfail(strict=True) on the mixed-CRS gap flips to XPASS the moment sub-PR 2's validator closes the gap, which forces the follow-up commit to convert it to a proper pytest.raises(VRTUnsupportedError). The reasoning is captured in the xfail message.
The eager-vs-dask _BACKEND_LIFECYCLE_KEYS filter for nodata_pixels_present is documented (lines 114-120) with a citation to issue #2135. Without that note the cross-backend equality would either fail or silently drop a real attr.
_to_numpy uses the .data.get() path for cupy buffers per the project's GPU convention rather than .values.
Every temp file name embeds _2321_ per CLAUDE.md so parallel runs do not collide.

Checklist

No algorithmic code changed (tests-only PR)
Backend parity exercised across numpy / dask / GPU
NaN handling pinned via the masked-nodata case
Edge cases: file does not cover single-pixel rasters, all-NaN VRTs, or empty SrcRect (probably out of scope here; the existing suite covers those individually)
Dask chunk boundaries handled (chunks=2 on 4x4 grids forces non-trivial chunking)
No premature materialization (only test paths that need value comparison call .compute() / .values)
[N/A] Benchmark (tests-only)
[N/A] README feature matrix (no new public function)
Docstrings: module docstring slightly overstates coverage (see Nits); per-test docstrings are clear

- Reuse the project-wide requires_gpu marker from xrspatial.geotiff.tests.conftest instead of re-implementing the cupy + CUDA probe locally. - Tighten the unsupported-resample and bad-DstRect negatives with match= clauses on the algorithm name and the field name so an unrelated raise from elsewhere in the VRT stack cannot keep the tests green. - Replace the conditional isinstance check on the vrt_holes entry in the missing_sources='warn' test with a hard assertion + key access, so a future refactor that changes the entry shape surfaces instead of silently weakening the path-equality check. - Fix the misleading docstring on _write_mixed_crs_vrt; today the per-source CRS check does NOT reject the conflict, which is what the xfail on the consumer test documents. - Drop the unreferenced _UTM33N_WKT constant. - Replace the inline lambda in the mixed_nodata parametrize with a named module-level _read_dask_chunks_2 so the pytest id renders readably (no more <lambda> in failure output). - Add a comment in the import block explaining why to_geotiff and write are both pulled in. - Drop the # noqa: B017 pragma; the project lint config does not enable flake8-bugbear, so it was a no-op. - Move gdal_metadata_xml / extra_tags out of the module docstring's "covered" list and into a new "out of scope" section so the doc matches the actual assert surface.

brendancol

Follow-up review pass (#2338)

Commit 06268931 addresses every finding from the prior review. Disposition summary:

Suggestions (4): all fixed

Reuse requires_gpu from xrspatial/geotiff/tests/conftest.py:51 instead of the local _cupy_available / _HAS_GPU block. The local helper was deleted; _BACKENDS now uses marks=requires_gpu.
test_unsupported_resample_alg_raises now passes match=r"Bilinear|1751". test_negative_dstrect_size_rejected now passes match=r"DstRect.*negative size" (verified against the actual error message string).
test_missing_sources_warn_records_holes: the conditional isinstance(...) else None shim is gone. The test now hard-asserts isinstance(holes[0], dict) and reads holes[0]['source'] directly, so a future entry-type drift surfaces here.
_write_mixed_crs_vrt docstring rewritten so it matches the xfail on the consumer test: today the per-source CRS check does NOT reject the conflict.

Nits (5): all applied

_UTM33N_WKT constant removed.
Inline lambda in the mixed_nodata parametrize replaced by a named module-level _read_dask_chunks_2. Test ids now render [dask_chunks_2-_read_dask_chunks_2].
Import block carries a comment explaining why to_geotiff and write are both pulled in.
# noqa: B017 pragma dropped; the project lint config does not enable bugbear so it was a no-op.
Module docstring contract list trimmed to the keys this file actually asserts on (transform, crs, nodata, masked_nodata, georef_status, raster_type), with crs_wkt / gdal_metadata_xml / extra_tags moved into an explicit "out of scope" section.

Test status

27 passed, 1 xfailed (the mixed-CRS gap that sub-PR 2 must close). 84 passed, 1 xfailed across the broader VRT test suite locally.

github-actions Bot added the performance PR touches performance-sensitive code label May 23, 2026

brendancol commented May 23, 2026

View reviewed changes

brendancol merged commit 7172cab into main May 23, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VRT metadata parity tests across backends (#2333)#2338

VRT metadata parity tests across backends (#2333)#2338
brendancol merged 2 commits into
mainfrom
issue-2333

brendancol commented May 23, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 23, 2026

Summary

Backend coverage

Test plan

Notes

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: VRT metadata parity tests across backends (#2338)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review pass (#2338)

Suggestions (4): all fixed

Nits (5): all applied

Test status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant