Skip to content

geotiff: chunked-VRT coverage for missing_sources policy (#1799)#1959

Merged
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-test-coverage-geotiff-2026-05-15-1778858256-04
May 15, 2026
Merged

geotiff: chunked-VRT coverage for missing_sources policy (#1799)#1959
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-test-coverage-geotiff-2026-05-15-1778858256-04

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes a Cat 1 MEDIUM cross-backend equivalence gap for issue #1799 (VRT missing_sources policy). test_vrt_missing_sources_policy_1799.py covers the eager read_vrt path but not the chunked path.

The chunked path (_backends/vrt.py:_read_vrt_chunked) plumbs missing_sources through two separate code paths:

  1. A parse-time static os.path.exists sweep at vrt.py:595-621 that populates attrs['vrt_holes'] at build time without forcing a compute (the contract documented in geotiff: read_vrt default lenient mode hides source read failures as zero-filled holes #1734 for partial-mosaic detection on the lazy graph).
  2. Per-chunk forwarding of missing_sources to _vrt_chunk_read so each task applies the same warn/raise policy as the eager path.

A regression dropping the parse-time sweep would break partial-mosaic detection on the lazy build; a regression dropping the per-chunk forward would silently flip missing_sources='raise' to 'warn' (or vice versa) on the chunked path.

8 new tests, all passing:

Mutation against the attrs['vrt_holes'] population at vrt.py:620-621 flipped the two build-time tests red.

This is a deep-sweep test-coverage PR. No source files were modified.

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_chunked_missing_sources_1799.py (8 passed)
  • pytest xrspatial/geotiff/tests/test_vrt_chunked_missing_sources_1799.py xrspatial/geotiff/tests/test_vrt_missing_sources_policy_1799.py (11 passed)
  • Mutation: drop attrs['vrt_holes'] = chunked_holes at vrt.py:620 -> 2 build-time tests fail

Copilot AI review requested due to automatic review settings May 15, 2026 15:37
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@brendancol brendancol force-pushed the deep-sweep-test-coverage-geotiff-2026-05-15-1778858256-04 branch from 22d0aa1 to 80ec851 Compare May 15, 2026 15:42
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: chunked-VRT coverage for missing_sources policy (#1799)

Blockers (must fix before merge)

  • None

Suggestions (should fix, not blocking)

  • test_vrt_chunked_missing_sources_1799.py:96-99 only checks that holes[0]['source'] ends with missing.tif. The full record schema (band, dst_rect, error) is documented at _backends/vrt.py:608 and a regression dropping any of those fields would not be caught. Asserting on the dict keys (or the band and dst_rect values) would pin the schema the eager path also returns.
  • test_vrt_chunked_missing_sources_1799.py:120-123 (test_compute_emits_per_task_warning) pins the present-side decode to 7.0 but does not assert anything about the missing-side fill. The PR description says it decodes to "typically zero for float32" but the test silently allows any value. If the lenient path stops zero-filling and returns garbage on the missing half, this test would still pass. Either pin the missing-side fill or drop the comment.
  • test_compute_intersecting_missing_raises and test_chunked_default_raises_on_compute catch (OSError, FileNotFoundError, ValueError) which is loose. The eager-path equivalent at test_vrt_missing_sources_policy_1799.py:41 catches (OSError, ValueError). If you want parity with the eager test, drop FileNotFoundError (it is already a subclass of OSError).

Nits (optional improvements)

  • test_vrt_chunked_missing_sources_1799.py:55-75 builds the VRT XML by string concatenation. The sibling helper in test_vrt_chunked_shared_dataset_1923.py does the same, so this is consistent, but an f-string template or textwrap.dedent would be easier to read.
  • test_vrt_chunked_missing_sources_1799.py:141 comment says "2 chunks vertically * 2 chunks horizontally = 4 tasks" but the assertion is only on len(vrt_holes) == 1. The 4-task claim is unverified; either assert on result.data.numblocks or drop the comment.
  • _make_partial_vrt returns (vrt_path, present_src_path) but every call site discards the second element with _. Returning just vrt_path would simplify the signature.

What looks good

Checklist

  • Algorithm matches reference/paper (n/a, test-only)
  • All implemented backends produce consistent results (test pins cross-mode parity between eager and chunked)
  • NaN handling is correct (n/a for missing-source policy)
  • Edge cases are covered by tests (warn, raise, default, invalid policy, tuple chunks, windowed)
  • Dask chunk boundaries handled correctly (test exercises chunks that do and do not intersect missing sources)
  • No premature materialization or unnecessary copies (test explicitly checks compute() is deferred)
  • Benchmark exists or is not needed (n/a, test-only)
  • README feature matrix updated (if applicable) (n/a, test-only)
  • Docstrings present and accurate

brendancol added a commit to brendancol/xarray-spatial that referenced this pull request May 15, 2026
@brendancol brendancol force-pushed the deep-sweep-test-coverage-geotiff-2026-05-15-1778858256-04 branch from f9c4511 to a61436a Compare May 15, 2026 17:27
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request May 15, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: chunked-VRT coverage for missing_sources policy (#1799) (re-review after rebase)

Blockers (must fix before merge)

  • None

Suggestions (should fix, not blocking)

  • test_vrt_chunked_missing_sources_1799.py:160 and :197 catch (OSError, ValueError). The eager path's contract on missing_sources='raise' is more specific than that. If the reader raises a FileNotFoundError (subclass of OSError) on missing files today, pinning that subclass would catch a regression where the raise switches to a generic RuntimeError or Exception. Consider narrowing once the exact exception type is confirmed against the eager test in test_vrt_missing_sources_policy_1799.py.
  • test_vrt_chunked_missing_sources_1799.py:130-141 (test_chunks_tuple_form): the assertion only checks len(result.attrs["vrt_holes"]) == 1. The chunks tuple form is meant to thread through identically, so asserting the chunk shape on result.data (e.g. result.data.chunks == ((2, 2), (4, 4))) would pin the actual tuple-form behavior the docstring claims.

Nits (optional improvements)

  • test_vrt_chunked_missing_sources_1799.py:39 (_make_partial_vrt): could be a pytest fixture rather than a helper function. Minor; the helper is clear enough as is.
  • test_vrt_chunked_missing_sources_1799.py:57-76: the VRT XML is built by string concatenation. A small helper or textwrap.dedent with a triple-quoted template would read more cleanly, but the current form is fine.

What looks good

  • Each test class targets one invariant (warn/raise/default/validation) with a focused docstring that explains why the contract matters.
  • Both sides of the mosaic are asserted in test_compute_emits_per_task_warning (:125-128): present-source fill value pinned and missing-source NaN pinned.
  • test_compute_present_only_chunk_succeeds (:163) covers the window scoping rule, which is a real edge case.
  • vrt_holes record schema is pinned at :103 ({"source", "band", "dst_rect", "error"}), which will catch silent key renames in the parse-time sweep.
  • Parity test at :209 confirms the eager and chunked paths reject the same invalid policy value.

Checklist

  • Algorithm matches reference/paper (test-only PR; pins existing behavior)
  • All implemented backends produce consistent results (eager vs chunked parity pinned)
  • NaN handling is correct (lenient missing source -> NaN asserted at :128)
  • Edge cases are covered by tests (window scoping, tuple chunks, invalid policy)
  • Dask chunk boundaries handled correctly (chunks=4 splits at the missing-source boundary)
  • No premature materialization or unnecessary copies (vrt_holes checked before .compute())
  • Benchmark exists or is not needed (N/A - test only)
  • README feature matrix updated (if applicable) (N/A)
  • Docstrings present and accurate

…rib#1799)

The existing test_vrt_missing_sources_policy_1799 covers the eager
read_vrt path but not the chunked path. The chunked path
(_backends/vrt.py:_read_vrt_chunked) plumbs missing_sources through
two separate code paths:

- A parse-time static os.path.exists sweep at vrt.py:595-621 that
  populates attrs['vrt_holes'] at build time without forcing a
  compute (the contract documented in xarray-contrib#1734 for partial-mosaic
  detection on the lazy graph).
- Per-chunk forwarding of missing_sources to _vrt_chunk_read so each
  task applies the same warn/raise policy as the eager path.

A regression dropping the parse-time sweep would break partial-mosaic
detection on the lazy build; a regression dropping the per-chunk
forward would silently degrade missing_sources='raise' to 'warn' (or
vice versa) on the chunked path.

Adds 8 tests, all passing:
- missing_sources='warn' populates vrt_holes at build + emits warning
  on compute + present-source chunk decodes correctly
- chunks=(h, w) tuple form threads through identically
- missing_sources='raise' fails at compute on chunks touching missing
  sources but succeeds on windowed compute that misses them
- default missing_sources is 'raise' on the chunked path (parity with
  eager default flipped in xarray-contrib#1843 / xarray-contrib#1860)
- invalid policy raises ValueError at build, parity with eager path

Mutation against attrs['vrt_holes'] population at vrt.py:620-621
flipped the two build-time tests red.
@brendancol
Copy link
Copy Markdown
Contributor Author

Rebased onto current main at 7085ee8 to pick up the Python 3.14 CI matrix limit (b1579f8).

@brendancol brendancol force-pushed the deep-sweep-test-coverage-geotiff-2026-05-15-1778858256-04 branch from a61436a to 7085ee8 Compare May 15, 2026 17:51
@brendancol brendancol merged commit c2b6517 into xarray-contrib:main May 15, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants