Skip to content

Add VRT missing source policy#1806

Merged
brendancol merged 2 commits into
mainfrom
issue-1799
May 13, 2026
Merged

Add VRT missing source policy#1806
brendancol merged 2 commits into
mainfrom
issue-1799

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1799.\n\nAdds read_vrt(missing_sources=...) with warn and raise modes, preserving historical warn behavior while giving pipelines an explicit fail-fast option.\n\nTested: pytest xrspatial/geotiff/tests/test_vrt_missing_sources_policy_1799.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 14:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an explicit, API-level policy to control how read_vrt handles unreadable/missing VRT source files, preserving the historical “warn and continue” behavior by default while enabling a fail-fast option for production pipelines (closes #1799).

Changes:

  • Add missing_sources={'warn','raise'} parameter to both internal VRT assembly and the public xrspatial.geotiff.read_vrt API.
  • Validate missing_sources values and implement fail-fast behavior when set to 'raise'.
  • Add tests and reference documentation describing the new policy.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_vrt.py Adds missing_sources handling to the internal VRT reader and integrates it with the existing strict-mode behavior.
xrspatial/geotiff/__init__.py Exposes missing_sources on the public read_vrt API and forwards it to the internal reader; documents the new parameter.
xrspatial/geotiff/tests/test_vrt_missing_sources_policy_1799.py Adds regression tests for warn vs raise behavior and policy validation.
docs/source/reference/geotiff.rst Documents the new missing-source policy option for VRT reads.
Comments suppressed due to low confidence (1)

xrspatial/geotiff/_vrt.py:907

  • The fallback warning text still suggests only XRSPATIAL_GEOTIFF_STRICT=1 as the way to raise on unreadable VRT sources. Now that missing_sources='raise' exists, the warning message should mention that option as well (otherwise users may miss the new API-level fail-fast control).
                    f"VRT source {src.filename!r} could not be read "
                    f"({type(e).__name__}: {e}); skipping. The output "
                    f"mosaic will have a hole at this tile. Inspect "
                    f"``DataArray.attrs['vrt_holes']`` or set "
                    f"XRSPATIAL_GEOTIFF_STRICT=1 to raise instead.",

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_vrt.py
Comment on lines 611 to 615
def read_vrt(vrt_path: str, *, window=None,
band: int | None = None,
max_pixels: int | None = None) -> tuple[np.ndarray, VRTDataset]:
max_pixels: int | None = None,
missing_sources: str = 'warn') -> tuple[np.ndarray, VRTDataset]:
"""Read a VRT file by assembling pixel data from its source files.
Comment on lines +3795 to +3796
preserves the historical behavior: emit ``GeoTIFFFallbackWarning``,
record ``attrs['vrt_holes']``, and return a partial mosaic.
@brendancol brendancol merged commit 5773b13 into main May 13, 2026
10 of 11 checks passed
brendancol added a commit that referenced this pull request May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the
`_read_vrt_dask` dispatch hook from the PR branch. All other
geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804,
#1805, #1806) were already integrated into the working tree by the
prior 7329dd9 commit; this merge just records the parent so git
recognises the reconciliation.
brendancol added a commit that referenced this pull request May 14, 2026
#1811)

* geotiff: forward missing_sources from open_geotiff to read_vrt (#1810)

read_vrt gained a missing_sources='warn'|'raise' policy kwarg in #1806
but the documented dispatcher open_geotiff did not expose it. Callers
who wanted strict failure on broken VRT sources had to call read_vrt
directly, defeating the dispatcher contract.

Same class of bug as #1561 / #1605 / #1685 / #1795: a backend kwarg
reachable on the routed-to function is unreachable through the
documented entry point. Fix mirrors the on_gpu_failure pattern that
open_geotiff already uses for GPU-only kwargs:

- Add missing_sources= to open_geotiff with a sentinel default so the
  dispatcher can tell "caller never set it" (skip forwarding, let the
  read_vrt default of 'warn' apply) from "caller set it" (forward
  verbatim).
- Reject missing_sources on non-VRT sources with a clear ValueError
  so callers learn the policy is being ignored instead of getting a
  silent drop.
- Document the kwarg in the open_geotiff docstring with the same
  values and semantics as read_vrt.

Regression tests in test_open_geotiff_missing_sources_1810.py cover
the signature, warn/raise/invalid policies through the dispatcher,
non-VRT rejection, and the no-kwarg backward-compatible path.

* sweep: update api-consistency state for geotiff (v5)

Record the v5 api-consistency sweep against geotiff. One MEDIUM
finding filed and fixed (#1810). Cross-sibling write-return-type
drift remains deferred at LOW.

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VRT missing-source handling returns partial mosaics by default

2 participants