Surface chunked VRT decode-time hole gap under missing_sources='warn' (#2989)#2991
Merged
Conversation
…#2989) The chunked VRT path's parse-time os.path.exists sweep records only missing-file holes in attrs['vrt_holes']. A source that exists but fails to decode at compute time reads as a hole in the per-chunk worker and is warned there, but cannot be reduced back onto the parent DataArray's attrs without an eager decode of every source (which would defeat lazy reading). Document attrs['vrt_holes'] under chunked dispatch as a lower bound (statically-detectable missing-file holes only) and emit one build-time GeoTIFFFallbackWarning under missing_sources='warn' when the requested window touches an existing (decode-capable) source, so callers do not treat the absence of the attr as proof of a complete mosaic.
brendancol
commented
Jun 6, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Surface chunked VRT decode-time hole gap under missing_sources='warn' (#2991)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
xrspatial/geotiff/_backends/vrt.py:1290-1308-- thedecode_capablescan re-walksvrt.bandsand every source with the same window-intersection test the static-hole loop at line ~1185 already runs. On a mosaic with many tiles that is a second full pass. Consider computingdecode_capableinside the existing static-hole loop: a source that intersects the window and exists is decode-capable, one that intersects and is missing is a hole. That avoids the duplicate traversal at the cost of a slightly busier loop body. Not blocking: the scan breaks on the first existing in-window source, so the common case is cheap.
Nits (optional improvements)
xrspatial/geotiff/_backends/vrt.py:1310-1312--import warningsandfrom .._runtime import GeoTIFFFallbackWarningsit inside theif decode_capableblock. That matches the eager path's local-import style in_vrt.py:1621, so it is consistent. A module-level import would be marginally cleaner; leave it if matching the existing pattern is the intent.
What looks good
- The window-intersection predicate is copied verbatim from the static-hole guard, so the heads-up and the attrs population agree on what "in window" means.
- The warning is gated on
missing_sources == 'warn'and sits after the raise guard, so'raise'and strict mode are untouched. - Suppressing the warning when no in-window source exists on disk keeps it from firing on all-missing VRTs where nothing can decode-fail, and a test pins that.
- Docstring, inline comment, and geotiff.rst describe the lower-bound contract consistently.
Checklist
- Algorithm matches reference (N/A, no numerical change)
- All implemented backends consistent (VRT reads are CPU/dask only)
- NaN handling correct (N/A, no pixel-value change)
- Edge cases covered by tests (corrupt source, all-missing, present source)
- Dask chunk boundaries handled correctly (window test reused from existing guard)
- No premature materialization (build-time scan is path-existence only, no decode)
- Benchmark not needed
- README feature matrix not applicable (no new function)
- Docstrings accurate
Compute the build-time heads-up flag inside the existing source-window scan instead of re-walking vrt.bands a second time. The static-hole loop now applies the window-intersection test to every in-window source and branches on os.path.exists: present sources mark decode_capable, missing ones append to chunked_holes. Single pass, same behaviour.
brendancol
commented
Jun 6, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (after a69477a)
Disposition of prior findings
- Suggestion (duplicate traversal,
_backends/vrt.py): fixed.decode_capableis now computed inside the existing static-hole sweep. The loop applies the window-intersection test to every in-window source once and branches onos.path.exists: present sources set the flag, missing ones append tochunked_holes. No second pass. - Nit (local import of
warnings/GeoTIFFFallbackWarning): dismissed. It matches the eager path's local-import of the same symbols at_vrt.py:1621. A module-level import here would diverge from that established pattern, so leaving it keeps the two paths consistent.
State now
- Single source-list walk; behaviour unchanged.
- Full
xrspatial/geotiff/tests/vrt/suite green (502 passed); the four new #2989 tests pass. - flake8 clean on the changed file.
No new findings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2989
What
The chunked VRT path pre-populates
attrs['vrt_holes']from a parse-timeos.path.existssweep, which only catches sources whose backing file is missing. A source that exists but fails to decode at compute time (corrupt, truncated, codec error) reads as a hole inside the per-chunk worker and is warned there, but that record cannot be reduced back onto the parent DataArray's attrs without eagerly decoding every source. Eager decode would defeat lazy reading, so it is off the table.This PR makes the gap explicit instead of silent:
attrs['vrt_holes']under chunked dispatch as a lower bound (statically-detectable missing-file holes only), in the_read_vrtdocstring, the inline static-sweep comment, anddocs/source/reference/geotiff.rst.GeoTIFFFallbackWarningup front undermissing_sources='warn'when the requested window touches an existing (decode-capable) source, so a caller cannot treat the absence of the attr as proof of a complete mosaic. The warning is suppressed when no in-window source exists on disk, since nothing can decode-fail in that case.No behavior change for
missing_sources='raise'(still fails closed up front on missing files) or for the eager path.Backend coverage
VRT reads are CPU/dask only and do not go through the GPU decoder pipeline; the warning lands on the chunked dispatcher shared by dask+numpy and dask+cupy. No numpy/cupy eager-path change.
Test plan
attrs['vrt_holes']GeoTIFFFallbackWarningfires under chunked'warn'xrspatial/geotiff/tests/vrt/suite green (502 passed)