Skip to content

Make open_geotiff bbox= work for .vrt sources#2674

Merged
brendancol merged 4 commits into
mainfrom
issue-2668
May 29, 2026
Merged

Make open_geotiff bbox= work for .vrt sources#2674
brendancol merged 4 commits into
mainfrom
issue-2668

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2668

What

open_geotiff(..., bbox=...) raised Invalid TIFF byte order marker on .vrt sources. The dispatcher resolved bbox to a pixel window via _bbox_to_window, which reads a TIFF header through _read_geo_info and cannot parse VRT XML. That happened before the VRT branch was reached, so the read died trying to read the XML as a TIFF.

Fix path chosen: real VRT bbox support

I implemented real bbox= support for VRT rather than rejecting it, because the VRT backend already parses the metadata the resolution needs:

  • Detect VRT sources before bbox resolution and route them through a new _vrt_bbox_to_window, which parses only the VRT XML (header-only, no tile decode) and reuses the synthesised GeoInfo the VRT read path already builds (_vrt_to_synthetic_geo_info).
  • Factor the bbox validation and extent math out of _bbox_to_window into a shared _geo_info_to_window, so the TIFF and VRT resolvers run identical bbox validation, rotated-affine / no-georef rejection, and out-of-extent clamping.
  • Update the dispatcher comment that wrongly claimed bbox already covered VRT.

The resolved window forwards to the existing backend dispatch, so the eager, dask, and GPU VRT paths all get bbox without any new kwarg.

Backend coverage

bbox= resolves to a pixel window= at the dispatcher, so all VRT backends inherit it: eager (numpy), dask (chunks=), and GPU (gpu=). The numpy and dask paths are covered by tests here; GPU follows the same window forwarding.

Test plan

  • bbox= on a VRT returns the correctly windowed mosaic, matching the equivalent explicit window= read and the full read sliced at the resolved offsets.
  • Full-extent bbox clamps to file bounds and matches a no-window read.
  • Chunked (chunks=) VRT bbox read matches the eager bbox read after compute.
  • Validation parity with the TIFF path: mutual-exclusion with window=, wrong arity, inverted bbox, NaN coords, out-of-extent, no-georef VRT, rotated VRT.
  • Existing TIFF bbox tests (test_bbox_2555.py) unchanged and passing.
  • Full geotiff read + vrt suites pass (1660 passed, 4 skipped).

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
The dispatcher resolved bbox= to a pixel window via _bbox_to_window,
which reads a TIFF header and cannot parse VRT XML, so a .vrt source
raised 'Invalid TIFF byte order marker' before reaching the VRT branch.

Detect VRT sources before bbox resolution and route them through a new
_vrt_bbox_to_window that parses the VRT XML and reuses the synthesised
GeoInfo the VRT read path already builds. Both resolvers share the bbox
validation and extent math in _geo_info_to_window, so the georef,
rotated-affine, and out-of-extent rejections stay identical across TIFF
and VRT. Update the stale dispatcher comment that claimed bbox already
covered VRT.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Make open_geotiff bbox= work for .vrt sources

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/__init__.py:467-471: the no-georef error says "this file has no GeoTIFF tags," which is shared with the TIFF path but reads oddly for a .vrt (a VRT has no GeoTransform, not GeoTIFF tags). Both paths now share _geo_info_to_window, so it would be worth softening the wording to fit both source types. Low priority; the message still points at the right fix (window=).

Nits (optional improvements)

  • xrspatial/geotiff/__init__.py:419-421: the VRT XML is parsed here for bbox resolution and parsed again by read_vrt downstream. The parse is header-only and cheap, and the TIFF path reads its header twice the same way, so this is consistent rather than a regression. Noting it only for completeness.

What looks good

  • Real bbox support instead of a rejection. It reuses the VRT backend's own _vrt_to_synthetic_geo_info, so rotated and no-georef handling match the read path exactly.
  • The shared _geo_info_to_window keeps TIFF and VRT validation identical, so there's no message or behavior drift between the two.
  • Tests cover windowed-read parity (against an explicit window and the full slice), full-extent clamping, the chunked backend, and the validation matrix.
  • Reuses _read_vrt_xml, so the XML size cap and path-containment guards still apply.

Checklist

  • Algorithm matches reference (bbox-to-window affine reused from the TIFF path)
  • All implemented backends produce consistent results (window forwards to eager/dask/gpu)
  • NaN handling is correct (NaN bbox coords rejected upfront)
  • Edge cases are covered by tests
  • Dask chunk boundaries handled correctly (bbox resolves to a window before dispatch)
  • No premature materialization (header-only XML parse, no tile decode)
  • Benchmark not needed (bug fix, no perf-critical path)
  • README feature matrix update not needed (no new function)
  • Docstrings present and accurate

The shared _geo_info_to_window now serves both TIFF and VRT, so the
no-georef rejection message no longer assumes GeoTIFF tags. Update the
two bbox tests that matched the old wording.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up pass)

Re-reviewed after the follow-up commit.

Disposition of the first-pass findings

  • Suggestion (no-georef error wording read oddly for .vrt): fixed in 1e53f23. The message now says "this source has no georeferencing (no GeoTIFF tags or VRT GeoTransform)," which fits both the TIFF and VRT paths that share _geo_info_to_window. The two bbox tests that matched the old wording were updated to match "no georeferencing".
  • Nit (VRT XML parsed once for bbox resolution and again by read_vrt): not changed. The parse is header-only and cheap, and the TIFF path reads its header twice the same way, so this is consistent with the existing pattern rather than a regression.

New findings

None. The reword is a string change plus two test-regex updates; behavior is unchanged and the bbox suite (26 tests) passes.

@brendancol brendancol merged commit 83810c2 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

open_geotiff bbox= does not work for .vrt sources

1 participant