geotiff: add bbox= parameter to open_geotiff for geographic-space window reads #2555

@brendancol

Reason or Problem

open_geotiff(window=...) only accepts a pixel window (row_start, col_start, row_stop, col_stop). Most geospatial users think in data-space coordinates: a bounding box in the file's CRS (x_min, y_min, x_max, y_max). Doing the conversion by hand means reading the file's transform, running the affine math, and clamping to file bounds. We already do exactly that internally via _extent_to_window in xrspatial/geotiff/_attrs.py:912, but it is not exposed on the public API.
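For reference, the manual conversion users write today looks roughly like the sketch below. It is not the body of _extent_to_window: the function name bbox_to_pixel_window, the GDAL-style six-element transform tuple, and the choice to round outward so the window fully covers the bbox are all assumptions made for illustration, and it only handles a north-up, unrotated grid.

```python
import math


def bbox_to_pixel_window(transform, height, width, bbox):
    """Convert (x_min, y_min, x_max, y_max) into a pixel window.

    Returns (row_start, col_start, row_stop, col_stop).

    Assumptions (illustrative, not taken from the library):
    - `transform` is a GDAL-style tuple
      (x_origin, pixel_width, row_rot, y_origin, col_rot, pixel_height)
    - the raster is north-up: pixel_height < 0, rotation terms are zero
    - the window is rounded outward so it covers the whole bbox
    """
    x_origin, px_w, _row_rot, y_origin, _col_rot, px_h = transform
    x_min, y_min, x_max, y_max = bbox

    # Invert the affine: col = (x - x_origin) / px_w
    col_start = math.floor((x_min - x_origin) / px_w)
    col_stop = math.ceil((x_max - x_origin) / px_w)

    # px_h is negative, so y_max maps to the smaller row index.
    row_start = math.floor((y_max - y_origin) / px_h)
    row_stop = math.ceil((y_min - y_origin) / px_h)

    # Clamp to the file bounds.
    row_start = max(0, min(row_start, height))
    row_stop = max(0, min(row_stop, height))
    col_start = max(0, min(col_start, width))
    col_stop = max(0, min(col_stop, width))
    return (row_start, col_start, row_stop, col_stop)


# 20 x 20 raster, 0.5-unit pixels, top-left corner at (100, 50).
transform = (100.0, 0.5, 0.0, 50.0, 0.0, -0.5)
inside = bbox_to_pixel_window(transform, 20, 20, (102.0, 44.0, 105.0, 47.0))
clamped = bbox_to_pixel_window(transform, 20, 20, (95.0, 30.0, 103.0, 60.0))
```

Here inside is (6, 4, 12, 10), and clamped is (0, 0, 20, 6) because the requested box extends past the file on three sides. Every caller that wants a geographic read currently has to get the sign of pixel_height, the rounding, and the clamping right on their own.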

A related gap: the PixelSafetyLimitError recovery hint that #2553 added mentions window= and chunks= but not a geographic bbox option. Once bbox= is on open_geotiff, the hint should mention it too so users with a known area of interest reach for the right kwarg.

Proposal

Add bbox=(x_min, y_min, x_max, y_max) to open_geotiff:

  • Geographic coordinates in the file's CRS.
  • Mutually exclusive with window=. Passing both raises ValueError.
  • Resolved internally via _read_geo_info(source, overview_level=...) + _extent_to_window(geo_info.transform, h, w, y_min, y_max, x_min, x_max). The result is forwarded to the existing backend dispatch as window=.
  • Requires the source to be georeferenced. A file without georef, or with a rotated affine that has not been cleared via allow_rotated, raises a clear ValueError naming the limitation.

Then update the PixelSafetyLimitError recovery hint in xrspatial/geotiff/_layout.py:_recovery_hint to add a fourth bullet:

  * Read a geographic sub-region with bbox=(x_min, y_min, x_max, y_max).

Design:

open_geotiff runs the bbox->window conversion before fanning out to backends. The dispatcher is the right place since it already coerces the source path once and dispatches once. Backends (read_geotiff_dask, read_geotiff_gpu, read_vrt, read_to_array) keep their existing window= surface; no new plumbing.
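A minimal sketch of that dispatch shape, assuming nothing about the real function beyond what is described above. The name open_geotiff_sketch and the injected resolve_bbox and backend callables are hypothetical; they stand in for the _read_geo_info + _extent_to_window step and for the existing readers so the example runs without the library.

```python
def open_geotiff_sketch(source, *, window=None, bbox=None,
                        resolve_bbox, backend):
    """Hypothetical dispatcher: resolve bbox= once, then forward window=.

    `resolve_bbox(source, bbox)` stands in for the metadata read plus
    extent-to-window conversion; `backend(source, window=...)` stands in
    for any existing reader. Both are injected only for illustration.
    """
    if window is not None and bbox is not None:
        raise ValueError("pass either window= or bbox=, not both")
    if bbox is not None:
        # The conversion happens exactly once, before a backend is chosen.
        window = resolve_bbox(source, bbox)
    # Backends only ever see a pixel window (or None for a full read).
    return backend(source, window=window)


# Stand-ins for illustration only.
def fake_resolve_bbox(source, bbox):
    return (6, 4, 12, 10)


def fake_backend(source, window=None):
    return {"source": source, "window": window}


result = open_geotiff_sketch(
    "large-dem.tif",
    bbox=(102.0, 44.0, 105.0, 47.0),
    resolve_bbox=fake_resolve_bbox,
    backend=fake_backend,
)
```

Because the backend signature is untouched, existing window= callers take the same code path as before, and a bbox= caller is indistinguishable from a window= caller by the time a backend runs.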

The conversion uses _read_geo_info, which already supports local files, BytesIO, HTTP, and fsspec URIs via a header-only read. For HTTP and cloud sources this is a small range request, not a full download, and the bytes are typically cached for the subsequent main read.

Usage:

```python
arr = open_geotiff(
    "s3://bucket/large-dem.tif",
    bbox=(-122.5, 37.6, -122.3, 37.8),  # San Francisco area, WGS84
)
```

Value: Removes the most common workaround in user code (manual affine math). Mirrors how rasterio's windows.from_bounds is used.

Stakeholders and Impacts

Anyone reading sub-regions of large GeoTIFFs by geographic coordinates. The new parameter is opt-in; existing window= callers see no behaviour change. The error hint added in #2554 expands by one line.

Drawbacks

The bbox->window resolution requires a metadata read before the main read. _read_geo_info is O(1) memory and fast for local files; for HTTP it adds one range request that is usually cached for the subsequent read.

Rotated transforms are rejected unless allow_rotated=True has cleared the rotation, since _extent_to_window assumes an axis-aligned grid. This is the same restriction the existing read path applies.
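To make the axis-aligned requirement concrete, here is one way such a guard could look. This is a sketch only: the helper names, the GDAL-style transform tuple, the tolerance, and the error wording are assumptions, not the library's actual check.

```python
def is_axis_aligned(transform, tol=1e-12):
    """True when both rotation/shear terms of the transform are ~zero.

    `transform` is assumed to be a GDAL-style tuple
    (x_origin, pixel_width, row_rot, y_origin, col_rot, pixel_height).
    """
    _, _, row_rot, _, col_rot, _ = transform
    return abs(row_rot) <= tol and abs(col_rot) <= tol


def require_axis_aligned(transform):
    """Raise ValueError for rotated grids (hypothetical message text)."""
    if not is_axis_aligned(transform):
        raise ValueError(
            "bbox= needs an axis-aligned transform; "
            "this file has a rotated affine"
        )


north_up = (100.0, 0.5, 0.0, 50.0, 0.0, -0.5)
rotated = (100.0, 0.5, 0.1, 50.0, -0.1, -0.5)

require_axis_aligned(north_up)  # passes silently
```

With a rotated grid, a bbox no longer maps to a rectangular block of rows and columns, which is why the conversion cannot simply proceed in that case.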

Alternatives

Exposing _extent_to_window as public API would work but pushes the pixel math onto every caller. The whole point of adding bbox= is to remove that step.

Labels: enhancement
