Merged
11 changes: 6 additions & 5 deletions README.md
@@ -163,13 +163,14 @@ VRT is supported as a conservative advanced feature for simple GeoTIFF mosaics,
|:-----|:------------|:-----:|:----:|:--------:|:-------------:|:-----:|
| [open_geotiff](xrspatial/geotiff/__init__.py) | Read GeoTIFF / COG / VRT | ✅ | ✅ | 🧪 | 🧪 | 🔼 |
| [to_geotiff](xrspatial/geotiff/__init__.py) | Write DataArray as GeoTIFF / COG | ✅ | ✅ | 🧪 | 🧪 | 🔼 |
| [write_geotiff_gpu](xrspatial/geotiff/__init__.py) | GPU-accelerated GeoTIFF / COG write | | | 🧪 | | |
| [write_vrt](xrspatial/geotiff/__init__.py) | Generate VRT mosaic from GeoTIFFs | 🔼 | | | | |
| [build_vrt](xrspatial/geotiff/__init__.py) | Generate VRT mosaic from existing GeoTIFFs | 🔼 | | | | |

`open_geotiff` and `to_geotiff` auto-dispatch to the correct backend:
`open_geotiff` and `to_geotiff` select the backend from their parameters
(`gpu=`, `chunks=`, `.vrt` path); GPU read/write is reached with `gpu=True`,
not a separate function:

```python
from xrspatial.geotiff import open_geotiff, to_geotiff
from xrspatial.geotiff import build_vrt, open_geotiff, to_geotiff

open_geotiff('dem.tif') # NumPy
open_geotiff('dem.tif', chunks=512) # Dask
@@ -186,7 +187,7 @@ to_geotiff(data, 'cog.tif', cog=True) # COG with auto overviews
to_geotiff(data, 'cog.tif', cog=True, # COG with explicit levels
overview_levels=[2, 4, 8],
overview_resampling='nearest')
write_vrt('mosaic.vrt', ['tile1.tif', 'tile2.tif']) # generate VRT
build_vrt('mosaic.vrt', ['tile1.tif', 'tile2.tif']) # mosaic existing tiles

open_geotiff('dem.tif', dtype='float32') # half memory
open_geotiff('dem.tif', dtype='float32', chunks=512) # Dask + half memory
33 changes: 20 additions & 13 deletions docs/source/reference/geotiff.rst
@@ -51,8 +51,8 @@ What you should NOT rely on:

* GPU support for every codec on the CPU path. ``allow_experimental_codecs``
does NOT widen the GPU codec set; on the GPU writer, codecs outside the
GPU-supported set route through a CPU fallback inside
``write_geotiff_gpu`` rather than executing on the GPU. Locked by
GPU-supported set route through a CPU fallback inside the GPU writer
(``_write_geotiff_gpu``) rather than executing on the GPU. Locked by
``xrspatial/geotiff/tests/gpu/test_writer.py``.
* GPU promotion to ``stable`` inside this release cycle. See the GPU
rows in :ref:`reference.geotiff_release_gate` for the current tier
@@ -206,20 +206,26 @@ The lifecycle is locked end-to-end by

Reading
=======
``open_geotiff`` is the single read entry point. The backend follows the
parameters: ``gpu=True`` returns a CuPy-backed array, ``chunks=N`` returns a
lazy dask array, and a ``.vrt`` source reads a mosaic.
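
As a rough mental model of the parameter-driven dispatch described above, the sketch below maps the three inputs to a reader label and an array type. ``describe_backend`` is a hypothetical helper, not part of ``xrspatial.geotiff``; the suffix check and the way ``gpu=True`` combines with ``chunks=`` are assumptions made for illustration, and the sketch does not model which combinations the library accepts or rejects.

```python
def describe_backend(source, gpu=False, chunks=None):
    """Hypothetical helper: label the path a read would take."""
    # Assumption: a VRT source is recognised by its file suffix.
    reader = "vrt" if str(source).lower().endswith(".vrt") else "geotiff"

    # Assumption: gpu= and chunks= combine independently of the reader.
    if gpu and chunks is not None:
        array_type = "dask+cupy"
    elif gpu:
        array_type = "cupy"
    elif chunks is not None:
        array_type = "dask"
    else:
        array_type = "numpy"
    return reader, array_type


print(describe_backend("dem.tif"))
print(describe_backend("dem.tif", chunks=512))
print(describe_backend("dem.tif", gpu=True))
print(describe_backend("mosaic.vrt"))
```

The point of the single entry point is that callers change keyword arguments, not function names, when they move between eager, chunked, and GPU reads.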

.. autosummary::
:toctree: _autosummary

xrspatial.geotiff.open_geotiff
xrspatial.geotiff.read_vrt

Writing
=======
``to_geotiff`` is the single write entry point (``gpu=True`` or CuPy data
selects the GPU path; a ``.vrt`` output path writes tiles plus an index).
``build_vrt`` mosaics a list of existing GeoTIFF files into a VRT.

.. autosummary::
:toctree: _autosummary

xrspatial.geotiff.to_geotiff
xrspatial.geotiff.write_geotiff_gpu
xrspatial.geotiff.write_vrt
xrspatial.geotiff.build_vrt

COG validator CI gate
=====================
@@ -346,7 +352,7 @@ with the original exception type and message.

Set ``XRSPATIAL_GEOTIFF_STRICT=1`` (or ``true``, ``yes``) to promote those
warnings into raised exceptions. The same env var also forces
``read_geotiff_gpu(on_gpu_failure='auto')`` to behave like
``open_geotiff(gpu=True, on_gpu_failure='auto')`` to behave like
``on_gpu_failure='strict'`` so CI can fail loudly when the GPU fast path
silently falls back to CPU.
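
A minimal sketch of how an environment flag with these accepted values could be parsed. ``strict_mode_enabled`` is a hypothetical helper, not the library's implementation; trimming whitespace and ignoring case are assumptions, since the text above only lists ``1``, ``true``, and ``yes``.

```python
import os

# Values the documentation lists as enabling strict mode.
_TRUTHY = {"1", "true", "yes"}


def strict_mode_enabled(environ=None):
    """Hypothetical helper: report whether strict mode is requested."""
    env = os.environ if environ is None else environ
    value = env.get("XRSPATIAL_GEOTIFF_STRICT", "")
    # Assumption: comparison ignores case and surrounding whitespace.
    return value.strip().lower() in _TRUTHY


print(strict_mode_enabled({"XRSPATIAL_GEOTIFF_STRICT": "1"}))
print(strict_mode_enabled({}))
```

Passing a mapping instead of reading ``os.environ`` directly keeps the check easy to exercise in tests without mutating the process environment.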

@@ -393,9 +399,9 @@ VRT support matrix (issue #2321)

VRT reads sit at the ``advanced`` tier in
:data:`xrspatial.geotiff.SUPPORTED_FEATURES` (``reader.vrt``).
``open_geotiff``, ``read_vrt``, and ``write_vrt`` all target the same
narrow subset of GDAL's VRT spec. The reference below is the canonical
contract; the three docstrings echo it.
``open_geotiff`` (on a ``.vrt`` source), ``to_geotiff`` (to a ``.vrt``
output), and ``build_vrt`` all target the same narrow subset of GDAL's VRT
spec. The reference below is the canonical contract; the docstrings echo it.

Supported
---------
@@ -404,7 +410,7 @@ Supported
GeoTIFF files. The VRT XML must resolve to source paths under the
VRT's own directory (or under a root listed in
``XRSPATIAL_VRT_ALLOWED_ROOTS``); see the source-path containment
note on ``read_vrt`` (#1671).
note on the VRT reader (``_read_vrt``) (#1671).
* Sources that agree on CRS, transform orientation (axis-aligned,
same sign on the y step), pixel size, dtype, and band count. The
read rejects mismatch with ``MixedBandMetadataError`` /
@@ -450,11 +456,11 @@ the fail-closed defaults:

.. code-block:: python

from xrspatial.geotiff import open_geotiff, write_vrt
from xrspatial.geotiff import build_vrt, open_geotiff

# Write a VRT that mosaics two tiles. Both tiles share CRS,
# pixel size, dtype, and band count.
vrt_path = write_vrt(
vrt_path = build_vrt(
'mosaic.vrt',
source_files=['tile_west.tif', 'tile_east.tif'],
)
@@ -486,7 +492,8 @@ per-band nodata sentinels triggers the fail-closed check:
VRT missing sources
===================

``read_vrt`` accepts ``missing_sources='warn'`` or ``'raise'``. The default
``open_geotiff`` accepts ``missing_sources='warn'`` or ``'raise'`` for
``.vrt`` sources. The default
``'raise'`` (since #1860) fails the read immediately if any source file
referenced by the VRT does not exist on disk. Both the eager and chunked
dispatchers honour this at construction time -- chunked callers do not
18 changes: 9 additions & 9 deletions docs/source/reference/geotiff_internals.md
@@ -18,17 +18,17 @@ public API. Files referenced live under `xrspatial/geotiff/`.
| Entry point | File | Returns |
| -------------------- | --------------------------------- | ---------------------- |
| `open_geotiff` | `xrspatial/geotiff/__init__.py` | dispatcher (NumPy / CuPy / Dask / Dask+CuPy / VRT) |
| `read_geotiff_dask` | `xrspatial/geotiff/_backends/dask.py` | Dask-NumPy DataArray |
| `read_geotiff_gpu` | `xrspatial/geotiff/_backends/gpu.py` | CuPy or Dask-CuPy DataArray |
| `read_vrt` | `xrspatial/geotiff/_backends/vrt.py` | NumPy / CuPy / Dask DataArray (mosaic) |
| `_read_geotiff_dask` | `xrspatial/geotiff/_backends/dask.py` | Dask-NumPy DataArray |
| `_read_geotiff_gpu` | `xrspatial/geotiff/_backends/gpu.py` | CuPy or Dask-CuPy DataArray |
| `_read_vrt` | `xrspatial/geotiff/_backends/vrt.py` | NumPy / CuPy / Dask DataArray (mosaic) |

### Write

| Entry point | File | Input |
| -------------------- | --------------------------------- | ---------------------- |
| `to_geotiff` | `xrspatial/geotiff/_writers/eager.py` | NumPy / Dask DataArray (auto-dispatches to GPU when input is CuPy-backed) |
| `write_geotiff_gpu` | `xrspatial/geotiff/_writers/gpu.py` | CuPy DataArray |
| `write_vrt` | `xrspatial/geotiff/_writers/vrt.py` | list of GeoTIFF paths (XML emitter) |
| `_write_geotiff_gpu` | `xrspatial/geotiff/_writers/gpu.py` | CuPy DataArray |
| `build_vrt` | `xrspatial/geotiff/_writers/vrt.py` | list of GeoTIFF paths (XML emitter) |

## Contract steps

@@ -122,12 +122,12 @@ now and that the call-site comments justify.

### Read backends

| Step | `open_geotiff` (eager) | `read_geotiff_dask` | `read_geotiff_gpu` (eager) | `read_geotiff_gpu` (chunked) | `read_vrt` (eager) | `read_vrt` (chunked) |
| Step | `open_geotiff` (eager) | `_read_geotiff_dask` | `_read_geotiff_gpu` (eager) | `_read_geotiff_gpu` (chunked) | `_read_vrt` (eager) | `_read_vrt` (chunked) |
| ---- | ---------------------- | ------------------- | -------------------------- | ---------------------------- | ------------------ | -------------------- |
| 1. source / kwarg validation | shared (`_validate_dispatch_kwargs` then dispatches) | shared (`_validate_dispatch_kwargs`, `_validate_chunks_arg`) | shared (`_validate_dispatch_kwargs`, `_validate_chunks_arg`) | shared (`_validate_dispatch_kwargs`, `_validate_chunks_arg`) | shared (`_validate_dispatch_kwargs`, `_validate_chunks_arg`); duplicated inline overview-level / `missing_sources` / `band_nodata` value rejections | shared (`_validate_dispatch_kwargs`); duplicated inline overview-level / `missing_sources` / `band_nodata` value rejections |
| 2. metadata parse | shared (`read_to_array` -> `_parse_cog_http_meta` for cloud (with `.tif.ovr` sidecar discovery via `discover_remote_sidecar`), `parse_header` + `parse_all_ifds` + sidecar otherwise) | shared (`_read_geo_info` for local, `_parse_cog_http_meta` for HTTP/fsspec, both with `.tif.ovr` sidecar discovery via `discover_remote_sidecar` -- #2239) | shared (`extract_geo_info_with_overview_inheritance`, `select_overview_ifd`); duplicated inline IFD + sidecar load lifted from `_read_geo_info` | shared (`extract_geo_info_with_overview_inheritance`); duplicated inline IFD + sidecar handling | duplicated (`_parse_vrt` + `_read_vrt_internal` -- VRT-specific, no shared metadata parser) | duplicated (`_parse_vrt` + per-chunk `_vrt_chunk_read`) |
| 3. transform / georef classification | shared (`_populate_attrs_from_geo_info` via `_finalize_eager_read`) | shared (`_populate_attrs_from_geo_info` via `_finalize_lazy_read_attrs`) | shared (`_populate_attrs_from_geo_info` via `_finalize_eager_read`) | shared (`_populate_attrs_from_geo_info` via `_finalize_lazy_read_attrs`) | shared (`_vrt_to_synthetic_geo_info` -> `_finalize_lazy_read_attrs`); documented divergence: per-band nodata sentinel selection runs before the helper, and `vrt_holes` is injected through `attrs_in` because `GeoInfo` has no slot for it | shared (`_vrt_to_synthetic_geo_info` -> `_finalize_lazy_read_attrs`); same documented divergence |
| 4. pixel decode | shared (`read_to_array`) | shared (per-chunk `read_to_array` / `_fetch_decode_cog_http_tiles`) | duplicated (inline GDS / KvikIO / nvCOMP path with CPU fallback via `read_to_array`) | duplicated (inline GDS + per-chunk delayed; HTTP / fsspec / stripped layouts fall back to `read_geotiff_dask`) | duplicated (`_read_vrt_internal._read_data` per source) | duplicated (per-chunk `_vrt_chunk_read` decodes only sources intersecting the window) |
| 4. pixel decode | shared (`read_to_array`) | shared (per-chunk `read_to_array` / `_fetch_decode_cog_http_tiles`) | duplicated (inline GDS / KvikIO / nvCOMP path with CPU fallback via `read_to_array`) | duplicated (inline GDS + per-chunk delayed; HTTP / fsspec / stripped layouts fall back to `_read_geotiff_dask`) | duplicated (`_read_vrt_internal._read_data` per source) | duplicated (per-chunk `_vrt_chunk_read` decodes only sources intersecting the window) |
| 5. orientation / photometric | shared (`read_to_array` applies both) | shared (per chunk via `read_to_array`); rejects non-default orientation on HTTP COG dask path | shared on CPU-fallback (`read_to_array`); duplicated on pure GPU path (`_apply_orientation_gpu`, `_apply_orientation_geo_info`, inline MinIsWhite inversion) | shared on CPU-fallback; duplicated on disk-to-GPU per-chunk path (`_decode_window_gpu_direct`); rejects orientation != 1 in `_gds_chunk_path_available` | duplicated (inline NaN masking in `_vrt._read_data` for float sources; VRT does not carry an orientation tag) | duplicated (per chunk same as eager VRT) |
| 6. nodata mask + dtype cast | shared (`_apply_eager_nodata_mask` + `_validate_dtype_cast` via `_finalize_eager_read`) | duplicated (per-chunk mask inline in `_delayed_read_window`); shared `_validate_dtype_cast` on graph dtype | shared (`_apply_eager_nodata_mask` via `_finalize_eager_read`) on both stripped and tiled paths | duplicated (per-chunk mask inline in `_chunk_task`); shared `_validate_dtype_cast` | duplicated (`_apply_integer_sentinel_mask_with_presence` for per-band integer sentinels, plus inline float-NaN proxy and pre-cast dtype tracking); shared `_validate_dtype_cast` | duplicated (per-chunk integer sentinel mask via `_apply_integer_sentinel_mask_with_presence`); shared `_validate_dtype_cast` |
| 7. attrs finalization | shared (`_finalize_eager_read` -> `_validate_read_geo_info` + `_populate_attrs_from_geo_info` + `_set_nodata_attrs`) | shared (`_finalize_lazy_read_attrs`); documented divergence: `nodata_pixels_present` stays unset on lazy outputs (issue #2135) | shared (`_finalize_eager_read`); GPU MinIsWhite picks `mask_sentinel` from three local stashes (`_mw_mask_nodata`, `_cpu_fallback_geo._mask_nodata`, or raw `nodata`) | shared (`_finalize_lazy_read_attrs`); same `nodata_pixels_present` divergence as the CPU dask path | shared (`_finalize_lazy_read_attrs`); documented divergences: `vrt_holes` injected via `attrs_in` seed; per-band nodata selection runs before the helper; `nodata_pixels_present` stamped post-helper from a VRT-aware scan (`_vrt_mask_with_presence` / `_vrt_scan_for_sentinel`) | shared (`_finalize_lazy_read_attrs`); same VRT divergences as the eager VRT path |
@@ -138,11 +138,11 @@ now and that the call-site comments justify.
The TIFF write contract is the inverse of the read contract: validate the
DataArray, resolve transform / CRS / nodata from the attrs, lay out the
output, encode, and emit bytes. Steps 4 and 5 (decode, orientation) have no
write analogue; `to_geotiff` and `write_geotiff_gpu` always emit
write analogue; `to_geotiff` and `_write_geotiff_gpu` always emit
Orientation = 1 and rely on the writer assembler (`_writer.write`) for
photometric handling.

| Step | `to_geotiff` (CPU eager / dask) | `write_geotiff_gpu` | `write_vrt` |
| Step | `to_geotiff` (CPU eager / dask) | `_write_geotiff_gpu` | `build_vrt` |
| ---- | ------------------------------- | ------------------- | ----------- |
| 1. source / kwarg validation | shared (`_validate_tile_size_arg`, `_validate_3d_writer_dims`, `_validate_writer_spatial_shape`, `_validate_nodata_arg`, `_validate_no_rotated_affine`); duplicated inline compression / `compression_level` / `cog` / `overview_levels` / `bigtiff` / `streaming_buffer_bytes` / `max_z_error` / `photometric` / `allow_internal_only_jpeg` / `allow_experimental_codecs` value rejections | shared (`_validate_tile_size_arg`, `_validate_3d_writer_dims`, `_validate_writer_spatial_shape`, `_validate_nodata_arg`, `_validate_no_rotated_affine`); duplicated inline GPU-specific kwarg rejections (`predictor`, `compression`, `cog`, etc.) | shared (`_validate_nodata_arg`); duplicated inline `path` / `vrt_path` shim, `crs` / `crs_wkt` shim, source path validation |
| 2. metadata parse | N/A (no source to parse; reads attrs off the DataArray) | N/A | duplicated (reads geokeys from the first source file to inherit CRS / nodata; lives in `_vrt.write_vrt`) |
6 changes: 3 additions & 3 deletions docs/source/reference/release_gate_geotiff.rst
@@ -180,7 +180,7 @@ Local GeoTIFF read and write
``T_full * Affine.translation(col_off, row_off)`` (no float
drift), and the canonical non-transform release attrs unchanged.
Covered for both ``open_geotiff(window=...)`` and
``read_geotiff_dask(window=...)``.
``open_geotiff(window=..., chunks=...)``.
- ``xrspatial/geotiff/tests/release_gates/test_stable_features.py``
(windowed-reads section)
- `#2341`_
Expand All @@ -194,7 +194,7 @@ Local GeoTIFF read and write
- `#2341`_
* - ``reader.dask`` -- eager / dask parity
- stable
- ``open_geotiff(path)`` and ``read_geotiff_dask(path)`` return the
- ``open_geotiff(path)`` and ``open_geotiff(path, chunks=...)`` return the
same pixels, ``dims``, ``coords``, and the seven release-attr
keys (``transform``, ``crs``, ``crs_wkt``, ``nodata``,
``masked_nodata``, ``georef_status``, ``raster_type``) across
@@ -557,7 +557,7 @@ VRT supported subset
- ``xrspatial/geotiff/tests/release_gates/test_stable_features.py``
(VRT presence meta-gate)
- `#2321`_
* - ``write_vrt``
* - ``build_vrt``
- advanced
- Writer rejects source-incompatibility cases at the writer boundary.
- ``xrspatial/geotiff/tests/vrt/test_validation.py``
23 changes: 10 additions & 13 deletions docs/source/user_guide/geotiff_safe_io.rst
@@ -39,12 +39,9 @@ the read and write paths:
:class:`xarray.DataArray` for single-band input and a 3D one for
multi-band input. The binary file-like form is restricted to the
eager numpy reader; dask, GPU, VRT, and remote-URL paths require
a string.
* - :func:`xrspatial.geotiff.read_vrt`
- Dedicated entry point for reading a GDAL ``.vrt`` mosaic over a
set of GeoTIFF sources. Tier: ``advanced``. The VRT path honours
a documented subset of the GDAL VRT schema; unsupported features
raise ``VRTUnsupportedError`` or
a string. A ``.vrt`` source reads a GDAL mosaic (tier:
``advanced``) over a documented subset of the GDAL VRT schema;
unsupported features raise ``VRTUnsupportedError`` or
:class:`xrspatial.geotiff.UnsupportedGeoTIFFFeatureError` at
graph-build time rather than producing wrong pixels. Both error
classes live in :mod:`xrspatial.geotiff._errors`.
@@ -53,13 +50,13 @@ the read and write paths:
Cloud-optimized GeoTIFF layout. Pass ``allow_experimental_codecs=True``
to opt into ``lerc``, ``jpeg2000`` / ``j2k``, or ``lz4``; pass
``allow_internal_only_jpeg=True`` to opt into the
internal-only ``jpeg`` codec.
* - :func:`xrspatial.geotiff.write_geotiff_gpu`
- GPU writer. Tier: ``experimental``. Use the CPU writer for
anything you intend to round-trip through external tools.
* - :func:`xrspatial.geotiff.write_vrt`
- Emit a GDAL ``.vrt`` over local GeoTIFF sources. Tier:
``advanced``.
internal-only ``jpeg`` codec. Pass ``gpu=True`` (or pass
CuPy-backed data) for the GPU writer (tier: ``experimental``);
use the CPU path for anything you round-trip through external
tools.
* - :func:`xrspatial.geotiff.build_vrt`
- Emit a GDAL ``.vrt`` over a list of existing local GeoTIFF
sources. Tier: ``advanced``.

A dask-backed read is just ``open_geotiff(source, chunks=...)`` -- there
is no separate ``read_geotiff_dask`` name on the public surface. The