Skip to content

geotiff: extract read_geotiff_dask to _backends/dask.py (#1886)#1897

Merged
brendancol merged 2 commits into
mainfrom
issue-1886
May 15, 2026
Merged

geotiff: extract read_geotiff_dask to _backends/dask.py (#1886)#1897
brendancol merged 2 commits into
mainfrom
issue-1886

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Step 8 of #1813's multi-PR refactor of xrspatial/geotiff/__init__.py. Pure code motion; no public API change.

Moved into a new xrspatial/geotiff/_backends/dask.py:

  • read_geotiff_dask — dask read entry point (~350 lines), with the HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion detection, window clipping, max_pixels guard, and 50k-task graph cap intact.
  • _delayed_read_window — per-chunk reader closure (~100 lines) that fans out via dask.delayed.

Function bodies are unchanged except for adjusted relative imports (._X.._X). Two helpers still living in __init__.py (read_vrt, _read_geo_info) are lazy-imported inside the function bodies to avoid a circular import; later PRs in #1813 (_backends/vrt.py, then writer extraction) will move them out and the lazy imports can become static.

Knock-on changes

Diff stats

  • __init__.py: 2962 → 2504 (-458 lines).
  • New _backends/dask.py: 481 lines.

Test plan

Closes #1886. Refs #1813.

Step 8 of #1813's multi-PR refactor of __init__.py. Pure code motion;
no public API change.

Moved into a new xrspatial/geotiff/_backends/dask.py:

- read_geotiff_dask: dask read entry point (~350 lines), with the
  HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion
  detection, window clipping, max_pixels guard, and 50k-task graph
  cap intact.
- _delayed_read_window: the per-chunk reader closure (~100 lines)
  that fans out via dask.delayed.

Function bodies unchanged except for adjusted relative imports
(``._X`` -> ``.._X``). Two helpers still living in __init__.py
(``read_vrt`` and ``_read_geo_info``) are lazy-imported inside the
function bodies to avoid a circular import; later PRs in #1813 will
move them out and the lazy imports can become static.

Promotes the matching lazy import on the other side: _backends/gpu.py's
``from .. import read_geotiff_dask`` (added for the same circular-
import reason in #1895) becomes a static top-of-module
``from .dask import read_geotiff_dask`` now that the target is at a
sibling path.

Test changes
============

- test_dask_no_op_astype_1624.py monkeypatches ``_read_to_array``; the
  target moves from ``xrspatial.geotiff`` to
  ``xrspatial.geotiff._backends.dask`` so the spy still intercepts the
  per-chunk worker. Same pattern as the GPU spy update in #1895.
- Restored the ``_apply_nodata_mask_gpu`` re-export from __init__.py
  (with a ``# noqa: F401`` marker). PR #1895's review-fix commit
  dropped this re-import after concluding it had no internal callers;
  it overlooked five test imports across ``test_nodata_nan_int_1774``
  and ``test_nodata_out_of_range_1581`` that exercise the helper
  directly. Restoring the re-export fixes 6 tests that were failing
  on main since #1895 merged.

Diff stats
==========

- __init__.py: 2962 -> 2504 (-458 lines), plus 1 re-import line.
- New _backends/dask.py: 481 lines.

Verification: pixel-parity matrix from #1889 (192 cells), runtime
identity (9), dask int nodata chunks (#1597), attrs parity (#1548),
dask no-op astype (#1624), gpu chunks out-of-core (#1876), full
geotiff suite minus the 8 pre-existing main failures -- 2862/2862 pass.

Refs #1813.
Copilot AI review requested due to automatic review settings May 15, 2026 04:11
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the GeoTIFF dask backend by moving read_geotiff_dask (and its delayed window reader) out of the very large xrspatial/geotiff/__init__.py into a dedicated backend module, while preserving the existing public API via re-exports.

Changes:

  • Extracts read_geotiff_dask and _delayed_read_window into xrspatial/geotiff/_backends/dask.py and re-exports read_geotiff_dask from xrspatial.geotiff.
  • Updates the GPU backend to use a static import of read_geotiff_dask from the sibling backend module.
  • Adjusts the dask astype regression test’s monkeypatch target to patch the correct module-global alias after the move; restores _apply_nodata_mask_gpu re-export.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_backends/dask.py New module containing the extracted dask read entry point and per-window delayed reader.
xrspatial/geotiff/_backends/gpu.py Switches GPU chunked fallback to statically import read_geotiff_dask from the new dask backend module.
xrspatial/geotiff/__init__.py Re-exports read_geotiff_dask from the new backend module and restores _apply_nodata_mask_gpu re-export.
xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py Updates monkeypatch target to patch _read_to_array in the new defining module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

_gpu_apply_window_band,
_gpu_decode_single_band_tiles,
)
from .dask import read_geotiff_dask
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +71 to +74
from ._backends._gpu_helpers import ( # noqa: F401 -- _apply_nodata_mask_gpu re-export
_apply_nodata_mask_gpu,
_is_gpu_data,
)
Two cleanups Copilot flagged:

1. ``_backends/gpu.py`` module docstring still described the
   ``read_geotiff_dask`` import as lazy. The import was promoted to a
   static top-of-module ``from .dask import read_geotiff_dask`` in this
   PR; updated the docstring to reflect the current strategy and note
   why the original lazy form existed.

2. ``__init__.py`` had the ``# noqa: F401`` applied to a multi-name
   import, suppressing lint signal for ``_is_gpu_data`` as well as the
   intentional ``_apply_nodata_mask_gpu`` re-export. Split into two
   imports so the ``noqa`` marks only the re-export line. ``_is_gpu_data``
   keeps full lint coverage; a future genuinely-unused import on it
   would surface normally.
@brendancol brendancol merged commit a935735 into main May 15, 2026
10 checks passed
@brendancol brendancol deleted the issue-1886 branch May 15, 2026 04:40
brendancol added a commit that referenced this pull request May 15, 2026
…#1888) (#1899)

* geotiff: extract writers to _writers/, reduce __init__.py to dispatch (#1888)

Final step (10) of #1813's multi-PR refactor of __init__.py. Pure
code motion; no public API change.

Created the _writers/ subpackage with three modules:

- _writers/eager.py: ``to_geotiff`` (public eager writer),
  ``_write_single_tile`` (per-tile worker), and ``_write_vrt_tiled``
  (deprecated vrt_tiled=True path). ~870 lines.
- _writers/gpu.py: ``write_geotiff_gpu`` (nvCOMP-backed GPU writer
  with CPU fallback). ~500 lines.
- _writers/vrt.py: ``write_vrt`` (mosaic XML generator with crs_wkt
  deprecation handling). ~90 lines.

__init__.py re-exports the three public writer names. The two
private helpers tests reach for (``_write_single_tile``,
``_apply_nodata_mask_gpu``) keep their package-level re-exports with
``# noqa: F401`` markers since the test files import them directly.

Final shape of __init__.py: 529 lines (down from 1905 pre-PR and
4902 at the start of #1813 series). It now holds only:

- Module docstring + ``__all__`` (lines 1-112).
- ``_read_geo_info``: metadata-only header parse used by
  ``open_geotiff`` and lazy-imported by ``_backends/dask.py``.
- ``open_geotiff``: the public auto-dispatching read entry point that
  routes between the four read backends and ``read_vrt``.
- ``plot_geotiff``: deprecated wrapper that emits a
  ``DeprecationWarning`` pointing to ``da.xrs.plot()``.

Test updates
============

- test_to_geotiff_gpu_fallback_1674.py monkeypatches
  ``write_geotiff_gpu`` and ``_is_gpu_data`` as resolved by
  ``to_geotiff``. The patch targets move from ``xrspatial.geotiff``
  to ``xrspatial.geotiff._writers.eager`` so the spy still intercepts
  the real call sites. Same module-global-rebind pattern as #1895 and
  #1897.

Net change since the start of #1813
====================================

- __init__.py: 4902 -> 529 lines (-4373; -89%).
- New per-responsibility modules under xrspatial/geotiff/:
  _runtime.py (#1880), _crs.py (#1881), _validation.py (#1882),
  _attrs.py (#1883), _backends/_gpu_helpers.py (#1884),
  _backends/gpu.py (#1885), _backends/dask.py (#1886),
  _backends/vrt.py (#1887), _writers/{eager,gpu,vrt}.py (this PR).
- One new regression-backstop test:
  test_backend_pixel_parity_matrix_1813.py (#1879), 192 parametrised
  cells covering pixel/coord/attrs parity across the four backends
  and four read entry points.

Verification: 2862/2862 of the geotiff test suite pass (the 8
remaining pre-existing main failures from #1517 and #1776 are
unrelated to #1813 and were excluded as deselected throughout).

Closes #1888. Closes #1813.

* geotiff: address Copilot review on _writers extraction (#1899)

Drop the unused module-level and in-function imports Copilot flagged:

eager.py
- Drop ``import math`` (no math.* calls in the file).
- Drop ``RASTER_PIXEL_IS_POINT`` re-import (only RASTER_PIXEL_IS_AREA
  is referenced; the POINT path lives in _writers/_vrt_tiled below
  and uses its own internal constants).
- Drop ``_validate_dtype_cast`` re-import (never called; the writer
  side does not validate dtype casts post-decode).
- Drop ``from .vrt import write_vrt`` re-import; ``_write_vrt_tiled``
  does a lazy ``from .._vrt import write_vrt as _write_vrt_fn`` at
  its single call site so the module-top import was dead.
- Drop the redundant ``import os`` inside ``_write_vrt_tiled``; the
  module-level ``import os`` already covers the os.path / os.makedirs
  calls in that body.

gpu.py
- Drop ``import math`` and ``import os`` (no math.* / os.* references
  in the file body).
- Drop ``GeoTransform`` re-import at module level (only used in
  eager.py's _write_single_tile, which still imports it).
- Drop ``_validate_dtype_cast`` re-import (never called).
- Drop ``from .._writer import write`` re-import; only docstring
  references mention it.
- Drop the in-function ``GeoTransform as _GT`` alias inside the
  ``_writer`` import block (the alias was carried over verbatim from
  __init__.py but the function body never references _GT).
- Drop the in-function ``from .._dtypes import numpy_to_tiff_dtype``
  (also unreferenced in the body).

All 2862 geotiff tests still pass; pyflakes is clean on these files
except for the ``cupy.ndarray`` type annotation in
``write_geotiff_gpu``'s signature, which works fine under
``from __future__ import annotations`` and matched the same surface
the original __init__.py had through earlier #1813 PRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: extract read_geotiff_dask into _backends/dask.py (#1813 step 8)

2 participants