geotiff: extract read_geotiff_dask to _backends/dask.py (#1886)#1897
Merged
Conversation
Step 8 of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Moved into a new xrspatial/geotiff/_backends/dask.py: - read_geotiff_dask: dask read entry point (~350 lines), with the HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion detection, window clipping, max_pixels guard, and 50k-task graph cap intact. - _delayed_read_window: the per-chunk reader closure (~100 lines) that fans out via dask.delayed. Function bodies unchanged except for adjusted relative imports (``._X`` -> ``.._X``). Two helpers still living in __init__.py (``read_vrt`` and ``_read_geo_info``) are lazy-imported inside the function bodies to avoid a circular import; later PRs in #1813 will move them out and the lazy imports can become static. Promotes the matching lazy import on the other side: _backends/gpu.py's ``from .. import read_geotiff_dask`` (added for the same circular- import reason in #1895) becomes a static top-of-module ``from .dask import read_geotiff_dask`` now that the target is at a sibling path. Test changes ============ - test_dask_no_op_astype_1624.py monkeypatches ``_read_to_array``; the target moves from ``xrspatial.geotiff`` to ``xrspatial.geotiff._backends.dask`` so the spy still intercepts the per-chunk worker. Same pattern as the GPU spy update in #1895. - Restored the ``_apply_nodata_mask_gpu`` re-export from __init__.py (with a ``# noqa: F401`` marker). PR #1895's review-fix commit dropped this re-import after concluding it had no internal callers; it overlooked five test imports across ``test_nodata_nan_int_1774`` and ``test_nodata_out_of_range_1581`` that exercise the helper directly. Restoring the re-export fixes 6 tests that were failing on main since #1895 merged. Diff stats ========== - __init__.py: 2962 -> 2504 (-458 lines), plus 1 re-import line. - New _backends/dask.py: 481 lines. Verification: pixel-parity matrix from #1889 (192 cells), runtime identity (9), dask int nodata chunks (#1597), attrs parity (#1548), dask no-op astype (#1624), gpu chunks out-of-core (#1876), full geotiff suite minus the 8 pre-existing main failures -- 2862/2862 pass. Refs #1813.
Contributor
There was a problem hiding this comment.
Pull request overview
Refactors the GeoTIFF dask backend by moving read_geotiff_dask (and its delayed window reader) out of the very large xrspatial/geotiff/__init__.py into a dedicated backend module, while preserving the existing public API via re-exports.
Changes:
- Extracts
read_geotiff_daskand_delayed_read_windowintoxrspatial/geotiff/_backends/dask.pyand re-exportsread_geotiff_daskfromxrspatial.geotiff. - Updates the GPU backend to use a static import of
read_geotiff_daskfrom the sibling backend module. - Adjusts the dask astype regression test’s monkeypatch target to patch the correct module-global alias after the move; restores
_apply_nodata_mask_gpure-export.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_backends/dask.py |
New module containing the extracted dask read entry point and per-window delayed reader. |
xrspatial/geotiff/_backends/gpu.py |
Switches GPU chunked fallback to statically import read_geotiff_dask from the new dask backend module. |
xrspatial/geotiff/__init__.py |
Re-exports read_geotiff_dask from the new backend module and restores _apply_nodata_mask_gpu re-export. |
xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py |
Updates monkeypatch target to patch _read_to_array in the new defining module. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _gpu_apply_window_band, | ||
| _gpu_decode_single_band_tiles, | ||
| ) | ||
| from .dask import read_geotiff_dask |
Comment on lines
+71
to
+74
| from ._backends._gpu_helpers import ( # noqa: F401 -- _apply_nodata_mask_gpu re-export | ||
| _apply_nodata_mask_gpu, | ||
| _is_gpu_data, | ||
| ) |
Two cleanups Copilot flagged: 1. ``_backends/gpu.py`` module docstring still described the ``read_geotiff_dask`` import as lazy. The import was promoted to a static top-of-module ``from .dask import read_geotiff_dask`` in this PR; updated the docstring to reflect the current strategy and note why the original lazy form existed. 2. ``__init__.py`` had the ``# noqa: F401`` applied to a multi-name import, suppressing lint signal for ``_is_gpu_data`` as well as the intentional ``_apply_nodata_mask_gpu`` re-export. Split into two imports so the ``noqa`` marks only the re-export line. ``_is_gpu_data`` keeps full lint coverage; a future genuinely-unused import on it would surface normally.
4 tasks
4 tasks
brendancol
added a commit
that referenced
this pull request
May 15, 2026
…#1888) (#1899) * geotiff: extract writers to _writers/, reduce __init__.py to dispatch (#1888) Final step (10) of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Created the _writers/ subpackage with three modules: - _writers/eager.py: ``to_geotiff`` (public eager writer), ``_write_single_tile`` (per-tile worker), and ``_write_vrt_tiled`` (deprecated vrt_tiled=True path). ~870 lines. - _writers/gpu.py: ``write_geotiff_gpu`` (nvCOMP-backed GPU writer with CPU fallback). ~500 lines. - _writers/vrt.py: ``write_vrt`` (mosaic XML generator with crs_wkt deprecation handling). ~90 lines. __init__.py re-exports the three public writer names. The two private helpers tests reach for (``_write_single_tile``, ``_apply_nodata_mask_gpu``) keep their package-level re-exports with ``# noqa: F401`` markers since the test files import them directly. Final shape of __init__.py: 529 lines (down from 1905 pre-PR and 4902 at the start of #1813 series). It now holds only: - Module docstring + ``__all__`` (lines 1-112). - ``_read_geo_info``: metadata-only header parse used by ``open_geotiff`` and lazy-imported by ``_backends/dask.py``. - ``open_geotiff``: the public auto-dispatching read entry point that routes between the four read backends and ``read_vrt``. - ``plot_geotiff``: deprecated wrapper that emits a ``DeprecationWarning`` pointing to ``da.xrs.plot()``. Test updates ============ - test_to_geotiff_gpu_fallback_1674.py monkeypatches ``write_geotiff_gpu`` and ``_is_gpu_data`` as resolved by ``to_geotiff``. The patch targets move from ``xrspatial.geotiff`` to ``xrspatial.geotiff._writers.eager`` so the spy still intercepts the real call sites. Same module-global-rebind pattern as #1895 and #1897. Net change since the start of #1813 ==================================== - __init__.py: 4902 -> 529 lines (-4373; -89%). - New per-responsibility modules under xrspatial/geotiff/: _runtime.py (#1880), _crs.py (#1881), _validation.py (#1882), _attrs.py (#1883), _backends/_gpu_helpers.py (#1884), _backends/gpu.py (#1885), _backends/dask.py (#1886), _backends/vrt.py (#1887), _writers/{eager,gpu,vrt}.py (this PR). - One new regression-backstop test: test_backend_pixel_parity_matrix_1813.py (#1879), 192 parametrised cells covering pixel/coord/attrs parity across the four backends and four read entry points. Verification: 2862/2862 of the geotiff test suite pass (the 8 remaining pre-existing main failures from #1517 and #1776 are unrelated to #1813 and were excluded as deselected throughout). Closes #1888. Closes #1813. * geotiff: address Copilot review on _writers extraction (#1899) Drop the unused module-level and in-function imports Copilot flagged: eager.py - Drop ``import math`` (no math.* calls in the file). - Drop ``RASTER_PIXEL_IS_POINT`` re-import (only RASTER_PIXEL_IS_AREA is referenced; the POINT path lives in _writers/_vrt_tiled below and uses its own internal constants). - Drop ``_validate_dtype_cast`` re-import (never called; the writer side does not validate dtype casts post-decode). - Drop ``from .vrt import write_vrt`` re-import; ``_write_vrt_tiled`` does a lazy ``from .._vrt import write_vrt as _write_vrt_fn`` at its single call site so the module-top import was dead. - Drop the redundant ``import os`` inside ``_write_vrt_tiled``; the module-level ``import os`` already covers the os.path / os.makedirs calls in that body. gpu.py - Drop ``import math`` and ``import os`` (no math.* / os.* references in the file body). - Drop ``GeoTransform`` re-import at module level (only used in eager.py's _write_single_tile, which still imports it). - Drop ``_validate_dtype_cast`` re-import (never called). - Drop ``from .._writer import write`` re-import; only docstring references mention it. - Drop the in-function ``GeoTransform as _GT`` alias inside the ``_writer`` import block (the alias was carried over verbatim from __init__.py but the function body never references _GT). - Drop the in-function ``from .._dtypes import numpy_to_tiff_dtype`` (also unreferenced in the body). All 2862 geotiff tests still pass; pyflakes is clean on these files except for the ``cupy.ndarray`` type annotation in ``write_geotiff_gpu``'s signature, which works fine under ``from __future__ import annotations`` and matched the same surface the original __init__.py had through earlier #1813 PRs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Step 8 of #1813's multi-PR refactor of
xrspatial/geotiff/__init__.py. Pure code motion; no public API change.Moved into a new
xrspatial/geotiff/_backends/dask.py:read_geotiff_dask— dask read entry point (~350 lines), with the HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion detection, window clipping,max_pixelsguard, and 50k-task graph cap intact._delayed_read_window— per-chunk reader closure (~100 lines) that fans out viadask.delayed.Function bodies are unchanged except for adjusted relative imports (
._X→.._X). Two helpers still living in__init__.py(read_vrt,_read_geo_info) are lazy-imported inside the function bodies to avoid a circular import; later PRs in #1813 (_backends/vrt.py, then writer extraction) will move them out and the lazy imports can become static.Knock-on changes
_backends/gpu.pystatic import. PR geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885) #1895 addedfrom .. import read_geotiff_daskinside_read_geotiff_gpu_chunkedbecauseread_geotiff_daskwas still in__init__.pyand a static top-of-module import would have circular-loaded. With the target now at a sibling path, the lazy import becomes a staticfrom .dask import read_geotiff_dask.test_dask_no_op_astype_1624.pymonkeypatch target. The spy that tracks_read_to_arraycalls in the per-chunk worker now patchesxrspatial.geotiff._backends.dask._read_to_arrayinstead ofxrspatial.geotiff._read_to_array. Same pattern as the GPU spy update in geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885) #1895._apply_nodata_mask_gpure-export restored. PR geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885) #1895's review-fix commit dropped this re-import from__init__.pyafter concluding it had no internal callers; it overlooked five test imports acrosstest_nodata_nan_int_1774andtest_nodata_out_of_range_1581that exercise the helper directly. Restoring the re-export (with a# noqa: F401marker) fixes 6 tests that have been failing on main since geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885) #1895 merged.Diff stats
__init__.py: 2962 → 2504 (-458 lines)._backends/dask.py: 481 lines.Test plan
read_to_arraymonkeypatch ontest_predictor2_big_endian_gpu_1517+tile_size_positive_works) — 2862/2862 green._apply_nodata_mask_gpuimport regressions) now pass again.Closes #1886. Refs #1813.