GeoTIFF: add shared read-finalization helpers (PR B of #2162) #2177

@brendancol


Parent: #2162
Sibling in wave 1: #2175

Goal

Add two private helpers in xrspatial/geotiff/_attrs.py that capture the read-finalization pipelines currently duplicated across backends. This issue only adds the helpers and unit-tests them in isolation; no backend is migrated here. Migration happens in wave 2 (#2178 dask, #2179 eager) and wave 3 (#2180 VRT).

Adding helpers first lets all downstream migration PRs depend on a frozen signature instead of inventing slightly different shapes.

Scope

`_finalize_eager_read`

```python
def _finalize_eager_read(
    arr,
    *,
    geo_info,
    nodata,
    mask_sentinel,
    mask_nodata,
    dtype,
    window,
    name,
    allow_rotated=False,
    allow_unparseable_crs=False,
    attrs_in=None,
):
    """Validate, populate attrs, mask, cast, and build an eager DataArray."""
```

Pipeline:

  1. `_validate_read_geo_info(geo_info, window=window, allow_rotated=..., allow_unparseable_crs=...)`
  2. `_populate_attrs_from_geo_info(attrs, geo_info, window=window)`
  3. Mask `arr` using `mask_sentinel` when `mask_nodata` is True. Record `nodata_pixels_present` as a bool.
  4. Cast dtype; record `nodata_dtype_cast`.
  5. `_set_nodata_attrs(attrs, nodata, masked=..., pixels_present=..., dtype_cast=...)`
  6. Return `xarray.DataArray(...)` with coords from the helper.

Why `mask_sentinel` is a parameter: the three GPU eager sites derive it three different ways (`_mw_mask_nodata` local, `_cpu_fallback_geo._mask_nodata`, raw `nodata`). The helper must not read it off `geo_info`.
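Steps 3 and 4 of the eager pipeline can be sketched roughly as follows. This is an illustration under stated assumptions, not the helper itself: `mask_and_cast_sketch` is a hypothetical name, masking is assumed to replace sentinel pixels with NaN in float arrays only (integer handling is left out), and `nodata_dtype_cast` is assumed to be a flag saying whether the cast changed the dtype. The point it shows is that the mask value comes from the `mask_sentinel` argument, which may differ from the declared `nodata`.

```python
import numpy as np


def mask_and_cast_sketch(arr, *, mask_sentinel, mask_nodata, dtype=None):
    """Hypothetical stand-in for steps 3 and 4; not the real helper."""
    pixels_present = None  # stays None when no masking was attempted
    if mask_nodata and np.issubdtype(arr.dtype, np.floating):
        hit = arr == mask_sentinel           # compare against the sentinel, not nodata
        pixels_present = bool(hit.any())     # plain bool, as the pipeline requires
        arr = arr.copy()                     # never mutate the caller's buffer
        arr[hit] = np.nan
    dtype_cast = False
    if dtype is not None and np.dtype(dtype) != arr.dtype:
        arr = arr.astype(dtype)
        dtype_cast = True                    # assumed meaning of nodata_dtype_cast
    return arr, pixels_present, dtype_cast


# Sentinel (0.0) differs from the declared nodata (255.0), as in the
# minIsWhite inversion case the caller would resolve before calling.
raw = np.array([1.0, 0.0, 3.0], dtype=np.float32)
out, present, cast = mask_and_cast_sketch(
    raw, mask_sentinel=0.0, mask_nodata=True, dtype="float64"
)
```

Because the caller passes the sentinel explicitly, the same function body serves all three derivations without looking at `geo_info`.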

`_finalize_lazy_read_attrs`

```python
def _finalize_lazy_read_attrs(
    *,
    geo_info,
    nodata,
    mask_nodata,
    dtype,
    window,
    allow_rotated=False,
    allow_unparseable_crs=False,
    attrs_in=None,
):
    """Validate and populate attrs for dask-style lazy reads.

    Returns the attrs dict. Does NOT build a DataArray (the caller
    assembles the dask graph).
    """
```

Pipeline:

  1. `_validate_read_geo_info(...)`
  2. `_populate_attrs_from_geo_info(attrs, geo_info, window=window)`
  3. `_set_nodata_attrs(attrs, nodata, masked=(mask_nodata and float dtype), pixels_present=None, dtype_cast=...)`
  4. Return `attrs`.

`pixels_present=None` is the documented dask contract from #2135. The signature accepts None explicitly. Computing pixels-present eagerly here would force `.compute()` and break the lazy contract.
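The nodata step of the lazy pipeline might look something like the sketch below. Everything here is illustrative: `set_nodata_attrs_stub` is a hypothetical stand-in for `_set_nodata_attrs` (its real behavior may differ), `source_dtype` is a parameter invented for the example, and "float dtype" is assumed to refer to the output dtype. It shows `pixels_present=None` being passed through without touching any pixel data, so the attrs end up without `nodata_pixels_present`.

```python
import numpy as np


def set_nodata_attrs_stub(attrs, nodata, *, masked, pixels_present, dtype_cast):
    """Hypothetical stand-in for _set_nodata_attrs; the real helper may differ."""
    attrs["nodata"] = nodata
    attrs["nodata_dtype_cast"] = dtype_cast
    # None means "unknown without computing", so the key is left out.
    if pixels_present is not None:
        attrs["nodata_pixels_present"] = pixels_present
    # `masked` is accepted for signature parity; this stub does not model
    # how the real helper records it.
    return attrs


def lazy_nodata_attrs_sketch(*, nodata, mask_nodata, source_dtype,
                             dtype=None, attrs_in=None):
    attrs = dict(attrs_in or {})  # copy, so attrs_in is not mutated
    src = np.dtype(source_dtype)
    out_dtype = np.dtype(dtype) if dtype is not None else src
    masked = bool(mask_nodata and out_dtype.kind == "f")
    return set_nodata_attrs_stub(
        attrs,
        nodata,
        masked=masked,
        pixels_present=None,           # never computed on the lazy path
        dtype_cast=out_dtype != src,
    )


lazy_attrs = lazy_nodata_attrs_sketch(
    nodata=-9999.0, mask_nodata=True, source_dtype="float32"
)
```

Nothing in this path reads array values, which is what keeps the dask graph unevaluated.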

Tests

Add `xrspatial/geotiff/tests/test_finalization_helpers_2162.py`. Synthesize `GeoInfo` fixtures and assert:

  • Eager helper populates `attrs['nodata']`, `attrs['nodata_pixels_present']`, `attrs['nodata_dtype_cast']`, and `attrs['georef_status']` correctly across float and int input dtypes.
  • Eager helper respects `mask_nodata=False` (no masking, no `nodata_pixels_present`).
  • Lazy helper populates the same attrs minus `nodata_pixels_present`.
  • Both helpers propagate `_validate_read_geo_info` rejections for ambiguous geo_info.
  • Both helpers handle `mask_sentinel != nodata` (the GPU minIsWhite inversion case).

Files

  • `xrspatial/geotiff/_attrs.py` (add two helpers)
  • `xrspatial/geotiff/tests/test_finalization_helpers_2162.py` (new)

Constraints

  • No public API change.
  • Helpers are private (underscore prefix).
  • The eager helper returns a `DataArray`; the lazy helper returns an attrs dict. Wave 2 and wave 3 callers depend on these shapes.
  • `_validate_read_geo_info` runs first. Partial attrs must not leak when validation fails.
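One way to satisfy the last constraint is to validate before any dict is created or touched, and to work on a copy of `attrs_in`. The sketch below assumes a dict-shaped `geo_info` and a validator that raises `ValueError`; both are simplifications for illustration, and `validate_stub` and `finalize_attrs_sketch` are hypothetical names.

```python
def validate_stub(geo_info):
    """Hypothetical validator; stands in for _validate_read_geo_info."""
    if geo_info.get("rotated"):
        raise ValueError("rotated transform not allowed")


def finalize_attrs_sketch(geo_info, attrs_in=None):
    # Validate before touching any dict, so a rejection leaves no trace.
    validate_stub(geo_info)
    # Work on a copy so the caller's attrs_in is never mutated.
    attrs = dict(attrs_in or {})
    attrs["georef_status"] = "ok"  # placeholder value, not the real vocabulary
    return attrs


caller_attrs = {"source": "example.tif"}
try:
    finalize_attrs_sketch({"rotated": True}, caller_attrs)
    rejected = False
except ValueError:
    rejected = True
```

After the rejected call, `caller_attrs` still holds only the entry the caller put there.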

Out of scope

  • Migrating any backend to call these helpers (waves 2 and 3).
  • Changes to the existing shared helpers (`_validate_read_geo_info`, `_populate_attrs_from_geo_info`, `_set_nodata_attrs`).

Labels: api, enhancement
