geotiff: eager vs lazy nodata masking diverge on dtype and masked_nodata for int sources with no sentinel pixels #2990

@brendancol

Describe the bug

open_geotiff(..., masked=True) on an integer-dtype source that declares a nodata sentinel returns a different result from the eager path than from the lazy (dask) path when the read window has no pixels matching the sentinel.

Same file, same masked=True, only chunks= differs:

  • Eager: dtype=uint16, masked_nodata=False, nodata_pixels_present=False
  • Lazy: dtype=float64, masked_nodata=True, nodata_pixels_present absent

The lazy path declares float64 and stamps masked_nodata=True from the graph dtype before any chunk is decoded (xrspatial/geotiff/_backends/dask.py near lines 424 and 521). The eager path only promotes the integer array to float64 when at least one sentinel pixel actually matches (_apply_eager_nodata_mask in xrspatial/geotiff/_attrs.py near line 1483); with no match it keeps the integer dtype and reports masked_nodata=False.
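The divergence can be modeled without touching the library at all. The sketch below is a toy model of the two metadata decisions as described above, not the xrspatial code: `eager_outcome` and `lazy_outcome` are hypothetical names, dtypes are plain strings, and the source is assumed to be a maskable integer raster with a declared sentinel.

```python
def eager_outcome(source_dtype, pixels, sentinel):
    """Model of the eager gate: promote only if a sentinel pixel matches."""
    present = any(p == sentinel for p in pixels)
    return {
        "dtype": "float64" if present else source_dtype,
        "masked_nodata": present,
        "nodata_pixels_present": present,
    }


def lazy_outcome(source_dtype, sentinel):
    """Model of the lazy path: decided from declared metadata, no pixels read."""
    return {"dtype": "float64", "masked_nodata": True}


# A read window with no pixel equal to the declared sentinel.
window = [1, 7, 50, 23]
eager = eager_outcome("uint16", window, 9999)
lazy = lazy_outcome("uint16", 9999)

# Keys both paths report, but with different values.
diverging = {k: (eager[k], lazy[k]) for k in lazy if eager[k] != lazy[k]}
print(diverging)
# {'dtype': ('uint16', 'float64'), 'masked_nodata': (False, True)}
```

As soon as the window contains one sentinel pixel, the two models agree on `dtype` and `masked_nodata`, which is why the mismatch only shows up on sentinel-free windows.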

This matters because masked_nodata is read as semantic state, not just dtype decoration. The writer's _should_restore_nan_sentinel keys the NaN-to-sentinel rewrite off it, and the GPU writer reads it too. The two backends disagree on whether masking happened for the same input.

Expected behavior

The eager and lazy paths should report the same masked_nodata and the same dtype for the same input. rioxarray's open_rasterio(..., masked=True) always promotes an integer source to float regardless of whether a sentinel pixel is present, so the lazy path's behavior is the correct reference. The eager path should match: when masked=True on a maskable integer source with a declared sentinel, promote to float64 unconditionally and report masked_nodata=True, even when no sentinel pixel matches.
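A minimal sketch of the proposed eager behavior, using plain NumPy. `mask_nodata_unconditional` is a hypothetical helper written for illustration, not the existing `_apply_eager_nodata_mask`; it assumes an integer array and a declared sentinel, and it returns the attributes as a plain dict.

```python
import numpy as np


def mask_nodata_unconditional(arr, sentinel):
    """Promote an integer array to float64 whether or not the sentinel occurs."""
    hits = arr == sentinel            # boolean mask, may be all False
    out = arr.astype(np.float64)      # promotion does not depend on hits
    out[hits] = np.nan                # no-op when nothing matches
    attrs = {
        "masked_nodata": True,
        "nodata_pixels_present": bool(hits.any()),
    }
    return out, attrs


# uint16 pixels in the range 1..12, none equal to the sentinel 9999.
pixels = (np.arange(12, dtype=np.uint16) % 50 + 1).reshape(3, 4)
masked, attrs = mask_nodata_unconditional(pixels, 9999)
print(masked.dtype, attrs)
# float64 {'masked_nodata': True, 'nodata_pixels_present': False}
```

With this shape, `nodata_pixels_present` still records whether a match occurred on the eager path, while `dtype` and `masked_nodata` no longer depend on it.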

nodata_pixels_present is a separate signal and is allowed to stay absent on the lazy path, since a strict per-chunk reduction would force an eager compute. This issue does not propose forcing eager compute on the lazy path. It only proposes making the masked_nodata flag and the output dtype consistent.

Reproduction

Write a uint16 GeoTIFF declaring nodata=9999 whose pixels are all in the range 1..50, then open it with masked=True both eager and with chunks= set, and compare dtype and masked_nodata.
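The steps above could be scripted roughly as follows. This is a sketch under assumptions: the file is written with rasterio, the import path `xrspatial.geotiff.open_geotiff` is inferred from the module paths quoted in this issue, `chunks=32` is an arbitrary chunk size, and the flags are assumed to live in `DataArray.attrs`. The file name and the `summarize` and `reproduce` helpers are made up for illustration.

```python
def summarize(da):
    """Collect the fields this issue compares from a DataArray-like object."""
    return {
        "dtype": str(da.dtype),
        "masked_nodata": da.attrs.get("masked_nodata"),
        "nodata_pixels_present": da.attrs.get("nodata_pixels_present", "absent"),
    }


def reproduce(path="nodata_no_match.tif"):
    # Imports are local so the helper above stays usable without these packages.
    import numpy as np
    import rasterio
    from rasterio.transform import from_origin
    from xrspatial.geotiff import open_geotiff  # assumed import path

    # 64x64 uint16 raster with every pixel in 1..50, so none equals 9999.
    data = (np.arange(64 * 64) % 50 + 1).astype("uint16").reshape(64, 64)
    with rasterio.open(
        path, "w", driver="GTiff", height=64, width=64, count=1,
        dtype="uint16", nodata=9999, crs="EPSG:4326",
        transform=from_origin(0, 64, 1, 1),
    ) as dst:
        dst.write(data, 1)

    eager = summarize(open_geotiff(path, masked=True))
    lazy = summarize(open_geotiff(path, masked=True, chunks=32))
    return eager, lazy
```

Calling `reproduce()` with rasterio and xrspatial installed returns the two summaries to compare. Per the report above, they are expected to differ in `dtype` and `masked_nodata`.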

Additional context

Found during a code review of the geotiff backends. The divergent behavior is the eager path's gate, which promotes to float64 only when a sentinel pixel actually matches.

Labels: bug, dask, geotiff