
read_geotiff_dask drops nodata mask when sentinel only appears in non-first chunks #1597

@brendancol


Summary

read_geotiff_dask declares the output dtype as float64 for integer rasters paired with an in-range nodata sentinel, but _delayed_read_window promotes a chunk to float64 only when that chunk actually contains a sentinel pixel. When the sentinel appears only in non-first chunks, dask preallocates the concatenated output using the first chunk's dtype (uint16 in the repro below), then casts the later float64 chunks back to uint16. That cast replaces NaN with 0 and emits RuntimeWarning: invalid value encountered in cast.

Net effect: the declared float64 array silently becomes uint16 at compute time, and nodata pixels become 0 instead of NaN. The numpy eager path is correct.
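
To illustrate the cast described above without dask or a GeoTIFF, here is a minimal NumPy-only sketch. It assumes the output buffer was allocated as uint16 (standing in for whatever dask builds from the first chunk) and that a later chunk arrives as float64 with NaN. The array names and shapes are made up for illustration.

```python
import numpy as np

# Stand-in for an output buffer allocated with the first chunk's integer dtype.
out = np.empty((2, 4), dtype=np.uint16)

# First chunk: no sentinel pixel, so it stayed uint16.
first = np.array([[1, 2], [3, 4]], dtype=np.uint16)
# Later chunk: contained the sentinel, so it was promoted and masked with NaN.
later = np.array([[5.0, np.nan], [np.nan, 8.0]], dtype=np.float64)

out[:, :2] = first
# Recent NumPy versions report "invalid value encountered in cast" here;
# the warning is silenced so the sketch runs cleanly.
with np.errstate(invalid="ignore"):
    out[:, 2:] = later

print(out.dtype)                               # uint16
print(np.issubdtype(out.dtype, np.floating))   # False
```

An integer buffer has no way to represent NaN, so the mask is lost the moment the float64 chunk is written into it. The value that replaces NaN is not guaranteed by NumPy; this issue observes 0.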

Repro

import os
import tempfile

import numpy as np
from xrspatial.geotiff import open_geotiff
from xrspatial.geotiff._writer import write

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 't.tif')
    arr = np.arange(64, dtype=np.uint16).reshape(8, 8) + 1
    arr[6:8, 6:8] = 65535  # sentinel only in bottom-right
    write(arr, path, nodata=65535, compression='none', tiled=False)

    eager = open_geotiff(path)
    dk = open_geotiff(path, chunks=4)
    r = dk.compute()
    print(eager.dtype, np.isnan(eager.values).sum())  # float64 4
    print(dk.dtype, r.dtype, (r.values[6:8, 6:8] == 0).all())  # float64 uint16 True

Root cause

In xrspatial/geotiff/__init__.py:

  • read_geotiff_dask computes effective_dtype = float64 for masked-int rasters (line 1510 onward) and declares each dask block with dtype=target_dtype (line 1654).
  • _delayed_read_window (line 1714) only calls arr.astype(np.float64) inside the if mask.any() branch; when the chunk contains no sentinel pixel, arr stays at the file's integer dtype.
  • The per-chunk arr.astype(target_dtype) cast only runs when the user passed an explicit dtype= kwarg (the caller threads target_dtype=target_dtype if dtype is not None else None, line 1650), so the float promotion is not enforced on chunks the mask missed.
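
The conditional promotion described in these bullets can be sketched in isolation. The function below is a hypothetical stand-in, not the actual _delayed_read_window code; it only mirrors the "promote inside if mask.any()" pattern to show that two chunks of the same raster can come back with different dtypes.

```python
import numpy as np

def mask_chunk_conditionally(arr, nodata):
    """Hypothetical stand-in: promote to float64 only when the mask hits."""
    mask = arr == nodata
    if mask.any():
        arr = arr.astype(np.float64)
        arr[mask] = np.nan
    return arr

# Chunk with no sentinel pixel versus chunk that contains the sentinel.
clean = np.array([[1, 2], [3, 4]], dtype=np.uint16)
hit = np.array([[5, 65535], [65535, 8]], dtype=np.uint16)

print(mask_chunk_conditionally(clean, 65535).dtype)  # uint16
print(mask_chunk_conditionally(hit, 65535).dtype)    # float64
```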

Proposed fix

Always cast the chunk to the resolved effective_dtype (float64 for masked-int paths) before returning, regardless of whether that chunk's mask hit. Thread the effective dtype unconditionally through _delayed_read_window. The optional out-of-range guard already preserved at line 1511 keeps integer-dtype output when the sentinel can never match.
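
A rough sketch of the proposed behaviour, under stated assumptions: resolve_effective_dtype and mask_chunk are hypothetical helpers invented for this example, and the in-range check is one plausible reading of the out-of-range guard, not the code at line 1511.

```python
import numpy as np

def resolve_effective_dtype(file_dtype, nodata):
    """Hypothetical: float64 only for integer rasters with an in-range sentinel."""
    file_dtype = np.dtype(file_dtype)
    if nodata is None or not np.issubdtype(file_dtype, np.integer):
        return file_dtype
    info = np.iinfo(file_dtype)
    if info.min <= nodata <= info.max:
        return np.dtype(np.float64)
    return file_dtype  # sentinel can never match, so keep integer output

def mask_chunk(arr, nodata, effective_dtype):
    """Hypothetical: always return effective_dtype, whether or not the mask hits."""
    effective_dtype = np.dtype(effective_dtype)
    if nodata is None or not np.issubdtype(effective_dtype, np.floating):
        return arr.astype(effective_dtype, copy=False)
    mask = arr == nodata
    out = arr.astype(effective_dtype)  # unconditional promotion
    out[mask] = np.nan
    return out

eff = resolve_effective_dtype(np.uint16, 65535)
clean = np.array([[1, 2], [3, 4]], dtype=np.uint16)
hit = np.array([[5, 65535], [65535, 8]], dtype=np.uint16)

print(mask_chunk(clean, 65535, eff).dtype)         # float64
print(mask_chunk(hit, 65535, eff).dtype)           # float64
print(resolve_effective_dtype(np.uint8, 65535))    # uint8
```

With every chunk returned in the same dtype, the computed array would match the dtype declared for the dask blocks, and NaN would survive concatenation.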

Scope

Categories: 4 (dtype/nodata semantics), 5 (backend-inconsistent metadata: eager numpy and GPU promote correctly; dask path does not).
Severity: HIGH -- silent NaN -> 0 conversion in real masked rasters.
