read_vrt(band=N) reports band 0's nodata in attrs and skips masking for band N #1598

@brendancol

Summary

read_vrt(path, band=N) for N > 0 always takes vrt.bands[0].nodata, both to populate attrs['nodata'] and to decide whether an integer band with nodata is promoted to float64 with NaN. When band N's nodata sentinel differs from band 0's:

  • attrs['nodata'] advertises band 0's value (wrong).
  • The integer-to-float64 promotion mask is built from band 0's sentinel, so band N's actual sentinel pixels stay as literal integers instead of becoming NaN.
  • The returned array dtype stays integer when it should have been promoted to float64.
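A minimal NumPy sketch of the effect. It assumes the public layer promotes an integer band to float64 only when at least one pixel equals the sentinel, which is consistent with the repro output below (the result stays uint16 because band 0's sentinel never appears in band N). promote_int_nodata is a hypothetical helper for illustration, not the xrspatial function.

```python
import numpy as np


def promote_int_nodata(arr: np.ndarray, nodata):
    """Hypothetical stand-in for the public-layer integer/nodata promotion.

    If arr has an integer dtype and some pixels equal `nodata`, return a
    float64 copy with those pixels set to NaN; otherwise return arr unchanged.
    """
    if nodata is None or not np.issubdtype(arr.dtype, np.integer):
        return arr
    mask = arr == nodata
    if not mask.any():
        # No sentinel pixels found: the integer dtype is kept as-is.
        return arr
    out = arr.astype(np.float64)
    out[mask] = np.nan
    return out


# Band N's pixels from the repro; its own sentinel is 65000.
band_n = np.array([[7, 8], [9, 65000]], dtype=np.uint16)

wrong = promote_int_nodata(band_n, 65535)  # band 0's sentinel (the bug)
right = promote_int_nodata(band_n, 65000)  # band N's own sentinel

print(wrong.dtype, wrong.tolist())  # uint16 [[7, 8], [9, 65000]]
print(right.dtype, right.tolist())  # float64 [[7.0, 8.0], [9.0, nan]]
```

With the wrong sentinel nothing matches, so all three symptoms above follow from a single lookup.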

The non-VRT readers (open_geotiff, read_geotiff_dask, read_geotiff_gpu) all read the per-band nodata from the file's IFD for the selected band; only the VRT path has this bug.

Repro

import numpy as np, tempfile, os
from xrspatial.geotiff import read_vrt
from xrspatial.geotiff._writer import write

with tempfile.TemporaryDirectory() as d:
    a = np.array([[1, 2], [3, 65535]], dtype=np.uint16)
    b = np.array([[7, 8], [9, 65000]], dtype=np.uint16)
    pa, pb = os.path.join(d,'a.tif'), os.path.join(d,'b.tif')
    write(a, pa, nodata=65535, compression='none', tiled=False)
    write(b, pb, nodata=65000, compression='none', tiled=False)
    vrt = os.path.join(d, 'm.vrt')
    with open(vrt, 'w') as f:
        f.write(f'''<VRTDataset rasterXSize="2" rasterYSize="2">
  <GeoTransform>0,1,0,0,0,-1</GeoTransform>
  <VRTRasterBand dataType="UInt16" band="1">
    <NoDataValue>65535</NoDataValue>
    <SimpleSource><SourceFilename>{pa}</SourceFilename><SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="2" ySize="2"/>
      <DstRect xOff="0" yOff="0" xSize="2" ySize="2"/></SimpleSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="UInt16" band="2">
    <NoDataValue>65000</NoDataValue>
    <SimpleSource><SourceFilename>{pb}</SourceFilename><SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="2" ySize="2"/>
      <DstRect xOff="0" yOff="0" xSize="2" ySize="2"/></SimpleSource>
  </VRTRasterBand>
</VRTDataset>''')

    r = read_vrt(vrt, band=1)  # band is 0-based: selects the second VRT band (nodata 65000)
    print(r.dtype, r.attrs.get('nodata'), r.values.tolist())
    # Currently: uint16 65535.0 [[7, 8], [9, 65000]]
    # Expected:  float64 65000.0 [[7.0, 8.0], [9.0, nan]]

Root cause

In xrspatial/geotiff/__init__.py::read_vrt, around line 2735:

nodata = None
if vrt.bands:
    nodata = vrt.bands[0].nodata

This always indexes bands[0] instead of bands[band if band is not None else 0]. The integer-promotion block that follows (line 2749 onward) therefore masks with the wrong sentinel.

Proposed fix

When band is not None, take the nodata sentinel from vrt.bands[band].nodata. The internal _vrt.read_vrt already uses the per-band sentinel inside its source-read loop, so the fix only needs to touch the public layer: the attrs['nodata'] emission and the post-decode integer promotion.
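A self-contained sketch of the proposed public-layer logic, under stated assumptions: Band is a hypothetical stand-in for the entries of vrt.bands (only the .nodata attribute named above is modeled), band is 0-based as in the root-cause expression, attrs['nodata'] is stored as a float (matching the 65535.0 in the repro), and promotion happens only when sentinel pixels are present. finalize_vrt_read is an illustrative name, not the actual xrspatial patch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class Band:
    # Hypothetical stand-in for one entry of vrt.bands.
    nodata: Optional[float]


def finalize_vrt_read(arr: np.ndarray, bands: Sequence[Band], band: Optional[int]):
    """Pick the selected band's sentinel, then build attrs and promote dtype."""
    nodata = None
    if bands:
        # The fix: index by the requested band, defaulting to band 0.
        nodata = bands[band if band is not None else 0].nodata

    attrs = {}
    if nodata is not None:
        # Assumption: attrs carries the sentinel as a float.
        attrs["nodata"] = float(nodata)
        if np.issubdtype(arr.dtype, np.integer):
            mask = arr == nodata
            if mask.any():
                # Promote to float64 so sentinel pixels can become NaN.
                arr = arr.astype(np.float64)
                arr[mask] = np.nan
    return arr, attrs


bands = [Band(nodata=65535), Band(nodata=65000)]
decoded = np.array([[7, 8], [9, 65000]], dtype=np.uint16)  # band 1 of the repro
out, attrs = finalize_vrt_read(decoded, bands, band=1)
print(out.dtype, attrs, out.tolist())
# float64 {'nodata': 65000.0} [[7.0, 8.0], [9.0, nan]]
```

Because the sentinel is selected once and reused for both the attr and the mask, the two outputs cannot disagree about which band they describe.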

Scope

Categories: 4 (dtype/nodata semantics).
Severity: MEDIUM. Triggering it requires a multi-band VRT with per-band sentinels, but the result is silently wrong.
