Skip to content

geotiff: open_geotiff(overview_level>=1) drops CRS, transform, and georeferenced coords #1640

@brendancol

Description

@brendancol

Describe the bug

open_geotiff(path, overview_level=N) with N >= 1 returns a DataArray with
its geographic metadata silently dropped. The CRS, the transform attr, and
the georeferenced y/x coordinates are all replaced with defaults (unit
transform, integer pixel indices, crs=None). This happens even when the file
was written by xrspatial.geotiff.to_geotiff(..., cog=True) itself, because
to_geotiff writes the GeoKeys, ModelPixelScale, and ModelTiepoint only on the
level-0 IFD. Most GDAL-style COG writers do the same.

The reader passes a single selected IFD to
xrspatial.geotiff._geotags.extract_geo_info. When the overview IFD has no
geokeys, the helper returns a default GeoTransform() with has_georef=False
and no CRS, and _populate_attrs_from_geo_info + _geo_to_coords then build a
non-georeferenced DataArray.

Downstream spatial ops break: slope, aspect, distance, and anything that reads
attrs['res'] or derives pixel sizes from the coord arrays gets the wrong
answer (typically off by the overview's reduction factor) with no warning.

Repro

import numpy as np
import xarray as xr
import tempfile
import os
from xrspatial.geotiff import to_geotiff, open_geotiff

arr = np.arange(1024 * 1024, dtype=np.float32).reshape(1024, 1024)
y = np.arange(1024, dtype=np.float64) * (-0.5) + 200.0
x = np.arange(1024, dtype=np.float64) *   0.5 + 100.0
da = xr.DataArray(arr, dims=['y', 'x'],
                  coords={'y': y, 'x': x},
                  attrs={'crs': 4326})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cog.tif')
    to_geotiff(da, path, cog=True, overview_levels=[1, 2, 3])

    r0 = open_geotiff(path, overview_level=0)
    r1 = open_geotiff(path, overview_level=1)
    print('level 0 transform:', r0.attrs['transform'])
    print('level 1 transform:', r1.attrs.get('transform'))
    print('level 1 crs:      ', r1.attrs.get('crs'))

Output:

level 0 transform: (0.5, 0.0, 99.75, 0.0, -0.5, 200.25)
level 1 transform: (1.0, 0.0, 0.0, 0.0, -1.0, 0.0)   # default; should scale pixel size by 2 around the same origin
level 1 crs:       None                              # CRS dropped

All four read backends are affected the same way (open_geotiff numpy,
open_geotiff(chunks=N) dask, open_geotiff(gpu=True) cupy, and
open_geotiff(gpu=True, chunks=N) dask+cupy).

Expected behavior

Reading any overview level should produce a DataArray with the same CRS as the
full-resolution read, a transform whose pixel size scales by the overview
reduction factor, and y/x coords covering the same geographic extent as level
0.

Fix

When an overview IFD lacks ModelPixelScale / ModelTiepoint /
ModelTransformation or its GeoKeyDirectory, fall back to the geo info from the
first full-resolution IFD in the file, then rescale the pixel size by the
integer ratio width_full / width_overview (and the same for height). Put the
fix in a shared helper so all four backends inherit it.

Additional context

  • select_overview_ifd already filters out mask and page IFDs before indexing.
  • extract_geo_info is shared by the numpy, dask, cupy, and dask+cupy read
    paths.
  • No existing tests assert on transform/crs/coords for overview_level >= 1.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions