Skip to content

Writer emits structurally wrong CRS metadata for compound / vertical / geocentric EPSG codes #2418

@brendancol

Description

@brendancol

Describe the bug

The stable-path EPSG writer in to_geotiff writes structurally wrong CRS metadata for compound, vertical, and geocentric CRSs. The public docstring at xrspatial/geotiff/_writers/eager.py:120 says integer EPSG codes are stable. Validation in xrspatial/geotiff/_crs.py:86 accepts any code pyproj can resolve. But _model_type_from_epsg in xrspatial/geotiff/_geotags.py:1418 only branches on is_geographic and treats everything else as projected, and build_geo_tags at xrspatial/geotiff/_geotags.py:1555 only knows how to emit GeographicTypeGeoKey (2048) or ProjectedCSTypeGeoKey (3072).

So a valid integer EPSG goes in, passes validation, and gets written into the wrong GeoKey slot. The xrspatial reader hides it on round-trip by re-resolving the integer through pyproj. rasterio / GDAL cannot recover the CRS.

Reproduction

EPSG:6349 is "NAD83(2011) + NAVD88 height", a compound horizontal + vertical CRS. pyproj reports is_geographic=True because pyproj surfaces the horizontal sub-CRS through that property, so _model_type_from_epsg returns MODEL_TYPE_GEOGRAPHIC and the writer stores 6349 in GeographicTypeGeoKey.

import numpy as np, xarray as xr, tempfile, rasterio
from xrspatial.geotiff import to_geotiff, open_geotiff

arr = xr.DataArray(np.zeros((4, 4), dtype=np.float32))
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
    path = tmp.name

to_geotiff(arr, path, crs=6349)

with rasterio.open(path) as src:
    print("rasterio CRS:", src.crs)
    print("rasterio to_epsg:", src.crs.to_epsg())

da = open_geotiff(path)
print("xrspatial crs attr:", da.attrs.get("crs"))

Output:

rasterio CRS: GEOGCS["unknown",DATUM["unnamed",SPHEROID["unretrievable - using WGS84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["unknown",0.0174532925199433],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","6349"]]
rasterio to_epsg: None
xrspatial crs attr: 6349

rasterio sees a GeoTIFF that claims AUTHORITY=EPSG:6349, but the GeoKeys describe an unknown geographic CRS with the WGS84 ellipsoid, so to_epsg() returns None. xrspatial's own reader masks the corruption by resolving the integer back through pyproj.

The same failure mode applies to other non-horizontal CRSs the writer currently accepts:

  • Geocentric CRS (e.g. EPSG:4978): is_geographic and is_projected are both False, so the writer falls into the else branch and tags it as a projected 2D CRS.
  • Vertical-only CRS (e.g. EPSG:5773): same as above.
  • Other compound horizontal+vertical CRSs (e.g. EPSG:5498).

Expected behavior

Two options:

  1. The writer recognises non-representable CRSs (compound, vertical-only, geocentric) and rejects them with a clear error pointing callers at the WKT-only fallback. The stable-path contract then guarantees that any integer EPSG accepted on the input side produces a GeoTIFF whose CRS round-trips through rasterio / GDAL with to_epsg() returning the input code.

  2. The writer implements proper GeoKey emission for those families (VerticalCSTypeGeoKey, geocentric ModelType, compound CRS via citation + sub-CRS keys).

Option (1) is the smaller fix and matches the existing posture of "EPSG codes are strongly preferred for interop; the WKT-only path emits 32767".

Additional context

Any test added for this fix has to validate output through rasterio (or another external GeoTIFF reader), not only through xrspatial's own reader. The current reader masks the bug by re-resolving the stored integer through pyproj.

Related: issue #1768 (WKT-only path), issue #2277 (model type classification heuristic).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions