Skip to content

geotiff: fail closed on ambiguous geospatial metadata (CRS, transform, coords, nodata) #1987

@brendancol

Description

@brendancol

Reason or Problem

The reader has tightened over time (test_crs_fail_closed_1929.py, test_crs_fail_closed_gpu_1929.py, test_coord_regularity_1720.py, test_rotated_transform_attr_1764.py) but several "guess and continue" paths remain. The reviewer's position, which I agree with: for foundational I/O, refusing ambiguous input is better than manufacturing a plausible-looking raster.

Specific cases still loose:

  • Invalid CRS codes — to_geotiff(crs=True) writes EPSG=1 silently (already filed as geotiff: to_geotiff(crs=True) writes EPSG=1 silently; bool slips through int check #1971).
  • Unparseable CRS strings — partial WKT may be accepted and emitted unchanged on read, leading to mismatched crs vs crs_wkt attrs.
  • Rotated transforms — currently flagged in attrs but the reader does not reject; downstream functions assume axis-aligned and may produce wrong results.
  • Non-uniform coords on write — to_geotiff accepts coords that disagree with the implied transform; sentinel checks exist but coverage is uneven.
  • Mixed band metadata — VRT inputs with bands that declare different nodata sentinels are silently flattened.
  • Conflicting crs and crs_wkt on write — no consistency check.
  • Conflicting nodata aliases — attrs['nodata'] set to one value and attrs['nodatavals'] set to another. _resolve_nodata_attr picks one and ignores the other.

Proposal

Each of the above becomes a hard error by default, with an opt-in escape hatch where the legacy behavior was documented.

Design:

Add a validation layer in _validation.py (which already exists) and call it from every read and write entry point. The validator raises with a specific error class per case (subclasses of a new GeoTIFFAmbiguousMetadataError). Each case gets its own test in test_crs_fail_closed_* style — one new test file per case is fine, but prefer adding to existing fail-closed files where the topic matches.

Per-case rough plan:

  • invalid CRS codes — covered by geotiff: to_geotiff(crs=True) writes EPSG=1 silently; bool slips through int check #1971, link from this issue.
  • unparseable CRS strings — raise on read, raise on write. No silent passthrough.
  • rotated transforms — read raises by default; opt-in allow_rotated=True keeps existing attr-flag behavior for users who want the array without geo-aware downstream ops.
  • non-uniform coords — write raises; the existing sentinel exemption for int-dtype coords stays (geotiff: hotfix require_transform_for_georeferenced for int-coord arrays #1969).
  • mixed band metadata in VRT — read raises; opt-in band_nodata='first' retains current.
  • conflicting crs/crs_wkt — write raises if both are set and disagree (compare EPSG → WKT after canonicalization).
  • conflicting nodata aliases — write raises if attrs['nodata'] and attrs['nodatavals'] disagree. _FillValue is a CF alias and continues to be deprioritized.

Usage:
Users see a typed error pointing at the ambiguous field. Existing legitimate workflows surface through explicit opt-in kwargs.

Value:
A user complaining that "xrspatial gave me the wrong CRS" almost always traces to a guess we should have refused. Reviewer's words: "refusing ambiguous input is better than manufacturing plausible-looking rasters."

Stakeholders and Impacts

  • Read paths and write paths gain a validation step.
  • Downstream code that relied on the guess may need to set the new opt-in kwargs.
  • Existing fail-closed tests get extended; no expected removal.

Drawbacks

This is a behavior change. Some files that read today will start raising. Each case needs a release-note entry and a migration suggestion.

Alternatives

Per-case warnings instead of errors. Already tried — wkt_only_crs_warning is one example. Warnings get filtered out and users continue to be surprised.

Unresolved Questions

  • Single base error class with subclasses, or one error class per case? Probably subclasses, but worth confirming for except clarity.
  • Should the opt-in kwargs go on every entry point or live on a central strict_metadata= flag (defaulting to True)?

Additional Notes or Context

Source of this proposal: code review feedback on the geotiff module, suggestion 4 of 5. Each case should land as a separate PR per the project convention of one fix per PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestinput-validationInput validation and error messages

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions