Skip to content

Route sidecar georef tags through the sidecar's bytes (#2315)#2320

Merged
brendancol merged 2 commits into
mainfrom
issue-2315
May 22, 2026
Merged

Route sidecar georef tags through the sidecar's bytes (#2315)#2320
brendancol merged 2 commits into
mainfrom
issue-2315

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #2315.

  • Add an optional sidecar_origin mapping to extract_geo_info_with_overview_inheritance. When a sidecar IFD declares its own GeoKeyDirectory, ModelPixelScale, ModelTiepoint, or ModelTransformation, the helper parses those tags against the sidecar's bytes and byte order instead of the base file's.
  • Wire the mapping through both call sites (_reader.py and _read_geo_info in __init__.py) so the eager and metadata-only paths agree.
  • Conservative gate: a sidecar IFD with no georef tags (the GDAL convention) keeps today's behavior and inherits from the base file's level-0 IFD.

Backend coverage

Read-side only. The fix applies to every read backend that shares the metadata parse: numpy eager, dask+numpy via _read_geo_info, and cupy / dask+cupy which call back into the same metadata helper. No GPU or dask-specific code changes.

Test plan

  • New regression test test_sidecar_own_geokeys_2315.py covers both branches:
    • Sidecar with its own ModelPixelScale, ModelTiepoint, and GeoKeyDirectory: the sidecar's georef wins.
    • Sidecar with georef tags stripped (GDAL convention): the base file's georef is inherited.
  • Both open_geotiff (eager) and _read_geo_info (metadata-only / dask graph builder) exercised.
  • Existing test_sidecar_ovr_2112.py suite still passes (28 tests).
  • Wider sidecar / overview / geo_info selection (648 tests) passes.

The local eager reader and the metadata-only path both swap pixel
bytes over to the sidecar when the selected IFD lives there, but the
georef extraction call right below the swap still passed the base
file's data + header.byte_order. This worked under the usual GDAL
convention -- sidecar IFDs carry no geokeys -- but silently broke when
a sidecar declared its own GeoKeyDirectory or model tags.

extract_geo_info_with_overview_inheritance now accepts an optional
sidecar_origin mapping from id(ifd) to (data, byte_order). When the
selected IFD is in the mapping AND declares one of the georef-bearing
tags (33550, 33922, 34264, 34735), the helper resolves tag offsets
against the sidecar's bytes. Sidecars without geokeys still inherit
from the base file's level-0 IFD as before.

Both call sites (_reader.py and __init__.py:_read_geo_info) now build
and forward the mapping.

Includes a synthetic regression test covering both cases: sidecar
with its own ModelPixelScale / ModelTiepoint / GeoKeyDirectory, and
sidecar with those tags stripped (today's GDAL-convention behavior).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 22, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Route sidecar georef tags through the sidecar's bytes (#2315)

Blockers

  • None.

Suggestions

  • xrspatial/geotiff/_cog_http.py:242 -- the HTTP / fsspec metadata path calls extract_geo_info_with_overview_inheritance(ifd, ifds, header_bytes, header.byte_order, ...) without the new sidecar_origin kwarg. A sidecar IFD over HTTP that declares its own georef payload still parses against the base file's bytes here. A sibling rockout is reportedly handling the HTTP / byte-order side of this finding; if that is the plan, leave a cross-reference comment in the source so future maintainers see the symmetry argument. Otherwise the one-line fix is to thread the sidecar's IFD ids and (sidecar.data, sidecar.header.byte_order) through discover_remote_sidecar's return into a mapping and pass it through.

Nits

  • xrspatial/geotiff/_geotags.py extract_geo_info_with_overview_inheritance body -- the inheritance branch (~line 1174) still calls extract_geo_info(base_ifd, data, byte_order, ...). If base_ifd somehow comes from the sidecar (uncommon: the base file would need to have no full-res IFD of its own), the inherited georef parses against the wrong buffer. The new sidecar_origin map is the natural place to fix it -- the same lookup you do for ifd would work for base_ifd. Low priority, but a 3-line guard is cheap.

  • xrspatial/geotiff/tests/test_sidecar_own_geokeys_2315.py _mark_first_ifd_as_overview assumes the new NewSubfileType entry (tag 254) sorts before every existing tag in the IFD, so it inserts at position 0. True for to_geotiff output today (smallest tag emitted is 256 / ImageWidth). Brittle if a future writer change adds a tag with id <= 254. Worth an assert that the first existing tag id is > 254, or a bisect-style insert.

What looks good

  • The fix is conservatively gated on _ifd_has_georef_payload(ifd). The GDAL-convention case (sidecar with no geokeys) is unchanged.
  • Both eager and metadata-only call sites are updated symmetrically.
  • The regression test exercises both branches: sidecar wins when it has its own geokeys, base inherits when the sidecar has none.
  • Signature change is backward compatible (new kwarg defaults to None).

Checklist

  • Algorithm matches reference
  • NaN handling: not applicable (metadata path, no array ops)
  • Edge cases covered (sidecar with vs without geokeys)
  • Dask: the _read_geo_info path used by the dask graph builder is exercised
  • No premature materialization (metadata-only changes)
  • Benchmark coverage: not applicable
  • README feature matrix: not applicable (bug fix, no new function)
  • Docstrings present and updated

* `_geotags.py`: when a base IFD is also tracked in `sidecar_origin`
  (uncommon: a file with no full-resolution IFD on the base side),
  resolve its georef against the sidecar's bytes too.  Mirrors the
  lookup applied to the selected IFD.
* `test_sidecar_own_geokeys_2315.py`: assert the first existing IFD
  tag id is > 254 before inserting the NewSubfileType entry at
  position 0, so a future writer change cannot silently break the
  fixture's "insert at front" assumption.
* `_cog_http.py`: leave a cross-reference comment noting that the
  `sidecar_origin` routing added in #2315 is not threaded through
  the HTTP / fsspec metadata path yet; that gap is tracked by the
  separate HTTP sidecar-byte-order finding.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Disposition of the round-1 findings:

  • Suggestion (_cog_http.py:242, HTTP sidecar): deferred to the sibling HTTP / dask sidecar-byte-order rockout that already owns this surface. Added a cross-reference comment in the source so the symmetry argument is visible to future maintainers.
  • Nit (base_ifd inheritance): fixed in commit 9d8a3e3. The inheritance branch now applies the same sidecar_origin lookup to base_ifd that the main branch applies to ifd.
  • Nit (test fixture insert assumption): fixed in commit 9d8a3e3. _mark_first_ifd_as_overview now asserts the first existing IFD tag id is > 254 before inserting NewSubfileType at position 0, so a future writer change that emits a tag <= 254 fails loudly instead of silently producing an out-of-order IFD.

All 34 sidecar-focused tests still pass after the follow-up.

@brendancol brendancol merged commit 8052140 into main May 22, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sidecar georeferencing extraction ignores sidecar's own geokeys

1 participant