Skip to content

Fix COG overview block ordering (#2308)#2309

Merged
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:issue-2308
May 22, 2026
Merged

Fix COG overview block ordering (#2308)#2309
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:issue-2308

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2308.

What changed

_assemble_cog_layout in xrspatial/geotiff/_write_layout.py now emits the on-disk pixel data in COG-spec order: smallest overview first, then progressively larger overviews, with the main-resolution image's tile blocks last. The IFD chain still walks [main, ov1, ov2, ...] in the conventional order, so the change is invisible to readers that just enumerate IFDs. Only the byte placement of the tile data moves, which is what strict validators inspect.

The fix is local to _assemble_cog_layout: build a pixel_emission_order list that puts the overview indices in reverse, with index 0 (main) last, then use that order for both the offset-precomputation pass and the final append pass. Per-level offset patching is unchanged because each IFD looks up its tile/strip base from level_pixel_offsets[level_idx].

The streaming writer does not produce COG output (cog=True always routes through the eager _write -> _assemble_tiff -> _assemble_cog_layout path), and the GPU writer funnels through the same helper, so a single change covers all writer paths.

Lines touched in _write_layout.py: roughly 30, all inside the existing function. No new helpers, no signature changes.

Evidence

Before, with the issue's repro:

The following errors were found:
- The offset of the first block of overview of index 0 should be after the one of the overview of index 1
- The offset of the first block of the main resolution image should be after the one of the overview of index 1
valid: False
errors: ['The offset of the first block of overview of index 0 should be after the one of the overview of index 1',
         'The offset of the first block of the main resolution image should be after the one of the overview of index 1']
warnings: []

After:

valid: True
errors: []
warnings: []

New test

xrspatial/geotiff/tests/test_overview_block_order_2308.py parses the TIFF directly via the existing _header.parse_all_ifds helpers, extracts the min tile-offset per IFD, and asserts ov_smallest_min < ... < ov_largest_min < main_min. Covers single-band and 3-band cases plus a three-overview-level case. A rio-cogeo-gated row reuses the skip pattern from test_cog_writer_compliance.py so contributor laptops without rio-cogeo see a clean skip while CI runs the strict check.

Test plan

  • pytest xrspatial/geotiff/tests/test_overview_block_order_2308.py (5/5 pass locally with rio-cogeo installed)
  • pytest xrspatial/geotiff/tests/ (5187 pass, 68 skip, 1 xpassed)
  • Issue repro: cog_validate(path, strict=False) returns (True, [], [])

Follow-up

The xfail(strict=False) marker on test_external_cog_validator in test_cog_writer_compliance.py now XPASSES (the gate's reason "writer emits overview tile blocks in wrong order; see #2308" no longer applies). The task scope is to leave that marker for a separate one-line follow-up PR once this lands, per the PR #2304 separation noted in the issue. No CI flag in setup.cfg makes XPASS strict, so this does not break the build.

Scope

Reorder pixel-data emission in _assemble_cog_layout so on-disk tile
blocks run smallest-overview first, then progressively larger, with
the main-resolution image last. The IFD chain still walks
[main, ov1, ov2, ...] in the conventional order; only the byte-level
placement of the tile blocks changes.

rio-cogeo cog_validate now returns valid=True on the issue's repro.
Before this fix it reported two block-order errors:
  - The offset of the first block of overview of index 0 should be
    after the one of the overview of index 1
  - The offset of the first block of the main resolution image should
    be after the one of the overview of index 1

Adds a layout-invariant test that parses the file directly and
asserts the min tile-offset per IFD runs ov_smallest < ... < ov_largest
< main_resolution, plus a rio-cogeo-gated row using the same skip
semantics as test_cog_writer_compliance.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 22, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix COG overview block ordering (#2308)

Blockers

None.

Suggestions

None.

Nits

  • _write_layout.py:312 -- the n_parts == 1 branch only triggers for callers that bypass the upstream is_cog and len(ifd_specs) > 1 gate. A one-line comment saying the [0] branch exists for direct unit tests would help future readers.

What looks good

  • The fix reorders only the byte placement of tile/strip data and leaves the IFD chain order and per-IFD tile-offset patching alone.
  • pixel_emission_order is built once and reused by both the offset-precomputation pass and the append pass, so they cannot drift.
  • level_pixel_offsets[level_idx] stays indexed by the original IFD position, so the third pass needs no changes.
  • _write_streaming has no cog parameter and the GPU writer funnels through the same _assemble_tiff -> _assemble_cog_layout call site, so one change covers every writer path that emits COG.
  • The new test parses files with _header.parse_all_ifds instead of reimplementing TIFF parsing, and the rio-cogeo gate reuses the skip pattern from test_cog_writer_compliance.py.
  • Existing geotiff tests pass (5187 pass / 68 skip / 1 xpassed locally).

Notes

  • The xpassed test is test_external_cog_validator, expected per the PR body. The xfail removal is deferred to a follow-up. setup.cfg has no xfail_strict flag, so XPASS does not turn the suite red.

Checklist

  • Algorithm matches the COG spec block-order requirement
  • Backends consistent (CPU eager and GPU funnel through the same helper; streaming does not produce COG)
  • NaN handling unchanged (this PR does not touch pixel values, only byte placement)
  • Edge cases covered (single-band, 3-band, 2-level, 3-level overviews, rio-cogeo strict=False)
  • Dask chunk boundaries n/a (dask + COG materialises eagerly)
  • No new copies or materialisations
  • No new function, so no README matrix or benchmark needed
  • Docstrings on new test helpers

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

The single nit from the prior pass is addressed in 760c7f5: the n_parts == 1 fallback now carries a one-line comment explaining that the upstream gate (is_cog and len(ifd_specs) > 1) keeps real callers off this branch and the entry exists for direct unit tests.

No new findings on a re-read. Tests still pass locally (73 in the cog suite + 1 xpassed).

@brendancol brendancol marked this pull request as ready for review May 22, 2026 15:33
@brendancol brendancol merged commit 86dcd8c into xarray-contrib:main May 22, 2026
7 checks passed
brendancol added a commit that referenced this pull request May 22, 2026
)

The xfail was added in #2304 as a placeholder so the rio-cogeo CI gate
could land green while the overview block ordering bug was being fixed.
#2309 landed that writer fix, so the test now passes -- drop the marker
and the accompanying comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff(cog=True) writes overview IFDs in wrong block order (rio-cogeo validator failure)

1 participant