Reject distinct per-band nodatavals on GeoTIFF write (#2514)#2519
Merged
Conversation
A TIFF stores one GDAL_NODATA tag per file, so per-band tuples with multiple distinct concrete sentinels cannot round-trip. The legacy resolver flattened to the first usable entry and the silent drop turned the remaining bands' sentinel cells into real data on the output file. * Add _check_write_distinct_per_band_nodatavals in _validation.py: raise ConflictingNodataError whenever attrs['nodatavals'] holds 2+ distinct concrete values, with or without an accompanying scalar attrs['nodata']. The explicit nodata= kwarg still short-circuits. * Harden _resolve_nodata_attr in _attrs.py with the same distinct- value check as defense-in-depth, so the helper never silently picks a winner if a future bypass path skips validation. * Add seven unit tests in tests/unit/test_metadata.py covering the dangerous cases (with/without scalar, three-band, error fires before file write) and the safe cases (repeated, None mixed with one concrete, all-NaN, explicit kwarg override).
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Reject distinct per-band nodatavals on GeoTIFF write (#2514)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- The defensive raise in
_resolve_nodata_attr(xrspatial/geotiff/_attrs.py:1310-1320) has no direct unit test. Current tests all hit the writer-side_check_write_distinct_per_band_nodatavalsfirst, so the resolver's defensive branch is never exercised on its own. Add one test that calls_resolve_nodata_attr({'nodatavals': (-9999.0, -8888.0)})directly and assertsConflictingNodataError. Cheap to add, keeps the two layers honest if they drift.
Nits (optional improvements)
- The error message in
_validation.py:1166-1175and the one in_attrs.py:1311-1319are nearly identical but not byte-identical. Either share a single message builder or accept the duplication intentionally; if the latter, a comment cross-referencing the other site would help future readers. - In
_attrs.py:1283-1291theusable: list = []annotation lacks a parameter, e.g.usable: list[tuple[float, object]] = []. Minor type-hint precision.
What looks good
- Validator is registered alongside the existing
_check_write_conflicting_nodata, so all three write entry points (CPU eager, GPU, streaming) pick it up via the shared context-dict pattern. - Tests assert the error fires before any file is written (
assert not os.path.exists(out_path)), which is the right contract. - Safe cases (repeated value,
None+ single concrete, all-NaN, explicit kwarg override) are all covered. - Float equality dedup is correct here: the TIFF GDAL_NODATA tag stores one scalar, so any unequal pair is a real collision.
Checklist
- Algorithm matches reference/paper (n/a, validation tightening)
- All implemented backends produce consistent results (shared validator)
- NaN handling is correct (NaN entries skipped, all-NaN tuple still allowed)
- Edge cases are covered by tests (3-band, repeated, all-NaN,
None+ single) - Dask chunk boundaries handled correctly (n/a, write metadata check, no compute)
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed (not needed, validation only)
- README feature matrix updated if applicable (n/a, no new function)
- Docstrings present and accurate
#2514) * Move the distinct-per-band-nodatavals error text to a shared helper ``_distinct_per_band_nodatavals_msg`` in ``_errors.py`` so the write-side validator and the resolver's defense-in-depth raise stay byte-identical. Both sites now call the same builder. * Tighten ``usable: list`` -> ``usable: list[tuple[float, Any]]`` in ``_resolve_nodata_attr`` for type-hint precision. * Add a direct unit test for the resolver's defensive raise so the branch is exercised without going through the writer-side validator (which normally fires first).
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Reject distinct per-band nodatavals on GeoTIFF write (#2514)
Reviewing commit eb88dace against the previous round.
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
None.
What looks good
- The shared message builder
_distinct_per_band_nodatavals_msglives next toConflictingNodataErrorin_errors.py:107-127, and both_validation._check_write_distinct_per_band_nodatavalsand_attrs._resolve_nodata_attrnow call it. Wording stays byte-identical at both sites. - Type hint tightened to
usable: list[tuple[float, Any]]in_attrs.py:1296— reuses the existingAnyimport. test_resolve_nodata_attr_raises_on_distinct_per_band(tests/unit/test_metadata.py:1899) hits the resolver directly, and the same test asserts theattrs['nodata']short-circuit case so the precedence contract stays documented.- Full geotiff test suite (5921 passed, 74 skipped, 1 xfailed) stays green.
Checklist
- Algorithm matches reference/paper (n/a, validation tightening)
- All implemented backends produce consistent results (shared validator)
- NaN handling is correct
- Edge cases are covered by tests
- Dask chunk boundaries handled correctly (n/a)
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed (n/a)
- README feature matrix updated if applicable (n/a)
- Docstrings present and accurate
All three findings from the first round are resolved. No new findings.
# Conflicts: # xrspatial/geotiff/_validation.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2514.
Summary
GDAL_NODATAtag per file. The legacy resolver flattened a multi-entryattrs['nodatavals']tuple to its first usable value and silently dropped the rest, so any band declaring a different sentinel had its missing-data cells written as real data on the output file._check_write_distinct_per_band_nodatavalsin_validation.pyraisesConflictingNodataErrorwhenevernodatavalsholds 2+ distinct concrete values, with or without an accompanying scalarattrs['nodata']. The earlier check covered only the case where every tuple entry disagreed with the scalar; the dangerous case (one entry happens to match) slipped through._resolve_nodata_attrin_attrs.pycarries the same distinct-value check as defense-in-depth, so any internal bypass of validation surfaces the error rather than silently picking a winner.Nonemixed with a single concrete value, and explicitto_geotiff(..., nodata=X)override.Backend coverage
The validation is shared by all write paths: CPU eager (
_writers/eager.py), GPU (_writers/gpu.py), and tiled / VRT-driven writing through the same eager validation. No backend-specific code was changed.Test plan
pytest -q xrspatial/geotiff/tests/unit/test_metadata.py -k "distinct_per_band or repeated_concrete or none_and_single or all_nan_nodatavals or distinct_nodatavals_explicit"(7 new tests)pytest -q xrspatial/geotiff/tests/unit/test_metadata.py -k "nodata"(31 tests, all pass)pytest -q xrspatial/geotiff/tests -m 'not slow'(5920 passed, 74 skipped)