Accumulate color_ramp stats during the streaming dask write#3600
Merged
Conversation
to_geotiff(dask_data, path, color_ramp=...) executed the source graph twice: once in _write_streaming for the pixels and again in _finite_stats for the sidecar statistics (measured 32 chunk executions for a 16-chunk source). The streaming writer already materialises every pixel exactly once, so thread an optional chunk_observer through its three materialisation sites (row band, wide-raster segment, strip band) and fold the buffers into a StreamingStats accumulator (Chan's parallel mean/M2 combine, float64 accumulators, population stddev to match _finite_stats). The GPU and VRT write paths keep the separate stats pass; docs updated to say which paths pay it.
brendancol
commented
Jul 2, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Accumulate color_ramp stats during the streaming dask write
Blockers (must fix before merge)
None found.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/_symbology.py:271--n_b = vals.sizeis a numpy scalar (np.intp), soself._countbecomes a numpy int after the first update. The moment math is safe (every product folds into float64 left-to-right), but coercing withint(vals.size)keeps the running count in unbounded Python ints and removes any doubt about numpy-scalar arithmetic at very large pixel counts. -
xrspatial/geotiff/tests/write/test_color_ramp_single_pass_3597.py:203--rel=1e-12compares Chan's combine against dask'snanstd(a different float64 reduction order). It passes here with margin, but two distinct algorithms agreeing to 1e-12 is tighter than the test needs;1e-9would still catch real regressions without inviting a platform-specific flake.
Nits (optional improvements)
-
xrspatial/geotiff/_writer.py:1219(and the two sibling sites) -- the observer call is three near-identical two-line blocks. Fine as is; a tiny_observe(buf)local would make the partition contract greppable from one place, but the writer's style leans explicit, so either way.
What looks good
- The observer hook sits after the CuPy
.get()/np.asarrayand before the dtype cast and NaN-to-sentinel restore, so the accumulator sees logical values on all three paths (row band_writer.py:1218, wide segment_writer.py:1269, strip band_writer.py:1392), and the buffers partition the raster even when the segmented path recomputes source chunks. The segmented-path test pins exactly this with full-width chunks. StreamingStatsmatches_finite_statssemantics point for point: finite-only, nodata excluded unless NaN, population stddev,Noneon empty. The NaN-nodata normalization in__init__mirrors thehas_nodatagate in_finite_stats.- The wiring is gated so nobody pays who didn't opt in:
color_ramp_rangeset, multiband, categorical,pack=True, and the GPU/VRT paths all keep their previous behavior, and the docstrings now say which paths still run the separate stats pass. - Execution-count tests use a zero-size-guarded counter under a lock, so they are robust to dask meta-inference calls and threaded schedulers.
- 13 new tests cover row-band, strip, and segmented paths, int/nodata/all-NaN/multiband gating, accumulator parity, and a dask+cupy
gpu=Falseleg run on a CUDA host.
Checklist
- Algorithm matches reference (Chan et al. parallel mean/M2 combine; population stddev matches GDAL's STATISTICS_STDDEV convention)
- All implemented backends produce consistent results (dask+numpy, dask+cupy via streaming; eager numpy/cupy unchanged)
- NaN handling is correct (finite-only accumulation; all-NaN write emits no sidecars)
- Edge cases are covered by tests (int dtypes, nodata sentinel, constant buffers, empty accumulator)
- Dask chunk boundaries handled correctly (buffers partition the raster; recomputed chunks on the segmented path do not double-count)
- No premature materialization or unnecessary copies (all-valid buffers skip the boolean-index copy; no astype(float64) copy)
- Benchmark exists or is not needed (no ASV geotiff benchmarks exist; execution counts are pinned by tests instead)
- README feature matrix updated (not applicable -- no new public function)
- Docstrings present and accurate (
chunk_observer,stats=, and the color_ramp/color_ramp_range path notes)
brendancol
commented
Jul 2, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review after 94f20a5:
- Suggestion 1 (numpy-scalar count): fixed.
updatenow coercesn_b = int(vals.size)so the running count and the moment arithmetic stay in Python ints / float64. - Suggestion 2 (tolerance): fixed. The two cross-algorithm
StreamingStatsvs_finite_statsassertions now userel=1e-9. - Nit (shared
_observehelper in_writer.py): dismissed. Three explicit two-line guards match the writer's existing style of repeating the get/asarray/astype/restore blocks per path; a closure would save four lines and add indirection.
38 symbology/color-ramp tests pass after the changes; flake8/isort clean. No remaining findings.
…e-geotiff-2026-07-01
brendancol
added a commit
that referenced
this pull request
Jul 2, 2026
brendancol
added a commit
that referenced
this pull request
Jul 3, 2026
deep-sweep security pass 2026-07-02 against xrspatial/geotiff. Delta since 2026-07-01 is write-path/docs only (#3595, #3600, #3592, #3593, #3588); no untrusted-input parser changed. Re-verified the read/parse surface (header caps, safe_xml DOCTYPE rejection, cog_http byte caps, sidecar SSRF hardening, decompression-bomb cap, VRT path-traversal containment, LZW/GPU kernel bounds). No new CRITICAL/HIGH/MEDIUM. Only state CSV updated.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3597
to_geotiff(dask_data, path, color_ramp=...)executed the source graph twice: the streaming write computed every chunk, then_finite_statsrandask.computeover the same source to get the sidecar statistics. A countingmap_blockslayer on a 16-chunk source measured 32 chunk executions. After this change it's 16._write_streamingtakes an optionalchunk_observercallback, called with each materialized buffer at its three materialization sites (row band, wide-raster column segment, strip band), after the CuPy transfer and before the dtype cast / NaN-to-sentinel restore. The buffers partition the raster, so the observer sees each pixel exactly once, even on the segmented wide path where source chunks can be computed more than once.StreamingStatsaccumulator in_symbology.pyfolds those buffers into min/max/mean/stddev (Chan's parallel mean/M2 combine, float64 accumulators, population stddev, same finite/nodata exclusion as_finite_stats).write_symbology_sidecarsaccepts it through a newstats=parameter.gpu=True) and VRT write paths keep the separate stats pass: the GPU writer fully materializes anyway, and VRT writes per-tile through a different code path. Docstrings now say which paths pay the extra pass and wherecolor_ramp_rangestill helps.Backend coverage: dask+numpy and dask+cupy (
gpu=False) go through the accumulator. numpy/cupy eager writes are unchanged; the data is already in memory there and stats are one cheap pass.Bench on this machine: 8192x8192 float32 deflate round trip (read chunks=1024, write with stats) went from 1.53s to 1.44s. The bigger effect is that total source reads halve, which matters when the source graph is expensive to re-execute (HTTP COGs, cold storage, long compute pipelines).
Test plan:
test_color_ramp_single_pass_3597.py: execution counters on the row-band, strip, and segmented wide paths; stats parity against_finite_statsand plain numpy; nodata sentinel and int dtype exclusion; all-NaN and multiband gating;color_ramp_rangestill skips stats; a dask+cupygpu=Falsestreaming leg (run on a CUDA host).