Skip to content

Accumulate color_ramp stats during the streaming dask write#3600

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-performance-geotiff-2026-07-01
Jul 2, 2026
Merged

Accumulate color_ramp stats during the streaming dask write#3600
brendancol merged 4 commits into
mainfrom
deep-sweep-performance-geotiff-2026-07-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3597

to_geotiff(dask_data, path, color_ramp=...) executed the source graph twice: the streaming write computed every chunk, then _finite_stats ran dask.compute over the same source to get the sidecar statistics. A counting map_blocks layer on a 16-chunk source measured 32 chunk executions. After this change it's 16.

  • _write_streaming takes an optional chunk_observer callback, called with each materialized buffer at its three materialization sites (row band, wide-raster column segment, strip band), after the CuPy transfer and before the dtype cast / NaN-to-sentinel restore. The buffers partition the raster, so the observer sees each pixel exactly once, even on the segmented wide path where source chunks can be computed more than once.
  • A new StreamingStats accumulator in _symbology.py folds those buffers into min/max/mean/stddev (Chan's parallel mean/M2 combine, float64 accumulators, population stddev, same finite/nodata exclusion as _finite_stats). write_symbology_sidecars accepts it through a new stats= parameter.
  • The GPU (gpu=True) and VRT write paths keep the separate stats pass: the GPU writer fully materializes anyway, and VRT writes per-tile through a different code path. Docstrings now say which paths pay the extra pass and where color_ramp_range still helps.

Backend coverage: dask+numpy and dask+cupy (gpu=False) go through the accumulator. numpy/cupy eager writes are unchanged; the data is already in memory there and stats are one cheap pass.

Bench on this machine: 8192x8192 float32 deflate round trip (read chunks=1024, write with stats) went from 1.53s to 1.44s. The bigger effect is that total source reads halve, which matters when the source graph is expensive to re-execute (HTTP COGs, cold storage, long compute pipelines).

Test plan:

  • 13 new tests in test_color_ramp_single_pass_3597.py: execution counters on the row-band, strip, and segmented wide paths; stats parity against _finite_stats and plain numpy; nodata sentinel and int dtype exclusion; all-NaN and multiband gating; color_ramp_range still skips stats; a dask+cupy gpu=False streaming leg (run on a CUDA host).
  • Full geotiff write suite passes (1216 passed, 1 skipped); whole geotiff suite 5371 passed.
  • flake8 + isort clean on touched files.

to_geotiff(dask_data, path, color_ramp=...) executed the source graph
twice: once in _write_streaming for the pixels and again in
_finite_stats for the sidecar statistics (measured 32 chunk executions
for a 16-chunk source). The streaming writer already materialises every
pixel exactly once, so thread an optional chunk_observer through its
three materialisation sites (row band, wide-raster segment, strip band)
and fold the buffers into a StreamingStats accumulator (Chan's parallel
mean/M2 combine, float64 accumulators, population stddev to match
_finite_stats). The GPU and VRT write paths keep the separate stats
pass; docs updated to say which paths pay it.
@brendancol brendancol added performance PR touches performance-sensitive code geotiff GeoTIFF module dask Dask backend / chunked arrays sweep-performance Found by /sweep-performance labels Jul 2, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Accumulate color_ramp stats during the streaming dask write

Blockers (must fix before merge)

None found.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_symbology.py:271 -- n_b = vals.size is a numpy scalar (np.intp), so self._count becomes a numpy int after the first update. The moment math is safe (every product folds into float64 left-to-right), but coercing with int(vals.size) keeps the running count in unbounded Python ints and removes any doubt about numpy-scalar arithmetic at very large pixel counts.
  • xrspatial/geotiff/tests/write/test_color_ramp_single_pass_3597.py:203 -- rel=1e-12 compares Chan's combine against dask's nanstd (a different float64 reduction order). It passes here with margin, but two distinct algorithms agreeing to 1e-12 is tighter than the test needs; 1e-9 would still catch real regressions without inviting a platform-specific flake.

Nits (optional improvements)

  • xrspatial/geotiff/_writer.py:1219 (and the two sibling sites) -- the observer call is three near-identical two-line blocks. Fine as is; a tiny _observe(buf) local would make the partition contract greppable from one place, but the writer's style leans explicit, so either way.

What looks good

  • The observer hook sits after the CuPy .get() / np.asarray and before the dtype cast and NaN-to-sentinel restore, so the accumulator sees logical values on all three paths (row band _writer.py:1218, wide segment _writer.py:1269, strip band _writer.py:1392), and the buffers partition the raster even when the segmented path recomputes source chunks. The segmented-path test pins exactly this with full-width chunks.
  • StreamingStats matches _finite_stats semantics point for point: finite-only, nodata excluded unless NaN, population stddev, None on empty. The NaN-nodata normalization in __init__ mirrors the has_nodata gate in _finite_stats.
  • The wiring is gated so nobody pays who didn't opt in: color_ramp_range set, multiband, categorical, pack=True, and the GPU/VRT paths all keep their previous behavior, and the docstrings now say which paths still run the separate stats pass.
  • Execution-count tests use a zero-size-guarded counter under a lock, so they are robust to dask meta-inference calls and threaded schedulers.
  • 13 new tests cover row-band, strip, and segmented paths, int/nodata/all-NaN/multiband gating, accumulator parity, and a dask+cupy gpu=False leg run on a CUDA host.

Checklist

  • Algorithm matches reference (Chan et al. parallel mean/M2 combine; population stddev matches GDAL's STATISTICS_STDDEV convention)
  • All implemented backends produce consistent results (dask+numpy, dask+cupy via streaming; eager numpy/cupy unchanged)
  • NaN handling is correct (finite-only accumulation; all-NaN write emits no sidecars)
  • Edge cases are covered by tests (int dtypes, nodata sentinel, constant buffers, empty accumulator)
  • Dask chunk boundaries handled correctly (buffers partition the raster; recomputed chunks on the segmented path do not double-count)
  • No premature materialization or unnecessary copies (all-valid buffers skip the boolean-index copy; no astype(float64) copy)
  • Benchmark exists or is not needed (no ASV geotiff benchmarks exist; execution counts are pinned by tests instead)
  • README feature matrix updated (not applicable -- no new public function)
  • Docstrings present and accurate (chunk_observer, stats=, and the color_ramp/color_ramp_range path notes)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 94f20a5:

  • Suggestion 1 (numpy-scalar count): fixed. update now coerces n_b = int(vals.size) so the running count and the moment arithmetic stay in Python ints / float64.
  • Suggestion 2 (tolerance): fixed. The two cross-algorithm StreamingStats vs _finite_stats assertions now use rel=1e-9.
  • Nit (shared _observe helper in _writer.py): dismissed. Three explicit two-line guards match the writer's existing style of repeating the get/asarray/astype/restore blocks per path; a closure would save four lines and add indirection.

38 symbology/color-ramp tests pass after the changes; flake8/isort clean. No remaining findings.

@brendancol brendancol merged commit a7e94a1 into main Jul 2, 2026
10 checks passed
brendancol added a commit that referenced this pull request Jul 2, 2026
brendancol added a commit that referenced this pull request Jul 3, 2026
deep-sweep security pass 2026-07-02 against xrspatial/geotiff. Delta
since 2026-07-01 is write-path/docs only (#3595, #3600, #3592, #3593,
#3588); no untrusted-input parser changed. Re-verified the read/parse
surface (header caps, safe_xml DOCTYPE rejection, cog_http byte caps,
sidecar SSRF hardening, decompression-bomb cap, VRT path-traversal
containment, LZW/GPU kernel bounds). No new CRITICAL/HIGH/MEDIUM. Only
state CSV updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dask Dask backend / chunked arrays geotiff GeoTIFF module performance PR touches performance-sensitive code sweep-performance Found by /sweep-performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff color_ramp computes a dask source twice (write pass + stats pass)

1 participant