Skip to content

Stream tile writes per dask chunk segment to bound peak memory in to_geotiff#1489

Merged
brendancol merged 1 commit intomainfrom
issue-1485
May 5, 2026
Merged

Stream tile writes per dask chunk segment to bound peak memory in to_geotiff#1489
brendancol merged 1 commit intomainfrom
issue-1485

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1485.

Summary

write_streaming() materialised one full tile-row per dask compute call. On a 100k-pixel-wide float32 raster with tile_size=256 that is ~100 MB per row; at float64 with three bands it is ~600 MB. The function name implied a tighter ceiling.

This PR adds a streaming_buffer_bytes parameter (default 256 MB) to to_geotiff and threads it to write_streaming. When a tile-row footprint exceeds the budget, it is split into horizontal segments at tile-column boundaries. Each segment is computed, written, and freed before the next.

Single-segment rasters keep the original behaviour (one compute per tile-row), so the common case is unchanged.

Strip layout, COG, GPU, and eager paths are untouched.

Test plan

  • Existing test_streaming_write.py (23 tests) still pass.
  • New TestStreamingBufferBudget (4 tests):
    • Round-trip a 1024x8192 float64 dask raster, default budget vs ~2-tile-col segments.
    • 4 MB budget on a 16-MB-per-tile-row raster: writes successfully, max compute footprint observed at 4 MB.
    • Sub-tile budget (1 byte) clamps up to one tile column.
    • 512x2048 three-band float64 round trip with segmented writes.

…geotiff (#1485)

`write_streaming` previously materialised one full tile-row per dask
compute call. For wide rasters that buffer dwarfs a single tile and
defeats the point of streaming.

Add `streaming_buffer_bytes` (default 256 MB) to `to_geotiff` and
thread it through to `write_streaming`. Tile-rows whose footprint
exceeds the budget are split into horizontal segments at tile-column
boundaries; each segment is computed, written, and freed before the
next.

Behaviour is unchanged for rasters that fit in one segment (the common
case). Strip layout, COG, GPU, and eager paths are not touched.

Tests added in `test_streaming_write.py::TestStreamingBufferBudget`:
round-trip equality, a 4 MB tight-budget regression on a 16 MB tile-row,
sub-tile clamping, and a 3-band multiband case.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 5, 2026
@brendancol brendancol merged commit 5ac78a7 into main May 5, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stream tile writes per dask chunk segment to bound peak memory in to_geotiff

1 participant