Stream tile writes per dask chunk segment to bound peak memory in to_geotiff #1489
Merged
brendancol merged 1 commit into main on May 5, 2026
…geotiff (#1485)

`write_streaming` previously materialised one full tile-row per dask compute call. For wide rasters that buffer dwarfs a single tile and defeats the point of streaming.

Add `streaming_buffer_bytes` (default 256 MB) to `to_geotiff` and thread it through to `write_streaming`. Tile-rows whose footprint exceeds the budget are split into horizontal segments at tile-column boundaries; each segment is computed, written, and freed before the next. Behaviour is unchanged for rasters that fit in one segment (the common case).

Strip layout, COG, GPU, and eager paths are not touched.

Tests added in `test_streaming_write.py::TestStreamingBufferBudget`: round-trip equality, a 4 MB tight-budget regression on a 16 MB tile-row, sub-tile clamping, and a 3-band multiband case.
Closes #1485.
Summary
`write_streaming()` materialised one full tile-row per dask compute call. On a 100k-pixel-wide float32 raster with `tile_size=256` that is ~100 MB per row; at float64 with three bands it is ~600 MB. The function name implied a tighter ceiling.
This PR adds a `streaming_buffer_bytes` parameter (default 256 MB) to `to_geotiff` and threads it to `write_streaming`. When a tile-row footprint exceeds the budget, it is split into horizontal segments at tile-column boundaries. Each segment is computed, written, and freed before the next.
Single-segment rasters keep the original behaviour (one compute per tile-row), so the common case is unchanged.
Strip layout, COG, GPU, and eager paths are untouched.
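The footprint figures and the segment split described above can be checked with a small standalone sketch. This is not the code from the PR: `tile_row_bytes` and `plan_segments` are hypothetical helper names, the 256 MB default is assumed here to mean 256 * 2**20 bytes, and clamping to at least one tile per segment is an assumption about how a budget smaller than one tile is handled.

```python
import math


def tile_row_bytes(width, tile_size, itemsize, bands=1):
    """Bytes needed to hold one full tile-row: tile_size pixel rows across the full width."""
    return width * tile_size * itemsize * bands


def plan_segments(width, tile_size, itemsize, bands, budget_bytes):
    """Split one tile-row into (col_start, col_stop) pixel ranges.

    Segment edges always fall on tile-column boundaries, except the last
    stop, which is clipped to the raster width.
    """
    n_tile_cols = math.ceil(width / tile_size)
    tile_bytes = tile_size * tile_size * itemsize * bands
    # Assumption: a budget smaller than one tile is clamped up to one tile,
    # because a tile cannot be written in pieces.
    tiles_per_segment = max(1, budget_bytes // tile_bytes)
    segments = []
    for first in range(0, n_tile_cols, tiles_per_segment):
        last = min(first + tiles_per_segment, n_tile_cols)
        segments.append((first * tile_size, min(last * tile_size, width)))
    return segments


# Footprints quoted in the summary.
f32_row = tile_row_bytes(100_000, 256, itemsize=4)           # 102_400_000 bytes, ~100 MB
f64_rgb_row = tile_row_bytes(100_000, 256, itemsize=8, bands=3)  # 614_400_000 bytes, ~600 MB

# A raster that fits the budget yields a single segment (one compute per tile-row).
single = plan_segments(1024, 256, 4, 1, budget_bytes=256 * 2**20)

# A tight budget splits the tile-row at tile-column boundaries.
split = plan_segments(1000, 256, 4, 1, budget_bytes=600_000)
```

With these inputs the float32 tile-row stays under the assumed default budget, while the three-band float64 tile-row exceeds it and would be split.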
Test plan
`test_streaming_write.py` (23 tests) still pass.
`TestStreamingBufferBudget` (4 tests):