Author of Proposal: @brendan
Reason or Problem
to_geotiff calls .compute() on dask-backed DataArrays, materializing the
full raster into RAM before writing. For rasters that don't fit in memory, this
makes to_geotiff unusable unless you go through the VRT tiled output path
(see #1083).
Proposal
Add a streaming write path for monolithic .tif output from dask inputs.
Write tiles sequentially as each dask chunk computes, then seek back to patch
the IFD offset/bytecount arrays. This is how GDAL handles it internally.
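As a rough illustration of the write-then-patch idea, here is a minimal stdlib-only sketch. It is not the existing _writer.py code and makes no claim to mirror GDAL's implementation. The function name stream_tiled_tiff and its parameters are hypothetical. It assumes classic little-endian TIFF, uncompressed 8-bit single-band data, and tiles arriving in row-major order (for example from a generator that computes one dask chunk per step). It reserves the TileOffsets and TileByteCounts arrays up front, streams the tiles, then seeks back to fill the arrays in.

```python
import os
import struct
import tempfile

SHORT, LONG = 3, 4  # TIFF field type codes


def stream_tiled_tiff(path, width, height, tile, tiles):
    """Hypothetical sketch: stream tiles to a classic TIFF, then patch offsets.

    `tiles` yields bytes of length tile*tile (edge tiles already padded to
    full size), in row-major tile order. Only one tile is held at a time.
    """
    n = (-(-width // tile)) * (-(-height // tile))  # ceil-div tile counts
    if tile % 16 or n < 2:
        # Simplification for this sketch: with a single tile the offset
        # would be stored inline in the IFD entry rather than in an array.
        raise ValueError("tile must be a multiple of 16 and n_tiles >= 2")

    # (tag, type, count, value); tags must be in ascending order.
    tags = [
        (256, LONG, 1, width),    # ImageWidth
        (257, LONG, 1, height),   # ImageLength
        (258, SHORT, 1, 8),       # BitsPerSample
        (259, SHORT, 1, 1),       # Compression: none
        (262, SHORT, 1, 1),       # PhotometricInterpretation: BlackIsZero
        (277, SHORT, 1, 1),       # SamplesPerPixel
        (322, LONG, 1, tile),     # TileWidth
        (323, LONG, 1, tile),     # TileLength
        (324, LONG, n, None),     # TileOffsets (patched later)
        (325, LONG, n, None),     # TileByteCounts (patched later)
    ]
    ifd_size = 2 + 12 * len(tags) + 4
    offsets_pos = 8 + ifd_size        # array placed right after the IFD
    counts_pos = offsets_pos + 4 * n

    with open(path, "wb") as f:
        f.write(struct.pack("<2sHI", b"II", 42, 8))  # header, IFD at byte 8
        f.write(struct.pack("<H", len(tags)))
        for tag, typ, count, value in tags:
            if tag == 324:
                value = offsets_pos
            elif tag == 325:
                value = counts_pos
            if typ == SHORT:
                # A single SHORT is left-justified in the 4-byte value field.
                f.write(struct.pack("<HHIHH", tag, typ, count, value, 0))
            else:
                f.write(struct.pack("<HHII", tag, typ, count, value))
        f.write(struct.pack("<I", 0))   # no further IFDs
        f.write(b"\0" * (8 * n))        # placeholder offset/bytecount arrays

        # Pass 1: stream tile data, remembering where each tile landed.
        offsets, counts = [], []
        for data in tiles:
            offsets.append(f.tell())
            counts.append(len(data))
            f.write(data)
        if len(offsets) != n:
            raise ValueError(f"expected {n} tiles, got {len(offsets)}")

        # Pass 2: seek back and patch the two arrays.
        f.seek(offsets_pos)
        f.write(struct.pack(f"<{n}I", *offsets))
        f.write(struct.pack(f"<{n}I", *counts))
    return offsets, counts


# Usage: a 32x32 image written as four 16x16 tiles, one tile at a time.
demo_tiles = (bytes([i]) * 256 for i in range(4))
path = os.path.join(tempfile.mkdtemp(), "streamed.tif")
offsets, counts = stream_tiled_tiff(path, 32, 32, 16, demo_tiles)
```

The sketch leaves out everything a real implementation would need beyond the patching itself: compression (where byte counts are only known after each tile is encoded), multi-band data, GeoTIFF tags, and BigTIFF for outputs past the 4 GiB limit of 32-bit offsets.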
This is a follow-up to the VRT output work in #1083. The VRT path solves the
immediate problem for most users, but some workflows genuinely need a single
file.
Design: Not yet specced. Needs investigation into how the current
_writer.py assembles IFDs and whether the two-pass approach (write tiles,
then patch offsets) can be grafted onto the existing code without a rewrite.
Drawbacks
- Adds complexity to the writer for a use case the VRT path already covers.
- TIFF offset patching is fiddly and easy to get wrong.
Alternatives
gdal_translate if they need a single file.