What
_assemble_standard_layout and _assemble_cog_layout in xrspatial/geotiff/_writer.py build up the output TIFF file in a bytearray, then return bytes(output) at the end. The bytes(...) call copies the entire bytearray contents, transiently doubling peak memory.
Lines:
xrspatial/geotiff/_writer.py:1035 (_assemble_standard_layout)
xrspatial/geotiff/_writer.py:1118 (_assemble_cog_layout)
Why it matters
The eager (non-streaming) writer materializes the full output buffer in memory. The bytearray-to-bytes conversion at the end of the assembly path doubles peak memory transiently. Measured:
output (bytearray) size: 95.37 MB
bytes(output) peak: 95.37 MB extra
So a 1 GB TIFF write adds 1 GB of peak memory beyond the bytearray itself. For users writing large COGs from memory, this can push a borderline-OK write into OOM territory.
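The copy cost can be checked in isolation, without xrspatial. The following is a minimal sketch: `copy_overhead` is a hypothetical helper written for this issue, not part of the library, and it only measures what `bytes(...)` allocates on top of an existing bytearray.

```python
import tracemalloc


def copy_overhead(size):
    """Return (extra traced bytes, length of the copy) for bytes(bytearray).

    Hypothetical helper for illustration; not part of xrspatial.
    """
    buf = bytearray(size)  # allocated before tracing starts, so it is not counted
    tracemalloc.start()
    base, _ = tracemalloc.get_traced_memory()
    frozen = bytes(buf)  # allocates a second buffer of the same size
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak - base, len(frozen)


extra, length = copy_overhead(8 * 1024 * 1024)
print(f"copy length {length / (1024 * 1024):.1f} MB, extra peak {extra / (1024 * 1024):.1f} MB")
```

The extra peak should come out at roughly the buffer size, which matches the measurement above.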
Fix
Return the bytearray directly. _write_bytes calls f.write(file_bytes), which accepts any buffer-protocol object, and the post-write parse_header(file_bytes[:16]) validation works on a bytearray slice as well. The bytes(...) call is a pure copy with no contract benefit.
The streaming writer (write_streaming) already writes pixel data directly to the file handle and does not have this issue.
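A rough sketch of why returning the bytearray is safe for the write and header-check steps. `assemble_demo` and `write_and_check` are hypothetical stand-ins, not the real `_assemble_*_layout`, `_write_bytes`, or `parse_header`; the header layout is a simplified little-endian classic TIFF header (byte order, magic 42, first IFD offset).

```python
import io
import struct


def assemble_demo(payload):
    """Hypothetical stand-in for the assembly functions: no final bytes() copy."""
    output = bytearray()
    # 8-byte little-endian TIFF header: b'II', magic number 42, first IFD at offset 8
    output += struct.pack('<2sHI', b'II', 42, 8)
    output += payload
    return output  # return the bytearray itself


def write_and_check(file_bytes, f):
    """Hypothetical stand-in for the write plus post-write header validation."""
    f.write(file_bytes)  # file objects accept any buffer-protocol object
    # Slicing a bytearray yields a bytearray, which struct.unpack also accepts
    return struct.unpack('<2sHI', file_bytes[:8])


buf = assemble_demo(b'\x00' * 16)
sink = io.BytesIO()
print(write_and_check(buf, sink), type(buf[:8]).__name__)
```

One thing to verify before changing the return type: a bytearray is mutable and unhashable, so any caller that hashes the result or compares its type against bytes would need a look. This sketch does not cover that.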
Reproduction
import io
import tracemalloc

import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff

# 10000 x 10000 uint8 raster, about 95 MB uncompressed
arr = xr.DataArray(np.random.randint(0, 255, (10000, 10000), dtype=np.uint8), dims=['y', 'x'])
out = io.BytesIO()

# Start tracing after the array exists so the input is excluded from the peak
tracemalloc.start()
to_geotiff(arr, out, compression='none', tiled=False)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"output size {len(out.getvalue())/(1024*1024):.1f} MB, peak {peak/(1024*1024):.1f} MB")
Observed: peak ~3x output size (presumably the bytearray, the bytes copy, and the BytesIO buffer). After fix: peak ~2x output size.
Severity
MEDIUM. Real measurable memory overhead, easy fix, narrow blast radius.