You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
xrspatial.geotiff exposes a hardened public surface (open_geotiff, to_geotiff, write_geotiff_gpu, write_vrt, read_geotiff_gpu, read_geotiff_dask, read_vrt) but several lower-level functions remain importable and callable directly. These lower-level entry points skip validation that the public wrappers perform, so direct callers can produce subtly different files (or read subtly different DataArrays) without any warning.
The lower-level entry points are not prefixed with _ and are not marked private in the module docstring, so static analysis treats them as importable API.
Callable lower-level entry points
Confirmed via grep -n 'def write\|def _write\|def to_geotiff\|def read\|def open' across _writer.py, _writers/*.py, and _reader.py:
xrspatial.geotiff._writer.write (_writer.py:1638) — array-level CPU writer that to_geotiff calls after coercion.
xrspatial.geotiff._writer.write_streaming (_writer.py:1999) — dask streaming writer that to_geotiff calls on dask-backed DataArrays.
xrspatial.geotiff._reader.read_to_array (_reader.py:3188) — array-level reader that open_geotiff calls and then post-processes. __init__.py imports it as _read_to_array to discourage external use, but the public name on _reader remains read_to_array.
Two helpers in _writers/eager.py already use the underscore convention and are correctly private:
_write_single_tile (_writers/eager.py:763)
_write_vrt_tiled (_writers/eager.py:832)
Validation gaps between to_geotiff and write / write_streaming
Concrete checks that to_geotiff runs (_writers/eager.py:56) but write / write_streaming skip:
Compression name validation against the canonical list.to_geotiff checks compression.lower() in _VALID_COMPRESSIONS (_writers/eager.py:312) and surfaces the full list on the error. write calls _compression_tag(compression) (_writer.py:1744), which raises but without the canonical list, so a typo like compression='zstandard' produces a less actionable error.
JPEG opt-in gate.to_geotiff rejects compression='jpeg' unless allow_internal_only_jpeg=True (_writers/eager.py:330, issue write_geotiff_gpu emits JPEG TIFFs that other readers reject #1845). write and write_streaming accept JPEG without the gate, so a direct caller can silently produce a file that does not round-trip through libtiff / GDAL / rasterio.
max_z_error codec pairing.to_geotiff rejects negative values and rejects non-zero values with non-LERC codecs (_writers/eager.py:348). write accepts both and passes them to _write_tiled / _write_stripped where they are silently ignored on non-LERC codecs.
crs argument validation.to_geotiff calls _validate_crs_arg(crs) (_writers/eager.py:549) to reject bool and other non-int types. write takes a raw crs_epsg: int | None and writes whatever it receives — passing crs_epsg=True round-trips as EPSG=1.
NaN-to-sentinel rewrite with defensive copy.to_geotiff rewrites NaN pixels to the nodata sentinel via arr.copy() to avoid mutating the caller's buffer (_writers/eager.py:714). write requires the caller to have done this already; a direct caller that passes a NaN-containing float array writes NaN bytes to disk, and the resulting file decodes with NaN pixels even where the sentinel tag is set.
3D dim-order validation and band-first reorder.to_geotiff runs _validate_3d_writer_dims(data.dims) (_writers/eager.py:625, issue to_geotiff: silent data corruption on 3D input with non-whitelisted leading dim #1812) and moves a leading band dim to the trailing axis (_writers/eager.py:627, :690). write assumes the array is already band-last; a direct caller passing (band, y, x) writes a file whose y axis carries band data.
float16 / bool_ auto-promotion.to_geotiff promotes float16 -> float32 and bool_ -> uint8 (_writers/eager.py:702). write calls numpy_to_tiff_dtype on the raw dtype; a float16 direct call raises from inside the dtype mapper instead of being handled.
Georeferenced-transform fail-closed.to_geotiff calls _require_transform_for_georeferenced(data, geo_transform) (_writers/eager.py:573, issue geotiff: 1xN / Nx1 georeferenced writes silently strip transform #1945) to refuse silently writing a non-georeferenced TIFF when coords look georeferenced but a transform could not be derived. write accepts geo_transform=None and writes an identity-transform file.
mask_nodata attr.open_geotiff writes attrs['masked_nodata'] (__init__.py:697, issue Bug: attrs['masked_nodata'] reports True when masking was disabled #2092) so the writer can decide whether to reverse the NaN promotion. Direct callers building a DataArray themselves can desync this attr from the buffer state.
attrs['transform'] and attrs['crs'] population.open_geotiff calls _populate_attrs_from_geo_info (__init__.py:633). Without it, the writer-side bit-stable round-trip path that depends on attrs['transform'] does not work.
Options
Option A: push validation down
Move the missing checks into write / write_streaming / read_to_array so direct callers cannot skip them.
Pros: hardens every entry point in one pass. Direct callers benefit. The public wrappers become thinner.
Cons: some validations need state that only the public layer has (e.g. data.attrs for _should_restore_nan_sentinel, data.dims for _validate_3d_writer_dims). Pushing those down forces re-shaping the array-level API to accept that state explicitly. Duplicate work when the public wrapper already validated.
Option B: formalize private-only
Rename write -> _write, write_streaming -> _write_streaming, read_to_array -> _read_to_array on the source module. Update the module docstring on _writer.py and _reader.py to spell out the private contract. Optionally add a runtime check that walks the call stack and warns on calls from outside xrspatial.geotiff.
Pros: cheap, no API reshape, makes the intent explicit to static analysers and code reviewers.
Cons: does not actually fix the footgun for callers who ignore the warning. Breaks any existing external imports (release-note item).
Recommended: hybrid
Push down the checks whose absence produces subtly different files, because these are the silent-divergence cases:
Compression-name validation against _VALID_COMPRESSIONS.
JPEG opt-in gate.
max_z_error codec pairing.
crs_epsg / crs_wkt type validation (reject bool, reject unparseable WKT without the opt-in).
NaN-to-sentinel rewrite with defensive copy (gated on restore_sentinel, which write already accepts).
Rename the rest to underscore-prefix because they need DataArray state the array-level entry point does not have:
write -> _write
write_streaming -> _write_streaming
read_to_array -> _read_to_array (and update the __init__.py import to drop the rename alias)
Update module docstrings on _writer.py and _reader.py to say the module is private. The _writers/ package already does this (_writers/eager.py:1).
Tests
Each lower-level entry point should produce byte-identical output to the public wrapper for matching inputs, or the divergence should be intentional and documented:
test_write_vs_to_geotiff_byte_parity: feed write(arr, ...) and to_geotiff(xr.DataArray(arr), ...) the same inputs and assert byte-equal output files for every compression in _VALID_COMPRESSIONS.
test_write_streaming_vs_to_geotiff_byte_parity: same idea for dask-backed inputs, comparing the streaming path to to_geotiff with the same chunk size.
If the renames are accepted: test_no_public_lowercase_write_reexport that imports xrspatial.geotiff and asserts the renamed names are absent from the namespace.
Acceptance
All eight write-side validation gaps and four read-side gaps either fail at the array level or are explicitly out of scope (with a docstring note explaining why).
Lower-level functions that need DataArray state are renamed to underscore-prefix and the module docstrings reflect the private contract.
Byte-parity tests pass for write vs to_geotiff and write_streaming vs to_geotiff.
Release notes call out the renames as a breaking change for any external import of xrspatial.geotiff._writer.write / write_streaming / xrspatial.geotiff._reader.read_to_array.
Problem
xrspatial.geotiffexposes a hardened public surface (open_geotiff,to_geotiff,write_geotiff_gpu,write_vrt,read_geotiff_gpu,read_geotiff_dask,read_vrt) but several lower-level functions remain importable and callable directly. These lower-level entry points skip validation that the public wrappers perform, so direct callers can produce subtly different files (or read subtly different DataArrays) without any warning.The lower-level entry points are not prefixed with
_and are not marked private in the module docstring, so static analysis treats them as importable API.Callable lower-level entry points
Confirmed via
grep -n 'def write\|def _write\|def to_geotiff\|def read\|def open'across_writer.py,_writers/*.py, and_reader.py:xrspatial.geotiff._writer.write(_writer.py:1638) — array-level CPU writer thatto_geotiffcalls after coercion.xrspatial.geotiff._writer.write_streaming(_writer.py:1999) — dask streaming writer thatto_geotiffcalls on dask-backed DataArrays.xrspatial.geotiff._reader.read_to_array(_reader.py:3188) — array-level reader thatopen_geotiffcalls and then post-processes.__init__.pyimports it as_read_to_arrayto discourage external use, but the public name on_readerremainsread_to_array.Two helpers in
_writers/eager.pyalready use the underscore convention and are correctly private:_write_single_tile(_writers/eager.py:763)_write_vrt_tiled(_writers/eager.py:832)Validation gaps between
to_geotiffandwrite/write_streamingConcrete checks that
to_geotiffruns (_writers/eager.py:56) butwrite/write_streamingskip:to_geotiffcheckscompression.lower() in _VALID_COMPRESSIONS(_writers/eager.py:312) and surfaces the full list on the error.writecalls_compression_tag(compression)(_writer.py:1744), which raises but without the canonical list, so a typo likecompression='zstandard'produces a less actionable error.to_geotiffrejectscompression='jpeg'unlessallow_internal_only_jpeg=True(_writers/eager.py:330, issue write_geotiff_gpu emits JPEG TIFFs that other readers reject #1845).writeandwrite_streamingaccept JPEG without the gate, so a direct caller can silently produce a file that does not round-trip through libtiff / GDAL / rasterio.max_z_errorcodec pairing.to_geotiffrejects negative values and rejects non-zero values with non-LERC codecs (_writers/eager.py:348).writeaccepts both and passes them to_write_tiled/_write_strippedwhere they are silently ignored on non-LERC codecs.crsargument validation.to_geotiffcalls_validate_crs_arg(crs)(_writers/eager.py:549) to rejectbooland other non-int types.writetakes a rawcrs_epsg: int | Noneand writes whatever it receives — passingcrs_epsg=Trueround-trips as EPSG=1.to_geotiffrewrites NaN pixels to the nodata sentinel viaarr.copy()to avoid mutating the caller's buffer (_writers/eager.py:714).writerequires the caller to have done this already; a direct caller that passes a NaN-containing float array writes NaN bytes to disk, and the resulting file decodes with NaN pixels even where the sentinel tag is set.to_geotiffruns_validate_3d_writer_dims(data.dims)(_writers/eager.py:625, issue to_geotiff: silent data corruption on 3D input with non-whitelisted leading dim #1812) and moves a leadingbanddim to the trailing axis (_writers/eager.py:627,:690).writeassumes the array is already band-last; a direct caller passing(band, y, x)writes a file whoseyaxis carries band data.float16/bool_auto-promotion.to_geotiffpromotesfloat16->float32andbool_->uint8(_writers/eager.py:702).writecallsnumpy_to_tiff_dtypeon the raw dtype; afloat16direct call raises from inside the dtype mapper instead of being handled.to_geotiffcalls_require_transform_for_georeferenced(data, geo_transform)(_writers/eager.py:573, issue geotiff: 1xN / Nx1 georeferenced writes silently strip transform #1945) to refuse silently writing a non-georeferenced TIFF when coords look georeferenced but a transform could not be derived.writeacceptsgeo_transform=Noneand writes an identity-transform file.to_geotiffcalls_validate_crs_fallback(wkt_fallback, allow_unparseable_crs)(_writers/eager.py:734, issue geotiff: CRS resolution should fail closed on unparseable strings by default #1929) to refuse writing an unparseable CRS string toGTCitationGeoKey.writewrites whatevercrs_wktit is handed.Validation gaps between
open_geotiffandread_to_arrayread_to_arrayreturns(np.ndarray, GeoInfo).open_geotiffthen does work the direct caller has to replicate by hand:open_geotiffcalls_validate_read_geo_info(__init__.py:626, issue geotiff: fail closed on ambiguous geospatial metadata (CRS, transform, coords, nodata) #1987) before populating attrs. Direct callers ofread_to_arrayskip this and may consume aGeoInfofor a file that the public surface would reject.open_geotiffmasks the nodata sentinel to NaN, promoting integer arrays tofloat64when the sentinel matches a real pixel (__init__.py:642). Direct callers receive the raw sentinel bytes and have to write the same logic to avoid the same geotiff: int(nan) crash on integer TIFF with GDAL_NODATA="nan" #1774 / MinIsWhite inversion runs before nodata mask, swapping data and nodata pixels #1809 traps.mask_nodataattr.open_geotiffwritesattrs['masked_nodata'](__init__.py:697, issue Bug: attrs['masked_nodata'] reports True when masking was disabled #2092) so the writer can decide whether to reverse the NaN promotion. Direct callers building a DataArray themselves can desync this attr from the buffer state.attrs['transform']andattrs['crs']population.open_geotiffcalls_populate_attrs_from_geo_info(__init__.py:633). Without it, the writer-side bit-stable round-trip path that depends onattrs['transform']does not work.Options
Option A: push validation down
Move the missing checks into
write/write_streaming/read_to_arrayso direct callers cannot skip them.Pros: hardens every entry point in one pass. Direct callers benefit. The public wrappers become thinner.
Cons: some validations need state that only the public layer has (e.g.
data.attrsfor_should_restore_nan_sentinel,data.dimsfor_validate_3d_writer_dims). Pushing those down forces re-shaping the array-level API to accept that state explicitly. Duplicate work when the public wrapper already validated.Option B: formalize private-only
Rename
write->_write,write_streaming->_write_streaming,read_to_array->_read_to_arrayon the source module. Update the module docstring on_writer.pyand_reader.pyto spell out the private contract. Optionally add a runtime check that walks the call stack and warns on calls from outsidexrspatial.geotiff.Pros: cheap, no API reshape, makes the intent explicit to static analysers and code reviewers.
Cons: does not actually fix the footgun for callers who ignore the warning. Breaks any existing external imports (release-note item).
Recommended: hybrid
Push down the checks whose absence produces subtly different files, because these are the silent-divergence cases:
_VALID_COMPRESSIONS.max_z_errorcodec pairing.crs_epsg/crs_wkttype validation (rejectbool, reject unparseable WKT without the opt-in).restore_sentinel, whichwritealready accepts).float16/bool_auto-promotion._writer.py:1740and:2058for issue to_geotiff writes invalid TIFF for zero-height/zero-width arrays instead of failing closed #2075 — use that as the template).Rename the rest to underscore-prefix because they need DataArray state the array-level entry point does not have:
write->_writewrite_streaming->_write_streamingread_to_array->_read_to_array(and update the__init__.pyimport to drop the rename alias)Update module docstrings on
_writer.pyand_reader.pyto say the module is private. The_writers/package already does this (_writers/eager.py:1).Tests
Each lower-level entry point should produce byte-identical output to the public wrapper for matching inputs, or the divergence should be intentional and documented:
test_write_vs_to_geotiff_byte_parity: feedwrite(arr, ...)andto_geotiff(xr.DataArray(arr), ...)the same inputs and assert byte-equal output files for every compression in_VALID_COMPRESSIONS.test_write_streaming_vs_to_geotiff_byte_parity: same idea for dask-backed inputs, comparing the streaming path toto_geotiffwith the same chunk size.test_write_rejects_jpeg_without_opt_in: regression for gap Add Dask Support to Viewshed #2.test_write_rejects_negative_max_z_error: regression for gap Add documentation on edge effects #3.test_write_rejects_bool_crs_epsg: regression for gap Add dask array support #4.test_write_nan_to_sentinel_does_not_mutate_input: regression for gap add curvature #5.test_write_promotes_float16_and_bool: regression for gap add enhanced focal stats #7.test_no_public_lowercase_write_reexportthat importsxrspatial.geotiffand asserts the renamed names are absent from the namespace.Acceptance
writevsto_geotiffandwrite_streamingvsto_geotiff.xrspatial.geotiff._writer.write/write_streaming/xrspatial.geotiff._reader.read_to_array.