geotiff: include GB allocation estimate in max_pixels error message#2554
Conversation
The PixelSafetyLimitError raised by ``_check_dimensions`` and the VRT per-source resample-intermediate guard now report the byte cost of the request alongside the pixel count. Estimate uses 4 bytes/pixel (float32), matching the convention in MAX_PIXELS_DEFAULT's docstring. Existing assertions on "exceed the safety limit" are preserved.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: geotiff: include GB allocation estimate in max_pixels error message
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/_layout.py:43._gb_hintdivides by1024 ** 3(GiB), but the docstring at line 31 says "~1 billion pixels, which is ~4 GB for float32 single-band", which is decimal GB (/ 10**9). With this PR, the error forMAX_PIXELS_DEFAULTreads "~3.73 GB at 4 bytes/pixel" rather than the "~4 GB" the docstring promises. Two options:- Switch the divisor to
1000 ** 3so the message lines up with the existing "~4 GB" docstring claim, or - Update the docstring to "~3.73 GiB" so the unit is unambiguous.
The decimal switch is the smaller change. It also matches what users see from
df,du, S3, and most other tools that report sizes in GB. - Switch the divisor to
Nits (optional improvements)
-
xrspatial/geotiff/tests/vrt/test_window.py:454.test_per_source_cap_error_includes_gb_hintaccepts the hint from either the dimension guard or the per-source guard, since the dimension guard fires first at these dimensions. That leaves the per-source-specific message without direct coverage. Either tighten the test to force the per-source path (setmax_pixelsbetween the output dimensions and the sub-window product), or add a second case so a future regression in the per-source string is caught. Low priority because the existing assertion still proves the hint reaches the error.
What looks good
_PIXEL_HINT_BYTES = 4is documented inline, so the assumption is auditable.- Existing
match="exceed the safety limit"assertions are preserved, so no test churn from the message change. - The inline
from ._layout import _gb_hintin_vrt.pymatches the existing pattern in that file (lines 1294 and 1371 do the same with_reader). test_gb_hint_helper_rounds_to_two_decimalspins the format string directly, so any future change to the hint surface is caught.
Checklist
- Algorithm matches reference/paper. N/A, pure error-message change.
- All implemented backends produce consistent results. Shared dimension guard, no backend split.
- NaN handling is correct. N/A.
- Edge cases are covered by tests. Zero-pixel case covered.
- Dask chunk boundaries handled correctly. N/A.
- No premature materialization or unnecessary copies. Error path only.
- Benchmark exists or is not needed. Not needed.
- README feature matrix updated (if applicable). N/A.
- Docstrings present and accurate. See Suggestion above.
Addresses the /review-pr feedback on the initial commit: * Switch _gb_hint to decimal GB (10**9) so the printed number matches the "~4 GB" claim in MAX_PIXELS_DEFAULT's docstring (and what df / du / S3 report). * Plumb the decoded dtype through _check_dimensions and _gb_hint so the byte estimate uses dtype.itemsize. The hint now scales: f32 -> 4 bytes/pixel, f64 -> 8, u8 -> 1, etc. Real call sites in _decode.py and _vrt.py pass the dtype; dtype= remains optional with the float32 fallback for callers without a resolved dtype. * Add monkeypatched test isolating the VRT per-source error path, closing the nit about that string being covered only via the dimension guard's overlap. * New tests assert dtype scaling at the helper, layout-guard, and VRT-source-guard levels.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review pass (commit d159684)
Review dispositions
-
Suggestion (decimal GB vs GiB): Fixed.
_gb_hintnow divides by10**9soMAX_PIXELS_DEFAULTreports~4.00 GB at 4 bytes/pixel, matching the docstring. -
Nit (per-source guard coverage): Fixed.
test_per_source_cap_error_includes_gb_hintnow monkeypatches_check_dimensionsto a no-op so the per-source f-string is asserted in isolation.
Additional change from user follow-up
The user asked that the byte estimate use the actual dtype of the data (so f64 reports 8 bytes/pixel, u8 reports 1, etc.) instead of a fixed float32 assumption.
_gb_hint(pixels, dtype=None)readsdtype.itemsizewhen supplied; the float32 default is now a fallback for callers that have no dtype resolved yet._check_dimensions(width, height, samples, max_pixels, dtype=None)forwardsdtypeto_gb_hint.- Plumbing:
_read_stripsand_read_tilesin_decode.pyalready haddtypein scope; both call sites now pass it.read_vrtselects bands before the dimension check and passesselected_bands[0].dtype. The per-source VRT error usesvrt_band.dtype.
Example outputs
- f64, 50000x50000:
~20.00 GB at 8 bytes/pixelvs limit~8.00 GB at 8 bytes/pixel - u8, 50000x50000:
~2.50 GB at 1 bytes/pixelvs limit~1.00 GB at 1 bytes/pixel - i16, 50000x50000:
~5.00 GB at 2 bytes/pixelvs limit~2.00 GB at 2 bytes/pixel
Tests
- New
test_error_message_uses_passed_dtype_for_gb_hintcovers f64 and u8 dtypes through the layout guard. test_gb_hint_helper_rounds_to_two_decimalsnow also exercises f64, u8, and i16 at the helper.- Full
xrspatial/geotiff/tests/suite: 5990 passed, 81 skipped, 1 xfailed.
The PixelSafetyLimitError message now ends with a 'To read this file,
try one of:' list with three recovery options:
* Pass a larger max_pixels= if the allocation is acceptable.
* Read a sub-region with window=(r0, c0, r1, c1),
e.g. window=(0, 0, 1024, 1024).
* Read lazily in chunks (dask-aware):
- dask installed -> chunks=N where N is sized to fit max_pixels.
- dask missing -> pip / conda install hint plus the same
chunks= value to use once installed.
The chunk-side suggestion is the largest square chunk whose pixel
count fits under max_pixels for the band count, capped at 1024 to
stay near common COG tile sizes (256 / 512 / 1024). For the default
1e9 / 1-band case that is exactly 1024.
Detection uses ``importlib.util.find_spec`` so dask is not imported
just to format an error.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up: actionable recovery hints (commit 8695083)
Per user follow-up, the PixelSafetyLimitError message now ends with a small "To read this file, try one of:" list rather than just naming max_pixels=. The list adapts to whether dask is installed.
Output shape
TIFF image dimensions (50000 x 50000 x 1 = 2,500,000,000 pixels, ~10.00 GB at 4 bytes/pixel) exceed the safety limit of 1,000,000,000 pixels (~4.00 GB at 4 bytes/pixel).
To read this file, try one of:
* Pass a larger max_pixels= if the allocation is acceptable.
* Read a sub-region with window=(r0, c0, r1, c1), e.g. window=(0, 0, 1024, 1024).
* Read lazily in chunks with chunks=1024 so each decoded buffer stays under max_pixels.
When dask is not importable, the third line switches to:
* Install dask (`pip install dask` or `conda install -c conda-forge dask`) to read the file lazily via chunks=1024.
Notes
- Detection uses
importlib.util.find_spec('dask')so dask is not imported just to format an error. _suggest_chunk_sidepicks the largest square chunk whose pixel count fits undermax_pixelsfor the band count, capped at 1024 to stay near common COG tile sizes (256 / 512 / 1024). For the default 1e9 / 1-band case that resolves to 1024.- Multi-band reduces the per-side budget (4 bands at 1M pixels -> 500), so the suggestion stays under the cap.
- Pathological
max_pixels=0orsamples=0inputs fall back to safe values rather than dividing by zero.
Tests
test_error_message_includes_window_suggestion— window= line and concrete example.test_error_message_suggests_chunks_when_dask_installed— monkeypatchedfind_specto force the dask-present branch.test_error_message_recommends_install_when_dask_missing— monkeypatched to force the dask-absent branch.test_suggested_chunk_side_scales_with_max_pixels— exercises the chunksize heuristic at the default budget, a tight budget, a multi-band budget, and pathological inputs.- Full
xrspatial/geotiff/tests/suite: 5994 passed, 81 skipped, 1 xfailed.
#2556) * geotiff: add bbox= to open_geotiff + mention in safety hint (#2555) * ``open_geotiff(..., bbox=(x_min, y_min, x_max, y_max))`` resolves the geographic bbox to a pixel ``window=`` via the existing ``_read_geo_info`` + ``_extent_to_window`` helpers. The resolution is a header-only metadata read; the result is forwarded through the existing backend dispatch, so eager / dask / GPU / VRT all share one surface without learning a new kwarg. * Mutually exclusive with ``window=`` (raises ValueError when both are passed). Requires a georeferenced, axis-aligned source; rotated affines must be cleared via ``allow_rotated=True`` first. * The ``PixelSafetyLimitError`` recovery hint (added in #2554) gains a fourth bullet recommending ``bbox=`` for geographic sub-region reads. * ``bbox`` slots into ``_CANONICAL_ORDER`` immediately after ``window`` so the two sub-region kwargs stay grouped on the public surface. Closes #2555. * Address review nits: NaN, rotated, overview_level, pixel parity (#2555) * Reject non-finite bbox coordinates in _bbox_to_window with a clear "must contain finite coordinates" error. NaN previously slipped past the ordering check and surfaced as a confusing integer-cast failure inside _extent_to_window. * Reorder the rotated-affine vs no-georef guards. allow_rotated=True drops the rotation AND sets has_georef=False, so checking has_georef first hid the more specific rotated-affine message; now the rotated path errors first with guidance to use window= for pixel-space windowing. * Tests: - NaN bbox rejected on each axis (4 cases) and for +/-inf. - Rotated-affine file (built via the existing test_crs writer) rejected with the targeted message after allow_rotated=True. - bbox + overview_level=N round-trip on a COG with 2x/4x reductions confirms each successive level shrinks the bbox slice monotonically, pinning the overview-aware resolution. - Pixel-value parity vs the equivalent slice of the full read so a row/col swap in the resolver would surface.
Summary
PixelSafetyLimitErrornow reports a~X.XX GB at 4 bytes/pixelestimate next to the pixel count, for both the requested allocation and the safety limit.MAX_PIXELS_DEFAULT's docstring. The "at 4 bytes/pixel" suffix spells out the assumption.Closes #2553.
Backend coverage
CPU error-message change only. numpy / cupy / dask+numpy / dask+cupy paths all share the same dimension guard, so no backend-specific handling is needed.
Test plan
test_security.pyassertions on "exceed the safety limit" still pass.test_error_message_includes_gb_estimatecovers the layout-path message.test_gb_hint_helper_rounds_to_two_decimalscovers the helper directly.test_per_source_cap_error_includes_gb_hintcovers the VRT path.xrspatial/geotiff/tests/suite passes (5989 passed, 81 skipped, 1 xfailed).