perf(geotiff): vectorise unpack_bits for bps=2/4/12 (#1713)#1721
Merged
brendancol merged 1 commit intoMay 12, 2026
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves GeoTIFF decode performance by vectorizing the unpack_bits sub-byte unpacking logic in xrspatial/geotiff/_compression.py for BitsPerSample values 2, 4, and 12, replacing Python per-byte loops with NumPy strided operations while preserving the previous “written positions only” semantics for short input buffers.
Changes:
- Vectorized
unpack_bitsimplementations forbps=2,bps=4, andbps=12to eliminate Python-loop overhead. - Added a regression/compatibility test suite that compares the new implementation against an in-test reference copy of the original loop behavior (including short-buffer corner cases).
- Updated the internal performance sweep state record to reflect the audit finding and fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
xrspatial/geotiff/_compression.py |
Replaces loop-based sub-byte unpacking for bps=2/4/12 with vectorized NumPy logic while preserving prior write/garbage semantics. |
xrspatial/geotiff/tests/test_unpack_bits_vectorised_1713.py |
Adds parametrized equivalence tests vs an inlined reference implementation plus boundary and error-path spot checks. |
.claude/sweep-performance-state.csv |
Records the audit finding (#1713) and its resolution in the performance sweep tracking file. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Sub-byte pixel decoding for BitsPerSample in {2, 4, 12} was running
Python loops over packed bytes, which was 100-200x slower than the
numpy-strided equivalent on realistic tile sizes (165 ms vs 3 ms for
2M bps=4 pixels on this machine).
Vectorise the three branches with masked-and-shifted slices. The
written-position contract is preserved bit for bit: positions covered
by the input buffer match the prior loop, and positions past the
buffer are left as np.empty initial values exactly as before.
Includes the deep-sweep state CSV update for Pass 7 of geotiff.
fc3d43d to
fe3450b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Vectorises
unpack_bitsinxrspatial/geotiff/_compression.pyfor thethree sub-byte branches that previously walked the packed input byte by
byte in Python (
bps=2,bps=4,bps=12). Thebps=1branch alreadycalled
np.unpackbitsand is untouched.Microbench on one thread (bps=4, 2M pixels):
bps=2 and bps=12 show similar speedups on equivalent input sizes.
The vectorised implementation preserves the original loop's
written-position contract bit for bit: positions covered by the input
buffer match the prior loop, and positions past the input are left at
their
np.emptyinitial values exactly as before.Closes #1713.
Test plan
xrspatial/geotiff/tests/test_unpack_bits_vectorised_1713.pycompare against an in-line reference implementation acrossbps in {2, 4, 12},pixel_count in {0, 1, 2, 3, 4, 7, 8, 100, 10000}, anddata_factor in {0.0, 0.5, 1.0, 1.5, 2.0}. 135 cases, all pass.bps=12with exactly 3 input bytes decodes 1 pair (the original strict-less-than guard).bps=1byte-pattern still returns[1,0,1,0,1,1,0,0,0,0,0,0,1,1,1,1]for0b10101100, 0b00001111.xrspatial/geotiff/tests/test_compression.py,test_features.py,test_reader.py,test_jpeg.py,test_decompression_caps.pypass (test_features.py::TestPalettedeselected; matplotlib/py3.14 recursion, pre-existing).