
perf(geotiff): vectorise unpack_bits for bps=2/4/12 (#1713) #1721

Merged
brendancol merged 1 commit into main from deep-sweep-performance-geotiff-2026-05-12-aa0e4ef9 on May 12, 2026

Conversation

@brendancol
Contributor

Summary

Vectorises unpack_bits in xrspatial/geotiff/_compression.py for the
three sub-byte branches that previously walked the packed input byte by
byte in Python (bps=2, bps=4, bps=12). The bps=1 branch already
called np.unpackbits and is untouched.
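
To make the loop-versus-vectorised distinction concrete, here is a minimal sketch for the bps=4 case. It is not the code from `_compression.py`: the function names are hypothetical, the high nibble is assumed to hold the first pixel of each byte, and the output is zero-initialised (rather than `np.empty`) so the comparison is deterministic.

```python
import numpy as np


def unpack_nibbles_loop(data: bytes, pixel_count: int) -> np.ndarray:
    """Hypothetical reference: walk the packed input one byte at a time."""
    out = np.zeros(pixel_count, dtype=np.uint8)
    for i, byte in enumerate(data):
        hi_pos, lo_pos = 2 * i, 2 * i + 1
        if hi_pos < pixel_count:
            out[hi_pos] = byte >> 4      # assumed: high nibble comes first
        if lo_pos < pixel_count:
            out[lo_pos] = byte & 0x0F
    return out


def unpack_nibbles_vectorised(data: bytes, pixel_count: int) -> np.ndarray:
    """Hypothetical vectorised equivalent using strided slice assignment."""
    packed = np.frombuffer(data, dtype=np.uint8)
    out = np.zeros(pixel_count, dtype=np.uint8)
    # Number of even (high-nibble) and odd (low-nibble) positions that are
    # both inside the output and covered by the input buffer.
    n_hi = min(len(packed), (pixel_count + 1) // 2)
    n_lo = min(len(packed), pixel_count // 2)
    out[0:2 * n_hi:2] = packed[:n_hi] >> 4
    out[1:2 * n_lo:2] = packed[:n_lo] & 0x0F
    return out


print(unpack_nibbles_vectorised(bytes([0xAB, 0xCD]), 4).tolist())  # [10, 11, 12, 13]
```

The two strided assignments replace the per-byte Python loop with two NumPy operations, which is where a speedup of this kind would come from.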

Microbench on one thread (bps=4, 2M pixels):

  • before: ~165 ms
  • after: ~3 ms (~55x)

bps=2 and bps=12 show similar speedups on equivalent input sizes.

The vectorised implementation preserves the original loop's
written-position contract bit for bit: positions covered by the input
buffer match the prior loop, and positions past the input are left at
their np.empty initial values exactly as before.
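
The written-position contract described above can be sketched for bps=2 as follows. This is an illustration under stated assumptions, not the merged implementation: `unpack_2bit` is a hypothetical name, the most significant bit pair is assumed to be the first pixel, and the extra `written` return value exists only to make the contract visible.

```python
import numpy as np


def unpack_2bit(data: bytes, pixel_count: int):
    """Hypothetical sketch: return (out, written).

    Only out[:written] is defined. Positions past the input buffer keep
    whatever np.empty happened to allocate.
    """
    packed = np.frombuffer(data, dtype=np.uint8)
    out = np.empty(pixel_count, dtype=np.uint8)   # deliberately uninitialised
    written = min(pixel_count, 4 * len(packed))   # 4 pixels per input byte
    for k, shift in enumerate((6, 4, 2, 0)):
        # Slot k of every byte lands at output positions k, k+4, k+8, ...
        n = len(range(k, written, 4))
        out[k:written:4] = (packed[:n] >> shift) & 0x03
    return out, written


# One input byte covers 4 of the 6 requested pixels.
out, written = unpack_2bit(bytes([0b00011011]), 6)
print(written)                   # 4
print(out[:written].tolist())    # [0, 1, 2, 3]
```

Any equivalence test against a loop-based reference should compare only the written prefix, since the tail of an `np.empty` buffer is arbitrary.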

Closes #1713.

Test plan

  • New parametrised tests in xrspatial/geotiff/tests/test_unpack_bits_vectorised_1713.py compare against an in-line reference implementation across bps in {2, 4, 12}, pixel_count in {0, 1, 2, 3, 4, 7, 8, 100, 10000}, and data_factor in {0.0, 0.5, 1.0, 1.5, 2.0}. 135 cases, all pass.
  • Boundary case: bps=12 with exactly 3 input bytes decodes 1 pair (the original strict-less-than guard).
  • bps=1 byte-pattern still returns [1,0,1,0,1,1,0,0,0,0,0,0,1,1,1,1] for 0b10101100, 0b00001111.
  • Existing xrspatial/geotiff/tests/test_compression.py, test_features.py, test_reader.py, test_jpeg.py, test_decompression_caps.py pass (test_features.py::TestPalette deselected; matplotlib/py3.14 recursion, pre-existing).
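
As a rough illustration of the test plan above, the sketch below enumerates the parameter grid and reproduces the bps=1 spot check. The helper `input_length` is hypothetical: it assumes `data_factor` scales the exact packed byte count, which the PR text does not spell out.

```python
import itertools

import numpy as np

BPS = (2, 4, 12)
PIXEL_COUNTS = (0, 1, 2, 3, 4, 7, 8, 100, 10000)
DATA_FACTORS = (0.0, 0.5, 1.0, 1.5, 2.0)

# Full cross product of the three parameter sets.
cases = list(itertools.product(BPS, PIXEL_COUNTS, DATA_FACTORS))
print(len(cases))  # 135


def input_length(bps: int, pixel_count: int, data_factor: float) -> int:
    """Assumed meaning of data_factor: fraction of the exact packed size."""
    exact = (pixel_count * bps + 7) // 8   # bytes needed for all pixels
    return int(exact * data_factor)


# bps=1 spot check: np.unpackbits yields the most significant bit first.
bits = np.unpackbits(np.array([0b10101100, 0b00001111], dtype=np.uint8))
print(bits.tolist())  # [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Under this assumption, factors below 1.0 exercise the short-buffer path and factors above 1.0 exercise surplus input.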

@github-actions bot added the performance label (PR touches performance-sensitive code) on May 12, 2026
@brendancol requested a review from Copilot on May 12, 2026 19:02
Contributor

Copilot AI left a comment

Pull request overview

This PR improves GeoTIFF decode performance by vectorizing the unpack_bits sub-byte unpacking logic in xrspatial/geotiff/_compression.py for BitsPerSample values 2, 4, and 12, replacing Python per-byte loops with NumPy strided operations while preserving the previous “written positions only” semantics for short input buffers.

Changes:

  • Vectorized unpack_bits implementations for bps=2, bps=4, and bps=12 to eliminate Python-loop overhead.
  • Added a regression/compatibility test suite that compares the new implementation against an in-test reference copy of the original loop behavior (including short-buffer corner cases).
  • Updated the internal performance sweep state record to reflect the audit finding and fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_compression.py | Replaces loop-based sub-byte unpacking for bps=2/4/12 with vectorized NumPy logic while preserving prior write/garbage semantics. |
| xrspatial/geotiff/tests/test_unpack_bits_vectorised_1713.py | Adds parametrized equivalence tests vs an inlined reference implementation plus boundary and error-path spot checks. |
| .claude/sweep-performance-state.csv | Records the audit finding (#1713) and its resolution in the performance sweep tracking file. |

Sub-byte pixel decoding for BitsPerSample in {2, 4, 12} was running
Python loops over packed bytes, which was roughly 55x slower than the
NumPy-strided equivalent on realistic tile sizes (165 ms vs 3 ms for
2M bps=4 pixels on this machine).

Vectorise the three branches with masked-and-shifted slices. The
written-position contract is preserved bit for bit: positions covered
by the input buffer match the prior loop, and positions past the
buffer are left as np.empty initial values exactly as before.
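
For the bps=12 branch, the masked-and-shifted idea might look like the sketch below. It is written under assumptions rather than copied from the commit: `unpack_12bit` is a hypothetical name, each 3-byte group is assumed to hold two pixels with the most significant bits first, and trailing bytes that do not complete a group are assumed to be ignored.

```python
import numpy as np


def unpack_12bit(data: bytes, pixel_count: int):
    """Hypothetical sketch: return (out, written); only out[:written] is defined."""
    packed = np.frombuffer(data, dtype=np.uint8)
    out = np.empty(pixel_count, dtype=np.uint16)   # tail stays uninitialised
    n_pairs = len(packed) // 3                     # complete 3-byte groups only
    # Widen to uint16 before shifting so the 12-bit values do not overflow.
    trip = packed[:3 * n_pairs].reshape(n_pairs, 3).astype(np.uint16)
    first = (trip[:, 0] << 4) | (trip[:, 1] >> 4)
    second = ((trip[:, 1] & 0x0F) << 8) | trip[:, 2]
    # Clamp to the positions that actually exist in the output.
    n_first = min(n_pairs, (pixel_count + 1) // 2)
    n_second = min(n_pairs, pixel_count // 2)
    out[0:2 * n_first:2] = first[:n_first]
    out[1:2 * n_second:2] = second[:n_second]
    return out, n_first + n_second


# Exactly 3 input bytes decode one pair: 0xABC and 0xDEF.
out, written = unpack_12bit(bytes([0xAB, 0xCD, 0xEF]), 2)
print(written, out.tolist())  # 2 [2748, 3567]
```

Using `len(packed) // 3` counts the same groups as a loop guarded by `i + 2 < len(data)` with `i` stepping by 3, which is one way to read the strict-less-than guard mentioned in the test plan.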

Includes the deep-sweep state CSV update for Pass 7 of geotiff.
@brendancol force-pushed the deep-sweep-performance-geotiff-2026-05-12-aa0e4ef9 branch from fc3d43d to fe3450b on May 12, 2026 19:29
@brendancol merged commit 1ab1d0e into main on May 12, 2026
10 of 11 checks passed
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

perf(geotiff): vectorise sub-byte unpack_bits for bps=2/4/12

2 participants