
encodings/fsst: add OptimizedDecompressor for throughput #7485

Open
joseph-isaacs wants to merge 1 commit into develop from ji/fsst-optimized-decompressor

@joseph-isaacs (Contributor)

Summary

  • Adds OptimizedDecompressor in encodings/fsst/src/decompressor.rs, replacing the default fsst-rs decompressor in the canonicalization path
  • Processes compressed codes in 32-code blocks (4×8-byte chunks) using SWAR escape detection — checks all 8 bytes of a u64 for 0xFF in a single branchless operation
  • Pre-converts the symbol table to u64 at construction time (~2.3KB total, fits in L1 cache), eliminating Symbol::to_u64() per lookup
  • Re-enters the 32-code fast path after each escape rather than permanently dropping to scalar processing
  • Runtime dispatch to a BMI1/BMI2/POPCNT-optimized codepath on x86-64
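The first four bullets can be illustrated with a minimal, portable Rust sketch. It is not the PR's OptimizedDecompressor. The type SketchDecompressor, its constructor (raw zero-padded 8-byte symbols plus lengths), and the output-buffer sizing are hypothetical, and the x86-64 dispatch is left out. The sketch assumes the standard FSST layout, where code 0xFF is the escape and each symbol is 1 to 8 bytes. With one u64 per code plus one length byte, the table takes 256 × 8 + 256 = 2,304 bytes, which is consistent with the ~2.3KB figure above.

```rust
/// FSST escape code: the byte after it is copied to the output verbatim.
const ESCAPE: u8 = 0xFF;
const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

/// Branchless SWAR test: true if any of the 8 bytes in `word` is 0xFF.
/// Inverting turns 0xFF lanes into 0x00, then the classic "has zero byte" trick applies.
fn has_escape(word: u64) -> bool {
    let v = !word;
    (v.wrapping_sub(LO) & !v & HI) != 0
}

/// Little-endian load of the first 8 bytes of `bytes`.
fn load_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Hypothetical decompressor illustrating the ideas above (not the PR's code).
pub struct SketchDecompressor {
    symbols: [u64; 256], // symbol bytes packed into u64 once, at construction
    lens: [u8; 256],     // symbol lengths (1..=8)
}

impl SketchDecompressor {
    /// `symbols[i]` holds symbol i zero-padded to 8 bytes; `lens[i]` is its length.
    pub fn new(symbols: &[[u8; 8]], lens: &[u8]) -> Self {
        assert_eq!(symbols.len(), lens.len());
        assert!(symbols.len() < 256, "code 0xFF is reserved for the escape");
        let mut table = [0u64; 256];
        let mut table_lens = [0u8; 256];
        for (i, (sym, &len)) in symbols.iter().zip(lens).enumerate() {
            assert!((1..=8).contains(&len));
            table[i] = u64::from_le_bytes(*sym);
            table_lens[i] = len;
        }
        Self { symbols: table, lens: table_lens }
    }

    /// Writes all 8 symbol bytes unconditionally, then advances by the real length.
    /// The padding is overwritten by later writes or dropped by the final truncate.
    #[inline]
    fn emit(&self, out: &mut [u8], pos: usize, code: u8) -> usize {
        out[pos..pos + 8].copy_from_slice(&self.symbols[code as usize].to_le_bytes());
        pos + self.lens[code as usize] as usize
    }

    pub fn decompress(&self, codes: &[u8]) -> Vec<u8> {
        // Each code expands to at most 8 bytes, plus 8 bytes of slack for the wide store.
        let mut out = vec![0u8; codes.len() * 8 + 8];
        let (mut i, mut pos) = (0usize, 0usize);
        while i < codes.len() {
            // Fast path: 32 codes checked as 4 u64 words; taken only if none is 0xFF.
            if i + 32 <= codes.len() {
                let block = &codes[i..i + 32];
                if !block.chunks_exact(8).any(|c| has_escape(load_u64(c))) {
                    for &code in block {
                        pos = self.emit(&mut out, pos, code);
                    }
                    i += 32;
                    continue;
                }
            }
            // Scalar path: decode until one escape is handled, then retry the fast path.
            while i < codes.len() {
                let code = codes[i];
                if code == ESCAPE {
                    out[pos] = *codes.get(i + 1).expect("truncated escape sequence");
                    pos += 1;
                    i += 2;
                    break;
                }
                pos = self.emit(&mut out, pos, code);
                i += 1;
            }
        }
        out.truncate(pos);
        out
    }
}
```

As an example, take the symbols "hello", " ", and "world" as codes 0, 1, and 2. The codes [0, 0xFF, b'!', 1, 2] then decode to "hello! world". The escape sends the decoder to the scalar path, and the outer loop retries the 32-code fast path immediately afterwards.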

Test plan

  • The unit tests in decompressor.rs cover basic decompression, escape codes, empty input, output matching the baseline decompressor on random strings and on full-byte-range inputs with many escapes, and correctness on a large corpus
  • cargo test -p vortex-fsst
  • cargo bench -p vortex-fsst --bench fsst_compress -- decompress_fsst to measure the throughput improvement
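The "matches baseline" tests follow a differential approach: compare the fast path against a straightforward scalar baseline on random inputs. The sketch below applies that idea only to the 32-code escape check, not to a full decompressor. The function names and the xorshift PRNG are hypothetical helpers chosen to avoid external crates, and the sketch is not the PR's test code.

```rust
const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

/// SWAR check of a 32-code block as four u64 words: true if any byte is 0xFF.
fn block_has_escape_swar(block: &[u8; 32]) -> bool {
    block.chunks_exact(8).any(|c| {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(c);
        let v = !u64::from_le_bytes(buf); // 0xFF bytes become 0x00
        (v.wrapping_sub(LO) & !v & HI) != 0 // "has zero byte" test
    })
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates (hypothetical helper).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Compares the SWAR check with a scalar baseline (`contains`) on `n` random blocks.
/// Returns (blocks with an escape, blocks without one); panics on any mismatch.
fn differential_check(seed: u64, n: usize) -> (usize, usize) {
    let mut rng = XorShift64(seed | 1); // xorshift state must be nonzero
    let (mut with_escape, mut without_escape) = (0, 0);
    for i in 0..n {
        let mut block = [0u8; 32];
        for chunk in block.chunks_exact_mut(8) {
            chunk.copy_from_slice(&rng.next_u64().to_le_bytes());
        }
        match i % 3 {
            // Guarantee at least one escape byte.
            0 => block[(rng.next_u64() % 32) as usize] = 0xFF,
            // Guarantee no escape bytes.
            1 => block.iter_mut().filter(|b| **b == 0xFF).for_each(|b| *b = 0xFE),
            // Leave the block purely random.
            _ => {}
        }
        let expected = block.contains(&0xFF); // scalar baseline
        assert_eq!(block_has_escape_swar(&block), expected, "mismatch on {:?}", block);
        if expected { with_escape += 1 } else { without_escape += 1 }
    }
    (with_escape, without_escape)
}
```

Forcing one third of the blocks to contain an escape and another third to contain none ensures both outcomes are exercised. Purely random bytes would produce a 0xFF byte in only about 12% of 32-byte blocks.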

🤖 Generated with Claude Code

Replaces the default fsst-rs decompressor with a hand-tuned version that
processes compressed codes in 32-code blocks using SWAR escape detection,
pre-converts symbol table entries to u64 to eliminate per-lookup conversions,
and uses runtime BMI1/BMI2/POPCNT dispatch on x86-64.

Signed-off-by: Joe Isaacs <joe@spiraldb.com>
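Runtime BMI1/BMI2/POPCNT dispatch on x86-64 is commonly written in Rust with the std `is_x86_feature_detected!` macro and a `#[target_feature(enable = "...")]` wrapper around a shared `#[inline(always)]` kernel. The sketch below shows that pattern on a hypothetical helper that finds the first escape byte in a run of codes using an exact per-byte SWAR mask and `trailing_zeros`. With BMI1 enabled, LLVM can lower `trailing_zeros` to `tzcnt`, but that codegen is up to the compiler. The function names are illustrative and are not the PR's API.

```rust
const LO7: u64 = 0x7F7F_7F7F_7F7F_7F7F;

/// Exact per-lane mask: bit 7 of a byte lane is set iff that byte of `word` is 0xFF.
#[inline(always)]
fn escape_mask(word: u64) -> u64 {
    let v = !word; // 0xFF lanes become 0x00
    // Per lane, bit 7 of ((v & 0x7F) + 0x7F) | v is set iff the lane is nonzero;
    // the addition never carries across lanes.
    !(((v & LO7).wrapping_add(LO7)) | v | LO7)
}

/// Shared kernel: index of the first 0xFF byte. When `codes` starts on a code
/// boundary, that byte is the first escape code.
#[inline(always)]
fn first_escape_kernel(codes: &[u8]) -> Option<usize> {
    let mut chunks = codes.chunks_exact(8);
    let mut base = 0usize;
    for c in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(c);
        let mask = escape_mask(u64::from_le_bytes(buf));
        if mask != 0 {
            // Lowest set bit marks the first 0xFF lane (little-endian load).
            return Some(base + (mask.trailing_zeros() / 8) as usize);
        }
        base += 8;
    }
    chunks.remainder().iter().position(|&b| b == 0xFF).map(|p| base + p)
}

/// Same kernel, compiled with BMI1/BMI2/POPCNT enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi1,bmi2,popcnt")]
unsafe fn first_escape_bmi(codes: &[u8]) -> Option<usize> {
    first_escape_kernel(codes)
}

/// Entry point: detect CPU features at runtime and pick a codepath.
pub fn first_escape(codes: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("bmi1")
            && is_x86_feature_detected!("bmi2")
            && is_x86_feature_detected!("popcnt")
        {
            // SAFETY: every feature enabled on `first_escape_bmi` was detected above.
            return unsafe { first_escape_bmi(codes) };
        }
    }
    first_escape_kernel(codes)
}
```

Keeping a single kernel means both codepaths share one definition of the logic, while the `#[target_feature]` wrapper lets the compiler use the extra instructions only when the CPU supports them. On other architectures, the portable path is the only one compiled.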
codspeed-hq bot commented Apr 16, 2026

Merging this PR will improve performance by 19.89%

⚡ 9 improved benchmarks
✅ 1154 untouched benchmarks
⏩ 1457 skipped benchmarks¹

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | take_map[(0.1, 0.5)] | 1,154.5 µs | 990.4 µs | +16.57% |
| Simulation | take_map[(0.1, 1.0)] | 2 ms | 1.7 ms | +19.08% |
| Simulation | patched_take_10k_contiguous_not_patches | 258.4 µs | 228.1 µs | +13.29% |
| Simulation | patched_take_10k_first_chunk_only | 302 µs | 271.8 µs | +11.14% |
| Simulation | patched_take_10k_dispersed | 316 µs | 285.8 µs | +10.58% |
| Simulation | patched_take_10k_random | 270.3 µs | 240 µs | +12.64% |
| Simulation | patched_take_10k_contiguous_patches | 258.1 µs | 227.7 µs | +13.32% |
| Simulation | take_10k_first_chunk_only | 270.6 µs | 225.7 µs | +19.89% |
| Simulation | take_10k_dispersed | 284.4 µs | 239.5 µs | +18.76% |

Comparing ji/fsst-optimized-decompressor (c1f032a) with develop (12f63a4)
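The Efficiency column appears to be the BASE/HEAD time ratio minus one. This reading is inferred from the numbers, not taken from CodSpeed documentation, and it reproduces the rows up to rounding of the displayed timings. The headline 19.89% equals the largest single result, from take_10k_first_chunk_only:

```latex
\text{Efficiency} = \frac{\text{BASE}}{\text{HEAD}} - 1,
\qquad
\frac{270.6\ \mu\text{s}}{225.7\ \mu\text{s}} - 1 \approx 0.1989 = +19.89\%
```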

Footnotes

  1. 1457 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@robert3005 (Contributor)

if this ends up being a perf win we should migrate this to fsst-rs imho

joseph-isaacs reopened this Apr 16, 2026
joseph-isaacs added the do not merge label (Pull requests that are not intended to merge) Apr 16, 2026
Labels: do not merge (Pull requests that are not intended to merge)
Projects: None yet
Participants: 2