encodings/fsst: add OptimizedDecompressor for throughput by joseph-isaacs · Pull Request #7485 · vortex-data/vortex

joseph-isaacs · 2026-04-16T15:37:41Z

Summary

Adds OptimizedDecompressor in encodings/fsst/src/decompressor.rs, replacing the default fsst-rs decompressor in the canonicalization path
Processes compressed codes in 32-code blocks (4×8-byte chunks) using SWAR escape detection — checks all 8 bytes of a u64 for 0xFF in a single branchless operation
Pre-converts the symbol table to u64 at construction time (~2.3KB total, fits in L1 cache), eliminating Symbol::to_u64() per lookup
Re-enters the 32-code fast path after each escape rather than permanently dropping to scalar processing
Runtime dispatch to a BMI1/BMI2/POPCNT-optimized codepath on x86-64
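The first four points above can be illustrated with a minimal sketch. This is not the code from decompressor.rs: `Table`, `escape_mask`, `decode`, and `emit` are hypothetical names, the fast path handles one 8-code chunk rather than a full 32-code block, and symbols are copied with a length-bounded slice instead of whatever store strategy the PR uses. It assumes the usual FSST convention that code 0xFF is an escape followed by one literal byte. Under this assumed layout, 256 × 8 bytes of packed symbols plus 256 length bytes comes to 2,304 bytes, which is consistent with the ~2.3KB figure quoted above.

```rust
/// Escape code: the byte that follows it is copied to the output verbatim.
const ESCAPE: u8 = 0xFF;

const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F;
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGH: u64 = 0x8080_8080_8080_8080;

/// SWAR check: bit 7 of each byte lane is set iff that lane of `word` is 0xFF.
#[inline]
fn escape_mask(word: u64) -> u64 {
    // Adding 1 to the low 7 bits carries into bit 7 only when they are all
    // ones (no carry crosses a lane); AND-ing with `word` also requires the
    // original bit 7 to be set, so only 0xFF lanes survive.
    ((word & LOW7) + ONES) & word & HIGH
}

/// Hypothetical table: symbols pre-packed as little-endian u64 plus lengths.
struct Table {
    symbols: [u64; 256],
    lengths: [u8; 256],
}

impl Table {
    /// Packs each symbol once, so lookups never convert at decode time.
    /// Assumes at most 255 symbols of 1..=8 bytes each (code 255 is ESCAPE).
    fn new(entries: &[&[u8]]) -> Self {
        assert!(entries.len() <= 255);
        let mut symbols = [0u64; 256];
        let mut lengths = [0u8; 256];
        for (code, &bytes) in entries.iter().enumerate() {
            assert!(!bytes.is_empty() && bytes.len() <= 8);
            let mut packed = [0u8; 8];
            packed[..bytes.len()].copy_from_slice(bytes);
            symbols[code] = u64::from_le_bytes(packed);
            lengths[code] = bytes.len() as u8;
        }
        Table { symbols, lengths }
    }

    /// Appends the symbol for `code` to `out`.
    #[inline]
    fn emit(&self, code: u8, out: &mut Vec<u8>) {
        let bytes = self.symbols[code as usize].to_le_bytes();
        out.extend_from_slice(&bytes[..self.lengths[code as usize] as usize]);
    }

    fn decode(&self, codes: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(codes.len() * 8);
        let mut i = 0;
        while i < codes.len() {
            // Fast path: 8 codes at once when none of them is an escape.
            if i + 8 <= codes.len() {
                let mut chunk = [0u8; 8];
                chunk.copy_from_slice(&codes[i..i + 8]);
                if escape_mask(u64::from_le_bytes(chunk)) == 0 {
                    for &code in &chunk {
                        self.emit(code, &mut out);
                    }
                    i += 8;
                    continue;
                }
            }
            // Slow path: handle one code, then loop back and retry the fast
            // path instead of staying scalar for the rest of the input.
            if codes[i] == ESCAPE {
                if let Some(&literal) = codes.get(i + 1) {
                    out.push(literal);
                }
                i += 2;
            } else {
                self.emit(codes[i], &mut out);
                i += 1;
            }
        }
        out
    }
}
```

Because `i` always sits on a code boundary, a chunk with no 0xFF byte contains only symbol codes, which is what makes the single mask test sufficient for the fast path.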

Test plan

Existing unit tests in decompressor.rs cover: basic decompression, escape codes, empty input, matches baseline on random strings, matches baseline on full-byte-range inputs with many escapes, large corpus correctness
cargo test -p vortex-fsst
cargo bench -p vortex-fsst --bench fsst_compress -- decompress_fsst to observe throughput improvement

🤖 Generated with Claude Code

Replaces the default fsst-rs decompressor with a hand-tuned version that processes compressed codes in 32-code blocks using SWAR escape detection, pre-converts symbol table entries to u64 to eliminate per-lookup conversions, and uses runtime BMI1/BMI2/POPCNT dispatch on x86-64.

Signed-off-by: Joe Isaacs <joe@spiraldb.com>
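The runtime dispatch mentioned in the commit message could look roughly like the sketch below. It is an assumption about the general pattern, not the PR's code: `count_escapes`, `count_escapes_generic`, and `count_escapes_bmi` are hypothetical stand-ins for the real decompression routines, chosen only to keep the example small. It relies on the standard `is_x86_feature_detected!` macro and the `#[target_feature]` attribute.

```rust
/// Portable fallback: counts escape bytes (0xFF) in a code stream.
#[inline]
fn count_escapes_generic(codes: &[u8]) -> usize {
    codes.iter().filter(|&&code| code == 0xFF).count()
}

/// Same logic, compiled with BMI1/BMI2/POPCNT enabled so the compiler is
/// allowed to use those instructions inside this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi1,bmi2,popcnt")]
unsafe fn count_escapes_bmi(codes: &[u8]) -> usize {
    count_escapes_generic(codes)
}

/// Picks the specialised version at runtime when the CPU supports it.
pub fn count_escapes(codes: &[u8]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("bmi1")
            && is_x86_feature_detected!("bmi2")
            && is_x86_feature_detected!("popcnt")
        {
            // SAFETY: the required CPU features were verified just above.
            return unsafe { count_escapes_bmi(codes) };
        }
    }
    count_escapes_generic(codes)
}
```

On other architectures the `cfg` block compiles away and only the portable path remains, so callers see a single safe entry point either way.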

codspeed-hq · 2026-04-16T15:42:08Z

Merging this PR will improve performance by 19.89%

⚡ 9 improved benchmarks
✅ 1154 untouched benchmarks
⏩ 1457 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Simulation	`take_map[(0.1, 0.5)]`	1,154.5 µs	990.4 µs	+16.57%
⚡	Simulation	`take_map[(0.1, 1.0)]`	2 ms	1.7 ms	+19.08%
⚡	Simulation	`patched_take_10k_contiguous_not_patches`	258.4 µs	228.1 µs	+13.29%
⚡	Simulation	`patched_take_10k_first_chunk_only`	302 µs	271.8 µs	+11.14%
⚡	Simulation	`patched_take_10k_dispersed`	316 µs	285.8 µs	+10.58%
⚡	Simulation	`patched_take_10k_random`	270.3 µs	240 µs	+12.64%
⚡	Simulation	`patched_take_10k_contiguous_patches`	258.1 µs	227.7 µs	+13.32%
⚡	Simulation	`take_10k_first_chunk_only`	270.6 µs	225.7 µs	+19.89%
⚡	Simulation	`take_10k_dispersed`	284.4 µs	239.5 µs	+18.76%

Comparing ji/fsst-optimized-decompressor (c1f032a) with develop (12f63a4)

¹ 1457 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

robert3005 · 2026-04-16T15:42:30Z

if this ends up being a perf win we should migrate this to fsst-rs imho

joseph-isaacs closed this Apr 16, 2026

joseph-isaacs reopened this Apr 16, 2026

joseph-isaacs added the "do not merge" label (pull requests that are not intended to merge) Apr 16, 2026

joseph-isaacs wants to merge 1 commit into develop from ji/fsst-optimized-decompressor
