Open

perf(fastlanes): fuse bit-packed compare into a transposed mask + untranspose #8239

CodSpeed HQ / CodSpeed Performance Analysis failed Jun 3, 2026 in 0s

26 benchmarks regressed

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 240 improved benchmarks
❌ 26 regressed benchmarks
✅ 1241 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | pushdown_compare[(1000, 16, 4)] | 141.6 µs | 345.8 µs | -59.04% |
| Simulation | pushdown_compare[(1000, 4, 4)] | 142.5 µs | 345.3 µs | -58.74% |
| Simulation | pushdown_compare[(1000, 64, 4)] | 142.7 µs | 345 µs | -58.64% |
| Simulation | pushdown_compare[(1000, 4, 8)] | 145.9 µs | 349.6 µs | -58.27% |
| Simulation | pushdown_compare[(1000, 64, 8)] | 148 µs | 351.3 µs | -57.89% |
| Simulation | pushdown_compare[(1000, 16, 8)] | 154.4 µs | 357.3 µs | -56.78% |
| Simulation | pushdown_compare[(10000, 64, 4)] | 214.2 µs | 417 µs | -48.64% |
| Simulation | pushdown_compare[(10000, 64, 8)] | 221.6 µs | 424.2 µs | -47.75% |
| Simulation | pushdown_compare[(10000, 4, 4)] | 221.2 µs | 418.1 µs | -47.1% |
| Simulation | pushdown_compare[(10000, 16, 4)] | 221.4 µs | 418.3 µs | -47.08% |
| Simulation | pushdown_compare[(10000, 4, 8)] | 227.1 µs | 423.6 µs | -46.39% |
| Simulation | pushdown_compare[(10000, 16, 8)] | 263.8 µs | 459.6 µs | -42.61% |
| Simulation | eq_pushdown_low_match | 955.2 µs | 1,152.4 µs | -17.12% |
| Simulation | eq_pushdown_high_match | 1.1 ms | 1.2 ms | -15.7% |
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 298.8 µs | 350.9 µs | -14.84% |
| Simulation | decompress_fsst[(10000, 16, 4)] | 509.3 µs | 579.9 µs | -12.17% |
| Simulation | fsst_decompress_string | 3.1 ms | 3.5 ms | -11.95% |
| Simulation | chunked_into_canonical[(10, 10000, 16, 4)] | 5.2 ms | 5.9 ms | -11.93% |
| Simulation | chunked_canonicalize_into[(10, 10000, 16, 4)] | 5.2 ms | 5.9 ms | -11.89% |
| Simulation | decompress_fsst[(10000, 16, 8)] | 561.9 µs | 631.9 µs | -11.07% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
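
The report does not say how the Efficiency column is computed. The listed values look consistent with `BASE / HEAD - 1`, expressed as a percentage; that formula is an assumption inferred from the numbers, not something stated here. A minimal Python sketch that checks it against a few of the rows (the function name `efficiency_pct` is hypothetical):

```python
def efficiency_pct(base: float, head: float) -> float:
    """Relative change in percent, ASSUMING efficiency = BASE / HEAD - 1.

    This formula is inferred from the table, not taken from CodSpeed docs.
    Both arguments must use the same time unit.
    """
    return (base / head - 1.0) * 100.0


# (BASE in µs, HEAD in µs, reported efficiency in %), copied from the table.
rows = {
    "pushdown_compare[(1000, 16, 4)]": (141.6, 345.8, -59.04),
    "pushdown_compare[(10000, 16, 8)]": (263.8, 459.6, -42.61),
    "eq_pushdown_low_match": (955.2, 1152.4, -17.12),
    "cuda/bitpacked_u8/unpack/3bw[100M]": (298.8, 350.9, -14.84),
}

for name, (base, head, reported) in rows.items():
    computed = efficiency_pct(base, head)
    # Displayed timings are rounded, so expect small differences.
    print(f"{name}: computed {computed:.2f}%, reported {reported:.2f}%")
```

The computed values should agree with the reported ones to within rounding of the displayed timings. Rows shown in ms (for example 1.1 ms vs 1.2 ms) are rounded too coarsely to reproduce their reported percentages this way.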

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/confident-hamilton-mZIEo (48da899) with claude/confident-hamilton-mZIEo-benches (10939a6)
