perf(fastlanes): fuse bit-packed compare into a transposed mask + untranspose #8239
+182
−12
CodSpeed HQ / CodSpeed Performance Analysis
failed
Jun 3, 2026 in 0s
26 benchmarks regressed
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
⚡ 240 improved benchmarks
❌ 26 regressed benchmarks
✅ 1241 untouched benchmarks
Warning
Please fix the performance issues or acknowledge them on CodSpeed.
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | pushdown_compare[(1000, 16, 4)] | 141.6 µs | 345.8 µs | -59.04% |
| ❌ | Simulation | pushdown_compare[(1000, 4, 4)] | 142.5 µs | 345.3 µs | -58.74% |
| ❌ | Simulation | pushdown_compare[(1000, 64, 4)] | 142.7 µs | 345 µs | -58.64% |
| ❌ | Simulation | pushdown_compare[(1000, 4, 8)] | 145.9 µs | 349.6 µs | -58.27% |
| ❌ | Simulation | pushdown_compare[(1000, 64, 8)] | 148 µs | 351.3 µs | -57.89% |
| ❌ | Simulation | pushdown_compare[(1000, 16, 8)] | 154.4 µs | 357.3 µs | -56.78% |
| ❌ | Simulation | pushdown_compare[(10000, 64, 4)] | 214.2 µs | 417 µs | -48.64% |
| ❌ | Simulation | pushdown_compare[(10000, 64, 8)] | 221.6 µs | 424.2 µs | -47.75% |
| ❌ | Simulation | pushdown_compare[(10000, 4, 4)] | 221.2 µs | 418.1 µs | -47.1% |
| ❌ | Simulation | pushdown_compare[(10000, 16, 4)] | 221.4 µs | 418.3 µs | -47.08% |
| ❌ | Simulation | pushdown_compare[(10000, 4, 8)] | 227.1 µs | 423.6 µs | -46.39% |
| ❌ | Simulation | pushdown_compare[(10000, 16, 8)] | 263.8 µs | 459.6 µs | -42.61% |
| ❌ | Simulation | eq_pushdown_low_match | 955.2 µs | 1,152.4 µs | -17.12% |
| ❌ | Simulation | eq_pushdown_high_match | 1.1 ms | 1.2 ms | -15.7% |
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 298.8 µs | 350.9 µs | -14.84% |
| ❌ | Simulation | decompress_fsst[(10000, 16, 4)] | 509.3 µs | 579.9 µs | -12.17% |
| ❌ | Simulation | fsst_decompress_string | 3.1 ms | 3.5 ms | -11.95% |
| ❌ | Simulation | chunked_into_canonical[(10, 10000, 16, 4)] | 5.2 ms | 5.9 ms | -11.93% |
| ❌ | Simulation | chunked_canonicalize_into[(10, 10000, 16, 4)] | 5.2 ms | 5.9 ms | -11.89% |
| ❌ | Simulation | decompress_fsst[(10000, 16, 8)] | 561.9 µs | 631.9 µs | -11.07% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
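For reading the Efficiency column: the displayed values look consistent with `BASE / HEAD - 1`, expressed as a percentage. This is inferred from the numbers in the table, not taken from CodSpeed documentation. The helper name `efficiency_change` below is hypothetical, and the small differences in the last digit are presumably because the report computes from unrounded timings. A minimal sketch that recomputes a few rows:

```python
def efficiency_change(base: float, head: float) -> float:
    """Relative change in percent, ASSUMING the formula base / head - 1.

    Negative values mean HEAD is slower than BASE (a regression).
    Both arguments must be in the same unit (microseconds here).
    """
    return (base / head - 1.0) * 100.0


# (BASE, HEAD) timings in microseconds, copied from the table above.
rows = {
    "pushdown_compare[(1000, 16, 4)]": (141.6, 345.8),
    "eq_pushdown_low_match": (955.2, 1152.4),
    "decompress_fsst[(10000, 16, 4)]": (509.3, 579.9),
}

for name, (base, head) in rows.items():
    print(f"{name}: {efficiency_change(base, head):.2f}%")
```

Under this reading, the `pushdown_compare` rows at roughly -59% mean HEAD takes about 2.4 times as long as BASE, not that it is 59% slower in wall time.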
Tip
Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/confident-hamilton-mZIEo (48da899) with claude/confident-hamilton-mZIEo-benches (10939a6)