Thread scope dtype through stats rewrites #7958
Performance Regression: -14.75%
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
⚠️ Different runtime environments detected
Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.
⚡ 10 improved benchmarks
❌ 69 regressed benchmarks
✅ 1137 untouched benchmarks
⏩ 5 skipped benchmarks [1]
Warning
Please fix the performance issues or acknowledge them on CodSpeed.
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.5 µs | 353.9 µs | -15.36% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 308.1 µs | 400.7 µs | -23.11% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 197.9 µs | 284.7 µs | -30.49% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(10, 1000)] | 1.9 ms | 2.2 ms | -12.45% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 358.6 µs | 461.6 µs | -22.31% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 212.1 µs | 300.6 µs | -29.44% |
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(100, 100)] | 411.4 µs | 501.5 µs | -17.96% |
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 188.2 µs | 307.3 µs | -38.78% |
| ❌ | Simulation | chunked_varbinview_opt_into_canonical[(100, 100)] | 467.2 µs | 565.2 µs | -17.34% |
| ❌ | Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 240.4 µs | 324.7 µs | -25.95% |
| ❌ | Simulation | encode_primitives[u8, (10000, 2)] | 313.9 µs | 358.6 µs | -12.45% |
| ❌ | Simulation | encode_primitives[u8, (10000, 32)] | 318.4 µs | 360.8 µs | -11.75% |
| ❌ | Simulation | encode_primitives[u8, (10000, 4)] | 314.3 µs | 358 µs | -12.2% |
| ❌ | Simulation | encode_primitives[u8, (10000, 512)] | 335.2 µs | 377.1 µs | -11.11% |
| ❌ | Simulation | encode_primitives[u8, (10000, 8)] | 315.2 µs | 358.1 µs | -11.98% |
| ❌ | Simulation | varbinview_large | 130.1 µs | 174.5 µs | -25.46% |
| ❌ | Simulation | execute_scalar_struct_simple | 407.6 µs | 464.3 µs | -12.2% |
| ❌ | Simulation | binary_search_vortex | 485 ns | 727.2 ns | -33.31% |
| ❌ | Simulation | take_search[(0.005, 0.05)] | 131 µs | 168.5 µs | -22.26% |
| ❌ | Simulation | take_search[(0.005, 0.1)] | 246.6 µs | 320.5 µs | -23.07% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
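For anyone sanity-checking the numbers: the Efficiency column in the displayed rows appears consistent with `BASE / HEAD - 1`, expressed as a percentage. This is inferred from the table values only, not taken from CodSpeed documentation, and `efficiency_change` is a hypothetical helper name. Because the report shows rounded timings, recomputed values can differ from the reported ones in the last digits.

```python
def efficiency_change(base: float, head: float) -> float:
    """Relative change in percent, ASSUMING the column is base / head - 1.

    Both arguments must be in the same time unit. Negative means HEAD is slower.
    """
    return (base / head - 1.0) * 100.0


# (BASE, HEAD) pairs in µs, copied from rows of the report table.
rows = {
    "chunked_varbinview_canonical_into[(100, 100)]": (308.1, 400.7),
    "chunked_varbinview_canonical_into[(1000, 10)]": (197.9, 284.7),
    "encode_primitives[u8, (10000, 2)]": (313.9, 358.6),
}

for name, (base, head) in rows.items():
    print(f"{name}: {efficiency_change(base, head):.2f}%")
```

Under this reading, a value of -30% means the benchmark runs at roughly 70% of its baseline speed, not that it takes 30% longer.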
Tip
Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/stats-7707/typed-stats-rewrite-api (dd555d9) with ngates/stats-7707/min-max-aggregate-fns (30b42c6)
Footnotes
1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.