Introduce basic binary compressor schemes and refactor some other bits #8153
Polar Signals Profiling Results (latest run)
Powered by Polar Signals Cloud
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.013x ➖
datafusion / vortex-file-compressed (1.013x ➖, 1↑ 2↓)
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.000x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.991x ➖, 0↑ 0↓)
datafusion / parquet (0.957x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.937x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.967x ➖, 0↑ 0↓)
duckdb / parquet (0.953x ➖, 1↑ 0↓)
No file size changes detected.
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.1 µs | 307.8 µs | -11.27% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 225.4 µs | 188.1 µs | +19.84% |
| ❌ | Simulation | encode_varbin[(1000, 32)] | 147.3 µs | 164.8 µs | -10.63% |
| ❌ | Simulation | encode_varbin[(1000, 2)] | 140.5 µs | 157 µs | -10.51% |
| ❌ | Simulation | encode_varbin[(1000, 4)] | 141.5 µs | 158.4 µs | -10.71% |
| ❌ | Simulation | encode_varbin[(1000, 8)] | 142.3 µs | 158.6 µs | -10.3% |
Comparing adamg/binary-compressor (f105d74) with develop (7e8f23a)
Benchmarks: TPC-H SF=1 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.012x ➖, 0↑ 0↓)
datafusion / parquet (1.000x ➖, 2↑ 2↓)
datafusion / arrow (1.003x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.989x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.006x ➖, 0↑ 0↓)
duckdb / parquet (1.032x ➖, 0↑ 4↓)
duckdb / duckdb (1.001x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.005x ➖, 1↑ 3↓)
datafusion / vortex-compact (1.005x ➖, 0↑ 3↓)
datafusion / parquet (0.989x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.996x ➖, 1↑ 4↓)
duckdb / vortex-compact (0.996x ➖, 2↑ 0↓)
duckdb / parquet (1.000x ➖, 2↑ 0↓)
duckdb / duckdb (0.993x ➖, 1↑ 0↓)
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.837x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.962x ➖, 1↑ 0↓)
datafusion / parquet (0.912x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.910x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.973x ➖, 0↑ 0↓)
duckdb / parquet (0.982x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.001x ➖, 0↑ 0↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: Random Access
Vortex (geomean): 0.937x ➖
unknown / unknown (0.999x ➖, 4↑ 1↓)
Benchmarks: TPC-H SF=10 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.997x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.996x ➖, 0↑ 0↓)
datafusion / parquet (0.993x ➖, 0↑ 0↓)
datafusion / arrow (0.987x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.993x ➖, 0↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 1↓)
duckdb / duckdb (0.991x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: ClickBench on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.066x ➖, 1↑ 11↓)
datafusion / parquet (1.062x ➖, 0↑ 7↓)
duckdb / vortex-file-compressed (1.083x ➖, 1↑ 11↓)
duckdb / parquet (1.031x ➖, 0↑ 1↓)
duckdb / duckdb (0.997x ➖, 2↑ 1↓)
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Appian on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.997x ➖, 0↑ 0↓)
datafusion / parquet (0.987x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
duckdb / duckdb (0.992x ➖, 0↑ 0↓)
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.802x ➖, 7↑ 0↓)
datafusion / vortex-compact (0.846x ➖, 2↑ 1↓)
datafusion / parquet (0.832x ➖, 4↑ 0↓)
duckdb / vortex-file-compressed (1.048x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.970x ➖, 0↑ 1↓)
duckdb / parquet (1.012x ➖, 0↑ 0↓)
Benchmarks: Compression
Vortex (geomean): 0.996x ➖
unknown / unknown (0.994x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.060x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.114x ➖, 0↑ 4↓)
datafusion / parquet (0.986x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 1↓)
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Branch updated from f7b2b1c to f105d74
Summary
The important piece of this PR is the new `BinaryDictScheme` and `BinaryConstantScheme`. The PR also includes a number of simplifications in the compressor: types are moved into the files that actually implement their functionality, and some unnecessary utility functions are removed.
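The summary names the two new schemes but does not show how they work. As a rough illustration only, here is a standalone sketch of the general techniques the names suggest: dictionary encoding (store each distinct byte string once, plus a small integer code per row) and constant encoding (store a single value when every row is identical). Everything here is an assumption: `DictEncoded`, `dict_encode`, `constant_value`, `choose`, and the 50% distinct-ratio threshold are hypothetical and are not Vortex's API, its `BinaryDictScheme`/`BinaryConstantScheme` code, or its scheme-selection logic.

```rust
use std::collections::HashMap;

/// Hypothetical output of dictionary-encoding a column of byte strings:
/// each distinct value is stored once in `values`, and `codes[i]` is the
/// index into `values` of the i-th input row.
#[derive(Debug, PartialEq)]
pub struct DictEncoded {
    pub values: Vec<Vec<u8>>,
    pub codes: Vec<u32>,
}

/// Dictionary encoding: distinct values get codes in order of first appearance.
pub fn dict_encode(input: &[&[u8]]) -> DictEncoded {
    let mut lookup: HashMap<&[u8], u32> = HashMap::new();
    let mut values: Vec<Vec<u8>> = Vec::new();
    let mut codes = Vec::with_capacity(input.len());
    for &row in input {
        // Reuse the code of a value seen before, or append a new dictionary entry.
        let code = *lookup.entry(row).or_insert_with(|| {
            values.push(row.to_vec());
            (values.len() - 1) as u32
        });
        codes.push(code);
    }
    DictEncoded { values, codes }
}

/// Constant encoding: if every row equals the first, one value suffices.
/// Returns `None` for empty input or when any row differs.
pub fn constant_value<'a>(input: &[&'a [u8]]) -> Option<&'a [u8]> {
    let (first, rest) = input.split_first()?;
    rest.iter().all(|row| row == first).then_some(*first)
}

#[derive(Debug, PartialEq)]
pub enum Choice {
    Constant,
    Dict,
    Uncompressed,
}

/// Hypothetical selection rule: prefer constant, then dictionary when at most
/// half of the rows are distinct (the 50% threshold is an arbitrary assumption).
pub fn choose(input: &[&[u8]]) -> Choice {
    if input.is_empty() {
        return Choice::Uncompressed;
    }
    if constant_value(input).is_some() {
        return Choice::Constant;
    }
    let distinct = dict_encode(input).values.len();
    if distinct * 2 <= input.len() {
        Choice::Dict
    } else {
        Choice::Uncompressed
    }
}
```

For example, the rows `["a", "bb", "a", "a"]` encode to the dictionary `["a", "bb"]` with codes `[0, 1, 0, 0]`, so repeated values cost one small integer each instead of a full copy of the bytes. A real compressor would weigh estimated encoded sizes rather than a fixed distinct-ratio cutoff; the fixed threshold just keeps the sketch short.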