Use Vec<u64> instead of BTreeSet for splits #8194
853709e to 9216741
Polar Signals Profiling Results
Benchmarks: PolarSignals Profiling. Vortex (geomean): 1.014x ➖
datafusion / vortex-file-compressed (1.014x ➖, 1↑ 1↓)
File Size Changes (351 files changed, -98.0% overall, 1↑ 350↓)
Benchmarks: FineWeb NVMe. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.031x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.977x ➖, 1↑ 0↓)
datafusion / parquet (0.982x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.982x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.000x ➖, 0↑ 1↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
File Size Changes (347 files changed, -91.1% overall, 0↑ 347↓)
Benchmarks: TPC-H SF=1 on NVME. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.011x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.002x ➖, 0↑ 0↓)
datafusion / parquet (1.008x ➖, 1↑ 2↓)
datafusion / arrow (1.015x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.012x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.024x ➖, 0↑ 0↓)
duckdb / parquet (1.003x ➖, 1↑ 1↓)
duckdb / duckdb (1.021x ➖, 0↑ 0↓)
File Size Changes (331 files changed, -98.7% overall, 0↑ 331↓)
Benchmarks: TPC-DS SF=1 on NVME. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.981x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.987x ➖, 2↑ 1↓)
datafusion / parquet (1.006x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (0.995x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.991x ➖, 3↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 1↓)
duckdb / duckdb (0.997x ➖, 1↑ 3↓)
File Size Changes (301 files changed, -98.6% overall, 0↑ 301↓)
Benchmarks: FineWeb S3. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.023x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.041x ➖, 0↑ 1↓)
datafusion / parquet (1.032x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.023x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.071x ➖, 0↑ 0↓)
duckdb / parquet (1.078x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics. Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.001x ➖, 0↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
File Size Changes (347 files changed, -91.5% overall, 0↑ 347↓)
Benchmarks: Random Access. Vortex (geomean): 0.905x ➖
unknown / unknown (0.978x ➖, 9↑ 1↓)
Benchmarks: TPC-H SF=10 on NVME. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.854x ✅, 21↑ 0↓)
datafusion / vortex-compact (0.850x ✅, 22↑ 0↓)
datafusion / parquet (0.879x ✅, 16↑ 0↓)
datafusion / arrow (0.850x ✅, 21↑ 0↓)
duckdb / vortex-file-compressed (0.879x ✅, 18↑ 0↓)
duckdb / vortex-compact (0.888x ✅, 15↑ 0↓)
duckdb / parquet (0.934x ➖, 2↑ 0↓)
duckdb / duckdb (0.906x ➖, 7↑ 0↓)
File Size Changes (301 files changed, -86.3% overall, 0↑ 301↓)
Benchmarks: Clickbench on NVME. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.080x ➖, 2↑ 25↓)
datafusion / parquet (1.064x ➖, 0↑ 11↓)
duckdb / vortex-file-compressed (1.011x ➖, 7↑ 3↓)
duckdb / parquet (1.031x ➖, 1↑ 1↓)
duckdb / duckdb (1.034x ➖, 0↑ 1↓)
File Size Changes (348 files changed, -38.1% overall, 102↑ 246↓)
Benchmarks: TPC-H SF=1 on S3. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.077x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.121x ➖, 0↑ 5↓)
datafusion / parquet (1.024x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.999x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.959x ➖, 0↑ 0↓)
duckdb / parquet (1.019x ➖, 0↑ 0↓)
Benchmarks: Appian on NVME. Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.918x ➖, 3↑ 0↓)
datafusion / parquet (0.922x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.978x ➖, 0↑ 0↓)
duckdb / parquet (0.959x ➖, 0↑ 0↓)
duckdb / duckdb (0.963x ➖, 0↑ 0↓)
File Size Changes (344 files changed, -97.7% overall, 12↑ 332↓)
Benchmarks: Compression. Vortex (geomean): 1.003x ➖
unknown / unknown (1.016x ➖, 0↑ 10↓)
Benchmarks: TPC-H SF=10 on S3. Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.013x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.931x ➖, 1↑ 1↓)
datafusion / parquet (1.090x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.925x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.913x ➖, 0↑ 0↓)
duckdb / parquet (0.900x ➖, 0↑ 0↓)
9216741 to 4381d5d
Signed-off-by: Mikhail Kot <mikhail@spiraldb.com>
4381d5d to 5bcffde
Co-authored-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Mikhail Kot <to@myrrc.dev>
```rust
pub struct RowSplits(Vec<u64>);

impl RowSplits {
    /// Add row to splits
    pub fn push(&mut self, row: u64) {
        self.0.push(row);
    }

    /// Reserve space for "additional" elements
    pub fn reserve(&mut self, additional: usize) {
        // (the rest of this diff hunk is not shown in the review excerpt)
    }
}
```
Can you put this in its own file?
Most uses of the BTreeSet of splits are iterations, which are faster over a Vec because the number of splits is usually under 100. The change has the biggest impact on random access.
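A minimal sketch of that trade-off, assuming callers still need the sorted, de-duplicated view that a `BTreeSet<u64>` guaranteed. The helpers `normalize_splits`, `split_ranges`, and `range_for_row` are hypothetical and not part of the PR. They show that one sort plus dedup recovers the set's ordering, that iteration becomes a linear scan over contiguous memory, and that a point lookup (as in random access) can binary-search the Vec with `partition_point`.

```rust
/// Hypothetical helper: turn unordered split points into the sorted,
/// de-duplicated sequence a `BTreeSet<u64>` would have iterated.
pub fn normalize_splits(mut splits: Vec<u64>) -> Vec<u64> {
    splits.sort_unstable();
    splits.dedup(); // removes consecutive duplicates, so sort first
    splits
}

/// Hypothetical helper: half-open row ranges between consecutive splits.
/// Walking a contiguous Vec is a linear, cache-friendly scan.
pub fn split_ranges(splits: &[u64]) -> Vec<(u64, u64)> {
    splits.windows(2).map(|w| (w[0], w[1])).collect()
}

/// Hypothetical helper: index of the range `[splits[i], splits[i + 1])`
/// containing `row`, found by binary search over the sorted Vec.
/// Returns None if `row` lies before the first split or at/after the last.
pub fn range_for_row(splits: &[u64], row: u64) -> Option<usize> {
    // Number of split points that are <= row.
    let idx = splits.partition_point(|&s| s <= row);
    if idx == 0 || idx >= splits.len() {
        None
    } else {
        Some(idx - 1)
    }
}
```

For a few dozen split points the one-time `sort_unstable` and `dedup` are cheap, and every later pass touches a single allocation instead of following B-tree node pointers.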
Extract the Vec into a separate type (RowSplits) so that further changes are not API breaks.
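A sketch of how such a newtype keeps its storage private. Only `RowSplits`, `push`, and `reserve` appear in the diff excerpt; the `reserve` body and the `len`, `is_empty`, `iter`, and `Extend` items are assumptions added for illustration. Because the tuple field is not `pub`, callers can only go through these methods, so the inner `Vec<u64>` could later be swapped for another representation without changing the public API.

```rust
/// Newtype over the split points. The field is private, so the backing
/// storage is an implementation detail callers cannot depend on.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RowSplits(Vec<u64>);

impl RowSplits {
    /// Append a row index to the splits.
    pub fn push(&mut self, row: u64) {
        self.0.push(row);
    }

    /// Reserve space for `additional` more split points (assumed body).
    pub fn reserve(&mut self, additional: usize) {
        self.0.reserve(additional);
    }

    /// Hypothetical: number of split points stored.
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Hypothetical: whether no split points are stored.
    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Hypothetical: iterate split points by value without exposing the Vec.
    pub fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        self.0.iter().copied()
    }
}

/// Hypothetical: lets callers extend the splits from any iterator of rows.
impl Extend<u64> for RowSplits {
    fn extend<I: IntoIterator<Item = u64>>(&mut self, iter: I) {
        self.0.extend(iter);
    }
}
```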
Unrelated change: don't append chunked reader IDs with format!(), saving 5% of CPU time.
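The chunked reader code is not shown here, so this is only an illustration under an assumed naming scheme (`<parent>.<index>`) with hypothetical functions `child_ids_with_format` and `for_each_child_id`; it is not necessarily what the PR does. It contrasts allocating a fresh `String` per child with `format!` against reusing one buffer through `std::fmt::Write`.

```rust
use std::fmt::Write;

/// Hypothetical baseline: one new heap-allocated String per child id.
pub fn child_ids_with_format(parent: &str, n: usize) -> Vec<String> {
    (0..n).map(|i| format!("{parent}.{i}")).collect()
}

/// Hypothetical alternative: write the "<parent>." prefix once, then append
/// each index into the same buffer and truncate back to the prefix.
/// `f` borrows each id as &str, so no per-child String is allocated.
pub fn for_each_child_id(parent: &str, n: usize, mut f: impl FnMut(&str)) {
    // Capacity hint: prefix, '.', and up to 20 digits for a 64-bit index.
    let mut buf = String::with_capacity(parent.len() + 1 + 20);
    buf.push_str(parent);
    buf.push('.');
    let prefix_len = buf.len();
    for i in 0..n {
        // Writing into a String never fails, so the Result can be ignored.
        let _ = write!(buf, "{i}");
        f(&buf);
        buf.truncate(prefix_len);
    }
}
```

Whether this matters depends on how often the IDs are built; when they are only needed transiently, borrowing from a reused buffer avoids one allocation per child.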