Use layout file splits when DF re-partitions individual files #7591
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.019x ➖
datafusion / vortex-file-compressed (1.019x ➖, 3↑ 6↓)

File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.093x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.865x ✅, 7↑ 2↓)
datafusion / parquet (1.068x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.088x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.037x ➖, 0↑ 0↓)
duckdb / parquet (1.083x ➖, 0↑ 3↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.085x ➖, 0↑ 9↓)
datafusion / vortex-compact (1.019x ➖, 0↑ 0↓)
datafusion / parquet (1.004x ➖, 0↑ 0↓)
datafusion / arrow (0.974x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.001x ➖, 0↑ 0↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
duckdb / duckdb (1.002x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.906x ➖, 51↑ 2↓)
datafusion / vortex-compact (0.832x ✅, 83↑ 0↓)
datafusion / parquet (0.872x ✅, 79↑ 0↓)
duckdb / vortex-file-compressed (0.879x ✅, 72↑ 0↓)
duckdb / vortex-compact (0.902x ➖, 47↑ 1↓)
duckdb / parquet (0.913x ➖, 37↑ 0↓)
duckdb / duckdb (0.881x ✅, 66↑ 0↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.938x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.649x ✅, 4↑ 0↓)
datafusion / parquet (0.824x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.911x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.907x ➖, 0↑ 0↓)
duckdb / parquet (0.973x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.763x ✅, 20↑ 0↓)
datafusion / vortex-compact (0.931x ➖, 8↑ 0↓)
datafusion / parquet (0.798x ✅, 19↑ 0↓)
datafusion / arrow (1.006x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.006x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.005x ➖, 0↑ 0↓)
duckdb / parquet (1.004x ➖, 0↑ 0↓)
duckdb / duckdb (1.000x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.972x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.992x ➖, 0↑ 0↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.119x ➖, 0↑ 5↓)
datafusion / vortex-compact (0.774x ➖, 5↑ 0↓)
datafusion / parquet (0.900x ➖, 4↑ 1↓)
duckdb / vortex-file-compressed (0.913x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.939x ➖, 0↑ 0↓)
duckdb / parquet (0.916x ➖, 0↑ 0↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.111x ❌, 6↑ 16↓)
datafusion / parquet (0.959x ➖, 8↑ 0↓)
duckdb / vortex-file-compressed (0.947x ➖, 7↑ 0↓)
duckdb / parquet (0.986x ➖, 0↑ 1↓)
duckdb / duckdb (0.959x ➖, 3↑ 0↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
@joseph-isaacs do you expect the full benchmarks to behave differently than the SQL ones?
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.763x ➖, 6↑ 0↓)
datafusion / vortex-compact (0.927x ➖, 1↑ 1↓)
datafusion / parquet (0.919x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.969x ➖, 0↑ 0↓)
duckdb / parquet (0.952x ➖, 0↑ 0↓)
if split_points.first().copied() != Some(0) {
    split_points.insert(0, 0);
}
I thought this was always the case.
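The guard in this hunk prepends row 0 when the first split point is missing. As a hedged illustration of why such a guard matters, the minimal Rust sketch below assumes split points are row offsets and turns them into contiguous, half-open row ranges. `normalize_split_points` and `split_ranges` are hypothetical names made up for this example; they are not functions from the PR or from Vortex.

```rust
use std::ops::Range;

/// Hypothetical helper (not from the PR): normalize split points, assumed to be
/// row offsets, into a sorted, deduplicated list that starts at 0 and ends at `row_count`.
fn normalize_split_points(mut split_points: Vec<u64>, row_count: u64) -> Vec<u64> {
    // Drop offsets that lie past the end of the file.
    split_points.retain(|&p| p <= row_count);
    split_points.sort_unstable();
    split_points.dedup();
    // Same guard as in the hunk: the first range must start at row 0.
    if split_points.first().copied() != Some(0) {
        split_points.insert(0, 0);
    }
    // Close the last range at the end of the file.
    if split_points.last().copied() != Some(row_count) {
        split_points.push(row_count);
    }
    split_points
}

/// Hypothetical helper: turn consecutive split points into half-open row ranges.
fn split_ranges(split_points: &[u64]) -> Vec<Range<u64>> {
    split_points.windows(2).map(|w| w[0]..w[1]).collect()
}

fn main() {
    let points = normalize_split_points(vec![100, 50, 100], 150);
    let ranges = split_ranges(&points);
    println!("{:?}", ranges);
}
```

In this sketch, dropping the leading 0 would make `split_ranges` start at the first split, so the rows before it would silently fall out of every range.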
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
7faa281 to 633f8d1
Summary
Try to recapture some of the performance we lost in #7591 by doing the extra work only when it's actually required.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Summary
When DataFusion re-partitions an individual file, align the resulting splits with the file's layout splits instead of splitting at arbitrary points, so Vortex can make better use of its internal pruning and other layout-aware behavior.
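The idea in the summary can be sketched as follows. This is a minimal, hypothetical Rust illustration, not the PR's implementation. It assumes the layout split points are already available as sorted row offsets running from 0 to the file's row count, and it ignores how DataFusion's byte-range partitions would be mapped onto rows. `partition_on_layout_splits` and its greedy boundary rule are assumptions made for this example.

```rust
use std::ops::Range;

/// Hypothetical sketch: group consecutive layout chunks (given as sorted row
/// offsets `split_points`, first = 0, last = row count) into at most
/// `target_partitions` ranges of roughly equal row count. Every boundary is an
/// existing layout split, so no layout chunk is cut in half.
fn partition_on_layout_splits(split_points: &[u64], target_partitions: usize) -> Vec<Range<u64>> {
    let (Some(&start), Some(&end)) = (split_points.first(), split_points.last()) else {
        return Vec::new();
    };
    // Skip the alignment work when the file is not actually being subdivided.
    if target_partitions <= 1 || split_points.len() <= 2 {
        return vec![start..end];
    }
    let total = end - start;
    let mut ranges = Vec::with_capacity(target_partitions);
    let mut range_start = start;
    // Walk the interior split points and close a partition at the first
    // split at or past the ideal (evenly spaced) boundary.
    for &point in &split_points[1..split_points.len() - 1] {
        let k = ranges.len() as u64 + 1;
        let ideal = start + total * k / target_partitions as u64;
        if point >= ideal && ranges.len() + 1 < target_partitions {
            ranges.push(range_start..point);
            range_start = point;
        }
    }
    // The last partition always runs to the end of the file.
    ranges.push(range_start..end);
    ranges
}

fn main() {
    let splits = [0, 10, 20, 30, 40];
    println!("{:?}", partition_on_layout_splits(&splits, 2));
}
```

Because every boundary falls on an existing layout split, the trade-off is that partitions can be uneven when chunks differ in size: with splits `[0, 100, 110, 120]` and two target partitions, this sketch yields `0..100` and `100..120`.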