Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
|
I think you want this idea: vortex/vortex-array/src/arrays/dict/array.rs Line 144 in 74d803b |
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com> # Conflicts: # vortex-array/src/arrays/dict/vtable/mod.rs # vortex-duckdb/src/exporter/dict.rs
Merging this PR will improve performance by 24.89%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | bench_many_codes_few_values[1024] |
397.5 µs | 325.1 µs | +22.27% |
| ⚡ | Simulation | bench_many_codes_few_values[2048] |
368.5 µs | 295.9 µs | +24.55% |
| ⚡ | Simulation | bench_many_codes_few_values[4096] |
374.3 µs | 301.7 µs | +24.08% |
| ⚡ | Simulation | bench_many_nulls[0.5] |
361.8 µs | 317.5 µs | +13.96% |
| ⚡ | Simulation | bench_many_nulls[0.9] |
537.1 µs | 456.3 µs | +17.72% |
| ⚡ | Simulation | bench_sparse_coverage[0.01] |
366.4 µs | 293.7 µs | +24.75% |
| ⚡ | Simulation | bench_sparse_coverage[0.1] |
364.8 µs | 292.1 µs | +24.89% |
| ⚡ | Simulation | bench_sparse_coverage[0.5] |
364.8 µs | 292.1 µs | +24.89% |
Comparing ngates/sparse-dict (22be09b) with develop (f3d5f09)
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.005x ➖ datafusion / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
|
File Sizes: PolarSignals ProfilingNo file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.010x ➖, 0↑ 0↓)
datafusion / parquet (0.982x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.040x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.008x ➖, 0↑ 0↓)
duckdb / parquet (0.987x ➖, 1↑ 1↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeNo file size changes detected. |
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.010x ➖, 0↑ 1↓)
datafusion / parquet (0.998x ➖, 2↑ 1↓)
datafusion / arrow (1.018x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.141x ❌, 0↑ 16↓)
duckdb / vortex-compact (1.067x ➖, 0↑ 1↓)
duckdb / parquet (1.098x ➖, 0↑ 12↓)
duckdb / duckdb (1.035x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMENo file size changes detected. |
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.030x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.030x ➖, 2↑ 4↓)
datafusion / parquet (1.020x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.067x ➖, 0↑ 26↓)
duckdb / vortex-compact (1.093x ➖, 0↑ 40↓)
duckdb / parquet (1.019x ➖, 1↑ 5↓)
duckdb / duckdb (1.059x ➖, 0↑ 18↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.021x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.045x ➖, 0↑ 0↓)
datafusion / parquet (0.984x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.037x ➖, 0↑ 0↓)
duckdb / parquet (1.017x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (0.979x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.979x ➖, 1↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Statistical and Population GeneticsNo file size changes detected. |
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.011x ➖, 0↑ 0↓)
datafusion / parquet (0.965x ➖, 0↑ 0↓)
datafusion / arrow (0.971x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.986x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.994x ➖, 0↑ 0↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
duckdb / duckdb (0.992x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.019x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.989x ➖, 1↑ 1↓)
datafusion / parquet (1.115x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.056x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.090x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.075x ➖, 0↑ 13↓)
datafusion / parquet (1.069x ➖, 0↑ 10↓)
duckdb / vortex-file-compressed (1.121x ❌, 0↑ 26↓)
duckdb / parquet (1.038x ➖, 0↑ 3↓)
duckdb / duckdb (1.045x ➖, 0↑ 3↓)
Full attributed analysis
|
File Sizes: Clickbench on NVMEFile Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.100x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.011x ➖, 1↑ 0↓)
datafusion / parquet (1.020x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.016x ➖, 0↑ 0↓)
duckdb / parquet (1.099x ➖, 0↑ 0↓)
Full attributed analysis
|
When the number of unique codes is much much smaller than the number of values, we should collect unique codes, filter the values, and remap the codes into the smaller dictionary's value space.
I don't expect this PR to immediately have any impact. But I have some other changes that propagate filter masks deeper into the tree so they actually hit the DictLayout