Polar Signals Profiling Results
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.002x ➖
datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.998x ➖, 1↑ 0↓)
datafusion / parquet (1.003x ➖, 1↑ 0↓)
datafusion / arrow (1.002x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.053x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.037x ➖, 0↑ 1↓)
duckdb / parquet (1.009x ➖, 3↑ 2↓)
duckdb / duckdb (1.019x ➖, 0↑ 1↓)

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.988x ➖, 1↑ 1↓)
datafusion / vortex-compact (0.993x ➖, 0↑ 0↓)
datafusion / parquet (0.981x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.975x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.028x ➖, 0↑ 1↓)
duckdb / parquet (1.012x ➖, 0↑ 0↓)

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.930x ➖, 30↑ 0↓)
datafusion / vortex-compact (0.975x ➖, 6↑ 0↓)
datafusion / parquet (0.994x ➖, 2↑ 1↓)
duckdb / vortex-file-compressed (0.928x ➖, 23↑ 0↓)
duckdb / vortex-compact (0.920x ➖, 31↑ 0↓)
duckdb / parquet (0.934x ➖, 16↑ 0↓)
duckdb / duckdb (0.914x ➖, 34↑ 0↓)

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.903x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.762x ➖, 6↑ 1↓)
datafusion / parquet (0.878x ➖, 4↑ 0↓)
duckdb / vortex-file-compressed (0.880x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.867x ➖, 1↑ 0↓)
duckdb / parquet (0.882x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.014x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.005x ➖, 0↑ 0↓)
datafusion / parquet (0.995x ➖, 0↑ 0↓)
datafusion / arrow (1.006x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.888x ✅, 16↑ 0↓)
duckdb / vortex-compact (0.924x ➖, 10↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 1↓)
duckdb / duckdb (0.960x ➖, 3↑ 0↓)

Benchmarks: Random Access
Vortex (geomean): 0.886x ✅
unknown / unknown (0.960x ➖, 7↑ 1↓)

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.010x ➖, 0↑ 2↓)
duckdb / vortex-compact (0.997x ➖, 0↑ 0↓)
duckdb / parquet (0.984x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.818x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.824x ➖, 2↑ 0↓)
datafusion / parquet (0.836x ➖, 4↑ 0↓)
duckdb / vortex-file-compressed (0.915x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.922x ➖, 1↑ 0↓)
duckdb / parquet (0.929x ➖, 0↑ 0↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.933x ➖, 17↑ 0↓)
datafusion / parquet (0.981x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (1.011x ➖, 3↑ 2↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
duckdb / duckdb (0.995x ➖, 0↑ 1↓)

Benchmarks: Compression
Vortex (geomean): 1.013x ➖
unknown / unknown (1.009x ➖, 0↑ 3↓)

Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.772x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.848x ➖, 2↑ 0↓)
datafusion / parquet (0.892x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.025x ➖, 1↑ 1↓)
duckdb / vortex-compact (0.960x ➖, 0↑ 0↓)
duckdb / parquet (0.905x ➖, 0↑ 0↓)
Merging this PR will not alter performance
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Summary
Tracking Issue: #6872
The `vortex-btrblocks` compressor currently depends on every encoding crate in the workspace, and extension types (Vector, UUID, Tensor, JSON) have no mechanism for type-specific compression.

This PR introduces a new `vortex-compressor` crate that extracts the encoding-agnostic compression framework. This inverts the dependency graph so that encoding crates can implement a single `Scheme` trait and register themselves with the compressor. `vortex-btrblocks` remains the batteries-included assembler and depends on `vortex-compressor`.

In effect, the entire compressor was rewritten. The major changes are listed below; many smaller changes are not called out individually but still belong in this PR.

In theory, pluggability could have been hacked into the existing compressor without a complete rewrite, but I determined that this would not provide the expressiveness needed to support extension types and encodings as first-class citizens. I could be wrong, and this may all have been a waste of time, but I also found a lot of strange things in the existing compressor that didn't make sense, and they are eliminated in the new one.
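The dependency inversion described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Scheme`, `CascadingCompressor`, and `ALL_SCHEMES` names come from the description, but every signature, the `name` and `estimate_ratio` methods, the `choose` helper, and the two toy schemes are assumptions made up for the example.

```rust
// Hypothetical sketch. Names follow the PR description, but all signatures
// here are assumptions and not the actual vortex-compressor API.

/// Framework side: the only thing the compressor knows about an encoding.
pub trait Scheme: Sync {
    /// Human-readable name, standing in for the opaque `SchemeId`.
    fn name(&self) -> &'static str;
    /// Estimated compression ratio (uncompressed size / compressed size).
    fn estimate_ratio(&self, data: &[u32]) -> f64;
}

/// Framework side: selects among whatever schemes were registered.
pub struct CascadingCompressor {
    schemes: Vec<&'static dyn Scheme>,
}

impl CascadingCompressor {
    pub fn new(schemes: Vec<&'static dyn Scheme>) -> Self {
        Self { schemes }
    }

    /// Returns the scheme with the highest estimated ratio, or `None` if no
    /// scheme is estimated to beat leaving the data as-is (ratio <= 1.0).
    pub fn choose(&self, data: &[u32]) -> Option<&'static str> {
        let mut best: Option<(&'static str, f64)> = None;
        for scheme in &self.schemes {
            let ratio = scheme.estimate_ratio(data);
            if ratio > 1.0 && best.map_or(true, |(_, r)| ratio > r) {
                best = Some((scheme.name(), ratio));
            }
        }
        best.map(|(name, _)| name)
    }
}

// Encoding side: toy schemes that depend only on the trait above.

struct ConstantScheme;

impl Scheme for ConstantScheme {
    fn name(&self) -> &'static str {
        "constant"
    }
    fn estimate_ratio(&self, data: &[u32]) -> f64 {
        match data.first() {
            // All values equal: one value stands in for the whole array.
            Some(first) if data.iter().all(|v| v == first) => data.len() as f64,
            _ => 0.0,
        }
    }
}

struct BitPackScheme;

impl Scheme for BitPackScheme {
    fn name(&self) -> &'static str {
        "bitpack"
    }
    fn estimate_ratio(&self, data: &[u32]) -> f64 {
        let max = data.iter().copied().max().unwrap_or(0);
        // Bits needed for the largest value, with a floor of one bit.
        let bits = (32 - max.leading_zeros()).max(1);
        32.0 / bits as f64
    }
}

static CONSTANT: ConstantScheme = ConstantScheme;
static BITPACK: BitPackScheme = BitPackScheme;

/// Assembler side: the one place that knows about every scheme.
pub static ALL_SCHEMES: [&dyn Scheme; 2] = [&CONSTANT, &BITPACK];

pub fn default_compressor() -> CascadingCompressor {
    CascadingCompressor::new(ALL_SCHEMES.to_vec())
}
```

In this sketch the framework never names a concrete encoding; only the assembler that builds `ALL_SCHEMES` does, which is the role the description assigns to `vortex-btrblocks`.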
Changes
- `Scheme` trait replaces the old type-specific `IntegerScheme`/`FloatScheme`/`StringScheme` traits and `IntCode`/`FloatCode`/`StringCode` enums. Schemes are identified by an opaque `SchemeId` (obtained only via `SchemeExt::id()`). The old `Compressor`/`CompressorExt`/`CanonicalCompressor` traits and `IntCompressor`/`FloatCompressor`/`StringCompressor` structs are replaced by a `CascadingCompressor` that selects from a vec of `&'static dyn Scheme`.
- `vortex-compressor` crate contains the framework (trait definitions, cascading compressor, stats, sampling) with zero encoding dependencies (other than the built-in ones from `vortex-array`).
- `ArrayAndStats` bundle replaces the old pattern of passing arrays and stats caches separately. Stats are generated lazily on first access via typed methods (`integer_stats()`, `float_stats()`, `string_stats()`). Each scheme declares any expensive stats it requires via `stats_options()` (specifically, distinct values and their frequencies, computed with a hash map). The compressor merges the options of all eligible schemes before generating stats, so expensive computations only run when needed.
- `vortex-btrblocks` remains the batteries-included assembler. It depends on `vortex-compressor` and registers all encoding-specific schemes (BitPacking, FoR, ALP, FSST, etc.).
- Exclusion rules replace the old `new_excludes` vectors. Schemes declare `descendant_exclusions` (push) and `ancestor_exclusions` (pull) to prevent incompatible combinations in the cascade chain. The compressor enforces these automatically, along with self-exclusion (no scheme appears twice in a chain). We do this specifically to avoid a dependency cycle.
- `compress_child` encapsulates cascade budget tracking. Schemes call `compressor.compress_child(array, &ctx, self.id(), child_index)` instead of manually building contexts and calling `compress_canonical`. If the cascade budget is exhausted, the child is returned as-is.
- Decimal and temporal handling moves from `compress_canonical` branches into `Scheme` implementations (`DecimalScheme`, `TemporalScheme`), registered in `ALL_SCHEMES` like any other scheme.

Note that essentially none of the scheme logic changed, so the estimation and compression logic is mostly identical to before. Only the framework around it changed.
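The exclusion and budget rules above can be illustrated with a small standalone sketch. It is an assumption-laden model, not the PR's implementation: `SchemeId`, `descendant_exclusions`, and `ancestor_exclusions` are names from the description, while `SchemeRules`, `CascadeCtx`, `eligible`, `descend`, and the three placeholder schemes `a`, `b`, and `c` are hypothetical and exist only for the example.

```rust
// Hypothetical sketch. The struct layout and method names are assumptions;
// only the exclusion and budget semantics follow the PR description.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SchemeId(pub &'static str);

pub struct SchemeRules {
    pub id: SchemeId,
    /// "Push": schemes that must not appear anywhere below this one.
    pub descendant_exclusions: &'static [SchemeId],
    /// "Pull": schemes that this one must not appear below.
    pub ancestor_exclusions: &'static [SchemeId],
}

/// The chain of schemes already applied above the current child array,
/// plus how many more cascade levels are allowed.
pub struct CascadeCtx {
    ancestors: Vec<&'static SchemeRules>,
    remaining_budget: usize,
}

impl CascadeCtx {
    pub fn root(budget: usize) -> Self {
        Self { ancestors: Vec::new(), remaining_budget: budget }
    }

    /// Context for a child of `applied`: one level deeper, one less budget.
    pub fn descend(&self, applied: &'static SchemeRules) -> Self {
        let mut ancestors = self.ancestors.clone();
        ancestors.push(applied);
        Self {
            ancestors,
            remaining_budget: self.remaining_budget.saturating_sub(1),
        }
    }

    fn allows(&self, candidate: &SchemeRules) -> bool {
        self.ancestors.iter().all(|a| {
            a.id != candidate.id // self-exclusion: never twice in a chain
                && !a.descendant_exclusions.contains(&candidate.id) // push
                && !candidate.ancestor_exclusions.contains(&a.id) // pull
        })
    }

    /// Schemes that may be tried here. Empty when the budget is exhausted,
    /// which models "the child is returned as-is".
    pub fn eligible(&self, all: &[&'static SchemeRules]) -> Vec<SchemeId> {
        if self.remaining_budget == 0 {
            return Vec::new();
        }
        all.iter().filter(|c| self.allows(c)).map(|c| c.id).collect()
    }
}

// Placeholder schemes with made-up rules, purely for illustration.
pub static SCHEME_A: SchemeRules = SchemeRules {
    id: SchemeId("a"),
    descendant_exclusions: &[SchemeId("b")], // a pushes: no b below a
    ancestor_exclusions: &[],
};
pub static SCHEME_B: SchemeRules = SchemeRules {
    id: SchemeId("b"),
    descendant_exclusions: &[],
    ancestor_exclusions: &[],
};
pub static SCHEME_C: SchemeRules = SchemeRules {
    id: SchemeId("c"),
    descendant_exclusions: &[],
    ancestor_exclusions: &[SchemeId("b")], // c pulls: never below b
};
```

With a budget of 2, a child compressed under `a` can only try `c` in this model, a child under `b` can only try `a`, and a grandchild sees no eligible schemes because the budget has run out.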
API Changes
TODO
Testing
TODO