@AdamGS AdamGS commented Apr 23, 2025

File size before this change is about 21.18 GB; after this change it's 19.23 GB, a reduction of roughly 9%.

The issue we found is that for large files, we tend to converge on not compressing, or on settling for mediocre compression. With this change, if the chosen encoding is canonical or otherwise unsatisfactory, we don't keep that state, and we try again on the next chunk.
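
A toy model of the behaviour described above, written as a sketch rather than as the actual compressor code. Everything here is an assumption made for illustration: the names `choose_encodings`, `ENCODERS`, `min_ratio`, and `keep_unsatisfactory_state` are hypothetical, and the two "encoders" only estimate output sizes. The point is the state handling: a remembered encoding is reused on the next chunk only if it was not canonical and compressed well enough.

```python
from typing import Callable, Dict, List, Optional, Sequence

CANONICAL = "canonical"  # hypothetical label: chunk is stored uncompressed


def canonical_size(chunk: Sequence[int]) -> int:
    # Uncompressed: one slot per value.
    return len(chunk)


def rle_size(chunk: Sequence[int]) -> int:
    # Run-length encoding: one (value, length) pair per run.
    runs = 1 if chunk else 0
    for prev, cur in zip(chunk, chunk[1:]):
        if cur != prev:
            runs += 1
    return 2 * runs


# Hypothetical candidate set; canonical comes first so it wins ties.
ENCODERS: Dict[str, Callable[[Sequence[int]], int]] = {
    CANONICAL: canonical_size,
    "rle": rle_size,
}


def choose_encodings(
    chunks: Sequence[Sequence[int]],
    min_ratio: float = 1.5,
    keep_unsatisfactory_state: bool = False,
) -> List[str]:
    """Pick an encoding per chunk, reusing the previous choice when one is remembered."""
    remembered: Optional[str] = None
    chosen: List[str] = []
    for chunk in chunks:
        if remembered is not None:
            # Reuse the previous decision without searching again.
            name = remembered
        else:
            # Full search: smallest estimated size wins.
            name = min(ENCODERS, key=lambda n: ENCODERS[n](chunk))
        ratio = len(chunk) / max(ENCODERS[name](chunk), 1)
        chosen.append(name)
        satisfactory = name != CANONICAL and ratio >= min_ratio
        # Old behaviour (flag on): always remember the choice.
        # New behaviour (flag off): forget canonical or weak results.
        remembered = name if (satisfactory or keep_unsatisfactory_state) else None
    return chosen


chunks = [list(range(1, 9)), [7] * 8, [7] * 8]
old_choices = choose_encodings(chunks, keep_unsatisfactory_state=True)
new_choices = choose_encodings(chunks, keep_unsatisfactory_state=False)
print(old_choices)
print(new_choices)
```

In this model, an incompressible first chunk locks the old behaviour into the canonical encoding for every later chunk, while the new behaviour searches again on the second chunk and then reuses the good result on the third.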

@AdamGS AdamGS added the benchmark Run benchmarks on this branch label Apr 23, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Apr 23, 2025

github-actions bot commented Apr 23, 2025

Benchmarks: random_access

Table of Results

| name | PR 4fd965d | base 87756b3 | ratio (PR/base) | unit |
| --- | --- | --- | --- | --- |
| random-access/vortex-tokio-local-disk | 1282943 | 1.37317e+06 | 0.93429 | ns |
| random-access/parquet-tokio-local-disk | 269468541 | 2.6146e+08 | 1.03063 | ns |
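
For reading the ratio column, here is a small sketch that recomputes it from the two rows above. The helper name `pr_to_base_ratio` is hypothetical. The base values are printed with six significant digits, so the recomputed ratios only match the table to about that precision.

```python
def pr_to_base_ratio(pr: float, base: float) -> float:
    """Return PR/base, the quantity shown in the ratio column."""
    return pr / base


# Values copied from the random_access rows (unit: ns).
vortex_ratio = pr_to_base_ratio(1282943, 1.37317e6)
parquet_ratio = pr_to_base_ratio(269468541, 2.6146e8)

# For time measurements (ns), a ratio below 1 means the PR run took less time.
vortex_faster = vortex_ratio < 1.0
parquet_faster = parquet_ratio < 1.0

print(f"vortex:  {vortex_ratio:.5f} faster={vortex_faster}")
print(f"parquet: {parquet_ratio:.5f} faster={parquet_faster}")
```

The interpretation flips for rows reported as throughput (bytes/ns), where a ratio below 1 means the PR run processed fewer bytes per nanosecond.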

github-actions bot commented Apr 23, 2025

Benchmarks: TPC-H on NVME

Table of Results
name PR 4fd965d base 87756b3 ratio (PR/base) unit
tpch_q01/DataFusion:parquet 137180160 1.44871e+08 0.946912 ns
tpch_q02/DataFusion:parquet 108208761 1.07959e+08 1.00232 ns
tpch_q03/DataFusion:parquet 102644974 1.06584e+08 0.963045 ns
tpch_q04/DataFusion:parquet 60963650 6.07905e+07 1.00285 ns
tpch_q05/DataFusion:parquet 112858941 1.16887e+08 0.965538 ns
tpch_q06/DataFusion:parquet 28719552 2.91357e+07 0.985716 ns
tpch_q07/DataFusion:parquet 140267324 1.36324e+08 1.02892 ns
tpch_q08/DataFusion:parquet 167163343 1.54374e+08 1.08285 ns
tpch_q09/DataFusion:parquet 212742043 2.21206e+08 0.961738 ns
tpch_q10/DataFusion:parquet 147588368 1.47675e+08 0.999415 ns
tpch_q11/DataFusion:parquet 49132919 5.20384e+07 0.944167 ns
tpch_q12/DataFusion:parquet 90412032 9.0012e+07 1.00444 ns
tpch_q13/DataFusion:parquet 189586814 1.8879e+08 1.00422 ns
tpch_q14/DataFusion:parquet 49656280 4.90676e+07 1.012 ns
tpch_q15/DataFusion:parquet 84708857 8.12906e+07 1.04205 ns
tpch_q16/DataFusion:parquet 47659291 4.74285e+07 1.00487 ns
tpch_q17/DataFusion:parquet 142515660 1.4074e+08 1.01262 ns
tpch_q18/DataFusion:parquet 218053978 2.20042e+08 0.990964 ns
tpch_q19/DataFusion:parquet 82771034 8.29844e+07 0.997429 ns
tpch_q20/DataFusion:parquet 96253533 9.86822e+07 0.975389 ns
tpch_q21/DataFusion:parquet 182117492 1.86868e+08 0.97458 ns
tpch_q22/DataFusion:parquet 49094985 4.97726e+07 0.986385 ns
tpch_q01/DataFusion:vortex-file-compressed 42842621 4.19088e+07 1.02228 ns
tpch_q02/DataFusion:vortex-file-compressed 45209007 4.46165e+07 1.01328 ns
tpch_q03/DataFusion:vortex-file-compressed 25954345 2.5504e+07 1.01766 ns
tpch_q04/DataFusion:vortex-file-compressed 14168705 1.37287e+07 1.03205 ns
tpch_q05/DataFusion:vortex-file-compressed 42354757 4.20729e+07 1.0067 ns
tpch_q06/DataFusion:vortex-file-compressed 5478512 5.33262e+06 1.02736 ns
tpch_q07/DataFusion:vortex-file-compressed 66188646 6.47613e+07 1.02204 ns
tpch_q08/DataFusion:vortex-file-compressed 51025066 5.01265e+07 1.01793 ns
tpch_q09/DataFusion:vortex-file-compressed 69767144 6.9555e+07 1.00305 ns
tpch_q10/DataFusion:vortex-file-compressed 48935288 4.80079e+07 1.01932 ns
tpch_q11/DataFusion:vortex-file-compressed 22315916 2.25294e+07 0.990525 ns
tpch_q12/DataFusion:vortex-file-compressed 17410116 1.63399e+07 1.0655 ns
tpch_q13/DataFusion:vortex-file-compressed 19729143 1.99182e+07 0.990511 ns
tpch_q14/DataFusion:vortex-file-compressed 9671077 9.67893e+06 0.999189 ns
tpch_q15/DataFusion:vortex-file-compressed 18549572 1.86362e+07 0.995353 ns
tpch_q16/DataFusion:vortex-file-compressed 23012193 2.34603e+07 0.980898 ns
tpch_q17/DataFusion:vortex-file-compressed 62424495 6.20653e+07 1.00579 ns
tpch_q18/DataFusion:vortex-file-compressed 98564658 9.93092e+07 0.992503 ns
tpch_q19/DataFusion:vortex-file-compressed 26902541 2.70054e+07 0.996192 ns
tpch_q20/DataFusion:vortex-file-compressed 28656598 2.83247e+07 1.01172 ns
tpch_q21/DataFusion:vortex-file-compressed 90346671 9.13027e+07 0.989529 ns
tpch_q22/DataFusion:vortex-file-compressed 26993764 2.80137e+07 0.96359 ns
tpch_q01/DuckDB:parquet 103600706 1.03855e+08 0.997551 ns
tpch_q02/DuckDB:parquet 86139506 8.63931e+07 0.997065 ns
tpch_q03/DuckDB:parquet 150589409 1.49504e+08 1.00726 ns
tpch_q04/DuckDB:parquet 107637945 1.0803e+08 0.996375 ns
tpch_q05/DuckDB:parquet 141709589 1.41718e+08 0.999943 ns
tpch_q06/DuckDB:parquet 56784522 5.71693e+07 0.993269 ns
tpch_q07/DuckDB:parquet 137107091 1.36669e+08 1.00321 ns
tpch_q08/DuckDB:parquet 171692985 1.71727e+08 0.999803 ns
tpch_q09/DuckDB:parquet 193772858 1.92767e+08 1.00522 ns
tpch_q10/DuckDB:parquet 230087814 2.28005e+08 1.00914 ns
tpch_q11/DuckDB:parquet 51409573 5.14209e+07 0.999779 ns
tpch_q12/DuckDB:parquet 104525353 1.04888e+08 0.996544 ns
tpch_q13/DuckDB:parquet 305371149 3.05031e+08 1.00111 ns
tpch_q14/DuckDB:parquet 92955662 9.29231e+07 1.00035 ns
tpch_q15/DuckDB:parquet 63306740 6.28639e+07 1.00704 ns
tpch_q16/DuckDB:parquet 83904610 8.40194e+07 0.998633 ns
tpch_q17/DuckDB:parquet 102381727 1.02245e+08 1.00134 ns
tpch_q18/DuckDB:parquet 163606234 1.62559e+08 1.00644 ns
tpch_q19/DuckDB:parquet 114690537 1.14484e+08 1.0018 ns
tpch_q20/DuckDB:parquet 124351592 1.23686e+08 1.00538 ns
tpch_q21/DuckDB:parquet 209524330 2.06637e+08 1.01397 ns
tpch_q22/DuckDB:parquet 87537390 8.74934e+07 1.0005 ns
tpch_q01/DuckDB:vortex-file-compressed 341691629 3.40124e+08 1.00461 ns
tpch_q02/DuckDB:vortex-file-compressed 106869999 1.06237e+08 1.00596 ns
tpch_q03/DuckDB:vortex-file-compressed 297457662 2.65937e+08 1.11853 ns
tpch_q04/DuckDB:vortex-file-compressed 205495538 1.70823e+08 1.20297 ns
tpch_q05/DuckDB:vortex-file-compressed 469271918 4.34283e+08 1.08057 ns
tpch_q06/DuckDB:vortex-file-compressed 60897370 6.01886e+07 1.01178 ns
tpch_q07/DuckDB:vortex-file-compressed 251590632 2.23886e+08 1.12375 ns
tpch_q08/DuckDB:vortex-file-compressed 488368574 4.52779e+08 1.0786 ns
tpch_q09/DuckDB:vortex-file-compressed 879601213 8.36972e+08 1.05093 ns
tpch_q10/DuckDB:vortex-file-compressed 250678807 2.24501e+08 1.1166 ns
tpch_q11/DuckDB:vortex-file-compressed 77518859 7.79922e+07 0.993932 ns
tpch_q12/DuckDB:vortex-file-compressed 245877359 1.95939e+08 1.25486 ns
tpch_q13/DuckDB:vortex-file-compressed 298213870 2.99417e+08 0.995983 ns
tpch_q14/DuckDB:vortex-file-compressed 74656202 7.56848e+07 0.98641 ns
tpch_q15/DuckDB:vortex-file-compressed 70549061 7.03237e+07 1.00321 ns
tpch_q16/DuckDB:vortex-file-compressed 98820541 1.00025e+08 0.987954 ns
tpch_q17/DuckDB:vortex-file-compressed 133365087 1.35426e+08 0.984782 ns
tpch_q18/DuckDB:vortex-file-compressed 350490691 3.51501e+08 0.997126 ns
tpch_q19/DuckDB:vortex-file-compressed 146463388 1.48628e+08 0.985436 ns
tpch_q20/DuckDB:vortex-file-compressed 136157110 1.35364e+08 1.00586 ns
tpch_q21/DuckDB:vortex-file-compressed 680285583 5.87498e+08 1.15794 ns
tpch_q22/DuckDB:vortex-file-compressed 68099432 7.08573e+07 0.961079 ns

github-actions bot commented Apr 23, 2025

Benchmarks: TPC-H on S3

Table of Results
name PR 4fd965d base 87756b3 ratio (PR/base) unit
tpch_q01/DataFusion:parquet 305499437 2.9885e+08 1.02225 ns
tpch_q02/DataFusion:parquet 777030718 7.74916e+08 1.00273 ns
tpch_q03/DataFusion:parquet 474080348 4.42255e+08 1.07196 ns
tpch_q04/DataFusion:parquet 254886057 2.48779e+08 1.02455 ns
tpch_q05/DataFusion:parquet 632986045 6.28123e+08 1.00774 ns
tpch_q06/DataFusion:parquet 190905178 1.98358e+08 0.962426 ns
tpch_q07/DataFusion:parquet 683451099 6.77691e+08 1.0085 ns
tpch_q08/DataFusion:parquet 874331037 8.48062e+08 1.03098 ns
tpch_q09/DataFusion:parquet 731649860 7.53262e+08 0.971309 ns
tpch_q10/DataFusion:parquet 574195375 5.71633e+08 1.00448 ns
tpch_q11/DataFusion:parquet 307805724 3.05361e+08 1.008 ns
tpch_q12/DataFusion:parquet 295857263 2.95847e+08 1.00004 ns
tpch_q13/DataFusion:parquet 442754642 4.3983e+08 1.00665 ns
tpch_q14/DataFusion:parquet 294132022 2.78213e+08 1.05722 ns
tpch_q15/DataFusion:parquet 509529775 5.1398e+08 0.991342 ns
tpch_q16/DataFusion:parquet 299045403 2.98267e+08 1.00261 ns
tpch_q17/DataFusion:parquet 440513245 4.48532e+08 0.982123 ns
tpch_q18/DataFusion:parquet 591367034 5.95837e+08 0.992497 ns
tpch_q19/DataFusion:parquet 322833587 3.07434e+08 1.05009 ns
tpch_q20/DataFusion:parquet 542960869 5.52748e+08 0.982294 ns
tpch_q21/DataFusion:parquet 705268663 6.86666e+08 1.02709 ns
tpch_q22/DataFusion:parquet 284739376 2.73219e+08 1.04216 ns
tpch_q01/DataFusion:vortex-file-compressed 154380378 1.52667e+08 1.01122 ns
tpch_q02/DataFusion:vortex-file-compressed 146744051 1.50361e+08 0.975944 ns
tpch_q03/DataFusion:vortex-file-compressed 189327880 1.93519e+08 0.978342 ns
tpch_q04/DataFusion:vortex-file-compressed 153944377 1.5539e+08 0.990696 ns
tpch_q05/DataFusion:vortex-file-compressed 197887759 2.13425e+08 0.927201 ns
tpch_q06/DataFusion:vortex-file-compressed 117305467 1.21486e+08 0.965592 ns
tpch_q07/DataFusion:vortex-file-compressed 230432531 2.41493e+08 0.9542 ns
tpch_q08/DataFusion:vortex-file-compressed 262086060 2.80934e+08 0.93291 ns
tpch_q09/DataFusion:vortex-file-compressed 310978273 3.32103e+08 0.936391 ns
tpch_q10/DataFusion:vortex-file-compressed 266989588 2.98348e+08 0.894893 ns
tpch_q11/DataFusion:vortex-file-compressed 83879740 8.46346e+07 0.991081 ns
tpch_q12/DataFusion:vortex-file-compressed 165842125 2.16037e+08 0.767655 ns
tpch_q13/DataFusion:vortex-file-compressed 127443249 1.27863e+08 0.996719 ns
tpch_q14/DataFusion:vortex-file-compressed 140777338 1.48118e+08 0.950439 ns
tpch_q15/DataFusion:vortex-file-compressed 220102445 2.19123e+08 1.00447 ns
tpch_q16/DataFusion:vortex-file-compressed 89567824 8.36067e+07 1.0713 ns
tpch_q17/DataFusion:vortex-file-compressed 243005056 2.45083e+08 0.991523 ns
tpch_q18/DataFusion:vortex-file-compressed 285058938 3.18233e+08 0.895755 ns
tpch_q19/DataFusion:vortex-file-compressed 162264813 1.57866e+08 1.02786 ns
tpch_q20/DataFusion:vortex-file-compressed 233390726 2.37496e+08 0.982716 ns
tpch_q21/DataFusion:vortex-file-compressed 373386281 3.77745e+08 0.988462 ns
tpch_q22/DataFusion:vortex-file-compressed 123861276 1.18798e+08 1.04262 ns

@AdamGS AdamGS added the benchmark Run benchmarks on this branch label Apr 23, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Apr 23, 2025
@github-actions

Benchmarks: compress

Table of Results
name PR 4fd965d base 87756b3 ratio (PR/base) unit
compress time/taxi throughput 0.397005 0.408395 0.972109 bytes/ns
parquet_rs-zstd compress time/taxi throughput 0.274461 0.286399 0.958317 bytes/ns
decompress time/taxi throughput 2.1232 2.18996 0.969518 bytes/ns
parquet_rs-zstd decompress time/taxi throughput 1.60932 1.70421 0.944319 bytes/ns
compress time/Arade throughput 0.456371 0.465024 0.981392 bytes/ns
parquet_rs-zstd compress time/Arade throughput 0.354479 0.3727 0.951111 bytes/ns
decompress time/Arade throughput 4.6988 5.09086 0.922989 bytes/ns
parquet_rs-zstd decompress time/Arade throughput 1.6478 1.72299 0.956362 bytes/ns
compress time/Bimbo throughput 0.390409 0.401658 0.971995 bytes/ns
parquet_rs-zstd compress time/Bimbo throughput 0.145649 0.156047 0.933364 bytes/ns
decompress time/Bimbo throughput 2.68117 2.80719 0.955107 bytes/ns
parquet_rs-zstd decompress time/Bimbo throughput 1.04328 1.09939 0.948966 bytes/ns
compress time/CMSprovider throughput 0.210176 0.244675 0.859002 bytes/ns
parquet_rs-zstd compress time/CMSprovider throughput 0.321242 0.334957 0.959054 bytes/ns
decompress time/CMSprovider throughput 3.77531 4.53304 0.832844 bytes/ns
parquet_rs-zstd decompress time/CMSprovider throughput 1.65751 1.70097 0.974445 bytes/ns
compress time/Euro2016 throughput 0.199168 0.21795 0.913825 bytes/ns
parquet_rs-zstd compress time/Euro2016 throughput 0.30332 0.319457 0.949488 bytes/ns
decompress time/Euro2016 throughput 2.11963 2.17536 0.974381 bytes/ns
parquet_rs-zstd decompress time/Euro2016 throughput 1.04099 1.08218 0.961936 bytes/ns
compress time/Food throughput 0.278521 0.283895 0.98107 bytes/ns
parquet_rs-zstd compress time/Food throughput 0.238221 0.249437 0.955037 bytes/ns
decompress time/Food throughput 2.94878 3.07435 0.959158 bytes/ns
parquet_rs-zstd decompress time/Food throughput 1.15843 1.20331 0.9627 bytes/ns
compress time/HashTags throughput 0.282129 0.292013 0.966152 bytes/ns
parquet_rs-zstd compress time/HashTags throughput 0.432755 0.451445 0.958598 bytes/ns
decompress time/HashTags throughput 2.92885 3.08418 0.949634 bytes/ns
parquet_rs-zstd decompress time/HashTags throughput 1.44666 1.58269 0.914051 bytes/ns
compress time/TPC-H l_comment chunked throughput 0.224427 0.231229 0.970587 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput 0.277305 0.288883 0.959923 bytes/ns
decompress time/TPC-H l_comment chunked throughput 2.51656 2.60615 0.965624 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput 1.03655 1.11992 0.925554 bytes/ns
compress time/TPC-H l_comment canonical throughput 0.128781 0.131306 0.980769 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput 0.276387 0.288305 0.958664 bytes/ns
decompress time/TPC-H l_comment canonical throughput 2.32071 2.38763 0.971972 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput 1.03197 1.09271 0.944411 bytes/ns
compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.13379 0.136107 0.982976 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.182249 0.184114 0.98987 bytes/ns
decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.869959 0.846778 1.02738 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.465724 0.474496 0.981513 bytes/ns
compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.134197 0.136874 0.980446 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.174094 0.194403 0.895531 bytes/ns
decompress time/wide table cols=100 chunks=1 rows=1000 throughput 0.967968 0.985694 0.982017 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 throughput 0.44687 0.45047 0.992007 bytes/ns
compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.124041 0.135245 0.917159 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.159541 0.169251 0.942628 bytes/ns
decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.730167 0.805501 0.906475 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.397885 0.449762 0.884657 bytes/ns
compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.0640162 0.064205 0.997059 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.129173 0.121567 1.06257 bytes/ns
decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.899766 0.877254 1.02566 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.482121 0.485195 0.993663 bytes/ns
compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.056791 0.0630858 0.900219 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.111891 0.130231 0.859174 bytes/ns
decompress time/wide table cols=100 chunks=50 rows=1000 throughput 0.96286 1.01652 0.947211 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 throughput 0.453323 0.465711 0.973401 bytes/ns
compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.0462293 0.0518511 0.891578 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.0886384 0.0965467 0.918089 bytes/ns
decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.656859 0.775391 0.847132 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.347279 0.403219 0.861266 bytes/ns
vortex:raw size/taxi 0.114297 0.114297 1 ratio
vortex size/taxi 5.6499e+07 5.6499e+07 1 bytes
vortex:parquet-zstd size/taxi 1.00974 1.00974 1 ratio
vortex:raw size/Arade 0.155028 0.155028 1 ratio
vortex size/Arade 1.65588e+08 1.65588e+08 1 bytes
vortex:parquet-zstd size/Arade 0.542236 0.542236 1 ratio
vortex:raw size/Bimbo 0.172417 0.172417 1 ratio
vortex size/Bimbo 5.37177e+08 5.37177e+08 1 bytes
vortex:parquet-zstd size/Bimbo 1.37812 1.37812 1 ratio
vortex:raw size/CMSprovider 0.455994 0.255994 1.78127 ratio
vortex size/CMSprovider 2.65875e+09 1.49262e+09 1.78127 bytes
vortex:parquet-zstd size/CMSprovider 3.4518 1.93783 1.78127 ratio
vortex:raw size/Euro2016 0.374634 0.381775 0.981296 ratio
vortex size/Euro2016 1.75314e+08 1.78656e+08 0.981296 bytes
vortex:parquet-zstd size/Euro2016 1.42392 1.45106 0.981296 ratio
vortex:raw size/Food 0.224544 0.224544 1 ratio
vortex size/Food 5.73069e+07 5.73069e+07 1 bytes
vortex:parquet-zstd size/Food 1.58237 1.58237 1 ratio
vortex:raw size/HashTags 0.233204 0.243395 0.958128 ratio
vortex size/HashTags 2.43116e+08 2.53741e+08 0.958128 bytes
vortex:parquet-zstd size/HashTags 1.81445 1.89374 0.958128 ratio
vortex:raw size/TPC-H l_comment chunked 0.306134 0.308112 0.993578 ratio
vortex size/TPC-H l_comment chunked 7.62876e+07 7.67807e+07 0.993578 bytes
vortex:parquet-zstd size/TPC-H l_comment chunked 1.33983 1.34852 0.993552 ratio
vortex:raw size/TPC-H l_comment canonical 0.313713 0.306987 1.02191 ratio
vortex size/TPC-H l_comment canonical 7.81746e+07 7.64985e+07 1.02191 bytes
vortex:parquet-zstd size/TPC-H l_comment canonical 1.37306 1.34362 1.02191 ratio
vortex:raw size/wide table cols=10 chunks=1 rows=1000 0.634869 0.634869 1 ratio
vortex size/wide table cols=10 chunks=1 rows=1000 101640 101640 1 bytes
vortex:parquet-zstd size/wide table cols=10 chunks=1 rows=1000 1.08724 1.08724 1 ratio
vortex:raw size/wide table cols=100 chunks=1 rows=1000 0.631328 0.631328 1 ratio
vortex size/wide table cols=100 chunks=1 rows=1000 1.01064e+06 1.01064e+06 1 bytes
vortex:parquet-zstd size/wide table cols=100 chunks=1 rows=1000 1.08113 1.08113 1 ratio
vortex:raw size/wide table cols=1000 chunks=1 rows=1000 0.630974 0.630974 1 ratio
vortex size/wide table cols=1000 chunks=1 rows=1000 1.01006e+07 1.01006e+07 1 bytes
vortex:parquet-zstd size/wide table cols=1000 chunks=1 rows=1000 1.08051 1.08051 1 ratio
vortex:raw size/wide table cols=10 chunks=50 rows=1000 0.618218 0.618218 1 ratio
vortex size/wide table cols=10 chunks=50 rows=1000 101640 101640 1 bytes
vortex:parquet-zstd size/wide table cols=10 chunks=50 rows=1000 1.08724 1.08724 1 ratio
vortex:raw size/wide table cols=100 chunks=50 rows=1000 0.616091 0.616091 1 ratio
vortex size/wide table cols=100 chunks=50 rows=1000 1.01064e+06 1.01064e+06 1 bytes
vortex:parquet-zstd size/wide table cols=100 chunks=50 rows=1000 1.08113 1.08113 1 ratio
vortex:raw size/wide table cols=1000 chunks=50 rows=1000 0.615877 0.615877 1 ratio
vortex size/wide table cols=1000 chunks=50 rows=1000 1.01006e+07 1.01006e+07 1 bytes
vortex:parquet-zstd size/wide table cols=1000 chunks=50 rows=1000 1.08051 1.08051 1 ratio

@github-actions

Benchmarks: Clickbench on NVME

Table of Results
name PR 4fd965d base 87756b3 ratio (PR/base) unit
clickbench_q00/DataFusion:parquet 2327578 2.1295e+06 1.09302 ns
clickbench_q01/DataFusion:parquet 35904397 3.6258e+07 0.990246 ns
clickbench_q02/DataFusion:parquet 71700521 7.16554e+07 1.00063 ns
clickbench_q03/DataFusion:parquet 55551055 5.45165e+07 1.01898 ns
clickbench_q04/DataFusion:parquet 392976070 3.94364e+08 0.996481 ns
clickbench_q05/DataFusion:parquet 389536488 3.90511e+08 0.997504 ns
clickbench_q06/DataFusion:parquet 2198739 2.15189e+06 1.02177 ns
clickbench_q07/DataFusion:parquet 37450347 3.54942e+07 1.05511 ns
clickbench_q08/DataFusion:parquet 492416279 4.92527e+08 0.999774 ns
clickbench_q09/DataFusion:parquet 704280925 6.989e+08 1.0077 ns
clickbench_q10/DataFusion:parquet 150435496 1.5052e+08 0.999436 ns
clickbench_q11/DataFusion:parquet 178381817 1.75733e+08 1.01507 ns
clickbench_q12/DataFusion:parquet 392199012 3.86586e+08 1.01452 ns
clickbench_q13/DataFusion:parquet 602538140 5.95686e+08 1.0115 ns
clickbench_q14/DataFusion:parquet 378884965 3.84566e+08 0.985227 ns
clickbench_q15/DataFusion:parquet 441290929 4.52563e+08 0.975092 ns
clickbench_q16/DataFusion:parquet 941574922 9.41028e+08 1.00058 ns
clickbench_q17/DataFusion:parquet 829935615 8.13114e+08 1.02069 ns
clickbench_q18/DataFusion:parquet 1833069328 1.80059e+09 1.01804 ns
clickbench_q19/DataFusion:parquet 44765602 4.34088e+07 1.03126 ns
clickbench_q20/DataFusion:parquet 642125373 6.30864e+08 1.01785 ns
clickbench_q21/DataFusion:parquet 704605534 7.07925e+08 0.995312 ns
clickbench_q22/DataFusion:parquet 1124442691 1.11008e+09 1.01294 ns
clickbench_q23/DataFusion:parquet 4730872894 4.76793e+09 0.992227 ns
clickbench_q24/DataFusion:parquet 239994881 2.39671e+08 1.00135 ns
clickbench_q25/DataFusion:parquet 203281788 2.03854e+08 0.997194 ns
clickbench_q26/DataFusion:parquet 270973761 2.68136e+08 1.01058 ns
clickbench_q27/DataFusion:parquet 881485915 8.9162e+08 0.988634 ns
clickbench_q28/DataFusion:parquet 5811191813 5.81844e+09 0.998755 ns
clickbench_q29/DataFusion:parquet 254290476 2.52292e+08 1.00792 ns
clickbench_q30/DataFusion:parquet 383564852 3.79847e+08 1.00979 ns
clickbench_q31/DataFusion:parquet 443037531 4.40864e+08 1.00493 ns
clickbench_q32/DataFusion:parquet 1982893906 2.04484e+09 0.969705 ns
clickbench_q33/DataFusion:parquet 1761132082 1.77065e+09 0.994625 ns
clickbench_q34/DataFusion:parquet 1716084442 1.73095e+09 0.991411 ns
clickbench_q35/DataFusion:parquet 585818241 5.8666e+08 0.998565 ns
clickbench_q36/DataFusion:parquet 166585116 1.66e+08 1.00353 ns
clickbench_q37/DataFusion:parquet 75769383 7.53932e+07 1.00499 ns
clickbench_q38/DataFusion:parquet 103560007 9.94989e+07 1.04082 ns
clickbench_q39/DataFusion:parquet 318756998 3.09749e+08 1.02908 ns
clickbench_q40/DataFusion:parquet 45088104 4.44187e+07 1.01507 ns
clickbench_q41/DataFusion:parquet 42788652 4.23919e+07 1.00936 ns
clickbench_q42/DataFusion:parquet 57341837 5.63251e+07 1.01805 ns
clickbench_q00/DataFusion:vortex-file-compressed 2090552 2.06327e+06 1.01322 ns
clickbench_q01/DataFusion:vortex-file-compressed 8645450 8.66153e+06 0.998144 ns
clickbench_q02/DataFusion:vortex-file-compressed 26662532 2.7175e+07 0.981142 ns
clickbench_q03/DataFusion:vortex-file-compressed 31860864 3.14222e+07 1.01396 ns
clickbench_q04/DataFusion:vortex-file-compressed 356747760 4.24219e+08 0.840952 ns
clickbench_q05/DataFusion:vortex-file-compressed 403126499 4.05888e+08 0.993195 ns
clickbench_q06/DataFusion:vortex-file-compressed 2151624 2.12666e+06 1.01174 ns
clickbench_q07/DataFusion:vortex-file-compressed 11853601 1.17242e+07 1.01104 ns
clickbench_q08/DataFusion:vortex-file-compressed 457348379 5.15709e+08 0.886834 ns
clickbench_q09/DataFusion:vortex-file-compressed 560248199 6.00976e+08 0.932231 ns
clickbench_q10/DataFusion:vortex-file-compressed 64567011 6.67361e+07 0.967498 ns
clickbench_q11/DataFusion:vortex-file-compressed 76601378 7.80664e+07 0.981233 ns
clickbench_q12/DataFusion:vortex-file-compressed 311095599 2.98081e+08 1.04366 ns
clickbench_q13/DataFusion:vortex-file-compressed 453885074 4.66714e+08 0.972513 ns
clickbench_q14/DataFusion:vortex-file-compressed 293760861 2.87586e+08 1.02147 ns
clickbench_q15/DataFusion:vortex-file-compressed 427579991 4.93148e+08 0.867042 ns
clickbench_q16/DataFusion:vortex-file-compressed 953262745 9.64293e+08 0.988562 ns
clickbench_q17/DataFusion:vortex-file-compressed 880333036 9.3661e+08 0.939914 ns
clickbench_q18/DataFusion:vortex-file-compressed 1690173903 1.68342e+09 1.00401 ns
clickbench_q19/DataFusion:vortex-file-compressed 14890257 1.57305e+07 0.946585 ns
clickbench_q20/DataFusion:vortex-file-compressed 289865844 2.87154e+08 1.00944 ns
clickbench_q21/DataFusion:vortex-file-compressed 330949640 3.32314e+08 0.995893 ns
clickbench_q22/DataFusion:vortex-file-compressed 548589377 5.55902e+08 0.986845 ns
clickbench_q23/DataFusion:vortex-file-compressed 1108376269 1.0792e+09 1.02703 ns
clickbench_q24/DataFusion:vortex-file-compressed 105860471 1.08466e+08 0.975981 ns
clickbench_q25/DataFusion:vortex-file-compressed 106840000 1.01737e+08 1.05016 ns
clickbench_q26/DataFusion:vortex-file-compressed 139469170 1.40512e+08 0.992579 ns
clickbench_q27/DataFusion:vortex-file-compressed 656836126 6.83295e+08 0.961277 ns
clickbench_q28/DataFusion:vortex-file-compressed 6434621259 6.3689e+09 1.01032 ns
clickbench_q29/DataFusion:vortex-file-compressed 480438566 4.90954e+08 0.978581 ns
clickbench_q30/DataFusion:vortex-file-compressed 250589200 2.45991e+08 1.01869 ns
clickbench_q31/DataFusion:vortex-file-compressed 275153422 2.67806e+08 1.02744 ns
clickbench_q32/DataFusion:vortex-file-compressed 1623216958 1.65676e+09 0.979751 ns
clickbench_q33/DataFusion:vortex-file-compressed 1501948450 1.56901e+09 0.957261 ns
clickbench_q34/DataFusion:vortex-file-compressed 1468058518 1.53241e+09 0.958006 ns
clickbench_q35/DataFusion:vortex-file-compressed 647052228 6.79839e+08 0.951772 ns
clickbench_q36/DataFusion:vortex-file-compressed 57338845 5.80943e+07 0.986997 ns
clickbench_q37/DataFusion:vortex-file-compressed 31874253 3.16671e+07 1.00654 ns
clickbench_q38/DataFusion:vortex-file-compressed 24989299 2.42164e+07 1.03192 ns
clickbench_q39/DataFusion:vortex-file-compressed 116799618 1.1683e+08 0.999737 ns
clickbench_q40/DataFusion:vortex-file-compressed 16260158 1.6021e+07 1.01493 ns
clickbench_q41/DataFusion:vortex-file-compressed 15914828 1.55048e+07 1.02644 ns
clickbench_q42/DataFusion:vortex-file-compressed 25990635 2.55589e+07 1.01689 ns
clickbench_q00/DuckDB:parquet 277770262 2.85936e+08 0.971442 ns
clickbench_q01/DuckDB:parquet 180845013 1.81232e+08 0.997866 ns
clickbench_q02/DuckDB:parquet 207526628 2.07858e+08 0.998404 ns
clickbench_q03/DuckDB:parquet 205560610 2.0528e+08 1.00137 ns
clickbench_q04/DuckDB:parquet 374204637 3.67772e+08 1.01749 ns
clickbench_q05/DuckDB:parquet 434759085 4.20427e+08 1.03409 ns
clickbench_q06/DuckDB:parquet 192475463 1.91357e+08 1.00585 ns
clickbench_q07/DuckDB:parquet 183471818 1.83356e+08 1.00063 ns
clickbench_q08/DuckDB:parquet 407526080 3.98512e+08 1.02262 ns
clickbench_q09/DuckDB:parquet 518469926 5.15127e+08 1.00649 ns
clickbench_q10/DuckDB:parquet 255177107 2.50973e+08 1.01675 ns
clickbench_q11/DuckDB:parquet 270683138 2.65329e+08 1.02018 ns
clickbench_q12/DuckDB:parquet 452807882 4.35954e+08 1.03866 ns
clickbench_q13/DuckDB:parquet 656950182 6.40364e+08 1.0259 ns
clickbench_q14/DuckDB:parquet 494592573 4.78886e+08 1.0328 ns
clickbench_q15/DuckDB:parquet 399712629 3.91553e+08 1.02084 ns
clickbench_q16/DuckDB:parquet 835773231 8.13866e+08 1.02692 ns
clickbench_q17/DuckDB:parquet 756266836 7.39729e+08 1.02236 ns
clickbench_q18/DuckDB:parquet 1368128386 1.33815e+09 1.0224 ns
clickbench_q19/DuckDB:parquet 183412050 1.84657e+08 0.993257 ns
clickbench_q20/DuckDB:parquet 910833674 8.8107e+08 1.03378 ns
clickbench_q21/DuckDB:parquet 837162548 8.12473e+08 1.03039 ns
clickbench_q22/DuckDB:parquet 1314203990 1.28433e+09 1.02326 ns
clickbench_q23/DuckDB:parquet 2837251999 2.80163e+09 1.01271 ns
clickbench_q24/DuckDB:parquet 176349116 1.73941e+08 1.01385 ns
clickbench_q25/DuckDB:parquet 271935087 2.66383e+08 1.02084 ns
clickbench_q26/DuckDB:parquet 176509459 1.74935e+08 1.009 ns
clickbench_q27/DuckDB:parquet 963281413 9.40586e+08 1.02413 ns
clickbench_q28/DuckDB:parquet 6189666458 6.14404e+09 1.00743 ns
clickbench_q29/DuckDB:parquet 199126172 2.01917e+08 0.986177 ns
clickbench_q30/DuckDB:parquet 492472442 4.88989e+08 1.00712 ns
clickbench_q31/DuckDB:parquet 573469174 5.55917e+08 1.03157 ns
clickbench_q32/DuckDB:parquet 1569847715 1.54342e+09 1.01712 ns
clickbench_q33/DuckDB:parquet 1681378794 1.64257e+09 1.02363 ns
clickbench_q34/DuckDB:parquet 1728980584 1.71475e+09 1.0083 ns
clickbench_q35/DuckDB:parquet 508196784 4.99746e+08 1.01691 ns
clickbench_q36/DuckDB:parquet 186700601 1.8608e+08 1.00333 ns
clickbench_q37/DuckDB:parquet 172625886 1.71103e+08 1.0089 ns
clickbench_q38/DuckDB:parquet 184580864 1.85297e+08 0.996137 ns
clickbench_q39/DuckDB:parquet 220619178 2.17991e+08 1.01206 ns
clickbench_q40/DuckDB:parquet 175971860 1.78944e+08 0.983392 ns
clickbench_q41/DuckDB:parquet 180554848 1.80683e+08 0.999288 ns
clickbench_q42/DuckDB:parquet 169433617 1.70717e+08 0.992481 ns
clickbench_q00/DuckDB:vortex-file-compressed 28555068 2.87112e+07 0.994561 ns
clickbench_q01/DuckDB:vortex-file-compressed 36610511 3.64581e+07 1.00418 ns
clickbench_q02/DuckDB:vortex-file-compressed 97450064 9.89927e+07 0.984417 ns
clickbench_q03/DuckDB:vortex-file-compressed 96579457 1.10986e+08 0.870195 ns
clickbench_q04/DuckDB:vortex-file-compressed 328850016 3.37818e+08 0.973452 ns
clickbench_q05/DuckDB:vortex-file-compressed 428414132 4.20303e+08 1.0193 ns
clickbench_q06/DuckDB:vortex-file-compressed 46629297 4.61161e+07 1.01113 ns
clickbench_q07/DuckDB:vortex-file-compressed 45319225 4.39567e+07 1.031 ns
clickbench_q08/DuckDB:vortex-file-compressed 440053284 4.45175e+08 0.988495 ns
clickbench_q09/DuckDB:vortex-file-compressed 594572901 6.03517e+08 0.98518 ns
clickbench_q10/DuckDB:vortex-file-compressed 166543782 1.86115e+08 0.894844 ns
clickbench_q11/DuckDB:vortex-file-compressed 210228742 2.13207e+08 0.986033 ns
clickbench_q12/DuckDB:vortex-file-compressed 420160769 4.14901e+08 1.01268 ns
clickbench_q13/DuckDB:vortex-file-compressed 702634834 6.88899e+08 1.01994 ns
clickbench_q14/DuckDB:vortex-file-compressed 481676623 4.44403e+08 1.08387 ns
clickbench_q15/DuckDB:vortex-file-compressed 378786013 3.7503e+08 1.01002 ns
clickbench_q16/DuckDB:vortex-file-compressed 927905966 9.20559e+08 1.00798 ns
clickbench_q17/DuckDB:vortex-file-compressed 860680164 8.3932e+08 1.02545 ns
clickbench_q18/DuckDB:vortex-file-compressed 1592996355 1.59993e+09 0.995666 ns
clickbench_q19/DuckDB:vortex-file-compressed 151173060 1.49405e+08 1.01183 ns
clickbench_q20/DuckDB:vortex-file-compressed 1218504191 1.16969e+09 1.04174 ns
clickbench_q21/DuckDB:vortex-file-compressed 876288742 8.91089e+08 0.983391 ns
clickbench_q22/DuckDB:vortex-file-compressed 1535971501 1.58162e+09 0.971136 ns
clickbench_q23/DuckDB:vortex-file-compressed 2421003708 2.40003e+09 1.00874 ns
clickbench_q24/DuckDB:vortex-file-compressed 302877824 2.98518e+08 1.01461 ns
clickbench_q25/DuckDB:vortex-file-compressed 208146544 2.06799e+08 1.00651 ns
clickbench_q26/DuckDB:vortex-file-compressed 289951506 2.94512e+08 0.984515 ns
clickbench_q27/DuckDB:vortex-file-compressed 1335563029 1.33252e+09 1.00228 ns
clickbench_q28/DuckDB:vortex-file-compressed 6326640540 6.28637e+09 1.00641 ns
clickbench_q29/DuckDB:vortex-file-compressed 83537730 7.96755e+07 1.04847 ns
clickbench_q30/DuckDB:vortex-file-compressed 427708698 4.31178e+08 0.991954 ns
clickbench_q31/DuckDB:vortex-file-compressed 647054128 6.15956e+08 1.05049 ns
clickbench_q32/DuckDB:vortex-file-compressed 1779750266 1.78789e+09 0.995445 ns
clickbench_q33/DuckDB:vortex-file-compressed 1948786132 1.8756e+09 1.03902 ns
clickbench_q34/DuckDB:vortex-file-compressed 2025694025 2.04263e+09 0.991709 ns
clickbench_q35/DuckDB:vortex-file-compressed 537516717 5.20677e+08 1.03234 ns
clickbench_q36/DuckDB:vortex-file-compressed 106859355 1.09716e+08 0.97396 ns
clickbench_q37/DuckDB:vortex-file-compressed 63550916 5.99983e+07 1.05921 ns
clickbench_q38/DuckDB:vortex-file-compressed 77736920 7.44077e+07 1.04474 ns
clickbench_q39/DuckDB:vortex-file-compressed 211271474 2.10575e+08 1.00331 ns
clickbench_q40/DuckDB:vortex-file-compressed 45696703 4.78823e+07 0.954355 ns
clickbench_q41/DuckDB:vortex-file-compressed 44742748 4.29386e+07 1.04202 ns
clickbench_q42/DuckDB:vortex-file-compressed 46462729 4.4904e+07 1.03471 ns

AdamGS commented Apr 23, 2025

This PR also makes a huge change in the cold runs benchmark.

@AdamGS AdamGS changed the title from "Not changing the ratio considered better" to "Prevent compression from converging into low-quality compressions" Apr 23, 2025
@AdamGS AdamGS enabled auto-merge (squash) April 23, 2025 16:40
@AdamGS AdamGS disabled auto-merge April 23, 2025 16:40
@AdamGS AdamGS enabled auto-merge (squash) April 23, 2025 16:40

codspeed-hq bot commented Apr 23, 2025

CodSpeed Performance Report

Merging #3092 will improve performance by 14.6%

Comparing adamg/compressor-changes (96e087c) with develop (0478e7c)

Summary

⚡ 1 improvements
✅ 810 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| take_map[(0.1, 1.0)] | 615.1 µs | 536.7 µs | +14.6% |
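
The +14.6% figure is consistent with computing the change as BASE/HEAD - 1, rather than as the fraction of time saved. The sketch below checks that arithmetic. The function names are hypothetical, and the formula is inferred from the two timings in the report, not taken from CodSpeed's documentation.

```python
def speedup_percent(base_us: float, head_us: float) -> float:
    """Change expressed as how much faster HEAD is relative to its own time."""
    return (base_us / head_us - 1.0) * 100.0


def time_saved_percent(base_us: float, head_us: float) -> float:
    """Alternative reading: fraction of the BASE time that was removed."""
    return (base_us - head_us) / base_us * 100.0


# Timings copied from the take_map[(0.1, 1.0)] row, in microseconds.
speedup = speedup_percent(615.1, 536.7)
saved = time_saved_percent(615.1, 536.7)

print(f"speedup:    {speedup:.1f}%")
print(f"time saved: {saved:.1f}%")
```

Only the first formula reproduces the reported 14.6%. The second gives a smaller number for the same pair of timings.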

@AdamGS AdamGS merged commit 76e846a into develop Apr 23, 2025
32 checks passed
@AdamGS AdamGS deleted the adamg/compressor-changes branch April 23, 2025 16:53
@onursatici onursatici mentioned this pull request Apr 24, 2025