@AdamGS
Contributor

@AdamGS AdamGS commented Nov 29, 2024

A follow-up to #1455, as noted by @robert3005.

@AdamGS AdamGS requested review from danking and robert3005 November 29, 2024 16:38
@AdamGS AdamGS force-pushed the adamg/metadata-read-followup branch from 1ad2ec4 to 9095baf Compare November 29, 2024 16:48
@AdamGS AdamGS added the benchmark Run benchmarks on this branch label Nov 29, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Nov 29, 2024
Contributor

@github-actions github-actions bot left a comment

DataFusion

| Benchmark suite | Current: 9095baf | Previous: 6696266 | Ratio |
|---|---|---|---|
| arrow/planning | 811414.2201600871 ns (1339.3570784056792) | 804287.4003737285 ns (3060.038381783932) | 1.01 |
| arrow/exec | 1767571.5728310489 ns (6562.712357198354) | 1768153.037634904 ns (5285.6973435455) | 1.00 |
| vortex-pushdown-compressed/planning | 502533.68012416555 ns (960.8454571783368) | 506083.14675167593 ns (2309.68065084322) | 0.99 |
| vortex-pushdown-compressed/exec | 2625691.321499998 ns (9507.288618749706) | 2665428.875263158 ns (11847.3367236841) | 0.99 |
| vortex-pushdown-uncompressed/planning | 502612.5573503771 ns (987.8515674743976) | 508341.47328858654 ns (1783.9559742663987) | 0.99 |
| vortex-pushdown-uncompressed/exec | 1481126.6082006928 ns (5102.038431004155) | 1470004.3147268945 ns (4795.4555683872895) | 1.01 |
| vortex-nopushdown-compressed/planning | 840108.6625838736 ns (1770.1300886661047) | 841686.590321122 ns (2306.2376911277534) | 1.00 |
| vortex-nopushdown-compressed/exec | 3580110.4635714297 ns (16389.981544642244) | 3807679.6830769223 ns (17064.216961538885) | 0.94 |
| vortex-nopushdown-uncompressed/planning | 819913.0605457122 ns (1998.1944451162708) | 822935.650612076 ns (2457.635306605138) | 1.00 |
| vortex-nopushdown-uncompressed/exec | 5053103.407272727 ns (14624.992534091696) | 5220411.030000001 ns (10408.77533749817) | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

Random Access

| Benchmark suite | Current: 9095baf | Previous: 6696266 | Ratio |
|---|---|---|---|
| random-access/vortex-tokio-local-disk | 2170506.5358333336 ns (14008.30539062526) | 2619887.8745000004 ns (13685.819999999367) | 0.83 |
| random-access/vortex-local-fs | 2555708.8455000008 ns (18080.37664999976) | 3009190.087647059 ns (12081.916294117458) | 0.85 |
| random-access/parquet-tokio-local-disk | 216877191.2333333 ns (3583941.740416676) | 225727101.8333333 ns (2847408.4895833135) | 0.96 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

TPC-H

Benchmark suite Current: 9095baf Previous: 6696266 Ratio
tpch_q1/arrow 536656502 ns 567013372 ns 0.95
tpch_q1/parquet 764549883 ns 775106659 ns 0.99
tpch_q1/vortex-file-compressed 445212209 ns 464638158 ns 0.96
tpch_q2/arrow 140569968 ns 148784622 ns 0.94
tpch_q2/parquet 176780482 ns 184437604 ns 0.96
tpch_q2/vortex-file-compressed 143264872 ns 149780852 ns 0.96
tpch_q3/arrow 170457178 ns 173737150 ns 0.98
tpch_q3/parquet 371647619 ns 384411307 ns 0.97
tpch_q3/vortex-file-compressed 221562989 ns 226808871 ns 0.98
tpch_q4/arrow 180941923 ns 181725872 ns 1.00
tpch_q4/parquet 217120469 ns 225020347 ns 0.96
tpch_q4/vortex-file-compressed 138259141 ns 147223099 ns 0.94
tpch_q5/arrow 322502029 ns 344948582 ns 0.93
tpch_q5/parquet 497212773 ns 528383612 ns 0.94
tpch_q5/vortex-file-compressed 317996537 ns 334469700 ns 0.95
tpch_q6/arrow 25869290 ns 27607786 ns 0.94
tpch_q6/parquet 152918230 ns 156253557 ns 0.98
tpch_q6/vortex-file-compressed 20368369 ns 16346826 ns 1.25
tpch_q7/arrow 625845195 ns 660002901 ns 0.95
tpch_q7/parquet 767441884 ns 805565795 ns 0.95
tpch_q7/vortex-file-compressed 613931790 ns 642087022 ns 0.96
tpch_q8/arrow 260929266 ns 271597055 ns 0.96
tpch_q8/parquet 538629654 ns 560580768 ns 0.96
tpch_q8/vortex-file-compressed 281105165 ns 292607203 ns 0.96
tpch_q9/arrow 470236205 ns 498399441 ns 0.94
tpch_q9/parquet 778024379 ns 816105445 ns 0.95
tpch_q9/vortex-file-compressed 503220085 ns 535727637 ns 0.94
tpch_q10/arrow 266786564 ns 277451475 ns 0.96
tpch_q10/parquet 511450349 ns 531585225 ns 0.96
tpch_q10/vortex-file-compressed 270915280 ns 286752497 ns 0.94
tpch_q11/arrow 141049426 ns 151305982 ns 0.93
tpch_q11/parquet 148027975 ns 159799881 ns 0.93
tpch_q11/vortex-file-compressed 123587112 ns 132614956 ns 0.93
tpch_q12/arrow 182452359 ns 187340513 ns 0.97
tpch_q12/parquet 330168972 ns 338808134 ns 0.97
tpch_q12/vortex-file-compressed 203611750 ns 219500588 ns 0.93
tpch_q13/arrow 168977682 ns 188874938 ns 0.89
tpch_q13/parquet 308431147 ns 329472541 ns 0.94
tpch_q13/vortex-file-compressed 225986047 ns 192620790 ns 1.17
tpch_q14/arrow 37840246 ns 40392105 ns 0.94
tpch_q14/parquet 233971058 ns 240472174 ns 0.97
tpch_q14/vortex-file-compressed 70288473 ns 73283264 ns 0.96
tpch_q15/arrow 67999700 ns 72673226 ns 0.94
tpch_q15/parquet 323464345 ns 335157968 ns 0.97
tpch_q15/vortex-file-compressed 132113104 ns 135381723 ns 0.98
tpch_q16/arrow 105070087 ns 112123802 ns 0.94
tpch_q16/parquet 118106062 ns 126093723 ns 0.94
tpch_q16/vortex-file-compressed 103527946 ns 109026757 ns 0.95
tpch_q17/arrow 594790077 ns 666980163 ns 0.89
tpch_q17/parquet 680023165 ns 707607185 ns 0.96
tpch_q17/vortex-file-compressed 544444751 ns 559216437 ns 0.97
tpch_q18/arrow 1140832248 ns 1229075109 ns 0.93
tpch_q18/parquet 1351047743 ns 1429878396 ns 0.94
tpch_q18/vortex-file-compressed 1119184854 ns 1193707490 ns 0.94
tpch_q19/arrow 153407330 ns 155720308 ns 0.99
tpch_q19/parquet 424579516 ns 433581870 ns 0.98
tpch_q19/vortex-file-compressed 135762416 ns 138250265 ns 0.98
tpch_q20/arrow 211038724 ns 236795282 ns 0.89
tpch_q20/parquet 346289368 ns 368443012 ns 0.94
tpch_q20/vortex-file-compressed 260956801 ns 276995405 ns 0.94
tpch_q21/arrow 987480877 ns 1034269438 ns 0.95
tpch_q21/parquet 1099561756 ns 1169543589 ns 0.94
tpch_q21/vortex-file-compressed 862504811 ns 926745029 ns 0.93
tpch_q22/arrow 79528247 ns 81507673 ns 0.98
tpch_q22/parquet 110547212 ns 112823643 ns 0.98
tpch_q22/vortex-file-compressed 84448192 ns 88697092 ns 0.95

This comment was automatically generated by workflow using github-action-benchmark.

@AdamGS
Contributor Author

AdamGS commented Nov 29, 2024

I'm a bit surprised by the performance regressions; I'll take a deeper look next week.
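
For readers scanning the tables, the Ratio column matches current time divided by previous time, so values above 1 mean the branch is slower. A minimal sketch of that check follows. The `BenchRow` type, the `regressions` helper, and the cutoff passed to it are hypothetical names chosen for illustration; they are not part of github-action-benchmark or of this repository.

```rust
/// One benchmark row: timings in nanoseconds, as printed in the tables.
/// (Hypothetical type, for illustration only.)
struct BenchRow {
    name: &'static str,
    current_ns: f64,
    previous_ns: f64,
}

impl BenchRow {
    /// Ratio as it appears in the tables: current / previous.
    /// Values above 1.0 mean the current commit is slower.
    fn ratio(&self) -> f64 {
        self.current_ns / self.previous_ns
    }
}

/// Names of rows whose ratio exceeds `threshold`.
/// The threshold is an assumed cutoff, not something the workflow defines.
fn regressions(rows: &[BenchRow], threshold: f64) -> Vec<&'static str> {
    rows.iter()
        .filter(|row| row.ratio() > threshold)
        .map(|row| row.name)
        .collect()
}
```

Feeding it the `tpch_q6/vortex-file-compressed` row (20368369 ns against 16346826 ns) and the `tpch_q13/vortex-file-compressed` row (225986047 ns against 192620790 ns) with a cutoff of 1.10 would flag both, while `random-access/vortex-local-fs` would not be flagged.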

@gatesn
Contributor

gatesn commented Nov 29, 2024

I don't think this should be a statistic.

Perhaps a compute function, or just something like `nbytes`.

Contributor

@github-actions github-actions bot left a comment

Vortex Compression

Benchmark suite Current: 9095baf Previous: 6696266 Ratio
compress time/taxi 1283158196 ns (2366414.9499999285) 1273257704.2 ns (4112478.0499999523) 1.01
compress time/taxi throughput 470808924 bytes 470808924 bytes 1
parquet_rs-zstd compress time/taxi 1706577401.1 ns (2740387.7000000477) 1694299786.2 ns (2528193.118749976) 1.01
parquet_rs-zstd compress time/taxi throughput 470808924 bytes 470808924 bytes 1
decompress time/taxi 354306041.4 ns (938043.1062499881) 363921545.1 ns (2020241.9937499762) 0.97
decompress time/taxi throughput 470808924 bytes 470808924 bytes 1
parquet_rs-zstd decompress time/taxi 306241250.9 ns (1083544.800000012) 307065771.6 ns (1014370.849999994) 1.00
parquet_rs-zstd decompress time/taxi throughput 470808924 bytes 470808924 bytes 1
vortex:parquet-zstd size/taxi 1.0305060273430637 ratio 1.0302258312558379 ratio 1.00
vortex:raw size/taxi 0.12248695608836845 ratio 0.12245365170690775 ratio 1.00
vortex size/taxi 57667952 bytes 57652272 bytes 1.00
compress time/AirlineSentiment 1024753.0407865169 ns (2597.1738338014693) 1007805.6009148788 ns (1720.9680495159118) 1.02
compress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
parquet_rs-zstd compress time/AirlineSentiment 55307.5118153914 ns (147.98978785103827) 55374.599144666376 ns (63.3076588507829) 1.00
parquet_rs-zstd compress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
decompress time/AirlineSentiment 123138.70574833061 ns (332.1189923180136) 120136.72450696661 ns (585.6543130579448) 1.02
decompress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
parquet_rs-zstd decompress time/AirlineSentiment 32253.893163291406 ns (38.709615989399026) 31338.820056011107 ns (55.07451221761221) 1.03
parquet_rs-zstd decompress time/AirlineSentiment throughput 2020 bytes 2020 bytes 1
vortex:parquet-zstd size/AirlineSentiment 12.339193381592555 ratio 11.544984488107549 ratio 1.07
vortex:raw size/AirlineSentiment 5.906930693069307 ratio 5.526732673267326 ratio 1.07
vortex size/AirlineSentiment 11932 bytes 11164 bytes 1.07
compress time/Arade 2277532692.3 ns (2568211.690000057) 2264178168.9 ns (4759313.25) 1.01
compress time/Arade throughput 787023760 bytes 787023760 bytes 1
parquet_rs-zstd compress time/Arade 2892210197.4 ns (7792294.27125001) 2901534368.5 ns (8350509.5) 1.00
parquet_rs-zstd compress time/Arade throughput 787023760 bytes 787023760 bytes 1
decompress time/Arade 624256280.1 ns (2600586.6100000143) 613225196.7 ns (1660682.8812500238) 1.02
decompress time/Arade throughput 787023760 bytes 787023760 bytes 1
parquet_rs-zstd decompress time/Arade 668002509.6 ns (2306066.378750026) 663225141.9 ns (2044824.8974999785) 1.01
parquet_rs-zstd decompress time/Arade throughput 787023760 bytes 787023760 bytes 1
vortex:parquet-zstd size/Arade 0.49387982007952463 ratio 0.4938228181453686 ratio 1.00
vortex:raw size/Arade 0.1916428952538866 ratio 0.1916207764807507 ratio 1.00
vortex size/Arade 150827512 bytes 150810104 bytes 1.00
compress time/Bimbo 10719829676.6 ns (15692589.894999504) 10705558425.5 ns (8099288.300000191) 1.00
compress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
parquet_rs-zstd compress time/Bimbo 19361264112.3 ns (20850139.752500534) 19367437105 ns (22691898.097499847) 1.00
parquet_rs-zstd compress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
decompress time/Bimbo 3920832189.4 ns (11211390.650000095) 3811670809.9 ns (5582022.668750048) 1.03
decompress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
parquet_rs-zstd decompress time/Bimbo 2675582378.6 ns (9004549.4775002) 2679680139.2 ns (6769897.154999971) 1.00
parquet_rs-zstd decompress time/Bimbo throughput 7121333608 bytes 7121333608 bytes 1
vortex:parquet-zstd size/Bimbo 1.8552079259263476 ratio 1.8277250112243697 ratio 1.02
vortex:raw size/Bimbo 0.10111880662226659 ratio 0.0996208400071348 ratio 1.02
vortex size/Bimbo 720100756 bytes 709433236 bytes 1.02
compress time/CMSprovider 12492910154.3 ns (12778204.600000381) 12251128093.2 ns (19108270.376250267) 1.02
compress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
parquet_rs-zstd compress time/CMSprovider 18573850934.3 ns (24206615.17750168) 18277188372.7 ns (27211560.393749237) 1.02
parquet_rs-zstd compress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
decompress time/CMSprovider 4295405286.3 ns (332703181.94124985) 3455023378 ns (317311201.54999995) 1.24
decompress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
parquet_rs-zstd decompress time/CMSprovider 5323077450.3 ns (17531432.44999981) 5527636978.3 ns (10920543.650000095) 0.96
parquet_rs-zstd decompress time/CMSprovider throughput 5149123964 bytes 5149123964 bytes 1
vortex:parquet-zstd size/CMSprovider 1.328511974269405 ratio 1.3078211079973756 ratio 1.02
vortex:raw size/CMSprovider 0.19854039855079317 ratio 0.19544823761015204 ratio 1.02
vortex size/CMSprovider 1022309124 bytes 1006387204 bytes 1.02
compress time/Euro2016 2680495800.3 ns (4315278.450000048) 2679316084.9 ns (5234151.927500248) 1.00
compress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
parquet_rs-zstd compress time/Euro2016 1534135025.4 ns (3707327.495000005) 1520575086.5 ns (3364003.966249943) 1.01
parquet_rs-zstd compress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
decompress time/Euro2016 297372359.7 ns (1014572.9137499928) 292386810.8 ns (1186322.199999988) 1.02
decompress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
parquet_rs-zstd decompress time/Euro2016 483086566.8 ns (2198970.4600000083) 479503781.8 ns (1754925.5100000203) 1.01
parquet_rs-zstd decompress time/Euro2016 throughput 393253221 bytes 393253221 bytes 1
vortex:parquet-zstd size/Euro2016 1.482526210343833 ratio 1.4831851057079755 ratio 1.00
vortex:raw size/Euro2016 0.4482032812135568 ratio 0.4484024811077135 ratio 1.00
vortex size/Euro2016 176257384 bytes 176335720 bytes 1.00
compress time/Food 1037969314.7 ns (3663724.8549999595) 958239752.2 ns (3250341.146250069) 1.08
compress time/Food throughput 332718229 bytes 332718229 bytes 1
parquet_rs-zstd compress time/Food 1029550791.3 ns (1844639.800000012) 1031135399.6 ns (2028689.6124999523) 1.00
parquet_rs-zstd compress time/Food throughput 332718229 bytes 332718229 bytes 1
decompress time/Food 126932340.30499999 ns (429186.3840624988) 121666205.33559522 ns (421162.6320684552) 1.04
decompress time/Food throughput 332718229 bytes 332718229 bytes 1
parquet_rs-zstd decompress time/Food 221026619.4 ns (881098.5481249988) 219378143.1 ns (392373.875) 1.01
parquet_rs-zstd decompress time/Food throughput 332718229 bytes 332718229 bytes 1
vortex:parquet-zstd size/Food 1.4171873468552165 ratio 1.4170195301889623 ratio 1.00
vortex:raw size/Food 0.15431891469944076 ratio 0.15430064097870633 ratio 1.00
vortex size/Food 51344716 bytes 51338636 bytes 1.00
compress time/HashTags 2485683670.5 ns (4573608.731249809) 2471010239.3 ns (2855255.9500000477) 1.01
compress time/HashTags throughput 804495592 bytes 804495592 bytes 1
parquet_rs-zstd compress time/HashTags 2437325057.8 ns (5515173.5224998) 2425962054.6 ns (2702472.4787499905) 1.00
parquet_rs-zstd compress time/HashTags throughput 804495592 bytes 804495592 bytes 1
decompress time/HashTags 461320311.6 ns (2029490.6649999917) 451949403.4 ns (946712.9737499952) 1.02
decompress time/HashTags throughput 804495592 bytes 804495592 bytes 1
parquet_rs-zstd decompress time/HashTags 790341006.4 ns (2226761.6862499714) 768271816.8 ns (3405217.8025000095) 1.03
parquet_rs-zstd decompress time/HashTags throughput 804495592 bytes 804495592 bytes 1
vortex:parquet-zstd size/HashTags 1.6967941999513934 ratio 1.696667122455664 ratio 1.00
vortex:raw size/HashTags 0.28255205157171326 ratio 0.2825308904862216 ratio 1.00
vortex size/HashTags 227311880 bytes 227294856 bytes 1.00
compress time/TPC-H l_comment chunked without fsst 3320285481.2 ns (13308041.376250029) 3288979278.7 ns (13542434.357499838) 1.01
compress time/TPC-H l_comment chunked without fsst throughput 249197098 bytes 249197098 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst 917646536.1 ns (1726417.3487500548) 905224851.9 ns (1714537.0325000286) 1.01
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput 249197098 bytes 249197098 bytes 1
decompress time/TPC-H l_comment chunked without fsst 160560948 ns (360293.3862500042) 157464725.8 ns (623900.049999997) 1.02
decompress time/TPC-H l_comment chunked without fsst throughput 249197098 bytes 249197098 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst 252088843.4 ns (735379.0725000054) 249149892.2 ns (771346.6712500006) 1.01
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput 249197098 bytes 249197098 bytes 1
vortex:parquet-zstd size/TPC-H l_comment chunked without fsst 4.609638356114052 ratio 4.60956593739563 ratio 1.00
vortex:raw size/TPC-H l_comment chunked without fsst 1.053231334178699 ratio 1.053199231076118 ratio 1.00
vortex size/TPC-H l_comment chunked without fsst 262462192 bytes 262454192 bytes 1.00
compress time/TPC-H l_comment chunked 987845648.2 ns (1079198.3499999642) 983195432.8 ns (1487749.643750012) 1.00
compress time/TPC-H l_comment chunked throughput 249197098 bytes 249197098 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment chunked 916547539.5 ns (1271969.269999981) 902074340.3 ns (2967490.1087499857) 1.02
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput 249197098 bytes 249197098 bytes 1
decompress time/TPC-H l_comment chunked 132581947.16666666 ns (1673252.215000011) 132793167.95944445 ns (1006282.1018819436) 1.00
decompress time/TPC-H l_comment chunked throughput 249197098 bytes 249197098 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment chunked 251510127.7 ns (1616207.5) 248528670.45 ns (485638.55249999464) 1.01
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput 249197098 bytes 249197098 bytes 1
vortex:parquet-zstd size/TPC-H l_comment chunked 1.3522685330950424 ratio 1.352287383061685 ratio 1.00
vortex:raw size/TPC-H l_comment chunked 0.30897252262544406 ratio 0.3089722658006234 ratio 1.00
vortex size/TPC-H l_comment chunked 76995056 bytes 76994992 bytes 1.00
compress time/TPC-H l_comment canonical 990065391.2 ns (857515.8131249547) 983223304.4 ns (1609762.75) 1.01
compress time/TPC-H l_comment canonical throughput 249197114 bytes 249197114 bytes 1
parquet_rs-zstd compress time/TPC-H l_comment canonical 913897268.75 ns (1078198.9456250072) 904400313 ns (1884481.5818750262) 1.01
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput 249197114 bytes 249197114 bytes 1
decompress time/TPC-H l_comment canonical 134439484.88843253 ns (732047.7224689946) 133110464.56702383 ns (1008702.9804985151) 1.01
decompress time/TPC-H l_comment canonical throughput 249197114 bytes 249197114 bytes 1
parquet_rs-zstd decompress time/TPC-H l_comment canonical 250642072.94396824 ns (490663.6031418592) 249070624.86809522 ns (614028.2991666794) 1.01
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput 249197114 bytes 249197114 bytes 1
vortex:parquet-zstd size/TPC-H l_comment canonical 1.3523021638873813 ratio 1.3523420592703685 ratio 1.00
vortex:raw size/TPC-H l_comment canonical 0.30897250278749216 ratio 0.308972245962688 ratio 1.00
vortex size/TPC-H l_comment canonical 76995056 bytes 76994992 bytes 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@lwwmanning lwwmanning left a comment

There's already `Stat::UncompressedSizeInBytes`, which we compute before compressing! We don't need to decompress or recalculate.

@lwwmanning
Contributor

See #1315
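
The idea of recording the size before compressing, so that nothing has to be decompressed later, can be sketched roughly as follows. Everything here is a stand-in written for illustration: the local `Stat` enum only borrows the variant name from the comment above, and `Compressed`, `compress`, and the toy run-length encoding are assumptions, not the Vortex implementation.

```rust
use std::collections::HashMap;

/// Local stand-in enum; only the variant name is taken from the discussion.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Stat {
    UncompressedSizeInBytes,
}

/// Hypothetical compressed buffer that carries stats recorded at write time.
struct Compressed {
    bytes: Vec<u8>,
    stats: HashMap<Stat, u64>,
}

/// Toy run-length encoder (pairs of run length and byte value).
/// The uncompressed size is recorded before any encoding happens.
fn compress(input: &[u8]) -> Compressed {
    let mut stats = HashMap::new();
    stats.insert(Stat::UncompressedSizeInBytes, input.len() as u64);

    let mut bytes = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1u8;
        // Extend the run while the next byte matches and the counter fits in a u8.
        while i + (run as usize) < input.len()
            && input[i + run as usize] == byte
            && run < u8::MAX
        {
            run += 1;
        }
        bytes.push(run);
        bytes.push(byte);
        i += run as usize;
    }
    Compressed { bytes, stats }
}

impl Compressed {
    /// Reads the recorded size; no decompression is involved.
    fn uncompressed_size(&self) -> Option<u64> {
        self.stats.get(&Stat::UncompressedSizeInBytes).copied()
    }
}
```

The lookup stays cheap however large the original data was, which is the property the comment relies on.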

@gatesn
Contributor

gatesn commented Nov 29, 2024

It just doesn't seem like it fits the definition of a statistic, because I can compute it in constant time. So why do we need to store it?
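
As one illustration of the constant-time argument, the sketch below covers only the simplest case, a fixed-width buffer, where the byte size follows from the length and the element width. `FixedWidthArray` and its `nbytes` method are hypothetical names, not Vortex types, and whether the same holds for every encoded array is exactly what this thread leaves open.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an in-memory fixed-width array; not a Vortex type.
struct FixedWidthArray<T> {
    values: Vec<T>,
}

impl<T> FixedWidthArray<T> {
    /// Size in bytes of the value buffer, derived in O(1) from the length
    /// and the element width, without scanning the data.
    fn nbytes(&self) -> usize {
        self.values.len() * size_of::<T>()
    }
}
```

Because the result depends only on metadata that is already at hand, there is nothing to cache in this case.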

@robert3005
Contributor

@gatesn we can remove it as a statistic if it's computable in constant time. I don't quite know how to do that, but I'm happy to review a PR. Alternatively, it can be a feature of the file format only, and not of in-memory arrays, if you think it shouldn't be a statistic.

@AdamGS AdamGS merged commit 4855ff2 into develop Dec 2, 2024
29 checks passed
@AdamGS AdamGS deleted the adamg/metadata-read-followup branch December 2, 2024 23:01
5 participants