Gather partial TopN results #21761

Dith3r · 2024-04-30T09:44:02Z

Description

Gathering TopN avoids unnecessary network overhead, especially when both the number of splits and the TopN limit are big.
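To make the effect concrete, here is a minimal, self-contained sketch in plain Java. It is not Trino code: the class and method names are hypothetical, and it assumes the gathered per-split results are reduced by one more partial TopN before they leave the task, which is what "limit the task output size" in the rule's Javadoc suggests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only; names are illustrative and do not exist in Trino.
public class GatherTopNSketch {
    // Partial TopN over one input: keep the n smallest values, returned in ascending order.
    public static List<Long> partialTopN(List<Long> rows, int n) {
        // Max-heap, so the largest retained value is the first one evicted.
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (long row : rows) {
            heap.add(row);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Comparator.naturalOrder());
        return result;
    }

    // Without gathering: the task emits every per-split partial result (up to n rows per split).
    public static List<Long> taskOutputWithoutGather(List<List<Long>> splits, int n) {
        List<Long> output = new ArrayList<>();
        for (List<Long> split : splits) {
            output.addAll(partialTopN(split, n));
        }
        return output;
    }

    // With gathering (assumed shape): per-split results are collected inside the task
    // and reduced by one more partial TopN, so at most n rows leave the task.
    public static List<Long> taskOutputWithGather(List<List<Long>> splits, int n) {
        return partialTopN(taskOutputWithoutGather(splits, n), n);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = List.of(
                List.of(9L, 1L, 7L, 4L),
                List.of(8L, 2L, 6L, 3L),
                List.of(5L, 0L, 11L, 10L));
        int n = 2;
        System.out.println("rows without gather: " + taskOutputWithoutGather(splits, n).size());
        System.out.println("rows with gather:    " + taskOutputWithGather(splits, n).size());
    }
}
```

In this model the ungathered output grows with the number of splits, while the gathered output is capped at n rows per task, which is why the saving matters most when both the split count and the limit are large.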

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Improve performance of ORDER BY queries with LIMIT on large data sets. ({issue}`21761`)

lukasz-stec

lgtm

martint · 2024-04-30T16:35:28Z

core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/GatherPartialTopN.java

+import static io.trino.sql.planner.plan.TopNNode.Step.PARTIAL;
+
+/**
+ * Adds local round-robin and gathering exchange on top of partial TopN to limit the task output size.


How big of an issue is this? For a leaf task, the output is currently N * number_of_splits. With this change, it's N. But if N is small, this may not be such a big deal in general.

lukasz-stec · 2024-04-30

Here are some numbers:

trino:iceberg_small_files_tpch_sf1000_orc_part> explain analyze verbose select * from lineitem order by orderkey limit 10000;

Before
Output: 2947463228 rows (419.40GB)
...
CPU Time: 18061.8s total, 332K rows/s, 9.04MB/s, 45% active
Per Node: 9.9 parallelism, 3.29M rows/s, 89.6MB/s
Parallelism: 69.4
Peak Memory: 1.07GB
4:20 [6B rows, 159GB] [23.1M rows/s, 627MB/s]

After
Output: 60000 rows (8.75MB)
...
CPU Time: 17139.7s total, 350K rows/s, 9.53MB/s, 34% active
Per Node: 17.7 parallelism, 6.2M rows/s, 169MB/s
Parallelism: 123.9
Peak Memory: 1.62GB
2:18 [6B rows, 159GB] [43.4M rows/s, 1.15GB/s]

martint · 2024-04-30

That's a nice win 👍
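As a rough check of the reported figures against the N * number_of_splits estimate, the sketch below redoes the arithmetic in Java. The constants are copied from the output above. Two things are assumptions rather than statements from the thread: the class and method names are hypothetical, and the trailing 4:20 and 2:18 are read as elapsed wall time.

```java
// Back-of-the-envelope arithmetic on the numbers quoted in the review thread.
public class TopNNumbers {
    public static final long LIMIT = 10_000L;                // LIMIT in the benchmark query
    public static final long ROWS_BEFORE = 2_947_463_228L;   // "Output" rows before the change
    public static final long ROWS_AFTER = 60_000L;           // "Output" rows after the change
    public static final int SECONDS_BEFORE = 4 * 60 + 20;    // 4:20, assumed to be wall time
    public static final int SECONDS_AFTER = 2 * 60 + 18;     // 2:18, assumed to be wall time

    // How many full LIMIT-sized partial results fit in the given output row count.
    public static long limitMultiples(long rows) {
        return rows / LIMIT;
    }

    // Ratio of elapsed time before the change to elapsed time after it.
    public static double wallTimeSpeedup() {
        return (double) SECONDS_BEFORE / SECONDS_AFTER;
    }

    public static void main(String[] args) {
        System.out.println("before: about " + limitMultiples(ROWS_BEFORE) + " x LIMIT rows");
        System.out.println("after:  " + limitMultiples(ROWS_AFTER) + " x LIMIT rows");
        System.out.printf("wall-time speedup: %.2fx%n", wallTimeSpeedup());
    }
}
```

The output before the change is close to 294,746 times the limit, while the output after it is exactly 6 times the limit. That is consistent with six producers each emitting at most N rows, though the thread does not state the task or node count.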

core/trino-main/src/main/java/io/trino/SystemSessionProperties.java

Gathering TopN avoids unnecessary network overhead, especially when both the number of splits and the TopN limit are big. Co-authored-by: Kamil Endruszkiewicz <kamil.endruszkiewicz@starburstdata.com>

cla-bot bot added the cla-signed label Apr 30, 2024

Dith3r requested review from sopel39, raunaqmorarka and lukasz-stec April 30, 2024 09:44

lukasz-stec approved these changes Apr 30, 2024

Dith3r force-pushed the ke/gather-topn branch 3 times, most recently from f2fbc4f to 98b4c2a Compare April 30, 2024 11:02

github-actions bot added the hive Hive connector label Apr 30, 2024

Dith3r force-pushed the ke/gather-topn branch from 98b4c2a to 757a4d8 Compare April 30, 2024 12:22

martint reviewed Apr 30, 2024

Dith3r force-pushed the ke/gather-topn branch from 757a4d8 to b5ce243 Compare May 2, 2024 07:09

raunaqmorarka approved these changes May 2, 2024

Gather partial TopN results

e6d54f0

Dith3r force-pushed the ke/gather-topn branch from b5ce243 to e6d54f0 Compare May 2, 2024 07:41

raunaqmorarka added the performance label May 2, 2024

raunaqmorarka merged commit a6474d8 into trinodb:master May 2, 2024
94 checks passed

github-actions bot added this to the 447 milestone May 2, 2024

colebow mentioned this pull request May 8, 2024

Add Trino 447 release notes #21873

Merged

Dith3r deleted the ke/gather-topn branch May 21, 2024 07:50

Dith3r commented Apr 30, 2024 • edited by raunaqmorarka
