Optimize queries with similar subplans (CTE) #5878
Comments
Initially we could focus on eliminating redundant predicates.
@sopel39, in my mind, one way to eliminate the redundant subquery is to materialize the subquery into a temp table and then query from the temp table, so there is some work:
Either a temp table or keeping the data in memory. The first step is to identify and unify CTEs. Just identifying common CTEs could lead to improvements, as we can execute certain subplans just once.
I note from http://c.raqsoft.com/article/1571214718416:
It seems like the big payoff in q21 is pushing down the
Pushing
Superseded by #22167
Many queries also have similar subqueries. For instance:
- `tpch/q21` shares similar subqueries in two `NOT EXISTS/EXISTS` correlated subqueries.
- `tpcds/q95` has a redundant `IN` predicate which can be removed.
This would also enable us to optimize queries where:
- (`tpcds/q95`)
- `IN/EXISTS/NOT EXISTS/ALL` predicates (see `tpch/q21`)
Initially, we could reuse similar subplans only when we are sure that it won't cause performance degradation (without CBO).
An example of such an optimization based on the `tpch/q21` query:
This translates to the plan:
When reusing similar subplans, this plan can be conceptually rewritten to:
We could base the extraction of similar subplans on the paper "Efficient Exploitation of Similar Subexpressions for Query Processing": http://www.dbis.informatik.hu-berlin.de/fileadmin/lectures/SS2008/Seminar_MatViews/p533-zhou.pdf
The paper describes a method for detecting similar subplans and constructing a reusable common subplan. The method works for `S(elect)P(roject)J(oin)G(roup By)` type of plans.
The optimization could be introduced in the following steps:
- `PlanSignatureEvaluator`, which can compute a plan signature according to normalization rules (`SPJG` plan into `TableScan -> Join -> Filter -> GroupBy` plan).
- `CommonSubplanExtractor`, which takes a list of plans as input. The result would be a list of extracted common subplans with a mapping to the original subplans (with detailed info about which predicates and aggregates need to be applied in order to get the same result as the original subplan). A conceptual output example: `common_subplan_X` with predicates `p_y` and aggregates `aggr_y` applied should produce the same result as `Y`.
We could then add the following rules:
- `ApplyNodes`
- `ApplyNodes` with similar subqueries.
Original issue: prestodb/presto#6944
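To make the `PlanSignatureEvaluator` / `CommonSubplanExtractor` idea above more concrete, here is a minimal, language-agnostic sketch in Python. It is not the engine's code or the paper's full algorithm. The `Subplan` and `CommonSubplan` classes, the `plan_signature` and `extract_common_subplans` functions, and the predicate strings are all hypothetical. The sketch assumes that plans are already normalized (so equivalent predicates compare equal as strings), that the signature is just the set of scanned tables plus join conditions, and that the common subplan keeps only the filter conjuncts shared by every consumer. The example input is loosely modeled on the two correlated subqueries of `tpch/q21`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Subplan:
    """Hypothetical, heavily simplified SPJG subplan (already normalized)."""
    name: str
    tables: FrozenSet[str]
    join_conditions: FrozenSet[str]
    filters: FrozenSet[str]            # conjuncts of the filter predicate
    aggregates: Tuple[str, ...] = ()   # applied on top of the filtered rows


@dataclass
class CommonSubplan:
    """Shared subplan plus, per consumer, what must be applied on top of it."""
    tables: FrozenSet[str]
    join_conditions: FrozenSet[str]
    filters: FrozenSet[str]
    # consumer name -> (residual predicates p_y, aggregates aggr_y)
    consumers: Dict[str, Tuple[FrozenSet[str], Tuple[str, ...]]]


def plan_signature(plan: Subplan) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    # Assumption: two subplans are candidates for sharing when they scan
    # the same tables with the same join conditions.
    return (plan.tables, plan.join_conditions)


def extract_common_subplans(plans: List[Subplan]) -> List[CommonSubplan]:
    # Group candidate subplans by signature.
    groups = defaultdict(list)
    for plan in plans:
        groups[plan_signature(plan)].append(plan)

    extracted = []
    for (tables, joins), members in groups.items():
        if len(members) < 2:
            continue  # nothing to share
        # Keep only conjuncts present in every member, so the common subplan
        # returns a superset of the rows each member needs.
        shared = frozenset.intersection(*(m.filters for m in members))
        consumers = {m.name: (m.filters - shared, m.aggregates) for m in members}
        extracted.append(CommonSubplan(tables, joins, shared, consumers))
    return extracted


# Usage: two subqueries over lineitem that differ by one extra conjunct.
lineitem = frozenset({"lineitem"})
correlation = {"l.orderkey = outer.orderkey", "l.suppkey <> outer.suppkey"}
exists_sub = Subplan("exists", lineitem, frozenset(), frozenset(correlation))
not_exists_sub = Subplan(
    "not_exists",
    lineitem,
    frozenset(),
    frozenset(correlation | {"l.receiptdate > l.commitdate"}),
)
common = extract_common_subplans([exists_sub, not_exists_sub])
```

In this run, the single extracted common subplan keeps the two correlation conjuncts, and only the `not_exists` consumer has a residual predicate left to apply. A real implementation would also have to make sure the common subplan projects every column the residual predicates and aggregates refer to, and would need a cost check before sharing, which this sketch ignores.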