Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify redundant EXISTS #22167

Open
Tracked by #22114
sopel39 opened this issue May 27, 2024 · 1 comment
Open
Tracked by #22114

Simplify redundant EXISTS #22167

sopel39 opened this issue May 27, 2024 · 1 comment
Labels
subquery-cache Label for subquery cache relates issues

Comments

@sopel39
Copy link
Member

sopel39 commented May 27, 2024

Example query tpch/q21:

  AND EXISTS (
    SELECT 
      * 
    FROM 
      "lineitem" l2
    WHERE 
      l2.orderkey = l1.orderkey
      AND l2.suppkey <> l1.suppkey
  ) 
  AND NOT EXISTS (
    SELECT 
      * 
    FROM 
      "lineitem" l3
    WHERE 
      l3.orderkey = l1.orderkey 
      AND l3.suppkey <> l1.suppkey 
      AND l3.receiptdate > l3.commitdate
  ) 

With subquery canonicalisation and CommonSubqueryExtractor we can determine that first EXISTS is super set of the other. Let’s say A is represents:

SELECT 
      * 
    FROM 
      "lineitem" l2
    WHERE 
      l2.orderkey = l1.orderkey
      AND l2.suppkey <> l1.suppkey

and B represents:

SELECT 
      * 
    FROM 
      "lineitem" l3
    WHERE 
      l3.orderkey = l1.orderkey 
      AND l3.suppkey <> l1.suppkey 
      AND l3.receiptdate > l3.commitdate

with CommonSubqueryExtractor we can figure that adaptation predicate from A to B is l3.receiptdate > l3.commitdate. Therefore we can rewrite two EXISTS into single one with principle: C = A - B, which is A AND NOT (adaptation_predicate) which is

EXISTS (
    SELECT 
      * 
    FROM 
      "lineitem" l2
    WHERE 
      l2.orderkey = l1.orderkey
      AND l2.suppkey <> l1.suppkey
      AND NOT l3.receiptdate > l3.commitdate
  )
@sopel39 sopel39 added the subquery-cache Label for subquery cache relates issues label May 27, 2024
@sopel39
Copy link
Member Author

sopel39 commented May 27, 2024

Similarly we can eliminate overlapping IN-list predicates, e.g: tpch/q95:

AND ("ws1".ws_order_number IN (SELECT ws_order_number FROM ws_wh))
AND ("ws1".ws_order_number IN (SELECT wr_order_number FROM web_returns , ws_wh
WHERE (wr_order_number = ws_wh.ws_order_number)))

or simpler case:

AND col1 IN (SELECT col2 FROM table2) 
AND col1 IN (SELECT col2 FROM table2 WHERE col3 > 10) 

we could try to compute common subplan for such subplans and then it will be obvious that one subplan is superset of the other. In such case we can just remove one, redundant IN conjunct.

We can also extract common subplan for joins as in case of q95 common subquery can be LEFT JOIN with adaptation predicate on top.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
subquery-cache Label for subquery cache relates issues
Development

No branches or pull requests

1 participant