Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce Leading and join method hints with ANY_subquery alias for IN-subquery #8

Open
mtakahar opened this issue May 15, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@mtakahar
Copy link

mtakahar commented May 15, 2023

Postgres optimizer tries to convert an ANY- and IN- subquery expression to an inner join when it satisfies certain conditions, and when it does so, it assigns an internally generated table alias: ANY_subquery to the subquery.

Example:
An IN-subquery (the same as =ANY (...) ) expression, c2 in (select c1 from t100 where ...) was converted to an inner join, but the Leading hint was not followed. The optimizer actually generated a plan: Leading (( t10000w ( t1000 t100 ) )) NestLoop(t10000w t100) NestLoop(t10000w t100 t1000).

explain /*+ Leading (( ( t10000w t100 ) t1000 ))  HashJoin(t10000w t100) HashJoin(t10000w t100 t1000) SeqScan(t10000w) SeqScan(t100) SeqScan(t1000) */select * from t10000w where c2 in (select c1 from t100 where exists (select 0 from t1000 where c1 = t100.c2));

                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=20000000000.00..20000017610.50 rows=400 width=16396)
   Join Filter: (t10000w.c2 = t100.c1)
   ->  YB Seq Scan on t10000w  (cost=0.00..1000.00 rows=10000 width=16396)
   ->  Materialize  (cost=10000000000.00..10000001610.75 rows=100 width=4)
         ->  Nested Loop  (cost=10000000000.00..10000001610.25 rows=100 width=4)
               Join Filter: (t100.c2 = t1000.c1)
               ->  YB Seq Scan on t1000  (cost=0.00..100.00 rows=1000 width=4)
               ->  Materialize  (cost=0.00..10.50 rows=100 width=8)
                     ->  YB Seq Scan on t100  (cost=0.00..10.00 rows=100 width=8)
(9 rows)

We can get HashJoins by specifying ANY_subquery in place of t100, however, the Leading hint is still not followed:

explain /*+ Leading (( ( t10000w ANY_subquery ) t1000 ))  HashJoin(t10000w ANY_subquery) HashJoin(t10000w ANY_subquery t1000) SeqScan(t10000w) SeqScan(t100) SeqScan(t1000) */select * from t10000w where c2 in (select c1 from t100 where exists (select 0 from t1000 where c1 = t100.c2));
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Hash Semi Join  (cost=118.50..1149.20 rows=400 width=16396)
   Hash Cond: (t10000w.c2 = t100.c1)
   ->  YB Seq Scan on t10000w  (cost=0.00..1000.00 rows=10000 width=16396)
   ->  Hash  (cost=117.25..117.25 rows=100 width=4)
         ->  Hash Join  (cost=11.25..117.25 rows=100 width=4)
               Hash Cond: (t1000.c1 = t100.c2)
               ->  YB Seq Scan on t1000  (cost=0.00..100.00 rows=1000 width=4)
               ->  Hash  (cost=10.00..10.00 rows=100 width=8)
                     ->  YB Seq Scan on t100  (cost=0.00..10.00 rows=100 width=8)
(9 rows)

We need more investigations before we can implement this to the framework.

@mtakahar mtakahar added the enhancement New feature or request label May 15, 2023
@mtakahar mtakahar self-assigned this May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant