[YSQL] Enabling CBO without running ANALYZE may lead to suboptimal plans #16825
Comments
The plans highlighted in red and green are the same, including the row counts and the cost estimates. The red one shows the runtime stats and the hash-bucket adjustments that actually happened at runtime, but the green one doesn't. It is possible that the initial number of hash buckets, and whether a "batched hash join" (a partitioned hash join, in more general terms) was used, were different, but we can't tell without the runtime information from the green one. Can you try it again with the ANALYZE option specified to the EXPLAIN command for the faster run and post the plan diff?
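For reference, the runtime stats being asked for come from the `ANALYZE` option of `EXPLAIN`; a minimal sketch (the query and table names are hypothetical):

```sql
-- Hypothetical example: EXPLAIN alone prints only estimates; adding the
-- ANALYZE option executes the query and also prints actual row counts,
-- loops, and (for Hash Join nodes) any runtime hash-bucket adjustments.
EXPLAIN (ANALYZE, COSTS, TIMING)
SELECT *
FROM t1
JOIN t2 ON t1.id = t2.t1_id;
```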
This is a compact reproducer for another case:
In the compact repro above, we are not running ANALYZE. With CBO enabled and without ANALYZE, we do not have cardinality estimates for the tables. It looks like we assume a cardinality of 1, so we end up with a nested loop join that in reality generates ~500M rows:
Perhaps, if we are missing cardinality estimates, we should fall back to some hard-coded cardinality (e.g. 10k rows), as we do in the heuristic model.
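The compact repro itself is not reproduced in this thread, but the failure mode being described can be sketched as follows (table names and sizes are hypothetical): with CBO on and no ANALYZE, each scan is estimated at ~1 row, so a nested loop join looks cheap to the planner even though the actual join output is huge.

```sql
-- Hypothetical sketch of the failure mode: freshly loaded tables, no ANALYZE.
CREATE TABLE a (k int, v int);
CREATE TABLE b (k int, v int);
INSERT INTO a SELECT i, i % 100 FROM generate_series(1, 100000) i;
INSERT INTO b SELECT i, i % 100 FROM generate_series(1, 100000) i;

SET yb_enable_optimizer_statistics = true;  -- enable CBO without stats

-- With no statistics, the planner may assume ~1 row per input and pick a
-- Nested Loop Join; the actual join output here is far larger.
EXPLAIN SELECT * FROM a JOIN b ON a.v = b.v;
```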
Note: in vanilla PG, without stats, cardinality estimates are better. Perhaps it is estimating based on the number of pages:
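Vanilla PG can produce a size estimate even before ANALYZE runs because it falls back to the relation's current size on disk; the inputs it works from are visible in `pg_class` (table name hypothetical):

```sql
-- relpages/reltuples are the inputs to the planner's size estimate.
-- Before ANALYZE these may be 0 (or -1 for reltuples in newer PG versions,
-- meaning "unknown"), in which case the planner uses the current number of
-- pages on disk and a tuple density derived from the column widths instead.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 't1';
```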
I briefly tried it myself with TAQO on my MacBook Pro and couldn't reproduce it. Now I see that was because I had specified running the analyze script from the command line.
I agree. I think we should either create a separate issue or repurpose this to track that.
I believe that's what it does. Postgres also uses 1000 rows as the default in some places, though.
#16097 is the most likely cause of the same hash join plan taking much longer when
The problem is that when the estimated output row count is 1, the CBO tends to choose a Nested Loop Join, which is very expensive. PG's logic to estimate reltuples is in:
We can fix the issue in YB in the following ways:
@rthallamko3 Can you please comment on how challenging it is to extract the size of SST tables from DocDB? We would need to expose this information to the cost model and cache it somewhere, too.
@gauravk-in, is the SST size the only unknown for CBO, or are there other aspects? I would recommend going down the simpler approach: if ANALYZE has not been run, assume 1000 rows. Once ANALYZE is run, are all the statistics populated and available?
…ot yet called

Summary: Before this change, if `yb_enable_optimizer_statistics` is set to TRUE but `ANALYZE` has not been called, all tables seem to have 0 rows. This causes CBO to pick poor execution plans. For `yb_enable_optimizer_statistics=FALSE` this case was handled by setting the tuple count to 1000 if statistics suggested that the table had 0 rows. In this change, the same logic is extended to `yb_enable_optimizer_statistics=TRUE`. This should make CBO produce better query plans.

Jira: DB-6175

Test Plan: ybd --java-test 'org.yb.pgsql.TestPgYbOptimizerStatistics'

Reviewers: tverona, tnayak, yguan

Reviewed By: tnayak

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D30194
Jira Link: DB-6175
Description
The full list of queries can be found here; the subqueries set also shows bad execution times.
https://github.com/qvad/taqo/blob/main/sql/complex/queries/distinct.sql
This is a comparison between two runs: default, and with CBO and table ANALYZE.
Here is one example from the queries that started performing badly with CBO/ANALYZE.
Here is the execution plan difference: