[YSQL] Enabling CBO without running ANALYZE may lead to suboptimal plans #16825

Open
qvad opened this issue Apr 12, 2023 · 8 comments

Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments
@qvad
Contributor

qvad commented Apr 12, 2023

Jira Link: DB-6175

Description

The full list of queries can be found here; the subqueries set also shows poor execution times.
https://github.com/qvad/taqo/blob/main/sql/complex/queries/distinct.sql

Below is a comparison between two runs: the default configuration, and CBO enabled together with ANALYZE on the tables.
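
Roughly, the two configurations being compared look like this (a sketch; the actual runs are driven by TAQO and use the query shown below):

-- Run 1: default heuristic cost model (no changes needed).

-- Run 2: CBO enabled after collecting table statistics.
ANALYZE t1000000;
ANALYZE t500000;
ANALYZE t50000;
SET yb_enable_optimizer_statistics = true;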

This is one example of a query that started performing badly with CBO/ANALYZE:

SELECT DISTINCT t500000.c_int,
                t50000.c_bool
FROM   t1000000
       RIGHT OUTER JOIN t500000
           ON t1000000.c_text = t500000.c_text
       RIGHT OUTER JOIN t50000
           ON t1000000.c_text = t50000.c_text
WHERE  t1000000.c_int < 474525
ORDER  BY t500000.c_int, t50000.c_bool DESC
LIMIT  1000 OFFSET 10;

Here is the execution plan difference:
[Screenshot: execution plan comparison, 2023-04-12 19:02]

@qvad qvad added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 12, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 12, 2023
@mtakahar
Contributor

The plans highlighted in red and green are the same, including the row counts and the cost estimates. The red one shows the runtime stats and the hash bucket adjustments that actually happened at runtime, but the green one doesn't.

It is possible that the initial number of hash buckets and the decision whether to do a "batched hash join" (a partitioned hash join, in more general terms) were different, but we can't tell without the runtime information from the green one.

Can you try it again with the ANALYZE option specified on the EXPLAIN command for the faster run and post the plan diff?
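
For example, something like this (a sketch, reusing the query from the description):

EXPLAIN (ANALYZE)
SELECT DISTINCT t500000.c_int,
                t50000.c_bool
FROM   t1000000
       RIGHT OUTER JOIN t500000
           ON t1000000.c_text = t500000.c_text
       RIGHT OUTER JOIN t50000
           ON t1000000.c_text = t50000.c_text
WHERE  t1000000.c_int < 474525
ORDER  BY t500000.c_int, t50000.c_bool DESC
LIMIT  1000 OFFSET 10;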

@tverona1 tverona1 changed the title [YSQL] Enabling CBO may lead to significat performance decreasing for some queries [YSQL] Enabling CBO may lead to significant performance decreasing for some queries May 2, 2023
@qvad
Contributor Author

qvad commented May 2, 2023

This is a compact reproducer for another case:

CREATE TABLE t1
    WITH (colocation = true) AS
SELECT k1,
       (k1 + 0.0001)::text    AS k2,
       k1                     AS v1,
       (k1 + 0.0002)::varchar AS v2
FROM   generate_series(1, 50000) k1;


CREATE TABLE t2
    WITH (colocation = true) AS
SELECT k1,
       (k1 + 0.0001)::text    AS k2,
       k1                     AS v1,
       (k1 + 0.0002)::varchar AS v2
FROM   generate_series(1, 100000) k1;

-- w/o CBO
SELECT t1.k1,
       t1.k2,
       t2.v1,
       t2.v2
FROM   t1
       JOIN t2 ON t1.k1 = t2.k1
WHERE  t1.k1 >= 2500
   AND t1.k1 < 25100
   AND t2.k1 >= 2500
   AND t2.k1 < 25100
GROUP BY t1.k1, t1.k2, t2.v1, t2.v2;

-- enable CBO
SET yb_enable_optimizer_statistics = true;
SELECT t1.k1,
       t1.k2,
       t2.v1,
       t2.v2
FROM   t1
       JOIN t2 ON t1.k1 = t2.k1
WHERE  t1.k1 >= 2500
   AND t1.k1 < 25100
   AND t2.k1 >= 2500
   AND t2.k1 < 25100
GROUP BY t1.k1, t1.k2, t2.v1, t2.v2;

@tverona1
Contributor

tverona1 commented May 2, 2023

In the compact repro above, we are not running ANALYZE. With CBO enabled and without ANALYZE, we do not have cardinality estimates for the tables. It looks like we assume a cardinality of 1, so we end up with a nested loop join that in reality generates ~500M rows:

With CBO (without running ANALYZE prior):
 HashAggregate  (cost=0.02..0.03 rows=1 width=72)
   Group Key: t1.k1, t1.k2, t2.v1, t2.v2
   ->  Nested Loop  (cost=0.00..0.01 rows=1 width=72)
         Join Filter: (t1.k1 = t2.k1)
         ->  Seq Scan on t1  (cost=0.00..0.00 rows=1 width=36)
               Remote Filter: ((k1 >= 2500) AND (k1 < 25100))
         ->  Seq Scan on t2  (cost=0.00..0.00 rows=1 width=40)
               Remote Filter: ((k1 >= 2500) AND (k1 < 25100))

Perhaps, if we are missing cardinality estimates, we should fall back to some hard-coded cardinality (e.g. 10k rows) like we do in the heuristic model.
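
One way to confirm the missing estimates in this repro is to check pg_class directly (a sketch; reltuples stays at 0 until ANALYZE runs):

-- Both repro tables have never been ANALYZEd, so reltuples is 0
-- and the planner clamps the scan estimates down to 1 row.
SELECT relname, reltuples
FROM   pg_class
WHERE  relname IN ('t1', 't2');

-- Collecting statistics gives CBO realistic row counts and avoids
-- the Nested Loop plan above.
ANALYZE t1;
ANALYZE t2;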

@tverona1
Contributor

tverona1 commented May 2, 2023

Note: in vanilla PG, without stats, the cardinality estimates are better. Perhaps it is estimating based on # of pages:

CREATE TABLE t1
    WITH (autovacuum_enabled = false) AS
SELECT k1,
       (k1 + 0.0001)::text    AS k2,
       k1                     AS v1,
       (k1 + 0.0002)::varchar AS v2
FROM   generate_series(1, 50000) k1;

CREATE TABLE t2
    WITH (autovacuum_enabled = false) AS
SELECT k1,
       (k1 + 0.0001)::text    AS k2,
       k1                     AS v1,
       (k1 + 0.0002)::varchar AS v2
FROM   generate_series(1, 100000) k1;

EXPLAIN
SELECT t1.k1,
       t1.k2,
       t2.v1,
       t2.v2
FROM   t1
       JOIN t2 ON t1.k1 = t2.k1
WHERE  t1.k1 >= 2500
   AND t1.k1 < 25100
   AND t2.k1 >= 2500
   AND t2.k1 < 25100
GROUP BY t1.k1, t1.k2, t2.v1, t2.v2;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Group  (cost=2464.43..2467.20 rows=222 width=72)
   Group Key: t1.k1, t1.k2, t2.v1, t2.v2
   ->  Sort  (cost=2464.43..2464.98 rows=222 width=72)
         Sort Key: t1.k1, t1.k2, t2.v1, t2.v2
         ->  Hash Join  (cost=816.98..2455.78 rows=222 width=72)
               Hash Cond: (t2.k1 = t1.k1)
               ->  Seq Scan on t2  (cost=0.00..1630.24 rows=298 width=40)
                     Filter: ((k1 >= 2500) AND (k1 < 25100))
               ->  Hash  (cost=815.12..815.12 rows=149 width=36)
                     ->  Seq Scan on t1  (cost=0.00..815.12 rows=149 width=36)
                           Filter: ((k1 >= 2500) AND (k1 < 25100))

@mtakahar
Contributor

mtakahar commented May 2, 2023

In the compact repro above, we are not running ANALYZE.

I briefly tried it myself with TAQO on my MacBook Pro and couldn't reproduce it. Now I see that was because I had specified the option to run the analyze script from the command line.

Perhaps, if we are missing cardinality estimates, we should fall back to some hard-coded cardinality (e.g. 10k rows) like we do in the heuristic model.

I agree. I think we should either create a separate issue or repurpose this to track that.

Perhaps it is estimating based on # of pages:

I believe that's what it does. Postgres also uses 1000 rows as the default in some places though.

@mtakahar
Contributor

mtakahar commented May 9, 2023

#16097 is the most likely cause of the same hash join plan taking much longer when yb_enable_optimizer_statistics is turned on but the tables are unanalyzed.

@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug status/awaiting-triage Issue awaiting triage labels May 17, 2023
@tverona1 tverona1 changed the title [YSQL] Enabling CBO may lead to significant performance decreasing for some queries [YSQL] Enabling CBO without running ANALYZE may lead to suboptimal plans Oct 16, 2023
@gauravk-in
Contributor

When yb_enable_optimizer_statistics is disabled, YB uses a heuristic of 1000 rows per table. When it is enabled, we take the value from pg_class.reltuples. This is 0 when ANALYZE has not been run, and it gets “clamped” to 1.

The problem is that when the estimated number of output rows is 1, the CBO tends to choose a Nested Loop Join, which is very expensive here.
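
The clamping can be observed directly with a plain EXPLAIN (a sketch; demo is a hypothetical table used only for illustration):

CREATE TABLE demo AS SELECT g AS k1 FROM generate_series(1, 10000) g;

-- Heuristic model: an unanalyzed table is assumed to have 1000 rows.
SET yb_enable_optimizer_statistics = false;
EXPLAIN SELECT * FROM demo;   -- expect: Seq Scan on demo ... rows=1000

-- CBO without statistics: reltuples is 0 and the estimate is clamped to 1.
SET yb_enable_optimizer_statistics = true;
EXPLAIN SELECT * FROM demo;   -- expect: Seq Scan on demo ... rows=1

-- After ANALYZE the estimate follows the actual row count.
ANALYZE demo;
EXPLAIN SELECT * FROM demo;   -- expect: Seq Scan on demo ... rows=10000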

PG’s logic to estimate reltuples is in the estimate_rel_size function. It computes the width of a row and divides the total size of the table by that to estimate the number of tuples. When the table is empty, it assumes the table has 10 pages. This is why the estimate is never 0.

We can fix the issue in YB in the following ways:

  • To keep things simple (as @mtakahar has also suggested), if pg_class.reltuples == 0 we can return 1000.
  • Like PG, when reltuples is 0, we can get the size of the SST files from DocDB, compute the width of the tuple, and use that to estimate the number of tuples.

@rthallamko3 Can you please comment on how challenging it would be to extract the size of the SST files from DocDB? We would need to expose this information to the cost model and cache it somewhere too.

@gauravk-in gauravk-in self-assigned this Oct 25, 2023
@rthallamko3
Contributor

@gauravk-in, is the SST size the only unknown for CBO, or are there other aspects? I would recommend going down the simpler approach: if ANALYZE has not been run, assume 1000 rows. Once ANALYZE is run, all the statistics are populated and available, correct?

gauravk-in added a commit that referenced this issue Dec 7, 2023
…ot yet called

Summary:
Before this change, if `yb_enable_optimizer_statistics` is set to TRUE but
`ANALYZE` has not been called, all tables seem to have 0 rows. This causes CBO
to pick poor execution plans. For `yb_enable_optimizer_statistics=FALSE` this
case was handled by setting the tuple count to 1000 if statistics suggested that
the table had 0 rows.

In this change, the same logic is extended to
`yb_enable_optimizer_statistics=TRUE`. This should make CBO produce better query
plans.
Jira: DB-6175

Test Plan: ybd --java-test 'org.yb.pgsql.TestPgYbOptimizerStatistics'

Reviewers: tverona, tnayak, yguan

Reviewed By: tnayak

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D30194
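
With this change, an unanalyzed table should get the 1000-row fallback instead of a 1-row estimate when the GUC is on. A quick way to check (a sketch against a freshly created, unanalyzed t1 from the repro above):

SET yb_enable_optimizer_statistics = true;

-- Before the fix: Seq Scan on t1 ... rows=1 (clamped from reltuples = 0).
-- After the fix:  Seq Scan on t1 ... rows=1000 (same fallback as the
-- heuristic model), so joins are no longer costed as 1 x 1 Nested Loops.
EXPLAIN SELECT * FROM t1;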