Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] distinct pushdown requires a range predicate #16771

Closed
1 task done
FranckPachot opened this issue Apr 9, 2023 · 2 comments
Closed
1 task done

[YSQL] distinct pushdown requires a range predicate #16771

FranckPachot opened this issue Apr 9, 2023 · 2 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@FranckPachot
Copy link
Contributor

FranckPachot commented Apr 9, 2023

Jira Link: DB-6127

Description

Issue: The distinct pushdown is not used when the Index Scan doesn't have an index condition.

Reproducible test case:

drop table demo;
create table demo as select generate_series(1,1000) a,42 b;
create index demo_b on demo (b asc);
explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b>0;

Without b>0 1000 rows have been fetched from DocDB and deduplicated in YSQL

   ->  Index Only Scan using demo_b on demo (actual time=4.592..4.856 rows=1000 loops=1)

With b>0 only 1 row which is the distinct value:

   ->  Index Only Scan using demo_b on demo (actual time=0.984..0.985 rows=1 loops=1)
         Index Cond: (b > 0)

Here is a more real-life test case:
https://dev.to/yugabyte/combining-postgresql-partial-index-yugabytedb-hybrid-scan-and-pushed-down-limit-to-get-a-scalable-bounded-count-distinct-3geo

Workaround: add a lower bound with a predicate in the WHERE clause, either by knowing the value (guaranteed by the datatype or check constraint) or with a (select min() from) which should be fast with the same index that allows hybrid scan

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@FranckPachot FranckPachot added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 9, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 9, 2023
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug status/awaiting-triage Issue awaiting triage labels May 18, 2023
@FranckPachot
Copy link
Contributor Author

I tested on 2.19.3 and it seems it is the opposite:

yugabyte=# select version();
                                                                                            version                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.19.3.0-b0 on aarch64-unknown-linux-gnu, compiled by clang version 16.0.6 (https://github.com/yugabyte/llvm-project.git 1e6329f40e5c531c09ade7015278078682293ebd), 64-bit
(1 row)

yugabyte=# explain (analyze on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=0.920..0.922 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=0.917..0.919 rows=1 loops=1)
         Distinct Prefix: 1
         Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 0.958 ms
 Peak Memory Usage: 8 kB
(7 rows)

yugabyte=# explain (analyze on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b>0;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique (actual time=1.170..1.593 rows=1 loops=1)
   ->  Index Only Scan using demo_b on demo (actual time=1.168..1.399 rows=1000 loops=1)
         Index Cond: (b > 0)
         Heap Fetches: 0
 Planning Time: 0.119 ms
 Execution Time: 1.634 ms
 Peak Memory Usage: 24 kB
(7 rows)

Here distinct is pushed down only without an index condition but is not pushed down with the index condition

@pao214
Copy link
Contributor

pao214 commented Jan 31, 2024

Our planner picks the "expected" plan when we run analyze on the table.

yugabyte=# analyze demo;
WARNING:  'analyze' is a beta feature!
LINE 1: analyze demo;
        ^
HINT:  Set 'ysql_beta_features' yb-tserver gflag to true to suppress the warning for all beta features.
ANALYZE

yugabyte=# select version();
                                                            version
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.21.1.0-b0 on arm-apple-darwin22.6.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
(1 row)

yugabyte=# explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=3.554..3.565 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=3.532..3.542 rows=1 loops=1)
         Distinct Prefix: 1
         Heap Fetches: 0
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 2.779 ms
         Storage Index Rows Scanned: 1
 Planning Time: 0.389 ms
 Execution Time: 3.727 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 2.779 ms
 Storage Rows Scanned: 1
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 2.779 ms
 Peak Memory Usage: 24 kB
(18 rows)

yugabyte=# explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b > 0;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=4.262..4.273 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=4.254..4.264 rows=1 loops=1)
         Index Cond: (b > 0)
         Distinct Prefix: 1
         Heap Fetches: 0
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 3.189 ms
         Storage Index Rows Scanned: 1
 Planning Time: 0.471 ms
 Execution Time: 4.440 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 3.189 ms
 Storage Rows Scanned: 1
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 3.189 ms
 Peak Memory Usage: 24 kB
(19 rows)

Please reach out if there are any further questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

5 participants