[YSQL] distinct pushdown requires a range predicate #16771

FranckPachot · 2023-04-09T10:50:45Z

Jira Link: DB-6127

Description

Issue: The distinct pushdown is not used when the Index Scan doesn't have an index condition.

Reproducible test case:

drop table demo;
create table demo as select generate_series(1,1000) a,42 b;
create index demo_b on demo (b asc);
explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b>0;

Without b>0 1000 rows have been fetched from DocDB and deduplicated in YSQL

   ->  Index Only Scan using demo_b on demo (actual time=4.592..4.856 rows=1000 loops=1)

With b>0 only 1 row which is the distinct value:

   ->  Index Only Scan using demo_b on demo (actual time=0.984..0.985 rows=1 loops=1)
         Index Cond: (b > 0)

Here is a more real-life test case:
https://dev.to/yugabyte/combining-postgresql-partial-index-yugabytedb-hybrid-scan-and-pushed-down-limit-to-get-a-scalable-bounded-count-distinct-3geo

Workaround: add a lower bound with a predicate in the WHERE clause, either by knowing the value (guaranteed by the datatype or check constraint) or with a (select min() from) which should be fast with the same index that allows hybrid scan

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

The text was updated successfully, but these errors were encountered:

FranckPachot · 2024-01-06T20:23:48Z

I tested on 2.19.3 and it seems it is the opposite:

yugabyte=# select version();
                                                                                            version                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.19.3.0-b0 on aarch64-unknown-linux-gnu, compiled by clang version 16.0.6 (https://github.com/yugabyte/llvm-project.git 1e6329f40e5c531c09ade7015278078682293ebd), 64-bit
(1 row)

yugabyte=# explain (analyze on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=0.920..0.922 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=0.917..0.919 rows=1 loops=1)
         Distinct Prefix: 1
         Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 0.958 ms
 Peak Memory Usage: 8 kB
(7 rows)

yugabyte=# explain (analyze on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b>0;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique (actual time=1.170..1.593 rows=1 loops=1)
   ->  Index Only Scan using demo_b on demo (actual time=1.168..1.399 rows=1000 loops=1)
         Index Cond: (b > 0)
         Heap Fetches: 0
 Planning Time: 0.119 ms
 Execution Time: 1.634 ms
 Peak Memory Usage: 24 kB
(7 rows)

Here distinct is pushed down only without an index condition but is not pushed down with the index condition

pao214 · 2024-01-31T06:15:35Z

Our planner picks the "expected" plan when we run analyze on the table.

yugabyte=# analyze demo;
WARNING:  'analyze' is a beta feature!
LINE 1: analyze demo;
        ^
HINT:  Set 'ysql_beta_features' yb-tserver gflag to true to suppress the warning for all beta features.
ANALYZE

yugabyte=# select version();
                                                            version
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.21.1.0-b0 on arm-apple-darwin22.6.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
(1 row)

yugabyte=# explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=3.554..3.565 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=3.532..3.542 rows=1 loops=1)
         Distinct Prefix: 1
         Heap Fetches: 0
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 2.779 ms
         Storage Index Rows Scanned: 1
 Planning Time: 0.389 ms
 Execution Time: 3.727 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 2.779 ms
 Storage Rows Scanned: 1
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 2.779 ms
 Peak Memory Usage: 24 kB
(18 rows)

yugabyte=# explain (analyze on, dist on, costs off)
 /*+ IndexOnlyScan(demo) */
 select distinct b from demo where b > 0;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique (actual time=4.262..4.273 rows=1 loops=1)
   ->  Distinct Index Only Scan using demo_b on demo (actual time=4.254..4.264 rows=1 loops=1)
         Index Cond: (b > 0)
         Distinct Prefix: 1
         Heap Fetches: 0
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 3.189 ms
         Storage Index Rows Scanned: 1
 Planning Time: 0.471 ms
 Execution Time: 4.440 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 3.189 ms
 Storage Rows Scanned: 1
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 3.189 ms
 Peak Memory Usage: 24 kB
(19 rows)

Please reach out if there are any further questions.

FranckPachot added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 9, 2023

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 9, 2023

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug status/awaiting-triage Issue awaiting triage labels May 18, 2023

yugabyte-ci assigned sushantrmishra May 18, 2023

tanujnay112 assigned vbalvida Jan 30, 2024

yugabyte-ci closed this as completed Jan 31, 2024

yugabyte-ci assigned pao214 and unassigned vbalvida and sushantrmishra Jan 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] distinct pushdown requires a range predicate #16771

[YSQL] distinct pushdown requires a range predicate #16771

FranckPachot commented Apr 9, 2023 •

edited by jira bot

FranckPachot commented Jan 6, 2024

pao214 commented Jan 31, 2024 •

edited

[YSQL] distinct pushdown requires a range predicate #16771

[YSQL] distinct pushdown requires a range predicate #16771

Comments

FranckPachot commented Apr 9, 2023 • edited by jira bot

Description

Warning: Please confirm that this issue does not contain any sensitive information

FranckPachot commented Jan 6, 2024

pao214 commented Jan 31, 2024 • edited

FranckPachot commented Apr 9, 2023 •

edited by jira bot

pao214 commented Jan 31, 2024 •

edited