[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

karan-yb · 2023-02-21T23:49:16Z

Jira Link: DB-5611

Description

For partition key filters in range shaded database, we prefer Local Skip scan today which validates each row. Instead, we can use sequential scan with adjusted lower and upper bound based on the filter.

yugabyte=# create table t_asc_1m (a int,  primary key(a asc));
CREATE TABLE
yugabyte=# insert into t_asc_1m select generate_series(1, 1000000);
INSERT 0 1000000
yugabyte=# explain (analyze, dist) select * from t_asc_1m;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=2.905..2475.912 rows=1000000 loops=1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 2278.163 ms
 Planning Time: 3.966 ms
 Execution Time: 2549.952 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 2278.163 ms
 Peak Memory Usage: 0 kB
(9 rows)
                                                       ^
yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=3.492..3163.391 rows=999990 loops=1)
   Index Cond: (a > 10)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 2704.193 ms
 Planning Time: 0.072 ms
 Execution Time: 3248.633 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 2704.193 ms
 Peak Memory Usage: 0 kB
(10 rows)

The text was updated successfully, but these errors were encountered:

sushantrmishra · 2023-02-27T07:12:39Z

yugabyte=# select version();
                                                                                         version

--------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.17.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 15.0.3 (https://github.com/y
ugabyte/llvm-project.git 0b8d1183745fd3998d8beffeec8cbe99c1b20529), 64-bit
(1 row)

yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------
-
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=1.535..1225.910 rows=999990 loops=1)
   Remote Filter: (a > 10)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1073.001 ms
 Planning Time: 0.053 ms
 Execution Time: 1305.454 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1073.001 ms
 Peak Memory Usage: 0 kB
(10 rows)

yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------
------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=1.794..1447.192 row
s=999990 loops=1)
   Index Cond: (a > 10)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 1171.001 ms
 Planning Time: 0.062 ms
 Execution Time: 1527.544 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1171.001 ms
 Peak Memory Usage: 0 kB
(10 rows)

yugabyte=# explain (analyze, dist) select * from t_asc_1m;
                                                   QUERY PLAN

--------------------------------------------------------------------------------------------------------------
--
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=1.498..1080.834 rows=1000000 loops=1
)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 951.001 ms
 Planning Time: 0.038 ms
 Execution Time: 1158.679 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 951.001 ms
 Peak Memory Usage: 0 kB
(9 rows)

rthallamko3 · 2023-03-03T00:24:12Z

Follow up on cost model: Should we adjust the threshold for switching between sequential scan and index scan for the above queries.

mtakahar · 2023-04-11T17:02:08Z

(updated with pk full scan w/o predicate case and made the search condition truly full range for more fair comparisons. Non-negligible overhead exists in Index Only Scan, too, with a range predicate.)

Index only scan using a secondary index on range key with or w/o predicate runs about the same or slightly faster than Seq Scan w/o predicate. ~~This indicates we are not doing value comparisons against every row in the range.~~

On the other hand, the Index Scan using the pkey takes almost the same amount of time as the Seq Scan with remote filter that is evaluated for every single row.

	Seq Scan	Index Scan (t_asc_1m_pkey)	Index Only Scan (i_t_asc_1m_a)
no predicate	895	920	806
a >= 1	1265	1289	949
a <= 1000000	1285	1244	952


create table t_asc_1m (a int,  primary key(a asc));
insert into t_asc_1m select generate_series(1, 1000000);
create index i_t_asc_1m_a on t_asc_1m (a asc);


select version();
                                                              version
------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.19.1.0-b0 on x86_64-apple-darwin22.4.0, compiled by Apple clang version 14.0.3 (clang-1403.0.22.14.1), 64-bit
(1 row)


-- no predicate, Seq Scan
explain (analyze, dist) select * from t_asc_1m;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=1.701..830.636 rows=1000000 loops=1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 723.078 ms
 Planning Time: 0.042 ms
 Execution Time: 895.126 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 723.078 ms
 Peak Memory Usage: 0 kB
(9 rows)

-- no predicate, Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m order by a;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..114.00 rows=1000 width=4) (actual time=1.567..861.209 rows=1000000 loops=1)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 619.449 ms
 Planning Time: 0.090 ms
 Execution Time: 920.188 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 619.449 ms
 Peak Memory Usage: 0 kB
(9 rows)

-- no predicate, Index Scan (secondary index)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..114.00 rows=1000 width=4) (actual time=1.571..747.469 rows=1000000 loops=1)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 530.621 ms
 Planning Time: 0.084 ms
 Execution Time: 806.153 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 530.621 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Seq Scan + Remote Filter
explain (analyze, dist) /*+ Set(enable_indexscan off) */select * from t_asc_1m where a >= 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=2.213..1198.874 rows=1000000 loops=1)
   Remote Filter: (a >= 1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1086.940 ms
 Planning Time: 0.087 ms
 Execution Time: 1265.200 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1086.940 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m where a >= 1;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=2.138..1228.078 rows=1000000 loops=1)
   Index Cond: (a >= 1)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 976.811 ms
 Planning Time: 0.074 ms
 Execution Time: 1288.997 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 976.811 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Index Only Scan (secondary)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m where a >= 1;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..5.12 rows=10 width=4) (actual time=1.859..889.037 rows=1000000 loops=1)
   Index Cond: (a >= 1)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 670.900 ms
 Planning Time: 0.098 ms
 Execution Time: 948.624 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 670.900 ms
 Peak Memory Usage: 0 kB
(11 rows)

-- with full range predicate (upper bound), Seq Scan + Remote Filter
explain (analyze, dist) /*+ Set(enable_indexscan off) */select * from t_asc_1m where a <= 1000000;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=2.331..1220.015 rows=1000000 loops=1)
   Remote Filter: (a <= 1000000)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1108.152 ms
 Planning Time: 0.090 ms
 Execution Time: 1284.939 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1108.152 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (upper bound), Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m where a <= 1000000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=2.374..1185.112 rows=1000000 loops=1)
   Index Cond: (a <= 1000000)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 942.843 ms
 Planning Time: 0.098 ms
 Execution Time: 1243.769 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 942.843 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (upper bound), Index Only Scan (secondary)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m where a <= 1000000;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..5.12 rows=10 width=4) (actual time=1.762..891.369 rows=1000000 loops=1)
   Index Cond: (a <= 1000000)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 664.875 ms
 Planning Time: 0.099 ms
 Execution Time: 952.431 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 664.875 ms
 Peak Memory Usage: 0 kB
(11 rows)

… boundary conditions Summary: This change checks if the Index scan condition can be encoded in bounds for the scan. If conditions are encoded in bound keys, then based on the new flag `index_scan_prefer_sequential_scan_for_boundary_condition`, we will disable use of ScanChoices (aka skip scan). To determine if a condition is encoded in bound key or not, we use the following logic: 1. We check for prefix conditions which are encoded in both min and max bound. For ex, equals on a key column. 2. If there is only one column min/max bound is specified after the prefix conditions. Note: this change only optimizes the range sharded tablets. This can be extended for hash sharded tablets when there is an equal condition for hash. Jira: DB-5611 Test Plan: ./yb_build.sh --sj ./build/latest/tests-docdb/scan_choices-test Reviewers: tnayak, tverona, rthallam Reviewed By: tnayak Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D28560

…can with boundary conditions" Summary: This reverts commit 2afd4fd. Reverting due to correctness issues. Jira: DB-5611 Test Plan: Jenkins Reviewers: rthallam Reviewed By: rthallam Subscribers: tnayak, ybase Differential Revision: https://phorge.dev.yugabyte.com/D28978

karan-yb added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Feb 21, 2023

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Feb 21, 2023

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Mar 3, 2023

rthallamko3 assigned sushantrmishra Mar 29, 2023

sushantrmishra assigned mtakahar Apr 10, 2023

yugabyte-ci unassigned sushantrmishra May 30, 2023

rthallamko3 assigned karan-yb Sep 28, 2023

rthallamko3 added the 2.20 Backport Required label Sep 28, 2023

rthallamko3 assigned spolitov and unassigned karan-yb Oct 9, 2023

gauravk-in assigned tanujnay112 Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

karan-yb commented Feb 21, 2023 •

edited by jira bot

sushantrmishra commented Feb 27, 2023 •

edited

rthallamko3 commented Mar 3, 2023

mtakahar commented Apr 11, 2023 •

edited

[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

Comments

karan-yb commented Feb 21, 2023 • edited by jira bot

Description

sushantrmishra commented Feb 27, 2023 • edited

rthallamko3 commented Mar 3, 2023

mtakahar commented Apr 11, 2023 • edited

karan-yb commented Feb 21, 2023 •

edited by jira bot

sushantrmishra commented Feb 27, 2023 •

edited

mtakahar commented Apr 11, 2023 •

edited