Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] For range sharded database, index scan with pk filter is slower than sequential scan #16178

Open
karan-yb opened this issue Feb 21, 2023 · 3 comments
Assignees
Labels
2.20 Backport Required area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@karan-yb
Copy link
Contributor

karan-yb commented Feb 21, 2023

Jira Link: DB-5611

Description

For partition key filters in range shaded database, we prefer Local Skip scan today which validates each row. Instead, we can use sequential scan with adjusted lower and upper bound based on the filter.

yugabyte=# create table t_asc_1m (a int,  primary key(a asc));
CREATE TABLE
yugabyte=# insert into t_asc_1m select generate_series(1, 1000000);
INSERT 0 1000000
yugabyte=# explain (analyze, dist) select * from t_asc_1m;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=2.905..2475.912 rows=1000000 loops=1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 2278.163 ms
 Planning Time: 3.966 ms
 Execution Time: 2549.952 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 2278.163 ms
 Peak Memory Usage: 0 kB
(9 rows)
                                                       ^
yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=3.492..3163.391 rows=999990 loops=1)
   Index Cond: (a > 10)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 2704.193 ms
 Planning Time: 0.072 ms
 Execution Time: 3248.633 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 2704.193 ms
 Peak Memory Usage: 0 kB
(10 rows)

@karan-yb karan-yb added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Feb 21, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Feb 21, 2023
@sushantrmishra
Copy link

sushantrmishra commented Feb 27, 2023

yugabyte=# select version();
                                                                                         version

--------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.17.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 15.0.3 (https://github.com/y
ugabyte/llvm-project.git 0b8d1183745fd3998d8beffeec8cbe99c1b20529), 64-bit
(1 row)

yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------
-
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=1.535..1225.910 rows=999990 loops=1)
   Remote Filter: (a > 10)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1073.001 ms
 Planning Time: 0.053 ms
 Execution Time: 1305.454 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1073.001 ms
 Peak Memory Usage: 0 kB
(10 rows)

yugabyte=# explain (analyze, dist) select * from t_asc_1m where a > 10;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------
------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=1.794..1447.192 row
s=999990 loops=1)
   Index Cond: (a > 10)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 1171.001 ms
 Planning Time: 0.062 ms
 Execution Time: 1527.544 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1171.001 ms
 Peak Memory Usage: 0 kB
(10 rows)

yugabyte=# explain (analyze, dist) select * from t_asc_1m;
                                                   QUERY PLAN

--------------------------------------------------------------------------------------------------------------
--
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=1.498..1080.834 rows=1000000 loops=1
)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 951.001 ms
 Planning Time: 0.038 ms
 Execution Time: 1158.679 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 951.001 ms
 Peak Memory Usage: 0 kB
(9 rows)

@rthallamko3
Copy link
Contributor

Follow up on cost model: Should we adjust the threshold for switching between sequential scan and index scan for the above queries.

@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Mar 3, 2023
@mtakahar
Copy link
Contributor

mtakahar commented Apr 11, 2023

(updated with pk full scan w/o predicate case and made the search condition truly full range for more fair comparisons. Non-negligible overhead exists in Index Only Scan, too, with a range predicate.)

Index only scan using a secondary index on range key with or w/o predicate runs about the same or slightly faster than Seq Scan w/o predicate. This indicates we are not doing value comparisons against every row in the range.

On the other hand, the Index Scan using the pkey takes almost the same amount of time as the Seq Scan with remote filter that is evaluated for every single row.

Seq Scan Index Scan (t_asc_1m_pkey) Index Only Scan (i_t_asc_1m_a)
no predicate 895 920 806
a >= 1 1265 1289 949
a <= 1000000 1285 1244 952

create table t_asc_1m (a int,  primary key(a asc));
insert into t_asc_1m select generate_series(1, 1000000);
create index i_t_asc_1m_a on t_asc_1m (a asc);


select version();
                                                              version
------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.19.1.0-b0 on x86_64-apple-darwin22.4.0, compiled by Apple clang version 14.0.3 (clang-1403.0.22.14.1), 64-bit
(1 row)


-- no predicate, Seq Scan
explain (analyze, dist) select * from t_asc_1m;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..100.00 rows=1000 width=4) (actual time=1.701..830.636 rows=1000000 loops=1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 723.078 ms
 Planning Time: 0.042 ms
 Execution Time: 895.126 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 723.078 ms
 Peak Memory Usage: 0 kB
(9 rows)

-- no predicate, Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m order by a;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..114.00 rows=1000 width=4) (actual time=1.567..861.209 rows=1000000 loops=1)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 619.449 ms
 Planning Time: 0.090 ms
 Execution Time: 920.188 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 619.449 ms
 Peak Memory Usage: 0 kB
(9 rows)

-- no predicate, Index Scan (secondary index)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..114.00 rows=1000 width=4) (actual time=1.571..747.469 rows=1000000 loops=1)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 530.621 ms
 Planning Time: 0.084 ms
 Execution Time: 806.153 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 530.621 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Seq Scan + Remote Filter
explain (analyze, dist) /*+ Set(enable_indexscan off) */select * from t_asc_1m where a >= 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=2.213..1198.874 rows=1000000 loops=1)
   Remote Filter: (a >= 1)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1086.940 ms
 Planning Time: 0.087 ms
 Execution Time: 1265.200 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1086.940 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m where a >= 1;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=2.138..1228.078 rows=1000000 loops=1)
   Index Cond: (a >= 1)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 976.811 ms
 Planning Time: 0.074 ms
 Execution Time: 1288.997 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 976.811 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (lower bound), Index Only Scan (secondary)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m where a >= 1;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..5.12 rows=10 width=4) (actual time=1.859..889.037 rows=1000000 loops=1)
   Index Cond: (a >= 1)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 670.900 ms
 Planning Time: 0.098 ms
 Execution Time: 948.624 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 670.900 ms
 Peak Memory Usage: 0 kB
(11 rows)

-- with full range predicate (upper bound), Seq Scan + Remote Filter
explain (analyze, dist) /*+ Set(enable_indexscan off) */select * from t_asc_1m where a <= 1000000;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_asc_1m  (cost=0.00..102.50 rows=1000 width=4) (actual time=2.331..1220.015 rows=1000000 loops=1)
   Remote Filter: (a <= 1000000)
   Storage Table Read Requests: 977
   Storage Table Execution Time: 1108.152 ms
 Planning Time: 0.090 ms
 Execution Time: 1284.939 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 1108.152 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (upper bound), Index Scan (pk)
explain (analyze, dist) /*+ IndexScan(t_asc_1m t_asc_1m_pkey) */select * from t_asc_1m where a <= 1000000;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_asc_1m_pkey on t_asc_1m  (cost=0.00..4.11 rows=1 width=4) (actual time=2.374..1185.112 rows=1000000 loops=1)
   Index Cond: (a <= 1000000)
   Storage Index Read Requests: 977
   Storage Index Execution Time: 942.843 ms
 Planning Time: 0.098 ms
 Execution Time: 1243.769 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 942.843 ms
 Peak Memory Usage: 0 kB
(10 rows)

-- with full range predicate (upper bound), Index Only Scan (secondary)
explain (analyze, dist) /*+ IndexOnlyScan(t_asc_1m i_t_asc_1m_a) */select * from t_asc_1m where a <= 1000000;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i_t_asc_1m_a on t_asc_1m  (cost=0.00..5.12 rows=10 width=4) (actual time=1.762..891.369 rows=1000000 loops=1)
   Index Cond: (a <= 1000000)
   Heap Fetches: 0
   Storage Index Read Requests: 977
   Storage Index Execution Time: 664.875 ms
 Planning Time: 0.099 ms
 Execution Time: 952.431 ms
 Storage Read Requests: 977
 Storage Write Requests: 0
 Storage Execution Time: 664.875 ms
 Peak Memory Usage: 0 kB
(11 rows)

karan-yb added a commit that referenced this issue Sep 28, 2023
… boundary conditions

Summary:
This change checks if the Index scan condition can be encoded in bounds for the scan. If conditions are encoded in bound keys, then based on the new flag `index_scan_prefer_sequential_scan_for_boundary_condition`, we will disable use of ScanChoices (aka skip scan).

To determine if a condition is encoded in bound key or not, we use the following logic:
1. We check for prefix conditions which are encoded in both min and max bound. For ex, equals on a key column.
2. If there is only one column min/max bound is specified after the prefix conditions.

Note: this change only optimizes the range sharded tablets. This can be extended for hash sharded tablets when there is an equal condition for hash.
Jira: DB-5611

Test Plan:
./yb_build.sh --sj
./build/latest/tests-docdb/scan_choices-test

Reviewers: tnayak, tverona, rthallam

Reviewed By: tnayak

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D28560
spolitov added a commit that referenced this issue Oct 2, 2023
…can with boundary conditions"

Summary:
This reverts commit 2afd4fd.
Reverting due to correctness issues.
Jira: DB-5611

Test Plan: Jenkins

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: tnayak, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D28978
@rthallamko3 rthallamko3 assigned spolitov and unassigned karan-yb Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.20 Backport Required area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

7 participants