[YSQL] Batching queries that use index lookup of elements inside an ANY array which uses only hash key component #7836

ramsrivatsa · 2021-03-28T19:09:14Z

Jira Link: DB-3200

CREATE TABLE t1 (id int primary key, val int);
INSERT INTO t1 SELECT i, i FROM (SELECT generate_series(1, 100) i) t;
set yb_debug_log_docdb_requests=true;
SELECT * FROM t1 where t1.id=ANY(ARRAY[10,20,30,40,50]);

Executing the queries above sends 5 DocDB requests, one per array element. These requests could be batched into a single DocDB request.

Note: DocDB requests can be viewed in the log files at ~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs/postgresql*.log


ramsrivatsa · 2021-03-28T19:10:15Z

@m-iancu @tanujnay112 @kmuthukk

Summary: Before this change, an IN condition bound to hash key columns produced one request per possible value of the hash keys. For example, consider the query `SELECT * FROM sample_table WHERE h1 IN (1,4,6,8);` where sample_table has a primary index with `h1` as its full hash component. We sent 4 requests, one per IN element, of the form `SELECT * FROM sample_table WHERE h1 = 1;`, `SELECT * FROM sample_table WHERE h1 = 4;`, etc. If the IN condition were bound to a range column, we would send the entire filter at once as a single condition and send just one request per partition of `sample_table_pkey`. We couldn't do this with hash column IN filters because the `DocRowwiseIterator` could not perform skip scans over hash columns, so it lacked the infrastructure to process IN conditions on hash columns.

This diff fixes the above issue by having IN conditions on hash columns behave similarly to those on range columns. To do this, we made the following changes:

- We adjusted pgsql/qlscanspec and ScanChoices to be able to carry out skip scans on hash column IN conditions.
- We added infrastructure in pg_doc_op.h to convert IN filters of the form `h1 IN (v1,v2,...,vn)` into a condition expression of the form `(yb_hash_code(h1), h1) IN ((yb_hash_code(v1), v1), (yb_hash_code(v2), v2), (yb_hash_code(v3), v3), ..., (yb_hash_code(vn), vn))`. If the table has multiple hash partitions, we form one request per partition, and the RHS of the hash condition in each partition's request contains only the values from (v1,v2,...,vn) that are relevant to it.

This feature works similarly for multicolumn hash keys. It is disabled for now when the serializable isolation level is used, because there isn't infrastructure to lock multiple non-contiguous rows as such filters would require. The feature is controlled by the autoflag GUC `yb_enable_hash_batch_in`.
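As a rough illustration of the per-partition batching described above, here is a minimal Python sketch. It is not the actual pg_doc_op code: `toy_hash_code`, `partition_for`, and `batch_in_values` are hypothetical names, the CRC-based hash is only a stand-in for `yb_hash_code`, and the equal-width split of a 16-bit hash space across partitions is an assumption made for the example.

```python
import zlib
from collections import defaultdict

HASH_SPACE = 0x10000  # assumed 16-bit hash-code space, 0..65535


def toy_hash_code(value: int) -> int:
    """Stand-in for yb_hash_code(); NOT the real YugabyteDB hash function."""
    return zlib.crc32(str(value).encode()) & 0xFFFF


def partition_for(hash_code: int, num_partitions: int) -> int:
    """Map a hash code to a partition, assuming equal-width hash ranges."""
    return hash_code * num_partitions // HASH_SPACE


def batch_in_values(values, num_partitions):
    """Group IN-list values into one list of (hash_code, value) pairs per partition."""
    batches = defaultdict(list)
    for v in values:
        code = toy_hash_code(v)
        batches[partition_for(code, num_partitions)].append((code, v))
    # Sort each batch so a skip scan could visit keys in hash-code order.
    return {p: sorted(rows) for p, rows in batches.items()}


if __name__ == "__main__":
    # One line per partition that receives a request, instead of one per value.
    for partition, rows in sorted(batch_in_values([1, 4, 6, 8], 3).items()):
        print(partition, rows)
```

Each dictionary entry corresponds to one request whose RHS holds only the pairs relevant to that partition, so the number of requests is bounded by the number of partitions rather than by the length of the IN list.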
We also added a tserver flag `ysql_hash_batch_permutation_limit` that caps the number of hash permutations a query may produce and still be eligible for this feature. Without this check, we could materialize an unbounded number of hash permutations in memory and cause an OOM crash.

Test Plan:
```
./yb_build.sh release --java-test org.yb.pgsql.TestPgRegressIndex
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressHashInQueries'
```

Reviewers: smishra, neil, amartsinchyk, kpopali

Reviewed By: kpopali

Subscribers: mbautin, kpopali, kannan, ssong, yql, mihnea, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19672
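To make the permutation limit concrete, the following Python sketch shows one way such a guard could work for a multicolumn hash key. It is an assumption-laden illustration, not the tserver implementation: `hash_permutations` is a hypothetical helper, and treating "over the limit" as "fall back to the unbatched path" is my reading of the description above.

```python
import math
from itertools import product


def hash_permutations(in_lists, limit):
    """Return every hash-key combination, or None when there are more than `limit`.

    `in_lists` holds one IN list per hash column, e.g. [[1, 2], [10, 20, 30]]
    for `h1 IN (1,2) AND h2 IN (10,20,30)`.
    """
    # Count first, so nothing is materialized when the query is over the limit.
    count = math.prod(len(values) for values in in_lists)
    if count > limit:
        return None  # assumed behavior: caller uses the unbatched path instead
    return list(product(*in_lists))


if __name__ == "__main__":
    print(hash_permutations([[1, 2], [10, 20, 30]], limit=6))
    print(hash_permutations([[1, 2], [10, 20, 30]], limit=5))
```

The point of the design is that the count is the product of the IN-list lengths, which grows quickly with each additional hash column, so checking it before building the combinations keeps memory use bounded.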

ramsrivatsa self-assigned this Mar 28, 2021

ramsrivatsa added the kind/enhancement This is an enhancement of an existing feature label Mar 28, 2021

ramsrivatsa assigned m-iancu and tanujnay112 and unassigned m-iancu and tanujnay112 Mar 28, 2021

ramsrivatsa changed the title ~~[YSQL] Batching queries that use index lookup of elements inside an ANY array~~ [YSQL] Batching queries that use index lookup of elements inside an ANY array which uses only hash key component Mar 29, 2021

ramsrivatsa assigned ramsrivatsa and sushantrmishra and unassigned ramsrivatsa Aug 15, 2022

sushantrmishra added the area/ysql Yugabyte SQL (YSQL) label Aug 15, 2022

yugabyte-ci added the priority/medium Medium priority issue label Aug 15, 2022

sushantrmishra mentioned this issue Sep 26, 2022

[YSQL] Improve performance for nested loop join execution #14199

Closed

tanujnay112 mentioned this issue Oct 13, 2022

[YSQL] Improve performance for nested loop join execution #14070

Open

8 tasks

tanujnay112 self-assigned this Mar 27, 2023

tanujnay112 closed this as completed Mar 27, 2023

yugabyte-ci unassigned ramsrivatsa Apr 9, 2023

sushantrmishra mentioned this issue Aug 21, 2023

[YSQL] Crash reported DocPgsqlScanSpec #18776

Closed
