[YSQL] YSQL reads all the columns from DocDB when subset is enough #7047

d-uspenskiy · 2021-02-01T06:53:23Z

When processing SQL queries whose result uses only a subset of a table's columns, YSQL still reads all the columns from DocDB. Reading only the necessary columns would reduce network traffic and improve performance.

Example:

CREATE TABLE t(k INT PRIMARY KEY, short_data INT, long_data TEXT);
INSERT INTO t values(1, 1, 'long text');
SELECT * FROM t WHERE k = 1;
SELECT k FROM t WHERE k = 1;
SELECT short_data FROM t WHERE k = 1;
SELECT long_data FROM t WHERE k = 1;

All four SELECT statements send the same request to the tserver, which reads all 3 columns plus ybctid:

stmt_id: 94082278811152
schema_version: 0
partition_column_values {
  value {
    int32_value: 1
  }
}
targets {
  column_id: 10
}
targets {
  column_id: 11
}
targets {
  column_id: 12
}
targets {
  column_id: -8
}
column_refs {
  ids: 10
  ids: 11
  ids: 12
}
is_forward_scan: true
is_aggregate: false
limit: 1024
return_paging_state: true
ysql_catalog_version: 2
table_id: "000030af00003000800000000000400a"

Note: queries without a WHERE clause send a read request with only the necessary columns.


… scan

Summary: Handling a `SELECT` query requires data only for the target columns and for the columns in the `WHERE` clause.
```
CREATE TABLE t (k INT PRIMARY KEY, v1 INT, v2 INT, v3 INT);
SELECT v1 FROM t WHERE v3 > 1;
```
In this example, `v1` is required as a target column and `v3` is required as a column in the `WHERE` clause (YB has an extra step that filters fetched tuples on the Postgres side, because not all conditions can be pushed down to DocDB). It is not necessary to read the `k` and `v2` columns.

The same approach can be used for an `INDEX SCAN` on the primary key.
```
SELECT v1 FROM t WHERE k = 1;
```
Only `v1` and `k` are required.

To perform this optimization for an index scan, YB code must know the set of target columns for the scan and the columns in the `WHERE` clause. This information can be retrieved from the scan plan (the `Scan` structure). It is extracted from the Postgres node structure and provided to YB code via the `yb_scan_plan` field of the `IndexScanDescData` structure.

The current change fixes all the index scan cases:
- Index Only Scan
- Index Scan with secondary index
- Index Scan with table primary key

**Note:**
1. It would be good to generalize how the required set of columns is set for `INDEX SCAN` and `SEQ SCAN`. For `SEQ SCAN`, the set of required targets is currently built in a different place, in the `ybcGetForeignPlan` function.
2. Fetching `ybctid` in the case of `Index Scan with primary index` and fetching `ybbasectid` in the case of `Index Only Scan` can potentially be omitted, but some extra work is required. This optimization is not implemented in this diff.

Test Plan: A new unit test has been added
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgColumnReadEfficiency'
```

Reviewers: rskannan, mihnea, alex, tnayak
Reviewed By: mihnea, alex, tnayak
Subscribers: rsami, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D10601
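The column-selection rule from the summary above (fetch the target columns plus the columns referenced in the `WHERE` clause) can be illustrated with a small Python sketch. This is not YB code: `required_columns` is a hypothetical helper, and its regex-based detection of column names is a toy stand-in for reading the `Scan` plan. It ignores string literals, aliases, and system columns such as `ybctid`.

```python
import re


def required_columns(select_list, where_clause, table_columns):
    """Return the columns a scan must fetch: targets plus WHERE-clause columns.

    Hypothetical helper for illustration only. Inputs are plain strings,
    not a real query plan.
    """
    def referenced(expr):
        # "*" expands to every column of the table.
        if expr.strip() == "*":
            return set(table_columns)
        # Toy tokenizer: any identifier that matches a known column name.
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", expr))
        return {c for c in table_columns if c in tokens}

    needed = referenced(select_list) | referenced(where_clause or "")
    # Keep the table's own column order in the result.
    return [c for c in table_columns if c in needed]


columns = ["k", "v1", "v2", "v3"]

# SELECT v1 ... WHERE v3 > 1: k and v2 do not need to be read.
print(required_columns("v1", "v3 > 1", columns))

# SELECT v1 ... WHERE k = 1: only k and v1 are needed.
print(required_columns("v1", "k = 1", columns))
```

With the four-column table from the summary, the first call yields `v1` and `v3`, and the second yields `k` and `v1`, matching the two cases the summary describes.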

d-uspenskiy added the kind/enhancement label Feb 1, 2021

d-uspenskiy assigned m-iancu Feb 1, 2021

d-uspenskiy added this to Backlog in YSQL via automation Feb 1, 2021

d-uspenskiy assigned d-uspenskiy and unassigned m-iancu Feb 13, 2021

d-uspenskiy closed this as completed Mar 19, 2021

YSQL automation moved this from Backlog to Done Mar 19, 2021

d-uspenskiy reopened this Mar 24, 2021

YSQL automation moved this from Done to In progress Mar 24, 2021

d-uspenskiy closed this as completed Apr 9, 2021

YSQL automation moved this from In progress to Done Apr 9, 2021
