Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] YSQL reads all the columns from DocDB when subset is enough #7047

Closed
d-uspenskiy opened this issue Feb 1, 2021 · 0 comments
Closed
Assignees
Labels
kind/enhancement This is an enhancement of an existing feature
Projects

Comments

@d-uspenskiy
Copy link
Contributor

For processing some of the SQL queries which uses subset of columns in their result YSQL reads all the columns from DocDB. Reading only necessary columns will reduce network traffic and improve the performance.

Example:

CREATE TABLE t(k INT PRIMARY KEY, short_data INT, long_data TEXT);
INSERT INTO t values(1, 1, 'long text');
SELECT * FROM t WHERE k = 1;
SELECT k FROM t WHERE k = 1;
SELECT short_data FROM t WHERE k = 1;
SELECT long_data FROM t WHERE k = 1;

All 4 selects sends same requests to tserver (with reading all 3 columns + ybctid)

stmt_id: 94082278811152
schema_version: 0
partition_column_values {
  value {
    int32_value: 1
  }
}
targets {
  column_id: 10
}
targets {
  column_id: 11
}
targets {
  column_id: 12
}
targets {
  column_id: -8
}
column_refs {
  ids: 10
  ids: 11
  ids: 12
}
is_forward_scan: true
is_aggregate: false
limit: 1024
return_paging_state: true
ysql_catalog_version: 2
table_id: "000030af00003000800000000000400a"

Note: queries without WHERE clause sends read request with only necessary columns.

@d-uspenskiy d-uspenskiy added the kind/enhancement This is an enhancement of an existing feature label Feb 1, 2021
@d-uspenskiy d-uspenskiy added this to Backlog in YSQL via automation Feb 1, 2021
@d-uspenskiy d-uspenskiy assigned d-uspenskiy and unassigned m-iancu Feb 13, 2021
YSQL automation moved this from Backlog to Done Mar 19, 2021
@d-uspenskiy d-uspenskiy reopened this Mar 24, 2021
YSQL automation moved this from Done to In progress Mar 24, 2021
d-uspenskiy added a commit that referenced this issue Apr 9, 2021
… scan

Summary:
For handling `SELECT` queries data for target columns and for columns in `WHERE` clause is required only.

```
CREATE TABLE table (k INT PRIMARY KEY, v1 INT, v2 INT, v3 INT);
SELECT v1 FROM table WHERE v3 > 1;
```

In the following example `v1` is required as target column and `v3` is required as column in `WHERE` clause (YB has an extra step to filter fetched tuples on postgres side as not all conditions can be pushed to DocDB ) .
It is not necessary to read `k`, `v2` columns.

In case of `INDEX SCAN` for primary key same approach can be used.

```
SELECT v1 FROM table WHERE k = 1;
```
Only `v1` and `k` is required.

To perform such kind of optimization in case of index scan YB code must know set of target columns for scan and columns in `WHERE` clause. This information can be retrieved from scan plan (`Scan` structure). It is extracted from postgres node structure and provided to YB code via `yb_scan_plan` field of the `IndexScanDescData` structure.

Current change fixes fixes all the case of index scans:
- Index Only Scan
- Index Scan with secondary index
- Index Scan with table primary key

**Note:**
1. It will be good to generalize setting required set of columns in case of `INDEX SCAN` and `SEQ SCAN`.
Now for `SEQ SCAN` case set of required target is built in different place - at the `ybcGetForeignPlan` function.

2. Fetching of `ybctid` in case of `Index Scan with primary index` and fetching of `ybbasectid` in case of `Index Only Scan` potentially can be omitted. But some extra work is required. This optimization is not implemented in context of this diff.

Test Plan:
New unit test has been added

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgColumnReadEfficiency'
```

Reviewers: rskannan, mihnea, alex, tnayak

Reviewed By: mihnea, alex, tnayak

Subscribers: rsami, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10601
YSQL automation moved this from In progress to Done Apr 9, 2021
YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue May 26, 2021
…of index scan

Summary:
For handling `SELECT` queries data for target columns and for columns in `WHERE` clause is required only.

```
CREATE TABLE table (k INT PRIMARY KEY, v1 INT, v2 INT, v3 INT);
SELECT v1 FROM table WHERE v3 > 1;
```

In the following example `v1` is required as target column and `v3` is required as column in `WHERE` clause (YB has an extra step to filter fetched tuples on postgres side as not all conditions can be pushed to DocDB ) .
It is not necessary to read `k`, `v2` columns.

In case of `INDEX SCAN` for primary key same approach can be used.

```
SELECT v1 FROM table WHERE k = 1;
```
Only `v1` and `k` is required.

To perform such kind of optimization in case of index scan YB code must know set of target columns for scan and columns in `WHERE` clause. This information can be retrieved from scan plan (`Scan` structure). It is extracted from postgres node structure and provided to YB code via `yb_scan_plan` field of the `IndexScanDescData` structure.

Current change fixes fixes all the case of index scans:
- Index Only Scan
- Index Scan with secondary index
- Index Scan with table primary key

**Note:**
1. It will be good to generalize setting required set of columns in case of `INDEX SCAN` and `SEQ SCAN`.
Now for `SEQ SCAN` case set of required target is built in different place - at the `ybcGetForeignPlan` function.

2. Fetching of `ybctid` in case of `Index Scan with primary index` and fetching of `ybbasectid` in case of `Index Only Scan` potentially can be omitted. But some extra work is required. This optimization is not implemented in context of this diff.

Test Plan:
New unit test has been added

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgColumnReadEfficiency'
```

Reviewers: rskannan, mihnea, alex, tnayak

Reviewed By: mihnea, alex, tnayak

Subscribers: rsami, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10601
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This is an enhancement of an existing feature
Projects
YSQL
  
Done
Development

No branches or pull requests

2 participants