Index support for compress chunk #4821

Merged
merged 1 commit into from Dec 15, 2022


shhnwz
Contributor

@shhnwz shhnwz commented Oct 13, 2022

Index support for compress chunk

Allow compress_chunk to use an index scan instead of a tuplesort
when the compression setting keys match the index keys.
The feature can be toggled on and off.
To disable it from the client, run:
SET timescaledb.enable_compression_indexscan = 'OFF';
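
As a rough illustration of the toggle described above, the decision can be sketched as follows. This is only a sketch under assumptions: `enable_compression_indexscan`, `SortStrategy`, and `choose_sort_strategy` are hypothetical stand-ins invented for this example and are not the names used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the timescaledb.enable_compression_indexscan
 * setting; the real setting is registered in src/guc.c. */
static bool enable_compression_indexscan = true;

/* Illustrative enum, not part of the patch. */
typedef enum
{
	SORT_WITH_TUPLESORT,
	SORT_WITH_INDEXSCAN
} SortStrategy;

/*
 * Decide how rows are ordered before compression.
 * matching_index_found is assumed to be computed elsewhere by comparing
 * the compression setting keys with the index keys.
 */
static SortStrategy
choose_sort_strategy(bool matching_index_found)
{
	if (enable_compression_indexscan && matching_index_found)
		return SORT_WITH_INDEXSCAN;

	/* Fall back to sorting whenever the toggle is off or no index matches. */
	return SORT_WITH_TUPLESORT;
}
```

With the toggle off, the sketch always falls back to the tuplesort path, even when a matching index exists.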

Fixes #4617

@CLAassistant

CLAassistant commented Oct 13, 2022

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Oct 13, 2022

Codecov Report

Merging #4821 (d1a5bb7) into main (cbf5180) will increase coverage by 0.03%.
The diff coverage is 94.65%.

❗ Current head d1a5bb7 differs from pull request most recent head e45a557. Consider uploading reports for the commit e45a557 to get more accurate results


@@            Coverage Diff             @@
##             main    #4821      +/-   ##
==========================================
+ Coverage   89.59%   89.63%   +0.03%     
==========================================
  Files         227      226       -1     
  Lines       51623    51220     -403     
==========================================
- Hits        46252    45910     -342     
+ Misses       5371     5310      -61     
| Impacted Files | Coverage Δ |
|---|---|
| tsl/src/compression/compression.c | 96.09% <94.61%> (-0.12%) ⬇️ |
| src/guc.c | 94.11% <100.00%> (ø) |
| tsl/src/remote/copy_fetcher.c | 78.39% <0.00%> (-6.81%) ⬇️ |
| src/loader/bgw_message_queue.c | 86.36% <0.00%> (-2.28%) ⬇️ |
| src/nodes/chunk_dispatch.c | 92.85% <0.00%> (-0.90%) ⬇️ |
| src/time_bucket.c | 95.28% <0.00%> (-0.54%) ⬇️ |
| tsl/src/remote/connection.c | 85.10% <0.00%> (-0.36%) ⬇️ |
| tsl/src/fdw/modify_exec.c | 84.97% <0.00%> (-0.26%) ⬇️ |
| tsl/src/continuous_aggs/options.c | 95.10% <0.00%> (-0.25%) ⬇️ |
| tsl/src/continuous_aggs/create.c | 87.68% <0.00%> (-0.22%) ⬇️ |

... and 47 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cbf5180...e45a557. Read the comment docs.

@shhnwz shhnwz self-assigned this Oct 14, 2022
@akuzm
Member

akuzm commented Oct 14, 2022

Needs more tests, probably good to put them into a separate test file.

  1. some way to tell from the tests which code is used -- maybe a debug message? I don't really like them because then suddenly other debug messages pop up from unrelated parts of code, and your test becomes flaky. Maybe we can add a debug option which enables an INFO message about which code path is taken.
  2. all combinations of desc/asc and nulls first/last that match the index or not.
  3. different index type -- we can only use btree here, not gist and so on.
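
To make the asc/desc and nulls first/last combinations in item 2 concrete, here is a small sketch of the per-column rule for a btree index. It relies on the fact that scanning a btree index backward reverses both the sort direction and the NULL placement. The types and function names (`ColumnOrder`, `direction_for_column`, and so on) are made up for this example and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative types, not taken from the patch. */
typedef enum
{
	DIR_NONE,    /* index column cannot provide the wanted order */
	DIR_FORWARD, /* forward index scan yields the wanted order */
	DIR_BACKWARD /* backward index scan yields the wanted order */
} ScanDir;

typedef struct
{
	bool desc;        /* true for DESC, false for ASC */
	bool nulls_first; /* true for NULLS FIRST, false for NULLS LAST */
} ColumnOrder;

static ColumnOrder
make_order(bool desc, bool nulls_first)
{
	ColumnOrder order = { desc, nulls_first };
	return order;
}

/* A backward btree scan flips both the direction and the NULL placement,
 * so only "both equal" or "both flipped" can be served by the index. */
static ScanDir
direction_for_column(ColumnOrder wanted, ColumnOrder index_col)
{
	if (wanted.desc == index_col.desc && wanted.nulls_first == index_col.nulls_first)
		return DIR_FORWARD;
	if (wanted.desc != index_col.desc && wanted.nulls_first != index_col.nulls_first)
		return DIR_BACKWARD;
	return DIR_NONE;
}

/* Enumerate all 4 x 4 combinations, as a test matrix would. */
static int
count_usable_combinations(void)
{
	int usable = 0;

	for (int w = 0; w < 4; w++)
	{
		for (int x = 0; x < 4; x++)
		{
			ColumnOrder wanted = make_order((w & 1) != 0, (w & 2) != 0);
			ColumnOrder index_col = make_order((x & 1) != 0, (x & 2) != 0);

			if (direction_for_column(wanted, index_col) != DIR_NONE)
				usable++;
		}
	}
	return usable;
}
```

Under this rule, each wanted ordering is served by exactly two of the four possible index orderings, so a test matrix needs both matching and non-matching cases for every ordering.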

@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch 2 times, most recently from 489ecc5 to d85c7e1 Compare October 27, 2022 15:07
@shhnwz shhnwz requested a review from akuzm October 28, 2022 10:30
Member

@svenklemm svenklemm left a comment

I think we are missing a check for collation. Can you add tests with different collations? In the index check, the collation must match for an index to be eligible when the datatype is collation-sensitive.
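
A minimal sketch of the kind of check meant here, assuming collations are represented by numeric identifiers and that types which are not collation-sensitive carry the identifier 0 on both the column and the index. `CollationId`, `NO_COLLATION`, and `collation_compatible` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifier type; 0 is assumed to mean "no collation". */
typedef unsigned int CollationId;
#define NO_COLLATION 0u

/*
 * An index column is only eligible if it was built with the same collation
 * as the table column. For types without a collation both sides are
 * assumed to be NO_COLLATION, so plain equality covers that case too.
 */
static bool
collation_compatible(CollationId column_collation, CollationId index_collation)
{
	return column_collation == index_collation;
}
```

A text column indexed under a different collation would sort in a different order than the compression settings expect, which is why such an index has to be rejected.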

@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch from d85c7e1 to 5b347c0 Compare November 14, 2022 15:52
@akuzm
Member

akuzm commented Nov 15, 2022

Looks like we still need more tests:

  1. When collation doesn't match
  2. Two segmentby columns, covering both the case where they require the same order and the case where they require different orders. Codecov seems to complain that this is not covered.
  3. Would be good to construct some kind of test that shows the compression went correctly, and the compressed groups are the same as w/o the index. Maybe select segmentby + metadata columns from the compressed chunk?

@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch 6 times, most recently from ce44416 to 15d28f5 Compare November 22, 2022 09:54
@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch from 15d28f5 to 3984c45 Compare November 23, 2022 10:37
@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch from 3984c45 to bbd312e Compare November 25, 2022 09:33
@shhnwz shhnwz requested a review from akuzm November 25, 2022 10:13
else
{
break;
}
Member

You're not resetting current_direction here, this will lead to an index still being chosen if the last column didn't match, right?

We have to fix some things:

  1. add a test for this bug. Try to reproduce it on the unfixed version and verify that the last column is chosen incorrectly.
  2. Reset current_direction at the start of the loop to avoid bugs like this.
  3. Rewrite this as early exit from the loop:
     if (att_num == 0 || index_info->ii_IndexAttrNumbers[i] != att_num)
     {
         break;
     }

Long if branches are hard to read, so early exit with break/return is preferable.
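
To illustrate points 2 and 3, a self-contained sketch of the matching loop with the per-column direction reset on every iteration and early exits on any mismatch. All names here (`SortColumn`, `match_index`, the `demo_*` helpers) are hypothetical, and the sketch assumes the index must cover all keys as a prefix with one consistent scan direction; the actual patch may apply different rules.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DIR_NONE, DIR_FORWARD, DIR_BACKWARD } ScanDir;

/* One sort column: attribute number plus its ordering (illustrative type). */
typedef struct
{
	int att_num;
	bool desc;
	bool nulls_first;
} SortColumn;

static ScanDir
column_direction(const SortColumn *key, const SortColumn *idx)
{
	if (key->desc == idx->desc && key->nulls_first == idx->nulls_first)
		return DIR_FORWARD;
	if (key->desc != idx->desc && key->nulls_first != idx->nulls_first)
		return DIR_BACKWARD;
	return DIR_NONE;
}

/* Returns the scan direction if the index serves all keys, else DIR_NONE. */
static ScanDir
match_index(const SortColumn *keys, int n_keys, const SortColumn *index_cols, int n_index_cols)
{
	ScanDir chosen = DIR_NONE;

	if (n_keys <= 0 || n_index_cols < n_keys)
		return DIR_NONE;

	for (int i = 0; i < n_keys; i++)
	{
		/* Reset at the start of each iteration so a stale value from the
		 * previous column can never leak into the result. */
		ScanDir current_direction = DIR_NONE;

		/* Early exit: wrong column at this index position. */
		if (index_cols[i].att_num != keys[i].att_num)
			return DIR_NONE;

		current_direction = column_direction(&keys[i], &index_cols[i]);

		/* Early exit: ordering not servable, or direction differs from
		 * the one required by the earlier columns. */
		if (current_direction == DIR_NONE)
			return DIR_NONE;
		if (i > 0 && current_direction != chosen)
			return DIR_NONE;

		chosen = current_direction;
	}
	return chosen;
}

/* Regression-style case from point 1: only the last column differs. */
static ScanDir
demo_last_column_mismatch(void)
{
	const SortColumn keys[] = { { 1, false, false }, { 2, false, false } };
	const SortColumn index_cols[] = { { 1, false, false }, { 3, false, false } };

	return match_index(keys, 2, index_cols, 2);
}

/* Index stores every column in the opposite order: usable backward. */
static ScanDir
demo_backward_match(void)
{
	const SortColumn keys[] = { { 1, false, false }, { 2, false, false } };
	const SortColumn index_cols[] = { { 1, true, true }, { 2, true, true } };

	return match_index(keys, 2, index_cols, 2);
}
```

Returning from inside the loop means there is no state left over to inspect afterwards, which is what makes the "last column didn't match" case impossible to get wrong in this shape.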

bool is_null_first =
COMPRESSIONCOL_IS_SEGMENT_BY(keys[i]) ? false : keys[i]->orderby_nullsfirst;

if (att_num > 0 && index_info->ii_IndexAttrNumbers[i] == att_num)
Member

Attnum can't be zero here, right? The orderby/segmentby keys must be present in the table. We should check this with an assertion.

@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch from bbd312e to 33b8b38 Compare December 5, 2022 06:57
@shhnwz shhnwz requested a review from akuzm December 5, 2022 12:14
bool is_null_first =
COMPRESSIONCOL_IS_SEGMENT_BY(keys[i]) ? false : keys[i]->orderby_nullsfirst;

if (att_num == 0 || index_info->ii_IndexAttrNumbers[i] != att_num)
Member

Can att_num == 0 happen in practice? It would mean that the orderby/segmentby column was not found in the uncompressed chunk, which is either a program error or data corruption. Let's use the Ensure macro to check this instead of an if.
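
The difference between the two styles can be sketched like this. `SKETCH_ENSURE` is a hypothetical stand-in written for this example, assumed to stay active in all builds and to stop execution with a message; it is not the project's Ensure macro, and `checked_att_num` is likewise made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an always-on invariant check. */
#define SKETCH_ENSURE(cond, msg)                                      \
	do                                                                \
	{                                                                 \
		if (!(cond))                                                  \
		{                                                             \
			fprintf(stderr, "invariant violated: %s\n", (msg));       \
			abort();                                                  \
		}                                                             \
	} while (0)

/*
 * Instead of silently treating att_num == 0 as "index does not match",
 * fail loudly: a missing orderby/segmentby column is not a normal outcome.
 */
static int
checked_att_num(int att_num)
{
	SKETCH_ENSURE(att_num > 0, "compression key column not found in chunk");
	return att_num;
}
```

With the invariant checked up front, the matching condition itself only needs to compare attribute numbers.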

@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch from 33b8b38 to 447b682 Compare December 12, 2022 05:36
@shhnwz shhnwz enabled auto-merge (rebase) December 12, 2022 09:03
@shhnwz shhnwz disabled auto-merge December 12, 2022 09:43
@shhnwz shhnwz force-pushed the add-index-support-to-compression-path branch 5 times, most recently from de7c95d to 9655ec7 Compare December 13, 2022 04:54
Development

Successfully merging this pull request may close these issues.

[Enhancement]: Add index support to compress_chunk
5 participants