
Fix deletes with subqueries and compression #6789

Merged

merged 1 commit into timescale:main on Mar 28, 2024

Conversation

nikkhils
Contributor

For UPDATEs and DELETEs that involve a compressed chunk, the code decompresses the relevant data into the uncompressed portion of the chunk. This happens during execution, so if the planner has not produced a plan for the uncompressed part of the chunk, we can miss scanning those freshly decompressed rows. We now check during planning for the possibility of a compressed chunk becoming partial, and add an APPEND plan over scans of both the compressed and uncompressed parts.
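
For context, a minimal reproduction sketch of the affected pattern (the table, column names, and data below are illustrative, not taken from the PR's test suite):

-- Illustrative setup: a compressed hypertable (names are hypothetical).
CREATE TABLE metrics (time timestamptz NOT NULL, device int, value float);
SELECT create_hypertable('metrics', 'time');
INSERT INTO metrics
SELECT t, 1, random()
FROM generate_series('2024-01-01'::timestamptz, '2024-01-07'::timestamptz,
                     interval '1 hour') AS t;
ALTER TABLE metrics SET (timescaledb.compress);
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;

-- A DELETE whose qualifier depends on a subquery. Matching rows are
-- decompressed into the uncompressed part of the chunk at execution time;
-- before this fix the plan could lack a scan over that uncompressed part
-- and miss those rows.
DELETE FROM metrics WHERE time >= (SELECT max(time) FROM metrics);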


codecov bot commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.88%. Comparing base (59f50f2) to head (0eb2c9e).
Report is 87 commits behind head on main.

Files                                                Patch %    Lines
tsl/src/nodes/decompress_chunk/decompress_chunk.c    72.72%     0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6789      +/-   ##
==========================================
+ Coverage   80.06%   80.88%   +0.81%     
==========================================
  Files         190      191       +1     
  Lines       37181    36513     -668     
  Branches     9450     9531      +81     
==========================================
- Hits        29770    29534     -236     
- Misses       2997     3174     +177     
+ Partials     4414     3805     -609     


Contributor

@antekresic antekresic left a comment


LGTM

I would like somebody like @akuzm to take a look at this since he is most familiar with DecompressChunk here, but the change seems straightforward.

@nikkhils nikkhils requested a review from akuzm March 27, 2024 09:17
Comment on lines 1471 to 1473
-- check that DML causes transparent decompression and that
-- data gets shifted to the uncompressed parts
EXPLAIN (costs off) DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
Member


This delete query shouldn't actually touch anything, right? So the fact that the data is decompressed here is a missing optimization that we should add at some point. Can you change this to actually delete one row from a compressed batch, to make the test more robust? Also might be good to add one chunk with at least two compressed batches, so that both parts of the partial chunk path are tested.
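
As a rough sketch of the "two compressed batches" part of this suggestion, continuing the illustrative (non-test) setup from the description above, and assuming TimescaleDB's behavior of compressing at most 1000 rows per batch, so more than that for one series in one chunk yields multiple batches:

-- Hypothetical: put >1000 rows for a single series into one new chunk so
-- that compressing it produces at least two compressed batches.
INSERT INTO metrics
SELECT '2024-02-01'::timestamptz + (i || ' seconds')::interval, 1, i
FROM generate_series(1, 2500) AS i;
SELECT compress_chunk(c)
FROM show_chunks('metrics', newer_than => '2024-01-31'::timestamptz) AS c;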

Contributor Author


@akuzm I believe it could be a chicken and egg thingy because unless we decompress we might not get access to all the values that we need to check against.

Contributor Author


@akuzm there are other tests in tsl/test/sql/compression_ddl.sql which already cover various scenarios with a mix of partial, fully compressed, and uncompressed chunks.

Member


> @akuzm I believe it could be a chicken and egg thingy because unless we decompress we might not get access to all the values that we need to check against.

I mean, imagine in the future we have a more optimal path for DELETEs, for example through TAM, and it's not going to decompress anything if the WHERE doesn't match anything. So the test will stop working. It would be good if some rows actually matched the clause and were deleted, e.g. if it was time >= ALL(...), then the latest row would match.
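
A sketch of the suggested qualifier (not necessarily the exact change that ended up being committed):

-- With >= ALL, only the row(s) carrying the maximum time satisfy the clause,
-- so the DELETE actually removes data from a compressed batch instead of
-- matching nothing.
DELETE FROM test_partials WHERE time >= ALL(SELECT time FROM test_partials);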

Contributor Author


Ok, changed

-- data gets shifted to the uncompressed parts
EXPLAIN (costs off) DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
-- P, P, P
Member


Looks like this comment needs updating :)

Contributor Author


Umm, all 3 chunks will now become partial right? That's why the P, P, P.

Member


Ah OK, I didn't understand what it means and thought it was some temporary comment :)

Contributor Author


Reworded the comment :-)

Member

@akuzm akuzm left a comment


Not very familiar with this code tbh, but this looks like an improvement. The test should be made more robust though.

@mkindahl mkindahl removed their request for review March 28, 2024 07:38
@nikkhils nikkhils enabled auto-merge (rebase) March 28, 2024 11:56
@nikkhils nikkhils merged commit ea5c7f1 into timescale:main Mar 28, 2024
40 of 41 checks passed
@nikkhils nikkhils deleted the del_subq branch March 28, 2024 12:46
Development

Successfully merging this pull request may close these issues.

[Bug]: DELETE FROM compressed chunks ignoring subquery