Fix deletes with subqueries and compression #6789
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #6789 +/- ##
==========================================
+ Coverage 80.06% 80.88% +0.81%
==========================================
Files 190 191 +1
Lines 37181 36513 -668
Branches 9450 9531 +81
==========================================
- Hits 29770 29534 -236
- Misses 2997 3174 +177
+ Partials 4414 3805 -609
☔ View full report in Codecov by Sentry.
LGTM
I would like somebody like @akuzm to take a look at this, since he is most familiar with DecompressChunk here, but the change seems straightforward.
-- check that DML causes transparent decompression and that
-- data gets shifted to the uncompressed parts
EXPLAIN (costs off) DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
This delete query shouldn't actually touch anything, right? So the fact that the data is decompressed here is a missing optimization that we should add at some point. Can you change this to actually delete one row from a compressed batch, to make the test more robust? Also might be good to add one chunk with at least two compressed batches, so that both parts of the partial chunk path are tested.
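The two-batch setup suggested here can be produced by inserting more rows into one chunk than fit in a single compressed batch. A sketch, assuming an illustrative table (not from the PR) and TimescaleDB's usual limit of roughly 1000 rows per compressed batch:

```sql
-- Hypothetical setup; table and column names are illustrative.
-- TimescaleDB packs at most ~1000 rows per compressed batch, so inserting
-- well over 1000 rows into one chunk yields multiple batches on compression.
CREATE TABLE batch_test (time timestamptz NOT NULL, device int, value float);
SELECT create_hypertable('batch_test', 'time');

INSERT INTO batch_test
SELECT '2024-01-01'::timestamptz + (i || ' seconds')::interval, 1, i
FROM generate_series(1, 2500) i;  -- 2500 rows in one chunk -> several batches

ALTER TABLE batch_test SET (timescaledb.compress,
                            timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('batch_test') c;

-- Deleting a single row from inside one batch forces decompression of the
-- affected batch, leaving the chunk partial (compressed + uncompressed data).
DELETE FROM batch_test
WHERE time = '2024-01-01'::timestamptz + interval '42 seconds';
```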
@akuzm I believe it could be a chicken-and-egg thing, because unless we decompress, we might not get access to all the values that we need to check against.
@akuzm there are other tests which test out various scenarios with a mix of multiple partial, complete compressed, uncompressed chunks in tsl/test/sql/compression_ddl.sql already.
> @akuzm I believe it could be a chicken and egg thingy because unless we decompress we might not get access to all the values that we need to check against.
I mean, imagine in the future we have a more optimal path for DELETEs, for example through TAM, and it's not going to decompress anything if the WHERE doesn't match anything. So the test will stop working. It would be good if some rows actually matched the clause and were deleted; e.g. if it was time >= ALL(...), then the latest row would match.
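The suggested variant might look like this sketch (only the `time` column of `test_partials` is assumed from the test):

```sql
-- With ">= ALL", the row holding the maximum time satisfies the clause,
-- so the DELETE genuinely removes one row and must decompress its batch,
-- keeping the test meaningful even if a future path skips decompression
-- for non-matching WHERE clauses.
EXPLAIN (costs off)
DELETE FROM test_partials WHERE time >= ALL(SELECT time FROM test_partials);
DELETE FROM test_partials WHERE time >= ALL(SELECT time FROM test_partials);
```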
Ok, changed
-- data gets shifted to the uncompressed parts
EXPLAIN (costs off) DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
DELETE FROM test_partials WHERE time <> ALL(SELECT time from test_partials);
-- P, P, P
Looks like this comment needs updating :)
Umm, all 3 chunks will now become partial, right? That's why the P, P, P.
Ah OK, I didn't understand what it means and thought it was some temporary comment :)
Reworded the comment :-)
Not very familiar with this code tbh, but this looks like an improvement. The test should be made more robust though.
For UPDATEs and DELETEs involving a compressed chunk, the code decompresses the relevant data into the uncompressed portion of the chunk. This happens during execution, so if the planner has no plan for the uncompressed part of the chunk, we might miss scanning those decompressed rows. We now check during planning itself for the possibility of a compressed chunk becoming partial, and add an Append plan on top of scans of the compressed and uncompressed parts.
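The scenario the description fixes can be reproduced end to end; a hedged sketch, with illustrative names, using standard TimescaleDB APIs (`create_hypertable`, `compress_chunk`, `show_chunks`):

```sql
-- Before this fix, a DELETE with a subquery could plan only the compressed
-- side of a chunk; rows decompressed into the uncompressed part during
-- execution could then be missed. With the fix, the planner emits an
-- Append over both the compressed and uncompressed scans of such a chunk.
CREATE TABLE metrics (time timestamptz NOT NULL, value float);
SELECT create_hypertable('metrics', 'time');

INSERT INTO metrics
SELECT t, 1.0
FROM generate_series('2024-01-01'::timestamptz,
                     '2024-01-02'::timestamptz,
                     '1 minute'::interval) t;

ALTER TABLE metrics SET (timescaledb.compress);
SELECT compress_chunk(c) FROM show_chunks('metrics') c;

-- The DELETE decompresses affected batches into the uncompressed part of
-- the chunk; the plan should now cover both parts rather than only the
-- DecompressChunk scan of the compressed side.
EXPLAIN (costs off)
DELETE FROM metrics WHERE time >= ALL(SELECT time FROM metrics);
```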