Fixed batch look ahead in compressed sorted merge #5798
Conversation
Force-pushed from f1328d9 to 9212e4d
Codecov Report
@@ Coverage Diff @@
## main #5798 +/- ##
==========================================
- Coverage 87.85% 87.66% -0.19%
==========================================
Files 239 239
Lines 55649 55648 -1
Branches 12322 12322
==========================================
- Hits 48889 48786 -103
- Misses 4873 4929 +56
- Partials 1887 1933 +46
... and 36 files with indirect coverage changes
I don't really understand what's going on anymore :) Can we make the "filtering" happen strictly before the "heap" part? I.e. for each batch, we find the next tuple that passes the filter, and then use this tuple to find out which batch is the top one.
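As a hypothetical illustration of the suggestion above (all names are invented, and a simple threshold stands in for the real filter qual; this is not the actual TimescaleDB code), each batch could first advance past filtered rows, so that only a tuple passing the filter is ever used as the batch's heap key:

```c
#include <stdbool.h>
#include <stddef.h>

/* One decompressed batch, reduced to its sort keys for illustration. */
typedef struct Batch
{
	const long *rows;	/* decompressed sort keys of this batch */
	size_t		nrows;
	size_t		next;	/* cursor into rows[] */
} Batch;

/*
 * Advance the batch cursor to the next row that passes the filter
 * (here: key >= min_key, a stand-in for an arbitrary qual).
 * Returns true and stores the key if such a row exists; the cursor is
 * left pointing at that row so it can be consumed by the merge.
 */
static bool
batch_peek_filtered(Batch *batch, long min_key, long *key)
{
	while (batch->next < batch->nrows)
	{
		long row = batch->rows[batch->next];

		if (row >= min_key)
		{
			*key = row;
			return true;
		}
		batch->next++;	/* filtered out: skip before the heap sees it */
	}
	return false;		/* batch exhausted */
}
```

With this shape, the heap only ever compares tuples that already passed the filter, so the look-ahead decision is no longer confused by a filtered first tuple.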
* Therefore, we must continue opening additional batches until the condition
* is met.
*/
if (first_tuple_returned)
does this also mean that in case a filter removes all first elements, it will open all batches?
I think you might want to open all batches up to the current `peek` element - or not?
...or you could probably move the filter out from applying it during read right away, and retain the old logic
...or also add the chunk boundaries to the heap, and always use the max level to pull in new batches; possibly pull in by min for the 1st element - but the `peek`-based approach is probably simpler
This is correct: if the first element is removed from the batch, the existing look-ahead logic cannot use the returned tuple to decide whether we have to open more batches.
This PR excludes batches from the look-ahead logic if their first tuple was filtered. This could lead to a situation where we open more batches than needed.
Returning the first tuple from the batch, regardless of whether it satisfies the filter condition, could indeed help to reduce the number of open batches. However, this would also require some extra logic, which I wanted to avoid because this algorithm already has many special cases and I didn't want to make it more complicated.
if a filter has probability `p` of filtering out a row, the chance of having a batch with at least one element matching, but not the first, is:

P(first row is filtered) * P(at least 1 match in remaining) = p * (1 - p^999)

This function can get very close to 1 (https://www.wolframalpha.com/input?i=max%28p+*+%281-p%5E999%29%29), which means it will just open up all batches; but because of the selectivity it will most likely also have 100% excluded batches.
but the best would be to do something with it later... dual-heap or something else; doesn't matter.
@@ -1027,9 +1028,11 @@ decompress_get_next_tuple_from_batch(DecompressChunkState *chunk_state,
if (is_valid_tuple)
{
Assert(!TTS_EMPTY(decompressed_slot_projected));
I think the method name `decompress_chunk_perform_select_project` is misleading; it does a filter and a projection.
...and the comment inside it about ~1000 rows looks invalid
Do you have a suggestion on how the function should be named?
Could you elaborate on what is wrong with the comment?
I think `decompress_chunk_perform_filter_project` makes more sense - as `select` in this context is `project`.
The comment suggests that it will only reset the ctx after ~1000 rows; I think that's not true because:
- it does it during every method call
- the method seems to be invoked for each row
- on one branch it also increments the number of filtered rows by 1
In the current form, whatever caching that ctx could do is thrown away, so it's not able to cache anything... but we have a cloudy comment about why we are doing that.
If you look at the original changeset which added this comment, the situation was much different.
> I think decompress_chunk_perform_filter_project makes more sense - as select in this context is project

No, that's the proper term, it's just the relational algebra and SQL conspiracy to confuse us -- `SELECT` is a projection and `WHERE` is a selection :)
Regarding the comment, I also noticed this, because it started showing up in profiles. Anyway, let's keep this for later, this PR should be a quick backportable fix.
sure; that also makes sense; but in that case the term is "selection" and not "select"; if you look at explain plans it uses the word "filter"... and that method internally also invokes `InstrCountFiltered1`. I'm just saying that because of the way the method was named, I had to read it through because of the confusion => I think the name of the method did a bad job of describing what it does... maybe `decompress_chunk_perform_selection_project` - but no big deal.

> [...] it started showing up in profiles [...]

ok - and how much is it worth? I guess it's worth the most when there is a UDF on a segmentby column.
But I think this comment should be removed... as it's not valid anymore.
> ok - and how much is it worth?

Not much, 1-2% of the total query CPU time.
@akuzm This should already be implemented in that way. The function is called from
OK, I think I understand, this PR relates to how we determine whether we need a new batch. For that we have to check if the last batch is unopened. Maybe it's OK as a quick fix, I struggle to understand it fully. I think I'll have to refactor it later along the lines of what I mentioned in the original PR. I'm sticking the vectorized filters in the same place, and then the aggregations, and it's becoming more and more confusing :) Something like this:
The usage of queue:
For the sorted merge queue, I'd make the data structure two-part: the heap + the explicit next unopened batch. To answer whether we need more batches, we'd compare the metadata of the unopened one (doesn't require decompression) to the current top decompressed tuple. I think this might simplify the logic, in particular the place you're changing here.
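A minimal sketch of that two-part queue idea, with all type and function names hypothetical and plain `long` sort keys standing in for real tuples and compressed metadata: the heap is represented only by its current top key, and the decision to open the next batch compares that key against the unopened batch's metadata minimum, with no decompression needed.

```c
#include <stdbool.h>

/*
 * Two-part sorted-merge queue (illustrative only, not the TimescaleDB
 * API): a heap of decompressed tuples plus one explicit "next unopened
 * batch" whose minimum sort key is known from compressed metadata.
 */
typedef struct MergeQueue
{
	long top_decompressed_key;	/* sort key of the heap's top tuple */
	bool heap_empty;
	long next_batch_min_key;	/* from metadata, no decompression needed */
	bool have_unopened_batch;
} MergeQueue;

/*
 * We need to open the next batch iff its metadata minimum could sort
 * before (or equal to) the current top decompressed tuple.
 */
static bool
merge_queue_needs_more_batches(const MergeQueue *q)
{
	if (!q->have_unopened_batch)
		return false;			/* nothing left to open */
	if (q->heap_empty)
		return true;			/* no candidate tuple yet */
	return q->next_batch_min_key <= q->top_decompressed_key;
}
```

Because the check works on batch metadata rather than on whichever tuple happened to survive the filter, it is insensitive to the first tuple of a batch being filtered out, which is exactly the case this PR patches around.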
In decompress_sorted_merge_get_next_tuple it is determined how many batches currently need to be opened to perform a sorted merge. This is done by checking if the first tuple from the last opened batch is larger than the last returned tuple. If a filter removes the first tuple, the first tuple inserted into the heap from this batch can no longer be used to perform the check. This patch fixes the wrong batch look ahead. Fixes: timescale#5797
Force-pushed from 9212e4d to 1c2c251
@akuzm Sure, we should refactor the compression API before we add the vectorized optimizations. It's starting to become more and more complex. Once we have a clearer understanding of our requirements for vectorized operations, we can create an API proposal.
I have not implemented this so far, because this requires another compare function that can handle a comparison between metadata and the top tuple of the heap, and this introduces additional complexity. In the current implementation, the heap compare function can be reused. If we want to optimize the functionality further, we could implement this.
if the queue of decompressed batches is easier to achieve, that's great!
...but leaving it like this might mean that it could hit back later
This release contains bug fixes since the 2.11.0 release. We recommend that you upgrade at the next available opportunity.

**Features**
* #5679 Teach loader to load OSM extension

**Bugfixes**
* #5705 Scheduler accidentally getting killed when calling `delete_job`
* #5742 Fix Result node handling with ConstraintAwareAppend on compressed chunks
* #5750 Ensure tlist is present in decompress chunk plan
* #5754 Fixed handling of NULL values in bookend_sfunc
* #5798 Fixed batch look ahead in compressed sorted merge
* #5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* #5807 Copy job config JSONB structure into current MemoryContext
* #5824 Improve continuous aggregate query chunk exclusion

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in ConstraintAwareAppend