
Fixed batch look ahead in compressed sorted merge #5798

Merged
merged 1 commit into timescale:main on Jun 26, 2023

Conversation

jnidzwetzki
Contributor

In decompress_sorted_merge_get_next_tuple, it is determined how many batches currently need to be opened to perform a sorted merge. This is done by checking whether the first tuple of the last opened batch is larger than the last returned tuple.

If a filter removes the first tuple, the first tuple inserted into the heap from this batch can no longer be used to perform this check. This patch fixes the resulting incorrect batch look-ahead.

Fixes: #5797
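
A minimal standalone sketch of the look-ahead check described above (the struct, the function name, and the int sort keys are illustrative assumptions, not the actual TimescaleDB data structures):

#include <stdbool.h>

typedef struct BatchState
{
    bool first_tuple_returned; /* did the first decompressed tuple survive the filter? */
    int  first_tuple_key;      /* sort key of the first decompressed tuple */
} BatchState;

/* Decide whether yet another compressed batch has to be opened. */
static bool
need_more_batches(const BatchState *last_opened, int last_returned_key)
{
    if (!last_opened->first_tuple_returned)
    {
        /* The first tuple was removed by a filter, so it cannot be used for
         * the comparison; conservatively open another batch. */
        return true;
    }

    /* Keep opening batches while the newest batch may still contain tuples
     * that sort before (or equal to) the last tuple we returned. */
    return last_opened->first_tuple_key <= last_returned_key;
}

Under this reading, a batch whose first tuple was removed by the filter is simply excluded from the stop condition, which is conservative: it may open more batches than strictly necessary, as discussed in the review comments below.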

@codecov

codecov bot commented Jun 16, 2023

Codecov Report

Merging #5798 (1c2c251) into main (81e2f35) will decrease coverage by 0.19%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #5798      +/-   ##
==========================================
- Coverage   87.85%   87.66%   -0.19%     
==========================================
  Files         239      239              
  Lines       55649    55648       -1     
  Branches    12322    12322              
==========================================
- Hits        48889    48786     -103     
- Misses       4873     4929      +56     
- Partials     1887     1933      +46     
Impacted Files Coverage Δ
tsl/src/nodes/decompress_chunk/exec.c 92.97% <100.00%> (+0.02%) ⬆️
tsl/src/nodes/decompress_chunk/sorted_merge.c 91.00% <100.00%> (+0.18%) ⬆️

... and 36 files with indirect coverage changes


@jnidzwetzki jnidzwetzki marked this pull request as ready for review June 16, 2023 21:32
@akuzm
Member

akuzm commented Jun 19, 2023

I don't really understand what's going on anymore :) Can we make the "filtering" happen strictly before the "heap" part? I.e. for each batch, we find the next tuple that passes the filter, and then use this tuple to find out which batch is the top one.

@jnidzwetzki jnidzwetzki added this to the TimescaleDB 2.11.1 milestone Jun 22, 2023
* Therefore, we must continue opening additional batches until the condition
* is met.
*/
if (first_tuple_returned)
Contributor

Does this also mean that if a filter removes all first elements, it will open all batches?

Contributor

I think you might want to open all batches up to the current peek element - or not?
...or you could move the filter out so it is not applied right away during the read - and retain the old logic
...or also add the chunk boundaries to the heap and always use the max level to pull in new batches; possibly pull in by min for the first element - but the peek-based approach is probably simpler

Contributor Author

That's correct: if the first element is removed from the batch, the existing look-ahead logic cannot use the returned tuple to decide whether we have to open more batches.

This PR excludes batches from the look-ahead logic if their first tuple was filtered, which could lead to a situation where we open more batches than needed.

Returning the first tuple from the batch, regardless of whether it satisfies the filter condition, could indeed help reduce the number of open batches. However, it would also require some extra logic, which I wanted to avoid because this algorithm already has many special cases and I didn't want to make it more complicated.

Contributor

If a filter has probability p of filtering out a row, the chance that a batch has at least one matching element, but not as its first row, is:

  • (first row is filtered) * (at least 1 match among the remaining rows)
  • (first row is filtered) * (1 - (all remaining 999 rows filtered))
  • which is: p * (1 - p^999)
  • this function can get very close to 1: https://www.wolframalpha.com/input?i=max%28p+*+%281-p%5E999%29%29
  • which means it will just open up all batches; but because of the selectivity it will most likely also have 100% excluded batches

But the best option would be to address this later... a dual heap or something else; doesn't matter.
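
A hedged illustration of the estimate above, assuming 1000-row batches and an independent per-row filter probability p (plain C, unrelated to the actual code):

#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Probability that the first row of a 1000-row batch is filtered out while
 * at least one of the remaining 999 rows survives: p * (1 - p^999). */
int
main(void)
{
    const double probs[] = { 0.5, 0.9, 0.99, 0.999 };
    for (size_t i = 0; i < sizeof(probs) / sizeof(probs[0]); i++)
    {
        double p = probs[i];
        printf("p = %.3f -> %.4f\n", p, p * (1.0 - pow(p, 999)));
    }
    return 0;
}

For p = 0.99 the value is already about 0.99, which is the point being made: with a moderately selective filter, almost every batch loses its first row while still containing at least one match.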

@@ -1027,9 +1028,11 @@ decompress_get_next_tuple_from_batch(DecompressChunkState *chunk_state,
if (is_valid_tuple)
{
Assert(!TTS_EMPTY(decompressed_slot_projected));
Contributor

I think the method name decompress_chunk_perform_select_project is misleading; it does a filter and a projection.
...and the comment inside it about ~1000 rows looks invalid

Contributor Author

Do you have a suggestion on how the function should be named?

Could you elaborate on what is wrong with the comment?

Contributor

I think decompress_chunk_perform_filter_project makes more sense, as select in this context is the projection.

The comment suggests that it will only reset the ctx after ~1000 rows; I think that's not true because:

  • it does it during every method call
  • and the method seems to be invoked for each row
    • on one branch it also increments the number of filtered rows by 1

In the current form, whatever caching that ctx could do is thrown away, so it's not able to cache anything... but we have a cloudy comment about why we are doing that.

If you look at the original changeset that added this comment, the situation was much different.
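
A hedged sketch of the two reset strategies under discussion; reset_context() is a hypothetical stand-in for the per-tuple memory context reset, and none of these names come from the actual code:

/* Hypothetical stand-in for resetting the per-tuple memory context,
 * e.g. a MemoryContextReset() call in PostgreSQL. */
static void
reset_context(void)
{
}

#define ROWS_PER_RESET 1000

/* What the comment claims: reset only every ~1000 rows, so short-lived
 * allocations could be reused across the rows in between. */
static void
process_row_batched_reset(void)
{
    static int rows_since_reset = 0;

    if (++rows_since_reset >= ROWS_PER_RESET)
    {
        reset_context();
        rows_since_reset = 0;
    }
    /* ... filter/project one row ... */
}

/* What the code appears to do: the function is called once per row and
 * resets the context on every call, so anything cached in it is discarded
 * immediately. */
static void
process_row_always_reset(void)
{
    reset_context();
    /* ... filter/project one row ... */
}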

Member

@akuzm akuzm Jun 23, 2023

I think decompress_chunk_perform_filter_project makes more sense - as select in this context is project

No, that's the proper term, it's just the relational algebra and SQL conspiracy to confuse us -- SELECT is a projection and WHERE is a selection :)

Regarding the comment, I also noticed this because it started showing up in profiles. Anyway, let's keep this for later; this PR should be a quick, backportable fix.

Contributor

Sure, that also makes sense; but in that case the term is selection and not select. If you look at EXPLAIN plans, they use the word filter... and that method internally also invokes InstrCountFiltered1. I'm just saying that because of the way the method was named, I had to read it through because of the confusion => I think the name of the method did a bad job of describing what it does... maybe decompress_chunk_perform_selection_project - but no big deal.

[...] it started showing up in profiles [...]

OK, and how much is it worth? I guess it is worth the most when there is a UDF on a segmentby column.
But I think this comment should be removed... as it's not valid anymore.

Member

OK, and how much is it worth?

Not much, 1-2% of the total query CPU time.

@jnidzwetzki
Contributor Author

@akuzm This should already be implemented in that way. In decompress_get_next_tuple_from_batch, we decompress the next tuple, apply the filter, and return the next tuple that satisfies the filter condition.

The function is called from decompress_batch_open_next_batch, which performs the heap logic and the opening of the next batches.
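
A standalone sketch of the per-batch flow described above (illustrative names and an int-valued placeholder filter, not the actual TimescaleDB functions): tuples of a batch are decompressed one by one and the first one that passes the filter is returned.

#include <stdbool.h>
#include <stddef.h>

typedef struct Batch
{
    const int *values;  /* stand-in for the rows of a decompressed batch */
    size_t     nvalues;
    size_t     next;    /* next row to decompress */
} Batch;

static bool
passes_filter(int value)
{
    return value % 2 == 0; /* placeholder for the qual evaluation */
}

/* Returns true and sets *out to the next tuple that satisfies the filter,
 * or returns false when the batch is exhausted. */
static bool
get_next_tuple_from_batch(Batch *batch, int *out)
{
    while (batch->next < batch->nvalues)
    {
        int value = batch->values[batch->next++];

        if (passes_filter(value))
        {
            *out = value;
            return true;
        }
    }
    return false;
}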

@akuzm
Member

akuzm commented Jun 23, 2023

@akuzm This should already be implemented in that way. In decompress_get_next_tuple_from_batch, we decompress the next tuple, apply the filter, and return the next tuple that satisfies the filter condition.

The function is called from decompress_batch_open_next_batch, which performs the heap logic and the opening of the next batches.

OK, I think I understand; this PR relates to how we determine whether we need a new batch. For that, we have to check if the last batch is unopened. Maybe it's OK as a quick fix; I struggle to understand it fully.

I think I'll have to refactor it later along the lines of what I mentioned in the original PR. I'm sticking the vectorized filters in the same place, and then the aggregations, and it's becoming more and more confusing :)

Something like this:

  • have a queue of decompressed batches
  • one implementation is one-element fifo, no sorted merge
  • the other implementation is heap for sorted merge

The usage of queue:

while (queue needs more compressed batches)
{
    push compressed batch to queue;
}
pop top decompressed element;
filter/project (done by the caller);

For the sorted merge queue, I'd make the data structure two-part: the heap + the explicit next unopened batch. To answer whether we need more batches, we'd compare the metadata of the unopened one (doesn't require decompression) to the current top decompressed tuple. I think this might simplify the logic, in particular the place you're changing here.
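
A minimal sketch of that decision, assuming the compressed batch metadata carries the minimum of the order-by column (illustrative names and an int key, not an actual API):

#include <stdbool.h>
#include <stddef.h>

typedef struct UnopenedBatch
{
    int min_key; /* minimum of the order-by column, taken from batch metadata */
} UnopenedBatch;

/* Must the next compressed batch be opened before the current top-of-heap
 * tuple can be emitted? The check uses metadata only, no decompression. */
static bool
need_next_batch(const UnopenedBatch *next_unopened, int top_of_heap_key)
{
    /* If the next unopened batch could contain a tuple that sorts before the
     * current heap top, it has to be opened first. */
    return next_unopened != NULL && next_unopened->min_key <= top_of_heap_key;
}

Because only metadata is compared, a batch whose first tuple is later removed by a filter no longer affects the decision, which is the situation this PR works around.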

In decompress_sorted_merge_get_next_tuple, it is determined how many
batches currently need to be opened to perform a sorted merge. This is
done by checking whether the first tuple of the last opened batch is
larger than the last returned tuple.

If a filter removes the first tuple, the first tuple inserted into the
heap from this batch can no longer be used to perform this check. This
patch fixes the resulting incorrect batch look-ahead.

Fixes: timescale#5797
@jnidzwetzki
Contributor Author

@akuzm Sure, we should refactor the compression API before we add the vectorized optimizations. It's starting to become more and more complex. Once we have a clearer understanding of our requirements for vectorized operations, we can create an API proposal.

To answer whether we need more batches, we'd compare the metadata of the unopened one (doesn't require decompression) to the current top decompressed tuple.

I have not implemented this so far because it requires another compare function that can handle a comparison between the metadata and the top tuple of the heap, which introduces additional complexity. In the current implementation, the heap compare function can be reused. If we want to optimize the functionality further, we could implement this.

Contributor

@kgyrtkirk kgyrtkirk left a comment

If the queue of decompressed batches is easier to achieve, that's great!
...but leaving it like this might mean that it bites us back later.


@jnidzwetzki jnidzwetzki enabled auto-merge (rebase) June 26, 2023 12:23
@jnidzwetzki jnidzwetzki merged commit 33a3e10 into timescale:main Jun 26, 2023
41 checks passed
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 27, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5711 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
@jnidzwetzki jnidzwetzki mentioned this pull request Jun 27, 2023
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 27, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5705 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 28, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5705 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext
* timescale#5824 Improve continuous aggregate query chunk exclusion

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
jnidzwetzki added a commit that referenced this pull request Jun 28, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* #5679 Teach loader to load OSM extension

**Bugfixes**
* #5705 Scheduler accidentally getting killed when calling `delete_job`
* #5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* #5750 Ensure tlist is present in decompress chunk plan
* #5754 Fixed handling of NULL values in bookend_sfunc
* #5798 Fixed batch look ahead in compressed sorted merge
* #5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* #5807 Copy job config JSONB structure into current MemoryContext
* #5824 Improve continuous aggregate query chunk exclusion

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend

Successfully merging this pull request may close these issues.

[Bug]: ORDER BY with a LIMIT clause leads to spurious output order