
Fixed batch look ahead in compressed sorted merge #5798

Merged
merged 1 commit into timescale:main on Jun 26, 2023

Conversation

jnidzwetzki
Contributor

In decompress_sorted_merge_get_next_tuple, it is determined how many batches currently need to be opened to perform a sorted merge. This is done by checking whether the first tuple of the last opened batch is larger than the last returned tuple.

If a filter removes the first tuple, the first tuple inserted into the heap from this batch can no longer be used to perform this check. This patch fixes the resulting incorrect batch look-ahead.

Fixes: #5797
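
A minimal standalone sketch of the look-ahead check described above (the struct, the function name, and the int sort keys are illustrative assumptions, not the actual TimescaleDB data structures):

#include <stdbool.h>

typedef struct BatchState
{
    bool first_tuple_returned; /* did the first decompressed tuple survive the filter? */
    int  first_tuple_key;      /* sort key of the first decompressed tuple */
} BatchState;

/* Decide whether yet another compressed batch has to be opened. */
static bool
need_more_batches(const BatchState *last_opened, int last_returned_key)
{
    if (!last_opened->first_tuple_returned)
    {
        /* The first tuple was removed by a filter, so it cannot be used for
         * the comparison; conservatively open another batch. */
        return true;
    }

    /* Keep opening batches while the newest batch may still contain tuples
     * that sort before (or equal to) the last tuple we returned. */
    return last_opened->first_tuple_key <= last_returned_key;
}

Under this reading, a batch whose first tuple was removed by the filter is simply excluded from the stop condition, which is conservative: it may open more batches than strictly necessary, as discussed in the review comments below.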

@codecov

codecov bot commented Jun 16, 2023

Codecov Report

Merging #5798 (1c2c251) into main (81e2f35) will decrease coverage by 0.19%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #5798      +/-   ##
==========================================
- Coverage   87.85%   87.66%   -0.19%     
==========================================
  Files         239      239              
  Lines       55649    55648       -1     
  Branches    12322    12322              
==========================================
- Hits        48889    48786     -103     
- Misses       4873     4929      +56     
- Partials     1887     1933      +46     
Impacted Files Coverage Δ
tsl/src/nodes/decompress_chunk/exec.c 92.97% <100.00%> (+0.02%) ⬆️
tsl/src/nodes/decompress_chunk/sorted_merge.c 91.00% <100.00%> (+0.18%) ⬆️

... and 36 files with indirect coverage changes


@jnidzwetzki jnidzwetzki marked this pull request as ready for review June 16, 2023 21:32
@akuzm
Member

akuzm commented Jun 19, 2023

I don't really understand what's going on anymore :) Can we make the "filtering" happen strictly before the "heap" part? I.e. for each batch, we find the next tuple that passes the filter, and then use this tuple to find out which batch is the top one.

@jnidzwetzki jnidzwetzki added this to the TimescaleDB 2.11.1 milestone Jun 22, 2023
* Therefore, we must continue opening additional batches until the condition
* is met.
*/
if (first_tuple_returned)
Contributor

Does this also mean that if a filter removes all first elements, it will open all batches?

Contributor

I think you might want to open all batches up to the current peek element - or not?
...or you could move the filter out so it is not applied right away during the read - and retain the old logic
...or also add the chunk boundaries to the heap and always use the max level to pull in new batches; possibly pull in by min for the first element - but the peek-based approach is probably simpler

Contributor Author

That's correct: if the first element is removed from the batch, the existing look-ahead logic cannot use the returned tuple to decide whether we have to open more batches.

This PR excludes batches from the look-ahead logic if their first tuple was filtered, which could lead to a situation where we open more batches than needed.

Returning the first tuple from the batch, regardless of whether it satisfies the filter condition, could indeed help reduce the number of open batches. However, it would also require some extra logic, which I wanted to avoid because this algorithm already has many special cases and I didn't want to make it more complicated.

Contributor

If a filter has probability p of filtering out a row, the chance that a batch has at least one matching element, but not as its first row, is:

  • (first row is filtered) * (at least 1 match among the remaining rows)
  • (first row is filtered) * (1 - (all remaining 999 rows filtered))
  • which is: p * (1 - p^999)
  • this function can get very close to 1: https://www.wolframalpha.com/input?i=max%28p+*+%281-p%5E999%29%29
  • which means it will just open up all batches; but because of the selectivity it will most likely also have 100% excluded batches

But the best option would be to address this later... a dual heap or something else; doesn't matter.
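
A hedged illustration of the estimate above, assuming 1000-row batches and an independent per-row filter probability p (plain C, unrelated to the actual code):

#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Probability that the first row of a 1000-row batch is filtered out while
 * at least one of the remaining 999 rows survives: p * (1 - p^999). */
int
main(void)
{
    const double probs[] = { 0.5, 0.9, 0.99, 0.999 };
    for (size_t i = 0; i < sizeof(probs) / sizeof(probs[0]); i++)
    {
        double p = probs[i];
        printf("p = %.3f -> %.4f\n", p, p * (1.0 - pow(p, 999)));
    }
    return 0;
}

For p = 0.99 the value is already about 0.99, which is the point being made: with a moderately selective filter, almost every batch loses its first row while still containing at least one match.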

@@ -1027,9 +1028,11 @@ decompress_get_next_tuple_from_batch(DecompressChunkState *chunk_state,
if (is_valid_tuple)
{
Assert(!TTS_EMPTY(decompressed_slot_projected));
Contributor

I think the method name decompress_chunk_perform_select_project is misleading; it does a filter and a projection.
...and the comment inside it about ~1000 rows looks invalid

Contributor Author

Do you have a suggestion on how the function should be named?

Could you elaborate on what is wrong with the comment?

Contributor

I think decompress_chunk_perform_filter_project makes more sense, as select in this context is the projection.

The comment suggests that it will only reset the ctx after ~1000 rows; I think that's not true because:

  • it does it during every method call
  • and the method seems to be invoked for each row
    • on one branch it also increments the number of filtered rows by 1

In the current form, whatever caching that ctx could do is thrown away, so it's not able to cache anything... but we have a cloudy comment about why we are doing that.

If you look at the original changeset that added this comment, the situation was much different.
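
A hedged sketch of the two reset strategies under discussion; reset_context() is a hypothetical stand-in for the per-tuple memory context reset, and none of these names come from the actual code:

/* Hypothetical stand-in for resetting the per-tuple memory context,
 * e.g. a MemoryContextReset() call in PostgreSQL. */
static void
reset_context(void)
{
}

#define ROWS_PER_RESET 1000

/* What the comment claims: reset only every ~1000 rows, so short-lived
 * allocations could be reused across the rows in between. */
static void
process_row_batched_reset(void)
{
    static int rows_since_reset = 0;

    if (++rows_since_reset >= ROWS_PER_RESET)
    {
        reset_context();
        rows_since_reset = 0;
    }
    /* ... filter/project one row ... */
}

/* What the code appears to do: the function is called once per row and
 * resets the context on every call, so anything cached in it is discarded
 * immediately. */
static void
process_row_always_reset(void)
{
    reset_context();
    /* ... filter/project one row ... */
}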

Member

@akuzm akuzm Jun 23, 2023

I think decompress_chunk_perform_filter_project makes more sense - as select in this context is project

No, that's the proper term, it's just the relational algebra and SQL conspiracy to confuse us -- SELECT is a projection and WHERE is a selection :)

Regarding the comment, I also noticed this because it started showing up in profiles. Anyway, let's keep this for later; this PR should be a quick, backportable fix.

Contributor

Sure, that also makes sense; but in that case the term is selection and not select. If you look at EXPLAIN plans, they use the word filter... and that method internally also invokes InstrCountFiltered1. I'm just saying that because of the way the method was named, I had to read it through because of the confusion => I think the name of the method did a bad job of describing what it does... maybe decompress_chunk_perform_selection_project - but no big deal.

[...] it started showing up in profiles [...]

OK, and how much is it worth? I guess it is worth the most when there is a UDF on a segmentby column.
But I think this comment should be removed... as it's not valid anymore.

Member

OK, and how much is it worth?

Not much, 1-2% of the total query CPU time.

@jnidzwetzki
Contributor Author

@akuzm This should already be implemented in that way. In decompress_get_next_tuple_from_batch, we decompress the next tuple, apply the filter, and return the next tuple that satisfies the filter condition.

The function is called from decompress_batch_open_next_batch, which performs the heap logic and the opening of the next batches.
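
A standalone sketch of the per-batch flow described above (illustrative names and an int-valued placeholder filter, not the actual TimescaleDB functions): tuples of a batch are decompressed one by one and the first one that passes the filter is returned.

#include <stdbool.h>
#include <stddef.h>

typedef struct Batch
{
    const int *values;  /* stand-in for the rows of a decompressed batch */
    size_t     nvalues;
    size_t     next;    /* next row to decompress */
} Batch;

static bool
passes_filter(int value)
{
    return value % 2 == 0; /* placeholder for the qual evaluation */
}

/* Returns true and sets *out to the next tuple that satisfies the filter,
 * or returns false when the batch is exhausted. */
static bool
get_next_tuple_from_batch(Batch *batch, int *out)
{
    while (batch->next < batch->nvalues)
    {
        int value = batch->values[batch->next++];

        if (passes_filter(value))
        {
            *out = value;
            return true;
        }
    }
    return false;
}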

@akuzm
Member

akuzm commented Jun 23, 2023

@akuzm This should already be implemented in that way. In decompress_get_next_tuple_from_batch, we decompress the next tuple, apply the filter, and return the next tuple that satisfies the filter condition.

The function is called from decompress_batch_open_next_batch, which performs the heap logic and the opening of the next batches.

OK, I think I understand; this PR relates to how we determine whether we need a new batch. For that, we have to check if the last batch is unopened. Maybe it's OK as a quick fix; I struggle to understand it fully.

I think I'll have to refactor it later along the lines of what I mentioned in the original PR. I'm sticking the vectorized filters in the same place, and then the aggregations, and it's becoming more and more confusing :)

Something like this:

  • have a queue of decompressed batches
  • one implementation is one-element fifo, no sorted merge
  • the other implementation is heap for sorted merge

The usage of queue:

while (queue needs more compressed batches)
{
    push compressed batch to queue;
}
pop top decompressed element;
filter/project (done by the caller);

For the sorted merge queue, I'd make the data structure two-part: the heap + the explicit next unopened batch. To answer whether we need more batches, we'd compare the metadata of the unopened one (doesn't require decompression) to the current top decompressed tuple. I think this might simplify the logic, in particular the place you're changing here.
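
A minimal sketch of that decision, assuming the compressed batch metadata carries the minimum of the order-by column (illustrative names and an int key, not an actual API):

#include <stdbool.h>
#include <stddef.h>

typedef struct UnopenedBatch
{
    int min_key; /* minimum of the order-by column, taken from batch metadata */
} UnopenedBatch;

/* Must the next compressed batch be opened before the current top-of-heap
 * tuple can be emitted? The check uses metadata only, no decompression. */
static bool
need_next_batch(const UnopenedBatch *next_unopened, int top_of_heap_key)
{
    /* If the next unopened batch could contain a tuple that sorts before the
     * current heap top, it has to be opened first. */
    return next_unopened != NULL && next_unopened->min_key <= top_of_heap_key;
}

Because only metadata is compared, a batch whose first tuple is later removed by a filter no longer affects the decision, which is the situation this PR works around.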

In decompress_sorted_merge_get_next_tuple, it is determined how many
batches currently need to be opened to perform a sorted merge. This is
done by checking whether the first tuple of the last opened batch is
larger than the last returned tuple.

If a filter removes the first tuple, the first tuple inserted into the
heap from this batch can no longer be used to perform this check. This
patch fixes the resulting incorrect batch look-ahead.

Fixes: timescale#5797
@jnidzwetzki
Contributor Author

@akuzm Sure, we should refactor the compression API before we add the vectorized optimizations. It's starting to become more and more complex. Once we have a clearer understanding of our requirements for vectorized operations, we can create an API proposal.

To answer whether we need more batches, we'd compare the metadata of the unopened one (doesn't require decompression) to the current top decompressed tuple.

I have not implemented this so far because it requires another compare function that can handle a comparison between the metadata and the top tuple of the heap, which introduces additional complexity. In the current implementation, the heap compare function can be reused. If we want to optimize the functionality further, we could implement this.

Contributor

@kgyrtkirk kgyrtkirk left a comment

If the queue of decompressed batches is easier to achieve, that's great!
...but leaving it like this might mean that it bites us back later.


@jnidzwetzki jnidzwetzki enabled auto-merge (rebase) June 26, 2023 12:23
@jnidzwetzki jnidzwetzki merged commit 33a3e10 into timescale:main Jun 26, 2023
41 checks passed
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 27, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5711 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
@jnidzwetzki jnidzwetzki mentioned this pull request Jun 27, 2023
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 27, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5705 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 28, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* timescale#5679 Teach loader to load OSM extension

**Bugfixes**
* timescale#5705 Scheduler accidentally getting killed when calling `delete_job`
* timescale#5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* timescale#5750 Ensure tlist is present in decompress chunk plan
* timescale#5754 Fixed handling of NULL values in bookend_sfunc
* timescale#5798 Fixed batch look ahead in compressed sorted merge
* timescale#5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* timescale#5807 Copy job config JSONB structure into current MemoryContext
* timescale#5824 Improve continuous aggregate query chunk exclusion

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend
jnidzwetzki added a commit that referenced this pull request Jun 28, 2023
This release contains bug fixes since the 2.11.0 release. We recommend
that you upgrade at the next available opportunity.

**Features**
* #5679 Teach loader to load OSM extension

**Bugfixes**
* #5705 Scheduler accidentally getting killed when calling `delete_job`
* #5742 Fix Result node handling with ConstraintAwareAppend on
  compressed chunks
* #5750 Ensure tlist is present in decompress chunk plan
* #5754 Fixed handling of NULL values in bookend_sfunc
* #5798 Fixed batch look ahead in compressed sorted merge
* #5804 Mark cagg_watermark function as PARALLEL RESTRICTED
* #5807 Copy job config JSONB structure into current MemoryContext
* #5824 Improve continuous aggregate query chunk exclusion

**Thanks**
* @JamieD9 for reporting an issue with a wrong result ordering
* @xvaara for reporting an issue with Result node handling in
  ConstraintAwareAppend

Successfully merging this pull request may close these issues.

[Bug]: ORDER BY with a LIMIT clause leads to spurious output order