Merge adjacent invalidations during refresh #2440

erimatnor · 2020-09-24T12:07:07Z

Since invalidations are inclusive in both ends, adjacent invalidations
can be merged. However, adjacency wasn't accounted for when merging
invalidations, which meant that a refresh could leave more
invalidations in the log than strictly necessary. Note that this
didn't otherwise affect the correctness of a refresh.

The PR also includes the following fix:

Fix index attribute in invalidation scan

When setting up an index scan for invalidations, a table attribute
number was used instead of the corresponding index attribute
number. While the attribute numbers happened to be the same, it isn't
future proof to use the wrong attribute reference.

codecov · 2020-09-24T12:58:08Z

Codecov Report

Merging #2440 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2440   +/-   ##
=======================================
  Coverage   90.10%   90.11%           
=======================================
  Files         213      213           
  Lines       34336    34356   +20     
=======================================
+ Hits        30940    30961   +21     
+ Misses       3396     3395    -1

Impacted Files	Coverage Δ
src/process_utility.c	`93.70% <100.00%> (+0.04%)`	⬆️
tsl/src/continuous_aggs/invalidation.c	`98.18% <100.00%> (+0.02%)`	⬆️
tsl/src/continuous_aggs/refresh.c	`99.32% <100.00%> (ø)`
src/utils.h	`71.42% <0.00%> (+26.98%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb4fec1...520bf2d. Read the comment docs.

mkindahl · 2020-09-24T13:26:59Z

tsl/src/continuous_aggs/invalidation.c

+static bool
+invalidations_can_be_merged(const Invalidation *a, const Invalidation *b)
+{
+	/* To account for adjacency, expand one window 1 step in each
+	 * direction. This makes adjacent invalidations overlapping. */
+	int64 a_start = int64_saturating_sub(a->lowest_modified_value, 1);
+	int64 a_end = int64_saturating_add(a->greatest_modified_value, 1);
+
+	return a_end >= b->lowest_modified_value && a_start <= b->greatest_modified_value;
+}


Is it guaranteed that the invalidations are ordered so that a << b? If that is the case, an assertion would be good, if not, testing both orders is probably good.

Not sure I understand the question. This function checks for overlap of two ranges and order shouldn't matter. Either they overlap or they don't.

Never mind, I misread the code. Now I see what you did.

mkindahl · 2020-09-25T07:56:53Z

tsl/src/continuous_aggs/invalidation.c

+static bool
+invalidations_can_be_merged(const Invalidation *a, const Invalidation *b)
+{
+	/* To account for adjacency, expand one window 1 step in each
+	 * direction. This makes adjacent invalidations overlapping. */
+	int64 a_start = int64_saturating_sub(a->lowest_modified_value, 1);
+	int64 a_end = int64_saturating_add(a->greatest_modified_value, 1);
+
+	return a_end >= b->lowest_modified_value && a_start <= b->greatest_modified_value;
+}


Never mind, I misread the code. Now I see what you did.

k-rus

LGTM
I have two nits. See the comments.

k-rus · 2020-09-25T08:25:01Z

tsl/src/continuous_aggs/invalidation.c

+ * matches a window, and, optionally, adds the invalidation segments covered
+ * by the window to the invalidation store in the passed in state. These


I don't understand this at all. What is the invalidation store? I know about hypertable invalidation log and cagg (or materialized hypertable) invalidation log. Also how is optionally specified, i.e., when is it done and when not?
What does the passed in state mean?
Taking in account my current knowledge about invalidation and working with it I am not be able to understand. I guess if somebody will need to deal with it later, will have hard time to understand too.

The function has a parameter called "state", which is of type CaggInvalidationState. The caller passes in this state as an argument to the function and it contains the invalidation store. The invalidation store is of type InvalidationStore (actually just a Tuplestorestate with tuple descriptor), which is defined in this file/header, and explained more extensively in the large comment at the top of this file.

I am happy to make this more clear, but I am not sure how to do it at this point. There is only on "state" passed into this function with said name. There is only one InvalidationStore so IMO, the reference is unambiguous.

Thank you for explanation. I still don't understand why optionally. Do you mean if the invalidation store is not null in state? If so, then I suggest to omit optionally as it is confusing.
Now I understand what you mean by in the passed in state - I think there is some problem with the grammar, which didn't allow me to understand. May be:

Suggested change

* matches a window, and, optionally, adds the invalidation segments covered

* by the window to the invalidation store in the passed in state. These

* matches a window, and, optionally, adds the invalidation segments covered

* by the window to the invalidation store, which is given in the argument state. These

Removed optionally and clarified the other parts.

k-rus · 2020-09-25T08:29:52Z

tsl/test/expected/continuous_aggs_invalidation.out

@@ -1163,3 +1163,23 @@ ORDER BY 1,2;
     10 |      1 |       10
 (1 row)

+-- Test that adjacent invalidations are merged
+INSERT INTO conditions VALUES(1, 1, 1.0), (2, 1, 2.0);


it will be easier to read if it is on separate line as others, otherwise it feels like a hole between 1 and 3.

This is intentional in order to create a range invalidation between 1-2. Each statement is its own transaction and thus records a separate invalidation. I want to test a mix of both non-trivial (length > 1) and trivial (length=1) ranges.

Thank you for clarifying!
May be:

Suggested change

INSERT INTO conditions VALUES(1, 1, 1.0), (2, 1, 2.0);

INSERT INTO conditions VALUES (1, 1, 1.0),

(2, 1, 2.0);

It's not important.

When setting up an index scan for invalidations, a table attribute number was used instead of the corresponding index attribute number. While the attribute numbers happened to be the same, it isn't future proof to use the wrong attribute reference.

Since invalidations are inclusive in both ends, adjacent invalidations can be merged. However, adjacency wasn't accounted for when merging invalidations, which meant that a refresh could leave more invalidations in the log than strictly necessary. Note that this didn't otherwise affect the correctness of a refresh.

This change adds views for invalidation tables to simplify queries in the test.

erimatnor added the continuous_aggregate label Sep 24, 2020

erimatnor force-pushed the caggs-invalidation-fixes branch 2 times, most recently from b82b8c3 to 7a52a7b Compare September 24, 2020 12:22

erimatnor changed the title ~~Merge adjecent invalidations during refresh~~ Merge adjacent invalidations during refresh Sep 24, 2020

erimatnor force-pushed the caggs-invalidation-fixes branch 2 times, most recently from 75b9c59 to 81dade6 Compare September 24, 2020 12:31

erimatnor marked this pull request as ready for review September 24, 2020 13:03

erimatnor requested a review from a team as a code owner September 24, 2020 13:03

erimatnor requested review from mkindahl, k-rus, gayyappan, WireBaron, pmwkaa and svenklemm and removed request for a team and gayyappan September 24, 2020 13:03

pmwkaa approved these changes Sep 24, 2020

View reviewed changes

mkindahl reviewed Sep 24, 2020

View reviewed changes

mkindahl approved these changes Sep 25, 2020

View reviewed changes

erimatnor force-pushed the caggs-invalidation-fixes branch from 81dade6 to a7d135c Compare September 25, 2020 08:32

k-rus approved these changes Sep 25, 2020

View reviewed changes

erimatnor force-pushed the caggs-invalidation-fixes branch from a7d135c to f9fedd7 Compare September 25, 2020 10:29

erimatnor added 3 commits September 25, 2020 13:14

Cleanup invalidations tests

520bf2d

This change adds views for invalidation tables to simplify queries in the test.

erimatnor force-pushed the caggs-invalidation-fixes branch from ee189da to 520bf2d Compare September 25, 2020 11:14

erimatnor merged commit 2ecfefc into timescale:master Sep 25, 2020

erimatnor deleted the caggs-invalidation-fixes branch September 25, 2020 11:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge adjacent invalidations during refresh #2440

Merge adjacent invalidations during refresh #2440

erimatnor commented Sep 24, 2020 •

edited

codecov bot commented Sep 24, 2020 •

edited

mkindahl Sep 24, 2020

erimatnor Sep 25, 2020

mkindahl Sep 25, 2020

mkindahl Sep 25, 2020

k-rus left a comment

k-rus Sep 25, 2020

erimatnor Sep 25, 2020 •

edited

k-rus Sep 25, 2020

erimatnor Sep 25, 2020

k-rus Sep 25, 2020

erimatnor Sep 25, 2020

k-rus Sep 25, 2020

		* matches a window, and, optionally, adds the invalidation segments covered
		* by the window to the invalidation store in the passed in state. These

	INSERT INTO conditions VALUES(1, 1, 1.0), (2, 1, 2.0);
	INSERT INTO conditions VALUES (1, 1, 1.0),
	(2, 1, 2.0);

Merge adjacent invalidations during refresh #2440

Merge adjacent invalidations during refresh #2440

Conversation

erimatnor commented Sep 24, 2020 • edited

codecov bot commented Sep 24, 2020 • edited

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

k-rus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erimatnor Sep 25, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erimatnor commented Sep 24, 2020 •

edited

codecov bot commented Sep 24, 2020 •

edited

erimatnor Sep 25, 2020 •

edited