Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge adjacent invalidations during refresh #2440

Merged
merged 3 commits into from Sep 25, 2020

Conversation

erimatnor
Copy link
Contributor

@erimatnor erimatnor commented Sep 24, 2020

Since invalidations are inclusive in both ends, adjacent invalidations
can be merged. However, adjacency wasn't accounted for when merging
invalidations, which meant that a refresh could leave more
invalidations in the log than strictly necessary. Note that this
didn't otherwise affect the correctness of a refresh.

The PR also includes the following fix:

Fix index attribute in invalidation scan

When setting up an index scan for invalidations, a table attribute
number was used instead of the corresponding index attribute
number. While the attribute numbers happened to be the same, it isn't
future proof to use the wrong attribute reference.

@erimatnor erimatnor force-pushed the caggs-invalidation-fixes branch 2 times, most recently from b82b8c3 to 7a52a7b Compare September 24, 2020 12:22
@erimatnor erimatnor changed the title Merge adjecent invalidations during refresh Merge adjacent invalidations during refresh Sep 24, 2020
@erimatnor erimatnor force-pushed the caggs-invalidation-fixes branch 2 times, most recently from 75b9c59 to 81dade6 Compare September 24, 2020 12:31
@codecov
Copy link

codecov bot commented Sep 24, 2020

Codecov Report

Merging #2440 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2440   +/-   ##
=======================================
  Coverage   90.10%   90.11%           
=======================================
  Files         213      213           
  Lines       34336    34356   +20     
=======================================
+ Hits        30940    30961   +21     
+ Misses       3396     3395    -1     
Impacted Files Coverage Δ
src/process_utility.c 93.70% <100.00%> (+0.04%) ⬆️
tsl/src/continuous_aggs/invalidation.c 98.18% <100.00%> (+0.02%) ⬆️
tsl/src/continuous_aggs/refresh.c 99.32% <100.00%> (ø)
src/utils.h 71.42% <0.00%> (+26.98%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb4fec1...520bf2d. Read the comment docs.

@erimatnor erimatnor marked this pull request as ready for review September 24, 2020 13:03
@erimatnor erimatnor requested a review from a team as a code owner September 24, 2020 13:03
@erimatnor erimatnor requested review from mkindahl, k-rus, gayyappan, WireBaron, pmwkaa and svenklemm and removed request for a team and gayyappan September 24, 2020 13:03
Comment on lines +530 to +539
static bool
invalidations_can_be_merged(const Invalidation *a, const Invalidation *b)
{
/* To account for adjacency, expand one window 1 step in each
* direction. This makes adjacent invalidations overlapping. */
int64 a_start = int64_saturating_sub(a->lowest_modified_value, 1);
int64 a_end = int64_saturating_add(a->greatest_modified_value, 1);

return a_end >= b->lowest_modified_value && a_start <= b->greatest_modified_value;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that the invalidations are ordered so that a << b? If that is the case, an assertion would be good, if not, testing both orders is probably good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the question. This function checks for overlap of two ranges and order shouldn't matter. Either they overlap or they don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I misread the code. Now I see what you did.

Comment on lines +530 to +539
static bool
invalidations_can_be_merged(const Invalidation *a, const Invalidation *b)
{
/* To account for adjacency, expand one window 1 step in each
* direction. This makes adjacent invalidations overlapping. */
int64 a_start = int64_saturating_sub(a->lowest_modified_value, 1);
int64 a_end = int64_saturating_add(a->greatest_modified_value, 1);

return a_end >= b->lowest_modified_value && a_start <= b->greatest_modified_value;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I misread the code. Now I see what you did.

Copy link
Contributor

@k-rus k-rus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I have two nits. See the comments.

Comment on lines 796 to 785
* matches a window, and, optionally, adds the invalidation segments covered
* by the window to the invalidation store in the passed in state. These
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this at all. What is the invalidation store? I know about hypertable invalidation log and cagg (or materialized hypertable) invalidation log. Also how is optionally specified, i.e., when is it done and when not?
What does the passed in state mean?
Taking in account my current knowledge about invalidation and working with it I am not be able to understand. I guess if somebody will need to deal with it later, will have hard time to understand too.

Copy link
Contributor Author

@erimatnor erimatnor Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function has a parameter called "state", which is of type CaggInvalidationState. The caller passes in this state as an argument to the function and it contains the invalidation store. The invalidation store is of type InvalidationStore (actually just a Tuplestorestate with tuple descriptor), which is defined in this file/header, and explained more extensively in the large comment at the top of this file.

I am happy to make this more clear, but I am not sure how to do it at this point. There is only on "state" passed into this function with said name. There is only one InvalidationStore so IMO, the reference is unambiguous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation. I still don't understand why optionally. Do you mean if the invalidation store is not null in state? If so, then I suggest to omit optionally as it is confusing.
Now I understand what you mean by in the passed in state - I think there is some problem with the grammar, which didn't allow me to understand. May be:

Suggested change
* matches a window, and, optionally, adds the invalidation segments covered
* by the window to the invalidation store in the passed in state. These
* matches a window, and, optionally, adds the invalidation segments covered
* by the window to the invalidation store, which is given in the argument state. These

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed optionally and clarified the other parts.

@@ -1163,3 +1163,23 @@ ORDER BY 1,2;
10 | 1 | 10
(1 row)

-- Test that adjacent invalidations are merged
INSERT INTO conditions VALUES(1, 1, 1.0), (2, 1, 2.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be easier to read if it is on separate line as others, otherwise it feels like a hole between 1 and 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional in order to create a range invalidation between 1-2. Each statement is its own transaction and thus records a separate invalidation. I want to test a mix of both non-trivial (length > 1) and trivial (length=1) ranges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying!
May be:

Suggested change
INSERT INTO conditions VALUES(1, 1, 1.0), (2, 1, 2.0);
INSERT INTO conditions VALUES (1, 1, 1.0),
(2, 1, 2.0);

It's not important.

When setting up an index scan for invalidations, a table attribute
number was used instead of the corresponding index attribute
number. While the attribute numbers happened to be the same, it isn't
future proof to use the wrong attribute reference.
Since invalidations are inclusive in both ends, adjacent invalidations
can be merged. However, adjacency wasn't accounted for when merging
invalidations, which meant that a refresh could leave more
invalidations in the log than strictly necessary. Note that this
didn't otherwise affect the correctness of a refresh.
This change adds views for invalidation tables to simplify queries in
the test.
@erimatnor erimatnor merged commit 2ecfefc into timescale:master Sep 25, 2020
@erimatnor erimatnor deleted the caggs-invalidation-fixes branch September 25, 2020 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants