
Cap invalidation threshold at last data bucket #2338

Merged

Conversation


@erimatnor erimatnor commented Sep 8, 2020

When refreshing with an "infinite" refresh window going forward in
time, the invalidation threshold is also moved forward to the end of
the valid time range. This effectively renders the invalidation
threshold useless, leading to unnecessary write amplification.

To handle infinite refreshes better, this change caps the refresh
window at the end of the last bucket of data in the underlying
hypertable, so as not to move the invalidation threshold further than
necessary. For instance, if the max time value in the hypertable is
11, a refresh command such as:

CALL refresh_continuous_aggregate(NULL, NULL);

would be turned into

CALL refresh_continuous_aggregate(NULL, 20);

assuming that a bucket starts at 10 and ends at 20 (exclusive). Thus
the invalidation threshold would at most move to 20, allowing the
threshold to still do its work once time again moves forward and
beyond it.

Note that one must never process invalidations beyond the invalidation
threshold without also moving it, as that would clear that area from
invalidations and thus prohibit refreshing that region once the
invalidation threshold is moved forward. Therefore, if we do not move
the threshold further than a certain point, we cannot refresh beyond
it either. An alternative, and perhaps safer, approach would be to
always invalidate the region over which the invalidation threshold is
moved (i.e., new_threshold - old_threshold). However, that is left for
a future change.

It would be possible to also cap non-infinite refreshes, e.g.,
refreshes that end at a higher time value than the max time value in
the hypertable. However, when an explicit end is specified, it might
be on purpose so optimizing this case is also left for the future.

Closes #2333

@erimatnor erimatnor added this to the 2.0.0 milestone Sep 8, 2020

codecov bot commented Sep 8, 2020

Codecov Report

Merging #2338 into master will increase coverage by 0.25%.
The diff coverage is 94.26%.


@@            Coverage Diff             @@
##           master    #2338      +/-   ##
==========================================
+ Coverage   90.13%   90.39%   +0.25%     
==========================================
  Files         212      213       +1     
  Lines       34391    34898     +507     
==========================================
+ Hits        31000    31545     +545     
+ Misses       3391     3353      -38     
Impacted Files Coverage Δ
src/catalog.h 100.00% <ø> (ø)
src/compat.h 100.00% <ø> (ø)
src/plan_expand_hypertable.c 94.13% <ø> (ø)
src/utils.h 100.00% <ø> (ø)
tsl/src/continuous_aggs/job.c 0.00% <ø> (-100.00%) ⬇️
tsl/src/fdw/data_node_scan_plan.c 97.15% <ø> (ø)
tsl/src/init.c 88.88% <ø> (ø)
tsl/test/src/test_chunk_stats.c 100.00% <ø> (ø)
tsl/src/remote/txn.c 87.63% <46.66%> (-1.17%) ⬇️
src/cross_module_fn.c 56.81% <50.00%> (-0.67%) ⬇️
... and 53 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f32439...f0e3bfb. Read the comment docs.

@erimatnor erimatnor marked this pull request as ready for review September 8, 2020 08:47
@erimatnor erimatnor requested a review from a team as a code owner September 8, 2020 08:47
@erimatnor erimatnor requested review from pmwkaa, svenklemm, gayyappan, k-rus, mkindahl and WireBaron and removed request for a team and gayyappan September 8, 2020 08:47

@pmwkaa pmwkaa left a comment


Looks good

0 /*count*/);
if (res < 0)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),

@pmwkaa pmwkaa Sep 8, 2020


I wonder if this case is possible and whether we should test for it.

Contributor Author


I don't know a good way to make SPI fail. Maybe inject a bad query; but it seems odd to deliberately do that. I think we have to trust that SPI works the way it should.


k-rus commented Sep 8, 2020

@erimatnor Is it applicable only in the case of refresh policy or with manual refresh too? From the commit message it sounds that both ways are covered.

@erimatnor
Contributor Author

@erimatnor Is it applicable only in the case of refresh policy or with manual refresh too? From the commit message it sounds that both ways are covered.

Both are covered. The policy uses the same "manual" refresh API under the hood.


@WireBaron WireBaron left a comment


Just some minor questions/comments.

src/hypertable.c (resolved)
src/hypertable.c (outdated)
if (SPI_connect() != SPI_OK_CONNECT)
elog(ERROR, "could not connect to SPI");

res = SPI_execute_with_args(command->data,


Why not just SPI_execute here?

Contributor Author


Yes, that's a good question. This was code that existed previously, which I moved here for code-reuse purposes. I will see if a simple SPI_execute works (it should).

*
* The new invalidation threshold returned is the end of the given refresh
* window, unless it ends at "infinity" in which case the threshold is capped
* at the end of the last bucket materialized.


Looks like this will return infinity when that's the end of the refresh window and there's no materialized data yet. Is this correct? Should you mention this case in the comment here?

Contributor Author


Good catch. Not exactly intended, not wrong either, but certainly non-optimal. I fixed this so that it returns the min time value in that case and we avoid moving the threshold unnecessarily. I added extra test cases to cover this corner-case.


k-rus commented Sep 9, 2020

@erimatnor A suggestion to improve the commit message, the second paragraph:

To handle infinite refreshes better, this change caps the invalidation
threshold forward to the end of the last bucket of data in the
underlying hypertable. Subsequently the refresh window is capped
backward from +infinity to the end of the last bucket of data in the
underlying hypertable. Capping the refresh window is necessary, since
one cannot refresh and process invalidations beyond the invalidation
threshold, as that would clear that area from invalidations and thus
prohibit refreshing that region once the invalidation threshold is
moved forward. An alternative, and perhaps safer, approach would be to
always invalidate the region over which the invalidation threshold is
moved (i.e., new_threshold - old_threshold). However, that is left for
a future change.

@erimatnor erimatnor force-pushed the caggs-cap-invalidation-threshold branch 5 times, most recently from 34cf22a to 726cf06 Compare September 9, 2020 10:59
@erimatnor
Contributor Author

@erimatnor A suggestion to improve the commit message, the second paragraph:

To handle infinite refreshes better, this change caps the invalidation
threshold forward to the end of the last bucket of data in the
underlying hypertable. Subsequently the refresh window is capped
backward from +infinity to the end of the last bucket of data in the
underlying hypertable. Capping the refresh window is necessary, since
one cannot refresh and process invalidations beyond the invalidation
threshold, as that would clear that area from invalidations and thus
prohibit refreshing that region once the invalidation threshold is
moved forward. An alternative, and perhaps safer, approach would be to
always invalidate the region over which the invalidation threshold is
moved (i.e., new_threshold - old_threshold). However, that is left for
a future change.

I changed the commit message, hopefully to your liking. I didn't really understand your suggested change, however, in particular "Subsequently the refresh window is capped backward from +infinity to the end of the last bucket...".


@k-rus k-rus left a comment


LGTM
Few nits.

src/hypertable.c (outdated, resolved)
- bool
- continuous_agg_invalidation_threshold_set(int32 raw_hypertable_id, int64 invalidation_threshold)
+ int64
+ invalidation_threshold_set_or_get(int32 raw_hypertable_id, int64 invalidation_threshold)


set is confusing, since it only updates the value if applicable. I was already confused by the original function name when reviewing another PR. Maybe update?
set_or_get is confusing because of the or, since it always gets the up-to-date threshold value. I think or_get can be omitted.
So the suggestion:

Suggested change
- invalidation_threshold_set_or_get(int32 raw_hypertable_id, int64 invalidation_threshold)
+ invalidation_threshold_update(int32 raw_hypertable_id, int64 invalidation_threshold)


@erimatnor erimatnor Sep 9, 2020


I wanted to point out that the existing threshold is returned, which is not clear if it is just called update. Update, like set, implies the value is always updated, but it is only set if the new value is greater than the old one.

set_or_get is literally what it is doing; it either sets the threshold to the new value or returns the old one. If the returned value is equal to or higher than the input value, one knows that the threshold was not updated.

Comment on lines +284 to +288
if (TS_TIME_IS_INTEGER_TIME(refresh_window->type))
max_refresh = TS_TIME_IS_MAX(refresh_window->end, refresh_window->type);
else
max_refresh = TS_TIME_IS_END(refresh_window->end, refresh_window->type) ||
TS_TIME_IS_NOEND(refresh_window->end, refresh_window->type);


It might be useful to have this in a function, so it can be utilised in other places when needed.

Contributor Author


I think it makes sense to do that when, and if, the need arises.

tsl/src/continuous_aggs/refresh.c (outdated, resolved)
tsl/src/continuous_aggs/refresh.c (outdated, resolved)
tsl/src/continuous_aggs/refresh.c (resolved)

k-rus commented Sep 9, 2020

I changed the commit message, hopefully to your liking. I didn't really understand your suggested change, however, in particular "Subsequently the refresh window is capped backward from +infinity to the end of the last bucket...".

The new commit message doesn't improve the most confusing sentence. It says:

this change caps the invalidation
threshold, and subsequently the refresh window, at the end of the last
bucket of data in the underlying hypertable.

The refresh window is capped from +infinity, which is what the user requested. The invalidation threshold, on the other hand, cannot be capped, but should be moved forward. The confusion comes from the fact that the candidate invalidation threshold is capped, not the invalidation threshold itself. Would this be better?:

this change caps the refresh window at the end of the last bucket of data in the underlying hypertable, and the invalidation threshold is moved forward to the capped value.

@erimatnor erimatnor force-pushed the caggs-cap-invalidation-threshold branch from 726cf06 to b5b30ae Compare September 9, 2020 16:04
@erimatnor erimatnor force-pushed the caggs-cap-invalidation-threshold branch from b5b30ae to f0e3bfb Compare September 9, 2020 17:20
@erimatnor erimatnor merged commit f49492b into timescale:master Sep 9, 2020
@erimatnor erimatnor deleted the caggs-cap-invalidation-threshold branch September 9, 2020 17:46
Successfully merging this pull request may close these issues.

Do not move invalidation threshold further than necessary