Optimize cagg refresh for small invalidations #2926

Conversation

erimatnor
Contributor

This PR includes two changes:

Change 1:

Refreshing a continuous aggregate is slow when many small
invalidations are generated by frequent single-row insert
backfills. This change adds an optimization that merges small
invalidations by first expanding them to full bucket
boundaries. There is no reason to maintain invalidations that
don't cover full buckets, since refresh windows are already aligned
to buckets anyway.
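
As a rough illustration of the idea (a hedged sketch with hypothetical types and plain int64 times, not the actual TimescaleDB code, which works on its internal time representation):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical invalidation over an inclusive time range [lo, hi]. */
typedef struct Invalidation
{
	int64_t lo;
	int64_t hi;
} Invalidation;

/* Start of the bucket containing `t` (assumes non-negative times). */
static int64_t
bucket_start(int64_t t, int64_t bucket_width)
{
	return t - (t % bucket_width);
}

/*
 * Expand an invalidation so that it covers full buckets. Since refresh
 * windows are aligned to bucket boundaries anyway, there is no point in
 * tracking sub-bucket ranges, and expanded ranges are much easier to
 * merge. Invalidations are inclusive, so the end becomes the last value
 * of the last bucket touched by the range.
 */
static Invalidation
expand_to_buckets(Invalidation inv, int64_t bucket_width)
{
	Invalidation expanded;

	expanded.lo = bucket_start(inv.lo, bucket_width);
	expanded.hi = bucket_start(inv.hi, bucket_width) + bucket_width - 1;

	return expanded;
}

int
main(void)
{
	/* Two single-point invalidations that fall into the same 10-unit bucket. */
	Invalidation a = expand_to_buckets((Invalidation){ 13, 13 }, 10);
	Invalidation b = expand_to_buckets((Invalidation){ 17, 17 }, 10);

	/* Both expand to [10, 19], so they can be merged into a single entry. */
	printf("a = [%lld, %lld], b = [%lld, %lld]\n",
		   (long long) a.lo, (long long) a.hi,
		   (long long) b.lo, (long long) b.hi);
	return 0;
}
```

Once neighboring invalidations have been expanded to the same bucket-aligned range, they become trivial to merge or deduplicate.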

Change 2:

When there are many small (e.g., single timestamp) invalidations that
cannot be merged despite expanding invalidations to full buckets
(e.g., invalidations are spread across every second bucket in the
worst case), it might no longer be beneficial to materialize every
invalidation separately.

Instead, this change adds a threshold for the number of invalidations
used by the refresh (currently 10 by default) above which
invalidations are merged into one range based on the lowest and
greatest invalidated time values.

The limit can be controlled by an anonymous session variable for
debugging and tweaking purposes. It might be considered for promotion
to an official GUC in the future.
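
A minimal sketch of the threshold behavior (hypothetical names and data layout; the real code operates on its invalidation log entries, and the threshold corresponds to the session variable mentioned above):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive invalidation range, as in the earlier sketch. */
typedef struct Invalidation
{
	int64_t lo;
	int64_t hi;
} Invalidation;

/*
 * If the number of bucket-aligned invalidations that survive merging exceeds
 * the threshold, collapse them into a single range spanning the lowest and
 * greatest invalidated time values, so the refresh materializes one large
 * region instead of many tiny ones. Returns the number of ranges to refresh.
 */
static size_t
merge_if_above_threshold(Invalidation *invs, size_t num_invs, size_t threshold)
{
	int64_t min_lo;
	int64_t max_hi;
	size_t i;

	if (num_invs <= threshold)
		return num_invs;	/* refresh each invalidated region separately */

	min_lo = invs[0].lo;
	max_hi = invs[0].hi;

	for (i = 1; i < num_invs; i++)
	{
		if (invs[i].lo < min_lo)
			min_lo = invs[i].lo;
		if (invs[i].hi > max_hi)
			max_hi = invs[i].hi;
	}

	invs[0].lo = min_lo;
	invs[0].hi = max_hi;

	return 1;	/* one merged range covering everything that was invalidated */
}
```

Below the threshold, each invalidated region is still refreshed separately; above it, the single merged refresh may materialize some already up-to-date buckets, but it avoids the per-refresh overhead of many tiny materializations.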

Fixes #2867

@codecov

codecov bot commented Feb 10, 2021

Codecov Report

Merging #2926 (0fcdf01) into master (d43e54a) will increase coverage by 0.12%.
The diff coverage is 94.88%.


@@            Coverage Diff             @@
##           master    #2926      +/-   ##
==========================================
+ Coverage   90.15%   90.28%   +0.12%     
==========================================
  Files         212      212              
  Lines       34799    34894      +95     
==========================================
+ Hits        31373    31503     +130     
+ Misses       3426     3391      -35     
| Impacted Files | Coverage Δ |
|---|---|
| src/compat.h | 100.00% <ø> (ø) |
| tsl/src/remote/connection_cache.c | 90.40% <ø> (ø) |
| tsl/src/bgw_policy/continuous_aggregate_api.c | 93.64% <90.75%> (-4.01%) ⬇️ |
| tsl/src/continuous_aggs/invalidation.c | 97.94% <97.14%> (-0.17%) ⬇️ |
| tsl/src/continuous_aggs/refresh.c | 97.58% <98.14%> (-0.16%) ⬇️ |
| src/continuous_agg.c | 90.99% <100.00%> (+0.34%) ⬆️ |
| src/process_utility.c | 93.85% <100.00%> (-0.02%) ⬇️ |
| tsl/src/bgw_policy/job.c | 97.34% <100.00%> (-0.02%) ⬇️ |
| tsl/src/continuous_aggs/create.c | 96.92% <100.00%> (-0.01%) ⬇️ |
| tsl/src/continuous_aggs/insert.c | 85.71% <100.00%> (ø) |

... and 4 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch from baa6165 to 47b883f on February 11, 2021 08:25
@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch from 47b883f to 6a2167c on February 17, 2021 10:53
@erimatnor erimatnor marked this pull request as ready for review February 17, 2021 11:44
@erimatnor erimatnor requested a review from a team as a code owner February 17, 2021 11:44
@erimatnor erimatnor self-assigned this Feb 17, 2021
* and move one step down to the end value of the previous
* bucket. Remember that invalidations are inclusive, so the "greatest"
* value should be the last value of the last full bucket. */
max_bucket_end = ts_time_bucket_by_type(bucket_width, ts_time_get_max(timetype), timetype);
Contributor

Possible optimization: cache min_bucket_start and max_bucket_end so that we don't have to recompute this while processing invalidations in a loop.
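
In generic terms, the suggested hoisting might look like the following sketch (hypothetical names and plain int64 times; in the actual code the bounds come from ts_time_bucket_by_type() and the time type's min/max helpers):

```c
#include <stdint.h>

/* Hypothetical invalidation over an inclusive time range [lo, hi]. */
typedef struct InvalRange
{
	int64_t lo;
	int64_t hi;
} InvalRange;

/* Start of the bucket containing `t` (assumes non-negative times). */
static int64_t
bucket_start(int64_t t, int64_t width)
{
	return t - (t % width);
}

/*
 * Compute the min/max bucket bounds once per refresh instead of once per
 * invalidation, then reuse them inside the loop.
 */
static void
expand_and_clamp(InvalRange *ranges, int num, int64_t width,
				 int64_t min_time, int64_t max_time)
{
	const int64_t min_bucket_start = bucket_start(min_time, width);
	const int64_t max_bucket_end = bucket_start(max_time, width) - 1;
	int i;

	for (i = 0; i < num; i++)
	{
		int64_t lo = bucket_start(ranges[i].lo, width);
		int64_t hi = bucket_start(ranges[i].hi, width) + width - 1;

		/* Clamp to the cached bounds rather than recomputing them each pass. */
		ranges[i].lo = lo < min_bucket_start ? min_bucket_start : lo;
		ranges[i].hi = hi > max_bucket_end ? max_bucket_end : hi;
	}
}
```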

Contributor Author

I thought about it, but figured it wasn't worth it. Perhaps you have a suggestion of how to easily cache it without making this unnecessarily complex?

@@ -290,15 +291,71 @@ log_refresh_window(int elevel, const ContinuousAgg *cagg, const InternalTimeRang
DatumGetCString(OidFunctionCall1(outfuncid, end_ts)));
}

#define MAX_INVALIDATIONS_PER_REFRESH 10
Contributor

If the buckets are big, this could be a lot of data, and we might run into some of the issues that were previously solved by setting max_interval_per_job. It probably makes sense to add a GUC as part of this PR.

Contributor Author
@erimatnor erimatnor Feb 18, 2021

The semantics here are different from max_interval_per_job. The old max_interval_per_job setting is actually already present in a similar form as the user- or policy-specified refresh window; i.e., the user controls the max amount materialized by specifying a specific refresh window size. So, if there's a lot of data to materialize, the way to deal with that is to specify a smaller refresh window and instead do several refreshes. This becomes clear if you think about the worst-case situation, i.e., when every bucket in the refresh window is invalid. In that case, this max-invalidations-per-refresh value doesn't matter because you need to refresh the whole window anyway.

This max value here is about how we optimize the materialization of the specified refresh window when we are not in the worst case and it is possible to break the window down into smaller regions. Pre-2.0, everything from the earliest invalidation and forward was re-materialized, no matter whether the entire range needed re-materialization or not. Post-2.0, we break that down so that we only materialize the regions/buckets that are actually invalid. However, if there are a lot of small regions that are invalidated (e.g., every other bucket in the worst case), it is cheaper to merge them into one large range. The session variable below controls the threshold.

@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch from 6a2167c to 2ec7c2b on February 18, 2021 09:54
Contributor
@mkindahl mkindahl left a comment

Found a few things that are unclear or could be a problem. If you do not think they are, please respond to them.

I also think it is not a good idea to have max_invalidations_per_refresh apply to the user function, and especially to have it default to 10 (which it does, from my understanding). For the policy runs, this is just a way to throttle the background workers, which is fine, but when calling this from the command line the result will be that the entire range is not refreshed and the user will have to type the command several times, which will be annoying. I think it might be better to add a parameter to the function that defaults to a full refresh and ensure that the policy sets a value for this parameter instead. This will make sure that policies can be throttled and have a good default value, while command-line calls will refresh the entire range.

Contributor
@pmwkaa pmwkaa left a comment

Looks good overall, don't have much to add, just a couple of comments

@pmwkaa pmwkaa mentioned this pull request Feb 18, 2021
@erimatnor
Contributor Author

erimatnor commented Feb 18, 2021

> I also think it is not a good idea to have max_invalidations_per_refresh apply to the user function, and especially to have it default to 10 (which it does, from my understanding). For the policy runs, this is just a way to throttle the background workers, which is fine, but when calling this from the command line the result will be that the entire range is not refreshed and the user will have to type the command several times, which will be annoying.

So, this seems to be a misunderstanding of what the changes do, and that's on me for failing to make it clear, and perhaps for a badly named variable.

Two things that I'd like to clarify:

  • The changes in this PR in no way change the semantics of refresh_continuous_aggregate. After a call to this function, everything in the given window will be up-to-date. This hasn't changed. What happens internally, though, is that we break a refresh down into a set of multiple sub-refreshes (materializations) of the ranges (buckets) that are actually out-of-date. But if that turns out to be 100 separate refreshes, this is likely to be slower than one single refresh across the range determined by the min invalidated time and max invalidated time. The max_invalidations_per_refresh just sets the threshold at which we merge those separate ranges into one big range. It does not mean that we leave some ranges out-of-date.
  • None of the changes in this PR have anything to do with throttling policies. Not sure where that impression came from.

I've tried to change the comments and variable names to make this more clear.

@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch 2 times, most recently from 86aa95d to 4d22b64 on February 18, 2021 19:56
@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch from 4d22b64 to 2d36331 on February 19, 2021 07:34
@mkindahl
Contributor

> > I also think it is not a good idea to have max_invalidations_per_refresh apply to the user function, and especially to have it default to 10 (which it does, from my understanding). For the policy runs, this is just a way to throttle the background workers, which is fine, but when calling this from the command line the result will be that the entire range is not refreshed and the user will have to type the command several times, which will be annoying.
>
> So, this seems to be a misunderstanding of what the changes do, and that's on me for failing to make it clear, and perhaps for a badly named variable.
>
> Two things that I'd like to clarify:
>
> * The changes in this PR in no way change the semantics of `refresh_continuous_aggregate`. After a call to this function, everything in the given window will be up-to-date. This hasn't changed. What happens internally, though, is that we break a refresh down into a set of multiple sub-refreshes (materializations) of the ranges (buckets) that are actually out-of-date. But if that turns out to be 100 separate refreshes, this is likely to be slower than one single refresh across the range determined by the min invalidated time and max invalidated time. The `max_invalidations_per_refresh` just sets the threshold at which we merge those separate ranges into one big range. It does _not_ mean that we leave some ranges out-of-date.
>
> * None of the changes in this PR have anything to do with throttling policies. Not sure where that impression came from.
>
> I've tried to change the comments and variable names to make this more clear.

I understand what the variable does, but IMHO the problem is that the default behavior is to not process the entire range when it is "too complicated" (when we have a lot of buckets that are not connected). We had several comments before when REFRESH MATERIALIZED VIEW didn't refresh everything, so I think we should make sure that we, by default, refresh the entire range even if that happens to take a long time.

@erimatnor
Contributor Author

erimatnor commented Feb 19, 2021

> I understand what the variable does, but IMHO the problem is that the default behavior is to not process the entire range when it is "too complicated" (when we have a lot of buckets that are not connected). We had several comments before when REFRESH MATERIALIZED VIEW didn't refresh everything, so I think we should make sure that we, by default, refresh the entire range even if that happens to take a long time.

Above the threshold we actually process more than necessary, so it is the opposite of your understanding. When you say we had comments about REFRESH MATERIALIZED VIEW, I assume you are referring to pre-2.0 behavior. That functionality was entirely different from what we do here, so there is no clear comparison. The semantics of refresh_continuous_aggregate here are to leave everything in the user-specified window/range up-to-date after the call completes. So, your statement about not processing the entire range is not how it works.

Contributor
@mkindahl mkindahl left a comment

Thanks for the clarification that the changes do not affect the range that the refresh function processes. LGTM.

@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch 2 times, most recently from 2e47dcc to f1569d1 on February 19, 2021 10:49
@erimatnor erimatnor force-pushed the optimize-cagg-refresh-for-small-invalidations branch from f1569d1 to 0fcdf01 on February 19, 2021 10:58
@erimatnor
Contributor Author

@mkindahl @pmwkaa Changed the following based on feedback:

  • Updated comments for the min/max bucket calculation (also made a small tweak to calculating the end of the last bucket)
  • Added a warning for session variable parsing, and also added support for trailing whitespace (see the sketch after this list)
  • Added tests for non-parsable session variable input
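
A generic sketch of that kind of parsing (not the actual TimescaleDB code; the fprintf warning stands in for PostgreSQL's ereport(WARNING, ...)):

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse an integer session-variable value, tolerating trailing whitespace
 * and falling back to a default (with a warning) on non-parsable input.
 */
static int
parse_session_int(const char *value, int default_value)
{
	char *end;
	long parsed;

	if (value == NULL)
		return default_value;

	parsed = strtol(value, &end, 10);

	/* Skip any trailing whitespace after the number. */
	while (*end != '\0' && isspace((unsigned char) *end))
		end++;

	if (end == value || *end != '\0')
	{
		fprintf(stderr, "WARNING: invalid value \"%s\", using default %d\n",
				value, default_value);
		return default_value;
	}

	return (int) parsed;
}

int
main(void)
{
	printf("%d\n", parse_session_int("15  ", 10));	/* trailing whitespace OK -> 15 */
	printf("%d\n", parse_session_int("abc", 10));	/* warning, falls back to 10 */
	return 0;
}
```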

@mkindahl
Contributor

> > I understand what the variable does, but IMHO the problem is that the default behavior is to not process the entire range when it is "too complicated" (when we have a lot of buckets that are not connected). We had several comments before when REFRESH MATERIALIZED VIEW didn't refresh everything, so I think we should make sure that we, by default, refresh the entire range even if that happens to take a long time.
>
> Above the threshold we actually process more than necessary, so it is the opposite of your understanding. When you say we had comments about REFRESH MATERIALIZED VIEW, I assume you are referring to pre-2.0 behavior. That functionality was entirely different from what we do here, so there is no clear comparison. The semantics of refresh_continuous_aggregate here are to leave everything in the user-specified window/range up-to-date after the call completes. So, your statement about not processing the entire range is not how it works.

Good to hear. Then I misunderstood how it worked.

@erimatnor erimatnor merged commit cc287f9 into timescale:master Feb 19, 2021
@erimatnor erimatnor deleted the optimize-cagg-refresh-for-small-invalidations branch February 19, 2021 12:27
erimatnor added a commit that referenced this pull request Feb 19, 2021
This maintenance release contains bugfixes since the 2.0.1 release. We
deem it high priority for upgrading.

The bug fixes in this release address issues with joins, the status of
background jobs, and disabling compression. It also includes
enhancements to continuous aggregates, including improved validation
of policies and optimizations for faster refreshes when there are a
lot of invalidations.

**Minor features**
* #2926 Optimize cagg refresh for small invalidations

**Bugfixes**
* #2850 Set status for backend in background jobs
* #2883 Fix join qual propagation for nested joins
* #2884 Add GUC to control join qual propagation
* #2885 Fix compressed chunk check when disabling compression
* #2908 Fix changing column type of clustered hypertables
* #2942 Validate continuous aggregate policy

**Thanks**
* @zeeshanshabbir93 for reporting the issue with full outer joins
* @Antiarchitect for reporting the issue with slow refreshes of continuous aggregates
* @diego-hermida for reporting the issue about being unable to disable
  compression
* @mtin for reporting the issue about wrong job status
@erimatnor erimatnor mentioned this pull request Feb 19, 2021
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Mar 6, 2024
PR timescale#2926 introduced a session-based configuration parameter for the CAgg
refresh behavior. If more individual refreshes have to be carried out
than specified by this setting, a refresh for a larger window is
performed.

It is mentioned in the original PR that this setting should be converted
into a GUC later. This PR performs the proposed change. To notify
background workers (i.e., refresh jobs) of changed settings, set the GUC
context to PGC_SUSET.
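
For reference, registering such a GUC with PGC_SUSET context in a PostgreSQL extension typically looks like the following sketch (the GUC name, default, and function name here are assumptions, not necessarily what the follow-up PR uses):

```c
#include "postgres.h"
#include "utils/guc.h"

/* Illustrative variable; the actual name and default follow the PR. */
static int materializations_per_refresh_window = 10;

void
register_refresh_threshold_guc(void)
{
	/*
	 * PGC_SUSET: only superusers can change the setting, and, per the commit
	 * message above, background workers such as refresh jobs pick up changed
	 * settings through the normal GUC mechanisms.
	 */
	DefineCustomIntVariable("timescaledb.materializations_per_refresh_window",
							"Max materializations per refresh window",
							"Maximum number of individual refreshes per refresh "
							"window before they are merged into one larger "
							"refresh covering the whole invalidated range",
							&materializations_per_refresh_window,
							10,			/* boot value */
							0,			/* min */
							INT_MAX,	/* max */
							PGC_SUSET,
							0,			/* flags */
							NULL,		/* check_hook */
							NULL,		/* assign_hook */
							NULL);		/* show_hook */
}
```

In an extension, this registration would normally be called from _PG_init().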

Successfully merging this pull request may close these issues.

TimescaleDB 2.x Continuous Aggregation long recalculation
4 participants