Optimize cagg refresh for small invalidations #2926
Codecov Report
@@ Coverage Diff @@
## master #2926 +/- ##
==========================================
+ Coverage 90.15% 90.28% +0.12%
==========================================
Files 212 212
Lines 34799 34894 +95
==========================================
+ Hits 31373 31503 +130
+ Misses 3426 3391 -35
baa6165 to 47b883f
47b883f to 6a2167c
* and move one step down to the end value of the previous
* bucket. Remember that invalidations are inclusive, so the "greatest"
* value should be the last value of the last full bucket. */
max_bucket_end = ts_time_bucket_by_type(bucket_width, ts_time_get_max(timetype), timetype);
Possible optimization: cache min_bucket_start and max_bucket_end so that we don't have to recompute this while processing invalidations in a loop.
I thought about it, but figured it wasn't worth it. Perhaps you have a suggestion of how to easily cache it without making this unnecessarily complex?
tsl/src/continuous_aggs/refresh.c
@@ -290,15 +291,71 @@ log_refresh_window(int elevel, const ContinuousAgg *cagg, const InternalTimeRang
DatumGetCString(OidFunctionCall1(outfuncid, end_ts)));
}

#define MAX_INVALIDATIONS_PER_REFRESH 10
If the buckets are big, this could be a lot of data and we might run into some of the issues that were previously solved by setting max_interval_per_job. It probably makes sense to add a GUC as part of the PR.
The semantics here are different from max_interval_per_job. The old max_interval_per_job setting is actually already present in a similar form as the user- or policy-specified refresh window; i.e., the user controls the maximum amount materialized by specifying a refresh window size. So, if there's a lot of data to materialize, the way to deal with that is to specify a smaller refresh window and do several refreshes instead. This becomes clear if you think about the worst-case situation, i.e., when every bucket in the refresh window is invalid. In that case, this max-invalidations-per-refresh value doesn't matter because you need to refresh the whole window anyway.
This max value is about how we optimize the materialization of the specified refresh window when we are not in the worst case and it is possible to break the window down into smaller regions. Pre-2.0, everything from the earliest invalidation and forward was re-materialized, regardless of whether the entire range needed re-materialization. Post-2.0, we break that down so that we only materialize the regions/buckets that are actually invalid. However, if there are a lot of small invalidated regions (e.g., every other bucket in the worst case), it is cheaper to merge them into one large range. The session variable below controls the threshold.
6a2167c to 2ec7c2b
Found a few things that are unclear, or that are or could be a problem. If you do not think they are, please respond to them.
I also think that it is not a good idea to have max_invalidations_per_refresh apply to the user function, and especially to have it default to 10 (which it does, from my understanding). For the policy runs, this is just a way to throttle the background workers, which is fine, but when calling this from the command line the result will be that the entire range is not refreshed and the user will have to type the command several times, which will be annoying. I think it might be better to add a parameter to the function which defaults to a full refresh and ensure that the policy sets a value for this parameter instead. This will make sure that policies can be throttled and have a good default value, while command-line calls will refresh the entire range.
Looks good overall, don't have much to add, just a couple of comments
So, this seems to be a misunderstanding of what the changes do, and that's on me for failing to make it clear, and perhaps for a badly named variable. Two things that I'd like to clarify:
I've tried to change the comments and variable names to make this clearer.
86aa95d to 4d22b64
4d22b64 to 2d36331
I understand what the variable does, but IMHO the problem is that the default behavior is to not process the entire range when it is "too complicated" (when we have a lot of buckets that are not connected). We had several comments before when
Above the threshold we actually process more than necessary, so it is actually the opposite of your understanding. When you say we had comments about
Thanks for the clarification that the changes do not affect the range that the refresh function processes. LGTM.
2e47dcc to f1569d1
The refreshing of a continuous aggregate is slow when many small invalidations are generated by frequent single row insert backfills. This change adds an optimization that merges small invalidations by first expanding invalidations to full bucket boundaries. There is really no reason to maintain invalidations that aren't covering full buckets since refresh windows are already aligned to buckets anyway. Fixes timescale#2867
When there are many small (e.g., single timestamp) invalidations that cannot be merged despite expanding invalidations to full buckets (e.g., invalidations are spread across every second bucket in the worst case), it might no longer be beneficial to materialize every invalidation separately. Instead, this change adds a threshold for the number of invalidations used by the refresh (currently 10 by default) above which invalidations are merged into one range based on the lowest and greatest invalidated time value. The limit can be controlled by an anonymous session variable for debugging and tweaking purposes. It might be considered for promotion to an official GUC in the future. Fixes timescale#2867
f1569d1 to 0fcdf01
@mkindahl @pmwkaa Changed the following based on feedback.
Good to hear. Then I misunderstood how it worked.
This maintenance release contains bugfixes since the 2.0.1 release. We deem it high priority for upgrading.

The bug fixes in this release address issues with joins, the status of background jobs, and disabling compression. It also includes enhancements to continuous aggregates, including improved validation of policies and optimizations for faster refreshes when there are a lot of invalidations.

**Minor features**
* timescale#2926 Optimize cagg refresh for small invalidations

**Bugfixes**
* timescale#2850 Set status for backend in background jobs
* timescale#2883 Fix join qual propagation for nested joins
* timescale#2884 Add GUC to control join qual propagation
* timescale#2885 Fix compressed chunk check when disabling compression
* timescale#2908 Fix changing column type of clustered hypertables
* timescale#2942 Validate continuous aggregate policy

**Thanks**
* @zeeshanshabbir93 for reporting the issue with full outer joins
* @Antiarchitect for reporting the issue with slow refreshes of
* @diego-hermida for reporting the issue about being unable to disable compression
* @mtin for reporting the issue about wrong job status
PR timescale#2926 introduced a session-based configuration parameter for the CAgg refresh behavior. If more individual refreshes have to be carried out than specified by this setting, a refresh for a larger window is performed. The original PR mentions that this setting should be converted into a GUC later. This PR performs the proposed change. To notify background workers (i.e., refresh jobs) of changed settings, the GUC context is set to PGC_SUSET.
This PR includes two changes:
Change 1:
The refreshing of a continuous aggregate is slow when many small
invalidations are generated by frequent single row insert
backfills. This change adds an optimization that merges small
invalidations by first expanding invalidations to full bucket
boundaries. There is really no reason to maintain invalidations that
aren't covering full buckets since refresh windows are already aligned
to buckets anyway.
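
To illustrate Change 1, here is a minimal, self-contained sketch of expanding invalidations to full bucket boundaries and merging the ones that then overlap or touch. It is not the code in refresh.c: the `Invalidation` struct and the functions `bucket_start`, `expand_to_buckets`, and `expand_and_merge` are hypothetical names, time values are plain 64-bit integers, the input is assumed to be sorted by start, and overflow near the minimum and maximum time values (which the real code has to handle) is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an invalidation: an inclusive [start, end] range. */
typedef struct Invalidation
{
	int64_t start;
	int64_t end; /* inclusive */
} Invalidation;

/* Floor a time value to the start of its bucket (assumes bucket_width > 0). */
static int64_t
bucket_start(int64_t ts, int64_t bucket_width)
{
	int64_t rem = ts % bucket_width;

	/* C truncates toward zero, so fix up the remainder for negative values. */
	if (rem < 0)
		rem += bucket_width;

	return ts - rem;
}

/* Expand a range to full buckets. The end stays inclusive, so it becomes
 * the last value of the bucket that contains the original end. */
static Invalidation
expand_to_buckets(Invalidation inv, int64_t bucket_width)
{
	Invalidation out;

	out.start = bucket_start(inv.start, bucket_width);
	out.end = bucket_start(inv.end, bucket_width) + bucket_width - 1;

	return out;
}

/* Expand invalidations (sorted by start) in place and merge those that
 * overlap or are adjacent after expansion. Returns the new count. */
static size_t
expand_and_merge(Invalidation *invs, size_t n, int64_t bucket_width)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++)
	{
		Invalidation cur = expand_to_buckets(invs[i], bucket_width);

		if (out > 0 && cur.start <= invs[out - 1].end + 1)
		{
			/* Overlapping or adjacent: extend the previous range. */
			if (cur.end > invs[out - 1].end)
				invs[out - 1].end = cur.end;
		}
		else
			invs[out++] = cur;
	}

	return out;
}
```

With a bucket width of 10, the single-value invalidations at 3 and 12 and the range [7, 8] all collapse into one range covering the first two buckets, while an invalidation in a bucket that is not adjacent stays separate.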
Change 2:
When there are many small (e.g., single timestamp) invalidations that
cannot be merged despite expanding invalidations to full buckets
(e.g., invalidations are spread across every second bucket in the
worst case), it might no longer be beneficial to materialize every
invalidation separately.
Instead, this change adds a threshold for the number of invalidations
used by the refresh (currently 10 by default) above which
invalidations are merged into one range based on the lowest and
greatest invalidated time value.
The limit can be controlled by an anonymous session variable for
debugging and tweaking purposes. It might be considered for promotion
to an official GUC in the future.
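
The threshold in Change 2 can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `InvalidRange` and `plan_materialization` are hypothetical names, the threshold is passed as a plain parameter rather than read from the session variable, and the ranges are assumed to be already expanded to bucket boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bucket-aligned, inclusive invalidated range. */
typedef struct InvalidRange
{
	int64_t start;
	int64_t end; /* inclusive */
} InvalidRange;

/* Decide which ranges to materialize. If there are more ranges than
 * the threshold, collapse them in place into a single range spanning the
 * lowest and greatest invalidated values. Returns the number of ranges
 * left to materialize. */
static size_t
plan_materialization(InvalidRange *ranges, size_t n, size_t threshold)
{
	int64_t lowest;
	int64_t greatest;

	/* At or below the threshold: materialize every range separately. */
	if (n == 0 || n <= threshold)
		return n;

	lowest = ranges[0].start;
	greatest = ranges[0].end;

	for (size_t i = 1; i < n; i++)
	{
		if (ranges[i].start < lowest)
			lowest = ranges[i].start;
		if (ranges[i].end > greatest)
			greatest = ranges[i].end;
	}

	ranges[0].start = lowest;
	ranges[0].end = greatest;

	return 1;
}
```

Note that above the threshold the merged range covers buckets that were never invalidated, so more data is materialized than strictly necessary; the refresh window itself is never shortened.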
Fixes #2867