Improve cagg watermark caching #2828
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##           master    #2828    +/-  ##
=========================================
+ Coverage   90.07%   90.19%   +0.12%
=========================================
  Files         212      212
  Lines       34760    34721      -39
=========================================
+ Hits        31309    31316       +7
+ Misses       3451     3405      -46
```

Continue to review full report at Codecov.
Force-pushed 0386e73 to dd552a4.
Force-pushed dd552a4 to 70a51e9.
looks good
Force-pushed 5f5fa8c to 3646062.
Changes look good. Made a few suggestions to add to the test.
Force-pushed 8c8e9d6 to 00a18c8.
Good job on the tests; I think they cover the situations. It would be good if there were a way to check that the cached variable is really gone after the transaction (e.g., a separate test function that calls `watermark_valid` or so), but that is a stretch.
I thought about that, but it requires additional boilerplate code to essentially test that PostgreSQL memory context callbacks work the way they should. I did manually test this, FWIW. I think we mostly care about returning the "right" watermark (as opposed to being reset after a transaction), which is covered by tests by calling the function multiple times after changes in the materialization table. I think we care less about what happens outside a transaction. The only concern there would be a memory leak, I guess; but then we are testing memory contexts.
Force-pushed 00a18c8 to 09bf94a.
This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading. In particular the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables and when upgrading from previous versions.

**Bugfixes**
* timescale#2772 Always validate existing database and extension
* timescale#2780 Fix config enum entries for remote data fetcher
* timescale#2806 Add check for dropped chunk on update
* timescale#2828 Improve cagg watermark caching
* timescale#2842 Do not mark job as started when setting next_start field
* timescale#2845 Fix continuous aggregate privileges during upgrade
* timescale#2860 Fix projection in ChunkAppend nodes
* timescale#2861 Remove compression stat update from update script

**Thanks**
* @sgorsh for reporting an issue when using pgAdmin on windows
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @PhilippJust for reporting an issue with add_job and initial_start
* @alex88 for reporting an issue with joined hypertables
This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading. In particular the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables and when upgrading from previous versions.

**Bugfixes**
* timescale#2772 Always validate existing database and extension
* timescale#2780 Fix config enum entries for remote data fetcher
* timescale#2806 Add check for dropped chunk on update
* timescale#2828 Improve cagg watermark caching
* timescale#2842 Do not mark job as started when setting next_start field
* timescale#2845 Fix continuous aggregate privileges during upgrade
* timescale#2851 Fix nested loop joins that involve compressed chunks
* timescale#2860 Fix projection in ChunkAppend nodes
* timescale#2861 Remove compression stat update from update script
* timescale#2865 Apply volatile function quals at decompresschunk node

**Thanks**
* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on windows
This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading. In particular the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables and when upgrading from previous versions.

**Bugfixes**
* timescale#2772 Always validate existing database and extension
* timescale#2780 Fix config enum entries for remote data fetcher
* timescale#2806 Add check for dropped chunk on update
* timescale#2828 Improve cagg watermark caching
* timescale#2842 Do not mark job as started when setting next_start field
* timescale#2845 Fix continuous aggregate privileges during upgrade
* timescale#2851 Fix nested loop joins that involve compressed chunks
* timescale#2860 Fix projection in ChunkAppend nodes
* timescale#2861 Remove compression stat update from update script
* timescale#2865 Apply volatile function quals at decompresschunk node
* timescale#2866 Avoid partitionwise planning of partialize_agg
* timescale#2868 Fix corruption in gapfill plan
* timescale#2874 Fix partitionwise agg crash due to uninitialized memory

**Thanks**
* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @markatosi for reporting a segfault with partitionwise aggregates enabled
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on windows
* @WarriorOfWire for reporting the bug with gapfill queries not being able to find pathkey item to sort
This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading. In particular the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables and when upgrading from previous versions.

**Bugfixes**
* #2772 Always validate existing database and extension
* #2780 Fix config enum entries for remote data fetcher
* #2806 Add check for dropped chunk on update
* #2828 Improve cagg watermark caching
* #2838 Fix catalog repair in update script
* #2842 Do not mark job as started when setting next_start field
* #2845 Fix continuous aggregate privileges during upgrade
* #2851 Fix nested loop joins that involve compressed chunks
* #2860 Fix projection in ChunkAppend nodes
* #2861 Remove compression stat update from update script
* #2865 Apply volatile function quals at decompresschunk node
* #2866 Avoid partitionwise planning of partialize_agg
* #2868 Fix corruption in gapfill plan
* #2874 Fix partitionwise agg crash due to uninitialized memory

**Thanks**
* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @markatosi for reporting a segfault with partitionwise aggregates enabled
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on windows
* @WarriorOfWire for reporting the bug with gapfill queries not being able to find pathkey item to sort
The internal function `cagg_watermark` returns the last/max bucket in a continuous aggregate and is used to get the point where to union the raw and materialized data when using real-time aggregation.

Since the function is marked `STABLE`, the planner should be able to constify the function so that it need not be called repeatedly during execution. However, this optimization is not guaranteed, and the function might be constified many times during planning if it occurs, e.g., as an index scan condition on many chunks. This leads to slow queries due to repeated calls to the function.

Previously, the watermark was cached in the function's `fn_extra` state, but this state doesn't survive all repeated calls of the function and doesn't help if the function occurs many times in the same plan.

To make the performance more reliable, the watermark is now cached in a global variable that is cleared at transaction end, when the input argument (the materialized hypertable queried) changes, or when a new command is executed. This should guarantee that the watermark only needs to be looked up once per query.

Fixes #2826