
Improve cagg watermark caching #2828

Merged
erimatnor merged 1 commit into timescale:master from improve-cagg-watermark on Jan 18, 2021

Conversation

@erimatnor (Contributor) commented Jan 14, 2021

The internal function `cagg_watermark` returns the last/max bucket in
a continuous aggregate and is used to determine the point at which to
union the raw and materialized data when using real-time aggregation.

Since the function is marked `STABLE`, the planner should be able to
constify it so that it need not be called repeatedly during execution.
However, this optimization is not guaranteed, and the function might be
constified many times during planning if it occurs, e.g., as an index
scan condition on many chunks. This leads to slow queries due to
repeated calls to the function.

Previously, the watermark was cached in the function's `fn_extra`
state, but this state doesn't survive all repeated calls of the
function and doesn't help when the function occurs many times in the
same plan.

To make performance more reliable, the watermark is now cached in a
global variable that is cleared at transaction end, when the input
argument (the materialized hypertable being queried) changes, or when
a new command is executed. This should guarantee that the watermark
only needs to be looked up once per query.

Fixes #2826
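
As an illustration of the caching scheme described above, here is a minimal C sketch. All names (`WatermarkCache`, `cagg_watermark_get`, `scan_watermark`) are hypothetical; the actual implementation is in src/continuous_agg.c.

```c
#include <postgres.h>
#include <access/xact.h>

/* Hypothetical helper that looks up the last/max bucket of the
 * materialized hypertable; stands in for the real catalog scan. */
static int64 scan_watermark(int32 hypertable_id);

typedef struct WatermarkCache
{
	int32 hypertable_id; /* materialized hypertable the value belongs to */
	int64 value;         /* cached watermark (last/max bucket) */
	CommandId cid;       /* command in which the value was computed */
	bool valid;
} WatermarkCache;

static WatermarkCache watermark_cache = { .valid = false };

static int64
cagg_watermark_get(int32 hypertable_id)
{
	CommandId curcid = GetCurrentCommandId(false);

	/* Reuse the cached value only within the same command and for the
	 * same materialized hypertable; otherwise recompute it. */
	if (watermark_cache.valid &&
		watermark_cache.hypertable_id == hypertable_id &&
		watermark_cache.cid == curcid)
		return watermark_cache.value;

	watermark_cache.hypertable_id = hypertable_id;
	watermark_cache.value = scan_watermark(hypertable_id);
	watermark_cache.cid = curcid;
	watermark_cache.valid = true;

	return watermark_cache.value;
}
```

The `CommandId` check covers the "new command" case and the `hypertable_id` check covers a change of input argument; clearing the cache at transaction end is handled separately (see the discussion of memory context callbacks below).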

@codecov codecov bot commented Jan 14, 2021

Codecov Report

Merging #2828 (09bf94a) into master (19d3912) will increase coverage by 0.12%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #2828      +/-   ##
==========================================
+ Coverage   90.07%   90.19%   +0.12%     
==========================================
  Files         212      212              
  Lines       34760    34721      -39     
==========================================
+ Hits        31309    31316       +7     
+ Misses       3451     3405      -46     
Impacted Files Coverage Δ
src/continuous_agg.c 90.64% <100.00%> (+0.25%) ⬆️
src/bgw/scheduler.c 82.59% <0.00%> (-0.89%) ⬇️
src/loader/bgw_message_queue.c 87.09% <0.00%> (-0.65%) ⬇️
src/import/planner.c 70.30% <0.00%> (+11.12%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@erimatnor erimatnor marked this pull request as ready for review January 14, 2021 13:31
@erimatnor erimatnor requested a review from a team as a code owner January 14, 2021 13:31
@erimatnor erimatnor requested review from pmwkaa, k-rus, svenklemm and mkindahl and removed request for a team, pmwkaa, k-rus, svenklemm and mkindahl January 14, 2021 13:31
@erimatnor erimatnor force-pushed the improve-cagg-watermark branch 5 times, most recently from 0386e73 to dd552a4 on January 14, 2021 15:09
@erimatnor erimatnor requested review from k-rus, pmwkaa, mkindahl and svenklemm and removed request for k-rus, pmwkaa, mkindahl and svenklemm January 14, 2021 15:49
@pmwkaa (Contributor) left a comment:

looks good

Review thread: src/continuous_agg.c (resolved)
@erimatnor erimatnor force-pushed the improve-cagg-watermark branch 2 times, most recently from 5f5fa8c to 3646062 on January 15, 2021 13:11
@gayyappan (Contributor) left a comment:

Changes look good. Made a few suggestions to add to the test.

@erimatnor erimatnor force-pushed the improve-cagg-watermark branch 2 times, most recently from 8c8e9d6 to 00a18c8 on January 18, 2021 11:46
@mkindahl (Contributor) left a comment:

Good job on the tests, I think they cover the relevant situations. It would be good if there were a way to check that the cached variable is really gone after the transaction (e.g., a separate test function that calls `watermark_valid` or so), but that is a stretch.

Review threads:
* src/continuous_agg.c (outdated, resolved)
* src/continuous_agg.c (resolved)
* tsl/test/sql/continuous_aggs_query.sql.in (resolved)
* tsl/test/sql/continuous_aggs_query.sql.in (outdated, resolved)
@erimatnor (Contributor, Author) commented Jan 18, 2021

> Good job on the tests, I think they cover the relevant situations. It would be good if there were a way to check that the cached variable is really gone after the transaction (e.g., a separate test function that calls `watermark_valid` or so), but that is a stretch.

I thought about that, but it requires additional boilerplate code essentially to test that PostgreSQL memory context callbacks work the way they should. I did manually test this, FWIW.

I think we mostly care about returning the "right" watermark (as opposed to it being reset after a transaction), and that is covered by tests that call the function multiple times after changes to the materialization table. I think we care less about what happens outside a transaction. The only concern there would be a memory leak, I guess; but then we'd be testing memory contexts.
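
For context on the memory context callbacks mentioned above, a hedged sketch of how transaction-end invalidation can be hooked up in PostgreSQL follows. The names (`watermark_valid`, `watermark_invalidate`) are hypothetical, not the actual TimescaleDB code:

```c
#include <postgres.h>
#include <utils/memutils.h>

static bool watermark_valid = false;       /* hypothetical cache-validity flag */
static MemoryContextCallback watermark_cb; /* static, so it outlives the context */

/* Runs exactly once when TopTransactionContext is reset, i.e., at
 * transaction end, after which the context's callback list is cleared. */
static void
watermark_invalidate(void *arg)
{
	watermark_valid = false;
}

/* Must be called in each transaction that populates the cache, since
 * reset callbacks fire only once per registration. */
static void
watermark_register_invalidation(void)
{
	watermark_cb.func = watermark_invalidate;
	watermark_cb.arg = NULL;
	MemoryContextRegisterResetCallback(TopTransactionContext, &watermark_cb);
}
```

Testing this behavior amounts to verifying that PostgreSQL's callback machinery works, which is the boilerplate concern raised above.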

@erimatnor erimatnor merged commit 472df05 into timescale:master Jan 18, 2021
@erimatnor erimatnor deleted the improve-cagg-watermark branch January 18, 2021 13:41
@svenklemm svenklemm mentioned this pull request Jan 27, 2021
svenklemm added a commit that referenced this pull request Jan 28, 2021

This maintenance release contains bugfixes since the 2.0.0 release. We deem it
high priority for upgrading.

In particular the fixes contained in this maintenance release address issues
in continuous aggregates, compression, JOINs with hypertables and when
upgrading from previous versions.

**Bugfixes**
* #2772 Always validate existing database and extension
* #2780 Fix config enum entries for remote data fetcher
* #2806 Add check for dropped chunk on update
* #2828 Improve cagg watermark caching
* #2838 Fix catalog repair in update script
* #2842 Do not mark job as started when setting next_start field
* #2845 Fix continuous aggregate privileges during upgrade
* #2851 Fix nested loop joins that involve compressed chunks
* #2860 Fix projection in ChunkAppend nodes
* #2861 Remove compression stat update from update script
* #2865 Apply volatile function quals at decompresschunk node
* #2866 Avoid partitionwise planning of partialize_agg
* #2868 Fix corruption in gapfill plan
* #2874 Fix partitionwise agg crash due to uninitialized memory

**Thanks**
* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @markatosi for reporting a segfault with partitionwise aggregates enabled
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on Windows
* @WarriorOfWire for reporting the bug with gapfill queries not being
  able to find pathkey item to sort
Successfully merging this pull request may close these issues:

Real-time aggregation sometimes slow (#2826)