Fix lost concurrent CAgg updates #6443

Merged
merged 1 commit into from Jan 2, 2024

jnidzwetzki (Member) commented Dec 19, 2023

When two CAggs on the same hypertable are refreshed at the same time, a race condition on the invalidation threshold table could occur.

Until now, the table was locked with a non-self-conflicting lock. Therefore, both scanners ran at the same time, but only one was able to lock the threshold value with a proper tuple lock. The other scanner ignored the failed lock and just returned. As a result, the field computed_invalidation_threshold was never populated and still contained 0.

So, invalidation_threshold_set_or_get returned 0 as the end of the refresh interval. As a consequence, the if (refresh_window.start >= refresh_window.end) branch in continuous_agg_refresh_internal could be taken, and the refresh returned without doing any work.
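To make the effect of a zero threshold concrete, here is a minimal sketch. `RefreshWindow`, `apply_threshold`, and `window_is_empty` are hypothetical stand-ins, not the TimescaleDB definitions; only the `start >= end` comparison is taken from the description above. It assumes that the value returned by the threshold lookup is used as the end of the refresh window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a refresh window; not the TimescaleDB struct. */
typedef struct RefreshWindow
{
	int64_t start;
	int64_t end;
} RefreshWindow;

/*
 * Assumption: the value returned by the threshold lookup becomes the end
 * of the refresh window.
 */
RefreshWindow
apply_threshold(RefreshWindow window, int64_t threshold)
{
	/* The caller is expected to pass a well-formed window. */
	assert(window.start <= window.end);
	window.end = threshold;
	return window;
}

/* Mirrors the `refresh_window.start >= refresh_window.end` check. */
bool
window_is_empty(RefreshWindow window)
{
	return window.start >= window.end;
}
```

With a window starting at 100, a threshold that was left at 0 makes `window_is_empty` return true, so a caller using this check would skip the refresh even though there is data to materialize.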

This patch adds proper error reporting and also implements some retry logic (inspired by RelationFindReplTupleSeq) to avoid these problems. A self-conflicting lock is not used due to the problems discussed in #5809.
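The general shape of "retry on a failed tuple lock, and report an error instead of silently returning" can be sketched as follows. This is not the code from the patch: `LockOutcome`, `try_lock_fn`, `lock_threshold_with_retry`, `FakeLockState`, and `fake_try_lock` are hypothetical names, and the bounded `max_attempts` is an assumption made to keep the example finite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of one tuple lock attempt; not PostgreSQL's TM_Result. */
typedef enum LockOutcome
{
	LOCK_ACQUIRED,
	LOCK_CONCURRENT_UPDATE, /* another backend changed the tuple; try again */
	LOCK_FAILED             /* unexpected failure; must be reported */
} LockOutcome;

/* Hypothetical callback that tries to lock the threshold tuple once. */
typedef LockOutcome (*try_lock_fn)(void *state, int64_t *threshold);

/*
 * Retry until the lock is acquired. Returns true and fills *threshold on
 * success. Returns false on failure or when the attempts are exhausted, so
 * the caller can raise an error instead of continuing with a threshold of 0.
 */
bool
lock_threshold_with_retry(try_lock_fn try_lock, void *state, int max_attempts,
						  int64_t *threshold)
{
	assert(try_lock != NULL && threshold != NULL);

	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		switch (try_lock(state, threshold))
		{
			case LOCK_ACQUIRED:
				return true;
			case LOCK_CONCURRENT_UPDATE:
				continue; /* next iteration of the for loop */
			case LOCK_FAILED:
				return false;
		}
	}
	return false;
}

/* Test double: reports a concurrent update a fixed number of times. */
typedef struct FakeLockState
{
	int conflicts_left;
	int64_t stored_threshold;
} FakeLockState;

LockOutcome
fake_try_lock(void *state, int64_t *threshold)
{
	FakeLockState *fake = (FakeLockState *) state;

	if (fake->conflicts_left > 0)
	{
		fake->conflicts_left--;
		return LOCK_CONCURRENT_UPDATE;
	}
	*threshold = fake->stored_threshold;
	return LOCK_ACQUIRED;
}
```

The point of the design is that the failure path is visible to the caller: the threshold output is only written when the lock was actually acquired, and every other outcome is returned as `false` rather than being ignored.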

codecov bot commented Dec 19, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (28a1ebe) 87.33% compared to head (6df2989) 87.31%.

| Files | Patch % | Lines |
|---|---|---|
| tsl/src/continuous_aggs/invalidation_threshold.c | 60.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6443      +/-   ##
==========================================
- Coverage   87.33%   87.31%   -0.02%     
==========================================
  Files         187      187              
  Lines       41869    41834      -35     
  Branches     9320     9304      -16     
==========================================
- Hits        36567    36529      -38     
  Misses       3626     3626              
- Partials     1676     1679       +3     


@jnidzwetzki jnidzwetzki force-pushed the concurrent_cagg_refresh branch 4 times, most recently from 6434b30 to 21dc711 Compare December 19, 2023 14:07
@jnidzwetzki jnidzwetzki marked this pull request as ready for review December 19, 2023 14:27
@gayyappan, @fabriziomello: please review this pull request.


@jnidzwetzki jnidzwetzki force-pushed the concurrent_cagg_refresh branch 3 times, most recently from d3247b0 to 1702083 Compare December 20, 2023 10:46
@jnidzwetzki jnidzwetzki added this to the TimescaleDB 2.13.1 milestone Dec 21, 2023
@jnidzwetzki jnidzwetzki merged commit ac97c56 into timescale:main Jan 2, 2024
47 checks passed
@jnidzwetzki jnidzwetzki deleted the concurrent_cagg_refresh branch January 2, 2024 11:43
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 3, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
@jnidzwetzki jnidzwetzki mentioned this pull request Jan 3, 2024
jnidzwetzki added a commit that referenced this pull request Jan 4, 2024
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 4, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6405 Read CAgg watermark from materialized data
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
jnidzwetzki added a commit that referenced this pull request Jan 9, 2024