
Fix tuple concurrently deleted error with multiple continuous aggregates #2051

Merged: 1 commit merged into timescale:master from cagg_multi on Jul 4, 2020

Conversation

@gayyappan (Contributor) commented Jul 1, 2020

When we have multiple continuous aggregates defined on
the same hypertable, they could try to delete the hypertable
invalidation logs concurrently. Resolve this by serializing
invalidation log deletes by hypertable id.

Fixes #1940
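
In outline, each refresh now takes an exclusive tuple lock on the invalidation threshold row for the raw hypertable before touching the shared invalidation log. A rough sketch of that ordering, with hypothetical helper names (neither function below is the exact one in this diff):

	/* Sketch only; both helpers are hypothetical placeholders. */
	static void
	refresh_process_invalidations(int32 raw_hypertable_id)
	{
		/* Block until we hold an exclusive tuple lock on the invalidation
		 * threshold row for this raw hypertable. Concurrent REFRESHes of other
		 * continuous aggregates on the same hypertable queue up here instead of
		 * racing to delete the same invalidation log rows, which previously
		 * failed with "tuple concurrently deleted". */
		lock_invalidation_threshold_hypertable_row(raw_hypertable_id);

		/* Safe to read and delete the hypertable invalidation log entries now;
		 * the tuple lock is held until the end of this transaction. */
		process_hypertable_invalidation_log(raw_hypertable_id);
	}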

codecov bot commented Jul 1, 2020

Codecov Report

Merging #2051 into master will increase coverage by 0.12%.
The diff coverage is 86.66%.


@@            Coverage Diff             @@
##           master    #2051      +/-   ##
==========================================
+ Coverage   90.08%   90.20%   +0.12%     
==========================================
  Files         211      211              
  Lines       33679    33644      -35     
==========================================
+ Hits        30340    30349       +9     
+ Misses       3339     3295      -44     
Impacted Files Coverage Δ
tsl/src/continuous_aggs/materialize.c 92.90% <86.66%> (-0.18%) ⬇️
src/loader/bgw_message_queue.c 84.51% <0.00%> (-2.59%) ⬇️
src/import/planner.c 70.30% <0.00%> (+11.12%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -793,6 +806,11 @@ materialization_invalidation_log_get_range(int32 materialization_id, Oid type, i
return invalidation_range;
}

/* We already locked the materialization_invalidation_log table in
* ShareRowExclusiveLock mode for the Session in the first txn.
Contributor:
where do we take a session lock for the materialization_invalidation_log? The only ones I see we take, and the only ones mentioned in the comment on line 156 are: the materialization table, the partial view and the raw hypertable...

Contributor Author:
Fixed the comment. It should be a session lock on the materialization table.

Contributor:
What is a session lock? I guess it is confusing because it sounds like the lock is held for the duration of the session. Is it even possible to hold locks across multiple transactions unless it is an advisory lock? The documentation says: "Once acquired, a lock is normally held until the end of the transaction."

@gayyappan (Contributor Author) Jul 3, 2020:
The session lock is held for the duration of the materialization, which spans the transactions.
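
For illustration, a minimal sketch of a session-level lock in PostgreSQL terms: a lock owned by the session rather than the current transaction, so it survives the transaction boundaries inside a multi-transaction materialization. This shows the mechanism only and is not the exact code in materialize.c; the function name is illustrative.

	#include "postgres.h"
	#include "storage/lmgr.h" /* LockRelationIdForSession() */
	#include "utils/rel.h"    /* Relation, LockRelId */

	static void
	lock_materialization_table_for_session(Relation materialization_table)
	{
		LockRelId relid = materialization_table->rd_lockInfo.lockRelId;

		/* Held until explicitly released with UnlockRelationIdForSession(),
		 * even though individual transactions commit in between. */
		LockRelationIdForSession(&relid, ShareRowExclusiveLock);
	}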

@gayyappan changed the title from "(WIP) github issue 1940" to "Fix tuple concurrently deleted error with multiple continuous aggregates" on Jul 2, 2020
@cevian (Contributor) left a comment:
Looks ok as long as the catalog scan does not release the row lock

.enabled = true,
} };
/* acquire lock explicitly.
* the catalog scanning code , releases it at end of the call
Contributor:
Is the row lock also released by the catalog scanning code, or not?

Contributor Author:
Tuple locks are released at the end of the transaction.

@gayyappan gayyappan force-pushed the cagg_multi branch 2 times, most recently from 4dfd050 to 3b5a760 on July 2, 2020 20:56
@gayyappan gayyappan marked this pull request as ready for review July 2, 2020 20:58
@gayyappan gayyappan requested a review from a team as a code owner July 2, 2020 20:58
@gayyappan gayyappan requested review from mkindahl, k-rus, WireBaron, erimatnor and svenklemm and removed request for a team July 2, 2020 20:58
@erimatnor (Contributor) left a comment:
I didn't understand the following sentence in the commit message: "Resolve this be serializing
invalidation log deletes by hypertable id." You might want to take a look at that.

I also think we should always lock instead of locking conditionally based on whether there is more than one continuous aggregate. I imagine a second aggregate can be concurrently created and refreshed, which means we end up in the same situation even though it is not visible to the first job.

You should also take a look at how you are handling locking on the invalidation threshold table. I believe there is an issue there since you are keeping the row-exclusive lock on the table until the end of the TX (in addition to the tuple lock). This seems to negate the purpose of the tuple lock and is much more heavy-handed.

{
if (tinfo->lockresult != TM_Ok)
{
ereport(ERROR,
@erimatnor (Contributor) Jul 3, 2020:
Not sure we should throw a generic error for all lock results here. Let's say we get TM_Updated (the row was updated by another tx); this doesn't seem unlikely and is not necessarily "unexpected". Perhaps we should have a less jarring error for that case at least?

Contributor Author:
Anything other than TM_Ok is not valid here. Knowing the specific failure doesn't seem to be helpful since we have to abort the transaction anyway.

Contributor:
But the user might see this error and an update doesn't seem like it would be "unexpected". Maybe not common, but not unexpected either.
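
For illustration, a sketch of the kind of distinction being suggested, assuming the lockresult field from the diff above; the helper name, error codes, and wording are illustrative, not the PR's actual code:

	#include "postgres.h"
	#include "access/tableam.h" /* TM_Result, TM_Ok, TM_Updated, TM_Deleted */

	/* Hypothetical helper: give a friendlier error when the row was merely
	 * updated or deleted by a concurrent transaction, and reserve the
	 * "unexpected" wording for everything else. */
	static void
	report_tuple_lock_result(TM_Result lockresult)
	{
		switch (lockresult)
		{
			case TM_Ok:
				break; /* lock acquired, nothing to report */
			case TM_Updated:
			case TM_Deleted:
				ereport(ERROR,
						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
						 errmsg("invalidation threshold row was changed by a concurrent refresh"),
						 errhint("Retry the refresh.")));
				break;
			default:
				ereport(ERROR,
						(errcode(ERRCODE_INTERNAL_ERROR),
						 errmsg("unexpected tuple lock status %d", (int) lockresult)));
				break;
		}
	}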

Int32GetDatum(raw_hypertable_id));

/* lock table in RowExclusive mode and the row with AccessExclusive */
ScannerCtx scanctx = { .table = catalog_get_table_id(catalog,
Contributor:
mixed declaration and code. I suggest you move this above the ScanKeyInit. It should be safe to initialize the scankey after this declaration even if the scankey is referenced by the scanctx.

invalidation_threshold_table_relation =
table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
RowExclusiveLock);
ts_scanner_scan(&scanctx);
Contributor:
I suggest you check the return value here (num tuples found) and raise an error in case of not 1.

caggs = ts_continuous_aggs_find_by_raw_table_id(cagg_data.raw_hypertable_id);
if (list_length(caggs) > 1)
Contributor:
Isn't it safer to always lock here? Could there be a case where cagg_2 is created in another transaction but is not yet visible to this transaction? If cagg_2 is immediately refreshed, I imagine that you won't lock in that case, although you still have concurrent refreshes.

/* REFRESH cagg_1; REFRESH cagg_2 can try to read+delete the
* invalidation logs for the same hypertable.
* If both REFRESH jobs try to delete the
* same set of rows, we run into "tuple concurrently deleted error".
Contributor:
I guess we could in theory handle this without locking. A concurrently deleted tuple might not strictly be an error since it doesn't really matter which job deleted the tuple. If the job that experiences the error aimed to delete the tuple itself, then I guess it is fine for that job to proceed as if it actually succeeded.

This can, of course, be left as a future optimization. But I still figured it is worth pointing this out. Maybe leave a comment.

table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
RowExclusiveLock);
ts_scanner_scan(&scanctx);
table_close(invalidation_threshold_table_relation, NoLock); /*release lock at end
@erimatnor (Contributor) Jul 3, 2020:
I don't understand why we keep the table lock here (in RowExclusiveLock mode). That seems to negate the purpose of taking the tuple lock. I guess RowExclusiveLock doesn't conflict with itself, but does it block other stuff that we shouldn't block?

Or, am I missing something here?

Contributor Author:
Changed to AccessShare based on our discussion.

Contributor:
Lock still held here. I thought that the table lock was going to be released here since tuple lock is held and that is the actual serialization point.

Contributor Author:
I misunderstood that; modified so that we only have the tuple lock.

*/
invalidation_threshold_table_relation =
table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
RowExclusiveLock);
Contributor:
Why is this lock RowExclusiveLock? Are we modifying this table in addition to taking the tuple lock? I suggest adding a comment on why this is the appropriate lock mode since it is non-obvious.

@k-rus (Contributor) left a comment:
I am not sure it is safe not to lock when there is only one cagg. Also, will the test be deterministic?

* If both REFRESH jobs try to delete the
* same set of rows, we run into "tuple concurrently deleted error".
* (github issue 1940)
* Prevent this by serializing on the raw hypertable row
Contributor:
Suggested change:
- * Prevent this by serializing on the raw hypertable row
+ * Prevent this by serializing on the raw hypertable row.

return SCAN_DONE;
}

/* lock row corresponding to hypertable id in
Contributor:
Suggested change:
- /* lock row corresponding to hypertable id in
+ /* Lock row corresponding to hypertable id in


/* lock row corresponding to hypertable id in
* continuous_aggs_invalidation_threshold table in AccessExclusive mode
* block till lock is acquired.
Contributor:
I don't follow the comment. Specifically, block what?

Contributor:
Is it?:

Suggested change:
- * block till lock is acquired.
+ * block continuous_aggs_invalidation_threshold table till lock is acquired.

Contributor Author:
The process is waiting on the lock till it is acquired.

Contributor:
Thank you for clarifying. I suggest fixing the grammar so it is easier to read later. I guess a comma or an "and" is missing.

caggs = ts_continuous_aggs_find_by_raw_table_id(cagg_data.raw_hypertable_id);
if (list_length(caggs) > 1)
Contributor:
Do I understand correctly that the lock is only obtained when more than one cagg is defined, so there is a risk of concurrency? What about the corner case where a second cagg is created after materialization of the first one has passed this line? Can there be a race condition if materializing the first one is slow or interrupted?
Or is this condition there to deal with a deadlock?

Contributor Author:
@k-rus @erimatnor Yes, the corner case is possible. But I am not fully convinced that we should deal with the error for that corner case. The locks have a concurrency cost, so acquiring a lock only when it is necessary to do so seems like a good trade-off to me. (There is an additional optimization possible here: a first-time cagg does not have to deal with any invalidations since it has to fully materialize all the data. But we don't optimize for that case, so this code might actually not be exercised in that scenario.)

Contributor:
Do we know how big this cost is and do we throw a reasonable error when we hit the corner case? If not, I think we should cover the corner-case too and care about "correctness" prior to optimizing. We can always optimize further at a later time.

Contributor:
I would like to add: which PG processes will be affected by this cost? Will inserting into or querying the hypertable be affected by this concurrency overhead?

/* acquire lock explicitly.
* the catalog scanning code , releases it at end of the call
Contributor:
Since they are complete sentences:

Suggested change:
- /* acquire lock explicitly.
-  * the catalog scanning code , releases it at end of the call
+ /* Acquire lock explicitly.
+  * The catalog scanning code , releases it at end of the call.

ScannerCtx scanctx = { .table = catalog_get_table_id(catalog,
CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
Contributor:
Would it be more efficient, and still correct, to call catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD) once, store the result in a variable, and use it in both places, here and below?
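
A sketch of that refactor as a fragment of the function shown in the diff; the variable name invalidation_threshold_table_oid is illustrative, and the elided ScannerCtx fields are the ones already shown above:

	/* Fragment only: look the catalog table id up once and reuse it. */
	Oid invalidation_threshold_table_oid =
		catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD);

	ScannerCtx scanctx = {
		.table = invalidation_threshold_table_oid,
		/* ... remaining fields as in the diff above ... */
	};

	Relation invalidation_threshold_table_relation =
		table_open(invalidation_threshold_table_oid,
				   RowExclusiveLock /* lock mode as in the surrounding code */);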

Int32GetDatum(raw_hypertable_id));

/* lock table in RowExclusive mode and the row with AccessExclusive */
ScannerCtx scanctx = { .table = catalog_get_table_id(catalog,
Contributor:
This declaration is placed after calling ScanKeyInit. Would it be better to do all declarations before the call?

Contributor Author:
IIRC we said that it is okay to mix declarations and code. We are on C99.

Contributor:
More of a suggestion. But FWIW, we haven't used mixed-declaration-and-code in most of our other code. Not the appropriate place to discuss decisions. But this is suggested by two reviewers and seems like an easy fix.

Contributor Author:
I think the struct initialization is more readable here. scanctx depends on scankey. That's why there is mixed declaration in this block

Contributor:
get_lowest_invalidated_time_for_hypertable contains similar code with variable declaration first:

	ScanKeyData scankey[1];
	ScannerCtx scanctx;

	ScanKeyInit(&scankey[0],
				Anum_continuous_aggs_invalidation_threshold_pkey_hypertable_id,
				BTEqualStrategyNumber,
				F_INT4EQ,
				Int32GetDatum(ts_hypertable_relid_to_id(hypertable_relid)));
	scanctx = (ScannerCtx){
		.table = catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
		.index = catalog_get_index(catalog,
								   CONTINUOUS_AGGS_INVALIDATION_THRESHOLD,
								   CONTINUOUS_AGGS_INVALIDATION_THRESHOLD_PKEY),
		.nkeys = 1,
		.scankey = scankey,
		.tuple_found = &invalidation_tuple_found,
		.filter = NULL,
		.data = &min_val,
		.lockmode = AccessShareLock,
		.scandirection = ForwardScanDirection,
		.result_mctx = NULL,
	};

I don't have a strong opinion, but there might be a check that fails on declarations after code. However, I might be mistaken.

Contributor Author:
Thanks for the pointer.

table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
RowExclusiveLock);
ts_scanner_scan(&scanctx);
table_close(invalidation_threshold_table_relation, NoLock); /*release lock at end
Contributor:
I don't follow why a lock needs to be obtained if the entire table is locked until the end of the transaction.

Contributor:
I think I understand now that the tuple is locked for the session, which is longer than the transaction, right? Where is the lock on the tuple released?

Contributor Author:
Tuple locks are released at the end of the transaction.

@k-rus (Contributor) Jul 3, 2020:
So is it really necessary to obtain the tuple lock if the table is already locked for the same duration?

Contributor Author:
Yes, since a concurrent transaction will not conflict on the table lock. The tuple lock helps serialize the transactions. That raises the question of whether the tuple lock alone is sufficient: I believe you need a table lock before you acquire the tuple lock. (I am going to change the table lock from RowExclusive to AccessShare.)
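
For reference, the lock lifetimes discussed in this thread, sketched against the pattern in the diff (scanctx is assumed to be initialized as in the hunks above, with the scanner's tuple lock enabled):

	/* Weak table lock: only prevents the catalog table from being dropped or
	 * rewritten underneath us; it does not conflict with other refresh jobs. */
	Relation invalidation_threshold_table_relation =
		table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
				   AccessShareLock);

	/* The scan takes an exclusive tuple lock on the row for this raw
	 * hypertable id; that row is the actual serialization point between
	 * concurrent REFRESH jobs, and the lock is held until transaction end. */
	ts_scanner_scan(&scanctx);

	/* NoLock: keep the table lock until the end of the transaction as well. */
	table_close(invalidation_threshold_table_relation, NoLock);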

Comment on lines +168 to +174
step UnlockInvRow: ROLLBACK;
R1: LOG: new materialization range not found for public.ts_continuous_test (time column time): not enough new data past completion threshold of 30 as of 29
R1: LOG: materializing continuous aggregate public.continuous_view_1: processing invalidations, no new range
step Refresh1: <... completed>
R2: LOG: new materialization range not found for public.ts_continuous_test (time column time): not enough new data past completion threshold of 30 as of 29
R2: LOG: materializing continuous aggregate public.continuous_view_2: processing invalidations, no new range
step Refresh2: <... completed>
Contributor:
Is it deterministic that Refresh1 will be executed first after the lock is released and then Refresh2?

Contributor Author:
No, it cannot be guaranteed. But if Refresh1 starts before Refresh2, it will acquire the released lock first. The sessions are generally started in the order in which they appear in the spec file.

Contributor:
Does this test reproduce the bug before the fix?

Contributor Author:
All this test proves is that the refresh gets serialized on the hypertable row lock. This is true for all the concurrency tests. They do not necessarily "catch" all bugs in the code.

@k-rus (Contributor) commented Jul 3, 2020:

@gayyappan Is this PR related to any public issue?

svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Jul 3, 2020
This maintenance release contains bugfixes since the 1.7.1 release. We deem it medium
priority for upgrading.

**Features**
* timescale#1877 Add support for fast pruning of inlined functions

**Bugfixes**
* timescale#1908 Fix drop_chunks with unique constraints when cascade_to_materializations is false
* timescale#1915 Check for database in extension_current_state
* timescale#1918 Unify chunk index creation
* timescale#1932 Change compression locking order
* timescale#1938 Fix gapfill locf treat_null_as_missing
* timescale#1982 Check for disabled telemetry earlier
* timescale#1984 Fix compression bit array left shift count
* timescale#1997 Add checks for read-only transactions
* timescale#2002 Reset restoring gucs rather than explicitly setting 'off'
* timescale#2028 Fix locking in drop_chunks
* timescale#2031 Enable compression for tables with compound foreign key
* timescale#2039 Fix segfault in create_trigger_handler
* timescale#2043 Fix segfault in cagg_update_view_definition
* timescale#2046 Use index tablespace during chunk creation
* timescale#2047 Better handling of chunk insert state destruction
* timescale#2049 Fix handling of PlaceHolderVar in DecompressChunk
* timescale#2051 Fix tuple concurrently deleted error with multiple continuous aggregates

**Thanks**
* @akamensky for reporting an issue with telemetry and an issue with drop_chunks
* @darko408 for reporting an issue with decompression
* @Dmitri191 for reporting an issue with failing background workers
* @eduardotsj for reporting an issue with indexes not inheriting tablespace settings
* @FourSeventy for reporting an issue with multiple continuous aggregates
* @fvannee for contributing optimizations for pruning inlined functions
* @jflambert for reporting an issue with failing telemetry jobs
* @nbouscal for reporting an issue with compression jobs locking referenced tables
* @Nicolai6120 for reporting an issue with locf and treat_null_as_missing
* @NomAnor for reporting an issue with expression index with table references
* @Olernov for contributing a fix for compressing tables with compound foreign keys
* @werjo for reporting an issue with drop_chunks and unique constraints
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Jul 3, 2020
@gayyappan gayyappan force-pushed the cagg_multi branch 2 times, most recently from bca00aa to 9d5d807 on July 3, 2020 15:45
@gayyappan gayyappan requested review from erimatnor and k-rus July 3, 2020 15:54
@erimatnor (Contributor) left a comment:
Approving, but I suggest you go through the nits and try to address them.

{
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("unexpected return value %d in lock_invalidation_threhold_hypertable_row",
Contributor:
Using the function name in error messages is redundant as elog already tracks this and the function and line number can be shown if the right log configuration is set. I think it is better to formulate a good error message to describe the error, e.g.: "duplicate catalog entry for hypertable "%s"". As a last resort, use the __func__ macro as that also handles function name changes.
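
For illustration, the error above rewritten along those lines (message wording is illustrative; raw_hypertable_id is the value used in the surrounding scan):

	/* Describe the condition instead of embedding the function name;
	 * ereport() already records file, line and function. */
	ereport(ERROR,
			(errcode(ERRCODE_INTERNAL_ERROR),
			 errmsg("duplicate invalidation threshold entry for hypertable %d",
					raw_hypertable_id)));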

table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
AccessShareLock);
retcnt = ts_scanner_scan(&scanctx);
if (retcnt > 1)
Contributor:
Is anything but 1 expected here? Isn't 0 (or less) also an error?

Contributor Author:
It could be 0 if there are no entries in the table yet for that hypertable.

table_open(catalog_get_table_id(catalog, CONTINUOUS_AGGS_INVALIDATION_THRESHOLD),
RowExclusiveLock);
ts_scanner_scan(&scanctx);
table_close(invalidation_threshold_table_relation, NoLock); /*release lock at end
Contributor:
Lock still held here. I thought that the table lock was going to be released here since tuple lock is held and that is the actual serialization point.

@erimatnor erimatnor added the bug label Jul 3, 2020
@erimatnor erimatnor added this to the 1.7.2 milestone Jul 3, 2020
@gayyappan gayyappan force-pushed the cagg_multi branch 2 times, most recently from 7786b8d to b6bb989 on July 3, 2020 21:53
When we have multiple continuous aggregates defined on
the same hypertable, they could try to delete the hypertable
invalidation logs concurrently. Resolve this by serializing
invalidation log deletes by raw hypertable id.

Fixes timescale#1940
@gayyappan gayyappan merged commit 43edbf8 into timescale:master Jul 4, 2020
@svenklemm svenklemm mentioned this pull request Jul 4, 2020
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Jul 4, 2020
svenklemm added a commit that referenced this pull request Jul 4, 2020
svenklemm added a commit that referenced this pull request Jul 4, 2020
Successfully merging this pull request may close these issues.

Multiple continuous aggregations for same hypertable crash with: ERROR: tuple concurrently deleted