-
Notifications
You must be signed in to change notification settings - Fork 860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix subtransaction resource owner #5668
Conversation
4c735f6
to
87b7218
Compare
Codecov Report
@@ Coverage Diff @@
## main #5668 +/- ##
==========================================
+ Coverage 90.97% 90.99% +0.02%
==========================================
Files 230 230
Lines 54569 54568 -1
==========================================
+ Hits 49642 49653 +11
+ Misses 4927 4915 -12
... and 7 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@fabriziomello, @nikkhils: please review this pull request.
|
Note that this can trigger an assertion inside |
It is difficult to create a good test. It was tested manually: mats@fury:~/work/timescale/timescaledb+telemetry/build-14$ psql
Expanded display is used automatically.
Null display is "[NULL]".
psql (14.6)
Type "help" for help.
mats=# SELECT id AS job_id FROM _timescaledb_config.bgw_job
WHERE proc_schema = '_timescaledb_internal'
AND proc_name = 'policy_telemetry' \gset
mats=# CALL run_job(:job_id);
NOTICE: the "timescaledb" extension is up-to-date
CALL |
@@ -346,7 +347,8 @@ calculate_next_start_on_failure(TimestampTz finish_time, int consecutive_failure | |||
consecutive_failures); | |||
Assert(consecutive_failures > 0 && multiplier < 63); | |||
|
|||
MemoryContext oldctx; | |||
MemoryContext oldctx = CurrentMemoryContext; | |||
ResourceOwner oldowner = CurrentResourceOwner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to place it inside the PG_TRY
- so that its guarded by the try?
I think in some odd cases BeginInternalSubTransaction
may error out...possibly leaving it in inconsistent state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... maybe. Will investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can error out and it is better to move the call to BeginInternalSubTransaction()
inside the PG_TRY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see CurrentResourceOwner
changed anywhere - I'm just wondering how does that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is changed inside BeginInternalSubTransaction
, ReleaseCurrentSubTransaction
, and RollbackAndReleaseCurrentSubTransaction
.
e2ec356
to
02ebf97
Compare
|
||
if (mark) | ||
ts_bgw_job_stat_mark_end(job, had_error ? JOB_FAILURE : JOB_SUCCESS); | ||
ts_bgw_job_stat_mark_end(job, result ? JOB_SUCCESS : JOB_FAILURE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this very puzzling...there is no contract for what implementations of job_main_func
supposed to live up to....
...and now this commit is just negating that...seems like ts_telemetry_main
is the only implementation ? 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but don't want to make larger changes than necessary since things have a tendency to break. Fine with refactoring code as a separate PR though.
@@ -346,7 +347,8 @@ calculate_next_start_on_failure(TimestampTz finish_time, int consecutive_failure | |||
consecutive_failures); | |||
Assert(consecutive_failures > 0 && multiplier < 63); | |||
|
|||
MemoryContext oldctx; | |||
MemoryContext oldctx = CurrentMemoryContext; | |||
ResourceOwner oldowner = CurrentResourceOwner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see CurrentResourceOwner
changed anywhere - I'm just wondering how does that change?
ErrorData *errdata = CopyErrorData(); | ||
ereport(LOG, | ||
(errcode(ERRCODE_INTERNAL_ERROR), | ||
errmsg("could not calculate next start on failure: resetting value"), | ||
errdetail("Error: %s.", errdata->message))); | ||
FlushErrorState(); | ||
RollbackAndReleaseCurrentSubTransaction(); | ||
} | ||
PG_END_TRY(); | ||
Assert(CurrentMemoryContext == oldctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make sense to move
MemoryContextSwitchTo(oldctx);
CurrentResourceOwner = oldowner;
from both try/catch branches to here - instead of this assert ?
it would make the CopyErrorData
use a different mem context; but I think that doesn't make much difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is following the pattern used in PostgreSQL to handle similar situations. Trying to "optimize" the code here runs the risk of introducing subtle errors. The comment on CopyErrorData says:
/*
* CopyErrorData --- obtain a copy of the topmost error stack entry
*
* This is only for use in error handler code. The data is copied into the
* current memory context, so callers should always switch away from
* ErrorContext first; otherwise it will be lost when FlushErrorState is done.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes ; I know - but the code uses the results right away in the next line to create an error message...so it doesn't escape
it could be left as-is
When executing a subtransaction using `BeginInternalSubTransaction` the memory context switches from the current context to `CurTransactionContext` and when the transaction is aborted or committed using `ReleaseCurrentSubTransaction` or `RollbackAndReleaseCurrentSubTransaction` respectively, it will not restore to the previous memory context or resource owner but rather use `TopTransactionContext`. Because of this, both the memory context and the resource owner will be wrong when executing `calculate_next_start_on_failure`, which causes `run_job` to generate an error when used with the telemetry job. This commit fixes this by saving both the resource owner and the memory context before starting the internal subtransaction and restoring it after finishing the internal subtransaction. Since the `ts_bgw_job_run_and_set_next_start` was incorrectly reading the wrong result from the telemetry job, this commit fixes this as well. Note that `ts_bgw_job_run_and_set_next_start` is only used when running the telemetry job, so it does not cause issues for other jobs.
8157026
to
1448cf4
Compare
This release includes these noteworthy features: * compressed hypertable enhancements: * UPDATE/DELETE support * ON CONFLICT DO UPDATE * Join support for hierarchical Continougs Aggregates * performance improvements **Features** * timescale#5212 Allow pushdown of reference table joins * timescale#5221 Improve Realtime Continuous Aggregate performance * timescale#5252 Improve unique constraint support on compressed hypertables * timescale#5339 Support UPDATE/DELETE on compressed hypertables * timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates * timescale#5361 Add parallel support for partialize_agg() * timescale#5417 Refactor and optimize distributed COPY * timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * timescale#5547 Skip Ordered Append when only 1 child node is present * timescale#5510 Propagate vacuum/analyze to compressed chunks * timescale#5584 Reduce decompression during constraint checking * timescale#5530 Optimize compressed chunk resorting **Bugfixes** * timescale#5396 Fix SEGMENTBY columns predicates to be pushed down * timescale#5427 Handle user-defined FDW options properly * timescale#5442 Decompression may have lost DEFAULT values * timescale#5459 Fix issue creating dimensional constraints * timescale#5570 Improve interpolate error message on datatype mismatch * timescale#5573 Fix unique constraint on compressed tables * timescale#5615 Add permission checks to run_job() * timescale#5614 Enable run_job() for telemetry job * timescale#5578 Fix on-insert decompression after schema changes * timescale#5613 Quote username identifier appropriately * timescale#5525 Fix tablespace for compressed hypertable and corresponding toast * timescale#5642 Fix ALTER TABLE SET with normal tables * timescale#5666 Reduce memory usage for distributed analyze * timescale#5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Support for DML operations on compressed chunks: * UPDATE/DELETE support * Support for unique constraints on compressed chunks * Support for `ON CONFLICT DO UPDATE` * Support for `ON CONFLICT DO NOTHING` * Join support for hierarchical Continuous Aggregates **Features** * timescale#5212 Allow pushdown of reference table joins * timescale#5221 Improve Realtime Continuous Aggregate performance * timescale#5252 Improve unique constraint support on compressed hypertables * timescale#5339 Support UPDATE/DELETE on compressed hypertables * timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates * timescale#5361 Add parallel support for partialize_agg() * timescale#5417 Refactor and optimize distributed COPY * timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * timescale#5547 Skip Ordered Append when only 1 child node is present * timescale#5510 Propagate vacuum/analyze to compressed chunks * timescale#5584 Reduce decompression during constraint checking * timescale#5530 Optimize compressed chunk resorting * timescale#5639 Support sending telemetry event reports **Bugfixes** * timescale#5396 Fix SEGMENTBY columns predicates to be pushed down * timescale#5427 Handle user-defined FDW options properly * timescale#5442 Decompression may have lost DEFAULT values * timescale#5459 Fix issue creating dimensional constraints * timescale#5570 Improve interpolate error message on datatype mismatch * timescale#5573 Fix unique constraint on compressed tables * timescale#5615 Add permission checks to run_job() * timescale#5614 Enable run_job() for telemetry job * timescale#5578 Fix on-insert decompression after schema changes * timescale#5613 Quote username identifier appropriately * timescale#5525 Fix tablespace for compressed hypertable and corresponding toast * timescale#5642 Fix ALTER TABLE SET with normal tables * timescale#5666 Reduce memory usage for distributed analyze * timescale#5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Support for DML operations on compressed chunks: * UPDATE/DELETE support * Support for unique constraints on compressed chunks * Support for `ON CONFLICT DO UPDATE` * Support for `ON CONFLICT DO NOTHING` * Join support for hierarchical Continuous Aggregates **Features** * timescale#5212 Allow pushdown of reference table joins * timescale#5221 Improve Realtime Continuous Aggregate performance * timescale#5252 Improve unique constraint support on compressed hypertables * timescale#5339 Support UPDATE/DELETE on compressed hypertables * timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates * timescale#5361 Add parallel support for partialize_agg() * timescale#5417 Refactor and optimize distributed COPY * timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * timescale#5547 Skip Ordered Append when only 1 child node is present * timescale#5510 Propagate vacuum/analyze to compressed chunks * timescale#5584 Reduce decompression during constraint checking * timescale#5530 Optimize compressed chunk resorting * timescale#5639 Support sending telemetry event reports **Bugfixes** * timescale#5396 Fix SEGMENTBY columns predicates to be pushed down * timescale#5427 Handle user-defined FDW options properly * timescale#5442 Decompression may have lost DEFAULT values * timescale#5459 Fix issue creating dimensional constraints * timescale#5570 Improve interpolate error message on datatype mismatch * timescale#5573 Fix unique constraint on compressed tables * timescale#5615 Add permission checks to run_job() * timescale#5614 Enable run_job() for telemetry job * timescale#5578 Fix on-insert decompression after schema changes * timescale#5613 Quote username identifier appropriately * timescale#5525 Fix tablespace for compressed hypertable and corresponding toast * timescale#5642 Fix ALTER TABLE SET with normal tables * timescale#5666 Reduce memory usage for distributed analyze * timescale#5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Support for DML operations on compressed chunks: * UPDATE/DELETE support * Support for unique constraints on compressed chunks * Support for `ON CONFLICT DO UPDATE` * Support for `ON CONFLICT DO NOTHING` * Join support for hierarchical Continuous Aggregates **Features** * #5212 Allow pushdown of reference table joins * #5221 Improve Realtime Continuous Aggregate performance * #5252 Improve unique constraint support on compressed hypertables * #5339 Support UPDATE/DELETE on compressed hypertables * #5344 Enable JOINS for Hierarchical Continuous Aggregates * #5361 Add parallel support for partialize_agg() * #5417 Refactor and optimize distributed COPY * #5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * #5547 Skip Ordered Append when only 1 child node is present * #5510 Propagate vacuum/analyze to compressed chunks * #5584 Reduce decompression during constraint checking * #5530 Optimize compressed chunk resorting * #5639 Support sending telemetry event reports **Bugfixes** * #5396 Fix SEGMENTBY columns predicates to be pushed down * #5427 Handle user-defined FDW options properly * #5442 Decompression may have lost DEFAULT values * #5459 Fix issue creating dimensional constraints * #5570 Improve interpolate error message on datatype mismatch * #5573 Fix unique constraint on compressed tables * #5615 Add permission checks to run_job() * #5614 Enable run_job() for telemetry job * #5578 Fix on-insert decompression after schema changes * #5613 Quote username identifier appropriately * #5525 Fix tablespace for compressed hypertable and corresponding toast * #5642 Fix ALTER TABLE SET with normal tables * #5666 Reduce memory usage for distributed analyze * #5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Support for DML operations on compressed chunks: * UPDATE/DELETE support * Support for unique constraints on compressed chunks * Support for `ON CONFLICT DO UPDATE` * Support for `ON CONFLICT DO NOTHING` * Join support for hierarchical Continuous Aggregates **Features** * timescale#5212 Allow pushdown of reference table joins * timescale#5221 Improve Realtime Continuous Aggregate performance * timescale#5252 Improve unique constraint support on compressed hypertables * timescale#5339 Support UPDATE/DELETE on compressed hypertables * timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates * timescale#5361 Add parallel support for partialize_agg() * timescale#5417 Refactor and optimize distributed COPY * timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * timescale#5547 Skip Ordered Append when only 1 child node is present * timescale#5510 Propagate vacuum/analyze to compressed chunks * timescale#5584 Reduce decompression during constraint checking * timescale#5530 Optimize compressed chunk resorting * timescale#5639 Support sending telemetry event reports **Bugfixes** * timescale#5396 Fix SEGMENTBY columns predicates to be pushed down * timescale#5427 Handle user-defined FDW options properly * timescale#5442 Decompression may have lost DEFAULT values * timescale#5459 Fix issue creating dimensional constraints * timescale#5570 Improve interpolate error message on datatype mismatch * timescale#5573 Fix unique constraint on compressed tables * timescale#5615 Add permission checks to run_job() * timescale#5614 Enable run_job() for telemetry job * timescale#5578 Fix on-insert decompression after schema changes * timescale#5613 Quote username identifier appropriately * timescale#5525 Fix tablespace for compressed hypertable and corresponding toast * timescale#5642 Fix ALTER TABLE SET with normal tables * timescale#5666 Reduce memory usage for distributed analyze * timescale#5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * Support for DML operations on compressed chunks: * UPDATE/DELETE support * Support for unique constraints on compressed chunks * Support for `ON CONFLICT DO UPDATE` * Support for `ON CONFLICT DO NOTHING` * Join support for hierarchical Continuous Aggregates **Features** * timescale#5212 Allow pushdown of reference table joins * timescale#5221 Improve Realtime Continuous Aggregate performance * timescale#5252 Improve unique constraint support on compressed hypertables * timescale#5339 Support UPDATE/DELETE on compressed hypertables * timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates * timescale#5361 Add parallel support for partialize_agg() * timescale#5417 Refactor and optimize distributed COPY * timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables * timescale#5547 Skip Ordered Append when only 1 child node is present * timescale#5510 Propagate vacuum/analyze to compressed chunks * timescale#5584 Reduce decompression during constraint checking * timescale#5530 Optimize compressed chunk resorting * timescale#5639 Support sending telemetry event reports **Bugfixes** * timescale#5396 Fix SEGMENTBY columns predicates to be pushed down * timescale#5427 Handle user-defined FDW options properly * timescale#5442 Decompression may have lost DEFAULT values * timescale#5459 Fix issue creating dimensional constraints * timescale#5570 Improve interpolate error message on datatype mismatch * timescale#5573 Fix unique constraint on compressed tables * timescale#5615 Add permission checks to run_job() * timescale#5614 Enable run_job() for telemetry job * timescale#5578 Fix on-insert decompression after schema changes * timescale#5613 Quote username identifier appropriately * timescale#5525 Fix tablespace for compressed hypertable and corresponding toast * timescale#5642 Fix ALTER TABLE SET with normal tables * timescale#5666 Reduce memory usage for distributed analyze * timescale#5668 Fix subtransaction resource owner **Thanks** * @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates * @ollz272 for reporting an issue with interpolate error messages
When executing a subtransaction using
BeginInternalSubTransaction
the memory context switches from the current context toCurTransactionContext
and when the transaction is aborted or committed usingReleaseCurrentSubTransaction
orRollbackAndReleaseCurrentSubTransaction
respectively, it will not restore to the previous memory context or resource owner but rather useTopTransactionContext
. Because of this, both the memory context and the resource owner will be wrong when executingcalculate_next_start_on_failure
, which causesrun_job
to generate an error when used with the telemetry job.This commit fixes this by saving both the resource owner and the memory context before starting the internal subtransaction and restoring it after finishing the internal subtransaction.
Since the
ts_bgw_job_run_and_set_next_start
was incorrectly reading the wrong result from the telemetry job, this commit fixes this as well. Note thatts_bgw_job_run_and_set_next_start
is only used when running the telemetry job, so it does not cause issues for other jobs.