[Bug]: add_job or an alter_job can crash an unrelated running job #5537
Comments
In the main scheduler loop, the |
The start_scheduled_jobs function mistakenly sorts the scheduled_jobs list in-place. As a result, when the ts_update_scheduled_jobs_list function compares the updated list of scheduled jobs with the existing scheduled jobs list, it is comparing a list that is sorted by job_id to one that is sorted by next_start time. Fix that by properly copying the scheduled_jobs list into a new list and use that for sorting. Fixes timescale#5537
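The mechanism described above can be illustrated with a small, self-contained C sketch. This is not TimescaleDB's code: the `Job` struct, `find_removed`, and `run_scenario` are hypothetical names invented for illustration, and the merge-style comparison is only an assumption about how a list diff that relies on job_id ordering might behave. The job ids and relative start times mirror the reproduction in this issue.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a scheduled job entry. */
typedef struct Job
{
    int job_id;
    long next_start; /* seconds relative to "now" */
} Job;

static int
cmp_by_job_id(const void *a, const void *b)
{
    const Job *x = a, *y = b;
    return (x->job_id > y->job_id) - (x->job_id < y->job_id);
}

static int
cmp_by_next_start(const void *a, const void *b)
{
    const Job *x = a, *y = b;
    return (x->next_start > y->next_start) - (x->next_start < y->next_start);
}

/*
 * Merge-style comparison that assumes BOTH arrays are sorted by job_id.
 * An old job with no match in the updated list is reported as removed
 * (a scheduler would terminate it). Returns the number of such jobs.
 */
static size_t
find_removed(const Job *old, size_t n_old, const Job *updated,
             size_t n_updated, int *removed)
{
    size_t i = 0, j = 0, count = 0;

    while (i < n_old)
    {
        if (j == n_updated || old[i].job_id < updated[j].job_id)
            removed[count++] = old[i++].job_id; /* looks deleted */
        else if (old[i].job_id == updated[j].job_id)
            i++, j++; /* same job in both lists, keep it */
        else
            j++; /* only in the updated list: newly added job */
    }
    return count;
}

#define N_OLD 2
#define N_UPDATED 3

/*
 * Replays the scenario from this issue: jobs 1000 and 1001 are running,
 * then job 1002 is added. With sort_in_place != 0 the shared list is
 * reordered by next_start (the bug); otherwise a copy is sorted (the fix).
 */
static size_t
run_scenario(int sort_in_place, int *removed)
{
    Job scheduled[N_OLD] = { { 1000, 2 }, { 1001, -2 } };
    Job updated[N_UPDATED] = { { 1000, 2 }, { 1001, -2 }, { 1002, 0 } };
    Job start_order[N_OLD];

    assert(removed != NULL);

    if (sort_in_place)
        qsort(scheduled, N_OLD, sizeof(Job), cmp_by_next_start);
    else
    {
        memcpy(start_order, scheduled, sizeof(scheduled));
        qsort(start_order, N_OLD, sizeof(Job), cmp_by_next_start);
    }

    /* The updated list is always ordered by job_id. */
    qsort(updated, N_UPDATED, sizeof(Job), cmp_by_job_id);
    return find_removed(scheduled, N_OLD, updated, N_UPDATED, removed);
}
```

In this sketch, `run_scenario(1, removed)` reports one removed job, 1000, even though it is still present in the updated list, while `run_scenario(0, removed)` reports none. Sorting a copy keeps the shared list in job_id order, which is the approach the commit message describes.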
A simpler reproducible test case that doesn't involve any tables or transactions:
Create a procedure that does nothing other than call pg_sleep, and a couple of jobs that use it:
-- Proc that sleeps for 1m - to keep the test jobs in running state
CREATE OR REPLACE PROCEDURE proc_that_sleeps(job_id INT, config JSONB)
LANGUAGE PLPGSQL AS
$$
BEGIN
PERFORM pg_sleep(60);
END
$$;
-- create new jobs and ensure that the second one gets scheduled
-- before the first one by adjusting the initial_start values
SELECT add_job('proc_that_sleeps', '1h', initial_start => now()::timestamptz + interval '2s');
SELECT add_job('proc_that_sleeps', '1h', initial_start => now()::timestamptz - interval '2s');
Wait for the jobs to start running:
Now create another job:
SELECT add_job('proc_that_sleeps', '1h');
This will cause the job with id 1000 to fail:
The start_scheduled_jobs function mistakenly sorts the scheduled_jobs list in-place. As a result, when the ts_update_scheduled_jobs_list function compares the updated list of scheduled jobs with the existing scheduled jobs list, it is comparing a list that is sorted by job_id to one that is sorted by next_start time. Fix that by properly copying the scheduled_jobs list into a new list and use that for sorting. Fixes #5537 (cherry picked from commit a383c8d)
What type of bug is this?
Crash
What subsystems and features are affected?
Background worker, User-Defined Action (UDA)
What happened?
The start_scheduled_jobs function mistakenly sorts the scheduled_jobs list in-place. As a result, when the ts_update_scheduled_jobs_list function compares the updated list of scheduled jobs with the existing scheduled jobs list, it is comparing a list that is sorted by job_id to one that is sorted by next_start time.
This inconsistency can lead to various issues, such as erroneously calling terminate_and_cleanup_job on a running job that doesn't require termination. This will cause the job to terminate with the following log:
The job_errors table also reports the error:
TimescaleDB version affected
main (517dee9)
PostgreSQL version used
15.2
What operating system did you use?
Ubuntu 22.04
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
Not applicable
How can we reproduce the bug?
Session 1 - Run the following SQL script to create the procedure and related jobs
Session 2 - Create a new job that is unrelated to the jobs above. When this is executed in a new session, it will crash the first job that started at initial_start => now()::timestamptz + interval '2s'.