Job heartbeat support #436
@timgit I was thinking about this exact addition to … As the OP mentioned, a heartbeat column in the jobs table would allow a maintenance task or other workers to detect the stuck job and perhaps move it into the … What are your thoughts on this? I am happy to work on this update/feature, as we have a fairly urgent need. It should also benefit other …
+1 to this. I would love to see pg-boss better recognize when a job has stopped for reasons such as the worker process crashing.
Internal maintenance guarantees that active jobs are retried or failed after their timeout (expiration). This behaves similarly to the visibility timeout in SQS. You should also tune the expiration to what you consider a normal execution time.
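To make the expiration behaviour concrete, here is a minimal in-memory model of a visibility-timeout style check. The `Job` shape and the `startedOn`/`expireInSeconds` field names are hypothetical; they only capture the idea of "start time plus allowed duration" and are not pg-boss's internal code or schema.

```typescript
// Hypothetical, simplified job record: not pg-boss's actual schema.
interface Job {
  id: string;
  state: "active" | "completed" | "failed";
  startedOn: number;       // epoch milliseconds when a worker picked the job up
  expireInSeconds: number; // how long an active job may run before it is reclaimed
}

// An active job counts as expired once it has run longer than its allowance,
// regardless of whether its worker is still alive and making progress.
function isExpired(job: Job, now: number): boolean {
  return job.state === "active" && now - job.startedOn > job.expireInSeconds * 1000;
}

// Two jobs with a 5-minute allowance: one whose worker crashed,
// and one whose worker is healthy but slower than usual.
const base = Date.UTC(2024, 0, 1);
const crashed: Job = { id: "a", state: "active", startedOn: base, expireInSeconds: 300 };
const slowButHealthy: Job = { id: "b", state: "active", startedOn: base, expireInSeconds: 300 };

// Six minutes after start, the check cannot tell them apart.
const sixMinutesLater = base + 6 * 60 * 1000;
console.log(isExpired(crashed, sixMinutesLater));        // true
console.log(isExpired(slowButHealthy, sixMinutesLater)); // true
```

In this model the crashed job is not noticed until the full allowance has elapsed, and the slow but healthy job is reclaimed at the same moment, which is where duplicate runs can come from.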
Well, you can, but sometimes a task might take longer than usual. Then you either have to increase the expiration to well beyond the normal execution time or risk the same task being run multiple times. A heartbeat should be reasonably easy to implement and would solve that issue at the cost of an extra column in the job table.
If we had a `last_heartbeat` field, we could detect more quickly when a job fails due to a process crash or similar, where the job may never be properly updated to failed. One example would be to send a heartbeat every 15 seconds (setting the column to `NOW()`) and mark a job as failed/expired if the heartbeat is more than 60 seconds old.
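The heartbeat proposal can be sketched as a small TypeScript model with two halves: a worker-side loop that refreshes a heartbeat every 15 seconds while a handler runs, and a maintenance-side check that flags active jobs whose heartbeat is more than 60 seconds old. `lastHeartbeat`, `withHeartbeat`, and `findStaleJobs` are hypothetical names, and the `beat` callback stands in for whatever would actually write the timestamp (for example an `UPDATE` setting `last_heartbeat` to `NOW()`). None of this is an existing pg-boss API.

```typescript
// Intervals from the proposal: beat every 15 s, consider a job stale after 60 s.
const HEARTBEAT_INTERVAL_MS = 15000;
const STALE_AFTER_MS = 60000;

// Hypothetical job record with the proposed heartbeat column.
interface HeartbeatJob {
  id: string;
  state: "active" | "completed" | "failed";
  lastHeartbeat: number; // epoch milliseconds of the most recent heartbeat
}

// Worker side: run `handler` while calling `beat` on a fixed interval.
// The timer is always cleared, even if the handler throws.
async function withHeartbeat<T>(
  beat: () => void,
  handler: () => Promise<T>,
  intervalMs = HEARTBEAT_INTERVAL_MS,
): Promise<T> {
  beat(); // record a heartbeat as soon as work starts
  const timer = setInterval(beat, intervalMs);
  try {
    return await handler();
  } finally {
    clearInterval(timer);
  }
}

// Maintenance side: active jobs whose heartbeat is older than the threshold
// are assumed to belong to a dead worker and can be failed or retried.
function findStaleJobs(jobs: HeartbeatJob[], now: number, staleAfterMs = STALE_AFTER_MS): string[] {
  return jobs
    .filter((job) => job.state === "active" && now - job.lastHeartbeat > staleAfterMs)
    .map((job) => job.id);
}

// Example snapshot of the jobs table at a single moment.
const now = Date.UTC(2024, 0, 1, 12, 0, 0);
const jobs: HeartbeatJob[] = [
  { id: "long-running", state: "active", lastHeartbeat: now - 10000 }, // still beating
  { id: "crashed", state: "active", lastHeartbeat: now - 90000 },      // silent for 90 s
  { id: "done", state: "completed", lastHeartbeat: now - 600000 },     // finished long ago
];
const stale = findStaleJobs(jobs, now); // returns ["crashed"]
```

Because staleness depends on the last heartbeat rather than the start time, a job can run well past its usual duration without being reclaimed, while a crashed worker is noticed roughly 60 seconds after its last beat (plus however often maintenance runs). The 60-second threshold is four heartbeat intervals, so a few delayed beats do not by themselves mark a healthy job as stale.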