
Occasional error in QueueScheduler for repeatable jobs #731

Closed
velsa opened this issue Sep 1, 2021 · 2 comments

velsa commented Sep 1, 2021

I have a repeatable job that runs twice a day and is set up via a cron expression with QueueScheduler.
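
For context, the setup looks roughly like this (a minimal sketch; the queue name, cron expression, and job data below are placeholders rather than my real values):

```typescript
import { Queue, QueueScheduler } from 'bullmq';

// QueueScheduler is required for delayed/repeatable jobs to be promoted on time
const scheduler = new QueueScheduler('image-processing');
const queue = new Queue('image-processing');

async function setup() {
  // Repeatable job that runs twice a day (cron expression is a placeholder)
  await queue.add(
    'process-images',
    { /* job data */ },
    { repeat: { cron: '0 6,18 * * *' } },
  );
}
```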

Every few days I get this error:

Error: Job repeat:fb04dbd25286bf65d1ae24bf67dc5918:1630412100000 is not in the active state. delayed
    at Function.finishedErrors (/app/node_modules/bullmq/src/classes/scripts.ts:250:16)
    at Job.moveToFailed (/app/node_modules/bullmq/src/classes/job.ts:499:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at handleFailed (/app/node_modules/bullmq/src/classes/worker.ts:368:9)
    at Worker.retryIfFailed (/app/node_modules/bullmq/src/classes/worker.ts:525:16)
    at Worker.run (/app/node_modules/bullmq/src/classes/worker.ts:185:28)

This causes the job to run again, which obviously produces unwanted side effects.

I can see that the worker finished successfully, and it seems the error occurs because the scheduler cannot move the job from the active state to the delayed state, since it is already delayed.

I am not sure what triggers this behavior, because it is not easily reproducible and happens only once or twice a week.

Any help on fixing/debugging this issue is appreciated.

manast commented Sep 2, 2021

Is it possible that the job stalled? In that case the QueueScheduler moves it back to wait, but if the worker that stalled later continues processing and completes the job, you will get this error. If that is what happened, and another worker picked up the job and completed it successfully, then the attempts field in the job hash should be > 0.
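
A rough way to check that (a sketch; the queue name is a placeholder and the job id is the one from your error message):

```typescript
import { Queue, Job } from 'bullmq';

const queue = new Queue('image-processing');

async function checkAttempts() {
  // Load the failed instance by id and inspect how many times it was attempted
  const job = await Job.fromId(
    queue,
    'repeat:fb04dbd25286bf65d1ae24bf67dc5918:1630412100000',
  );
  // A value > 0 would indicate the job stalled and was picked up again
  console.log(job?.attemptsMade);
}
```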
A way to avoid this would be to make your worker less CPU-intensive, or to take advantage of sandboxed processors.
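
A minimal sketch of a sandboxed worker, assuming a placeholder queue name and file layout; the important part is passing a file path instead of a function to the Worker constructor:

```typescript
// worker.ts: passing a file path instead of a function makes BullMQ run the
// processor in a separate child process, so heavy CPU work cannot block the
// event loop and trigger stall detection
import { Worker } from 'bullmq';
import { join } from 'path';

const worker = new Worker('image-processing', join(__dirname, 'processor.js'));
```

```typescript
// processor.ts (compiled to processor.js): must export a default function
export default async function (job: { data: any }) {
  // CPU-heavy image processing runs here, in its own process
  return { ok: true };
}
```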

velsa commented Sep 3, 2021

Thanks a lot!
I had missed the part about sandboxed processors in the docs; I'll be using them from now on for my heavy image processing.

velsa closed this as completed Sep 3, 2021