High Concurrency Lock / Not In Active State Errors #1028

Closed
lukepolo opened this issue Jan 26, 2022 · 6 comments


@lukepolo
Contributor

We are getting a ton of errors when we have a large number of jobs and high concurrency (600+):

2022-01-26T15:15:25.691Z cloud:BullMQQueue <5> worker error: Job 7726161 is not in the active state. delayed
2022-01-26T15:13:51.000Z cloud:BullMQQueue <5> worker error: Missing lock for job 15853622. failed

Do we know why these happen? Is there anything I can do to prevent them?

@manast
Contributor

manast commented Jan 27, 2022

Have you checked whether you have many stalled jobs? With that much concurrency, the CPU may be saturated and unable to keep renewing the locks in time. The jobs then stall and are moved back to wait. If the original worker keeps processing a stalled job, it will eventually fail with that missing lock error, since it no longer owns the lock. The job may already have been completed by another worker anyway.
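
The sequence described above can be illustrated with a small toy model. This is only a sketch of the idea, not BullMQ's actual implementation: the `LockTable` class, the worker tokens `A` and `B`, the 30-second TTL, and all timestamps are assumptions made up for illustration, and time is simulated rather than read from a clock.

```typescript
// Toy model of a job lock with a time-to-live (TTL).
// Hypothetical names and values; NOT BullMQ's real implementation.
interface JobLock {
  owner: string; // token of the worker holding the lock
  expiresAt: number; // simulated time (ms) at which the lock lapses
}

class LockTable {
  private locks = new Map<string, JobLock>();

  // Take the lock if it is free or has already expired.
  acquire(jobId: string, owner: string, now: number, ttl: number): boolean {
    const lock = this.locks.get(jobId);
    if (lock && lock.expiresAt > now) return false;
    this.locks.set(jobId, { owner, expiresAt: now + ttl });
    return true;
  }

  // Extend the lock only if the caller still owns an unexpired lock.
  renew(jobId: string, owner: string, now: number, ttl: number): boolean {
    const lock = this.locks.get(jobId);
    if (!lock || lock.owner !== owner || lock.expiresAt <= now) return false;
    lock.expiresAt = now + ttl;
    return true;
  }

  // Finishing a job requires still owning the lock.
  finish(jobId: string, owner: string, now: number): string {
    const lock = this.locks.get(jobId);
    if (!lock || lock.owner !== owner || lock.expiresAt <= now) {
      return `Missing lock for job ${jobId}`;
    }
    this.locks.delete(jobId);
    return "completed";
  }
}

function simulateStall() {
  const ttl = 30_000; // assumed lock duration for this sketch
  const table = new LockTable();

  // t=0: worker A picks up job 1 and takes the lock.
  table.acquire("1", "A", 0, ttl);

  // Worker A is too busy to renew before t=30000, so the lock lapses.
  // t=35000: the job is treated as stalled and worker B picks it up.
  const bAcquired = table.acquire("1", "B", 35_000, ttl);

  // t=40000: worker A finally tries to renew, but no longer owns the lock.
  const aRenewed = table.renew("1", "A", 40_000, ttl);

  // t=45000: worker B finishes the job normally.
  const bResult = table.finish("1", "B", 45_000);

  // t=50000: worker A finishes its copy of the work and is rejected.
  const aResult = table.finish("1", "A", 50_000);

  return { bAcquired, aRenewed, bResult, aResult };
}

console.log(simulateStall());
```

In this model the late worker's work is wasted and it sees an error, even though the job itself was completed by the second worker.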

@manast
Contributor

manast commented Jan 27, 2022

So basically, the recommendation here would be to reduce the concurrency per worker and add more physical workers instead.
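
As a rough sketch of what that split could look like, the helper below spreads a total concurrency target across several workers so that none exceeds a per-worker cap. The function name `splitConcurrency` and the cap of 50 are hypothetical choices for illustration; the thread does not suggest a specific number, and a suitable cap depends on the workload. Each resulting value would be what you pass as the `concurrency` option of one BullMQ `Worker`.

```typescript
// Hypothetical helper: spread a total concurrency target over several workers
// so that no single worker exceeds perWorkerCap.
function splitConcurrency(total: number, perWorkerCap: number): number[] {
  if (
    !Number.isInteger(total) ||
    !Number.isInteger(perWorkerCap) ||
    total < 1 ||
    perWorkerCap < 1
  ) {
    throw new RangeError("total and perWorkerCap must be positive integers");
  }
  // Fewest workers that can carry the total without exceeding the cap.
  const workers = Math.ceil(total / perWorkerCap);
  // Distribute as evenly as possible; the first `extra` workers take one more.
  const base = Math.floor(total / workers);
  const extra = total % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < extra ? 1 : 0));
}

// 600 concurrent jobs with an assumed cap of 50 per worker.
const plan = splitConcurrency(600, 50);
console.log(`${plan.length} workers, concurrency per worker: ${plan.join(", ")}`);
```

Running each of those workers in its own process or machine is what spreads the load; several high-concurrency workers sharing one saturated CPU would not help.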

@lukepolo
Contributor Author

Yeah, we’re going to take a closer look. Glad to see we were on the right track.

@lukepolo
Contributor Author

Could this have been related?

#1064

We still get these even with very low CPU usage (on my local machine, with a concurrency of 5 and ~50 jobs).

@manast
Contributor

manast commented Feb 13, 2022

@lukepolo Are you using flows or just standard jobs? Any chance you could produce a simple test case that reproduces the issue, so that we can take a deeper look at it?

@lukepolo
Contributor Author

Yup, I’ll make a repo to reproduce it this week.
