[Bug]: The active job count is sometimes wrong #2268
Comments
Actually, this is by design. For a very short time, a "marker" can be moved to the active set and then quickly removed. We need this marker in order to signal workers that there are priority jobs available. The count should not be off by more than 1 AFAIK.
Yeah, we cannot use sets unfortunately, since the only command that blocks and moves an item from one list to another is BRPOPLPUSH. But I think it is strange that you get so many markers; that does not look right. By any chance, could you create a test where this happens?
Unfortunately, I don't know a sure way of reproducing it. I tried to find a pattern in when it happens but couldn't find any. However, the queuing in my case is handled as follows:
Just trying to give you an idea of the architecture in case you can get any clues. Btw, where does it make sure only one marker gets inserted at a time?
@haqadn I have identified a scenario where this happens. I have a possible solution, but it requires some architectural changes. They are not big and will actually improve overall performance, but they will take some time to put in place. Maybe by the end of next week we will have something. A possible workaround, if you do not check the count too often, is to get all the items in the active list and count those that are different from 0:0. The key holding the active list is in
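For anyone else landing here, the workaround described above could look roughly like the sketch below. It is only an illustration, not code from the library: the helper names `countNonMarkerEntries` and `getActiveCountWithoutMarkers` and the `ListReader` interface are made up for this example, and the key layout `bull:<queue-name>:active` is taken from the issue report, so adjust it if your queue uses a custom prefix. The client is assumed to expose `lrange(key, start, stop)` returning a promise of strings, as ioredis clients do.

```typescript
// Marker value that the issue reports seeing in the active list.
const MARKER = "0:0";

// Minimal shape of the Redis client needed here (hypothetical interface).
// ioredis clients expose lrange(key, start, stop) with this signature.
interface ListReader {
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

// Count the entries that are real job ids, i.e. everything except markers.
function countNonMarkerEntries(entries: string[]): number {
  return entries.filter((entry) => entry !== MARKER).length;
}

// Hypothetical helper: read the whole active list and count non-marker entries.
// Assumes the key layout "<prefix>:<queue-name>:active" shown in the report.
async function getActiveCountWithoutMarkers(
  client: ListReader,
  queueName: string,
  prefix = "bull",
): Promise<number> {
  const entries = await client.lrange(`${prefix}:${queueName}:active`, 0, -1);
  return countNonMarkerEntries(entries);
}
```

Note that this reads the entire active list on every call, so it is only reasonable when the list is small or the count is not checked very often.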
Thanks @manast for looking into it. I am actually already using this workaround. We do need to check the counts and do it 20-30 times a minute for a few queues. However, this is holding up well so far. Looking forward to seeing the actual fix implemented!
This should be resolved now in v5.
Version
4.11.4
Platform
NodeJS
What happened?
The active job count is inflated.
How to reproduce.
It's not reproducible 100% of the time.
However, it happens when there are 20+ concurrent workers consuming the queue and a prioritized job fails. The bull:<queue-name>:active list gets a 0:0 value appended. The active count then includes items with this value.
Relevant log output
No response