[Bug]: The active job count is sometimes wrong #2268

Closed
1 task done
haqadn opened this issue Nov 8, 2023 · 8 comments

@haqadn

haqadn commented Nov 8, 2023

Version

4.11.4

Platform

NodeJS

What happened?

The active job count is inflated.

How to reproduce.

It's not reproducible 100% of the time.

However, it happens when there are 20+ concurrent workers consuming the queue and a prioritized job fails. A 0:0 value gets appended to the bull:<queue-name>:active list, and the active count includes items with this value.

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@haqadn haqadn added the bug Something isn't working label Nov 8, 2023
@manast
Contributor

manast commented Nov 8, 2023

Actually, this is by design. For a very short time, a "marker" can be moved to the active list and then quickly removed. We need this marker to signal workers that priority jobs are available. AFAIK the count should not be off by more than 1.

@haqadn
Author

haqadn commented Nov 8, 2023

It does go beyond 1. This screenshot is from a live database.
[screenshot]

It poses a problem in my use case, since we scale the number of workers up and down based on the active+waiting+prioritized job count.

I guess this wouldn't be a problem if a set were used instead of a list.

@manast
Contributor

manast commented Nov 8, 2023

Yeah, we cannot use sets unfortunately, since the only command that blocks and moves an item from one list to another is BRPOPLPUSH. But I think it is strange that you get so many markers; that does not look right. By any chance could you create a test where this happens?

@haqadn
Author

haqadn commented Nov 8, 2023

Unfortunately, I don't know a sure way of reproducing it. I tried to find a pattern in when it happens but couldn't find one. Here is how the queuing is handled in my case:

  1. There is an HTTP server that receives requests and creates jobs.
  2. The jobs are pushed to a redis cluster with 1 read/write and a few read-only replicas.
  3. A pool of workers that can scale horizontally. We try to keep the number of workers equal to active+waiting+prioritized so that jobs are processed with minimal wait time.
  4. If a job fails, it is retried a few times.

Just trying to give you an idea of the architecture in case you can get any clues.
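
For readers following along, the scaling rule in step 3 can be sketched roughly as below. This is only an illustration of why marker entries matter for the worker target. The names `JobCounts`, `desiredWorkers`, `markerEntries`, and `maxWorkers` are hypothetical and do not come from BullMQ or from the setup described in this thread.

```typescript
// Hypothetical shape for the three counts mentioned in this thread.
interface JobCounts {
  active: number;
  waiting: number;
  prioritized: number;
}

// Compute a worker target from the job counts.
// markerEntries: how many "0:0" entries were found in the active list (assumed to be known).
// maxWorkers: an assumed upper bound on the pool size, not something stated in the thread.
function desiredWorkers(
  counts: JobCounts,
  markerEntries: number,
  maxWorkers: number
): number {
  // Markers are not real jobs, so take them out of the active count.
  const realActive = Math.max(0, counts.active - markerEntries);
  const backlog = realActive + counts.waiting + counts.prioritized;
  return Math.min(maxWorkers, backlog);
}
```

If the markers are not subtracted, every stray `0:0` entry raises the target by one worker that has nothing to process.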

Btw, where does it make sure only one marker gets inserted at a time?

@roggervalf
Collaborator

@manast
Contributor

manast commented Nov 9, 2023

@haqadn I have identified a scenario where this happens. I have a possible solution, but it requires some architectural changes. They are not big and will actually improve overall performance, but they will take some time to put in place. Maybe by the end of next week we will have something. A possible workaround, if you do not check the count too often, is to get all the items in the active list and count those that are different from 0:0. The key holding the active list is bull:myqueuename:active.
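
A minimal sketch of that workaround, assuming the marker value is exactly `0:0` as reported above and the default `bull` key prefix. The `ListReader` interface and both function names are hypothetical; the interface only describes the one call needed, and a Redis client such as ioredis exposes an `lrange` method that should fit this shape.

```typescript
// Marker value as reported in this thread; other versions may differ.
const MARKER = "0:0";

// Minimal client shape (hypothetical): only LRANGE is needed.
interface ListReader {
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

// Count entries of an already-fetched active list, ignoring markers.
function countRealActive(items: string[]): number {
  return items.filter((id) => id !== MARKER).length;
}

// Fetch the whole active list (0 to -1) and count the non-marker entries.
async function getRealActiveCount(
  client: ListReader,
  queueName: string,
  prefix = "bull"
): Promise<number> {
  const items = await client.lrange(`${prefix}:${queueName}:active`, 0, -1);
  return countRealActive(items);
}
```

Note that this reads the entire list on every call, so the cost grows with the number of active jobs, which is why it suits infrequent checks.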

@haqadn
Author

haqadn commented Nov 9, 2023

Thanks @manast for looking into it. I am actually already using this workaround. We do need to check the counts and do it 20-30 times a minute for a few queues. However, this is holding up well so far.

Looking forward to seeing the actual fix implemented!

@manast
Contributor

manast commented Dec 21, 2023

This should be resolved now in v5.

@manast manast closed this as completed Dec 21, 2023