Groups Architecture/Docs #13
@ethanresnick thanks for the interest in the Pro version. Regarding performance, what it means is that performance is almost unaffected by the number of groups. Of course, the groups logic adds a bit of extra work in the Lua scripts, but the complexity of picking the next job to process is at worst O(log n), as we use ZPOPMIN from Redis to get the next group's job. I will try to rewrite the text to be more precise about the complexity and the added cost of using groups. In particular, options such as priority and lifo ignore the group settings, so we are actually planning to throw an exception if "incompatible" options are used together with groups.
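To sketch the idea (illustration only, not the actual Pro Lua; the key name and scoring scheme here are made up):

```ts
// Illustration only: picking the "next" group with a sorted set, which is why
// the pick is O(log n) in the number of groups. Key name and scoring scheme
// are made up for this sketch.
import Redis from 'ioredis';

const redis = new Redis();

async function pickNextGroup(): Promise<string | null> {
  // ZPOPMIN is O(log n); the member with the lowest score is the group whose
  // turn it is next.
  const popped = await redis.zpopmin('sketch:groups');
  if (popped.length === 0) return null;
  const [groupId] = popped;
  // Push the group to the back of the line so other groups get their turn.
  await redis.zadd('sketch:groups', Date.now(), groupId);
  return groupId;
}
```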
Thanks @manast! This is very helpful. O(log n) for picking the next group is what I was hoping for. I imagine that enqueuing and dequeuing with groups also just add some constant-time overhead? I.e., it seems like there'd be work to move the job from the group into the main queue (when its time comes), and maybe some bookkeeping on enqueue? That's probably not relevant for my use case, but I'm just trying to make sure I have the correct mental model. Besides that, I do have a couple more questions about rate limiting with groups:
Beyond rate limiting:
Note: edited my message above to consolidate my questions and add one more.
Please see my answers below.
Weighted round-robin is not a rate-limiting algorithm but a scheduling algorithm. For rate limiting we just use a counter per group; if the counter reaches its configured maximum within the configured time window, the group is rate limited until the key holding the counter expires.
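Roughly, the counter logic looks like this (illustration only, not our actual Lua; the key name is made up):

```ts
// Sketch of the per-group counter described above (illustrative key name,
// not BullMQ Pro internals). In the real implementation this logic runs
// atomically inside Lua.
import Redis from 'ioredis';

const redis = new Redis();

async function isGroupRateLimited(
  groupId: string,
  max: number,      // max jobs allowed per window
  duration: number, // window length in milliseconds
): Promise<boolean> {
  const key = `sketch:rate-limit:${groupId}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First job in this window: start the clock. When the key expires the
    // counter disappears and the group is no longer rate limited.
    await redis.pexpire(key, duration);
  }
  return count > max;
}
```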
That's currently not possible. It could be implemented, though, as an additional API that you can use to manually rate limit a group for a certain amount of time.
Yes, that's how it works. However, you cannot specify a different rate limit setting for every group; in your example, all groups would be rate limited at 5 jobs/second. So if you have, for example, 100 groups, then the maximum throughput would be 500 jobs/second for the whole queue.
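For reference, this shared limit is expressed on the worker's group options; a rough sketch (the option shape here is an assumption based on memory of the Pro docs, so verify against the current docs before relying on it):

```ts
// Assumed shape of the Pro worker options for group rate limiting - verify
// against the current BullMQ Pro docs before relying on it.
import { WorkerPro } from '@taskforcesh/bullmq-pro';

const worker = new WorkerPro(
  'myQueue',
  async (job) => {
    // ... process the job ...
  },
  {
    connection: { host: 'localhost', port: 6379 },
    group: {
      // Every group shares this same setting: max 5 jobs per second per group,
      // so 100 active groups give at most 500 jobs/second queue-wide.
      limit: { max: 5, duration: 1000 },
    },
  },
);
```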
If the task is retried, it will be re-added to the group and processed once the group is no longer rate limited.
Yes, you just add the job with the groupId and then you have a new group. Once all the jobs for a given groupId have been processed, that group does not consume any space anymore.
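For example (the group job option follows the Pro docs as I recall them; the exact shape may vary by version):

```ts
// Adding a job to a group: an unseen group id implicitly creates the group,
// and the group disappears once all its jobs are processed. The `group`
// option shape is assumed from the Pro docs and may vary by version.
import { QueuePro } from '@taskforcesh/bullmq-pro';

const queue = new QueuePro('myQueue', {
  connection: { host: 'localhost', port: 6379 },
});

await queue.add(
  'process-item',
  { some: 'data' },
  { group: { id: 'customer-42' } },
);
```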
Priority/lifo are global settings, not group-dependent; as such, if you need this functionality you can just use a Queue without groups. Implementing priority inside groups is very complex, so this is not something we are planning to do. Lifo, on the other hand, is possible.
Hi @manast, thanks for your detailed answers! And apologies for my slow reply; I had to switch to working on another feature, but I'm now back to work on the feature for which I'd use BullMQ. A couple final questions:
Enabling a different rate limit per group is something that fits in the current design. Being able to dynamically change the rate limit of a given group is another story. Currently, when you enable rate limits for groups you do it when you instantiate a Worker, so all workers must have the same settings for consistent behavior. In theory, you could just close all workers and then instantiate new ones with a different setting, but I am not sure this will fulfill your requirements. If you sign up for the Pro version, we would prioritize the feature of different rate limits per group.
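To make that workaround concrete, it would look roughly like this (same assumed worker option shape as in the earlier sketch):

```ts
// Sketch of the workaround: close the existing workers and start new ones
// with a different group rate limit. The option shape is assumed, not verified.
import { WorkerPro } from '@taskforcesh/bullmq-pro';

const connection = { host: 'localhost', port: 6379 };
const processor = async (job: any) => {
  // ... process the job ...
};

async function replaceWorkersWithNewLimit(
  workers: WorkerPro[],
  max: number,
  duration: number,
): Promise<WorkerPro[]> {
  // Close the old workers first so two different limits are never active at once.
  await Promise.all(workers.map((w) => w.close()));
  return [
    new WorkerPro('myQueue', processor, {
      connection,
      group: { limit: { max, duration } },
    }),
  ];
}
```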
This is currently achieved by using idempotence and user-generated job ids: if you add a job with the same id as an existing one, it will be ignored. However, you need to make sure that you keep enough old jobs for this mechanism to work. I see the value in having an atomic "addAndComplete" operation, and this is something that we could also implement. However, exposing the Lua scripts on the client side is not a good idea, because they are too low-level and we would need to handle a lot of issues from people using them directly.
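As an illustration of the idempotence pattern (jobId is a standard BullMQ job option; the queue and id names here are made up):

```ts
// Idempotent adds via user-generated job ids: the second add with the same
// jobId is ignored, as long as the first job is still retained in Redis.
import { Queue } from 'bullmq';

const queue = new Queue('myQueue', {
  connection: { host: 'localhost', port: 6379 },
});

const jobId = 'send-email:42'; // deterministic id derived from the work itself

await queue.add('send-email', { userId: 42 }, { jobId });
// Re-running the same add (e.g. after a crash and retry) is a no-op:
await queue.add('send-email', { userId: 42 }, { jobId });
```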
Hmm, ok. Let's say I could get away with not adjusting the rate limit dynamically for an existing group; I would still definitely need some way to add a new group at runtime with a distinct limit. In the current version of Bull, a new group can get created transparently when I try to enqueue to it, but that only works because it doesn't need a custom rate limit. This seems to go back to my question from earlier (i.e., trying to understand why the rate limits are set on the worker at all, when these limits seem like a property of the queue). Is the issue that the worker needs this state about the limits in memory for some reason? If not, would it be easy to create a new API on the queue for setting up the limits? The initial groups and their limits (including a default limit for transparently-created new groups) could be set with a constructor option, like they are on the Worker now.
Again, that's awesome! If we went this route, would you be able to give any kind of timeline? Or how does that normally work?
Yes, I understand that (including the caveat about needing to keep enough completed jobs, which is an extra thing to think about, but is usually not too bad). I was really just giving the transactional publishing thing as an example. My bigger point was that there are probably a ton of feature requests that people could implement on their own, without having to fork Bull, if there were some sort of Redis API. Like, if I did want to implement some sort of custom scheduling, I could write that myself in Redis (which would be much easier and more reliable than writing it in Node), and then just call it.
It is more complicated than it may seem at first sight. Since the queue is distributed, any change in a global queue setting would need to be propagated to all the workers with some guarantees; there are many edge cases, and it is really too complex for something that is currently not needed by the vast majority of users.
Redis does not support calling another Lua script from inside Lua. The other problem is that the Lua code handling the queue is really quite complex; we put a lot of effort into figuring out all the edge cases, and this is not something the average coder would be able to do without investing a lot of work, especially if you are not already familiar with all the inner workings.
Btw, I am aware of Redis 7.0 functions, but we are not going to start using them in the short term.
Ok. I definitely understand that the workers need some knowledge of the state of the queue (e.g., its name/prefix, to know where to look for jobs), but I'm a bit surprised that they need knowledge of the rate limits. I'll take your word for it, though. In that case, do you have any advice on how I might implement something like this (i.e., different groups, with different rate limits that can be adjusted at runtime)?
Yeah, Redis functions are what I was thinking of, or a Redis module, which I see you explored in the past. But I can understand why neither of those is super appealing, given that a lot of deployments don't support them yet.
Hello, I'm evaluating BullMQ Pro for my current project. It seems to have a lot of unique and powerful features, and I appreciate all the work you've put into it over so many years. I plan to request a trial of BullMQ Pro but, before I actually dive into the code and try to build a proof of concept, I was hoping to understand the architecture a bit better.
In particular, I'm trying to understand how the groups are implemented at a high level, to get a sense of what kinds of problems I might run into. The docs say only that groups "[don't] have any impact on performance", but what does this mean exactly? Does it literally mean that they have no performance impact for absolutely all use cases and usage patterns? How is that possible?
I'm also trying to understand how groups interact with the other BullMQ ordering features. For example, if I add a job to a group and also specify a priority, or specify lifo: true, how is that interpreted? Thanks for your help here!