Question: is there any reason why the max stream length is set to 10,000? #110

Closed
dirkgroenen opened this issue Jan 13, 2020 · 2 comments

@dirkgroenen

First of all, thanks for all the hard work going into this new version! 👏

We currently have a Bull setup that we use to retrieve and process large datasets in the background. After running it in production for a month or two, we noticed our Redis instance was consuming over 2 GB of memory. On closer inspection, most of this was caused by the event stream, which keeps references to processed jobs that have large data and returnvalue properties.

We're currently thinking about reducing streams.events.maxLen to a (much) lower value, but before doing so I would like to know what the potential impact could be.
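For reference, this is roughly how we would lower it (a minimal sketch; the queue name and connection details are placeholders):

```ts
import { Queue } from 'bullmq';

// Cap the events stream at 250 entries instead of the default 10000.
// 'dataset-processing' and the connection details are placeholders.
const queue = new Queue('dataset-processing', {
  connection: { host: '127.0.0.1', port: 6379 },
  streams: {
    events: {
      maxLen: 250,
    },
  },
});
```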

I've done some tests and looked around in the code, and I can't seem to find any reason why we couldn't lower it to something like 250. Can you elaborate on the reason behind the default value of 10,000, and on whether setting it to, for example, 250 could harm anything? Thanks!

manast commented Jan 13, 2020

In legacy Bull we used PubSub for delivering global events. This works well but has some important limitations:

  • you do not get any guarantee that your event listeners will receive all the events; due to network issues, for example, you may lose some events.
  • it is almost impossible to build a UI that uses the events to update the status of the jobs.

These two problems are solved by using streams. As long as you have an eventId that represents your last received event, you can replay events until you catch up to real time. Soon all the getters will also return the last eventId, so that you can update a UI relying on the events.
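As a rough sketch of what catching up looks like on the consumer side (the queue name, connection details, and stored event id are placeholder assumptions):

```ts
import { QueueEvents } from 'bullmq';

// After a disconnect, resume from the last event id you persisted;
// everything newer is replayed from the Redis stream before the
// listener continues in real time.
const lastSeenId = '1578907000000-0'; // hypothetical stored event id

const events = new QueueEvents('dataset-processing', {
  connection: { host: '127.0.0.1', port: 6379 },
  lastEventId: lastSeenId,
});

events.on('completed', ({ jobId, returnvalue }) => {
  // update the UI state for jobId here
});
```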

In your case, if you do not care about global events you can set this value to 1. If you do care, then just find a value that is dimensioned for your queues.

The thinking here can be something like: how many seconds of events would I like to keep so that I can handle a network partition or similar?
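As a back-of-the-envelope example (all numbers are illustrative assumptions, not recommendations):

```ts
// Keep roughly one minute of events so a briefly disconnected
// listener can replay everything it missed.
const jobsPerSecond = 50;  // your measured throughput
const eventsPerJob = 4;    // e.g. added, active, progress, completed
const windowSeconds = 60;  // outage you want to survive

const maxLen = jobsPerSecond * eventsPerJob * windowSeconds; // 12000
```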

@dirkgroenen

Thanks for your explanation @manast. 👍
