
Jobs randomly failing with error messages, but worker never gets called #1330

Closed
soulchild opened this issue Jul 20, 2022 · 2 comments

@soulchild

We recently switched from a self-hosted Redis instance to the Azure Cache for Redis SaaS offering.

At first, everything worked as expected. Then, a couple of days ago, the queue started behaving in weird ways: sometimes adding a job works flawlessly, but adding the exact same job a second time shortly after the first sporadically results in the job failing immediately.

We added console logging to the worker events (active, error, etc.) and to the processing code as well, but we get absolutely no output. It looks like the job processing code doesn't run at all. What makes this even weirder is that the job fails with an error message which is clearly from our job processing code, yet logging something right before throwing that error yields no output. We also added job.log() calls in our worker event handlers, but the log on the failed job is empty.
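For reference, the diagnostic listeners described above can be sketched roughly like this (the queue name, connection details, and the `describeFailure` helper are illustrative assumptions, not the reporter's actual code):

```javascript
// Hypothetical sketch of wiring diagnostic listeners to a BullMQ Worker.
// The Worker construction is commented out so the sketch stands alone;
// 'myQueue' and the connection object are assumptions.

// Pure helper so failure logs have a consistent shape across listeners.
function describeFailure(jobId, err) {
  return `job ${jobId} failed: ${err.message}`;
}

// const { Worker } = require('bullmq');
// const worker = new Worker('myQueue', async (job) => { /* process */ }, { connection });
// worker.on('active', (job) => console.log(`job ${job.id} active`));
// worker.on('failed', (job, err) => console.log(describeFailure(job.id, err)));
// worker.on('error', (err) => console.error('worker error:', err));
```

If the `active` listener never fires for a job that nonetheless ends up failed, the worker process that logged it is not the one that consumed the job.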

So, to sum this up: The jobs seem to fail sporadically without even invoking the processing code and pulling some (maybe old?) error message seemingly out of thin air.

I switched back to the self-hosted Redis instance and the problem went away immediately.

I've used bullmq (always with self-hosted Redis instances) for quite some time in different projects now and never saw anything like this.

Is it possible that the bullmq data in the Azure Redis instance somehow got corrupted or something like that? Does this ring any bell for the maintainers?

Thank you! 🙏

@manast
Contributor

manast commented Jul 20, 2022

It is difficult to make a good assessment without more information, such as the content of the failed jobs' error messages. One thing I would check in your case is whether the "maxmemory-policy" setting on the hosted Redis instance is set to "noeviction" (https://redis.io/docs/manual/eviction/); otherwise Redis will behave as a cache and evict random keys, which can of course make the queue behave very strangely.
If the setting is correct, I advise you to create a smaller test app using the hosted Redis instance and keep adding features until you trigger the strange behaviour; that usually gives you the information needed to sort out the problem.
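The eviction-policy check suggested above can be automated at startup. A minimal sketch, assuming an ioredis client and the hostname shown being a placeholder (CONFIG GET replies with a flat name/value array, which the helper below inspects):

```javascript
// Sketch: verify the Redis eviction policy before trusting it with BullMQ data.
// CONFIG GET 'maxmemory-policy' returns a flat ["maxmemory-policy", "<value>"]
// pair; this pure helper checks that the value is "noeviction".
function isNoEviction(configReply) {
  const idx = configReply.indexOf('maxmemory-policy');
  return idx !== -1 && configReply[idx + 1] === 'noeviction';
}

// Hypothetical usage with ioredis (host is a placeholder):
// const Redis = require('ioredis');
// const redis = new Redis({ host: 'my-cache.redis.cache.windows.net', port: 6380 });
// redis.config('GET', 'maxmemory-policy').then((reply) => {
//   if (!isNoEviction(reply)) console.warn('Redis may evict BullMQ keys!');
// });
```

Note that some managed Redis offerings default to an LRU policy, so this is worth checking explicitly rather than assuming.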

@soulchild
Author

soulchild commented Jul 20, 2022

Of course, it was all our fault: a second (older) instance of our application was still running elsewhere, pointing at the same Azure Redis instance and basically snatching jobs from the other system. D'oh! 😂
