
Jobs randomly failing with error messages, but worker never gets called #1330

Closed
soulchild opened this issue Jul 20, 2022 · 2 comments

@soulchild

We recently switched from a self-hosted Redis instance to the Azure Cache for Redis SaaS offering.

At first, everything worked as expected. Then, a couple of days ago, the queue started behaving in weird ways: sometimes adding a job works flawlessly, but adding the exact same job a second time shortly after the first sporadically results in the job failing immediately.

We added console logging to the worker events (active, error, etc.) and to the processing code as well, but we get absolutely no output. It looks like the job processing code doesn't run at all. What makes this even weirder is that the job fails with an error message which is clearly from our job processing code, yet logging something right before throwing that error yields no output. We also added job.log() calls in our worker event handlers, but the log on the failed job is empty.
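For reference, the diagnostic listeners described above can be sketched roughly like this (the queue name, connection details, and the `describeFailure` helper are illustrative assumptions, not the reporter's actual code):

```javascript
// Hypothetical sketch of wiring diagnostic listeners to a BullMQ Worker.
// The Worker construction is commented out so the sketch stands alone;
// 'myQueue' and the connection object are assumptions.

// Pure helper so failure logs have a consistent shape across listeners.
function describeFailure(jobId, err) {
  return `job ${jobId} failed: ${err.message}`;
}

// const { Worker } = require('bullmq');
// const worker = new Worker('myQueue', async (job) => { /* process */ }, { connection });
// worker.on('active', (job) => console.log(`job ${job.id} active`));
// worker.on('failed', (job, err) => console.log(describeFailure(job.id, err)));
// worker.on('error', (err) => console.error('worker error:', err));
```

If the `active` listener never fires for a job that nonetheless ends up failed, the worker process that logged it is not the one that consumed the job.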

So, to sum this up: The jobs seem to fail sporadically without even invoking the processing code and pulling some (maybe old?) error message seemingly out of thin air.

I switched back to the self-hosted Redis instance and the problem went away immediately.

I've used bullmq (always with self-hosted Redis instances) for quite some time in different projects now and never saw anything like this.

Is it possible that the bullmq data in the Azure Redis instance somehow got corrupted or something like that? Does this ring any bell for the maintainers?

Thank you! 🙏

@manast
Contributor

manast commented Jul 20, 2022

It is difficult to make a good assessment without more information, such as the content of the failed jobs' error messages. One thing I would check in your case is whether the "maxmemory-policy" setting on the hosted Redis instance is set to "noeviction" (https://redis.io/docs/manual/eviction/); otherwise Redis will behave as a cache and evict random keys, which can of course make the queue behave very strangely.
If the setting is correct, I advise you to create a smaller test app using the hosted Redis instance and keep adding features until you trigger the strange behaviour; that usually gives you the information needed to sort out the problem.
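The eviction-policy check suggested above can be automated at startup. A minimal sketch, assuming an ioredis client and the hostname shown being a placeholder (CONFIG GET replies with a flat name/value array, which the helper below inspects):

```javascript
// Sketch: verify the Redis eviction policy before trusting it with BullMQ data.
// CONFIG GET 'maxmemory-policy' returns a flat ["maxmemory-policy", "<value>"]
// pair; this pure helper checks that the value is "noeviction".
function isNoEviction(configReply) {
  const idx = configReply.indexOf('maxmemory-policy');
  return idx !== -1 && configReply[idx + 1] === 'noeviction';
}

// Hypothetical usage with ioredis (host is a placeholder):
// const Redis = require('ioredis');
// const redis = new Redis({ host: 'my-cache.redis.cache.windows.net', port: 6380 });
// redis.config('GET', 'maxmemory-policy').then((reply) => {
//   if (!isNoEviction(reply)) console.warn('Redis may evict BullMQ keys!');
// });
```

Note that some managed Redis offerings default to an LRU policy, so this is worth checking explicitly rather than assuming.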

@soulchild
Author

soulchild commented Jul 20, 2022

Of course, it was all our fault: a second (older) instance of our application was still running elsewhere, pointing at the same Azure Redis instance and basically snatching jobs from the other system. D'oh! 😂
