Workers stop processing jobs after Redis reconnect #648

Closed
akramali86 opened this issue Jul 19, 2021 · 6 comments · Fixed by #655

Labels: bug (Something isn't working), released

Comments

@akramali86

In production we're using Amazon ElastiCache with BullMQ ^1.34.2.

We're finding that in the event of a failover, the workers emit the following error: "UNBLOCKED force unblock from blocking operation, instance state changed (master -> replica?)" and stop processing jobs, although jobs can still be queued.

Currently we have to redeploy our app to rectify this issue. Is there anything we can do to handle this error so that the workers start processing jobs again when Redis reconnects? Thanks.

manast (Contributor) commented Jul 20, 2021

Yeah, I think I know why this happens. There is a loop inside BullMQ that throws an exception in this case and stops looping. We have a fix in older Bull that I can port to BullMQ, which should resolve the issue.
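
For readers following along, here is a minimal sketch of the failure mode manast describes; the function names are illustrative only, not actual BullMQ internals:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fragile loop: if getNextJob() rejects (e.g. Redis replies with
// "UNBLOCKED ... instance state changed" during a failover), the
// exception escapes and the loop stops for good, even though other
// clients can still enqueue jobs.
async function fragileRunLoop(getNextJob, processJob) {
    while (true) {
        const job = await getNextJob();
        if (job) await processJob(job);
    }
}

// The kind of fix being ported: treat connection-level errors as
// retryable, so the loop survives a reconnect instead of dying.
async function resilientRunLoop(getNextJob, processJob) {
    while (true) {
        try {
            const job = await getNextJob();
            if (job) await processJob(job);
        } catch (err) {
            console.error('Run loop error, retrying shortly:', err.message);
            await sleep(5000);
        }
    }
}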

akramali86 (Author) commented Jul 20, 2021

Thanks @manast. In the meantime, would you see any issue with us manually calling the run method on an interval to restart the worker? I know it's not very elegant, but it seems to work. Just wondering if it could cause any memory leaks etc.

Example:

const { Worker } = require('bullmq');

// The queue name and processor here are placeholders for the real ones.
const worker = new Worker('worker', async (job) => {
    // ... job processing goes here ...
});

// Every minute, restart the worker's run loop if it has stopped
// (and is not in the middle of closing).
setInterval(() => {
    if (!worker.running && !worker.closing) {
        console.log('Restarting worker');
        worker.run().catch(() => {
            worker.running = false;
            console.log('Could not restart worker');
        });
    }
}, 60000);

manast (Contributor) commented Jul 20, 2021

I don't see any issue at first glance; it should work.

manast added the bug label on Jul 21, 2021
sven-codeculture (Contributor) commented

We were having the same issue a few days ago (the worker just died without any notification), but after introducing the isRunning() check on the worker, everything seems to work fine when we simply restart the workers. (We do this in Kubernetes via a failing health check, which restarts the worker pod if it dies.)
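
A minimal sketch of that health-check pattern, assuming an Express server (the port and endpoint path are illustrative):

const express = require('express');
const { Worker } = require('bullmq');

const worker = new Worker('worker', async (job) => {
    // ... job processing goes here ...
});

const app = express();

// Kubernetes liveness probe endpoint: report unhealthy when the
// worker's run loop has stopped, so the orchestrator restarts the pod.
app.get('/healthz', (req, res) => {
    if (worker.isRunning()) {
        res.status(200).send('ok');
    } else {
        res.status(500).send('worker not running');
    }
});

app.listen(3000);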

manast (Contributor) commented Jul 22, 2021

@sven-codeculture what do you mean by "it died without any notification"?

github-actions (bot) commented
🎉 This issue has been resolved in version 1.40.0 🎉

The release is available on npm and as a GitHub release.

Your semantic-release bot 📦🚀
