Workers stop processing jobs after Redis reconnect #648

Closed
akramali86 opened this issue Jul 19, 2021 · 6 comments · Fixed by #655

Labels: bug (Something isn't working), released

Comments

@akramali86

In production we're using Amazon ElastiCache with BullMQ ^1.34.2.

We're finding that in the event of a failover, the workers emit the following error: "UNBLOCKED force unblock from blocking operation, instance state changed (master -> replica?)" and stop processing jobs, although jobs can still be queued.

Currently we have to redeploy our app to rectify this issue. Is there anything we can do to handle this error so that the workers start processing jobs again when Redis reconnects? Thanks.

manast (Contributor) commented Jul 20, 2021

Yeah, I think I know why this happens. There is a loop inside BullMQ that throws an exception in this case and stops looping. We have a fix in older Bull that I can port to BullMQ, which should resolve the issue.
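
For readers following along, here is a minimal sketch of the failure mode manast describes; the function names are illustrative only, not actual BullMQ internals:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fragile loop: if getNextJob() rejects (e.g. Redis replies with
// "UNBLOCKED ... instance state changed" during a failover), the
// exception escapes and the loop stops for good, even though other
// clients can still enqueue jobs.
async function fragileRunLoop(getNextJob, processJob) {
    while (true) {
        const job = await getNextJob();
        if (job) await processJob(job);
    }
}

// The kind of fix being ported: treat connection-level errors as
// retryable, so the loop survives a reconnect instead of dying.
async function resilientRunLoop(getNextJob, processJob) {
    while (true) {
        try {
            const job = await getNextJob();
            if (job) await processJob(job);
        } catch (err) {
            console.error('Run loop error, retrying shortly:', err.message);
            await sleep(5000);
        }
    }
}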

akramali86 (Author) commented Jul 20, 2021

Thanks @manast. In the meantime, would you see any issue with us manually calling the run method on an interval to restart the worker? I know it's not very elegant, but it seems to work. Just wondering if it could cause any memory leaks etc.

Example:

const { Worker } = require('bullmq');

// The queue name and processor here are placeholders for the real ones.
const worker = new Worker('worker', async (job) => {
    // ... job processing goes here ...
});

// Every minute, restart the worker's run loop if it has stopped
// (and is not in the middle of closing).
setInterval(() => {
    if (!worker.running && !worker.closing) {
        console.log('Restarting worker');
        worker.run().catch(() => {
            worker.running = false;
            console.log('Could not restart worker');
        });
    }
}, 60000);

manast (Contributor) commented Jul 20, 2021

I don't see any issue at first glance; it should work.

manast added the bug label on Jul 21, 2021
sven-codeculture (Contributor) commented

We were having the same issue a few days ago (the worker just died without any notification), but after introducing the isRunning() check on the worker, everything seems to work fine when we simply restart the workers. (We do this in Kubernetes via a failing health check, which restarts the worker pod if it dies.)
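
A minimal sketch of that health-check pattern, assuming an Express server (the port and endpoint path are illustrative):

const express = require('express');
const { Worker } = require('bullmq');

const worker = new Worker('worker', async (job) => {
    // ... job processing goes here ...
});

const app = express();

// Kubernetes liveness probe endpoint: report unhealthy when the
// worker's run loop has stopped, so the orchestrator restarts the pod.
app.get('/healthz', (req, res) => {
    if (worker.isRunning()) {
        res.status(200).send('ok');
    } else {
        res.status(500).send('worker not running');
    }
});

app.listen(3000);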

manast (Contributor) commented Jul 22, 2021

@sven-codeculture what do you mean by "it died without any notification"?

github-actions (bot) commented
🎉 This issue has been resolved in version 1.40.0 🎉

The release is available on npm and as a GitHub release.

Your semantic-release bot 📦🚀
