Summary
We have been using Graphile Worker in production for a while now, specifically the version from this PR (#474), and it works remarkably well. However, every few months the LISTEN/NOTIFY connection seems to drop.
Our logs show ECONNREFUSED errors at about the time this starts:
ERROR: Failed during pool sweep (migrationNumber=19): Error: connect ECONNREFUSED
...
ERROR: Failed to update heartbeat for pool pool-7699a0aba218dd3402: Error: connect ECONNREFUSED
After that, the log frequency makes it pretty obvious that we are only polling every few minutes and are no longer using LISTEN/NOTIFY.
(Screenshot not included: the upload was not working; I'll retry in a few minutes.)
The "fix" is simply to restart the worker.
Steps to reproduce
Not really sure; it just happens irregularly. Sorry!
Expected results
The worker should recover and resume using LISTEN/NOTIFY.
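For illustration, the kind of recovery we would expect might look roughly like the sketch below. This is not how Graphile Worker is implemented internally: `ListenClient`, `listenForever`, `backoffDelay` and the options object are all hypothetical names, and the interface is only loosely modeled on node-postgres's `Client`. It assumes that a dropped connection is surfaced as an `"error"` event. Whenever connecting or `LISTEN` fails (e.g. with ECONNREFUSED) or an established connection errors, it waits with capped exponential backoff and reconnects instead of giving up.

```typescript
// Hypothetical minimal interface, loosely modeled on node-postgres's Client.
// An assumption for illustration, not Graphile Worker's internal API.
interface ListenClient {
  query(sql: string): Promise<unknown>;
  on(event: "notification", cb: (msg: { channel: string; payload?: string }) => void): void;
  on(event: "error", cb: (err: Error) => void): void;
  end(): Promise<void>;
}

interface ListenOptions {
  baseMs?: number; // first retry delay
  maxMs?: number; // upper bound on the retry delay
  shouldStop?: () => boolean; // only checked between connection attempts
  onError?: (err: unknown) => void; // e.g. log "connect ECONNREFUSED"
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Capped exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... up to maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Keeps a LISTEN subscription alive by reconnecting after every failure,
// rather than silently falling back to polling only.
async function listenForever(
  connect: () => Promise<ListenClient>,
  channel: string, // assumed trusted: interpolated as a quoted identifier
  onNotify: (payload?: string) => void,
  opts: ListenOptions = {},
): Promise<void> {
  const { baseMs = 1000, maxMs = 30000 } = opts;
  let attempt = 0;
  while (!opts.shouldStop?.()) {
    try {
      const client = await connect();
      try {
        // Resolves once this connection fails; we then reconnect.
        const lost = new Promise<Error>((resolve) => client.on("error", resolve));
        client.on("notification", (msg) => {
          if (msg.channel === channel) onNotify(msg.payload);
        });
        await client.query(`LISTEN "${channel}"`);
        attempt = 0; // subscribed successfully, so reset the backoff
        opts.onError?.(await lost);
      } finally {
        // Close the (possibly broken) connection; ignore failures here.
        await client.end().catch(() => undefined);
      }
    } catch (err) {
      opts.onError?.(err); // connect() or LISTEN failed, e.g. ECONNREFUSED
    }
    await sleep(backoffDelay(attempt++, baseMs, maxMs));
  }
}
```

Resetting the backoff only after `LISTEN` succeeds means a database that keeps refusing connections is retried at most every `maxMs`, while a connection that drops after working normally is re-established after `baseMs`.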
Actual results
LISTEN/NOTIFY never recovers.
Additional context
Postgres v15
Node 20