LISTEN/NOTIFY sometimes drops without recovery #541

Open
@psteinroe

Summary

We have been using Graphile Worker in production for a while now, more specifically this PR (#474). It works remarkably well. However, every few months the LISTEN/NOTIFY connection seems to drop.

Our logs show ECONNREFUSED at about the time this starts.

ERROR: Failed during pool sweep (migrationNumber=19): Error: connect ECONNREFUSED
...
ERROR: Failed to update heartbeat for pool pool-7699a0aba218dd3402: Error: connect ECONNREFUSED

After that, the log frequency makes it pretty obvious that the worker only polls every few minutes and no longer uses LISTEN/NOTIFY.

(upload not working, retrying in a few minutes)

The "fix" is to simply restart the worker.

Steps to reproduce

Not really sure; it just happens irregularly. Sorry!

Expected results

A worker should recover.
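
To illustrate the kind of recovery meant here, below is a minimal sketch of a reconnect loop with exponential backoff. It is not Graphile Worker's actual implementation: `ListenClient`, `keepListening`, and `backoffDelay` are hypothetical names, and `connect` is assumed to open a fresh connection and issue `LISTEN` on it again (LISTEN registrations belong to a single Postgres session, so they are lost when the connection drops).

```typescript
// Hypothetical minimal shape of a LISTEN connection (not a pg or Graphile Worker type).
interface ListenClient {
  // Registers a callback that fires when the connection is lost.
  onError(handler: (err: Error) => void): void;
  close(): Promise<void>;
}

interface ReconnectOptions {
  baseDelayMs: number;
  maxDelayMs: number;
  // Injectable for tests; defaults to a setTimeout-based sleep.
  sleep?: (ms: number) => Promise<void>;
}

// Exponential backoff: baseDelayMs * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Keeps a LISTEN connection alive: whenever connecting fails (e.g. ECONNREFUSED)
// or an established connection errors, wait and then try again.
async function keepListening(
  connect: () => Promise<ListenClient>,
  opts: ReconnectOptions,
  stop: { stopped: boolean },
): Promise<void> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let attempt = 0;
  while (!stop.stopped) {
    try {
      // Assumption: connect() also re-issues LISTEN on the new connection.
      const client = await connect();
      attempt = 0; // connected, so reset the backoff
      // Block until the connection reports an error, then fall through to reconnect.
      await new Promise<void>((resolve) => client.onError(() => resolve()));
      await client.close().catch(() => {});
    } catch {
      // Connecting failed; fall through to the backoff below.
    }
    if (stop.stopped) break;
    await sleep(backoffDelay(attempt, opts.baseDelayMs, opts.maxDelayMs));
    attempt++;
  }
}
```

With `baseDelayMs: 1000` and `maxDelayMs: 30000`, failed attempts would be retried after 1s, 2s, 4s, and so on, up to 30s, instead of the listener staying down until a restart.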

Actual results

LISTEN/NOTIFY never recovers.

Additional context

Postgres v15
Node 20

Labels: 🐛 bug, 😓 cannot-reproduce (Someone has attempted but failed to reproduce this; create an example repo to demonstrate the issue.)