world-postgres: stream readers can stall after LISTEN disconnects or missed NOTIFY events (#1855)

## Bug

`@workflow/world-postgres` currently relies on PostgreSQL `LISTEN/NOTIFY` for live stream chunk delivery.

This is fragile: `NOTIFY` is only a wake-up signal for currently connected listeners, not a durable backlog. If the dedicated `LISTEN workflow_event_chunk` client disconnects, or if a notification is missed during reconnect, chunks can still be written to the `streams` table successfully while active readers stop receiving live updates indefinitely.

The underlying principle: the `streams` table is the source of truth, and `LISTEN/NOTIFY` should only wake readers so that they re-query for chunks newer than their last delivered `chunk_id`.
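To make the "wake up, then re-query" idea concrete, here is a minimal sketch of the catch-up query. It is not the package's actual code: the `Queryable` interface, the function name `fetchChunksAfter`, and the `stream_id` and `chunk_data` column names are assumptions for illustration (only the `streams` table and `chunk_id` are named in this issue). It also assumes `chunk_id` values sort in write order.

```typescript
// Hypothetical row shape; `stream_id` and `chunk_data` are assumed column names.
interface ChunkRow {
  chunk_id: string;
  chunk_data: Uint8Array;
}

// Minimal query interface for illustration. node-postgres clients expose a
// promise-returning `query(text, values)` whose result carries `rows`.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: ChunkRow[] }>;
}

// Returns every chunk written after `lastChunkId`, oldest first.
// Passing `null` means nothing has been delivered yet (initial load).
async function fetchChunksAfter(
  db: Queryable,
  streamId: string,
  lastChunkId: string | null,
): Promise<ChunkRow[]> {
  const cursorClause = lastChunkId === null ? "" : "AND chunk_id > $2";
  const values = lastChunkId === null ? [streamId] : [streamId, lastChunkId];
  const { rows } = await db.query(
    `SELECT chunk_id, chunk_data
       FROM streams
      WHERE stream_id = $1 ${cursorClause}
      ORDER BY chunk_id ASC`,
    values,
  );
  return rows;
}
```

Because the query is keyed on the reader's own cursor rather than on the notification payload, it returns the same result whether it was triggered by a `NOTIFY`, a reconnect, or a timer.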

## Symptoms

In production, after the dedicated LISTEN client is dropped:

- `writeToStream(...)` continues inserting chunk rows successfully.
- `pg_notify(...)` continues executing successfully, but no listener in the affected process receives the notification.
- `readFromStream(...)` readers may receive the initial query batch, then never receive subsequent chunks.
- Restarting the pod restores delivery until the next LISTEN disconnect.

Live delivery in the affected process halts silently, while the persisted stream rows remain intact.

## Proposed fix

This needs two layers:

1. Make `listenChannel` resilient:
   - attach `error` and `end` handlers to the dedicated `pg.Client`
   - reconnect with bounded exponential backoff
   - re-run `LISTEN workflow_event_chunk` after reconnect
   - stop reconnect attempts on `close()`

2. Make `readFromStream` resilient to missed notifications:
   - keep a per-reader `lastChunkId`
   - load initial chunks from the `streams` table
   - on notification, query `streams WHERE chunk_id > lastChunkId`
   - periodically run the same query as a polling fallback
   - dedupe/order by `chunk_id`
   - stop polling on EOF, cancel, or controller close
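The first layer could look roughly like the sketch below. This is an illustration under stated assumptions, not the current `listenChannel` implementation: the options shape, `createClient`, `onConnected`, `delayMs`, and the backoff constants are all hypothetical. The `ListenClient` interface mirrors the subset of `pg.Client` that the loop needs (`connect`, `query`, `end`, and the `notification`, `error`, and `end` events), so with node-postgres the factory would presumably be something like `() => new Client(connectionString)`.

```typescript
// Illustrative slice of a LISTEN-capable client, not the package's real types.
interface ListenClient {
  connect(): Promise<unknown>;
  query(text: string): Promise<unknown>;
  end(): Promise<unknown>;
  on(event: "notification", cb: (msg: { channel: string; payload?: string }) => void): unknown;
  on(event: "error", cb: (err: Error) => void): unknown;
  on(event: "end", cb: () => void): unknown;
}

interface ListenOptions {
  createClient: () => ListenClient; // hypothetical factory, one fresh client per attempt
  channel: string; // must be a trusted identifier: it is interpolated into SQL
  onNotify: (payload?: string) => void;
  onConnected?: () => void; // hook so readers can catch up right after a reconnect
  delayMs?: (attempt: number) => number; // injectable for tests
}

// Bounded exponential backoff: 250 ms, 500 ms, 1 s, ... capped at 30 s (assumed values).
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function listenChannel(opts: ListenOptions): { close(): Promise<void> } {
  const delayMs = opts.delayMs ?? backoffDelayMs;
  let closed = false;
  let attempt = 0;
  let wake: () => void = () => {}; // interrupts whichever wait the loop is parked in

  const pause = (ms: number) =>
    new Promise<void>((resolve) => {
      const timer = setTimeout(resolve, ms);
      wake = () => {
        clearTimeout(timer);
        resolve();
      };
    });

  const run = async (): Promise<void> => {
    while (!closed) {
      let client: ListenClient | undefined;
      try {
        const c = opts.createClient();
        client = c;
        const lost = new Promise<void>((resolve) => {
          wake = resolve;
          c.on("error", () => resolve()); // also avoids an unhandled 'error' event
          c.on("end", () => resolve());
        });
        c.on("notification", (msg) => {
          if (msg.channel === opts.channel) opts.onNotify(msg.payload);
        });
        await c.connect();
        await c.query(`LISTEN ${opts.channel}`); // re-run after every reconnect
        attempt = 0;
        opts.onConnected?.();
        await lost; // parked here until the connection drops or close() is called
      } catch {
        // connect or LISTEN failed; fall through to backoff
      }
      if (client) void client.end().catch(() => {}); // best-effort cleanup
      if (closed) break; // no reconnect attempts after close()
      await pause(delayMs(attempt++));
    }
  };

  const done = run();
  return {
    close(): Promise<void> {
      closed = true;
      wake();
      return done;
    },
  };
}
```

Even with this in place, any notification sent while the client is between connections is lost, which is why the second layer is still needed. Calling the reader's catch-up from `onConnected` is one way to close that gap immediately rather than waiting for the next poll.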

This makes `world-postgres` stream delivery durable even when the LISTEN connection is interrupted.
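For the second layer, the reader-side logic might be sketched as follows. Everything here is an assumption for illustration: `startReader`, `ReaderOptions`, the `Chunk` shape with an `eof` flag, and the injected `fetchAfter` and `subscribe` functions are hypothetical names, not the package's API. It also assumes chunk IDs are strings that sort lexicographically in write order. `fetchAfter` stands in for the `streams WHERE chunk_id > lastChunkId` query, and `subscribe` stands in for registering with the LISTEN channel.

```typescript
interface Chunk {
  chunkId: string;
  data: Uint8Array;
  eof: boolean; // hypothetical end-of-stream marker
}

interface ReaderOptions {
  fetchAfter: (lastChunkId: string | null) => Promise<Chunk[]>;
  subscribe: (wake: () => void) => () => void; // returns an unsubscribe function
  onChunk: (chunk: Chunk) => void;
  onError: (err: unknown) => void;
  pollIntervalMs: number;
}

function startReader(opts: ReaderOptions): { stop(): void; drain(): Promise<void> } {
  let lastChunkId: string | null = null; // per-reader cursor
  let stopped = false;
  let running: Promise<void> | null = null;
  let rerun = false;

  const catchUp = async (): Promise<void> => {
    const rows = await opts.fetchAfter(lastChunkId);
    const ordered = [...rows].sort((a, b) =>
      a.chunkId < b.chunkId ? -1 : a.chunkId > b.chunkId ? 1 : 0,
    );
    for (const chunk of ordered) {
      if (stopped) return;
      // Dedupe: skip anything at or before the cursor.
      if (lastChunkId !== null && chunk.chunkId <= lastChunkId) continue;
      lastChunkId = chunk.chunkId;
      opts.onChunk(chunk);
      if (chunk.eof) {
        stop();
        return;
      }
    }
  };

  // Serializes catch-ups so a notification and a poll tick never run
  // concurrently; a wake-up that arrives mid-query triggers one more pass.
  const drain = (): Promise<void> => {
    if (stopped) return Promise.resolve();
    if (running) {
      rerun = true;
      return running;
    }
    running = (async () => {
      try {
        do {
          rerun = false;
          await catchUp();
        } while (rerun && !stopped);
      } finally {
        running = null;
      }
    })();
    return running;
  };

  const kick = (): void => {
    drain().catch(opts.onError);
  };
  const unsubscribe = opts.subscribe(kick); // notification path
  const timer = setInterval(kick, opts.pollIntervalMs); // polling fallback

  function stop(): void {
    if (stopped) return;
    stopped = true;
    clearInterval(timer);
    unsubscribe();
  }

  kick(); // initial load from the table
  return { stop, drain };
}
```

In a `ReadableStream`-based reader, `onChunk` would presumably enqueue into the controller and `stop()` would be called from `cancel`. A failed query is reported through `onError` and retried on the next notification or poll tick, since the cursor only advances after a chunk is delivered.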

## Relation to other work

This is compatible with #1847, but it is a lower-level `world-postgres` reliability issue. Core-level stream reconnect cannot recover notifications that PostgreSQL never delivered to a disconnected LISTEN client. The Postgres world still needs to treat the table as the durable source of truth.

