
[FEATURE] Implement reconnect logic and exponential backoff in Subscriber #11

Merged: DmytroHavryshGoTo merged 2 commits into main from feature/reconnect on May 11, 2026.



@DmytroHavryshGoTo (Contributor)

Summary

Make RedisStream::Subscriber.listen survive Redis restarts. Previously the listener crashed on any connection drop; now it logs the disconnect, retries with exponential backoff (using PING to detect recovery), and resumes consumption. If Redis comes back without persistence (or the stream or group was wiped), the resulting NOGROUP error is detected and the consumer groups are recreated automatically.

What changed

lib/redis_stream/subscriber.rb:

  • Wrapped the xreadgroup loop with rescues for Redis::BaseConnectionError and Redis::CommandError.
  • On connection error: call wait_for_reconnect, which probes with PING under exponential backoff (starting at 0.5s, doubling after each failure, capped at 30s) until Redis is reachable, then resumes the consume loop.
  • On NOGROUP: call ensure_groups to recreate consumer groups on all streams, then resume.
  • Other CommandErrors still propagate (e.g. WRONGTYPE).
  • Group creation extracted into ensure_groups(streams, group), which is called both on startup and after NOGROUP.
  • Diagnostic logging added at every state change (disconnect, each reconnect attempt and its outcome, reconnect success with measured downtime, group creation, NOGROUP recovery). Logs go to stderr via warn with a [redis_stream] prefix.
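The error routing described in the first bullets (connection error → wait for reconnect, NOGROUP → recreate groups, any other command error → propagate) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: SketchConnectionError and SketchCommandError are hypothetical stand-ins for Redis::BaseConnectionError and Redis::CommandError so the sketch runs without the redis gem, and consume_with_recovery, read_batch, on_disconnect, on_nogroup and max_iterations are invented names (max_iterations exists only to make the loop testable).

```ruby
# Hypothetical stand-ins for Redis::BaseConnectionError and Redis::CommandError,
# so this sketch runs without the redis gem installed.
class SketchConnectionError < StandardError; end
class SketchCommandError < StandardError; end

# Runs one consume step per iteration and routes failures:
#   connection error          -> log, then run the reconnect handler
#   command error "NOGROUP"   -> log, then run the group-recreation handler
#   any other command error   -> re-raised to the caller (e.g. WRONGTYPE)
def consume_with_recovery(read_batch:, on_disconnect:, on_nogroup:, max_iterations: Float::INFINITY)
  iterations = 0
  while iterations < max_iterations
    iterations += 1
    begin
      read_batch.call
    rescue SketchConnectionError => e
      warn "[redis_stream] connection lost: #{e.message}"
      on_disconnect.call
    rescue SketchCommandError => e
      # Redis prefixes the missing-group error with "NOGROUP".
      raise unless e.message.start_with?("NOGROUP")
      warn "[redis_stream] NOGROUP detected, recreating consumer groups"
      on_nogroup.call
    end
  end
end
```

Matching on the NOGROUP prefix of the error message is an assumption about how the detection is done; it relies on Redis reporting the missing group or stream with that prefix.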

PING is used instead of xreadgroup for recovery detection because xreadgroup blocks while waiting for messages. Without a probe that returns immediately, we couldn't distinguish "Redis is back but there's no traffic" from "Redis is still down."
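A minimal sketch of a PING-based wait loop with doubling delays, assuming the probe is passed in as a callable (in real code it might be something like -> { redis.ping }). The method name wait_for_reconnect comes from the PR, but the keyword arguments, the injected sleeper, and the wording of the per-attempt log line are hypothetical; only the final reconnect message follows the format quoted in this PR. It rescues StandardError for brevity where real code would rescue only connection errors.

```ruby
# Sketch: call `ping` until it succeeds, sleeping between failed attempts with a
# delay that doubles each time and never exceeds `max`. `sleeper` is injected
# so tests can record delays instead of actually sleeping.
def wait_for_reconnect(ping:, initial: 0.5, max: 30.0, sleeper: ->(s) { sleep(s) })
  # Monotonic clock: unaffected by NTP or other wall-clock adjustments.
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  delay = initial
  attempts = 0
  begin
    attempts += 1
    ping.call
  rescue StandardError => e # real code: rescue only connection errors
    warn format("[redis_stream] reconnect attempt %d failed (%s); retrying in %.1fs",
                attempts, e.message, delay)
    sleeper.call(delay)
    delay = [delay * 2, max].min
    retry # re-runs the begin block, i.e. the next probe
  end
  downtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  warn format("[redis_stream] reconnected to redis after %d attempt(s); downtime %.2fs",
              attempts, downtime)
  { attempts: attempts, downtime: downtime }
end
```

Because PING answers immediately whether or not any stream has traffic, each probe cleanly tells "up" from "down", which a blocking xreadgroup call cannot.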

Behavior

| Scenario | Before | After |
|---|---|---|
| Redis restart | Listener crashes | Logs disconnect, retries with backoff, resumes |
| Redis flushed / different instance (NOGROUP) | Listener crashes | Recreates groups, resumes |
| Network blip | Listener crashes | Reconnects (typically <1s) |
| WRONGTYPE or other command errors | Propagates | Propagates (unchanged) |
| Broken poison message handler | Propagates | Propagates (unchanged) |
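Recreating groups after NOGROUP comes down to issuing XGROUP CREATE with MKSTREAM on each stream and treating BUSYGROUP (group already exists) as success. The sketch below is not the PR's ensure_groups: it takes the client as an extra argument and assumes it responds to xgroup the way redis-rb's Redis#xgroup does. The "$" start ID and the log wording are also assumptions, since the PR description doesn't say which ID it uses.

```ruby
# Sketch of ensure_groups: create `group` on every stream, creating a missing
# stream as well (MKSTREAM). A group that already exists (BUSYGROUP) counts as
# success; any other error propagates.
def ensure_groups(client, streams, group)
  streams.each do |stream|
    begin
      # Same argument shape as redis-rb: xgroup(:create, key, group, id, mkstream: true)
      client.xgroup(:create, stream, group, "$", mkstream: true)
      warn "[redis_stream] created consumer group #{group} on #{stream}"
    rescue StandardError => e # real code would rescue Redis::CommandError
      raise unless e.message.start_with?("BUSYGROUP")
    end
  end
end
```

With "$", a recreated group only receives entries added after it is created; "0" would instead deliver everything already in the stream. That choice matters if producers wrote to the stream between the Redis restart and the group being recreated.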

Backoff caps at 30s. Downtime is measured with Process::CLOCK_MONOTONIC (immune to NTP jumps) and reported in the reconnect log line, e.g. "reconnected to redis after 4 attempt(s); downtime 3.50s".
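The delay schedule itself can be written as a closed-form helper. This is a sketch only: backoff_delay and reconnect_message are hypothetical helper names, and the constants use the names suggested in review (INITIAL_RECONNECT_BACKOFF, MAX_RECONNECT_BACKOFF); whether the merged code adopted them isn't shown here.

```ruby
INITIAL_RECONNECT_BACKOFF = 0.5 # seconds before the second probe
MAX_RECONNECT_BACKOFF = 30.0    # upper bound on any single wait

# Delay to sleep after the Nth consecutive failed probe (N starts at 1):
# 0.5s, then doubling, never above the cap.
def backoff_delay(failed_attempts)
  [INITIAL_RECONNECT_BACKOFF * (2**(failed_attempts - 1)), MAX_RECONNECT_BACKOFF].min
end

# Formats the reconnect log line in the shape quoted in the PR.
def reconnect_message(attempts, downtime)
  format("reconnected to redis after %d attempt(s); downtime %.2fs", attempts, downtime)
end

schedule = (1..9).map { |n| backoff_delay(n) }
# => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

Under this schedule, three failed probes sleep 0.5 + 1 + 2 = 3.5s before the fourth attempt. If the first probe fires immediately after the disconnect (an assumption), that adds up to roughly the 3.50s of downtime in the example log line.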

Sample output

(Screenshot of the sample log output; image not reproduced.)


@mesmerze left a comment


could you please consider #12 🙏

Comment thread on lib/redis_stream/subscriber.rb (outdated)
Comment on lines +5 to +6
INITIAL_BACKOFF = 0.5
MAX_BACKOFF = 30.0

INITIAL_RECONNECT_BACKOFF
MAX_RECONNECT_BACKOFF
🙏

@DmytroHavryshGoTo merged commit d197952 into main on May 11, 2026
1 check passed
@DmytroHavryshGoTo deleted the feature/reconnect branch on May 11, 2026 at 14:39
3 participants