Critical: Relay 'Reconnect to Relays' button disconnects but never reconnects; persists across app restart and cache clear #86

@variablefate

Severity

Critical — effectively bricks both apps (rider + drivestr). No Nostr → no rides, no offers, no chat, no profile/wallet sync. User cannot recover via normal means (force-close, cache clear all fail).

Steps to Reproduce

  1. Open the app, navigate to Settings → Relay Management.
  2. Tap "Reconnect to Relays".
  3. Observe: all relays drop to disconnected and never come back.
  4. Tap the button again → no effect (does not force reconnect).
  5. Force-close the app and relaunch → still no relay connections.
  6. Clear app cache → still no relay connections.

Expected Behavior

  • Tapping "Reconnect to Relays" should briefly drop and re-establish WebSocket connections to all configured relays.
  • At minimum, a subsequent tap, or a fresh app launch, should restore connectivity.

Actual Behavior

  • All relays go to DISCONNECTED and stay there.
  • The button becomes a no-op on repeat presses.
  • App restart and cache clear do not recover the connection state.

Affected Code (both apps share common/)

Handler wiring (identical in both MainActivities):

onReconnect = {
    nostrService.relayManager.disconnectAll()
    nostrService.relayManager.connectAll()
}

UI button: common/src/main/java/com/ridestr/common/ui/RelayManagementScreen.kt:122 — the isReconnecting flag flips back to false after a fixed delay regardless of actual connection result, which masks failure and produces the "button does nothing" feel on retry.
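
One way to stop the fixed delay from masking failures is to derive the button state from the relays' actual connection states instead of a timer. The following is a minimal sketch, not the app's code: ConnState, ReconnectUi, and reconnectUiState are hypothetical names, and it assumes each relay exposes one of the four states shown.

```kotlin
// Hypothetical model; names are illustrative and not taken from the codebase.
enum class ConnState { DISCONNECTED, CONNECTING, CONNECTED, FAILED }

sealed interface ReconnectUi {
    object InProgress : ReconnectUi
    object AllConnected : ReconnectUi
    data class Failed(val relays: List<String>) : ReconnectUi
}

// Derive what the button/spinner should show from real per-relay state,
// rather than clearing an isReconnecting flag after a fixed delay.
// Note: an empty map yields AllConnected; a real UI would likely special-case it.
fun reconnectUiState(states: Map<String, ConnState>): ReconnectUi {
    val pending = states.filterValues { it == ConnState.CONNECTING }
    if (pending.isNotEmpty()) return ReconnectUi.InProgress
    val down = states.filterValues { it != ConnState.CONNECTED }.keys.sorted()
    return if (down.isEmpty()) ReconnectUi.AllConnected else ReconnectUi.Failed(down)
}
```

With this shape the spinner stays visible while any relay is still CONNECTING, and a failed attempt ends in Failed(...) with the affected relay URLs rather than silently returning to idle.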

Manager / connection:

Suspected Root Cause(s)

The handler calls disconnectAll() and then immediately connectAll(), both on the UI thread. RelayConnection.disconnect() sets shouldReconnect = false and closes the socket asynchronously outside the lock. connect() then sets shouldReconnect = true and opens a new socket. Several things can go wrong:

  1. Stale-callback path: the new socket's onOpen may fire before the old socket's onClosed arrives. The onClosed guard checks socket !== webSocket and bails, but if any callback is ordered differently relative to the state assignment, we can land in a state where _state.value == DISCONNECTED while socket references a now-dead WebSocket.

  2. connect() short-circuit: connect() returns early if state is CONNECTING or CONNECTED. If a previous reconnect attempt left state stuck in CONNECTING (e.g. a never-completed handshake on a torn-down socket), subsequent presses become no-ops. This matches the "second press does nothing" symptom.

  3. Survives restart / cache clear: This is the strongest clue that persisted state is involved. Clearing the cache preserves SharedPreferences (only the cache directory is wiped). If the user has custom relays saved and one or more of them is unreachable, the relay list itself may be the actual problem, yet the UI gives no feedback. Worth verifying whether the user has custom relays configured. Also worth checking that RelayManager is initialized from the current effective relay list rather than RelayConfig.DEFAULT_RELAYS (see RelayManager.kt:43): if NostrService is constructed once with defaults, custom relays set later may never be honored.

  4. No retry budget reset on manual reconnect: reconnectAttempts backoff persists across the manual button press. If we've already backed off to 60s, the user may think nothing's happening when in fact a delayed retry is pending.
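
To make causes 1, 2 and 4 concrete, here is a simplified, single-threaded model of a relay connection. It is a sketch under stated assumptions, not the real RelayConnection: ModelConnection, FakeSocket, the handshakeTimeoutMs parameter, and the manual flag are all hypothetical. It shows a stale-callback guard based on socket identity, a connect() short-circuit that gives up on a CONNECTING state older than a timeout, and a backoff counter that a manual reconnect resets.

```kotlin
// Simplified, single-threaded model of a relay connection. All names are
// hypothetical stand-ins; the real RelayConnection is not reproduced here.
enum class State { DISCONNECTED, CONNECTING, CONNECTED }

class FakeSocket(val id: Int)

class ModelConnection(private val handshakeTimeoutMs: Long = 10_000) {
    var state = State.DISCONNECTED
        private set
    var reconnectAttempts = 0
        private set
    private var socket: FakeSocket? = null
    private var connectingSinceMs = 0L
    private var nextId = 0

    // Returns the socket opened by this call, or null if the call was skipped.
    fun connect(nowMs: Long, manual: Boolean = false): FakeSocket? {
        if (manual) reconnectAttempts = 0             // cause 4: reset backoff
        val stuck = state == State.CONNECTING &&
            nowMs - connectingSinceMs >= handshakeTimeoutMs
        if (stuck) {                                  // cause 2: abandon dead handshake
            socket = null
            state = State.DISCONNECTED
        }
        if (state != State.DISCONNECTED) return null  // normal short-circuit
        val s = FakeSocket(nextId++)
        socket = s
        state = State.CONNECTING
        connectingSinceMs = nowMs
        return s
    }

    fun disconnect() {
        socket = null
        state = State.DISCONNECTED
    }

    fun onOpen(s: FakeSocket) {
        if (s !== socket) return                      // cause 1: ignore stale callback
        state = State.CONNECTED
        reconnectAttempts = 0
    }

    fun onClosed(s: FakeSocket) {
        if (s !== socket) return                      // cause 1: ignore stale callback
        socket = null
        state = State.DISCONNECTED
        reconnectAttempts++
    }
}
```

In this model a late onClosed from the old socket cannot overwrite the new socket's state, and a second press after the timeout opens a fresh socket instead of being swallowed. The real class is multi-threaded, so the same checks would need to happen under its lock.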

Suggested Fix Direction (for triage, not prescriptive)

  • Make onReconnect call a dedicated relayManager.forceReconnectAll() that:
    • Resets reconnectAttempts to 0 on each connection,
    • Awaits actual socket teardown before re-opening (don't fire-and-forget),
    • Re-reads the effective relay list from SettingsRepository so custom relay changes take effect,
    • Returns a result the UI can surface (success / per-relay failure with reason).
  • Surface per-relay errors in the UI rather than only showing aggregate connected count.
  • Add an "are you using custom relays?" diagnostic line on the Relay Management screen.
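
The forceReconnectAll() idea above could look roughly like the sketch below. Everything here is an assumption for illustration: RelayLink, RelayResult, effectiveRelays, and the function signature are hypothetical and not confirmed to exist in common/, and the rule that custom relays replace the defaults is a guess at the intended behavior. The calls are modeled as blocking for simplicity, so in the app this would have to run off the UI thread.

```kotlin
// Hypothetical interfaces; none of these names are confirmed to exist in common/.
sealed interface RelayResult {
    object Connected : RelayResult
    data class Failed(val reason: String) : RelayResult
}

interface RelayLink {
    fun resetBackoff()        // set reconnectAttempts back to 0
    fun closeAndAwait()       // blocks until the old socket is fully torn down
    fun open(): RelayResult   // blocks until connected or failed
}

// Assumption: custom relays, when present, replace the defaults.
fun effectiveRelays(custom: List<String>, defaults: List<String>): List<String> =
    custom.map { it.trim() }.filter { it.isNotEmpty() }.distinct().ifEmpty { defaults }

// Reconnect every relay in order and report a per-relay outcome to the caller.
fun forceReconnectAll(
    urls: List<String>,
    linkFor: (String) -> RelayLink
): Map<String, RelayResult> = urls.associateWith { url ->
    try {
        val link = linkFor(url)
        link.resetBackoff()
        link.closeAndAwait()  // teardown completes before the new socket opens
        link.open()
    } catch (e: Exception) {
        RelayResult.Failed(e.message ?: "unknown error")
    }
}
```

Returning a map keyed by relay URL gives the UI enough to show which relay failed and why, which is what the per-relay error bullet asks for.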

Environment

  • Reported via internal test on 2026-05-14.
  • Branch: claude/cranky-curie-1d8700 (master at fa54d0a).
  • App(s): rider-app and drivestr (shared common/ code — both should be affected; please confirm).

Acceptance Criteria

  • Tapping "Reconnect to Relays" reliably drops and re-establishes connections.
  • On failure, the UI shows which relay(s) failed and why.
  • App relaunch always re-establishes connections (no persistent stuck state).
  • Custom relay edits take effect on next reconnect without app restart.
  • Integration test covering rapid disconnect → connect cycle and the "second press" case.

Labels

  • bug: Something isn't working
