fix: bound /dev/tty write so hooks can't hang when Warp UI is unresponsive #37

Open
tebayoso wants to merge 1 commit into warpdotdev:main from tebayoso:tebayoso/fix-warp-notify-hang-on-frozen-warp

tebayoso commented May 8, 2026

Problem

warp-notify.sh writes the OSC 777 notification with no timeout:

printf '\033]777;notify;%s;%s\007' "$TITLE" "$BODY" > /dev/tty 2>/dev/null || true

If the slave PTY's output buffer fills up and stops draining (for example, when Warp's UI is hung and not reading from the master side), this printf blocks indefinitely in the kernel write(). The block then propagates back up the hook chain (warp-notify.sh → on-prompt-submit.sh / on-stop.sh / etc. → the Claude Code session), freezing the whole CLI agent until Warp's renderer recovers.

The trailing || true shows the original author intended this to be best-effort; the gap is that "best-effort" was never bounded in time.

Reproducer

  1. Start a Claude Code session in Warp with this plugin installed.
  2. Cause Warp's renderer to stop draining the slave PTY's output buffer (in my case Warp's UI was unresponsive; osascript -e 'tell application "Warp" to activate' recovered it).
  3. Submit a prompt or trigger any hook that calls warp-notify.sh.

The hook subprocess parks in printf > /dev/tty and the Claude Code session is frozen until Warp drains the buffer. Observed in production as a 36-minute lockup; the only way to recover without kill was to wake Warp's UI from outside.

Diagnosed via lsof -p <hook_pid> (FD 1 → /dev/ttys000, blocked) and confirmed by writing one byte to the same TTY from a different shell — also blocked, ruling out anything specific to the hook process.

Fix

Wrap the printf in a pure-bash watchdog: spawn the writer as a backgrounded subshell, force-kill it after WARP_NOTIFY_TIMEOUT_SEC seconds (default 2) if it hasn't completed. The script stays best-effort (always exits 0). No external timeout(1) dependency, so it works the same on macOS bash 3.2 and Linux bash 5.

{ printf '%s' "$SEQ" > "$TARGET" 2>/dev/null; } &
writer_pid=$!
{ sleep "$TIMEOUT_SEC" 2>/dev/null; kill -KILL "$writer_pid" 2>/dev/null; } &
watchdog_pid=$!
wait "$writer_pid" 2>/dev/null
kill -KILL "$watchdog_pid" 2>/dev/null; wait "$watchdog_pid" 2>/dev/null
exit 0

Worst case impact: a hook adds at most ~2s instead of hanging the session forever. Same patch applied to the legacy variant.
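
For anyone who wants to try the watchdog in isolation, here is a minimal sketch of the same pattern wrapped in a function. The name bounded_write and its argument order are hypothetical and not taken from the patch; the `|| true` guards and the /dev/null redirect on the watchdog are additions for standalone use.

```bash
#!/usr/bin/env bash
# Sketch only: bounded_write is a hypothetical helper, not part of the plugin.

# Write $1 to the path $2, giving up after $3 seconds (default 2).
# Always returns 0 so callers stay best-effort.
bounded_write() {
  local seq=$1 target=$2 timeout_sec=${3:-2}
  local writer_pid watchdog_pid

  # Writer: open() and write() both happen inside this background job,
  # so a blocked target cannot stall the caller.
  { printf '%s' "$seq" > "$target"; } 2>/dev/null &
  writer_pid=$!

  # Watchdog: force-kill the writer once the deadline passes.
  { sleep "$timeout_sec"; kill -KILL "$writer_pid"; } >/dev/null 2>&1 &
  watchdog_pid=$!

  # Returns when the writer finishes or is killed, whichever comes first.
  wait "$writer_pid" 2>/dev/null || true

  # The writer is gone, so the watchdog is no longer needed.
  kill -KILL "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true
  return 0
}
```

Under these assumptions, `bounded_write "$SEQ" /dev/tty 2` should return after roughly two seconds at worst. Sending the watchdog's output to /dev/null should also keep a leftover sleep from holding the caller's stdout or stderr open after the function returns.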

A new WARP_NOTIFY_TARGET env var (default /dev/tty) lets tests redirect output without touching /dev/tty directly — needed because CI has no TTY.
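
A rough sketch of how the two environment variables could be resolved to their defaults. The function name resolve_notify_config and the integer validation are assumptions made for illustration; the patch may handle this differently.

```bash
# Sketch only: resolve_notify_config is a hypothetical helper.
# It sets the globals TARGET and TIMEOUT_SEC used by the watchdog snippet.
resolve_notify_config() {
  # Default to the controlling terminal unless a test overrides it.
  TARGET="${WARP_NOTIFY_TARGET:-/dev/tty}"
  TIMEOUT_SEC="${WARP_NOTIFY_TIMEOUT_SEC:-2}"

  # Assumption: anything that is not a plain non-negative integer
  # falls back to the 2-second default.
  case "$TIMEOUT_SEC" in
    ''|*[!0-9]*) TIMEOUT_SEC=2 ;;
  esac
}
```

With neither variable set, this leaves TARGET=/dev/tty and TIMEOUT_SEC=2, matching the defaults described above.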

Tests

Adds plugins/warp/tests/test-warp-notify.sh, picked up automatically by the existing globstar pattern in .github/workflows/test.yml.

Three scenarios per variant:

  • Fast path — writable target completes in <2s, exits 0.
  • Hang protection — point WARP_NOTIFY_TARGET at a FIFO with no reader. The kernel blocks open()/write() on such a FIFO the same way it blocks writes to a slave PTY whose master isn't reading, which is the exact failure mode reproduced in production. Verifies the watchdog kicks in within the configured timeout and the script still exits 0.
  • Default cap — without explicit WARP_NOTIFY_TIMEOUT_SEC, blocked writes still cap at the 2s default.
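
The hang-protection scenario can be sketched roughly as follows. The helper name run_against_blocked_fifo and the STATUS/ELAPSED globals are hypothetical; the actual test file may be structured differently.

```bash
# Sketch only: run "$@" with WARP_NOTIFY_TARGET pointing at a FIFO that has
# no reader, then record the exit status and elapsed wall-clock seconds.
run_against_blocked_fifo() {
  local tmpdir start status=0
  tmpdir=$(mktemp -d) || return 1

  # Opening this FIFO for writing blocks, because nothing ever reads it.
  mkfifo "$tmpdir/blocked"

  start=$SECONDS
  WARP_NOTIFY_TARGET="$tmpdir/blocked" WARP_NOTIFY_TIMEOUT_SEC=1 "$@" || status=$?

  STATUS=$status                 # exit status of the command under test
  ELAPSED=$((SECONDS - start))   # whole seconds, so allow some slack
  rm -rf "$tmpdir"
}
```

A test would call it with the script under test as the command and then assert that STATUS is 0 and ELAPSED stays within a small margin of the configured timeout.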

Runs locally on macOS and on Ubuntu (the CI runner). All 39 existing tests still pass.

Notes

  • This is complementary to #19 (fix: TTY detection for hook subprocesses), not a replacement. #19 fixes the "no TTY exists" case by walking the parent chain; this PR fixes the "TTY exists but isn't draining" case. Both bugs cause silent hook failures, but the symptom of #19 is "no notification" while the symptom here is "the entire Claude session freezes." The two fixes can land in either order: they touch the same line, but the merge is mechanical.
  • Doesn't change observable behavior in the happy path (write completes promptly, script exits 0 as before). Only changes behavior in the failure mode where the original code hung indefinitely.

…nsive

The OSC 777 notification write in warp-notify.sh blocks indefinitely if
the slave PTY's output buffer fills up and stops draining — for example,
when Warp's UI is hung and not reading from the master PTY. The block
propagates back up the hook chain (warp-notify.sh -> on-*.sh -> claude),
freezing the calling Claude Code session until Warp recovers. Observed
in production as a 36-minute session lockup.

Wrap the printf in a pure-bash watchdog: spawn the writer in the
background, force-kill it after WARP_NOTIFY_TIMEOUT_SEC seconds
(default 2) if the syscall hasn't returned. The script stays
best-effort and always exits 0. A new WARP_NOTIFY_TARGET env var lets
tests redirect output away from /dev/tty.

Adds plugins/warp/tests/test-warp-notify.sh which simulates the failure
mode via a FIFO with no reader (kernel-level identical to a slave PTY
whose master isn't reading) and verifies the watchdog fires within the
configured timeout. CI's globstar pattern picks up the new file
automatically.

Same patch applied to scripts/legacy/warp-notify.sh.
skspade added a commit to skspade/claude-code-warp that referenced this pull request May 13, 2026
… Warp UI is unresponsive

Resolution: hybrid of warpdotdev#44's tty walker and warpdotdev#37's timeout watchdog.
Each candidate tty is now tried with a TIMEOUT_SEC watchdog (default 2s)
so a hung Warp UI on one tty falls through to the next ancestor instead
of blocking forever.
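
A loose sketch of the fall-through idea described in that commit message: try each candidate tty with a bounded write and stop at the first one that accepts it. Both function names are hypothetical, and how the candidate list is built (the tty walker) is not shown; this is not the code from either PR.

```bash
# Sketch only. Write $1 to $2 within $3 seconds (default 2).
# Returns non-zero if the write failed or the watchdog had to kill it.
try_bounded_write() {
  local seq=$1 target=$2 timeout_sec=${3:-2}
  local writer_pid watchdog_pid status=0

  { printf '%s' "$seq" > "$target"; } 2>/dev/null &
  writer_pid=$!
  { sleep "$timeout_sec"; kill -KILL "$writer_pid"; } >/dev/null 2>&1 &
  watchdog_pid=$!

  # A writer killed by the watchdog reports a non-zero status here.
  wait "$writer_pid" 2>/dev/null || status=$?

  kill -KILL "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true
  return "$status"
}

# Try each candidate path in order; succeed on the first responsive one.
notify_first_responsive() {
  local seq=$1 candidate
  shift
  for candidate in "$@"; do
    if try_bounded_write "$seq" "$candidate" "${TIMEOUT_SEC:-2}"; then
      return 0
    fi
  done
  return 1   # no candidate accepted the write in time
}
```

In this sketch a hung first candidate costs at most TIMEOUT_SEC seconds before the next one is tried, so the worst case grows with the number of unresponsive candidates.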