fix(tmux): break out of ReadingCommandOutput on OpenSSH disconnect (#9900)#11217
fix(tmux): break out of ReadingCommandOutput on OpenSSH disconnect (#9900)#11217maxmilian wants to merge 1 commit into
Conversation
Issue warpdotdev#9900 reported that after an SSH session times out, Warp's tmux wrapper keeps injecting `new-window -d '(builtin echo ...)|cat; sleep 1'` fragments into the now-local zsh prompt, producing repeated `zsh: command not found: new-window` plus mangled `ltin echo` fragments. Root cause is at the tmux parser layer. When the wrapper command is in flight, the parser enters `ParserState::ReadingCommandOutput` after seeing `%begin`. From that state it accepts ANY byte as command-output content and only escapes when it sees `%end` or `%error`. If the multiplexed SSH transport dies before `%end` arrives, the parser stays in `ReadingCommandOutput` forever: every subsequent byte (the OpenSSH client's disconnect diagnostics, the local zsh re-prompt, and Warp's still-queued wrapper-command bytes) is silently swallowed as "more command output," `tmux_performer.exited` is never set, and the `ControlModeEvent::Exited` plumbing that would clear `tmux_control_mode_context` and tear down the multiplexer in the model never runs. Fix: inside `ReadingCommandOutput`, after each completed content line, check whether the line matches one of OpenSSH's standard, non-localized disconnect diagnostics: - `client_loop: send disconnect` - `packet_write_wait` - `ssh_exchange_identification` - `kex_exchange_identification` - `Read from remote host` - `Connection to ` ... ` closed` (both substrings on the same line) When one matches, emit `TmuxMessage::ParseError`. The existing parse-error recovery path (`ansi/mod.rs:471-481`) already (a) sets `tmux_performer.exited = true`, (b) emits `ControlModeEvent::Exited` upstream, and (c) resets the ansi state machine back to default. That reuse keeps the diff small and the blast radius narrow. Plain `Broken pipe` is deliberately NOT a needle: it is produced by many unrelated tools and would cause false positives on legitimate command output. The needles above are emitted directly by OpenSSH and are specific enough that real tmux responses do not contain them. Uses `memchr::memmem::find` for the substring scan (memchr is already a workspace dependency). Tests (9 new): - `ssh_disconnect_inside_begin_end_emits_parse_error` -- bug shape from warpdotdev#9900 with a single OpenSSH disconnect line. - `ssh_disconnect_client_loop_send_disconnect_pattern_triggers_parse_error` - `ssh_disconnect_connection_to_host_closed_pattern_triggers_parse_error` - `ssh_disconnect_packet_write_wait_pattern_triggers_parse_error` - `ssh_disconnect_ssh_exchange_identification_pattern_triggers_parse_error` - `ssh_disconnect_kex_exchange_identification_pattern_triggers_parse_error` - `issue_9900_repro_wrapper_echo_then_ssh_death_emits_parse_error` -- full repro using the byte stream from the issue (the `(builtin echo -n "^^^...|||")` wrapper echo arriving before the disconnect line). - `lone_broken_pipe_does_not_trigger_ssh_disconnect_detection` -- negative test guarding the deliberate exclusion of bare "Broken pipe". - `connection_to_and_closed_on_different_lines_does_not_trigger` -- negative test for the same-line requirement on the connection-closed pattern. All 24 tmux parser tests pass. cargo fmt + clippy clean. Note on scope: `ParserState::ReadingPaneOutput` has the same structural escape hatch problem but a different reachability profile: pane output is byte-escaped (`\NNN` octal for bytes < 32) so raw SSH disconnect text from the local PTY is unlikely to land there. Leaving that path for a follow-up to keep this PR focused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
I'm starting a first review of this pull request. You can view the conversation on Warp. I reviewed this pull request and requested human review from: Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds a targeted tmux control-mode parser escape hatch for OpenSSH disconnect diagnostics that appear while ReadingCommandOutput is waiting for %end/%error, reusing the existing parse-error teardown path. The accompanying parser tests cover the regression stream, each supported OpenSSH diagnostic pattern, and representative false-positive cases.
Concerns
- No blocking correctness concerns found in the changed lines.
- No security concerns found in this diff.
Verdict
Found: 0 critical, 0 important, 0 suggestions
Approve
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
The control-plane reader task previously exited silently when stdout EOF'd unexpectedly (i.e. the underlying SSH transport died without emitting a tmux %exit). Detection deferred to the heartbeat path, which has interval 30s and timeout 10s — up to ~40s of "dead but appears alive" state during which send_command would write to a half-closed pipe and time out at RESPONSE_TIMEOUT (10s) per call instead of failing fast. This commit: - Adds signal_master_died(stdin, events_tx) helper that idempotently takes the stdin and emits MasterDied. Idempotent because the first caller takes the Option<ChildStdin>; subsequent callers see None and no-op, preventing duplicate events when reader and heartbeat race to detect. - Reader task now distinguishes clean exit (via ControlEvent::Exit) from unexpected exit (stdout EOF, parse error). On unexpected exit it invokes signal_master_died, but only when closing was not initiated by us (avoids spurious MasterDied on graceful close). - Heartbeat path routes its existing MasterDied emits through the same helper so the stdin is also closed there, making subsequent send_command calls fail immediately with "stdin closed" instead of waiting RESPONSE_TIMEOUT. Inspired by warpdotdev#11217 (SSH disconnect detection in tmux parser) and warpdotdev#11482 (fail-fast pattern for remote_server executor), but minimal: leverages the existing Arc<Mutex<Option<ChildStdin>>> indirection instead of adopting RwLock<Option<Arc<...>>>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #9900 — after an SSH session times out, Warp keeps injecting tmux
new-window -d '(builtin echo …)|cat; sleep 1'wrapper bytes into the now-local zsh prompt, producing repeatedzsh: command not found: new-windowplus mangledltin echofragments.Root cause
At the tmux parser layer. After
%begin,ParserState::ReadingCommandOutputaccepts ANY byte as command-output content and only escapes when it sees%end/%error. If the multiplexed SSH transport dies first, the parser stays stuck in that state forever:…are all silently swallowed as "more command output."
tmux_performer.exitedis never set,ControlModeEvent::Exitednever fires, andtmux_control_mode_contextis never cleared. Each new completion request keeps queueing more wrapper commands.Empirical proof in
app/src/terminal/model/tmux/parser_tests.rs(test_begin_without_endwas the existing assertion that 0 messages are emitted in this scenario — that's exactly the latent bug).Fix
Inside
ReadingCommandOutput, after each completed content line, sniff for non-localized OpenSSH disconnect diagnostics:client_loop: send disconnectpacket_write_waitssh_exchange_identificationkex_exchange_identificationRead from remote hostConnection to…closed(both substrings on the same line)When matched, emit
TmuxMessage::ParseError. The existing parse-error recovery path (ansi/mod.rs:471-481) already (a) setstmux_performer.exited = true, (b) emitsControlModeEvent::Exitedupstream, and (c) resets the ansi state machine back to default. That reuse keeps the diff small.Plain
Broken pipeis deliberately NOT a needle (too generic; many tools print it). The 6 needles above are emitted directly by OpenSSH and unlikely to occur in legitimate tmux command responses.Uses
memchr::memmem::find(already a workspace dep).Tests
9 new parser tests, all pass (24 total in
parser_tests.rs):Broken pipedoes NOT trigger (deliberate exclusion)Connection to/closedon different lines does NOT triggercargo fmt+cargo clippy -p warp --lib --no-deps --testsclean.Scope
ParserState::ReadingPaneOutputhas the same structural escape hatch problem, but pane output is byte-escaped (\NNNoctal for bytes < 32), so raw SSH disconnect text from the local PTY is unlikely to land there. Left for a follow-up to keep this PR focused.Follow-up worth flagging
Pattern-matching OpenSSH stderr is a band-aid. A timeout-based escape from
ReadingCommandOutput(or tracking SSH process state) would catch all transport failures, not just OpenSSH's. The current approach was chosen for minimal blast radius; happy to follow up with the more general fix in a separate PR if maintainers want.🤖 Generated with Claude Code