Skip to content

fix(tmux): break out of ReadingCommandOutput on OpenSSH disconnect (#9900)#11217

Open
maxmilian wants to merge 1 commit into
warpdotdev:masterfrom
maxmilian:fix-9900-ssh-tmux-stale
Open

fix(tmux): break out of ReadingCommandOutput on OpenSSH disconnect (#9900)#11217
maxmilian wants to merge 1 commit into
warpdotdev:masterfrom
maxmilian:fix-9900-ssh-tmux-stale

Conversation

@maxmilian
Copy link
Copy Markdown
Contributor

Summary

Fixes #9900 — after an SSH session times out, Warp keeps injecting tmux new-window -d '(builtin echo …)|cat; sleep 1' wrapper bytes into the now-local zsh prompt, producing repeated zsh: command not found: new-window plus mangled ltin echo fragments.

Root cause

At the tmux parser layer. After %begin, ParserState::ReadingCommandOutput accepts ANY byte as command-output content and only escapes when it sees %end / %error. If the multiplexed SSH transport dies first, the parser stays stuck in that state forever:

  • The local OpenSSH client's disconnect diagnostics
  • The local zsh re-prompt
  • Warp's still-queued wrapper bytes

…are all silently swallowed as "more command output." tmux_performer.exited is never set, ControlModeEvent::Exited never fires, and tmux_control_mode_context is never cleared. Each new completion request keeps queueing more wrapper commands.

Empirical proof in app/src/terminal/model/tmux/parser_tests.rs (test_begin_without_end was the existing assertion that 0 messages are emitted in this scenario — that's exactly the latent bug).

Fix

Inside ReadingCommandOutput, after each completed content line, sniff for non-localized OpenSSH disconnect diagnostics:

  • client_loop: send disconnect
  • packet_write_wait
  • ssh_exchange_identification
  • kex_exchange_identification
  • Read from remote host
  • Connection to closed (both substrings on the same line)

When matched, emit TmuxMessage::ParseError. The existing parse-error recovery path (ansi/mod.rs:471-481) already (a) sets tmux_performer.exited = true, (b) emits ControlModeEvent::Exited upstream, and (c) resets the ansi state machine back to default. That reuse keeps the diff small.

Plain Broken pipe is deliberately NOT a needle (too generic; many tools print it). The 6 needles above are emitted directly by OpenSSH and unlikely to occur in legitimate tmux command responses.

Uses memchr::memmem::find (already a workspace dep).

Tests

9 new parser tests, all pass (24 total in parser_tests.rs):

  • Regression test mirroring the byte stream from the issue body
  • 1 positive test per supported needle (6 tests)
  • 2 negative tests:
    • lone Broken pipe does NOT trigger (deliberate exclusion)
    • Connection to / closed on different lines does NOT trigger

cargo fmt + cargo clippy -p warp --lib --no-deps --tests clean.

Scope

ParserState::ReadingPaneOutput has the same structural escape hatch problem, but pane output is byte-escaped (\NNN octal for bytes < 32), so raw SSH disconnect text from the local PTY is unlikely to land there. Left for a follow-up to keep this PR focused.

Follow-up worth flagging

Pattern-matching OpenSSH stderr is a band-aid. A timeout-based escape from ReadingCommandOutput (or tracking SSH process state) would catch all transport failures, not just OpenSSH's. The current approach was chosen for minimal blast radius; happy to follow up with the more general fix in a separate PR if maintainers want.

🤖 Generated with Claude Code

Issue warpdotdev#9900 reported that after an SSH session times out, Warp's tmux
wrapper keeps injecting `new-window -d '(builtin echo ...)|cat; sleep 1'`
fragments into the now-local zsh prompt, producing repeated
`zsh: command not found: new-window` plus mangled `ltin echo` fragments.

Root cause is at the tmux parser layer. When the wrapper command is in
flight, the parser enters `ParserState::ReadingCommandOutput` after
seeing `%begin`. From that state it accepts ANY byte as command-output
content and only escapes when it sees `%end` or `%error`. If the
multiplexed SSH transport dies before `%end` arrives, the parser stays
in `ReadingCommandOutput` forever: every subsequent byte (the OpenSSH
client's disconnect diagnostics, the local zsh re-prompt, and Warp's
still-queued wrapper-command bytes) is silently swallowed as "more
command output," `tmux_performer.exited` is never set, and the
`ControlModeEvent::Exited` plumbing that would clear
`tmux_control_mode_context` and tear down the multiplexer in the model
never runs.

Fix: inside `ReadingCommandOutput`, after each completed content line,
check whether the line matches one of OpenSSH's standard, non-localized
disconnect diagnostics:

  - `client_loop: send disconnect`
  - `packet_write_wait`
  - `ssh_exchange_identification`
  - `kex_exchange_identification`
  - `Read from remote host`
  - `Connection to ` ... ` closed` (both substrings on the same line)

When one matches, emit `TmuxMessage::ParseError`. The existing
parse-error recovery path (`ansi/mod.rs:471-481`) already (a) sets
`tmux_performer.exited = true`, (b) emits `ControlModeEvent::Exited`
upstream, and (c) resets the ansi state machine back to default. That
reuse keeps the diff small and the blast radius narrow.

Plain `Broken pipe` is deliberately NOT a needle: it is produced by many
unrelated tools and would cause false positives on legitimate command
output. The needles above are emitted directly by OpenSSH and are
specific enough that real tmux responses do not contain them.

Uses `memchr::memmem::find` for the substring scan (memchr is already
a workspace dependency).

Tests (9 new):
- `ssh_disconnect_inside_begin_end_emits_parse_error` -- bug shape from
  warpdotdev#9900 with a single OpenSSH disconnect line.
- `ssh_disconnect_client_loop_send_disconnect_pattern_triggers_parse_error`
- `ssh_disconnect_connection_to_host_closed_pattern_triggers_parse_error`
- `ssh_disconnect_packet_write_wait_pattern_triggers_parse_error`
- `ssh_disconnect_ssh_exchange_identification_pattern_triggers_parse_error`
- `ssh_disconnect_kex_exchange_identification_pattern_triggers_parse_error`
- `issue_9900_repro_wrapper_echo_then_ssh_death_emits_parse_error` --
  full repro using the byte stream from the issue (the `(builtin echo
  -n "^^^...|||")` wrapper echo arriving before the disconnect line).
- `lone_broken_pipe_does_not_trigger_ssh_disconnect_detection` --
  negative test guarding the deliberate exclusion of bare "Broken pipe".
- `connection_to_and_closed_on_different_lines_does_not_trigger` --
  negative test for the same-line requirement on the connection-closed
  pattern.

All 24 tmux parser tests pass. cargo fmt + clippy clean.

Note on scope: `ParserState::ReadingPaneOutput` has the same structural
escape hatch problem but a different reachability profile: pane output
is byte-escaped (`\NNN` octal for bytes < 32) so raw SSH disconnect
text from the local PTY is unlikely to land there. Leaving that path
for a follow-up to keep this PR focused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cla-bot cla-bot Bot added the cla-signed label May 18, 2026
@github-actions github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label May 18, 2026
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 18, 2026

@maxmilian

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I reviewed this pull request and requested human review from: @warpdotdev/oss-maintainers.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR adds a targeted tmux control-mode parser escape hatch for OpenSSH disconnect diagnostics that appear while ReadingCommandOutput is waiting for %end/%error, reusing the existing parse-error teardown path. The accompanying parser tests cover the regression stream, each supported OpenSSH diagnostic pattern, and representative false-positive cases.

Concerns

  • No blocking correctness concerns found in the changed lines.
  • No security concerns found in this diff.

Verdict

Found: 0 critical, 0 important, 0 suggestions

Approve

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@oz-for-oss oz-for-oss Bot requested review from a team and peicodes and removed request for a team May 18, 2026 17:13
ATERCATES pushed a commit to ATERCATES/warp that referenced this pull request May 22, 2026
The control-plane reader task previously exited silently when stdout
EOF'd unexpectedly (i.e. the underlying SSH transport died without
emitting a tmux %exit). Detection deferred to the heartbeat path,
which has interval 30s and timeout 10s — up to ~40s of "dead but
appears alive" state during which send_command would write to a
half-closed pipe and time out at RESPONSE_TIMEOUT (10s) per call
instead of failing fast.

This commit:
- Adds signal_master_died(stdin, events_tx) helper that idempotently
  takes the stdin and emits MasterDied. Idempotent because the first
  caller takes the Option<ChildStdin>; subsequent callers see None
  and no-op, preventing duplicate events when reader and heartbeat
  race to detect.
- Reader task now distinguishes clean exit (via ControlEvent::Exit)
  from unexpected exit (stdout EOF, parse error). On unexpected exit
  it invokes signal_master_died, but only when closing was not
  initiated by us (avoids spurious MasterDied on graceful close).
- Heartbeat path routes its existing MasterDied emits through the
  same helper so the stdin is also closed there, making subsequent
  send_command calls fail immediately with "stdin closed" instead
  of waiting RESPONSE_TIMEOUT.

Inspired by warpdotdev#11217 (SSH disconnect detection in tmux
parser) and warpdotdev#11482 (fail-fast pattern for remote_server executor),
but minimal: leverages the existing Arc<Mutex<Option<ChildStdin>>>
indirection instead of adopting RwLock<Option<Arc<...>>>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed external-contributor Indicates that a PR has been opened by someone outside the Warp team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command wrapper emits truncated ltin echo fragments and stray new-window token after SSH session drops, causing repeated command not found errors in zsh

1 participant