wait_for_channel and signal_channel block the FastMCP event loop (sync subprocess.run) #18

@tony

Description

Summary

wait_for_channel and signal_channel are registered as synchronous (def, not async def) tools that run blocking subprocess.run(..., timeout=N) calls on FastMCP's event-loop thread. For the duration of the timeout the server cannot service stdio reads/writes, respond to the MCP client's keepalive pings, or run other tool handlers. Clients whose stdio keepalive is shorter than the wait_for_channel timeout can disconnect mid-wait.

0.1.0a2 explicitly addressed the same class of bug for wait_for_text / wait_for_content_change — per CHANGES, "unblock FastMCP event loop via asyncio.to_thread around capture_pane" (commit 0a408fe) — but left the channel-wait pair untouched.

Evidence

src/libtmux_mcp/tools/wait_for_tools.py:99-154 — wait_for_channel is a plain def decorated with the sync @handle_tool_errors. The body is a synchronous subprocess.run with a user-supplied timeout (30 s by default, no upper bound):

@handle_tool_errors
def wait_for_channel(
    channel: str,
    timeout: float = 30.0,
    socket_name: str | None = None,
) -> str:
    ...
    try:
        subprocess.run(argv, check=True, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired as e:
        ...

src/libtmux_mcp/tools/wait_for_tools.py:157-196 — signal_channel has the same structure (sync def, sync subprocess.run bounded by _SIGNAL_TIMEOUT_SECONDS). Signalling is normally fast, but the pattern is still wrong.

Contrast — already-correct pattern: src/libtmux_mcp/tools/pane_tools/wait.py:100 declares async def wait_for_text(...) and offloads blocking work via asyncio.to_thread(...). That is the pattern the channel pair should match.
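The blocking effect is easy to demonstrate outside FastMCP. A minimal, self-contained sketch (using time.sleep as a stand-in for the blocking subprocess.run call, so no tmux is required) that measures how long a concurrent coroutine is starved under each style:

```python
import asyncio
import time


def blocking_wait(seconds: float) -> None:
    """Stand-in for subprocess.run(["tmux", "wait-for", ...], timeout=seconds)."""
    time.sleep(seconds)


async def timed_tick(started: float) -> float:
    # Completes on the next loop iteration; returns how long it had to wait.
    await asyncio.sleep(0)
    return time.monotonic() - started


async def demo() -> tuple[float, float]:
    # Sync-style handler: the blocking call runs on the event-loop thread,
    # so the already-scheduled tick cannot run until it returns.
    tick = asyncio.ensure_future(timed_tick(time.monotonic()))
    blocking_wait(0.5)
    starved = await tick

    # Async handler: the blocking call is offloaded via asyncio.to_thread,
    # so the tick runs almost immediately while the wait is in flight.
    tick = asyncio.ensure_future(timed_tick(time.monotonic()))
    await asyncio.to_thread(blocking_wait, 0.5)
    responsive = await tick
    return starved, responsive


if __name__ == "__main__":
    starved, responsive = asyncio.run(demo())
    print(f"sync handler delayed the loop by   {starved:.2f}s")
    print(f"to_thread handler delayed it by    {responsive:.2f}s")
```

On a typical run the sync-style handler delays the tick by roughly the full wait, while the to_thread version delays it by milliseconds — the same gap a keepalive ping would see.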

Reproduction

Environment: libtmux-mcp==0.1.0a2 via uvx libtmux-mcp==0.1.0a2, connected to any MCP client with stdio keepalive. The event-loop block is client-agnostic; the observed-disconnect behavior in step 4 was seen with Claude Code but any client whose stdio keepalive is shorter than the wait_for_channel timeout is at risk.

  1. Create an isolated tmux server so the repro doesn't pollute the default server: tmux -L wfc_repro new-session -ds repro.
  2. From the MCP client, call wait_for_channel on a channel that will never be signalled, with a timeout safely larger than the client's stdio keepalive: {"channel": "unsignalled", "timeout": 30, "socket_name": "wfc_repro"}.
  3. While the call is pending, call any other libtmux-mcp tool (e.g. list_sessions) from the same client. Observe it is queued — the server cannot respond until wait_for_channel returns or times out.
  4. After the timeout fires, ToolError is raised. On Claude Code specifically, the client-side MCP harness marks the server as disconnected (system-reminder: "The following deferred tools are no longer available (their MCP server disconnected)") even though the Python process is still alive per ps.
  5. Tear down: tmux -L wfc_repro kill-server.

Observed during 0.1.0a2 smoke-testing with timeout=2.0 — the minimum timeout at which the disconnect was seen. The disconnect is not guaranteed on every client or every call, but the event-loop block itself is 100% reproducible.

Expected behavior

While wait_for_channel is pending, the server continues to service other tool calls, MCP pings, and cancellation signals — cooperative concurrency as the rest of the tool surface already provides.

Suggested fix

Port the asyncio.to_thread(...) pattern used for wait_for_text:

@handle_tool_errors_async
async def wait_for_channel(
    channel: str,
    timeout: float = 30.0,
    socket_name: str | None = None,
) -> str:
    server = _get_server(socket_name=socket_name)
    cname = _validate_channel_name(channel)
    argv = _tmux_argv(server, "wait-for", cname)
    try:
        await asyncio.to_thread(
            subprocess.run,
            argv, check=True, capture_output=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired as e:
        ...

Same treatment for signal_channel. handle_tool_errors_async already exists in _utils.py (added in 0.1.0a2 per commit a13898f) — no new infrastructure required.

Tests

Add regression coverage in tests/test_wait_for_tools.py:

  1. Assert both functions are asyncio.iscoroutinefunction(...).
  2. Fire wait_for_channel with a 2 s unsignalled timeout on a dedicated event loop, and concurrently schedule an asyncio.sleep(0) coroutine via asyncio.gather. Assert the concurrent coroutine completes in ≪ 2 s — today it would block for the full timeout.
  3. Consider mirroring the CancelledError propagation tests already present for wait_for_text in tests/test_pane_tools.py — the channel pair should respect the same contract.
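A sketch of test items 1–2, using a hypothetical wait_for_channel_stub in place of the ported tool (the real test would import wait_for_channel from libtmux_mcp.tools.wait_for_tools instead):

```python
import asyncio
import time


# Hypothetical stand-in for the ported tool: a coroutine function that
# offloads its blocking wait to a thread, as the suggested fix does.
async def wait_for_channel_stub(channel: str, timeout: float = 2.0) -> str:
    await asyncio.to_thread(time.sleep, timeout)
    return f"timed out waiting for {channel!r}"


def test_channel_wait_is_cooperative() -> None:
    # Item 1: after the port, the tool must be a coroutine function.
    assert asyncio.iscoroutinefunction(wait_for_channel_stub)

    # Item 2: a concurrently gathered coroutine must finish well before
    # the wait's timeout; today's sync version would hold it the full time.
    async def run() -> float:
        start = time.monotonic()

        async def concurrent() -> float:
            await asyncio.sleep(0)
            return time.monotonic() - start

        _, latency = await asyncio.gather(
            wait_for_channel_stub("unsignalled", timeout=0.5),
            concurrent(),
        )
        return latency

    assert asyncio.run(run()) < 0.1


test_channel_wait_is_cooperative()
```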

Related

  • Commit 0a408fe mcp(fix[pane_tools/wait]): unblock FastMCP event loop via asyncio.to_thread around capture_pane — the analogous fix for the pane-wait pair.
  • Commit a13898f mcp(refactor[_utils]): add handle_tool_errors_async for Context-using tools — the decorator needed for the port.
  • CHANGES 0.1.0a2 › What's new › New tools › Waits lists channel tools alongside pane-wait tools; only the latter got the event-loop fix.

Environment

  • libtmux-mcp: 0.1.0a2 (PyPI + local HEAD)
  • FastMCP: 3.2.4
  • tmux: 3.6a
  • Python: 3.13
  • OS: Linux (WSL2)
  • MCP client observed disconnecting: Claude Code
