Skip to content

feat(errors): clean error results, recovery hints, quieter logs#64

Merged
tony merged 23 commits into
mainfrom
fastmcp-v3.4.x
Jun 6, 2026
Merged

feat(errors): clean error results, recovery hints, quieter logs#64
tony merged 23 commits into
mainfrom
fastmcp-v3.4.x

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented Jun 6, 2026

Summary

  • Bump the fastmcp floor to >=3.4.0, unlocking ToolError(log_level=...) (3.3) and ToolResult(is_error=True) (3.4)
  • Fix error-message mangling: failed tool calls passed through fastmcp's stock -32603 transform, so agents received Internal error: Pane not found: %5 — messages now arrive exactly as raised
  • Add structured _meta on error results: error_type (originating exception class), expected (agent-correctable vs operator fault), suggestion (recovery hint when present)
  • Add recovery hints on not-found errors pointing at the discovery tools (list_sessions / list_windows / list_panes) that resolve stale or guessed ids
  • Add synthesized hints for unrecognized arguments: the error names exactly which argument(s) to remove or fix, and flags wait_for_previous — a scheduling flag Gemini CLI leaks into batched tool calls — with the connected client's identity from the MCP initialize handshake
  • Demote expected, agent-correctable failures — unknown ids, invalid arguments (including schema-level rejections raised before tool code runs), tier denials, transient tmux errors — to WARNING in the log stream; a missing tmux binary and unexpected exceptions stay at ERROR

Changes by area

Error classification

  • src/libtmux_mcp/_utils.py: ExpectedToolError(ToolError) defaults log_level=WARNING and carries an optional suggestion. _map_exception_to_tool_error returns it for the expected libtmux branches and attaches discovery-tool suggestions on the not-found branches. Direct validation raises across the tool modules and the SafetyMiddleware tier denial use it.
  • src/libtmux_mcp/middleware.py: _is_schema_validation_error classifies fastmcp's pre-dispatch pydantic argument validation as expected/WARNING — it fires before the decorators can classify it, and output-shape failures can't be confused with it (fastmcp's tool layer converts those to error results itself).

Error results

  • src/libtmux_mcp/middleware.py: ToolErrorResultMiddleware(ErrorHandlingMiddleware) converts tool-call exceptions into ToolResult(is_error=True) with the clean message plus the _meta payload, and logs at the error's own log_level. The inherited on_message keeps the MCP -32002 resource-not-found transform for non-tool messages.
  • src/libtmux_mcp/middleware.py: _unexpected_kwargs + _client_label synthesize the unrecognized-argument suggestion; the client label comes from fastmcp_context.session.client_params.clientInfo, is None-safe at every hop, and only words the message — it never gates behavior.
  • src/libtmux_mcp/middleware.py: TailPreservingResponseLimitingMiddleware preserves is_error through truncation — the stock path rebuilds the result without it, turning an oversized error into an apparent success that the MCP SDK client then rejects against the tool's output schema.
  • src/libtmux_mcp/server.py: the middleware stack swaps the stock error middleware for the subclass; the ordering comment documents the invariant that error→result conversion must stay outside the audit/retry/safety trio.

Docs

  • docs/topics/logging.md: the fastmcp.errors channel and the WARNING-expected / ERROR-operator-fault level contract.
  • docs/topics/architecture.md: full middleware request flow and the three-boundary error design (tool classification, schema classification, conversion).
  • docs/topics/gotchas.md: Gemini CLI's wait_for_previous injection — symptom, self-recovery flow, and why rejection beats stripping.
  • Error-prose mentions across tool pages link to the autodoc'd ExpectedToolError entry via {exc} roles.

Design decisions

  • Convert at the boundary, not in the tools: errors keep propagating as exceptions through audit/retry/safety and only become results in the outermost error layer. Returning error results from inside tools would disable readonly retries (fastmcp's RetryMiddleware matches LibTmuxException via __cause__) and make the audit log record failures as outcome=ok.
  • WARNING/ERROR partition: WARNING means "the agent can correct course"; ERROR means "an operator must act" (missing tmux binary) or "possible server bug" (unexpected exception). A WARNING-heavy, ERROR-quiet stream is healthy.
  • No structured_content on error results: tools declare output schemas for their success payloads and clients validate structuredContent against them — an error-shaped payload there would fail validation on strict clients. Details ride in _meta instead.
  • Suggestion policy: raise-site hints only on not-found errors (discovery tools are the canonical fix for the most common agent mistake); synthesized hints on unrecognized-argument rejections (names the exact keys, plus the client-flag note for wait_for_previous). Unknown arguments are rejected, never silently stripped — MemPalace patched the same Gemini leak by popping the key (fix: Ignore wait_for_previous argument to support Gemini MCP clients MemPalace/mempalace#322) and later whitelisting against the schema, but on a server with mutating/destructive tools silent dropping would let a typo'd flag (enter on send_keys) run with defaults.

Before / After

Agent-visible result for a tool denied by the safety tier:

Before:

Internal error: Tool 'send_keys' is not available at the current safety level. Set LIBTMUX_SAFETY=destructive to enable destructive tools.

After:

Tool 'send_keys' is not available at the current safety level. Set LIBTMUX_SAFETY=destructive to enable destructive tools.

with _meta: {"error_type": "ExpectedToolError", "expected": true}.

Verification

Every direct raise in the tool modules is tier-classified — stock ToolError survives only as the ERROR-tier return inside the exception mapper:

rg -n "raise ToolError\(" src/

Test plan

  • Expected libtmux failures map to ExpectedToolError at WARNING; tmux-missing and unexpected exceptions stay at ERROR — test_map_exception_*
  • Wire-level error results carry the unmangled message and error_type / expected / suggestion meta — test_tool_error_result_*
  • Not-found branches carry discovery-tool suggestions; other branches stay hint-free — test_map_exception_suggestion_policy
  • Schema-level argument rejections report expected: true and log at WARNING — test_schema_validation_failure_marked_expected_in_meta, test_tool_error_result_logs_at_error_log_level[schema_validation_logs_warning]
  • Unrecognized-argument hints: client-labeled wait_for_previous note, plain-typo listing, missing-arg silence, no-handshake fallback — test_unexpected_argument_suggestion*
  • Oversized error results keep is_error through the response limiter; truncated successes never gain it — test_response_limiter_preserves_error_results
  • Readonly retry still fires through the new stack (__cause__ contract) — test_readonly_retry_recovers_from_libtmux_exception
  • Tier denials still audit as outcome=errortest_audit_records_safety_denial
  • Manual QA via scripts/mcp_swap.py across gemini, codex, cursor-agent, and claude on an isolated tmux socket: all four received the clean error text and suggestion; gemini's organic wait_for_previous leak reproduced and self-recovered, and the client label resolved live as gemini-cli-mcp-client 0.45.2
  • Full chain: ruff check, ruff format, mypy, pytest --reruns 0, just build-docs

Note: the floor stays at >=3.4.0 rather than >=3.4.1 (starlette CVE-2026-48710 floor) because 3.4.1 is inside the 3-day uv dependency cooldown until June 8 — follow-up commit planned.

tony added 3 commits June 6, 2026 10:14
why: Unlock fastmcp 3.3/3.4 server features adopted in follow-up
commits: ToolError(log_level=...) for expected-failure log demotion
(fastmcp#4036) and ToolResult(is_error=True) error results
(fastmcp#4217).
what:
- pyproject: fastmcp floor >=3.2.4 -> >=3.4.0 (lock already at 3.4.0)
- CHANGES: Dependencies entry covering passive wins of the bump
  (template resource title preservation, no exc_info on expected
  tool failures, MCP-compliant OTEL span attributes)

note: floor stays at 3.4.0 for now — 3.4.1 (starlette CVE-2026-48710
floor) is inside the 3-day uv dependency cooldown until 2026-06-08;
a follow-up will raise the floor once it clears.
…ToolError

why: Every agent-correctable failure (unknown pane id, invalid
arguments, tier denial) logged at ERROR with a traceback, drowning
operator logs in noise that isn't actionable. fastmcp 3.3 added
ToolError(log_level=...) support (fastmcp#4036).
what:
- Add ExpectedToolError(ToolError) defaulting log_level=WARNING in
  _utils.py
- _map_exception_to_tool_error: expected libtmux failure branches
  return ExpectedToolError; TmuxCommandNotFound (broken operator
  environment) and the unexpected catch-all stay at ERROR
- Sweep all direct validation/state raises in tool modules and
  SafetyMiddleware tier denial to ExpectedToolError
- Doctests: traceback class names follow the raise sites
- tests: parametrized level partition for the mapper; in-memory
  Client + caplog proof that fastmcp's server layer honors the
  WARNING level (fastmcp logger needs propagate=True under caplog)
…ultMiddleware

why: fastmcp's stock ErrorHandlingMiddleware(transform_errors=True)
funnels every tool failure through a -32603 catch-all, so agents
received 'Internal error: Pane not found: %5' — the transform mangled
every expected failure message on the wire. fastmcp 3.4 added
ToolResult(is_error=True) (fastmcp#4217), letting the server shape
error results directly.
what:
- ToolErrorResultMiddleware(ErrorHandlingMiddleware) intercepts
  tool-call exceptions and returns ToolResult(is_error=True) carrying
  the message exactly as raised plus structured meta: error_type
  (originating exception class via __cause__), expected
  (agent-correctable vs operator fault/bug), suggestion (recovery
  hint when present)
- _log_error override honors FastMCPError.log_level — the stock
  implementation hardcoded logger.error, re-shouting failures the
  ExpectedToolError demotion had moved to WARNING
- Inherited on_message keeps the MCP -32002 resource-not-found
  transform for non-tool messages
- Ordering invariant documented in the server.py stack comment:
  conversion must stay outside Audit/Retry/Safety, which depend on
  exception semantics (audit catches to record outcome=error, retry
  matches LibTmuxException via __cause__)
- ExpectedToolError grows an optional suggestion kwarg; the
  not-found mapper branches point agents at the discovery tools
  (list_sessions / list_windows / list_panes)
- docs(topics/logging): document the fastmcp.errors channel and the
  WARNING-expected / ERROR-operator-fault level contract
- tests: probe-server coverage for clean messages, meta payload,
  suggestion plumbing, and per-level logging; production stack-order
  assertions updated to the subclass
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 6, 2026

Codecov Report

❌ Patch coverage is 78.46154% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.59%. Comparing base (310132f) to head (5e3520d).

Files with missing lines Patch % Lines
src/libtmux_mcp/middleware.py 83.09% 9 Missing and 3 partials ⚠️
src/libtmux_mcp/tools/pane_tools/capture_since.py 33.33% 4 Missing ⚠️
src/libtmux_mcp/tools/wait_for_tools.py 40.00% 3 Missing ⚠️
src/libtmux_mcp/tools/buffer_tools.py 75.00% 2 Missing ⚠️
src/libtmux_mcp/tools/pane_tools/io.py 0.00% 2 Missing ⚠️
src/libtmux_mcp/tools/pane_tools/lifecycle.py 33.33% 2 Missing ⚠️
src/libtmux_mcp/tools/session_tools.py 60.00% 2 Missing ⚠️
src/libtmux_mcp/tools/window_tools.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
+ Coverage   84.31%   84.59%   +0.27%     
==========================================
  Files          42       42              
  Lines        2684     2758      +74     
  Branches      359      370      +11     
==========================================
+ Hits         2263     2333      +70     
- Misses        317      321       +4     
  Partials      104      104              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tony added 20 commits June 6, 2026 10:49
why: Tighten the unreleased entries to product level now that the
work has a pull request to link.
what:
- Trim upstream fastmcp PR numbers, OTEL attribute names, and
  fastmcp API mechanics from the Dependencies and What's new
  entries — that depth lives in the linked PR
- Add the PR reference to both What's new entries
- Lead What's new with the clean-error-results headline; point
  discovery-tool mentions through tooliconl roles
…_log_error

why: AGENTS.md logging standard prescribes lazy %-formatting over
f-strings in log calls — deferred interpolation when the level is
filtered, and aggregator message-template grouping. The new
_log_error pre-formatted its message with an f-string; flagged in
the PR code review.
what:
- Pass the template and args to logger.log directly; the stock
  include_traceback if/else collapses into exc_info=<bool>
- tests: convert the log-level test to the NamedTuple + test_id
  parametrize pattern and assert via record.getMessage(), which
  both interpolates lazily-formatted records and would fail on a
  literal %s leaking into the message
why: The module docstring listed ToolErrorResultMiddleware after
ReadonlyRetryMiddleware, but the class is defined directly after
SafetyMiddleware — a reader scanning the file by the docstring's
order lands in the wrong place. Flagged in the PR code review.
what:
- Reorder the bullet list to match file definition order and say so
  in the lead-in (Safety, ToolErrorResult, Audit, ReadonlyRetry,
  TailPreserving)
why: The handle_tool_errors docstrings still said failures re-raise
as plain ToolError; the mapper now classifies expected failures as
ExpectedToolError at WARNING — the docstrings were silent on the
level partition and on why the re-raise chains from e. Flagged in
the PR code review.
what:
- Document the ExpectedToolError(WARNING) / ToolError(ERROR)
  partition on both decorators
- State the single-level __cause__ contract at the raise site:
  ReadonlyRetryMiddleware inspects exactly one cause hop, so
  re-wrapping the mapped error would silently disable readonly
  retries
…-layer errors

why: The request-flow diagram showed only SafetyMiddleware and the
error-handling section described a single decorator-to-ToolError
step; the branch's error redesign (classification in the decorator,
conversion to is_error results in ToolErrorResultMiddleware) made
both descriptions materially incomplete.
what:
- Request flow: list all six middleware outermost-first with a
  one-phrase role each; point at the server.py stack comment for
  the ordering rationale
- Error handling: describe the classification/conversion split, the
  from-e cause chain retries depend on, and the exceptions-stay-
  exceptions invariant through the audit/retry/safety trio; link
  topics/logging for the level policy
- Source layout: middleware.py one-liner covers all middleware, not
  just the safety tier
…se sites

why: Seven docstrings still named ToolError where every raise site
now raises ExpectedToolError — true only by subclassing, and the
rendered API reference loses the expected/unexpected distinction
the error redesign introduced.
what:
- Raises sections: _coerce_dict_arg and _apply_filters (_utils),
  _validate_channel_name and wait_for_channel (wait_for_tools),
  find_pane_by_position (pane_tools/lifecycle)
- Prose: _raise_if_pane_lifecycle_changed one-liner
  (pane_tools/state), wait_for_content_change baseline-loss note
  (pane_tools/wait)
…nvariant

why: The ordering-invariant comment opened with 'All three depend on
exception semantics' but explained only audit and retry and concluded
'silently disable both' — an internally inconsistent count in the one
comment future maintainers rely on to not reorder the stack. Flagged
in the PR code review.
what:
- Name safety's dependency explicitly (tier denials must propagate as
  exceptions for audit to record them) and close with 'break all
  three' at both sites: the ToolErrorResultMiddleware docstring and
  the server.py stack comment
…on claim

why: The error-handling section said the middleware catches the
exception 'at the outer edge of the stack', but Timing and the
response limiter sit outside it — imprecise in the one page whose
purpose is stack-position accuracy. Flagged in the PR code review.
what:
- Describe the catch point as 'once it has cleared the
  audit/retry/safety trio', matching the closing sentence's
  'outermost error layer' framing
why: The class documents its two keyword parameters only via prose
and doctests; AGENTS.md mandates NumPy docstring style, and the
codebase's parameterised classes (SafetyMiddleware, AuditMiddleware)
all carry a Parameters section. Flagged in the PR code review.
what:
- Document *args, log_level, and suggestion in a NumPy Parameters
  block between the extended description and Examples
… expected

why: fastmcp validates tool arguments against the input schema before
tool code runs, raising a bare pydantic.ValidationError that carries
no log_level — the handle_tool_errors decorators never get a chance
to classify it. The middleware logged these at ERROR and marked them
expected=false, contradicting the documented policy that invalid
arguments are agent-correctable WARNINGs (fastmcp's own server layer
already logs the same failure as a WARNING). Reported in PR review.
what:
- _is_schema_validation_error: recognize pydantic.ValidationError on
  the error or its __cause__; output-shape failures cannot be
  confused with this case — fastmcp's tool layer converts those to
  error results before the middleware sees them (verified)
- _log_error: exceptions without log_level classify via the helper —
  WARNING for argument validation, ERROR otherwise
- _error_tool_result: expected=True for schema validation failures
- tests: takes_int probe tool; ErrorLogLevelFixture grows an
  arguments field with a schema_validation_logs_warning case; meta
  classification test asserting expected=True + ValidationError
…tion

why: The limiter sits outside ToolErrorResultMiddleware, so oversized
error results (e.g. a validation message echoing a huge argument on a
response-limited tool) hit the truncation path, which rebuilds the
result without the error flag. The truncated error became an apparent
success, and the MCP SDK client rejected it against the tool's output
schema — the agent received a transport-level RuntimeError instead of
the tool error. Reported in PR review; reproduced against the
reference client.
what:
- TailPreservingResponseLimitingMiddleware.on_call_tool captures the
  inner result and restores is_error=True when the base class
  truncation replaced an error result; meta already survives via the
  base class, and tail-preservation keeps the suggestion line at the
  end of the text
- Truncated successes never gain the flag
- tests: LimiterErrorFixture parametrize (oversized error keeps flag
  + suggestion tail, small error untouched, oversized success stays
  non-error); the failing probe returns a Pydantic model so the MCP
  SDK client's output-schema validation guards the regression

note: truncated SUCCESS results from schema'd tools still trip the
same client-side validation — a pre-existing gap in fastmcp's stock
limiter (its meta workaround only bypasses server-side validation),
out of scope here
…allback

why: The previous commit's verification chain piped mypy through
tail, masking a real arg-type error — the capture callback didn't
satisfy fastmcp's CallNext protocol, which matches the parameter
name ('context'), not just the callable shape.
what:
- Annotate _capture with MiddlewareContext[CallToolRequestParams] ->
  ToolResult and name the parameter 'context' to structurally match
  the protocol; cast the captured Any back to ToolResult
why: The branch now preserves error results through the outer response limiter,
so stale ordering prose made the documented middleware contract misleading.
what:
- Describe response limiting as return-path truncation after error conversion
- Split tool exception classification from FastMCP schema validation handling
- Document that truncated oversized errors preserve is_error and _meta
why: The full suite can capture snapshot output before a burst of queued shell
commands finishes, making the truncation assertions depend on tmux timing.
what:
- Reuse the wait-for helper for snapshot output setup
- Wait for shell command completion before calling snapshot_pane
why: The branch distinguishes agent-correctable failures from operator faults,
so stale ToolError wording obscured the intended client contract.
what:
- Name ExpectedToolError in internal docstrings and middleware comments
- Use expected tool error language in client-facing tool docs
…idation errors

why: Gemini CLI's scheduler leaks its wait_for_previous sequencing
flag into MCP tool arguments when batching calls in one turn
(near-constant in gemini -p runs). Strict schemas reject it and
Gemini self-recovers from the generic pydantic message, but the
recovery depends on the model inferring what to do. MemPalace fixed
the same leak by stripping the key (MemPalace/mempalace#322) and
later whitelisting arguments (#647) — silent dropping would mask
argument typos on mutating/destructive tools, so the rejection
stays; the error just explains itself now.
what:
- _unexpected_kwargs: names flagged unexpected_keyword_argument in a
  pydantic validation failure (error or its __cause__)
- _client_label: client 'name version' from the MCP initialize
  handshake (fastmcp_context.session.client_params.clientInfo);
  None-safe at every hop, wording-only — never gates behavior
- _error_tool_result(error, context=None): synthesizes a suggestion
  for unexpected-argument rejections — name the argument(s) to drop
  or fix, plus a client-scheduling-flag note for wait_for_previous
  that names the connected client when the handshake exposed it
- docs(topics/gotchas): the leak, batching-vs-mode findings, the
  self-recovery flow, and why rejection beats stripping here
- tests: parametrized suggestion coverage (client-labeled flag leak,
  default handshake, plain typo, missing-arg silence) plus a direct
  no-handshake unit case via pydantic.validate_call
why: Prose said 'raises an expected tool error' in plain words while
the API reference renders ExpectedToolError with a stable autodoc
anchor — the docs convention is to cite autodoc'd APIs via roles,
never plain backticks, and readers get a one-click path to the
exception's full contract (log level, suggestion field).
what:
- Convert seven prose mentions to {exc}`~libtmux_mcp._utils.ExpectedToolError`:
  wait-for-channel (crashed signaller), show-hook (unknown hook
  names), delete-buffer (paste after delete), safety (kill-tool
  pane_id requirement), wait-for-content-change (pane lifecycle),
  capture-since (cursor/pane mismatch and pane lifecycle)
- architecture: switch the existing {class} role for
  ExpectedToolError to {exc}, matching the LibTmuxException
  precedent in topics/logging
- Bare fastmcp ToolError literals stay as code text: fastmcp
  publishes no objects.inv, so there is no link target
why: The unrecognized-argument suggestion (with client detection) is
user-visible product surface the unreleased notes didn't cover — the
behavior a Gemini CLI user upgrading will actually notice.
what:
- What's new entry covering the do-this-next hint for rejected
  unknown arguments and the wait_for_previous client-flag note
why: The branch now synthesizes recovery suggestions for schema-validation
failures, including unknown arguments and client scheduling flags, so the
architecture page's discovery-only wording was too narrow. The Gemini
gotcha page also carried brittle current-state claims about client behavior
and external MCP server fixes.
what:
- Broaden error-result suggestion prose to cover recovery hints generally
- Keep ExpectedToolError references linked with {exc}
- Soften stale-prone Gemini and external-server claims
why: Middleware tests still documented safety denials and decorated libtmux
failures with the older ToolError-only mental model, while production now
uses ExpectedToolError for agent-correctable failures.
what:
- Update stack-order and audit-denial docstrings for ExpectedToolError
- Make the audit-denial simulation match current SafetyMiddleware behavior
- Clarify retry/error-probe comments around mapped expected tool errors
@tony tony merged commit 00a5f66 into main Jun 6, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants