
v0.1.1 #25

Merged: themartto merged 7 commits into main from dev on May 21, 2026


@themartto (Contributor) commented May 19, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Session replay now correctly restores tool call and tool-result statuses (completed vs failed) so past tool interactions display their original outcomes.
  • Documentation
    • API docs and examples updated to include tool-result error signaling (an optional is_error flag) and to show tool-call replay behavior in session retrieval.


@themartto (Contributor, Author)

@coderabbitai review


coderabbitai Bot commented May 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai Bot commented May 21, 2026

📝 Walkthrough


Adds a boolean Message.is_error and threads it from tool execution through streaming, LLM adapters, session replay (acp), docs, and tests so tool results preserve completed/failed status.

Changes

Tool-result error propagation

| Layer | File(s) | Summary |
|---|---|---|
| Core message model and stream event | src/core/models.rs | Adds Message.is_error with serde skip/default helper; updates Message::tool_result to accept is_error and expands StreamEvent::ToolResult payload. |
| Agent tool execution and streaming | src/core/agent.rs | run_agent_loop captures (result, is_error) from tool execution and emits StreamEvent::ToolResult with is_error; test helper updated. |
| Anthropic adapter and tests | src/core/llm/anthropic.rs | Maps msg.is_error into Anthropic tool_result JSON (skip when false), sets is_error on converted choices, and updates fixtures/tests to include is_error:false. |
| Gemini adapter and tests | src/core/llm/gemini.rs | Sets is_error:false on converted messages/choices and updates unit tests / Message::tool_result calls to pass the new flag. |
| Retry test helper | src/core/llm/retry.rs | ok_choice test helper now includes is_error:false on Message fixtures. |
| ACP streaming and session load replay | src/acp/mod.rs | acp_prompt maps StreamEvent::ToolResult.is_error to ToolCallUpdate status (Failed/Completed). acp_load_session replay emits Assistant ToolCall entries (parsing function.arguments into raw_input when JSON parse succeeds) and Tool role ToolCallUpdate events using msg.is_error and content. |
| API docs | docs/api.md | Examples and spec add is_error for role:"tool", update session/load replay description, and add is_error?: boolean to the TypeScript Message interface. |
| RAG prompt system message | src/rag/prompt.rs | System Message constructed by PromptBuilder includes is_error:false when skills exist. |
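For orientation, here is a minimal std-only sketch of the core contract in the walkthrough: a tool-result message carries an explicit is_error flag, and the ACP layer derives the terminal status (Failed vs Completed) from that flag rather than from the result text. The types are simplified, hypothetical stand-ins for the real ones in src/core/models.rs and src/acp/mod.rs. The serde attributes on the real Message (a default plus a skip-when-false helper) are omitted so the example needs no crates, and terminal_status is an illustrative name, not a function from this PR.

```rust
// Hypothetical, simplified stand-ins for the types named in the walkthrough;
// the real definitions carry more fields and serde attributes.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: Option<String>,
    pub tool_call_id: Option<String>,
    pub tool_name: Option<String>,
    /// In the real model this defaults to false on deserialize, so history
    /// persisted before the field existed still loads as "not an error".
    pub is_error: bool,
}

impl Message {
    /// Mirrors the four-argument shape of Message::tool_result after this PR.
    pub fn tool_result(
        tool_call_id: String,
        tool_name: String,
        content: String,
        is_error: bool,
    ) -> Self {
        Message {
            role: Role::Tool,
            content: Some(content),
            tool_call_id: Some(tool_call_id),
            tool_name: Some(tool_name),
            is_error,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolCallStatus {
    Completed,
    Failed,
}

/// The mapping described for acp_prompt (live stream) and for replay of a
/// stored Role::Tool message: the flag alone decides the terminal status.
pub fn terminal_status(is_error: bool) -> ToolCallStatus {
    if is_error {
        ToolCallStatus::Failed
    } else {
        ToolCallStatus::Completed
    }
}
```

Because the same flag drives both the live path and replay, a reloaded session reports the same outcome the user originally saw.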

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled bytes beneath the trees,
stitched errors into message seas,
Each tool reply now wears a sign,
completed green or failed red line,
Hooray — the history's clear and fine!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'v0.1.1' is a version number that does not describe the actual changes in the pull request, which involve adding error tracking to tool calls. | Use a descriptive title that summarizes the main change, such as 'Add is_error flag to track tool call failures' or 'Persist tool error state in conversation history'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/acp/mod.rs`:
- Around line 250-259: The code silently swallows JSON parse errors for tool
call arguments by using serde_json::from_str(...).ok() — change this to
explicitly attempt parsing (e.g., match or map_err) on tc.function.arguments so
parsing failures are logged (include tc.id and tc.function.name for context)
before falling back to None; update the raw_input creation used in
on_update(SessionUpdate::ToolCall(...)) to use the parsed value on success and
still pass None on failure after logging.
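One way to address this inline comment, sketched as a small generic helper. The name parse_or_log is hypothetical, and eprintln! stands in for whatever logging facade the crate actually uses. It keeps the successful value, and on failure logs the tool call id and name before falling back to None.

```rust
use std::fmt::Display;

/// Hypothetical helper: return the parsed value, or log the parse failure
/// with the tool call's id and name and return None.
/// eprintln! is a placeholder for the crate's real logger.
pub fn parse_or_log<T, E: Display>(
    result: Result<T, E>,
    call_id: &str,
    tool_name: &str,
) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            eprintln!(
                "failed to parse arguments for tool call {call_id} ({tool_name}): {err}"
            );
            None
        }
    }
}
```

Assuming raw_input expects an Option<serde_json::Value>, the replay code could call it as parse_or_log(serde_json::from_str::<serde_json::Value>(&tc.function.arguments), &tc.id, &tc.function.name), since serde_json::Error implements Display. The helper is generic over the error type, so any Result whose error implements Display works.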

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3268db63-6a06-415d-b616-af8c64c7c316

📥 Commits

Reviewing files that changed from the base of the PR and between 58f62df and 3880eb5.

📒 Files selected for processing (1)
  • src/acp/mod.rs

Comment thread src/acp/mod.rs
themartto added 4 commits May 21, 2026 10:00
Add is_error: bool to Message so tool failures are stored explicitly in
conversation history rather than being lost as plain text. Propagate the
flag through StreamEvent::ToolResult and agent.rs so both the live path
and history replay emit ToolCallStatus::Failed for failed tools instead of
always Completed. Forward is_error to Anthropic's tool_result block so the
LLM gets an accurate signal on failure.
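The commit forwards is_error to Anthropic's tool_result block. The Anthropic Messages API accepts an optional boolean is_error on tool_result content blocks; the sketch below shows the "emit only when true" shape described in the walkthrough. It builds JSON by hand purely so it runs without crates; a real adapter would use serde_json, and anthropic_tool_result and json_escape are hypothetical names.

```rust
/// Minimal JSON string encoder: wraps in quotes and escapes quotes,
/// backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Sketch of an Anthropic tool_result content block. The is_error key is
/// omitted entirely when false, matching "skip when false".
pub fn anthropic_tool_result(tool_use_id: &str, content: &str, is_error: bool) -> String {
    let mut block = format!(
        "{{\"type\":\"tool_result\",\"tool_use_id\":{},\"content\":{}",
        json_escape(tool_use_id),
        json_escape(content)
    );
    if is_error {
        block.push_str(",\"is_error\":true");
    }
    block.push('}');
    block
}
```

Omitting the key when false keeps successful tool results byte-identical to what the adapter sent before this change.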
@themartto (Contributor, Author)

@coderabbitai review


coderabbitai Bot commented May 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/core/llm/gemini.rs (1)

158-188: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gemini adapter must propagate msg.is_error into FunctionResponse.response (not always {"result": ...})

src/core/llm/gemini.rs’s Role::Tool branch ignores msg.is_error and always shapes GeminiFunctionResponse.response as {"result": msg.content...}. The Message model explicitly stores is_error for replay correctness, and the Anthropic adapter forwards it via tool_result.is_error. Gemini should encode the failure in the free-form function_response.response payload (e.g., {"error": ...}) so the model can distinguish errors from successes.

♻️ Suggested payload shape on error
-                let part = GeminiPart {
-                    text: None,
-                    function_call: None,
-                    function_response: Some(GeminiFunctionResponse {
-                        name,
-                        response: serde_json::json!({
-                            "result": msg.content.clone().unwrap_or_default()
-                        }),
-                    }),
-                };
+                let content = msg.content.clone().unwrap_or_default();
+                let response = if msg.is_error {
+                    serde_json::json!({ "error": content })
+                } else {
+                    serde_json::json!({ "result": content })
+                };
+                let part = GeminiPart {
+                    text: None,
+                    function_call: None,
+                    function_response: Some(GeminiFunctionResponse { name, response }),
+                };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/llm/gemini.rs` around lines 158 - 188, The Role::Tool branch
currently always sets GeminiFunctionResponse.response to {"result": ...} and
ignores Message::is_error; update the logic that builds GeminiFunctionResponse
in src/core/llm/gemini.rs (the Role::Tool case where GeminiPart and
GeminiFunctionResponse are constructed) to check msg.is_error and, when true,
set response to an error-shaped payload (e.g., {"error":
msg.content.unwrap_or_default()}) and otherwise keep {"result":
msg.content.unwrap_or_default()}; preserve existing merging into last
GeminiContent and ensure None content becomes an empty string.
🧹 Nitpick comments (3)
src/core/agent.rs (1)

91-95: ⚡ Quick win

ToolExecutionResult drops the is_error signal in the non-streaming result path.

is_error is now threaded through StreamEvent::ToolResult and Message::tool_result, but the ToolExecutionResult pushed into AgentStep.tool_calls here is unchanged. Callers of run_agent_with_history that consume AgentResult.steps (i.e., not using the streaming callback) still need to inspect the "Error: " prefix on result to know whether the tool failed, which is exactly the heuristic this PR is trying to retire.

For consistency, add is_error: bool to ToolExecutionResult in src/core/models.rs and populate it here.

♻️ Proposed change

In src/core/models.rs:

 #[derive(Debug, Serialize, Clone)]
 pub struct ToolExecutionResult {
     pub tool_name: String,
     pub arguments: String,
     pub result: String,
+    pub is_error: bool,
 }

In src/core/agent.rs:

                 tool_results.push(ToolExecutionResult {
                     tool_name: tool_name.clone(),
                     arguments: arguments.clone(),
                     result: result.clone(),
+                    is_error,
                 });
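To illustrate why the flag matters for non-streaming callers, here is a sketch contrasting the "Error: " prefix heuristic with the proposed field. The struct mirrors the proposed ToolExecutionResult; failed_by_prefix and failed_by_flag are hypothetical helpers. The prefix check misfires whenever a tool succeeds but its output happens to begin with "Error: ", for example a read_file of a log.

```rust
/// Mirrors the proposed ToolExecutionResult with the new is_error field.
#[derive(Debug, Clone)]
pub struct ToolExecutionResult {
    pub tool_name: String,
    pub arguments: String,
    pub result: String,
    pub is_error: bool,
}

/// Old heuristic: any result text starting with "Error: " counts as a failure.
pub fn failed_by_prefix(steps: &[ToolExecutionResult]) -> Vec<&str> {
    steps
        .iter()
        .filter(|r| r.result.starts_with("Error: "))
        .map(|r| r.tool_name.as_str())
        .collect()
}

/// With the flag, callers read the recorded outcome directly.
pub fn failed_by_flag(steps: &[ToolExecutionResult]) -> Vec<&str> {
    steps
        .iter()
        .filter(|r| r.is_error)
        .map(|r| r.tool_name.as_str())
        .collect()
}
```

With a successful read_file whose content starts with "Error: disk full" and a genuinely failed shell call, the prefix check reports both tools while the flag reports only shell.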
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/agent.rs` around lines 91 - 95, Add an is_error: bool field to the
ToolExecutionResult struct in models (so it mirrors StreamEvent::ToolResult /
Message::tool_result) and populate that field when pushing into tool_results
inside the agent logic: update the ToolExecutionResult construction in the code
that currently builds tool_results (the push that uses tool_name, arguments,
result) to include is_error (use the existing is_error variable in scope or
derive it the same way streaming path does). Ensure the struct definition and
all places that construct ToolExecutionResult are updated to compile.
src/core/models.rs (1)

381-391: ⚡ Quick win

Consider adding tests for is_error: true and serde default round-trip.

The only assertion added covers the false path. Given the PR explicitly relies on #[serde(default)] so older persisted history without is_error deserializes cleanly, and on is_error: true being faithfully replayed, a couple of small tests would lock that contract in:

  • Message::tool_result(..., true) sets is_error == true and serializes to JSON that contains "is_error": true.
  • Deserializing a Message JSON without the is_error key yields is_error == false.
  • Message with is_error: false does not emit the key (matches skip_serializing_if).
🧪 Suggested additional tests
     #[test]
     fn message_tool_result_sets_correct_fields() {
         let msg =
             Message::tool_result("call_1".into(), "read_file".into(), "content".into(), false);
         assert_eq!(msg.role, Role::Tool);
         assert_eq!(msg.content.as_deref(), Some("content"));
         assert_eq!(msg.tool_call_id.as_deref(), Some("call_1"));
         assert_eq!(msg.tool_name.as_deref(), Some("read_file"));
         assert!(!msg.is_error);
         assert!(msg.tool_calls.is_none());
     }
+
+    #[test]
+    fn message_tool_result_propagates_is_error_true() {
+        let msg =
+            Message::tool_result("call_2".into(), "shell".into(), "boom".into(), true);
+        assert!(msg.is_error);
+        let json: Value = serde_json::to_value(&msg).unwrap();
+        assert_eq!(json["is_error"], true);
+    }
+
+    #[test]
+    fn message_is_error_skipped_when_false_and_defaults_on_deserialize() {
+        let msg = Message::tool_result("c".into(), "t".into(), "ok".into(), false);
+        let json: Value = serde_json::to_value(&msg).unwrap();
+        assert!(json.get("is_error").is_none());
+
+        let raw = r#"{"role":"tool","content":"ok","tool_call_id":"c","tool_name":"t"}"#;
+        let parsed: Message = serde_json::from_str(raw).unwrap();
+        assert!(!parsed.is_error);
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/models.rs` around lines 381 - 391, Add tests to cover the true-path
and serde default round-trip for is_error: call Message::tool_result(..., true)
and assert msg.is_error == true and that serializing the Message to JSON
contains the key "is_error": true; also add a test that deserializing a JSON
Message without an "is_error" field yields is_error == false (verifying
#[serde(default)]), and a test that a Message with is_error == false does not
emit the "is_error" key when serialized (verifying skip_serializing_if). Use the
existing message_tool_result_sets_correct_fields test as a template to create
tests referencing Message::tool_result, is_error, and serde
(serialize/deserialize) helpers.
src/acp/mod.rs (1)

259-276: 💤 Low value

Consider using InProgress for the replayed tool_call to match the live streaming lifecycle.

In the live path (acp_prompt, line 147), a new tool call is first emitted with ToolCallStatus::InProgress and then transitioned via ToolCallUpdate. Here in replay, the initial ToolCall is emitted as Completed (line 272), and then immediately followed by a ToolCallUpdate from the Role::Tool branch that may flip it to Failed. A Completed → Failed transition for the same id is awkward for clients tracking status state machines (see the lifecycle documented in docs/api.md: pending → in_progress → completed/failed).

Emitting InProgress here and letting the subsequent Role::Tool update decide the terminal state keeps replay consistent with the live flow.

♻️ Proposed change
                             on_update(SessionUpdate::ToolCall(
                                 AcpToolCall::new(tc.id.clone(), &tc.function.name)
-                                    .status(ToolCallStatus::Completed)
+                                    .status(ToolCallStatus::InProgress)
                                     .raw_input(raw_input),
                             ));
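A small sketch of the lifecycle argument above. It assumes a client enforces the documented pending → in_progress → completed/failed order strictly and treats terminal states as final; that strictness is an assumption about client behavior, not something stated in this review. Under it, a replay that starts at Completed and is then flipped to Failed is rejected, while starting at InProgress is accepted. ToolCallStatus here is a local stand-in, and is_valid_transition and replay_statuses are hypothetical helpers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolCallStatus {
    Pending,
    InProgress,
    Completed,
    Failed,
}

/// Strict reading of the documented lifecycle: only forward steps are allowed,
/// and Completed/Failed are terminal.
pub fn is_valid_transition(from: ToolCallStatus, to: ToolCallStatus) -> bool {
    matches!(
        (from, to),
        (ToolCallStatus::Pending, ToolCallStatus::InProgress)
            | (ToolCallStatus::InProgress, ToolCallStatus::Completed)
            | (ToolCallStatus::InProgress, ToolCallStatus::Failed)
    )
}

/// Statuses a client sees when one stored tool call is replayed: the status
/// emitted with the assistant's ToolCall, then the terminal status taken from
/// the matching Role::Tool message's is_error flag.
pub fn replay_statuses(initial: ToolCallStatus, is_error: bool) -> [ToolCallStatus; 2] {
    let terminal = if is_error {
        ToolCallStatus::Failed
    } else {
        ToolCallStatus::Completed
    };
    [initial, terminal]
}
```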
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/acp/mod.rs` around lines 259 - 276, The replay currently emits a ToolCall
with ToolCallStatus::Completed which can produce an invalid Completed→Failed
transition; change the initial emitted status to ToolCallStatus::InProgress when
constructing the AcpToolCall in the on_update(SessionUpdate::ToolCall(...)) call
(the code that calls AcpToolCall::new(tc.id.clone(),
&tc.function.name).status(...).raw_input(...)); leave the Role::Tool branch
updates as-is so they can transition the same id to Failed or Completed,
ensuring the replay matches the live lifecycle emitted by acp_prompt.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/api.md`:
- Line 755: The documentation and implementation disagree: replayed tool calls
are emitted by acp_load_session with ToolCallStatus::Completed (currently at the
replay emission in acp_load_session), but the docs state they arrive as
tool_call with status "in_progress"; to fix, change the replay emission in
acp_load_session to emit ToolCallStatus::InProgress for assistant tool-call
requests (instead of ToolCallStatus::Completed) so the initial replay matches
the "in_progress" doc behavior, and update any tests or downstream handling that
expect Completed; alternatively, if you prefer to keep the code behavior, update
the docs sentence to state replayed tool calls are emitted as "completed" and
may later be flipped by tool_call_update when is_error is set.

---

Outside diff comments:
In `@src/core/llm/gemini.rs`:
- Around line 158-188: The Role::Tool branch currently always sets
GeminiFunctionResponse.response to {"result": ...} and ignores
Message::is_error; update the logic that builds GeminiFunctionResponse in
src/core/llm/gemini.rs (the Role::Tool case where GeminiPart and
GeminiFunctionResponse are constructed) to check msg.is_error and, when true,
set response to an error-shaped payload (e.g., {"error":
msg.content.unwrap_or_default()}) and otherwise keep {"result":
msg.content.unwrap_or_default()}; preserve existing merging into last
GeminiContent and ensure None content becomes an empty string.

---

Nitpick comments:
In `@src/acp/mod.rs`:
- Around line 259-276: The replay currently emits a ToolCall with
ToolCallStatus::Completed which can produce an invalid Completed→Failed
transition; change the initial emitted status to ToolCallStatus::InProgress when
constructing the AcpToolCall in the on_update(SessionUpdate::ToolCall(...)) call
(the code that calls AcpToolCall::new(tc.id.clone(),
&tc.function.name).status(...).raw_input(...)); leave the Role::Tool branch
updates as-is so they can transition the same id to Failed or Completed,
ensuring the replay matches the live lifecycle emitted by acp_prompt.

In `@src/core/agent.rs`:
- Around line 91-95: Add an is_error: bool field to the ToolExecutionResult
struct in models (so it mirrors StreamEvent::ToolResult / Message::tool_result)
and populate that field when pushing into tool_results inside the agent logic:
update the ToolExecutionResult construction in the code that currently builds
tool_results (the push that uses tool_name, arguments, result) to include
is_error (use the existing is_error variable in scope or derive it the same way
streaming path does). Ensure the struct definition and all places that construct
ToolExecutionResult are updated to compile.

In `@src/core/models.rs`:
- Around line 381-391: Add tests to cover the true-path and serde default
round-trip for is_error: call Message::tool_result(..., true) and assert
msg.is_error == true and that serializing the Message to JSON contains the key
"is_error": true; also add a test that deserializing a JSON Message without an
"is_error" field yields is_error == false (verifying #[serde(default)]), and a
test that a Message with is_error == false does not emit the "is_error" key when
serialized (verifying skip_serializing_if). Use the existing
message_tool_result_sets_correct_fields test as a template to create tests
referencing Message::tool_result, is_error, and serde (serialize/deserialize)
helpers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1918326d-3e22-4fa2-a438-7b4375605fd5

📥 Commits

Reviewing files that changed from the base of the PR and between 3880eb5 and 46b992d.

📒 Files selected for processing (8)
  • docs/api.md
  • src/acp/mod.rs
  • src/core/agent.rs
  • src/core/llm/anthropic.rs
  • src/core/llm/gemini.rs
  • src/core/llm/retry.rs
  • src/core/models.rs
  • src/rag/prompt.rs

Comment thread docs/api.md
@themartto themartto marked this pull request as ready for review May 21, 2026 13:46
@themartto themartto merged commit 98479a1 into main May 21, 2026
5 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 23, 2026