v0.1.1 by themartto · Pull Request #25 · weirdstuff-dev/openheim

themartto · 2026-05-19T07:13:54Z

Summary by CodeRabbit

Bug Fixes
- Session replay now correctly restores tool call and tool-result statuses (completed vs failed) so past tool interactions display their original outcomes.

Documentation
- API docs and examples updated to include tool-result error signaling (an optional error flag) and to show tool-call replay behavior in session retrieval.

themartto · 2026-05-21T06:42:31Z

coderabbitai · 2026-05-21T06:42:38Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-21T06:44:37Z

Walkthrough

Adds a boolean Message.is_error and threads it from tool execution through streaming, LLM adapters, session replay (acp), docs, and tests so tool results preserve completed/failed status.

Changes

Tool-result error propagation

| Layer / File(s) | Summary |
|---|---|
| Core message model and stream event `src/core/models.rs` | Adds `Message.is_error` with serde skip/default helper; updates `Message::tool_result` to accept `is_error` and expands `StreamEvent::ToolResult` payload. |
| Agent tool execution and streaming `src/core/agent.rs` | `run_agent_loop` captures (result, is_error) from tool execution and emits `StreamEvent::ToolResult` with `is_error`; test helper updated. |
| Anthropic adapter and tests `src/core/llm/anthropic.rs` | Maps `msg.is_error` into Anthropic `tool_result` JSON (skip when false), sets `is_error` on converted choices, and updates fixtures/tests to include `is_error:false`. |
| Gemini adapter and tests `src/core/llm/gemini.rs` | Sets `is_error:false` on converted messages/choices and updates unit tests / `Message::tool_result` calls to pass the new flag. |
| Retry test helper `src/core/llm/retry.rs` | `ok_choice` test helper now includes `is_error:false` on Message fixtures. |
| ACP streaming and session load replay `src/acp/mod.rs` | `acp_prompt` maps `StreamEvent::ToolResult.is_error` to `ToolCallUpdate` status (Failed/Completed). `acp_load_session` replay emits Assistant `ToolCall` entries (parsing function.arguments into raw_input when JSON parse succeeds) and `Tool` role `ToolCallUpdate` events using `msg.is_error` and `content`. |
| API docs `docs/api.md` | Examples and spec add `is_error` for `role:"tool"`, update `session/load` replay description, and add `is_error?: boolean` to the TypeScript `Message` interface. |
| RAG prompt system message `src/rag/prompt.rs` | System `Message` constructed by PromptBuilder includes `is_error:false` when skills exist. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled bytes beneath the trees,
stitched errors into message seas,
Each tool reply now wears a sign,
completed green or failed red line,
Hooray — the history's clear and fine!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'v0.1.1' is a version number that does not describe the actual changes in the pull request, which involve adding error tracking to tool calls. | Use a descriptive title that summarizes the main change, such as 'Add is_error flag to track tool call failures' or 'Persist tool error state in conversation history'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/acp/mod.rs`:
- Around line 250-259: The code silently swallows JSON parse errors for tool
call arguments by using serde_json::from_str(...).ok() — change this to
explicitly attempt parsing (e.g., match or map_err) on tc.function.arguments so
parsing failures are logged (include tc.id and tc.function.name for context)
before falling back to None; update the raw_input creation used in
on_update(SessionUpdate::ToolCall(...)) to use the parsed value on success and
still pass None on failure after logging.
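
The "log, then fall back to None" pattern requested above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `parse_or_log` is a hypothetical helper, `eprintln!` stands in for whatever logger the project uses, and the generic `Result<T, E>` parameter stands in for the result of `serde_json::from_str` on `tc.function.arguments`.

```rust
use std::fmt::Display;

/// Hypothetical helper: convert a parse result into an Option, logging the
/// failure with the tool call's id and name instead of dropping it silently.
pub fn parse_or_log<T, E: Display>(
    call_id: &str,
    tool_name: &str,
    parsed: Result<T, E>,
) -> Option<T> {
    match parsed {
        // Success: hand the parsed value on (it would become raw_input).
        Ok(value) => Some(value),
        // Failure: record the context first, then fall back to None.
        Err(err) => {
            // eprintln! is a stand-in for the project's real logging call.
            eprintln!(
                "failed to parse arguments for tool call {} ({}): {}",
                call_id, tool_name, err
            );
            None
        }
    }
}
```

Compared with `.ok()`, the caller still receives an `Option`, so the surrounding replay code would not need to change; only the failure becomes visible in the logs.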


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3268db63-6a06-415d-b616-af8c64c7c316

📥 Commits

Reviewing files that changed from the base of the PR and between 58f62df and 3880eb5.

📒 Files selected for processing (1)

src/acp/mod.rs

Add is_error: bool to Message so tool failures are stored explicitly in conversation history rather than being lost as plain text. Propagate the flag through StreamEvent::ToolResult and agent.rs so both the live path and history replay emit ToolCallStatus::Failed instead of always Completed. Forward is_error to Anthropic's tool_result block so the LLM gets accurate signal on failure.
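
As a rough sketch of the idea in this commit description, the stored flag can drive the terminal status directly rather than being inferred from the result text. The types below are simplified stand-ins introduced for illustration (`ToolMessage`, `terminal_status`, and `replay_statuses` are hypothetical names); the real `Message` and `ToolCallStatus` carry more fields and variants.

```rust
/// Simplified stand-in for the status enum; only the terminal states are shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolCallStatus {
    Completed,
    Failed,
}

/// Simplified stand-in for a tool-role message stored in history.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolMessage {
    pub tool_call_id: String,
    pub content: String,
    /// Explicit failure flag stored alongside the result text.
    pub is_error: bool,
}

/// Map the stored flag to the terminal status reported for a tool call.
pub fn terminal_status(is_error: bool) -> ToolCallStatus {
    if is_error {
        ToolCallStatus::Failed
    } else {
        ToolCallStatus::Completed
    }
}

/// Produce one (id, status) pair per stored tool result, as a replay might.
pub fn replay_statuses(history: &[ToolMessage]) -> Vec<(String, ToolCallStatus)> {
    history
        .iter()
        .map(|m| (m.tool_call_id.clone(), terminal_status(m.is_error)))
        .collect()
}
```

Because the same mapping can serve both the live path and the replay path, the two cannot disagree about whether a given call failed.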

themartto · 2026-05-21T13:21:54Z

@coderabbitai review

coderabbitai · 2026-05-21T13:22:01Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/core/llm/gemini.rs (1)

158-188: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gemini adapter must propagate msg.is_error into FunctionResponse.response (not always {"result": ...})

src/core/llm/gemini.rs’s Role::Tool branch ignores msg.is_error and always shapes GeminiFunctionResponse.response as {"result": msg.content...}. The Message model explicitly stores is_error for replay correctness, and the Anthropic adapter forwards it via tool_result.is_error. Gemini should encode the failure in the free-form function_response.response payload (e.g., {"error": ...}) so the model can distinguish errors from successes.

♻️ Suggested payload shape on error

-                let part = GeminiPart {
-                    text: None,
-                    function_call: None,
-                    function_response: Some(GeminiFunctionResponse {
-                        name,
-                        response: serde_json::json!({
-                            "result": msg.content.clone().unwrap_or_default()
-                        }),
-                    }),
-                };
+                let content = msg.content.clone().unwrap_or_default();
+                let response = if msg.is_error {
+                    serde_json::json!({ "error": content })
+                } else {
+                    serde_json::json!({ "result": content })
+                };
+                let part = GeminiPart {
+                    text: None,
+                    function_call: None,
+                    function_response: Some(GeminiFunctionResponse { name, response }),
+                };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/llm/gemini.rs` around lines 158 - 188, The Role::Tool branch
currently always sets GeminiFunctionResponse.response to {"result": ...} and
ignores Message::is_error; update the logic that builds GeminiFunctionResponse
in src/core/llm/gemini.rs (the Role::Tool case where GeminiPart and
GeminiFunctionResponse are constructed) to check msg.is_error and, when true,
set response to an error-shaped payload (e.g., {"error":
msg.content.unwrap_or_default()}) and otherwise keep {"result":
msg.content.unwrap_or_default()}; preserve existing merging into last
GeminiContent and ensure None content becomes an empty string.

🧹 Nitpick comments (3)

src/core/agent.rs (1)

91-95: ⚡ Quick win

ToolExecutionResult drops the is_error signal in the non-streaming result path.

is_error is now threaded through StreamEvent::ToolResult and Message::tool_result, but the ToolExecutionResult pushed into AgentStep.tool_calls here is unchanged. Callers of run_agent_with_history that consume AgentResult.steps (i.e., not using the streaming callback) still need to inspect the "Error: " prefix on result to know whether the tool failed, which is the exact heuristic this PR is trying to retire.

For consistency, add is_error: bool to ToolExecutionResult in src/core/models.rs and populate it here.
♻️ Proposed change

In src/core/models.rs:
 #[derive(Debug, Serialize, Clone)]
 pub struct ToolExecutionResult {
     pub tool_name: String,
     pub arguments: String,
     pub result: String,
+    pub is_error: bool,
 }
In src/core/agent.rs:
                 tool_results.push(ToolExecutionResult {
                     tool_name: tool_name.clone(),
                     arguments: arguments.clone(),
                     result: result.clone(),
+                    is_error,
                 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/agent.rs` around lines 91 - 95, Add an is_error: bool field to the
ToolExecutionResult struct in models (so it mirrors StreamEvent::ToolResult /
Message::tool_result) and populate that field when pushing into tool_results
inside the agent logic: update the ToolExecutionResult construction in the code
that currently builds tool_results (the push that uses tool_name, arguments,
result) to include is_error (use the existing is_error variable in scope or
derive it the same way streaming path does). Ensure the struct definition and
all places that construct ToolExecutionResult are updated to compile.
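
To illustrate why the prefix heuristic is fragile, here is a small sketch comparing it with the proposed flag. The struct mirrors the proposed shape of `ToolExecutionResult`; `failed_by_prefix` and `failed_by_flag` are hypothetical helper names used only for this comparison.

```rust
/// Mirrors the struct shape proposed in the review comment.
#[derive(Debug, Clone)]
pub struct ToolExecutionResult {
    pub tool_name: String,
    pub arguments: String,
    pub result: String,
    /// The field the review comment proposes adding.
    pub is_error: bool,
}

/// The heuristic described above: guess failure from the result text.
pub fn failed_by_prefix(r: &ToolExecutionResult) -> bool {
    r.result.starts_with("Error: ")
}

/// With the explicit flag, callers read the outcome directly.
pub fn failed_by_flag(r: &ToolExecutionResult) -> bool {
    r.is_error
}
```

A successful `read_file` call whose file content happens to begin with "Error: " would be misclassified by the prefix check, while the flag reports it correctly.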

src/core/models.rs (1)

381-391: ⚡ Quick win

Consider adding tests for is_error: true and serde default round-trip.

The only assertion added covers the false path. Given the PR explicitly relies on #[serde(default)] so older persisted history without is_error deserializes cleanly, and on is_error: true being faithfully replayed, a couple of small tests would lock that contract in:

- Message::tool_result(..., true) sets is_error == true and serializes to JSON that contains "is_error": true.
- Deserializing a Message JSON without the is_error key yields is_error == false.
- Message with is_error: false does not emit the key (matches skip_serializing_if).

🧪 Suggested additional tests

     #[test]
     fn message_tool_result_sets_correct_fields() {
         let msg =
             Message::tool_result("call_1".into(), "read_file".into(), "content".into(), false);
         assert_eq!(msg.role, Role::Tool);
         assert_eq!(msg.content.as_deref(), Some("content"));
         assert_eq!(msg.tool_call_id.as_deref(), Some("call_1"));
         assert_eq!(msg.tool_name.as_deref(), Some("read_file"));
         assert!(!msg.is_error);
         assert!(msg.tool_calls.is_none());
     }
+
+    #[test]
+    fn message_tool_result_propagates_is_error_true() {
+        let msg =
+            Message::tool_result("call_2".into(), "shell".into(), "boom".into(), true);
+        assert!(msg.is_error);
+        let json: Value = serde_json::to_value(&msg).unwrap();
+        assert_eq!(json["is_error"], true);
+    }
+
+    #[test]
+    fn message_is_error_skipped_when_false_and_defaults_on_deserialize() {
+        let msg = Message::tool_result("c".into(), "t".into(), "ok".into(), false);
+        let json: Value = serde_json::to_value(&msg).unwrap();
+        assert!(json.get("is_error").is_none());
+
+        let raw = r#"{"role":"tool","content":"ok","tool_call_id":"c","tool_name":"t"}"#;
+        let parsed: Message = serde_json::from_str(raw).unwrap();
+        assert!(!parsed.is_error);
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/models.rs` around lines 381 - 391, Add tests to cover the true-path
and serde default round-trip for is_error: call Message::tool_result(..., true)
and assert msg.is_error == true and that serializing the Message to JSON
contains the key "is_error": true; also add a test that deserializing a JSON
Message without an "is_error" field yields is_error == false (verifying
#[serde(default)]), and a test that a Message with is_error == false does not
emit the "is_error" key when serialized (verifying skip_serializing_if). Use the
existing message_tool_result_sets_correct_fields test as a template to create
tests referencing Message::tool_result, is_error, and serde
(serialize/deserialize) helpers.

src/acp/mod.rs (1)

259-276: 💤 Low value

Consider using InProgress for the replayed tool_call to match the live streaming lifecycle.

In the live path (acp_prompt, line 147), a new tool call is first emitted with ToolCallStatus::InProgress and then transitioned via ToolCallUpdate. Here in replay, the initial ToolCall is emitted as Completed (line 272), and then immediately followed by a ToolCallUpdate from the Role::Tool branch that may flip it to Failed. A Completed → Failed transition for the same id is awkward for clients tracking status state machines (see the lifecycle documented in docs/api.md: pending → in_progress → completed/failed).

Emitting InProgress here and letting the subsequent Role::Tool update decide the terminal state keeps replay consistent with the live flow.
♻️ Proposed change
                             on_update(SessionUpdate::ToolCall(
                                 AcpToolCall::new(tc.id.clone(), &tc.function.name)
-                                    .status(ToolCallStatus::Completed)
+                                    .status(ToolCallStatus::InProgress)
                                     .raw_input(raw_input),
                             ));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/acp/mod.rs` around lines 259 - 276, The replay currently emits a ToolCall
with ToolCallStatus::Completed which can produce an invalid Completed→Failed
transition; change the initial emitted status to ToolCallStatus::InProgress when
constructing the AcpToolCall in the on_update(SessionUpdate::ToolCall(...)) call
(the code that calls AcpToolCall::new(tc.id.clone(),
&tc.function.name).status(...).raw_input(...)); leave the Role::Tool branch
updates as-is so they can transition the same id to Failed or Completed,
ensuring the replay matches the live lifecycle emitted by acp_prompt.
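
The concern about status state machines can be made concrete with a small sketch of how a strict client might validate transitions. This assumes the client enforces exactly the pending → in_progress → completed/failed lifecycle quoted above; `is_valid_transition` is a hypothetical function, and real clients may be more lenient.

```rust
/// Stand-in for the four lifecycle states named in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolCallStatus {
    Pending,
    InProgress,
    Completed,
    Failed,
}

/// Returns true when `from -> to` follows
/// pending → in_progress → completed/failed (assumed strict lifecycle).
pub fn is_valid_transition(from: ToolCallStatus, to: ToolCallStatus) -> bool {
    use ToolCallStatus::*;
    matches!(
        (from, to),
        (Pending, InProgress) | (InProgress, Completed) | (InProgress, Failed)
    )
}
```

Under this assumption a replay that starts at `InProgress` can end in either terminal state, whereas one that starts at `Completed` has no valid way to reach `Failed`.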

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/api.md`:
- Line 755: The documentation and implementation disagree: replayed tool calls
are emitted by acp_load_session with ToolCallStatus::Completed (currently at the
replay emission in acp_load_session), but the docs state they arrive as
tool_call with status "in_progress"; to fix, change the replay emission in
acp_load_session to emit ToolCallStatus::InProgress for assistant tool-call
requests (instead of ToolCallStatus::Completed) so the initial replay matches
the "in_progress" doc behavior, and update any tests or downstream handling that
expect Completed; alternatively, if you prefer to keep the code behavior, update
the docs sentence to state replayed tool calls are emitted as "completed" and
may later be flipped by tool_call_update when is_error is set.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1918326d-3e22-4fa2-a438-7b4375605fd5

📥 Commits

Reviewing files that changed from the base of the PR and between 3880eb5 and 46b992d.

📒 Files selected for processing (8)

docs/api.md
src/acp/mod.rs
src/core/agent.rs
src/core/llm/anthropic.rs
src/core/llm/gemini.rs
src/core/llm/retry.rs
src/core/models.rs
src/rag/prompt.rs

fix: add missed tool calls to history sessions

3880eb5

coderabbitai Bot reviewed May 21, 2026

Comment thread src/acp/mod.rs

themartto added 4 commits May 21, 2026 10:00

feat: fix tool call errors logging

3a76a46

docs: document is_error field and correct tool call history replay

99c9dc2

style: cargo fmt

46b992d

coderabbitai Bot reviewed May 21, 2026

Comment thread docs/api.md

themartto added 2 commits May 21, 2026 16:34

fix: emit InProgress status for replayed tool calls in acp_load_session

7dd6413

chore: update README.md

4588b77

themartto marked this pull request as ready for review May 21, 2026 13:46

themartto merged commit 98479a1 into main May 21, 2026
5 checks passed

coderabbitai Bot mentioned this pull request May 23, 2026

Better tui #26

Merged


v0.1.1 #25: themartto merged 7 commits into main from dev
