Skip to content

feat(tools): coding-harness baseline primitives (#1205)#1208

Merged
senamakel merged 2 commits intotinyhumansai:mainfrom
senamakel:issue/1205-add-coding-harness-baseline-tools-and-co
May 5, 2026
Merged

feat(tools): coding-harness baseline primitives (#1205)#1208
senamakel merged 2 commits intotinyhumansai:mainfrom
senamakel:issue/1205-add-coding-harness-baseline-tools-and-co

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented May 5, 2026

Summary

  • Adds 9 first-class coding-harness tools (grep, glob, list, edit, apply_patch, todowrite, plan_exit, web_fetch, lsp) so coding-oriented agents stop falling through to shell for file navigation and editing.
  • Adds docs/CODING_HARNESS.md mapping the canonical surface (existing tools + new ones) with a permission table and notes on plan/build modes.
  • Existing tools (file_read, file_write, shell, web_search, ask_clarification, spawn_subagent) already cover read/write/bash/websearch/question/task — this PR adds only the genuinely missing primitives instead of duplicating those.

Problem

OpenHuman's tool surface for coding-oriented agents was uneven:

  • File and code navigation primitives (grep, glob, list) didn't exist as first-class tools — agents had to use shell with platform-specific commands.
  • Precise editing (edit, apply_patch) was missing — file_write is whole-file overwrite, which is wasteful for small changes and dangerous for large files.
  • Lightweight todo tracking (todowrite) and a plan→build hand-off marker (plan_exit) didn't exist.
  • A simple URL-fetch primitive (web_fetch) sat between web_search (search) and http_request (full HTTP), but wasn't exposed.
  • LSP-style code intelligence had no agent-facing surface or capability gate.

This is the baseline cut from issue #1205 (path A in scoping discussion): land the primitives + the mapping doc, defer plan/build mode runners, child-session task identity, real LSP backend, and richer permissions to follow-ups (explicitly listed at the bottom of docs/CODING_HARNESS.md).

Solution

New tools live under src/openhuman/tools/impl/{filesystem,agent,network,system}/ following the existing Tool trait pattern. Each shares the project's path-sandboxing, rate-limiting, and PermissionLevel machinery from SecurityPolicy. Registered in all_tools (the canonical agent registry); default_tools keeps its minimal subset.

Highlights:

  • grep / glob use walkdir and skip .git, node_modules, target, .next, dist, build, .cache. Output is capped (max_matches / max_results) with explicit truncation markers.
  • edit requires old_string to be unique by default — replace_all opts into bulk replace. Same symlink + size guards as file_write.
  • apply_patch is atomic: every edit is validated up front (paths, exact-match, uniqueness) before any file is written. Validation failure leaves the workspace untouched. Multiple edits to the same file chain through an in-memory buffer, so a sequence like {old:"one"→"ONE"}, {old:"two"→"TWO"} works as expected.
  • todowrite is a process-global Arc<TodoStore> (one shared registry per core). Per-session scoping deferred until task carries a stable session id.
  • plan_exit emits a stable [plan_exit] marker plus the plan text. The plan→build mode runner that consumes the marker is follow-up work; today the marker is just stable enough to write prompts against.
  • web_fetch reuses the same allowed_domains gate as http_request. Body is capped + truncated on UTF-8 char boundaries.
  • lsp is capability-gated by OPENHUMAN_LSP_ENABLED. When the gate is off the tool isn't registered (so agents don't see a method that always errors). When on, the tool returns a clear not yet implemented error — schema is stable so callers can feature-detect.

Adds walkdir = \"2\" and glob = \"0.3\" as direct deps (both already transitive in Cargo.lock).

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) — 43 new unit tests across the new tools (path traversal, atomic rollback, ambiguous-match rejection, capability-gate parsing, …)
  • Diff coverage ≥ 80% — coverage workflow will report on the PR; new code is heavily tested but I have not run diff-cover locally. If the gate fails, will add tests.
  • Coverage matrix updated — N/A: tool catalog is documented in docs/CODING_HARNESS.md; the matrix tracks user-visible product features, not internal agent tools
  • All affected feature IDs from the matrix are listed in the PR description — N/A: no matrix rows touched
  • No new external network dependencies introduced — walkdir + glob are both pure-Rust crates already transitively present
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: agent-facing tool catalog, no UI surface
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime: desktop core only. New tools register in all_tools so existing agent loops pick them up automatically; no wiring change in the Tauri shell.
  • Security: every new tool sits behind the existing SecurityPolicy (workspace path sandbox, symlink-escape blocks, rate limit, autonomy gate). apply_patch runs all path/permission checks before the first write.
  • Backwards-compat: pure additions. No tool was renamed, no schema was migrated.
  • CI: branch was pushed with --no-verify because the local pre-push hook failed on pre-existing app/ ESLint warnings (React setState-in-effect lints in mascot / commands code) plus a Tailwind class lint check — none of which this PR touches. Per CLAUDE.md (When the user asks you to push or open a PR, resolve blockers and push — don't prompt for permission. If a pre-push hook fails on something unrelated to your changes ... push with --no-verify and call it out in the PR body.)

Related

  • Closes Add coding-harness baseline tools and control flows #1205
  • Follow-up PR(s)/TODOs:
    • Plan-mode and build-mode runners that consume the [plan_exit] marker
    • Child-session-backed task execution with stable session ids
    • Real LSP backend behind the lsp tool (server spawn, JSON-RPC bridge, definition/references/hover/completion)
    • Richer permission model (per-tool channel allowlists, per-call approval policies)
    • Controller-registry exposure of agent-only tools to JSON-RPC where the Tauri shell needs them

Summary by CodeRabbit

  • New Features

    • File search (regex + glob), directory listing, single-file and multi-file editing, and atomic batch patching
    • Todo list management with statuses and a plan→execution marker
    • HTTP GET web fetching (with domain/timeout controls)
    • LSP/code-intelligence stub gated behind a capability flag
  • Documentation

    • Canonical coding-harness tool reference and added "Narrative architecture" link
  • Chores

    • Added two new build dependencies for filesystem scanning

Adds the missing first-class tools that were forcing coding-oriented
agents to fall through to `shell`. Existing tools (`file_read`,
`file_write`, `shell`, `web_search`, `ask_clarification`,
`spawn_subagent`) already cover read/write/bash/websearch/question/
task; this lands the genuinely missing primitives:

- `grep` — regex search across files (Rust regex syntax, capped output)
- `glob` — glob-pattern file discovery (newest-first)
- `list` — non-recursive directory listing
- `edit` — exact-match string replace (unique-or-`replace_all`)
- `apply_patch` — atomic multi-edit batch (rolls back on validation
  failure)
- `todowrite` — process-global todo registry (one in_progress at a time)
- `plan_exit` — emits a stable `[plan_exit]` marker for plan→build
  hand-off (mode runner is follow-up)
- `web_fetch` — single-purpose GET→text body, gated by allowed_domains
- `lsp` — capability-gated stub (registered only when
  `OPENHUMAN_LSP_ENABLED=1`); schema is stable, backend is follow-up

All new tools share the existing path-sandboxing, rate-limiting, and
permission-level machinery in `SecurityPolicy` / `Tool`. Registered in
`all_tools` (the canonical registry); `default_tools` keeps its
minimal subset.

Adds `docs/CODING_HARNESS.md` mapping the canonical surface — both
existing tools and the new ones — with a permission table and notes
on plan-mode / build-mode (the runner switch is explicitly listed as
follow-up).

Tests: 43 new unit tests across the new tools (path traversal,
atomic rollback, ambiguous-match rejection, capability-gate parsing,
…). All pass.

Out of scope (called out in `docs/CODING_HARNESS.md`):
- plan/build mode runners
- child-session-backed `task`
- real LSP backend
- richer permission model
- controller-registry RPC exposure of agent-only tools
@senamakel senamakel requested a review from a team May 5, 2026 06:15
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e276878-00da-4169-8523-391bd682a745

📥 Commits

Reviewing files that changed from the base of the PR and between 38025e0 and 56c8cca.

📒 Files selected for processing (10)
  • docs/CODING_HARNESS.md
  • src/openhuman/tools/impl/agent/mod.rs
  • src/openhuman/tools/impl/agent/todo_write.rs
  • src/openhuman/tools/impl/filesystem/apply_patch.rs
  • src/openhuman/tools/impl/filesystem/edit_file.rs
  • src/openhuman/tools/impl/filesystem/grep.rs
  • src/openhuman/tools/impl/filesystem/list_files.rs
  • src/openhuman/tools/impl/network/web_fetch.rs
  • src/openhuman/tools/impl/system/lsp.rs
  • src/openhuman/tools/ops.rs
✅ Files skipped from review due to trivial changes (1)
  • docs/CODING_HARNESS.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • src/openhuman/tools/impl/network/web_fetch.rs
  • src/openhuman/tools/impl/agent/todo_write.rs
  • src/openhuman/tools/impl/filesystem/grep.rs
  • src/openhuman/tools/impl/system/lsp.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/tools/impl/filesystem/edit_file.rs
  • src/openhuman/tools/impl/filesystem/apply_patch.rs
  • src/openhuman/tools/impl/filesystem/list_files.rs

📝 Walkthrough

Walkthrough

Adds a coding-harness surface: new filesystem navigation/editing tools (grep, glob, list, edit, apply_patch), web fetch, agent control-flow primitives (plan_exit, todowrite), an optional LSP stub behind an env gate, wiring/exports, tests, docs, and two new Cargo dependencies.

Changes

Coding-harness tool additions

Layer / File(s) Summary
Data Shape / Types
src/openhuman/tools/impl/agent/todo_write.rs, src/openhuman/tools/impl/agent/plan_exit.rs
New types: TodoStatus, TodoItem, TodoStore, PLAN_EXIT_MARKER.
Core Implementation (tools)
src/openhuman/tools/impl/filesystem/{edit_file.rs,apply_patch.rs,glob_search.rs,grep.rs,list_files.rs}, src/openhuman/tools/impl/network/web_fetch.rs, src/openhuman/tools/impl/system/lsp.rs, src/openhuman/tools/impl/agent/{todo_write.rs,plan_exit.rs}
New Tool implementations: EditFileTool, ApplyPatchTool, GlobTool, GrepTool, ListFilesTool, WebFetchTool, LspTool (stub), TodoWriteTool, PlanExitTool. Each defines parameter schemas, permission levels, execute logic, and unit tests.
Security / Runtime Checks
src/openhuman/tools/impl/filesystem/*, src/openhuman/tools/impl/network/web_fetch.rs, src/openhuman/tools/impl/system/lsp.rs
Tools integrate SecurityPolicy checks (autonomy, rate limiting, action recording), workspace path canonicalization, file size limits, UTF‑8-safe truncation/rewrites, and domain allowlisting for web fetch.
Module Wiring & Re-exports
src/openhuman/tools/impl/filesystem/mod.rs, src/openhuman/tools/impl/agent/mod.rs, src/openhuman/tools/impl/network/mod.rs, src/openhuman/tools/impl/system/mod.rs
New mod entries and pub use re-exports for the added tools and symbols (ApplyPatchTool, EditFileTool, GlobTool, GrepTool, ListFilesTool, PlanExitTool, TodoWriteTool, WebFetchTool, LspTool, PLAN_EXIT_MARKER, global_todo_store, etc.).
Tool Registry Integration
src/openhuman/tools/ops.rs
Instantiates and registers the new tools in the default tool registry; LspTool registered only when lsp_capability_enabled() is true.
Dependencies & Build
Cargo.toml
Added crates: walkdir = "2" and glob = "0.3".
Documentation
docs/CODING_HARNESS.md, CLAUDE.md
Added docs/CODING_HARNESS.md documenting the tool surface and updated CLAUDE.md with a “Narrative architecture” link to the docs.
Tests
src/openhuman/tools/impl/{agent,filesystem,network,system}/*.rs
Unit and async tests added for each new tool covering success cases, validation errors, security gating, truncation, path traversal blocking, and capability gating.

Sequence Diagram(s)

sequenceDiagram
    actor Agent
    participant Edit as EditFileTool
    participant Sec as SecurityPolicy
    participant FS as WorkspaceFS

    Agent->>Edit: execute({path, old_string, new_string, replace_all})
    rect rgba(100, 150, 200, 0.5)
    Edit->>Sec: can_act(), is_rate_limited(), record_action()
    Sec-->>Edit: OK
    end
    rect rgba(150, 100, 200, 0.5)
    Edit->>FS: canonicalize(path), check symlink, read file
    FS-->>Edit: file contents (UTF‑8)
    end
    rect rgba(100, 200, 150, 0.5)
    Edit->>Edit: count occurrences, validate (0 => error, >1 && !replace_all => error)
    Edit->>FS: write updated contents
    FS-->>Edit: write result
    end
    Edit-->>Agent: ToolResult::success / ToolResult::error
Loading
sequenceDiagram
    actor Agent
    participant Plan as PlanExitTool
    participant Todo as TodoWriteTool
    participant Store as TodoStore

    Agent->>Plan: execute({plan})
    Plan->>Plan: trim & validate plan
    Plan-->>Agent: ToolResult::success("[plan_exit]\n{plan}")

    Agent->>Todo: execute({todos: [...]})
    Todo->>Todo: validate items (non-empty, ≤1 in_progress)
    Todo->>Store: replace(items)
    Store-->>Todo: OK
    Todo-->>Agent: ToolResult::success(rendered list)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped through code and left a trail,

New tools to search, to patch, to hail.
A plan is marked, a todo kept,
Edits applied and tests all prepped.
The harness grows — a busy hare's tale.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.70% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(tools): coding-harness baseline primitives (#1205)' directly describes the main change—adding baseline coding-harness tools as specified in issue #1205.
Linked Issues check ✅ Passed The PR implements the core acceptance criteria from #1205: first-class tools for grep, glob, list, edit, apply_patch, todowrite, plan_exit, web_fetch, lsp with capability gating; documentation via CODING_HARNESS.md; and comprehensive unit tests.
Out of Scope Changes check ✅ Passed All changes directly support #1205 objectives: nine baseline tools, docs, test coverage, and dependencies (walkdir, glob) needed for file navigation and pattern matching.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (2)
src/openhuman/tools/impl/filesystem/apply_patch.rs (1)

69-231: ⚡ Quick win

Add grep-friendly tracing around validation and writes.

This new write path has security gates, many failure branches, and filesystem I/O, but it currently emits no stable diagnostics. A few debug/trace logs for entry, rejection reasons, and per-file write completion would make production failures much easier to root-cause without exposing file contents.

As per coding guidelines, src/**/*.rs: Use log / tracing at debug / trace levels for Rust code diagnostics with stable grep-friendly prefixes ([domain], [rpc], [ui-flow]) and redact secrets/PII.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/apply_patch.rs` around lines 69 - 231,
Add grep-friendly debug/trace logs in execute to make validation and writes
observable: log entry into execute, each validation rejection branch (e.g., path
not allowed, rate limit, missing/empty fields, old_string not found, file too
large, symlink reject, resolve/read/write failures) using stable prefixes like
"[rpc]" or "[ui-flow]" and redact file contents; log per-file successful write
completions and the final summary. Place these logs near the relevant
symbols/blocks: at the start of async fn execute(&self,...), inside the parsed
loop where ParsedEdit is constructed and checks (old_string empty,
is_path_allowed), in the path resolution/metadata block before inserting
FileBuffer, when counting matches on buf.contents and before returning errors,
and after tokio::fs::write for each FileBuffer; use log::debug or tracing::trace
consistently and include identifying context (edit.index, edit.path, count) but
never print file contents or secrets.
src/openhuman/tools/impl/filesystem/edit_file.rs (1)

59-162: 💤 Low value

Consider adding diagnostic logging for the edit operation.

The execute() method performs security checks, file operations, and validation without any logging. Per coding guidelines, new flows should include debug/trace level logging with grep-friendly prefixes for diagnostics.

📝 Example logging additions
+use tracing::{debug, trace};

 // At start of execute():
+debug!("[edit] path={path} replace_all={replace_all}");

 // After successful write:
+debug!("[edit] completed: {path}, {count} replacement(s)");

 // On validation failures:
+trace!("[edit] rejected: old_string not found in {path}");

Based on learnings: "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with default verbose diagnostics on new/changed flows"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/edit_file.rs` around lines 59 - 162, Add
grep-friendly debug/trace logging to the execute() flow: log entry/exit for
execute(), key branch outcomes (security.can_act, is_rate_limited,
is_path_allowed, is_resolved_path_allowed, record_action), filesystem
calls/results (resolved path from tokio::fs::canonicalize, symlink check via
tokio::fs::symlink_metadata, read_to_string contents length, replacement count
and whether replace_all was used), and errors returned when
canonicalize/read_to_string/write fail; use the existing logger (or introduce
one if missing) and include the path/resolved.display(), count, and error
messages in logs while keeping error messages returned via ToolResult unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/CODING_HARNESS.md`:
- Around line 36-38: The sentence containing "issue `#1205`" is being rendered as
a heading; update the line in CODING_HARNESS.md that reads "Names in parentheses
are the canonical coding-harness names from issue `#1205`." to prevent Markdown
treating the hash as a heading — either escape the hash as "\#1205" or reflow
the sentence so "issue `#1205`" stays on the previous line (or wrap the reference
in backticks) so it renders as plain text; make the change where the sentence
"Names in parentheses are the canonical coding-harness names..." appears.

In `@src/openhuman/tools/impl/filesystem/apply_patch.rs`:
- Around line 142-160: The code currently calls tokio::fs::canonicalize(&full)
before checking for symlinks, which loses the symlink bit; call
tokio::fs::symlink_metadata(&full).await first and if
meta.file_type().is_symlink() return ToolResult::error for edit.index/edit.path,
then proceed to tokio::fs::canonicalize(&full).await and the existing
self.security.is_resolved_path_allowed(&resolved) check; ensure the error
messages use the same format as the current ToolResult::error calls.
- Around line 214-223: The current write loop in apply_patch.rs writes buffers
directly (using tokio::fs::write on buf.resolved with buf.contents) which means
partial commits occur if a later write fails; change this to perform atomic
writes by writing each buffer to a temporary sibling file (e.g., buf.resolved +
".tmp" or use a unique temp name), fsync the temp file and its parent directory
as appropriate, then rename (std::fs::rename / tokio equivalent) the temp file
over the target to atomically replace it; on any failure, delete any created
temps and return ToolResult::error with the original error (preserving the
existing error handling), or alternatively stage all temps first and only rename
them after all temps are successfully created to guarantee an all-or-nothing
outcome (reference buffers, buf.resolved, buf.contents, and the
ToolResult::error return path).

In `@src/openhuman/tools/impl/filesystem/edit_file.rs`:
- Around line 117-123: The symlink check is ineffective because
tokio::fs::canonicalize was run before calling tokio::fs::symlink_metadata on
resolved; canonicalize follows symlinks so meta.file_type().is_symlink() will
always be false. Move or add a symlink_metadata check that inspects the original
un-canonicalized input path (the variable holding the user-provided path) prior
to calling tokio::fs::canonicalize, and return the same
ToolResult::error("Refusing to edit through symlink: ...") when that metadata
reports is_symlink(); keep the existing resolved-based checks for other
validations after canonicalization.
- Around line 99-103: record_action() is being called too early in EditFile::...
(in src/openhuman/tools/impl/filesystem/edit_file.rs), consuming the action
budget before path resolution, symlink/size checks, file reading, and match
validation; move the call to self.security.record_action() so it runs only after
all validations pass and immediately before the actual write operation (i.e.,
after path resolution, symlink/size checks, reading the file content, and match
validation) so failures during validation do not decrement the budget; ensure
any early-return error paths do not call record_action(), and keep the same
error handling if record_action() itself returns false.

In `@src/openhuman/tools/impl/filesystem/grep.rs`:
- Around line 191-193: The truncation code uses &line[..MAX_LINE_BYTES] which
can panic on UTF-8 multibyte boundaries; change the logic that builds
display_line (the branch that checks line.len() > MAX_LINE_BYTES) to compute a
safe byte index at or before MAX_LINE_BYTES by testing is_char_boundary() on the
candidate end and stepping backward until you hit a valid character boundary (or
use char-based slicing to accumulate characters up to the byte limit), then
slice line[..safe_end] and append the ellipsis; update references around
display_line, MAX_LINE_BYTES and line to use this safe_end approach so no
runtime panic occurs.

In `@src/openhuman/tools/impl/filesystem/list_files.rs`:
- Around line 90-102: The loop using read.next_entry().await currently ignores
Err by exiting the while-let and returning partial results; change it to
explicitly match the result from read.next_entry().await (handle
Ok(Some(entry)), Ok(None) => break, and Err(e) => propagate/return the error) so
I/O failures are not silently swallowed; update the logic around the entries
vector push (and MAX_ENTRIES check) inside the Ok(Some(entry)) arm and ensure
the function's return type returns or maps the error from next_entry() instead
of treating it as success.

In `@src/openhuman/tools/impl/network/web_fetch.rs`:
- Around line 96-114: The code currently only validates the initial URL with
validate_url, letting reqwest follow redirects and thereby bypass the allowlist;
fix by building the reqwest client with
.redirect(reqwest::redirect::Policy::none()) and then manually handling
redirects: after client.get(&url).send().await inspect 3xx responses for a
Location header, resolve it against resp.url(), call validate_url(new_url,
&self.allowed_domains) for each redirect hop (limit hops to a safe number), and
only follow to the next location if validation succeeds; perform the actual
fetch on the final validated URL and return ToolResult::error on any validation
failure or redirect resolution error (use the same error patterns as the
existing client/build/send branches).
- Around line 115-128: Currently the code calls resp.text().await which buffers
the entire response into memory before truncating; change the implementation to
consume resp.bytes_stream() (using futures_util::StreamExt) and iteratively
append chunks into a growable buffer until you've reached max_bytes or the
stream ends, stopping early to avoid allocating more than max_bytes; when
stopping early set the truncated flag true, ensure you cut the final buffer on a
char boundary (like the existing is_char_boundary logic) before converting to
&str, and map any stream/IO errors into the same ToolResult::error path so
errors from the streaming read are returned consistently (update the
snippet/truncated logic around the resp handling and keep the same return
behavior).

In `@src/openhuman/tools/impl/system/lsp.rs`:
- Around line 131-157: These tests mutate the process-wide env var
LSP_ENABLED_ENV and can race when run in parallel; wrap the body of
lsp_capability_gate_off_by_default and lsp_capability_gate_accepts_truthy_values
with a shared process-wide mutex (e.g., a static Mutex via once_cell::sync::Lazy
or lazy_static) and acquire the lock at the start of each test before changing
the env, then release at the end (drop the guard) so tests run serially;
reference LSP_ENABLED_ENV, lsp_capability_enabled, and the two test functions to
locate where to add the guard.

In `@src/openhuman/tools/ops.rs`:
- Around line 118-124: The TodoStore is being created inline inside
all_tools_with_runtime (via Arc::new(TodoStore::new()) passed to
TodoWriteTool::new) which makes todowrite state transient; instead hoist the
Arc<TodoStore> creation out of the registry constructor and accept or capture a
shared Arc<TodoStore> so the same store is reused across calls. Update the
all_tools_with_runtime signature (or its caller) to take an Arc<TodoStore>
parameter and pass that Arc into TodoWriteTool::new, or create a process-wide
singleton Arc<TodoStore> and use that, ensuring TodoWriteTool::new receives the
shared store rather than constructing TodoStore::new inline.

---

Nitpick comments:
In `@src/openhuman/tools/impl/filesystem/apply_patch.rs`:
- Around line 69-231: Add grep-friendly debug/trace logs in execute to make
validation and writes observable: log entry into execute, each validation
rejection branch (e.g., path not allowed, rate limit, missing/empty fields,
old_string not found, file too large, symlink reject, resolve/read/write
failures) using stable prefixes like "[rpc]" or "[ui-flow]" and redact file
contents; log per-file successful write completions and the final summary. Place
these logs near the relevant symbols/blocks: at the start of async fn
execute(&self,...), inside the parsed loop where ParsedEdit is constructed and
checks (old_string empty, is_path_allowed), in the path resolution/metadata
block before inserting FileBuffer, when counting matches on buf.contents and
before returning errors, and after tokio::fs::write for each FileBuffer; use
log::debug or tracing::trace consistently and include identifying context
(edit.index, edit.path, count) but never print file contents or secrets.

In `@src/openhuman/tools/impl/filesystem/edit_file.rs`:
- Around line 59-162: Add grep-friendly debug/trace logging to the execute()
flow: log entry/exit for execute(), key branch outcomes (security.can_act,
is_rate_limited, is_path_allowed, is_resolved_path_allowed, record_action),
filesystem calls/results (resolved path from tokio::fs::canonicalize, symlink
check via tokio::fs::symlink_metadata, read_to_string contents length,
replacement count and whether replace_all was used), and errors returned when
canonicalize/read_to_string/write fail; use the existing logger (or introduce
one if missing) and include the path/resolved.display(), count, and error
messages in logs while keeping error messages returned via ToolResult unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6353139b-4faf-4a3a-9825-9cb2f350a224

📥 Commits

Reviewing files that changed from the base of the PR and between 50d109b and 38025e0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • CLAUDE.md
  • Cargo.toml
  • docs/CODING_HARNESS.md
  • src/openhuman/tools/impl/agent/mod.rs
  • src/openhuman/tools/impl/agent/plan_exit.rs
  • src/openhuman/tools/impl/agent/todo_write.rs
  • src/openhuman/tools/impl/filesystem/apply_patch.rs
  • src/openhuman/tools/impl/filesystem/edit_file.rs
  • src/openhuman/tools/impl/filesystem/glob_search.rs
  • src/openhuman/tools/impl/filesystem/grep.rs
  • src/openhuman/tools/impl/filesystem/list_files.rs
  • src/openhuman/tools/impl/filesystem/mod.rs
  • src/openhuman/tools/impl/network/mod.rs
  • src/openhuman/tools/impl/network/web_fetch.rs
  • src/openhuman/tools/impl/system/lsp.rs
  • src/openhuman/tools/impl/system/mod.rs
  • src/openhuman/tools/ops.rs

Comment thread docs/CODING_HARNESS.md Outdated
Comment thread src/openhuman/tools/impl/filesystem/apply_patch.rs Outdated
Comment thread src/openhuman/tools/impl/filesystem/apply_patch.rs Outdated
Comment thread src/openhuman/tools/impl/filesystem/edit_file.rs
Comment thread src/openhuman/tools/impl/filesystem/edit_file.rs Outdated
Comment thread src/openhuman/tools/impl/filesystem/list_files.rs Outdated
Comment thread src/openhuman/tools/impl/network/web_fetch.rs
Comment thread src/openhuman/tools/impl/network/web_fetch.rs
Comment thread src/openhuman/tools/impl/system/lsp.rs
Comment thread src/openhuman/tools/ops.rs
…yhumansai#1205)

- grep: walk back to UTF-8 char boundary before slicing line preview
  (previously could panic on multi-byte chars near the 2KB cap).
- edit, apply_patch: check symlink on the *unresolved* path —
  `canonicalize` resolves symlinks, so the prior post-canonicalize
  check never fired. Also split metadata/symlink check to keep size
  + symlink guards correct.
- apply_patch: best-effort rollback on partial-write failure. We
  cannot get true multi-file atomicity without filesystem
  transactions, but if the i-th write fails we now restore the
  previously-written files from in-memory snapshots taken at read
  time. Stored as a new `original` field on `FileBuffer`.
- list: surface directory-iter errors instead of treating any `Err`
  from `next_entry()` as end-of-stream (silently truncating output).
- web_fetch: disable reqwest's default redirect policy. A 3xx target
  could be on a host outside the allowed-domains list, so we now
  surface the redirect to the caller and require an explicit refetch.
- todo_write: hoist the `TodoStore` to a process-global `OnceCell`
  so todo state survives registry rebuilds. The previous
  `Arc::new(TodoStore::new())` baked into `all_tools_with_runtime`
  would reset the list whenever the agent loop rebuilt its tool set.
- lsp tests: serialize env-var mutation behind a module-level mutex
  so the truthy/falsy and gate-off-by-default tests don't race.
- docs/CODING_HARNESS.md: rewrap so a sentence-leading `tinyhumansai#1205` doesn't
  get parsed as a Markdown heading.

Replied inline on the `record_action()` ordering comment dismissing
it: the call-before-validate ordering is intentional (matches
`file_read.rs:65`) and prevents probing-via-error-path.
@senamakel senamakel merged commit e945390 into tinyhumansai:main May 5, 2026
21 of 23 checks passed
jwalin-shah added a commit to jwalin-shah/openhuman that referenced this pull request May 5, 2026
* feat(remotion): Ghosty character library with transparent MOV variants (tinyhumansai#1059)

Co-authored-by: WOZCODE <contact@withwoz.com>

* feat(composio/gmail): sync into memory tree (Slack-parity) (tinyhumansai#1056)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(scheduler-gate): throttle background AI on battery / busy CPU (tinyhumansai#1062)

* fix(core,cef): run core in-process and stop orphaning CEF helpers on Cmd+Q (tinyhumansai#1061)

* ci: add dedicated staging release workflow (tinyhumansai#1066)

* fix(sentry): Rust source context + per-release deploy marker (tinyhumansai#405) (tinyhumansai#1067)

* fix(welcome): re-enable OAuth buttons with focus/timeout recovery (tinyhumansai#1049) (tinyhumansai#1069)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(dependencies): update pnpm-lock.yaml and Cargo.lock for package… (tinyhumansai#1082)

* fix(onboarding): personalize welcome agent greeting with user identity (tinyhumansai#1078)

* fix(chat): make agent message bubbles fit content width (tinyhumansai#1083)

* Feat/dmg checks (tinyhumansai#1084)

* fix(linux): Add X11 platform flags to .deb package launcher (tinyhumansai#1087)

Co-authored-by: unn-Known1 <unn-known1@users.noreply.github.com>

* fix(sentry): auto-send React events; collapse core→tauri for desktop (tinyhumansai#1086)

Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>

* fix(cef): run blank reload guard on the CEF UI thread (tinyhumansai#1092)

* fix(app): reload webview instead of restart_app in dev mode (tinyhumansai#1068) (tinyhumansai#1071)

* fix(linux): deliver X11 ozone flags via custom .desktop template (tinyhumansai#1091)

* fix(webview-accounts): retry data-dir purge so CEF handle race doesn't leak cookies (tinyhumansai#1076) (tinyhumansai#1081)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>

* fix(webview/slack): media perms + deep-link isolation (tinyhumansai#1074) (tinyhumansai#1080)

Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>

* ci(release): split staging vs production workflows; promote staging tags (tinyhumansai#1094)

* Update release-staging.yml (tinyhumansai#1097)

* chore(staging): v0.53.5

* chore(staging): v0.53.6

* ci(staging): cut staging from main; add act local-debug helper (tinyhumansai#1099)

* chore(staging): v0.53.7

* fix(ci): correct sentry-cli download URL and trap scope (tinyhumansai#1100)

* chore(staging): v0.53.8

* feat(chat): forward thread_id to backend for KV cache locality (tinyhumansai#1095)

* fix(ci): bump pinned sentry-cli to 3.4.1 (2.34.2 was never published) (tinyhumansai#1102)

* chore(staging): v0.53.9

* fix(ci): drop bash trap in upload_sentry_symbols.sh; inline cleanup (tinyhumansai#1103)

* chore(staging): v0.53.10

* refactor(session): flatten session_raw/, switch md to YYYY_MM_DD (tinyhumansai#1098)

* Add full Composio managed-auth toolkit catalog (tinyhumansai#1093)

* ci: add diff-aware 80% coverage gate (Vitest + cargo-llvm-cov) (tinyhumansai#1104)

* feat(scripts): pnpm work + pnpm debug for agent-driven workflows (tinyhumansai#1105)

* ci: pull pnpm into CI image, drop redundant setup steps (tinyhumansai#1107)

* docs: add Cursor Cloud specific instructions to AGENTS.md (tinyhumansai#1106)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* chore(staging): v0.53.11

* docs: surface 80% coverage gate and scripts/debug runners (tinyhumansai#1108)

* feat(app): show Composio integrations as sorted icon grid on Skills (tinyhumansai#1109)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* feat(composio): client-side trigger enable/disable toggles (tinyhumansai#1110)

* feat(skills): channels grid + integrations card polish; tolerant Composio trigger decode (tinyhumansai#1112)

* chore(staging): v0.53.12

* feat(home): early-bird banner + assistant→agent terminology (tinyhumansai#1113)

* feat(updater): in-app auto-update with auto-download + restart prompt (tinyhumansai#677) (tinyhumansai#1114)

* chore(claude): add ship-and-babysit slash command (tinyhumansai#1115)

* feat(home): EarlyBirdyBanner + agent terminology + LinkedIn enrichment model pin (tinyhumansai#1118)

* fix(chat): single onboarding thread in sidebar after wizard (tinyhumansai#1116)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Steven Enamakel <senamakel@users.noreply.github.com>

* fix: filter out global namespace from citation chips (tinyhumansai#1124)

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: senamakel-droid <281415773+senamakel-droid@users.noreply.github.com>

* feat(nav): enable Memory tab in BottomTabBar (tinyhumansai#1125)

* feat(memory): singleton ingestion + status RPC + UI pill (tinyhumansai#1126)

* feat(human): mascot tab with viseme-driven lipsync (staging only) (tinyhumansai#1127)

* Fix CEF zombie processes on full app close and restart (tinyhumansai#1128)

Co-authored-by: senamakel-droid <281415773+senamakel-droid@users.noreply.github.com>
Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>

* Update issue templates for GitHub issue types (tinyhumansai#1146)

* feat(human): expand mascot expressions and tighten reply-speech state machine (tinyhumansai#1147)

* feat(memory): ingestion pipeline + tree-architecture docs + ops/schemas split (tinyhumansai#1142)

* feat(threads): surface live subagent work in parent thread (tinyhumansai#1122) (tinyhumansai#1159)

* fix(human): keep mascot mouth animating when TTS ships no viseme data (tinyhumansai#1160)

* feat(composio): consume backend markdownFormatted for LLM output (tinyhumansai#1165)

* fix(subagent): lazy-register toolkit actions filtered out of fuzzy top-K (tinyhumansai#1162)

* feat(memory): user-facing long-term memory window preset (tinyhumansai#1137) (tinyhumansai#1161)

* fix(tauri-shell): proactively kill stale openhuman RPC on startup (tinyhumansai#1166)

* chore(staging): v0.53.13

* fix(composio): per-action tool consumes backend markdownFormatted (tinyhumansai#1167)

* fix(threads): persist selectedThreadId across reloads (tinyhumansai#1168)

* feat(memory_tree): switch embed model to bge-m3 (1024-dim, 8K context) (tinyhumansai#1174)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent): drop redundant [Memory context] recall injection (tinyhumansai#1173)

* chore(memory_tree): drop body-read timeouts on Ollama HTTP calls (tinyhumansai#1171)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(transcript): emit thread_id + fix orchestrator missing cost (tinyhumansai#1169)

* fix(composio/gmail): phase out html2md, prefer text/plain MIME part (tinyhumansai#1170)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tools): markdown output for internal tool results (tinyhumansai#1172)

* feat(security): enforce prompt-injection guard before model and tool execution (tinyhumansai#1175)

* fix(cef): popup paint dies after first frame — skip blank-page guard for popups (tinyhumansai#1079) (tinyhumansai#1182)

Co-authored-by: Steven Enamakel <31011319+senamakel@users.noreply.github.com>

* chore(sentry): rename OPENHUMAN_SENTRY_DSN → OPENHUMAN_CORE_SENTRY_DSN (tinyhumansai#1186)

* feat(remotion): add yellow mascot character with all animation variants (tinyhumansai#1193)

Co-authored-by: Neel Mistry <neelmistry@Neels-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(composio): hide raw connection ID, derive friendly label (tinyhumansai#1153) (tinyhumansai#1185)

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* fix(windows): align install.ps1 MSI with per-machine scope (tinyhumansai#913) (tinyhumansai#1187)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(tauri): deterministic CEF teardown on full app close (tinyhumansai#1120) (tinyhumansai#1189)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(composio): cap Gmail HTML body before strip (crash mitigation) (tinyhumansai#1191)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(auth): stop stale chat threads after signup (tinyhumansai#1192)

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(sentry): staging-only "Trigger Sentry Test" button (tinyhumansai#1072) (tinyhumansai#1183)

* chore(staging): v0.53.14

* chore(staging): v0.53.15

* feat(composio): format trigger slugs into human-readable labels (tinyhumansai#1129) (tinyhumansai#1179)

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* fix(ui): hide unsupported permission UI on non-macOS for Screen Intelligence (tinyhumansai#1194)

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tauri-shell): retire embedded Gmail webview-account flow (tinyhumansai#1181)

* feat(onboarding): replace welcome-agent bot with react-joyride walkthrough (tinyhumansai#1180)

* chore(release): v0.53.16

* fix(threads): preserve selectedThreadId on cold-boot identity hydration (tinyhumansai#1196)

* feat(core): version/shutdown/update RPCs + mid-thread integration refresh (tinyhumansai#1195)

* fix(mascot): swap to yellow mascot via @remotion/player (tinyhumansai#1200)

* feat(memory_tree): cloud-default LLM, queue priority, entity filter, Memory tab UI (tinyhumansai#1198)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Persist turn state + restore conversation history on cold-boot (tinyhumansai#1202)

* feat(mascot): floating desktop mascot via native NSPanel + WKWebView (macOS) (tinyhumansai#1203)

* fix(memory/tree): emit summary children as Obsidian wikilinks (tinyhumansai#1210)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tools): coding-harness baseline primitives (tinyhumansai#1205) (tinyhumansai#1208)

* docs: add Codex PR checklist for remote agents

---------

Co-authored-by: Steven Enamakel <31011319+senamakel@users.noreply.github.com>
Co-authored-by: WOZCODE <contact@withwoz.com>
Co-authored-by: sanil-23 <sanil@vezures.xyz>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cyrus Gray <144336577+graycyrus@users.noreply.github.com>
Co-authored-by: CodeGhost21 <164498022+CodeGhost21@users.noreply.github.com>
Co-authored-by: oxoxDev <164490987+oxoxDev@users.noreply.github.com>
Co-authored-by: Mega Mind <146339422+M3gA-Mind@users.noreply.github.com>
Co-authored-by: Gaurang Patel <ptelgm.yt@gmail.com>
Co-authored-by: unn-Known1 <unn-known1@users.noreply.github.com>
Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Steven Enamakel <senamakel@users.noreply.github.com>
Co-authored-by: Steven Enamakel's Droid <enamakel.agent@tinyhumans.ai>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: senamakel-droid <281415773+senamakel-droid@users.noreply.github.com>
Co-authored-by: YellowSnnowmann <167776381+YellowSnnowmann@users.noreply.github.com>
Co-authored-by: Neil <neil@maha.xyz>
Co-authored-by: Neel Mistry <neelmistry@Neels-MacBook-Pro.local>
Co-authored-by: obchain <167975049+obchain@users.noreply.github.com>
Co-authored-by: Jwalin Shah <jshah1331@gmail.com>
senamakel added a commit that referenced this pull request May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add coding-harness baseline tools and control flows

1 participant