
fix(windows): retry-with-backoff for transient FS errors on auth-profiles.lock + .openhuman wipe (#9E, #9C, #4Y, #61, #5Q, #9F, #4M)#1641

Merged
senamakel merged 8 commits into tinyhumansai:main from oxoxDev:fix/windows-fs-retry
May 14, 2026

Conversation

@oxoxDev
Contributor

@oxoxDev oxoxDev commented May 13, 2026

Summary

  • New shared helpers retry_with_backoff / retry_with_backoff_async in src/openhuman/util.rs that retry on Windows transient FS error codes (5, 32, 33, 1224).
  • AuthProfilesStore::acquire_lock now retries lock creation 6× with exponential backoff (100ms base).
  • memory::tree::wipe_all_rpc + reset_tree_rpc now retry .openhuman tree removal 6× with 200ms base.
  • Final-failure path still calls report_error — only the transient-retry phase is quieted.

Problem

  • Seven Sentry issues on Windows clients fired from one of two transient-handle races:
    • Failed to create auth profile lock at C:\Users\…\auth-profiles.lock (OPENHUMAN-TAURI-9E ×1, -9C ×1, -4Y ×1, -61 ×1; read-path -5Q ×1) — CEF subprocess / AV scanner holds a handle on the parent dir for a sub-second window when openhuman tries to create the lock file. Windows file locking is mandatory, so create_new fails with ERROR_SHARING_VIOLATION (32) or ERROR_ACCESS_DENIED (5) instead of blocking.
    • Failed to remove C:\Users\…\.openhuman: The process cannot access the file because it is being used by another process. (os error 32) (OPENHUMAN-TAURI-9F ×1, -4M ×1) — same shape on tree-wipe; CEF profile / Ollama binary handle releases within seconds.
  • Hard-failing on the first attempt produced Sentry noise on every cold-start race; the operations succeed deterministically on retry.

Solution

  • src/openhuman/util.rs: new retry_with_backoff (sync, std::thread::sleep) and retry_with_backoff_async (tokio). Both honor a shared is_transient_fs_error classifier that matches Windows os_error codes 5, 32, 33, 1224. On non-Windows, transient classification returns false (POSIX EACCES is not transient — surface immediately).
  • src/openhuman/credentials/profiles.rs: acquire_lock wraps OpenOptions::create_new with retry_with_backoff("create auth profile lock", 6, 100, …). The existing AlreadyExists busy-wait loop is preserved — that's the legitimate "another openhuman PID is signing in" case. Only the unexpected error path retries.
  • src/openhuman/memory/tree/read_rpc.rs: wipe_all_rpc + reset_tree_rpc separate the SQLite truncation (still spawn_blocking) from the filesystem cleanup (now tokio::fs wrapped in retry_with_backoff_async). Avoids blocking the executor while retries are sleeping.
  • Logging: mid-retry tracing::warn! with structured tags (op, attempt, retry_in_ms, error) keeps the diagnostic locally visible; final-failure surfaces unwrapped so report_error still emits the Sentry event when retries are exhausted. First-attempt success is silent.
  • Tests: 5 new unit tests in src/openhuman/util.rs — immediate success, success-after-retries (sync + async), failure-after-all-attempts, non-transient bails immediately. Cross-platform tests use a __TEST_TRANSIENT__ error-message marker so the retry logic exercises on POSIX CI.
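
The retry shape described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in src/openhuman/util.rs: the real helpers work with anyhow errors and log through tracing, while this version uses io::Result and eprintln!. The name retry_with_backoff_if and its classifier parameter are hypothetical, added so the loop can be exercised on any platform.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Windows codes named in this PR: 5 (access denied), 32 (sharing violation),
/// 33 (lock violation), 1224 (user-mapped file). Always false on other targets.
pub fn is_transient_fs_error(err: &io::Error) -> bool {
    cfg!(windows) && matches!(err.raw_os_error(), Some(5 | 32 | 33 | 1224))
}

/// Hypothetical std-only variant: the classifier is passed in as a parameter.
/// The operation always runs at least once, even if `attempts` is 0.
pub fn retry_with_backoff_if<T>(
    op_name: &str,
    attempts: u32,
    base_ms: u64,
    is_transient: impl Fn(&io::Error) -> bool,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt: u32 = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only while attempts remain and the error looks transient.
            Err(e) if attempt + 1 < attempts && is_transient(&e) => {
                // Sleeps base, 2x base, 4x base, ... (unchecked multiply in this sketch).
                let sleep_ms = base_ms * 2u64.pow(attempt);
                eprintln!(
                    "[util] {op_name}: attempt {}/{attempts} failed ({e}); retrying in {sleep_ms}ms",
                    attempt + 1
                );
                thread::sleep(Duration::from_millis(sleep_ms));
                attempt += 1;
            }
            // Non-transient, or out of attempts: surface the error unwrapped.
            Err(e) => return Err(e),
        }
    }
}
```

A call site would pass is_transient_fs_error as the classifier, for example retry_with_backoff_if("create auth profile lock", 6, 100, is_transient_fs_error, || ...). On non-Windows targets that classifier returns false, so the first error is returned without sleeping.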

Submission Checklist

  • Tests added or updated — 5 unit tests in src/openhuman/util.rs covering happy path + failure modes + non-transient short-circuit.
  • N/A: Diff coverage ≥ 80% — touched lines are mostly retry plumbing; helpers exercised by the new unit tests; call sites are integration paths covered by their existing tests (cargo test --lib openhuman::credentials passes 112/0).
  • N/A: Coverage matrix updated — behaviour-only retry layer, no new feature row.
  • N/A: All affected feature IDs from the matrix — see above.
  • No new external network dependencies introduced.
  • N/A: Manual smoke checklist — does not touch release-cut surfaces.
  • Linked issues closed via Closes in ## Related.

Impact

  • Stops ~7 events / 3d on the Windows auth-lock and tree-wipe paths.
  • Eliminates a class of cold-start failures users hit on Windows when AV scanners / CEF subprocesses hold transient handles. The retry loop is bounded (worst-case ~3-12 seconds before surfacing the real failure), so there's no risk of indefinite hang.
  • No POSIX behavior change — is_transient_fs_error returns false on non-Windows and the retry short-circuits.
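
To make the bounded-retry claim concrete, here is a small sketch of the sleep budget for a single helper call. It assumes the schedule described in this PR (sleep base_ms * 2^i between attempts, with no sleep after the final attempt) and counts only time spent sleeping, not time spent inside the operation. total_backoff_ms is a hypothetical name.

```rust
/// Total milliseconds slept if every attempt fails with a transient error.
/// With `attempts` tries there are `attempts - 1` sleeps in between.
fn total_backoff_ms(attempts: u32, base_ms: u64) -> u64 {
    (0..attempts.saturating_sub(1))
        .map(|i| base_ms * 2u64.pow(i))
        .sum()
}
```

Under these assumptions, 6 attempts at a 100 ms base sleep 100 + 200 + 400 + 800 + 1600 = 3100 ms, and 6 attempts at a 200 ms base sleep 6200 ms per call. A handler that retries several directories in sequence could spend that budget once per directory.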

Related

  • Closes OPENHUMAN-TAURI-9E
  • Closes OPENHUMAN-TAURI-9C
  • Closes OPENHUMAN-TAURI-4Y
  • Closes OPENHUMAN-TAURI-61
  • Closes OPENHUMAN-TAURI-5Q
  • Closes OPENHUMAN-TAURI-9F
  • Closes OPENHUMAN-TAURI-4M

AI Authored PR Metadata (required for Codex/Linear PRs)

  • Authored by: Google Jules (cloud agent, session 2289277133504848069)
  • Jules session status flipped to Failed at the publish stage, but the produced patch was complete and clean. Pulled via jules remote pull --session, applied to a fresh branch off upstream/main with a single conflict in src/openhuman/util.rs (resolved by keeping HEAD's existing truncate tests + appending Jules's new retry tests).
  • Cherry-pick + conflict-resolve + 3-commit split done orchestrator-side by Claude.

Summary by CodeRabbit

  • Bug Fixes

    • Improved lock-file acquisition with targeted retries for transient errors.
    • Enhanced wipe/reset workflows to tolerate "not found" and transient filesystem failures.
  • Refactor

    • Moved on-disk cleanup out of blocking tasks into non-blocking async operations to reduce executor blocking and streamline response assembly.
  • Chores / Tests

    • Added robust retry utilities and unit tests for retry success, eventual failure, and non-retryable errors.

@oxoxDev oxoxDev requested a review from a team May 13, 2026 12:55
@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d03ab8ec-4a6a-41fa-b2b6-9a7aba857fb6

📥 Commits

Reviewing files that changed from the base of the PR and between fc16fd5 and fd075db.

📒 Files selected for processing (17)
  • .github/workflows/contributor-rewards.yml
  • CONTRIBUTING.md
  • app/src/utils/__tests__/toolTimelineFormatting.test.ts
  • app/src/utils/toolTimelineFormatting.ts
  • docs/CONTRIBUTOR-REWARDS.md
  • src/openhuman/agent/agents/orchestrator/prompt.md
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/credentials/profiles.rs
  • src/openhuman/memory/tree/read_rpc.rs
  • src/openhuman/tools/impl/agent/dispatch.rs
  • src/openhuman/tools/impl/agent/skill_delegation.rs
  • src/openhuman/tools/orchestrator_tools.rs
  • src/openhuman/util.rs
📝 Walkthrough

Walkthrough

Adds synchronous and asynchronous exponential-backoff retry helpers with transient filesystem error classification; applies them to lock-file creation and moves RPC directory deletions into async context with retries for transient filesystem sharing/locking failures.

Changes

Retry infrastructure and resilient filesystem operations

| Layer / File(s) | Summary |
|---|---|
| Retry utilities and tests (src/openhuman/util.rs) | Adds retry_with_backoff, retry_with_backoff_async, and is_transient_fs_error, with unit tests for immediate success, success after retries, retry exhaustion, non-transient bailout, and invalid-attempts validation. |
| Lock acquisition using retry utility (src/openhuman/credentials/profiles.rs) | AuthProfilesStore::acquire_lock now wraps lock-file creation with retry_with_backoff and inspects the error chain for AlreadyExists; only AlreadyExists follows the previous timeout/sleep loop, other errors are returned with context. |
| RPC handlers refactored for async filesystem operations (src/openhuman/memory/tree/read_rpc.rs) | wipe_all_rpc and reset_tree_rpc move on-disk directory deletions out of spawn_blocking into async context using retry_with_backoff_async + tokio::fs::remove_dir_all. Blocking tasks now return tuples of DB/work counts; response construction, worker wake, and filesystem cleanup happen asynchronously. Missing directories are non-fatal; transient FS errors are retried. |
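
The "missing directories are non-fatal" behaviour can be illustrated with a short std-only sketch. The handlers in this PR use tokio::fs::remove_dir_all inside retry_with_backoff_async; the synchronous std::fs version below and the name remove_dir_if_present are assumptions made only to keep the example self-contained.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes a directory tree, treating "already gone" as success.
/// Returns Ok(true) if something was removed, Ok(false) if the path was missing.
fn remove_dir_if_present(path: &Path) -> io::Result<bool> {
    match fs::remove_dir_all(path) {
        Ok(()) => Ok(true),
        // A wipe of a directory that never existed is a no-op, not a failure.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        // Everything else (including sharing violations) goes back to the
        // caller, where a retry wrapper can decide whether to try again.
        Err(e) => Err(e),
    }
}
```

The boolean result lets a caller build a list of directories that were actually removed, which is one plausible way to populate a field like dirs_removed.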

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

Poem

🐇 I nudge the files, I try, I wait,

backoff counts each gentle state,
async hops clear locks and grime,
retries hum in patient time,
the rabbit smiles — the code feels great.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding retry-with-backoff for transient filesystem errors affecting auth-profiles.lock creation and .openhuman directory operations on Windows. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/util.rs (1)

321-355: ⚡ Quick win

Demote retry-churn logs to debug/trace.

These failures are expected transient noise in the exact path this PR is trying to quiet. Logging every retry at warn! and every recovery at info! will still spam normal logs; keep the final exhausted failure surfaced by callers, but log the per-retry churn here at debug/trace instead. As per coding guidelines, "Use log / tracing at debug / trace level; include stable grep-friendly prefixes ([domain], [rpc]) and correlation fields; never log secrets or full PII".

Also applies to: 383-417

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/util.rs` around lines 321 - 355, The retry churn logs inside
the retry loop should be lowered from warn/info to debug/trace: change the
tracing::warn! that logs each failed attempt (includes fields op = op_name,
attempt, max_attempts, error = %e, retry_in_ms) to tracing::debug! or
tracing::trace!, and change the tracing::info! on successful recovery (which
logs op_name and retries = i) to tracing::debug!/trace! as well; keep the same
structured fields (op = op_name, retries, attempt, max_attempts, error = %e,
retry_in_ms) and the "[util]" grep-friendly prefix, and leave terminal/exhausted
failures returned to callers unchanged (code paths that set last_err/return
Err(e) remain the same).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/util.rs`:
- Around line 307-365: Reject zero attempts before entering the loop: in
retry_with_backoff check if attempts == 0 and immediately return an
anyhow::Error (e.g. anyhow::anyhow!("{} requires attempts > 0", op_name) or
similar) instead of letting the loop skip and hit expect; apply the same
precondition check to the other retry helper in this file (the second retry
function around lines 368-427) so both public utilities return a proper error
for attempts == 0 rather than panicking.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f8b6a5c-af34-4f5d-89af-f3061bb77bc3

📥 Commits

Reviewing files that changed from the base of the PR and between 7beed3a and 5b35659.

📒 Files selected for processing (3)
  • src/openhuman/credentials/profiles.rs
  • src/openhuman/memory/tree/read_rpc.rs
  • src/openhuman/util.rs

Comment thread src/openhuman/util.rs
oxoxDev added a commit to oxoxDev/openhuman that referenced this pull request May 13, 2026
…i#1641 CR)

CodeRabbit major: both retry_with_backoff and retry_with_backoff_async
skipped the loop body when attempts == 0 and panicked at
`last_err.expect("attempts > 0")`. As public utilities, they should
fail gracefully — surface anyhow::ensure! at the entry point so the
caller gets a normal Err.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oxoxDev
Contributor Author

oxoxDev commented May 13, 2026

Addressed in 378e633: both helpers now reject attempts == 0 cleanly via anyhow::ensure! before loop entry (no more expect() panic). Added regression tests test_retry_with_backoff_rejects_zero_attempts and test_retry_with_backoff_async_rejects_zero_attempts — both assert the error message contains "requires attempts > 0" and confirm the closure never runs.
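
For readers who have not seen the failure mode, the guard can be sketched as follows. This is an illustrative std-only version with String errors; the merged helpers use anyhow::ensure!, and retry_checked is a hypothetical name.

```rust
/// Minimal retry loop with the zero-attempts precondition checked up front.
fn retry_checked<T>(
    op_name: &str,
    attempts: u32,
    mut op: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    // Without this check, `attempts == 0` skips the loop entirely and the
    // `expect` below would panic on an empty `last_err`.
    if attempts == 0 {
        return Err(format!("{op_name} requires attempts > 0"));
    }
    let mut last_err = None;
    for _ in 0..attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("attempts > 0 was checked above"))
}
```

With the check in place, a caller that passes 0 gets an ordinary Err and the closure never runs, which is what the two regression tests assert.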

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 13, 2026
Contributor

@graycyrus graycyrus left a comment


PR #1641 Review — fix(windows): retry-with-backoff for transient FS errors

Walkthrough

This PR surgically targets a class of Windows-specific cold-start failures where mandatory file locking by CEF subprocesses, AV scanners, or the Ollama binary causes one-shot filesystem operations to fail with ERROR_SHARING_VIOLATION (32), ERROR_ACCESS_DENIED (5), ERROR_LOCK_VIOLATION (33), or ERROR_USER_MAPPED_FILE (1224). The fix adds two generic retry helpers — a sync version (using std::thread::sleep) and an async version (using tokio::time::sleep) — to src/openhuman/util.rs, then wires them into the two affected code paths: lock-file creation in AuthProfilesStore::acquire_lock and filesystem tree removal in wipe_all_rpc / reset_tree_rpc.

The approach is well-scoped, the test coverage is solid, and the helper design is reusable for future FS-touching code.

Changes

| File | Summary |
|---|---|
| src/openhuman/util.rs | Adds retry_with_backoff (sync), retry_with_backoff_async (tokio), and is_transient_fs_error (Windows OS error codes 5/32/33/1224), plus 7 unit tests |
| src/openhuman/credentials/profiles.rs | acquire_lock wraps OpenOptions::create_new with retry_with_backoff; AlreadyExists detection via anyhow chain inspection |
| src/openhuman/memory/tree/read_rpc.rs | wipe_all_rpc and reset_tree_rpc split SQLite work from FS cleanup, use retry_with_backoff_async + tokio::fs::remove_dir_all |

Actionable comments posted inline (4)

  1. [minor] util.rs — tracing macros duplicate every field in the message string
  2. [minor] read_rpc.rs:1359 — unnecessary .clone() on dirs_removed
  3. [minor] util.rs — missing async non-transient short-circuit test
  4. [minor] read_rpc.rs — doc comment step numbering no longer matches code order

Nitpicks (2)

  • util.rs — Public functions defined after #[cfg(test)] mod tests block. Convention is #[cfg(test)] at the bottom. Not a correctness issue but reads oddly top-to-bottom.
  • util.rs:382: 2u64.pow(i) is safe at current hardcoded params (attempts=6), but a saturating cap (e.g. sleep_ms.min(30_000)) would make the helper unconditionally safe for future callers.

Verified / looks good ✓

  • AlreadyExists chain walk in profiles.rs correctly pierces anyhow wrapping
  • is_transient_fs_error is #[cfg(windows)]-gated — POSIX behavior unchanged
  • cfg!(test) gate for __TEST_TRANSIENT__ is dead code in release builds
  • Sync retry_with_backoff only called from acquire_lock (already in spawn_blocking context)
  • NotFound handling in wipe_all_rpc correctly treated as no-op
  • 7 unit tests cover happy path, retries, exhaustion, non-transient bail, and attempts == 0 guard
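
The chain walk in the first item can be illustrated without anyhow. The sketch below follows std::error::Error::source links and looks for an io::Error of kind AlreadyExists at any depth. The Context wrapper and the function name chain_has_already_exists are hypothetical stand-ins for whatever context layers the real code adds.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical context wrapper, standing in for an error with added context.
#[derive(Debug)]
struct Context {
    msg: &'static str,
    source: io::Error,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walks the source chain and reports whether any link is an
/// io::Error with kind AlreadyExists.
fn chain_has_already_exists(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if io_err.kind() == io::ErrorKind::AlreadyExists {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```

Checking only the outermost error would miss the io::Error once it has been wrapped, which is why the lock path needs to inspect the whole chain before deciding between the busy-wait loop and an immediate failure.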

Comment thread src/openhuman/util.rs Outdated
i + 1,
attempts,
e,
sleep_ms

[minor] Both tracing::warn! and tracing::info! pass structured key-value fields (op, attempt, error, retry_in_ms) and then re-format all those same values into the message string. Structured log consumers (JSON formatter, Sentry, DataDog) will emit each field twice.

// before — fields duplicated in message
tracing::warn!(
    op = op_name,
    attempt = i + 1,
    max_attempts = attempts,
    error = %e,
    retry_in_ms = sleep_ms,
    "[util] {} failed (attempt {}/{}): {}. Retrying in {}ms...",
    op_name, i + 1, attempts, e, sleep_ms
);

// after — message is a static label; readers query structured fields
tracing::warn!(
    op = op_name,
    attempt = i + 1,
    max_attempts = attempts,
    error = %e,
    retry_in_ms = sleep_ms,
    "[util] transient fs error, will retry"
);

Same fix applies to the tracing::info! at the success-after-retry path, and to both copies in retry_with_backoff_async.

Comment thread src/openhuman/memory/tree/read_rpc.rs Outdated

let resp = WipeAllResponse {
rows_deleted,
dirs_removed: dirs_removed.clone(),

[minor] dirs_removed is a locally-owned Vec<String> that's moved into WipeAllResponse and never used again. The .clone() is unnecessary.

// before
dirs_removed: dirs_removed.clone(),

// after
dirs_removed,

Comment thread src/openhuman/util.rs
let mut calls = 0;
let result = retry_with_backoff("zero_sync", 0, 1, || {
calls += 1;
Ok::<i32, anyhow::Error>(42)

[minor] There's test_retry_with_backoff_bail_on_non_transient for the sync helper but no equivalent for the async path. The async helper has the same if !is_transient_fs_error(&e) { return Err(e); } branch and it should be covered.

#[tokio::test]
async fn test_retry_with_backoff_async_bail_on_non_transient() {
    use std::sync::atomic::{AtomicU32, Ordering};
    let calls = AtomicU32::new(0);
    let result = retry_with_backoff_async("test_async_non_transient", 3, 1, || async {
        calls.fetch_add(1, Ordering::SeqCst);
        anyhow::bail!("permanent error");
        #[allow(unreachable_code)]
        Ok::<i32, anyhow::Error>(0)
    })
    .await;
    let err = result.unwrap_err();
    assert_eq!(err.to_string(), "permanent error");
    assert_eq!(calls.load(Ordering::SeqCst), 1);
}

Ok((chunks_requeued, jobs_enqueued))
Ok(total)
})?;


[minor] The inline doc comments still describe the steps as 1 → 2 → 3 (truncate → remove wiki/summaries → re-enqueue chunks), but the refactored code runs them as 1 → 3 → 2 — the wiki/summaries removal now happens after chunk re-enqueue. The comment says "Step 3 — flip every chunk back…" but this block is now step 2 in execution order. Update the step numbering to reflect the actual new order, or future readers will be confused.

google-labs-jules Bot and others added 5 commits May 14, 2026 01:24
Add retry_with_backoff (sync) and retry_with_backoff_async (async) plus
is_transient_fs_error classifier. Used by Windows-only paths where
mandatory file locking trips ERROR_SHARING_VIOLATION (32),
ERROR_ACCESS_DENIED (5), ERROR_LOCK_VIOLATION (33), and
ERROR_USER_MAPPED_FILE (1224) on transient sub-second races between
CEF subprocess, antivirus scanner, and our own profile cleanup.

Sleep base_ms * 2^i between attempts; warn! on retry, info! on
success-after-retry. Final-failure path surfaces the original error
unwrapped so the caller's report_error funnel still sees a real Sentry
event.

Cross-platform: helper compiles on all targets but is_transient_fs_error
only matches Windows-specific os_error codes; POSIX errors fall through
and are surfaced immediately (no retry on UNIX EACCES — those are
genuinely permanent).

Co-Authored-By: oxoxDev <164490987+oxoxDev@users.noreply.github.com>
… sharing violations

Wrap OpenOptions::create_new on `auth-profiles.lock` in
retry_with_backoff (6 attempts, 100ms base ~3.1s total). On Windows,
sharing violations are transient — the holding process (often a stale
core sidecar PID or AV scanner) typically releases within seconds.
Hard-failing on the first attempt produced a flood of OPENHUMAN-TAURI-9E,
-9C, -4Y, -61, and the read-path variant -5Q.

The existing AlreadyExists busy-wait loop is preserved — that's the
intentional contention case (another openhuman PID legitimately holds
the lock). Only the unexpected error path now retries.

Closes OPENHUMAN-TAURI-9E, OPENHUMAN-TAURI-9C, OPENHUMAN-TAURI-4Y,
OPENHUMAN-TAURI-61, OPENHUMAN-TAURI-5Q.

Co-Authored-By: oxoxDev <164490987+oxoxDev@users.noreply.github.com>
Wrap fs::remove_dir_all for memory_tree wipe + reset paths in
retry_with_backoff_async (6 attempts, 200ms base ~12.4s total). The
blocking SQL truncation stays in spawn_blocking; only the async
filesystem cleanup uses the retry helper.

Windows file-busy (os error 32) on user-data tree removal is invariably
caused by a transient handle in a CEF subprocess or AV scan — releases
within seconds in practice. The previous hard-fail produced
OPENHUMAN-TAURI-9F and -4M.

Closes OPENHUMAN-TAURI-9F, OPENHUMAN-TAURI-4M.

Co-Authored-By: oxoxDev <164490987+oxoxDev@users.noreply.github.com>
…i#1641 CR)

CodeRabbit major: both retry_with_backoff and retry_with_backoff_async
skipped the loop body when attempts == 0 and panicked at
`last_err.expect("attempts > 0")`. As public utilities, they should
fail gracefully — surface anyhow::ensure! at the entry point so the
caller gets a normal Err.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#1641)

- util.rs retry helpers: drop duplicate KV-vs-message-string interpolation
  in tracing::warn!/info!; structured fields are the canonical source.
- util.rs: add test_retry_with_backoff_async_bail_on_non_transient covering
  the missing async bail path (sync test already existed).
- memory/tree/read_rpc.rs: drop unnecessary dirs_removed.clone() —
  moved into WipeAllResponse directly.
- memory/tree/read_rpc.rs: update inline step-order doc comments to
  match the refactored 1→3→2 sequence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oxoxDev
Contributor Author

oxoxDev commented May 13, 2026

graycyrus 4 minors addressed in fc16fd5:

  1. util.rs:394 — dropped duplicate KV-vs-message interpolation in both retry helpers; structured fields are the source.
  2. read_rpc.rs:1359 — removed unnecessary dirs_removed.clone().
  3. util.rs:299 — added test_retry_with_backoff_async_bail_on_non_transient covering the missing async bail path.
  4. read_rpc.rs:1457 — step-order doc comments updated to match refactored 1→3→2 sequence (and inline labels renumbered).

cargo test --lib openhuman::util — passes (23 tests now). fmt + check clean.

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 13, 2026
@oxoxDev
Contributor Author

oxoxDev commented May 13, 2026

graycyrus 4 minors addressed in fc16fd5c:

  1. util.rs:394 — dropped duplicate KV-vs-message-string interpolation in both tracing::warn! and tracing::info! macros (sync + async retry helpers). Structured fields are the canonical source; message is now a short literal.
  2. read_rpc.rs:1359 — removed unnecessary dirs_removed.clone(); Vec<String> is moved straight into WipeAllResponse.
  3. util.rs:299 — added test_retry_with_backoff_async_bail_on_non_transient covering the missing async bail path (sync sibling test_retry_with_backoff_bail_on_non_transient already existed).
  4. read_rpc.rs:1457 — inline step-order doc comments updated to match the refactored 1 → 3 → 2 sequence (truncate → re-enqueue chunks → remove wiki/summaries).

cargo test --lib openhuman::util — 23 tests passing (+1 new async-bail). cargo fmt --check clean. No new clippy hits on touched files.

@senamakel senamakel self-assigned this May 14, 2026
…erflow & comment fixes (tinyhumansai#1641)

- credentials/profiles.rs: resolved merge conflict combining PR's
  retry_with_backoff wrapper with main's pid-write guard and
  clear_lock_if_stale stale-PID recovery (Issue tinyhumansai#1612)
- util.rs: replace base_ms * 2u64.pow(i) with
  saturating_mul/saturating_pow capped at 30 000 ms to prevent
  overflow on large attempt counts in both sync and async retry helpers
- memory/tree/read_rpc.rs: clarify wake_workers() comment to reflect
  post-cleanup ordering (prevents reader confusion about race window)
- cargo fmt: reformat saturating chain onto single line
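
A minimal sketch of the capped delay computation described in the util.rs item above. The 30 000 ms cap and the saturating_mul/saturating_pow chain come from this commit message; the function and constant names are hypothetical.

```rust
/// Upper bound on a single sleep, per the commit message.
const MAX_BACKOFF_MS: u64 = 30_000;

/// Delay before retry number `attempt` (0-based): base_ms * 2^attempt,
/// computed with saturating arithmetic and clamped to the cap.
fn backoff_ms(base_ms: u64, attempt: u32) -> u64 {
    base_ms
        .saturating_mul(2u64.saturating_pow(attempt))
        .min(MAX_BACKOFF_MS)
}
```

With the plain base_ms * 2u64.pow(i) form, an exponent of 64 or more overflows u64, which panics in debug builds and wraps in release builds. The saturating form pins the intermediate value at u64::MAX and the final min brings it back to the cap.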
@senamakel
Member

Deferred review notes from pr-manager (post-merge-conflict pass)

These were flagged during the reviewer agent's pass but deferred for human consideration — they are non-blocking nitpicks and one question for the author.


Nitpicks

src/openhuman/util.rs:489: the #[cfg(not(windows))] block with let _ = io_err is noisier than it needs to be. The binding only silences an unused-variable warning that only exists on non-Windows builds. A tighter alternative is to use #[allow(unused_variables)] scoped to that block, or simply pattern-match the io_err where it's actually used on Windows, so the non-Windows arm disappears entirely.

src/openhuman/util.rs:272, 286 — The #[allow(unreachable_code)] Ok::<i32,_>(0) dance in each test closure is needed to give the compiler a type hint after anyhow::bail!, but it reads awkward. Consider annotating the closure's return type explicitly (|| -> anyhow::Result<i32> { ... }) so the Ok::<i32,_>(0) placeholder and the allow attribute can be dropped.

src/openhuman/memory/tree/read_rpc.rs:1325: format!("remove dir {}", dir) allocates a fresh String on every retry iteration (up to 6× per dir, 6 dirs). Since dir is &'static str, a lazy-format or a concatenation with a static prefix would avoid the allocation on the success path. Simplest fix: pass dir directly as the op_name (accepting that the log says "raw" rather than "remove dir raw"), or concatenate once before the retry_with_backoff_async call.

src/openhuman/util.rs:361 — Public items (retry_with_backoff, retry_with_backoff_async) appear below the #[cfg(test)] module in source order. Convention is public API at the top of the module, test module at the bottom. Consider reordering for readability.


Question for author

src/openhuman/util.rs: is_transient_fs_error test-bypass scope

is_transient_fs_error is pub and contains a cfg!(test) branch that treats any error whose message contains __TEST_TRANSIENT__ as transient. cfg!(test) is true for the Rust test harness (cargo test) but also for any integration-test binary or fuzz harness built with --cfg test. If an external integration harness (e.g. tests/json_rpc_e2e.rs compiled as a test binary) ever passes error strings through this function, the bypass would be active.

Should the hook be narrowed with #[cfg(test)] at the item level rather than the inline cfg!(test) call? That would restrict it to crate-internal test builds only and prevent the bypass from leaking into integration-test contexts that import openhuman_core as a dependency.
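
For reference, an item-level gate could look like the sketch below. Both cfg!(test) and #[cfg(test)] key off the same test flag, which is set only while the crate itself is being compiled as a test target; the practical difference is that the attribute removes the hook from every other build instead of leaving a branch that evaluates to false. The function names and the (msg, os_code) signature are hypothetical, chosen to keep the example std-only.

```rust
/// Test builds only: messages carrying the marker count as transient.
#[cfg(test)]
fn is_test_transient(msg: &str) -> bool {
    msg.contains("__TEST_TRANSIENT__")
}

/// Every other build: the hook is absent and the answer is always false.
#[cfg(not(test))]
fn is_test_transient(_msg: &str) -> bool {
    false
}

/// Hypothetical classifier shape combining the test hook with the
/// Windows error codes listed in this PR.
fn is_transient(msg: &str, os_code: Option<i32>) -> bool {
    is_test_transient(msg)
        || (cfg!(windows) && matches!(os_code, Some(5 | 32 | 33 | 1224)))
}
```

Keeping the marker check in its own gated function also makes it easy to grep for, and the non-test stub gives the optimizer nothing to keep.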

@senamakel senamakel merged commit 5e6073b into tinyhumansai:main May 14, 2026
21 checks passed

5 participants