fix(credentials): diagnose + recover from H8 auth-profile-lock create failures #2180
…enhance error context in auth profile lock creation
…tion failure in read-only state directory
Walkthrough
Adds richer `io::ErrorKind`/`raw_os_error` diagnostics when lock-file creation fails and treats Windows `ERROR_DELETE_PENDING` (303) as a transient filesystem error; includes unit tests for the annotation helper and the new transient classification.
…e lock creation into a separate function for improved testability and clarity
…creation failures, ensuring platform independence and clarity
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/credentials/profiles_tests.rs`:
- Around line 454-468: The test constructs a platform-specific raw OS error code
incorrectly by hardcoding 13; update the test in profiles_tests.rs so io_err is
set with cfg attributes (e.g., #[cfg(windows)] use 5 / ERROR_ACCESS_DENIED and
#[cfg(not(windows))] use 13 / EACCES) before calling
annotate_lock_create_failure, and change the subsequent os_code assertion to
expect the platform-specific value (use the same cfg to set the expected_os_code
variable or two separate assertions) so the test validates the real OS error
code on Windows and non-Windows platforms while keeping the checks for the
top-level message and ErrorKind.
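
A minimal sketch of the platform-specific setup this comment asks for, assuming the test only needs a raw OS code that maps to `PermissionDenied` on each platform. The constant and helper names (`EXPECTED_ACCESS_DENIED_CODE`, `access_denied_error`) are hypothetical and do not come from the PR.

```rust
use std::io;

// Hypothetical constant: the "access denied" raw OS code for the build target.
#[cfg(windows)]
const EXPECTED_ACCESS_DENIED_CODE: i32 = 5; // ERROR_ACCESS_DENIED
#[cfg(not(windows))]
const EXPECTED_ACCESS_DENIED_CODE: i32 = 13; // EACCES

/// Hypothetical helper: builds the io::Error a test would feed into the
/// annotation helper, using the code that is real on the current platform.
fn access_denied_error() -> io::Error {
    io::Error::from_raw_os_error(EXPECTED_ACCESS_DENIED_CODE)
}
```

A test can then compare `raw_os_error()` against `EXPECTED_ACCESS_DENIED_CODE` rather than a hardcoded 13, so the same assertion holds on Windows and non-Windows builds.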
📒 Files selected for processing (3)
- src/openhuman/credentials/profiles.rs
- src/openhuman/credentials/profiles_tests.rs
- src/openhuman/util.rs
…reation failure tests for better accuracy and consistency
Summary
- Include `io::ErrorKind` and `raw_os_error()` in the surfaced context for "Failed to create auth profile lock" so Sentry fingerprints split by root-cause OS code instead of collapsing into one ticket.
- Treat Windows `ERROR_DELETE_PENDING` (OS code 303) as a transient FS error so `retry_with_backoff` rides out the AV / Search-indexer race that left the lock file in delete-pending limbo.

Problem
Sentry OPENHUMAN-TAURI-H8 ("Failed to create auth profile lock") was unresolved with 822 events, priority high, substatus escalating, last seen 2026-05-17. Tags showed:
- os.name: macOS 659 / windows 180
- domain = rpc, operation = invoke_method
- method: openhuman.app_state_snapshot (710), openhuman.billing_get_current_plan (68), openhuman.team_get_usage (57)
- elapsed_ms: 1–4 ms for ~95% of events

Two issues:
- `AuthProfilesStore::acquire_lock` wraps every non-`AlreadyExists` `OpenOptions::create_new` failure with a single static string. Every distinct OS errno collapses into one Sentry fingerprint, so we cannot tell which underlying error is hot; the 659 macOS events are opaque.
- On Windows, `create_new` immediately after the prior owner's `fs::remove_file` can return `ERROR_DELETE_PENDING` (303) when AV / Search-indexer still holds a handle. Until now `is_transient_fs_error` only matched 5/32/33/1224, so 303 returned a `kind = Other` io::Error that `retry_with_backoff` treated as fatal, bailing at attempt 1 (~2 ms, matching the `elapsed_ms = 2` on the latest H8 event for `team_get_usage` on Windows 10.0.26200).

This branch is a companion to OPENHUMAN-TAURI-H1 ("Timed out waiting for auth profile lock"), which was already resolved by stale-PID recovery + retry-with-backoff in #1641 / #1636.
Solution
- `src/openhuman/credentials/profiles.rs`: in the non-transient `Err` branch of the `create_new` match, walk the anyhow chain for an `io::Error`, pull its `kind()` and `raw_os_error()`, and embed both in the `with_context` format string.

The top-level message stays stable ("Failed to create auth profile lock …") so the existing Sentry rule keeps matching, but the trailing `(kind=…, os_code=…)` makes the fingerprint split per OS code. Once this ships, future events will tell us exactly which macOS errno dominates the 659-event cluster, unlocking a one-line follow-up to `is_transient_fs_error` on the macOS branch.
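
The chain walk described here can be sketched with the standard library alone. This is an illustration under assumptions, not the code in the PR: it walks `std::error::Error::source()` instead of the anyhow chain, the names `LockCreateError`, `find_io_error`, and `describe_lock_create_failure` are hypothetical, and the exact message layout after the stable prefix is a guess.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical wrapper standing in for an anyhow context layer.
#[derive(Debug)]
struct LockCreateError {
    path: String,
    source: io::Error,
}

impl fmt::Display for LockCreateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "lock create failed at {}", self.path)
    }
}

impl Error for LockCreateError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walks the source chain and returns the first io::Error found, if any.
fn find_io_error<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a io::Error> {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return Some(io_err);
        }
        current = e.source();
    }
    None
}

/// Keeps the top-level message stable and appends kind/os_code markers.
/// The layout after the prefix is an assumption for illustration only.
fn describe_lock_create_failure(path: &str, err: &(dyn Error + 'static)) -> String {
    match find_io_error(err) {
        Some(io_err) => format!(
            "Failed to create auth profile lock at {path} (kind={:?}, os_code={:?})",
            io_err.kind(),
            io_err.raw_os_error()
        ),
        None => format!(
            "Failed to create auth profile lock at {path} (kind=unknown, os_code=unknown)"
        ),
    }
}
```

Because the prefix never changes, a rule that matches on "Failed to create auth profile lock" keeps matching, while the suffix differs per OS code.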
- `src/openhuman/util.rs`: add 303 to the Windows match in `is_transient_fs_error` alongside 5/32/33/1224, with an inline comment documenting the AV/indexer delete-pending race and the H8 evidence that justifies it. `retry_with_backoff` (6 attempts, 100 ms base, exponential up to 30 s) now absorbs the race for callers of `acquire_lock`.
- `src/openhuman/credentials/profiles_tests.rs`: a `#[cfg(unix)]` test forces a non-`AlreadyExists` `create_new` failure by stripping write permission off the state dir (chmod 0500), then asserts the surfaced error string contains the stable top-level message, `kind=`, and `os_code=`. It doesn't assert on the kind itself (Windows would surface a different one for the same scenario), only that the diagnostic markers are present.
- `src/openhuman/util.rs`: a `#[cfg(windows)]` test builds `io::Error::from_raw_os_error(303)`, wraps it in `anyhow::Error::new`, and asserts `is_transient_fs_error` returns true.

Design notes / tradeoffs
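
To make the interaction between the transient classification and the retry loop concrete, here is a rough sketch. It is not the code in `util.rs`: the names `is_transient_windows_code` and `retry_with_backoff_sketch` are hypothetical, the classifier is deliberately not `#[cfg(windows)]`-gated so it can run anywhere, and the doubling schedule is an assumption about what "exponential" means here.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Hypothetical classifier over Windows raw OS codes:
/// 5 ERROR_ACCESS_DENIED, 32 ERROR_SHARING_VIOLATION, 33 ERROR_LOCK_VIOLATION,
/// 303 ERROR_DELETE_PENDING, 1224 ERROR_USER_MAPPED_FILE.
fn is_transient_windows_code(code: i32) -> bool {
    matches!(code, 5 | 32 | 33 | 303 | 1224)
}

/// Retries `op` while it fails with a transient code, doubling the delay
/// after each failed attempt up to `max_delay`. Any other error, or running
/// out of attempts, returns the last error to the caller.
fn retry_with_backoff_sketch<T>(
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                let transient = err
                    .raw_os_error()
                    .map_or(false, is_transient_windows_code);
                if !transient || attempt >= max_attempts {
                    return Err(err);
                }
                thread::sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

With 303 absent from the classifier, a delete-pending failure takes the `!transient` branch on the first attempt, which is consistent with the ~2 ms bail-out described under Problem. With 303 present, the same failure is retried until the pending delete completes or the attempts run out.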
Submission Checklist
Impact
Related