fix(security): self-repair locked .secret_key on Windows (OPENHUMAN-TAURI-GN)#2061
Conversation
…ation and self-repair functionality
…AURI-GN) - Use USERDOMAIN\USERNAME for icacls grant so domain/AAD-joined accounts resolve correctly; bare USERNAME locked out users on corporate machines - Add icacls /reset fallback when grant fails to restore inherited ACLs instead of leaving the file with no ACEs (permanent lockout) - Add self-repair path in load_or_create_key: on PermissionDenied, run icacls /reset automatically and retry before surfacing the error — heals existing locked files on next launch with no data loss - Add rust-core-tests-windows CI job on windows-latest so all #[cfg(windows)] secrets tests run on every PR - Fix locked_key_file_fails_gracefully_on_unix: cache key mismatch on macOS (/tmp symlink → /private/tmp) caused false cache miss; explicitly clear cache and assert Err instead of relying on implicit cache hit
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Windows ACL self-repair and classification for secret-key load/create paths, a username qualification helper, comprehensive Windows/Unix tests for repair and error cases, and a reusable Windows CI job to run the ChangesWindows ACL Self-Repair for Secret Key Management
Sequence DiagramsequenceDiagram
participant Loader as SecretStore::load_or_create_key
participant ReadRetry as read_key_file_with_retry
participant PermCheck as is_permission_error
participant ACLRepair as repair_windows_acl
Loader->>ReadRetry: attempt to read key file
ReadRetry-->>Loader: read error
Loader->>PermCheck: classify error
PermCheck-->>Loader: permission denied?
Loader->>ACLRepair: run icacls /reset + grant
ACLRepair-->>Loader: readable?
Loader->>ReadRetry: retry read
ReadRetry-->>Loader: final result
🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/test-reusable.yml (1)
102-130: 💤 Low valueConsider pinning Rust version for consistency with Linux job.
The Linux job uses a container with Rust 1.93.0, but this Windows job uses whatever Rust version is pre-installed on
windows-latest. While this is likely fine for these tests, you could add arustupstep to ensure version parity:- name: Set up Rust toolchain run: rustup default 1.93.0This is optional since version differences would surface as test failures, and the primary goal is validating Windows-specific behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test-reusable.yml around lines 102 - 130, The Windows job rust-core-tests-windows should pin the Rust toolchain to match Linux; add a step before running tests to set the Rust toolchain (e.g., using rustup default 1.93.0) so the job uses a known Rust version rather than whatever is preinstalled on windows-latest; place this step after Checkout code and before Install sccache / Run Windows-specific secrets tests to ensure cargo test -p openhuman -- security::secrets --nocapture runs with the pinned toolchain.src/openhuman/security/secrets.rs (1)
306-311: ⚡ Quick winReuse
repair_windows_aclhelper to eliminate duplication and gain logging.This inline
icacls /resetduplicates the logic inrepair_windows_acl(lines 413–442). The helper also logs success/failure, which aids debugging when grant commands fail.♻️ Suggested refactor
if !icacls_ok { - let _ = std::process::Command::new("icacls") - .arg(&self.key_path) - .args(["/reset"]) - .output(); + repair_windows_acl(&self.key_path); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/security/secrets.rs` around lines 306 - 311, The inline icacls reset block should be replaced with a call to the existing repair_windows_acl helper to avoid duplication and get its logging; where the code currently checks icacls_ok and runs std::process::Command::new("icacls").arg(&self.key_path).args(["/reset"]).output(), remove that Command invocation and instead call repair_windows_acl(&self.key_path) (or the appropriate method receiver if it's an impl method) so failures/successes are logged consistently and behavior is centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/security/secrets_tests.rs`:
- Line 670: The test calls the wrong module path for repair_windows_acl; change
the invocation to use the same module-relative path as clear_cached_key by
replacing super::super::repair_windows_acl with super::repair_windows_acl so the
test (in the secrets_tests.rs test module wired from secrets.rs) accesses the
pub(super) function correctly alongside clear_cached_key.
---
Nitpick comments:
In @.github/workflows/test-reusable.yml:
- Around line 102-130: The Windows job rust-core-tests-windows should pin the
Rust toolchain to match Linux; add a step before running tests to set the Rust
toolchain (e.g., using rustup default 1.93.0) so the job uses a known Rust
version rather than whatever is preinstalled on windows-latest; place this step
after Checkout code and before Install sccache / Run Windows-specific secrets
tests to ensure cargo test -p openhuman -- security::secrets --nocapture runs
with the pinned toolchain.
In `@src/openhuman/security/secrets.rs`:
- Around line 306-311: The inline icacls reset block should be replaced with a
call to the existing repair_windows_acl helper to avoid duplication and get its
logging; where the code currently checks icacls_ok and runs
std::process::Command::new("icacls").arg(&self.key_path).args(["/reset"]).output(),
remove that Command invocation and instead call
repair_windows_acl(&self.key_path) (or the appropriate method receiver if it's
an impl method) so failures/successes are logged consistently and behavior is
centralized.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e32f7dcf-c9fb-4a07-a352-be76b007455a
📒 Files selected for processing (3)
.github/workflows/test-reusable.ymlsrc/openhuman/security/secrets.rssrc/openhuman/security/secrets_tests.rs
… (CI temp dir fix)
…pt cycle The direct fs::read_to_string assertion after self-repair was intermittently failing on Windows CI (Windows Server 2025 / windows-latest). After icacls /reset + /grant:r, Windows Defender / Security Center can briefly re-acquire the file handle, causing the raw read to hit PermissionDenied even though the ACL is already fixed. Replace with a second from-disk decrypt cycle (clear cache → decrypt again), which goes through read_key_file_with_retry's existing PermissionDenied retry backoff — the same path production code takes. This is a stronger test of durability (the ACL fix must survive a full second load-from-disk) and avoids the transient-lock flake entirely.
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
Solid fix for the Windows ACL lockout bug (OPENHUMAN-TAURI-GN). The approach is well-thought-out: qualify the username for domain-joined machines, add a grant-failure safety net, and introduce a self-repair path for users already affected. The test coverage is excellent — 13+ new tests covering all branches of the new logic, including the tricky elevated-runner edge case. Windows CI job is a welcome addition.
Change Summary
| File | Change type | Description |
|---|---|---|
.github/workflows/test-reusable.yml |
Added job | New rust-core-tests-windows job running secrets tests on windows-latest |
src/openhuman/security/secrets.rs |
Modified | Self-repair path in load_or_create_key, domain-qualified username via qualify_windows_username, grant-failure safety net with icacls /reset fallback, new helpers is_permission_error and repair_windows_acl |
src/openhuman/security/secrets_tests.rs |
Modified | 13 new tests: qualify_windows_username (7), is_permission_error (3), self-repair E2E (2), Unix locked-file rewrite (1) |
Per-file Analysis
secrets.rs
The self-repair path is cleanly integrated into the existing load_or_create_key flow — the #[cfg(windows)] re-binding of read_result is idiomatic and avoids touching any non-Windows code paths. The two-step repair in repair_windows_acl (/reset then /grant:r) handles both production and CI environments.
The grant-failure safety net (inline /reset at line ~319) is intentionally simpler than calling repair_windows_acl — it only needs to restore inheritance, not re-grant. Good separation of concerns.
is_permission_error correctly handles both ErrorKind::PermissionDenied and raw OS error 5 as a belt-and-suspenders check.
secrets_tests.rs
Tests are well-structured with section separators matching the existing style. The self_repair_recovers_from_locked_key_file test gracefully handles elevated runners (SYSTEM/admin bypass DENY ACEs) — nice attention to CI reality. The Unix test rewrite properly handles root runners too.
test-reusable.yml
Windows job is appropriately scoped to security::secrets tests only. Uses the same sccache + rust-cache pattern as the Linux job.
Minor Notes
One minor nit below. Otherwise this is clean — well-designed fix with comprehensive tests.
| } | ||
|
|
||
| /// Returns `true` when an `std::io::Error` is a permanent permission/access | ||
| /// denial rather than a transient sharing violation. |
There was a problem hiding this comment.
[minor] Duplicate doc comment opener — the first line and the paragraph below say the same thing:
/// Attempt to repair a locked key file by running `icacls /reset` on it.
///
/// Attempt to repair a locked key file.
Suggestion: drop the second sentence and keep the more specific first line, or merge them:
/// Attempt to repair a locked key file by running `icacls /reset` followed
/// by an explicit `/grant:r` for the current user.
Summary
USERDOMAIN\USERNAMEforicaclsgrant so domain-joined and AAD/Entra-joined Windows accounts resolve correctly (bareUSERNAMEcould lock out users on corporate machines)icacls /resetfallback when the grant command fails, so inherited%APPDATA%ACLs are restored instead of leaving the file with no ACEsload_or_create_key: onPermissionDenied, automatically runicacls /resetand retry before surfacing the error — heals existing locked files on next launch with no data lossrust-core-tests-windowsCI job (windows-latest) intest-reusable.ymlso all#[cfg(windows)]secrets tests gate every PRlocked_key_file_fails_gracefully_on_unix:/tmp→/private/tmpsymlink on macOS caused a cache key mismatch, turning a cache hit into a disk read against a0o000file; now explicitly clears cache and assertsErrProblem
Issue link: https://tinyhumans.sentry.io/issues/7482766968/events/6dd382b1a42c48069b3afac7fd88baad/
Sentry issue OPENHUMAN-TAURI-GN —
"Failed to read secret key file"fired on everyopenhuman.app_state_snapshotpoll (~every 2 s) on Windows machines.Two compounding root causes:
AV-scanner transient lock (fixed in PRs fix(secrets): cache decoded key + retry transient reads (OPENHUMAN-TAURI-58) #1509 + fix(security): surface Windows ACL repair hint when .secret_key is unreadable #1748, already in
main): Windows Defender holds a short-lived exclusive handle on newly-created files. Without retry/cache every poll failed permanently.icacls ACL corruption (this PR): the key-creation path ran
icacls .secret_key /inheritance:r /grant:r USERNAME:F. On domain-joined machinesUSERNAME=alicebut the account isCORP\alice; when icacls couldn't resolve the name it exited non-zero — but/inheritance:rhad already stripped the inherited%APPDATA%ACEs. The file was left with no ACEs and no inheritance, permanently unreadable on every subsequent launch.Users already on an affected version have a locked file on disk. The cache+retry fix from #1509 doesn't help them because the very first read of the existing file fails before the cache can be populated.
Solution
qualify_windows_username(username, userdomain, computername)returnsDOMAIN\userwhenUSERDOMAIN ≠ COMPUTERNAME(domain-joined), plainuserfor local accounts. Extracted into a testable function.icacls /grantexits non-zero, immediately runicacls /resetto restore inheritance so the file is always readable, even if the explicit hardening failed.load_or_create_keydetectsPermissionDeniedon an existing file, runsicacls /reset, and retries once. Existing users are silently healed — no manual intervention, no data loss.rust-core-tests-windowsjob runscargo test -p openhuman -- security::secretsonwindows-latestwith--nocaptureso the full#[cfg(windows)]test suite gates every PR going forward.Submission Checklist
self_repair_recovers_from_locked_key_fileis the E2E Windows pathCloses #NNN— tracked in Sentry, not GitHub IssuesImpact
#[cfg(windows)])./tmpsymlink cache-key bug)..secret_keyfrom a prior badicaclsrun are automatically repaired on first launch of the fixed binary — secrets remain intact.Related
https://tinyhumans.sentry.io/issues/7482766968/)AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/sentry-OPENHUMAN-TAURI-GNdce21c949d5f3704cde0a1c8142c6ad80ffb87a3Validation Run
pnpm --filter openhuman-app format:checkpnpm typecheckcargo test -p openhuman -- security::secrets— 40 tests, all passcargo fmt --check+cargo check— cleanValidation Blocked
Behavior Changes
.secret_keyfile are automatically repaired on first launch; domain-joined users no longer get locked out during key creation"Failed to read secret key file"error stops appearing; app functions normally after upgradeParity Contract
enc2:/enc:/ plaintext decrypt paths unchanged; cache and retry logic unchangedPermissionDenied; non-permission errors (corrupt file, wrong length) still surface immediately without triggering repairDuplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Improvements
Testing
Chores