fix(security): self-repair locked .secret_key on Windows (OPENHUMAN-TAURI-GN)#2061

Merged
graycyrus merged 8 commits into tinyhumansai:main from YellowSnnowmann:fix/sentry-OPENHUMAN-TAURI-GN
May 18, 2026

@YellowSnnowmann YellowSnnowmann commented May 18, 2026

Summary

  • Use USERDOMAIN\USERNAME for icacls grant so domain-joined and AAD/Entra-joined Windows accounts resolve correctly (bare USERNAME could lock out users on corporate machines)
  • Add icacls /reset fallback when the grant command fails, so inherited %APPDATA% ACLs are restored instead of leaving the file with no ACEs
  • Add self-repair path in load_or_create_key: on PermissionDenied, automatically run icacls /reset and retry before surfacing the error — heals existing locked files on next launch with no data loss
  • Add rust-core-tests-windows CI job (windows-latest) in test-reusable.yml so all #[cfg(windows)] secrets tests gate every PR
  • Fix locked_key_file_fails_gracefully_on_unix: the /tmp → /private/tmp symlink on macOS caused a cache key mismatch, turning an expected cache hit into a disk read against a 0o000 file; the test now explicitly clears the cache and asserts Err

Problem

Issue link: https://tinyhumans.sentry.io/issues/7482766968/events/6dd382b1a42c48069b3afac7fd88baad/
Sentry issue OPENHUMAN-TAURI-GN: "Failed to read secret key file" fired on every openhuman.app_state_snapshot poll (~every 2 s) on Windows machines.

Two compounding root causes:

  1. AV-scanner transient lock (fixed in PR #1509, "fix(secrets): cache decoded key + retry transient reads (OPENHUMAN-TAURI-58)", and PR #1748, "fix(security): surface Windows ACL repair hint when .secret_key is unreadable"; both already in main): Windows Defender holds a short-lived exclusive handle on newly created files. Without retry and caching, every poll failed permanently.

  2. icacls ACL corruption (this PR): the key-creation path ran icacls .secret_key /inheritance:r /grant:r USERNAME:F. On domain-joined machines USERNAME=alice but the account is CORP\alice; when icacls couldn't resolve the name it exited non-zero — but /inheritance:r had already stripped the inherited %APPDATA% ACEs. The file was left with no ACEs and no inheritance, permanently unreadable on every subsequent launch.

Users already on an affected version have a locked file on disk. The cache+retry fix from #1509 doesn't help them because the very first read of the existing file fails before the cache can be populated.
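The ordering problem described above, and one way to guard against it, can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `harden_args`, `reset_args`, `run_icacls`, and `harden_or_reset` are hypothetical. Only the `icacls` switches (`/inheritance:r`, `/grant:r`, `/reset`) are taken from the description.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Arguments for the hardening call: strip inherited ACEs and grant full
/// control to `account` only. (Hypothetical helper, for illustration.)
fn harden_args(key_path: &Path, account: &str) -> Vec<OsString> {
    vec![
        key_path.as_os_str().to_os_string(),
        OsString::from("/inheritance:r"),
        OsString::from("/grant:r"),
        OsString::from(format!("{account}:F")),
    ]
}

/// Arguments for the fallback: replace the file's ACL with the default
/// inherited ACEs of its parent directory.
fn reset_args(key_path: &Path) -> Vec<OsString> {
    vec![key_path.as_os_str().to_os_string(), OsString::from("/reset")]
}

/// Run `icacls` and report whether it exited successfully. A spawn failure
/// (for example, `icacls` not on PATH) is treated as a failed run.
fn run_icacls(args: &[OsString]) -> bool {
    Command::new("icacls")
        .args(args)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Harden the key file; if that fails, restore inheritance so the file is
/// not left without any ACEs. Returns `true` only when the explicit
/// hardening succeeded.
fn harden_or_reset(key_path: &Path, account: &str) -> bool {
    let hardened = run_icacls(&harden_args(key_path, account));
    if !hardened {
        // Best effort: the result is ignored because there is nothing
        // further to fall back to in this sketch.
        let _ = run_icacls(&reset_args(key_path));
    }
    hardened
}
```

Keeping the argument builders separate from the process call is a design choice of this sketch: it lets the exact switches be unit-tested on any platform without invoking `icacls`.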

Solution

  • Qualify the username: qualify_windows_username(username, userdomain, computername) returns DOMAIN\user when USERDOMAIN ≠ COMPUTERNAME (domain-joined), plain user for local accounts. Extracted into a testable function.
  • Grant-failure safety net: if icacls /grant exits non-zero, immediately run icacls /reset to restore inheritance so the file is always readable, even if the explicit hardening failed.
  • Self-repair on read: load_or_create_key detects PermissionDenied on an existing file, runs icacls /reset, and retries once. Existing users are silently healed — no manual intervention, no data loss.
  • Windows CI: new rust-core-tests-windows job runs cargo test -p openhuman -- security::secrets on windows-latest with --nocapture so the full #[cfg(windows)] test suite gates every PR going forward.
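The username qualification rule in the first bullet can be sketched as below. The function name and parameter order follow the description, but the parameter types, the case-insensitive comparison, and the treatment of an empty domain are assumptions made for this illustration and may differ from the merged code.

```rust
/// Build the account name handed to `icacls /grant:r`.
///
/// Returns `DOMAIN\user` when the domain differs from the computer name
/// (domain-joined machine) and the bare user name otherwise.
fn qualify_windows_username(username: &str, userdomain: &str, computername: &str) -> String {
    let domain = userdomain.trim();
    // Assumption: an empty domain, or one equal to the machine name
    // (compared case-insensitively), indicates a local account.
    if domain.is_empty() || domain.eq_ignore_ascii_case(computername.trim()) {
        username.to_string()
    } else {
        format!("{domain}\\{username}")
    }
}
```

Under these assumptions, `qualify_windows_username("alice", "CORP", "LAPTOP-01")` yields `CORP\alice`, while passing `"LAPTOP-01"` as both domain and computer name yields plain `alice`. Taking the three values as parameters, rather than reading environment variables inside the function, is what makes the rule testable without touching the process environment.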

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — 40+ unit tests cover all new branches; self_repair_recovers_from_locked_key_file is the E2E Windows path
  • N/A: Coverage matrix updated — no new feature rows; this is a bug fix to an existing security module
  • N/A: All affected feature IDs from the matrix are listed — bug fix only
  • No new external network dependencies introduced
  • N/A: Manual smoke checklist updated — no release-cut surface changed
  • N/A: Linked issue closed via Closes #NNN — tracked in Sentry, not GitHub Issues

Impact

  • Windows only for the ACL fix and self-repair path (#[cfg(windows)]).
  • All platforms for the Unix test fix (macOS /tmp symlink cache-key bug).
  • No behaviour change for users whose key file is already readable.
  • Users with a locked .secret_key from a prior bad icacls run are automatically repaired on first launch of the fixed binary — secrets remain intact.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

  • Branch: fix/sentry-OPENHUMAN-TAURI-GN
  • Commit SHA: dce21c949d5f3704cde0a1c8142c6ad80ffb87a3

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: cargo test -p openhuman -- security::secrets — 40 tests, all pass
  • Rust fmt/check (if changed): cargo fmt --check + cargo check — clean
  • N/A: Tauri fmt/check (if changed): no Tauri shell files modified

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: Windows users with a previously-locked .secret_key file are automatically repaired on first launch; domain-joined users no longer get locked out during key creation
  • User-visible effect: "Failed to read secret key file" error stops appearing; app functions normally after upgrade

Parity Contract

  • Legacy behavior preserved: all existing enc2: / enc: / plaintext decrypt paths unchanged; cache and retry logic unchanged
  • Guard/fallback/dispatch parity checks: self-repair only triggers on PermissionDenied; non-permission errors (corrupt file, wrong length) still surface immediately without triggering repair
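The guard described in the parity checks (repair only on PermissionDenied, one retry, everything else surfaced unchanged) can be sketched as a small generic function. The name `read_with_self_repair` and the closure parameters are hypothetical stand-ins for the real read and repair helpers. Returning the original error when the repair reports failure is also an assumption of this sketch.

```rust
use std::io::{self, ErrorKind};

/// Read the key bytes; on a permission error, attempt one repair and one
/// retry. Any other error is returned immediately without a repair attempt.
///
/// `read` stands in for the key-file read and `repair` for the ACL repair,
/// which is expected to return `true` when the file should be readable again.
fn read_with_self_repair<R, F>(mut read: R, repair: F) -> io::Result<Vec<u8>>
where
    R: FnMut() -> io::Result<Vec<u8>>,
    F: FnOnce() -> bool,
{
    match read() {
        Err(e) if e.kind() == ErrorKind::PermissionDenied => {
            if repair() {
                // Exactly one retry after a successful repair.
                read()
            } else {
                // Repair did not help: surface the original error.
                Err(e)
            }
        }
        // Success, or a non-permission error such as corrupt contents.
        other => other,
    }
}
```

Because `repair` is an `FnOnce`, the type system already rules out a second repair attempt within one call, which matches the "retries once" behaviour described above.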

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Bug Fixes

    • Windows secret-key handling now detects permission/access failures and automatically repairs file ACLs to restore readability.
  • Improvements

    • More robust Windows access-control handling and account qualification for secure key storage, reducing permission-related failures.
  • Testing

    • Added comprehensive Windows and Unix tests for username qualification, permission-error classification, repair behavior, and corruption handling.
  • Chores

    • Added a reusable CI job to run core tests on Windows.


…AURI-GN)

- Use USERDOMAIN\USERNAME for icacls grant so domain/AAD-joined accounts
  resolve correctly; bare USERNAME locked out users on corporate machines
- Add icacls /reset fallback when grant fails to restore inherited ACLs
  instead of leaving the file with no ACEs (permanent lockout)
- Add self-repair path in load_or_create_key: on PermissionDenied, run
  icacls /reset automatically and retry before surfacing the error —
  heals existing locked files on next launch with no data loss
- Add rust-core-tests-windows CI job on windows-latest so all
  #[cfg(windows)] secrets tests run on every PR
- Fix locked_key_file_fails_gracefully_on_unix: cache key mismatch on
  macOS (/tmp symlink → /private/tmp) caused false cache miss; explicitly
  clear cache and assert Err instead of relying on implicit cache hit
@YellowSnnowmann YellowSnnowmann requested a review from a team May 18, 2026 07:46
@coderabbitai

coderabbitai Bot commented May 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds Windows ACL self-repair and classification for secret-key load/create paths, a username qualification helper, comprehensive Windows/Unix tests for repair and error cases, and a reusable Windows CI job to run the security::secrets test suite.

Changes

Windows ACL Self-Repair for Secret Key Management

| Layer / File(s) | Summary |
| --- | --- |
| Windows ACL Self-Repair Helpers (src/openhuman/security/secrets.rs) | is_permission_error classifies permanent access denial vs transient errors; repair_windows_acl runs icacls /reset and best-effort icacls /grant:r for the qualified user and returns whether the file is readable; qualify_windows_username builds an icacls-friendly DOMAIN\user when domain-joined. |
| Windows Key Load and Create with ACL Self-Repair (src/openhuman/security/secrets.rs) | Key-read path inspects first-read failures, logs, runs repair_windows_acl and retries before returning error. Key-creation uses domain-qualified account grants with icacls /inheritance:r /grant:r and falls back to icacls /reset on failure to avoid leaving the file unreadable. |
| Test Coverage for ACL Helpers and Self-Repair (src/openhuman/security/secrets_tests.rs) | Adds unit tests for qualify_windows_username and is_permission_error; Unix test simulates unreadable key file; Windows end-to-end tests corrupt ACLs to exercise repair and verify behavior for corrupted key contents. |
| Windows CI Job for Security Tests (.github/workflows/test-reusable.yml) | New reusable job rust-core-tests-windows runs on windows-latest with Rust cache and sccache, checks out submodules, and executes cargo test -p openhuman -- security::secrets --nocapture when inputs.run_rust_core is true. |

Sequence Diagram

sequenceDiagram
  participant Loader as SecretStore::load_or_create_key
  participant ReadRetry as read_key_file_with_retry
  participant PermCheck as is_permission_error
  participant ACLRepair as repair_windows_acl
  Loader->>ReadRetry: attempt to read key file
  ReadRetry-->>Loader: read error
  Loader->>PermCheck: classify error
  PermCheck-->>Loader: permission denied?
  Loader->>ACLRepair: run icacls /reset + grant
  ACLRepair-->>Loader: readable?
  Loader->>ReadRetry: retry read
  ReadRetry-->>Loader: final result

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • tinyhumansai/openhuman#1748: Modifies the same Windows key-read error path and adds icacls-based ACL repair guidance in error contexts.
  • tinyhumansai/openhuman#1887: Refactors CI into reusable workflows; related at the CI workflow level where this PR adds a Windows job.

"🐇
Keys locked in night, I hop to see,
icacls reset sets secrets free,
Domain names trimmed, permissions spun,
Decrypt returns—our work is done! ✨"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(security): self-repair locked .secret_key on Windows (OPENHUMAN-TAURI-GN)' directly matches the PR's main objective—adding self-repair logic for locked Windows key files—and accurately summarizes the primary change. |
| Linked Issues check | ✅ Passed | The PR implements caching of decoded .secret_key bytes via OnceLock-backed map, adds bounded retry for transient Windows read errors, includes regression test demonstrating cache behavior, and gates via focused tests as specified in issue #1509. |
| Out of Scope Changes check | ✅ Passed | All changes directly address issue #1509 objectives: Windows ACL self-repair, username qualification, caching, and retries; only the macOS test fix (cleared cache assertion) is tangentially related but supports the overall caching implementation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/test-reusable.yml (1)

102-130: 💤 Low value

Consider pinning Rust version for consistency with Linux job.

The Linux job uses a container with Rust 1.93.0, but this Windows job uses whatever Rust version is pre-installed on windows-latest. While this is likely fine for these tests, you could add a rustup step to ensure version parity:

- name: Set up Rust toolchain
  run: rustup default 1.93.0

This is optional since version differences would surface as test failures, and the primary goal is validating Windows-specific behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-reusable.yml around lines 102 - 130, The Windows job
rust-core-tests-windows should pin the Rust toolchain to match Linux; add a step
before running tests to set the Rust toolchain (e.g., using rustup default
1.93.0) so the job uses a known Rust version rather than whatever is
preinstalled on windows-latest; place this step after Checkout code and before
Install sccache / Run Windows-specific secrets tests to ensure cargo test -p
openhuman -- security::secrets --nocapture runs with the pinned toolchain.
src/openhuman/security/secrets.rs (1)

306-311: ⚡ Quick win

Reuse repair_windows_acl helper to eliminate duplication and gain logging.

This inline icacls /reset duplicates the logic in repair_windows_acl (lines 413–442). The helper also logs success/failure, which aids debugging when grant commands fail.

♻️ Suggested refactor
                 if !icacls_ok {
-                    let _ = std::process::Command::new("icacls")
-                        .arg(&self.key_path)
-                        .args(["/reset"])
-                        .output();
+                    repair_windows_acl(&self.key_path);
                 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/security/secrets.rs` around lines 306 - 311, The inline icacls
reset block should be replaced with a call to the existing repair_windows_acl
helper to avoid duplication and get its logging; where the code currently checks
icacls_ok and runs
std::process::Command::new("icacls").arg(&self.key_path).args(["/reset"]).output(),
remove that Command invocation and instead call
repair_windows_acl(&self.key_path) (or the appropriate method receiver if it's
an impl method) so failures/successes are logged consistently and behavior is
centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/security/secrets_tests.rs`:
- Line 670: The test calls the wrong module path for repair_windows_acl; change
the invocation to use the same module-relative path as clear_cached_key by
replacing super::super::repair_windows_acl with super::repair_windows_acl so the
test (in the secrets_tests.rs test module wired from secrets.rs) accesses the
pub(super) function correctly alongside clear_cached_key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e32f7dcf-c9fb-4a07-a352-be76b007455a

📥 Commits

Reviewing files that changed from the base of the PR and between 0f616e4 and dce21c9.

📒 Files selected for processing (3)
  • .github/workflows/test-reusable.yml
  • src/openhuman/security/secrets.rs
  • src/openhuman/security/secrets_tests.rs

Comment thread src/openhuman/security/secrets_tests.rs Outdated
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 18, 2026
…pt cycle

The direct fs::read_to_string assertion after self-repair was intermittently
failing on Windows CI (Windows Server 2025 / windows-latest). After icacls
/reset + /grant:r, Windows Defender / Security Center can briefly re-acquire
the file handle, causing the raw read to hit PermissionDenied even though
the ACL is already fixed.

Replace with a second from-disk decrypt cycle (clear cache → decrypt again),
which goes through read_key_file_with_retry's existing PermissionDenied retry
backoff — the same path production code takes. This is a stronger test of
durability (the ACL fix must survive a full second load-from-disk) and avoids
the transient-lock flake entirely.

@graycyrus graycyrus left a comment


Walkthrough

Solid fix for the Windows ACL lockout bug (OPENHUMAN-TAURI-GN). The approach is well-thought-out: qualify the username for domain-joined machines, add a grant-failure safety net, and introduce a self-repair path for users already affected. The test coverage is excellent — 13+ new tests covering all branches of the new logic, including the tricky elevated-runner edge case. Windows CI job is a welcome addition.

Change Summary

| File | Change type | Description |
| --- | --- | --- |
| .github/workflows/test-reusable.yml | Added job | New rust-core-tests-windows job running secrets tests on windows-latest |
| src/openhuman/security/secrets.rs | Modified | Self-repair path in load_or_create_key, domain-qualified username via qualify_windows_username, grant-failure safety net with icacls /reset fallback, new helpers is_permission_error and repair_windows_acl |
| src/openhuman/security/secrets_tests.rs | Modified | 13 new tests: qualify_windows_username (7), is_permission_error (3), self-repair E2E (2), Unix locked-file rewrite (1) |

Per-file Analysis

secrets.rs

The self-repair path is cleanly integrated into the existing load_or_create_key flow — the #[cfg(windows)] re-binding of read_result is idiomatic and avoids touching any non-Windows code paths. The two-step repair in repair_windows_acl (/reset then /grant:r) handles both production and CI environments.

The grant-failure safety net (inline /reset at line ~319) is intentionally simpler than calling repair_windows_acl — it only needs to restore inheritance, not re-grant. Good separation of concerns.

is_permission_error correctly handles both ErrorKind::PermissionDenied and raw OS error 5 as a belt-and-suspenders check.
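For readers without the diff open, the classification described here might look roughly like the sketch below. It is not the merged implementation: the constant name is taken from the Windows SDK (`ERROR_ACCESS_DENIED`, value 5), and restricting the raw-code check to Windows builds via `cfg!(windows)` is an assumption of this illustration, made because error 5 means something different on other platforms.

```rust
use std::io::{Error, ErrorKind};

/// Windows `ERROR_ACCESS_DENIED`.
const ERROR_ACCESS_DENIED: i32 = 5;

/// Returns `true` for a permanent access denial, as opposed to a transient
/// condition such as a sharing violation (Windows error 32).
fn is_permission_error(err: &Error) -> bool {
    // Primary check: the portable error kind.
    err.kind() == ErrorKind::PermissionDenied
        // Belt-and-suspenders: the raw OS code, checked only on Windows
        // builds (an assumption of this sketch).
        || (cfg!(windows) && err.raw_os_error() == Some(ERROR_ACCESS_DENIED))
}
```

On current Rust toolchains, Windows error 5 already maps to `ErrorKind::PermissionDenied`, so the second check is redundant in the common case; it only guards against errors whose kind was lost or remapped along the way.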

secrets_tests.rs

Tests are well-structured with section separators matching the existing style. The self_repair_recovers_from_locked_key_file test gracefully handles elevated runners (SYSTEM/admin bypass DENY ACEs) — nice attention to CI reality. The Unix test rewrite properly handles root runners too.

test-reusable.yml

Windows job is appropriately scoped to security::secrets tests only. Uses the same sccache + rust-cache pattern as the Linux job.

Minor Notes

One minor nit below. Otherwise this is clean — well-designed fix with comprehensive tests.

}

/// Returns `true` when an `std::io::Error` is a permanent permission/access
/// denial rather than a transient sharing violation.

[minor] Duplicate doc comment opener — the first line and the paragraph below say the same thing:

/// Attempt to repair a locked key file by running `icacls /reset` on it.
///
/// Attempt to repair a locked key file.

Suggestion: drop the second sentence and keep the more specific first line, or merge them:

/// Attempt to repair a locked key file by running `icacls /reset` followed
/// by an explicit `/grant:r` for the current user.

@graycyrus graycyrus merged commit 5220345 into tinyhumansai:main May 18, 2026
29 checks passed

2 participants