fix(security): self-repair locked .secret_key on Windows (OPENHUMAN-TAURI-GN) by YellowSnnowmann · Pull Request #2061 · tinyhumansai/openhuman

YellowSnnowmann · 2026-05-18T07:46:16Z

Summary

Use USERDOMAIN\USERNAME for icacls grant so domain-joined and AAD/Entra-joined Windows accounts resolve correctly (bare USERNAME could lock out users on corporate machines)
Add icacls /reset fallback when the grant command fails, so inherited %APPDATA% ACLs are restored instead of leaving the file with no ACEs
Add self-repair path in load_or_create_key: on PermissionDenied, automatically run icacls /reset and retry before surfacing the error — heals existing locked files on next launch with no data loss
Add rust-core-tests-windows CI job (windows-latest) in test-reusable.yml so all #[cfg(windows)] secrets tests gate every PR
Fix locked_key_file_fails_gracefully_on_unix: /tmp→/private/tmp symlink on macOS caused a cache key mismatch, turning a cache hit into a disk read against a 0o000 file; now explicitly clears cache and asserts Err

Problem

Issue link: https://tinyhumans.sentry.io/issues/7482766968/events/6dd382b1a42c48069b3afac7fd88baad/
Sentry issue OPENHUMAN-TAURI-GN — "Failed to read secret key file" fired on every openhuman.app_state_snapshot poll (~every 2 s) on Windows machines.

Two compounding root causes:

1. AV-scanner transient lock (fixed in PRs #1509, "fix(secrets): cache decoded key + retry transient reads (OPENHUMAN-TAURI-58)", and #1748, "fix(security): surface Windows ACL repair hint when .secret_key is unreadable", both already in main): Windows Defender holds a short-lived exclusive handle on newly created files. Without retry and caching, every poll failed permanently.
2. icacls ACL corruption (this PR): the key-creation path ran icacls .secret_key /inheritance:r /grant:r USERNAME:F. On domain-joined machines USERNAME=alice but the account is CORP\alice. When icacls could not resolve the name it exited non-zero, but by then /inheritance:r had already stripped the inherited %APPDATA% ACEs. The file was left with no ACEs and no inheritance, permanently unreadable on every subsequent launch.

Users already on an affected version have a locked file on disk. The cache+retry fix from #1509 doesn't help them because the very first read of the existing file fails before the cache can be populated.

Solution

Qualify the username: qualify_windows_username(username, userdomain, computername) returns DOMAIN\user when USERDOMAIN ≠ COMPUTERNAME (domain-joined), plain user for local accounts. Extracted into a testable function.
Grant-failure safety net: if icacls /grant exits non-zero, immediately run icacls /reset to restore inheritance so the file is always readable, even if the explicit hardening failed.
Self-repair on read: load_or_create_key detects PermissionDenied on an existing file, runs icacls /reset, and retries once. Existing users are silently healed — no manual intervention, no data loss.
Windows CI: new rust-core-tests-windows job runs cargo test -p openhuman -- security::secrets on windows-latest with --nocapture so the full #[cfg(windows)] test suite gates every PR going forward.
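
The username qualification rule in the first item can be sketched roughly as below. This is a minimal illustration, not the PR's code: the parameter types (`&str` and `Option<&str>`), the case-insensitive comparison, and the fallback to the bare name when either variable is missing are all assumptions.

```rust
/// Hypothetical signature: the PR names the three inputs but not their types.
/// Returns `DOMAIN\user` for domain-joined accounts and the bare user name
/// for local accounts.
fn qualify_windows_username(
    username: &str,
    userdomain: Option<&str>,
    computername: Option<&str>,
) -> String {
    match (userdomain, computername) {
        // Domain-joined: USERDOMAIN is set and differs from the machine name.
        // Comparing case-insensitively is an assumption of this sketch.
        (Some(domain), Some(computer))
            if !domain.is_empty() && !domain.eq_ignore_ascii_case(computer) =>
        {
            format!("{domain}\\{username}")
        }
        // Local account, or not enough information to qualify the name.
        _ => username.to_string(),
    }
}
```

A caller would typically feed it the `USERNAME`, `USERDOMAIN` and `COMPUTERNAME` environment variables, for example via `std::env::var("USERDOMAIN").ok().as_deref()`.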

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — 40+ unit tests cover all new branches; self_repair_recovers_from_locked_key_file is the E2E Windows path
N/A: Coverage matrix updated — no new feature rows; this is a bug fix to an existing security module
N/A: All affected feature IDs from the matrix are listed — bug fix only
No new external network dependencies introduced
N/A: Manual smoke checklist updated — no release-cut surface changed
N/A: Linked issue closed via Closes #NNN — tracked in Sentry, not GitHub Issues

Impact

Windows only for the ACL fix and self-repair path (#[cfg(windows)]).
All platforms for the Unix test fix (macOS /tmp symlink cache-key bug).
No behaviour change for users whose key file is already readable.
Users with a locked .secret_key from a prior bad icacls run are automatically repaired on first launch of the fixed binary — secrets remain intact.

Related

Sentry issue: OPENHUMAN-TAURI-GN (https://tinyhumans.sentry.io/issues/7482766968/)
Prior partial fixes: #1509, "fix(secrets): cache decoded key + retry transient reads (OPENHUMAN-TAURI-58)" (cache + retry); #1748, "fix(security): surface Windows ACL repair hint when .secret_key is unreadable" (Windows ACL hint)
Follow-up PR(s)/TODOs: none

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: OPENHUMAN-TAURI-GN
URL: https://tinyhumans.sentry.io/issues/7482766968/

Commit & Branch

Branch: fix/sentry-OPENHUMAN-TAURI-GN
Commit SHA: dce21c949d5f3704cde0a1c8142c6ad80ffb87a3

Validation Run

pnpm --filter openhuman-app format:check
pnpm typecheck
Focused tests: cargo test -p openhuman -- security::secrets — 40 tests, all pass
Rust fmt/check (if changed): cargo fmt --check + cargo check — clean
N/A: Tauri fmt/check (if changed): no Tauri shell files modified

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: Windows users with a previously-locked .secret_key file are automatically repaired on first launch; domain-joined users no longer get locked out during key creation
User-visible effect: "Failed to read secret key file" error stops appearing; app functions normally after upgrade

Parity Contract

Legacy behavior preserved: all existing enc2: / enc: / plaintext decrypt paths unchanged; cache and retry logic unchanged
Guard/fallback/dispatch parity checks: self-repair only triggers on PermissionDenied; non-permission errors (corrupt file, wrong length) still surface immediately without triggering repair

Duplicate / Superseded PR Handling

Duplicate PR(s): none
Canonical PR: this PR
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

Bug Fixes
- Windows secret-key handling now detects permission/access failures and automatically repairs file ACLs to restore readability.
Improvements
- More robust Windows access-control handling and account qualification for secure key storage, reducing permission-related failures.
Testing
- Added comprehensive Windows and Unix tests for username qualification, permission-error classification, repair behavior, and corruption handling.
Chores
- Added a reusable CI job to run core tests on Windows.

coderabbitai · 2026-05-18T07:46:31Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds Windows ACL self-repair and classification for secret-key load/create paths, a username qualification helper, comprehensive Windows/Unix tests for repair and error cases, and a reusable Windows CI job to run the security::secrets test suite.

Changes

Windows ACL Self-Repair for Secret Key Management

| Layer / File(s) | Summary |
|---|---|
| Windows ACL Self-Repair Helpers (`src/openhuman/security/secrets.rs`) | `is_permission_error` classifies permanent access denial vs transient errors; `repair_windows_acl` runs `icacls /reset` and best-effort `icacls /grant:r` for the qualified user and returns whether the file is readable; `qualify_windows_username` builds an icacls-friendly `DOMAIN\user` when domain-joined. |
| Windows Key Load and Create with ACL Self-Repair (`src/openhuman/security/secrets.rs`) | Key-read path inspects first-read failures, logs, runs `repair_windows_acl` and retries before returning error. Key-creation uses domain-qualified account grants with `icacls /inheritance:r /grant:r` and falls back to `icacls /reset` on failure to avoid leaving the file unreadable. |
| Test Coverage for ACL Helpers and Self-Repair (`src/openhuman/security/secrets_tests.rs`) | Adds unit tests for `qualify_windows_username` and `is_permission_error`; Unix test simulates unreadable key file; Windows end-to-end tests corrupt ACLs to exercise repair and verify behavior for corrupted key contents. |
| Windows CI Job for Security Tests (`.github/workflows/test-reusable.yml`) | New reusable job `rust-core-tests-windows` runs on `windows-latest` with Rust cache and `sccache`, checks out submodules, and executes `cargo test -p openhuman -- security::secrets --nocapture` when `inputs.run_rust_core` is true. |

Sequence Diagram

sequenceDiagram
  participant Loader as SecretStore::load_or_create_key
  participant ReadRetry as read_key_file_with_retry
  participant PermCheck as is_permission_error
  participant ACLRepair as repair_windows_acl
  Loader->>ReadRetry: attempt to read key file
  ReadRetry-->>Loader: read error
  Loader->>PermCheck: classify error
  PermCheck-->>Loader: permission denied?
  Loader->>ACLRepair: run icacls /reset + grant
  ACLRepair-->>Loader: readable?
  Loader->>ReadRetry: retry read
  ReadRetry-->>Loader: final result
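
The read, classify, repair, retry sequence in the diagram could look roughly like the sketch below. `load_with_self_repair` and its closure parameters are hypothetical names used only for illustration; the real `load_or_create_key` is a method on the secret store and its exact control flow is not shown in this PR page. Skipping the retry when the repair reports failure is also an assumption.

```rust
use std::io;

/// Hypothetical helper mirroring the diagram: read, classify, repair, retry once.
/// `read` stands in for the key-file read and `repair` for the ACL repair step,
/// which reports whether the file should now be readable.
fn load_with_self_repair<R, F>(mut read: R, mut repair: F) -> io::Result<Vec<u8>>
where
    R: FnMut() -> io::Result<Vec<u8>>,
    F: FnMut() -> bool,
{
    match read() {
        Ok(bytes) => Ok(bytes),
        // Only a permission failure triggers the repair path.
        Err(err) if err.kind() == io::ErrorKind::PermissionDenied => {
            if repair() {
                // Single retry after a repair that claims success.
                read()
            } else {
                Err(err)
            }
        }
        // Corrupt file, wrong length, missing file: surface immediately.
        Err(err) => Err(err),
    }
}
```

Passing the two steps as closures keeps the flow testable without touching real ACLs, which matters because the actual repair can only be exercised on Windows.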

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

tinyhumansai/openhuman#1748: Modifies the same Windows key-read error path and adds icacls-based ACL repair guidance in error contexts.
tinyhumansai/openhuman#1887: Refactors CI into reusable workflows; related at the CI workflow level where this PR adds a Windows job.

"🐇
Keys locked in night, I hop to see,
icacls reset sets secrets free,
Domain names trimmed, permissions spun,
Decrypt returns—our work is done! ✨"

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix(security): self-repair locked .secret_key on Windows (OPENHUMAN-TAURI-GN)' directly matches the PR's main objective—adding self-repair logic for locked Windows key files—and accurately summarizes the primary change. |
| Linked Issues check | ✅ Passed | The PR implements caching of decoded .secret_key bytes via OnceLock-backed map, adds bounded retry for transient Windows read errors, includes regression test demonstrating cache behavior, and gates via focused tests as specified in issue `#1509`. |
| Out of Scope Changes check | ✅ Passed | All changes directly address issue `#1509` objectives: Windows ACL self-repair, username qualification, caching, and retries; only the macOS test fix (cleared cache assertion) is tangentially related but supports the overall caching implementation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.github/workflows/test-reusable.yml (1)

102-130: 💤 Low value

Consider pinning Rust version for consistency with Linux job.

The Linux job uses a container with Rust 1.93.0, but this Windows job uses whatever Rust version is pre-installed on windows-latest. While this is likely fine for these tests, you could add a rustup step to ensure version parity:
- name: Set up Rust toolchain
  run: rustup default 1.93.0
This is optional since version differences would surface as test failures, and the primary goal is validating Windows-specific behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-reusable.yml around lines 102 - 130, The Windows job
rust-core-tests-windows should pin the Rust toolchain to match Linux; add a step
before running tests to set the Rust toolchain (e.g., using rustup default
1.93.0) so the job uses a known Rust version rather than whatever is
preinstalled on windows-latest; place this step after Checkout code and before
Install sccache / Run Windows-specific secrets tests to ensure cargo test -p
openhuman -- security::secrets --nocapture runs with the pinned toolchain.

src/openhuman/security/secrets.rs (1)

306-311: ⚡ Quick win

Reuse repair_windows_acl helper to eliminate duplication and gain logging.

This inline icacls /reset duplicates the logic in repair_windows_acl (lines 413–442). The helper also logs success/failure, which aids debugging when grant commands fail.

♻️ Suggested refactor

                 if !icacls_ok {
-                    let _ = std::process::Command::new("icacls")
-                        .arg(&self.key_path)
-                        .args(["/reset"])
-                        .output();
+                    repair_windows_acl(&self.key_path);
                 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/security/secrets.rs` around lines 306 - 311, The inline icacls
reset block should be replaced with a call to the existing repair_windows_acl
helper to avoid duplication and get its logging; where the code currently checks
icacls_ok and runs
std::process::Command::new("icacls").arg(&self.key_path).args(["/reset"]).output(),
remove that Command invocation and instead call
repair_windows_acl(&self.key_path) (or the appropriate method receiver if it's
an impl method) so failures/successes are logged consistently and behavior is
centralized.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/security/secrets_tests.rs`:
- Line 670: The test calls the wrong module path for repair_windows_acl; change
the invocation to use the same module-relative path as clear_cached_key by
replacing super::super::repair_windows_acl with super::repair_windows_acl so the
test (in the secrets_tests.rs test module wired from secrets.rs) accesses the
pub(super) function correctly alongside clear_cached_key.

📥 Commits

Reviewing files that changed from the base of the PR and between 0f616e4 and dce21c9.

📒 Files selected for processing (3)

.github/workflows/test-reusable.yml
src/openhuman/security/secrets.rs
src/openhuman/security/secrets_tests.rs

…pt cycle

The direct fs::read_to_string assertion after self-repair was intermittently failing on Windows CI (Windows Server 2025 / windows-latest). After icacls /reset + /grant:r, Windows Defender / Security Center can briefly re-acquire the file handle, causing the raw read to hit PermissionDenied even though the ACL is already fixed. Replace with a second from-disk decrypt cycle (clear cache → decrypt again), which goes through read_key_file_with_retry's existing PermissionDenied retry backoff, the same path production code takes. This is a stronger test of durability (the ACL fix must survive a full second load-from-disk) and avoids the transient-lock flake entirely.
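
For readers unfamiliar with the retry path this commit message relies on, a bounded retry with backoff might be sketched as follows. This is not the body of `read_key_file_with_retry`; the function name, the attempt count, the doubling delay, and the choice to retry only on `PermissionDenied` are assumptions made for illustration.

```rust
use std::{io, thread, time::Duration};

/// Hypothetical bounded retry: re-run `read` while it fails with
/// `PermissionDenied`, doubling the delay between attempts.
fn read_with_retry<R>(mut read: R, attempts: u32, initial_delay: Duration) -> io::Result<Vec<u8>>
where
    R: FnMut() -> io::Result<Vec<u8>>,
{
    let mut delay = initial_delay;
    let mut last_err = None;
    for attempt in 0..attempts {
        match read() {
            Ok(bytes) => return Ok(bytes),
            // A scanner may briefly hold the file; wait and try again.
            Err(err) if err.kind() == io::ErrorKind::PermissionDenied => {
                last_err = Some(err);
                if attempt + 1 < attempts {
                    thread::sleep(delay);
                    delay *= 2;
                }
            }
            // Any other error is not treated as transient.
            Err(err) => return Err(err),
        }
    }
    // Reached only when every attempt failed (or `attempts` was zero).
    Err(last_err.unwrap_or_else(|| {
        io::Error::new(io::ErrorKind::Other, "no read attempts were made")
    }))
}
```

A test that goes through a helper like this tolerates a brief handle re-acquisition, whereas a single raw read does not, which is the flake the commit describes.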

graycyrus

Walkthrough

Solid fix for the Windows ACL lockout bug (OPENHUMAN-TAURI-GN). The approach is well-thought-out: qualify the username for domain-joined machines, add a grant-failure safety net, and introduce a self-repair path for users already affected. The test coverage is excellent — 13+ new tests covering all branches of the new logic, including the tricky elevated-runner edge case. Windows CI job is a welcome addition.

Change Summary

| File | Change type | Description |
|---|---|---|
| `.github/workflows/test-reusable.yml` | Added job | New `rust-core-tests-windows` job running secrets tests on `windows-latest` |
| `src/openhuman/security/secrets.rs` | Modified | Self-repair path in `load_or_create_key`, domain-qualified username via `qualify_windows_username`, grant-failure safety net with `icacls /reset` fallback, new helpers `is_permission_error` and `repair_windows_acl` |
| `src/openhuman/security/secrets_tests.rs` | Modified | 13 new tests: `qualify_windows_username` (7), `is_permission_error` (3), self-repair E2E (2), Unix locked-file rewrite (1) |

Per-file Analysis

`secrets.rs`

The self-repair path is cleanly integrated into the existing load_or_create_key flow — the #[cfg(windows)] re-binding of read_result is idiomatic and avoids touching any non-Windows code paths. The two-step repair in repair_windows_acl (/reset then /grant:r) handles both production and CI environments.

The grant-failure safety net (inline /reset at line ~319) is intentionally simpler than calling repair_windows_acl — it only needs to restore inheritance, not re-grant. Good separation of concerns.
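
A rough sketch of that safety net, assuming the icacls invocation is abstracted behind a runner closure so the fallback logic can be tested off Windows. `grant_args`, `reset_args`, `harden_with_fallback` and `run_icacls` are hypothetical names, not functions from `secrets.rs`; the icacls switches themselves are the ones quoted in the PR description.

```rust
/// Arguments for the hardening step: strip inheritance, then grant the
/// account full control, replacing any earlier explicit grant.
fn grant_args(path: &str, account: &str) -> Vec<String> {
    vec![
        path.to_string(),
        "/inheritance:r".to_string(),
        "/grant:r".to_string(),
        format!("{account}:F"),
    ]
}

/// Arguments for the fallback: replace the ACL with default inherited ACLs.
fn reset_args(path: &str) -> Vec<String> {
    vec![path.to_string(), "/reset".to_string()]
}

/// `run` executes icacls with the given arguments and reports success.
/// Returns whether the explicit hardening succeeded.
fn harden_with_fallback<F>(path: &str, account: &str, mut run: F) -> bool
where
    F: FnMut(&[String]) -> bool,
{
    let granted = run(&grant_args(path, account));
    if !granted {
        // /inheritance:r may already have removed the inherited ACEs,
        // so restore inheritance rather than leave the file unreadable.
        let _ = run(&reset_args(path));
    }
    granted
}

/// A real runner for Windows; any spawn failure counts as "not successful".
fn run_icacls(args: &[String]) -> bool {
    std::process::Command::new("icacls")
        .args(args)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}
```

On Windows the call would be `harden_with_fallback(path, account, run_icacls)`; in tests a recording closure can stand in for the runner.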

is_permission_error correctly handles both ErrorKind::PermissionDenied and raw OS error 5 as a belt-and-suspenders check.
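
The double check described here might look like the following sketch. The signature is an assumption, and gating the raw-code comparison on `cfg!(windows)` is a choice of this sketch (OS error 5 means something else on Unix), not something the PR states.

```rust
use std::io;

/// Windows system error code ERROR_ACCESS_DENIED.
const ERROR_ACCESS_DENIED: i32 = 5;

/// Returns `true` for an access-denied failure, checking both the portable
/// error kind and, on Windows, the raw OS error code.
fn is_permission_error(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::PermissionDenied
        || (cfg!(windows) && err.raw_os_error() == Some(ERROR_ACCESS_DENIED))
}
```

Rust already maps Windows error 5 to `ErrorKind::PermissionDenied`, so the raw-code branch is redundant in the common case, which is what "belt-and-suspenders" refers to.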

`secrets_tests.rs`

Tests are well-structured with section separators matching the existing style. The self_repair_recovers_from_locked_key_file test gracefully handles elevated runners (SYSTEM/admin bypass DENY ACEs) — nice attention to CI reality. The Unix test rewrite properly handles root runners too.

`test-reusable.yml`

Windows job is appropriately scoped to security::secrets tests only. Uses the same sccache + rust-cache pattern as the Linux job.

Minor Notes

One minor nit below. Otherwise this is clean — well-designed fix with comprehensive tests.

graycyrus · 2026-05-18T09:45:47Z

 }

+/// Returns `true` when an `std::io::Error` is a permanent permission/access
+/// denial rather than a transient sharing violation.


[minor] Duplicate doc comment opener — the first line and the paragraph below say the same thing:

/// Attempt to repair a locked key file by running `icacls /reset` on it.
///
/// Attempt to repair a locked key file.

Suggestion: drop the second sentence and keep the more specific first line, or merge them:

/// Attempt to repair a locked key file by running `icacls /reset` followed
/// by an explicit `/grant:r` for the current user.

YellowSnnowmann added 4 commits May 18, 2026 12:55

fix(security): enhance Windows ACL handling for key file access and r…

220ca74

…epair

test(security): add comprehensive tests for Windows username qualific…

e58000f

…ation and self-repair functionality

style: cargo fmt secrets_tests.rs

dce21c9

YellowSnnowmann requested a review from a team May 18, 2026 07:46

coderabbitai Bot requested changes May 18, 2026

Comment thread src/openhuman/security/secrets_tests.rs Outdated

fix(test): use super::repair_windows_acl not super::super (CodeRabbit)

a427561

coderabbitai Bot previously approved these changes May 18, 2026

fix(test): skip locked-file assert when running as root (Linux CI)

2edc151

YellowSnnowmann dismissed coderabbitai[bot]’s stale review via 2edc151 May 18, 2026 08:11

coderabbitai Bot previously approved these changes May 18, 2026

fix(security): repair_windows_acl adds explicit /grant:r after /reset…

40699e6

… (CI temp dir fix)

YellowSnnowmann dismissed coderabbitai[bot]’s stale review via 40699e6 May 18, 2026 08:29

coderabbitai Bot previously approved these changes May 18, 2026

YellowSnnowmann dismissed coderabbitai[bot]’s stale review via bf06206 May 18, 2026 08:57

coderabbitai Bot approved these changes May 18, 2026

graycyrus reviewed May 18, 2026

graycyrus approved these changes May 18, 2026

graycyrus merged commit 5220345 into tinyhumansai:main May 18, 2026
29 checks passed

graycyrus merged 8 commits into tinyhumansai:main from YellowSnnowmann:fix/sentry-OPENHUMAN-TAURI-GN

YellowSnnowmann commented May 18, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 18, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram

Possibly Related PRs

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

graycyrus left a comment

Uh oh!

graycyrus May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

YellowSnnowmann commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Possibly Related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Walkthrough

Change Summary

Per-file Analysis

secrets.rs

secrets_tests.rs

test-reusable.yml

Minor Notes

Uh oh!

graycyrus May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

YellowSnnowmann commented May 18, 2026 •

edited

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading

`secrets.rs`

`secrets_tests.rs`

`test-reusable.yml`