Skip to content

[Sprint 7] aimx doctor extensions + hooks prune --orphans#131

Merged
uzyn merged 4 commits into
mainfrom
sprint-7-doctor-prune
Apr 22, 2026
Merged

[Sprint 7] aimx doctor extensions + hooks prune --orphans#131
uzyn merged 4 commits into
mainfrom
sprint-7-doctor-prune

Conversation

@uzyn
Copy link
Copy Markdown
Owner

@uzyn uzyn commented Apr 22, 2026

Summary

Implements Sprint 7 of the agent-integration track.

  • S7-1check_mailbox_ownership: per-mailbox getpwnam(owner) + inbox/<name>/ + sent/<name>/ existence, uid, gid, and 0700 mode checks. Stale storage dirs surface as ORPHAN-STORAGE; config mailboxes with no storage dirs surface as ORPHAN-CONFIG.
  • S7-2check_templates: every [[hook_template]] is validated — run_as resolves (or is reserved), cmd[0] is an executable file, and the run_as user can access(X_OK) cmd[0]. The access probe uses runuser -u <user> -- test -x <path> first, then falls back to a fork+seteuid probe (root-only). Non-root doctor runs that can't evaluate the probe emit an INFO finding rather than silently skipping.
  • S7-3 — defensive hook-invariant re-check via check_hook_owner_invariant; check_catchall_user fails when a catchall mailbox exists but the aimx-catchall user does not resolve; legacy aimx-hook surfaces as an INFO note; LoadWarnings from Config::load are translated into matching doctor findings.
  • S7-4aimx hooks prune --orphans [--dry-run]: root-only CLI, atomic config rewrite via write_config_atomic + SIGHUP, refuses when doctor reports non-orphan failures, idempotent second pass. Summary line: Removed N templates (...) and M hooks from K mailboxes.

The existing doctor output (format_status) is unchanged; the new findings render in a new Checks section after the existing report, and aimx doctor now exits non-zero when any FAIL-severity finding is present.

Test plan

  • cargo test — 1199 unit tests + 94 integration tests pass (3 ignored: real-user isolation + uds_authz tests).
  • cargo clippy --all-targets -- -D warnings clean.
  • cargo fmt -- --check clean.
  • Verifier crate untouched; existing CI split remains valid.
  • New unit tests cover each S7-1/2/3 check branch (check_mailbox_ownership, check_templates_with_runner with a mock AccessRunner, check_hook_invariants, check_catchall_user, translate_load_warnings, format_checks, ORPHAN_CHECK_IDS).
  • New hooks.rs unit tests cover build_prune_plan, apply_prune_plan (including idempotence), hook_run_as_is_orphan, and format_prune_summary.
  • New integration tests cover the hooks prune --orphans CLI path: hooks_prune_orphans_dry_run_then_apply, hooks_prune_requires_orphans_and_root.

Notes

  • access(X_OK) under a different uid: the runner tries runuser first (the documented primary path), then falls back to fork(2) + setegid(2) + seteuid(2) + access(..., X_OK) when runuser is absent. Non-root hosts without runuser fall through to AccessResult::Unknown, which surfaces as an INFO finding so the operator knows the check was inconclusive.
  • The two pre-existing doctor_* integration tests had to drop their .success() expectations — the Sprint 7 Checks section legitimately flags the test fixture's pre-chowned dirs as drift (uid mismatch vs. aimx-catchall). They still assert the report's stdout content end-to-end.

S7-1: add `check_mailbox_ownership` — per mailbox, verify `getpwnam(owner)`
resolves, the inbox/ and sent/ dirs exist, are chowned `owner:owner`
mode 0700, and surface stale `inbox/<name>/` / `sent/<name>/` dirs
as ORPHAN-STORAGE + config mailboxes with no storage dirs as
ORPHAN-CONFIG. Findings carry a severity and the Checks section
makes `aimx doctor` exit non-zero on any `Fail` finding.

S7-2: add `check_templates` — per `[[hook_template]]`, verify run_as
resolves (or is reserved), cmd[0] is an executable file, and the
run_as user can `access(X_OK)` cmd[0]. The access probe tries
`runuser -u <user> -- test -x <path>` first, then falls back to a
fork+seteuid probe (root-only); non-root doctor runs that can't
evaluate the probe emit an INFO finding so operators know the
check was skipped rather than silently bypassed.

S7-3: add `check_hook_invariants`, `check_catchall_user`, and
`translate_load_warnings`. The hook invariant is re-run defensively
against a hand-edited config.toml; the catchall-user check fails
when a catchall mailbox is configured but `aimx-catchall` does not
resolve; legacy `aimx-hook` is surfaced as an INFO note.

S7-4: add `aimx hooks prune --orphans` (+ `--dry-run`). Root-only
(with AIMX_TEST_SKIP_ROOT_CHECK for tests). Refuses when the
internal doctor pre-flight reports any non-orphan `Fail` finding;
otherwise atomically rewrites `config.toml` via
`write_config_atomic` + sends SIGHUP to the daemon. Idempotent
on a second pass. Summary line:
`Removed N templates (...) and M hooks from K mailboxes`.

Existing doctor integration tests updated to inspect stdout
without asserting on exit status — the Sprint 7 checks now
surface fixture-level ownership drift as FAIL findings by design.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint Goal Assessment

Sprint 7's goal — surfacing every misconfiguration that can break isolation or hook execution via aimx doctor, and cleaning up residue via aimx hooks prune --orphans — is fully met. The four stories (S7-1 through S7-4) are implemented end-to-end with distinct, stable check IDs, deterministic orphan classification, root-gating on prune, a dry-run preview, an atomic config rewrite + SIGHUP, and idempotent repeat runs. 1199 unit + 94 integration tests pass locally; clippy and fmt are clean.

Acceptance Criteria Checklist

S7-1 Mailbox ownership checks

  • check_mailbox_ownership added and wired into doctor::run via run_checks
  • 8 distinct finding IDs — exceeds the "5 checks" ask: MAILBOX-OWNER-ORPHAN, MAILBOX-DIR-MISSING, MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT, MAILBOX-DIR-GROUP-DRIFT, MAILBOX-DIR-MODE-DRIFT, ORPHAN-STORAGE, ORPHAN-CONFIG
  • Severities are sensibly assigned: Fail for dir-missing / not-a-dir / uid-drift; Warn for group + mode drift + orphan-owner + orphan-storage + orphan-config. That matches the PRD intent (uid drift breaks isolation, mode/group drift is recoverable)
  • Findings rendered through term::*_badge helpers
  • Non-zero exit when any Fail is present (via doctor::run returning Err)
  • Unit tests exercise the main branches via set_test_resolver
  • Partial test coverage: MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT (uid mismatch), MAILBOX-DIR-GROUP-DRIFT have no dedicated unit test. MAILBOX-DIR-OWNER-DRIFT is indirectly exercised in integration via the fixture drift, but there is no direct assertion pinning the check ID

S7-2 Template checks

  • check_templates_with_runner added; run_as resolution, cmd[0] exists + executable, access(X_OK) as target user all checked
  • AccessRunner trait + RealAccessRunner seam; runuser first, fork+seteuid fallback when running as root, AccessResult::Unknown → Info finding when neither path evaluates
  • Finding text includes suggested fixes (agent-setup --redetect, hooks prune --orphans)
  • Unit tests cover: valid template, missing run_as, missing cmd[0], denied access, unknown access, root run_as short-circuit
  • Fork path is safe: only async-signal-safe calls (setegid, seteuid, access, _exit) in child; CString allocated before fork; no Rust destructors run between fork and _exit; no FD leaks introduced (no new FDs opened in child). Exit codes are discriminated (0=Allowed, 1=Denied, other=Unknown). doctor::run is invoked from a single-threaded fn main, so the fork is well-defined

S7-3 Hook invariant + catchall + legacy-user notes

  • check_hook_invariants re-runs the Sprint 1 invariant defensively
  • check_catchall_user fails if a catchall mailbox exists but aimx-catchall does not resolve
  • check_legacy_aimx_hook_user emits Info (not Fail) when aimx-hook is still present
  • translate_load_warnings converts every LoadWarning variant to a doctor finding at the right severity (Warn for orphans, Info for legacy / root-catchall)
  • Unit tests cover every branch

S7-4 aimx hooks prune --orphans

  • Subcommand added in src/cli.rs and src/hooks.rs
  • Root-only (with AIMX_TEST_SKIP_ROOT_CHECK escape hatch for tests — consistent with other root-gated commands in the codebase)
  • Refuses when non-orphan Fail findings exist; error message names each offending check by ID + message
  • Atomic rewrite via write_config_atomic (temp-then-rename + fsync)
  • --dry-run prints the planned diff without writing
  • Idempotent on second run (verified by both unit and integration tests)
  • SIGHUP to the running daemon when available; restart hint otherwise
  • Note: the sprint plan said "ConfigHandle::store swap (reuses S5-2 pattern)". The CLI cannot directly swap the daemon's Arc<Config> — it writes + SIGHUPs, and the daemon swaps on reload. This is the correct pattern for CLI→daemon config edits and matches aimx mailboxes create/delete fallback path; treating this as not a deviation

Scrutiny areas explicitly flagged

  • Severity classification for orphans: ORPHAN-STORAGE and ORPHAN-CONFIG are classified Warn, not Fail. This is sensible — an orphan dir or an unprovisioned mailbox entry does not immediately break isolation or correctness, and forcing Fail would make aimx doctor exit non-zero on routine post-userdel state. Because they are Warn, they do NOT block hooks prune --orphans unconditionally (the prune only blocks on non-orphan Fail). Consistent.

  • runuser-first / fork-setuid-fallback: Safe. Runuser exit codes are discriminated carefully (0=Allowed, 1=Denied, any other=Unknown). The fork path is euid-gated (non-root returns Unknown immediately without forking). The child only calls async-signal-safe POSIX functions before _exit. CString::new is outside the fork, so the child does not allocate. No Rust destructors run in the child.

  • Doctor exit code: Correct — any_failErr(...) → non-zero exit via main.rs.

  • Prune root-only + --dry-run + refusal: All correct and directly test-covered.

  • Test fixture regression (doctor_shows_domain_and_mailboxes, doctor_renders_logs_pointer_section): Confirmed real fixture issue, not a regression. setup_test_env declares aimx-catchall as the catchall owner but chowns inbox/catchall/ to the test runner's uid, and never creates sent/catchall/ at all. The new checks correctly flag both (MAILBOX-DIR-OWNER-DRIFT / MAILBOX-DIR-MISSING), so the tests legitimately cannot assert .success() anymore.

Potential Bugs

None blocking. Notes:

  • run_runuser maps "Some(_)" (any non-1 exit code) from runuser to None (try fallback), including runuser-internal codes (2 "permission denied", 126 "no shell", 127 "command not found"). That's by design — runuser couldn't evaluate, so try fork. The behavior is safe but could mask genuine runuser misconfig from the operator. Non-blocker.

Test Coverage

  • Non-blocker: MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT (uid mismatch scenario), MAILBOX-DIR-GROUP-DRIFT have no dedicated unit test. These are straightforward to exercise with a tempdir + a resolver that returns a uid/gid different from the fs-reported ones.
  • Non-blocker: Dropping .success() from doctor_shows_domain_and_mailboxes and doctor_renders_logs_pointer_section weakens the exit-code contract — any future regression that makes doctor fail for an unrelated reason would still pass these tests. A cleaner fix would be either (a) make the fixture clean (create a user the test runner can chown to + sent/catchall/), or (b) assert the specific expected Fail findings rather than asserting nothing about exit status. Not a blocker because the assertions on stdout content still pin the happy-path rendering, but worth noting.
  • Non-blocker: The RealAccessRunner runuser / fork-seteuid subprocess paths are not covered by integration tests. Hard to test hermetically without a real second user, so the trait seam + mock is the right call; flagging for completeness only.

Code Quality

  • FindingSeverity::Pass is declared with #[allow(dead_code)] for future use. The comment justifies it, but it is strictly speaking dead code today. Non-blocker.
  • hook_run_as_is_orphan and check_templates_with_runner both duplicate the name == RESERVED_RUN_AS_ROOT || name == RESERVED_RUN_AS_CATCHALL short-circuit. There's already a local is_reserved_run_as helper in doctor.rs; hooks.rs inlines the check. Non-blocker cosmetic.
  • prune is a long function (~90 lines) that combines root check, doctor pre-flight, plan build, dry-run, write, SIGHUP. Still readable, but splitting the pre-flight into its own helper would make it easier to unit-test independently. Non-blocker.

Alignment with PRD

Fully aligned with the PRD §6 isolation model and §10 cleanup posture. The Warn vs Fail split mirrors the PRD's "operator cleanup is eventually-consistent" stance. The ORPHAN_CHECK_IDS contract is the right seam between doctor and prune.

Summary and Recommended Actions

Overall verdict: Ready to merge.

Blockers: none.

Non-blockers (worth addressing but not gating merge):

  1. Add unit tests for MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT, MAILBOX-DIR-GROUP-DRIFT so every documented finding ID has a direct test.
  2. Either fix the setup_test_env fixture (create sent/catchall/ and chown to a real resolvable user) or replace the dropped .success() assertions with an explicit exit-status assertion that pins the expected Fail findings. The current "no exit-status assertion" weakens the contract.

Nice-to-haves (low priority):

  • Dedupe the reserved-run_as short-circuit between doctor.rs::is_reserved_run_as and hooks.rs::hook_run_as_is_orphan.
  • Consider surfacing runuser internal errors (exit codes >1) as a distinct Info finding rather than silently falling through to the fork path, so operators can debug PAM / permission issues.
  • Split prune into preflight_non_orphan_fails + prune to ease unit-testing the preflight independently.

… split

Addresses non-blocker and nice-to-have feedback on PR #131.

Non-blockers
* Add direct unit tests for MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT,
  and MAILBOX-DIR-GROUP-DRIFT so every documented mailbox-ownership
  finding ID has a targeted test (previously only covered through the
  integration fixture drift).
* Clean up setup_test_env: both mailboxes now own by the test runner,
  all four storage dirs are created at mode 0700, so doctor exits
  success on the happy-path fixture. Restore .success() on
  doctor_shows_domain_and_mailboxes and doctor_renders_logs_pointer_section.

Nice-to-haves
* Factor the reserved-run_as short-circuit into a shared
  is_reserved_run_as() helper in config.rs; callers in doctor.rs and
  hooks.rs use it instead of duplicating the root/aimx-catchall check.
* Surface non-1 runuser exit codes (126, 127, PAM errors, etc.) via a
  tracing::debug! log with the captured stderr before falling through to
  the fork+seteuid probe, so operators can diagnose PAM/shell/permission
  issues without the signal being conflated with "binary not executable".
* Split hooks::prune into prune_preflight_check() + prune so the
  non-orphan-fail refusal logic can be unit-tested independently.
@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented Apr 22, 2026

Review fixups (commit 8df2f24)

Addressed both non-blockers and all three nice-to-haves.

Non-blockers

  1. Dedicated unit tests for every mailbox-ownership finding ID
    Added three targeted unit tests in src/doctor.rs::tests:

    • mailbox_ownership_flags_not_a_dir — stomps a regular file into inbox/alice/ and pins MAILBOX-DIR-NOT-DIR at Fail.
    • mailbox_ownership_flags_owner_uid_drift — installs a resolver that returns a sentinel uid (424242) so the filesystem-vs-config uid mismatch pins MAILBOX-DIR-OWNER-DRIFT at Fail.
    • mailbox_ownership_flags_group_drift — matches uid but returns gid + 10_000 so only MAILBOX-DIR-GROUP-DRIFT fires at Warn (plus an explicit assert!(!OWNER-DRIFT) to prove the two checks don't bleed).
  2. Fixture cleanup (option A)
    Rewrote setup_test_env: both mailboxes now own by current_username(), all four storage dirs (inbox/{catchall,alice}, sent/{catchall,alice}) are created at mode 0700. Restored .success() on doctor_shows_domain_and_mailboxes and doctor_renders_logs_pointer_section.

Nice-to-haves

  1. Deduped reserved-run_as short-circuit
    Promoted is_reserved_run_as() to pub fn in src/config.rs next to RESERVED_RUN_AS_{ROOT,CATCHALL}. doctor.rs and hooks.rs (both build_prune_plan and hook_run_as_is_orphan) now use the shared helper instead of open-coding name == ROOT \|\| name == CATCHALL.

  2. Surface runuser internal errors distinctly
    run_runuser now pipes stderr and emits a tracing::debug! log with run_as, path, exit_code, and captured stderr whenever runuser returns an exit code other than 0 or 1 (or is killed by a signal, or fails to spawn for reasons other than NotFound). Behaviour is unchanged — we still fall through to the fork+seteuid probe — but operators who enable debug logging can now diagnose PAM / missing-shell / permission issues without the signal being conflated with "binary not executable".

  3. Split prune preflight into its own helper
    Extracted prune_preflight_check(&Config) -> Result<(), PruneRefusal> in hooks.rs. prune now does root check → preflight → plan/apply/sighup in sequence; the preflight is independently testable via its return type.

Verification

  • cargo test: 1202 unit + 94 integration tests pass (1199 → 1202, +3 new).
  • cargo clippy --all-targets -- -D warnings: clean.
  • cargo fmt -- --check: clean.
  • Verifier crate untouched.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: Sprint 7 review fixups (commit 8df2f24)

All five items from the prior review are addressed. Verified each independently against the diff and by running the suite locally.

Resolved

  1. Unit tests for MAILBOX-DIR-NOT-DIR, MAILBOX-DIR-OWNER-DRIFT, MAILBOX-DIR-GROUP-DRIFT — three new tests in src/doctor.rs::tests (mailbox_ownership_flags_not_a_dir, mailbox_ownership_flags_owner_uid_drift, mailbox_ownership_flags_group_drift). Each pins the exact finding ID and severity. Owner-drift uses a resolver that returns uid 424242 so the fs-vs-resolver mismatch is unambiguous; group-drift uses gid.wrapping_add(10_000) which stays correct on hosts with high real gids; not-a-dir places a regular file at the inbox path and asserts the Fail finding. All three pass locally (70/70 doctor tests green).
  2. Fixture cleanup + .success() restoredsetup_test_env now chowns both mailboxes to the test runner's username, creates all four storage dirs (inbox/catchall, sent/catchall, inbox/alice, sent/alice) at mode 0700. .success() restored on both doctor_shows_domain_and_mailboxes and doctor_renders_logs_pointer_section. Exit-code contract is back. Both tests pass.
  3. Reserved-run_as dedupeis_reserved_run_as() promoted to pub in config.rs:23 and reused by doctor.rs:1212, hooks.rs:579 (build_prune_plan), and hooks.rs:636 (hook_run_as_is_orphan). Local duplicate in doctor.rs removed. Clean.
  4. Surface runuser internal errors distinctlyrun_runuser now pipes stderr (previously Stdio::null()) and emits tracing::debug! structured events with run_as, path, exit_code, and captured stderr for (a) non-1 exit codes, (b) signal termination, and (c) spawn failures other than NotFound. Exit code 1 ("test -x denied") still short-circuits to AccessResult::Denied, so semantics are unchanged — only observability.
  5. Prune preflight splitprune_preflight_check(config) -> Result<(), PruneRefusal> extracted with its own doc comment; prune now calls it and maps the refusal into a boxed error. Logic is byte-for-byte equivalent to the inlined version.

Still unresolved

None blocking. One minor observation on item 5: the refactor extracts the helper but no unit test was added that exercises prune_preflight_check directly. The stated rationale for the split was "so the non-orphan-fail refusal logic can be unit-tested independently" — the seam is in place, but the follow-up test did not land. Not merge-blocking; end-to-end coverage via hooks_prune_orphans_dry_run_then_apply / hooks_prune_requires_orphans_and_root still holds. Worth a trailing commit if the author wants to close the loop fully.

New issues found

None. The changes are surgical and do not touch any production code paths beyond the reserved-run_as helper consolidation and the stderr piping on runuser. Both are safe.

Verification

  • cargo fmt -- --check clean.
  • cargo clippy --all-targets -- -D warnings clean.
  • cargo test — 1202 unit + 94 integration tests pass (up from 1199, consistent with the three new doctor tests); 3 ignored tests (real-user isolation + uds_authz) remain ignored as expected.

Summary

Ready to merge. All blockers from the prior review were already absent; all four concrete non-blocker/nice-to-have items are fully addressed; the fifth (preflight split) is structurally done but its unit-test payoff is left on the table. Net change is strict quality/observability improvement with no regression risk.

Posted as a comment (not --approve) because GitHub blocks self-approval on this repo; treat as an approval.

Recommended merge commit message

[Sprint 7] aimx doctor extensions + hooks prune --orphans (#131)

S7-1 adds check_mailbox_ownership: per-mailbox getpwnam(owner) check plus
inbox/<name>/ and sent/<name>/ existence, uid, gid, and 0700 mode checks.
Stale storage dirs surface as ORPHAN-STORAGE; config mailboxes missing
storage surface as ORPHAN-CONFIG. Findings carry severities and aimx
doctor now exits non-zero on any Fail.

S7-2 adds check_templates: every [[hook_template]] is validated — run_as
resolves (or is reserved), cmd[0] is an executable file, and run_as can
access(X_OK) cmd[0] via runuser-first / fork+seteuid fallback. Non-root
hosts without runuser surface an INFO finding rather than silently
skipping.

S7-3 adds check_hook_invariants, check_catchall_user, and
translate_load_warnings. Catchall mailboxes without an aimx-catchall
user fail; legacy aimx-hook surfaces as INFO; every LoadWarning variant
is translated into a matching doctor finding.

S7-4 adds aimx hooks prune --orphans [--dry-run]. Root-only, atomic
config rewrite via write_config_atomic + SIGHUP, refuses when doctor
reports non-orphan Fail findings, idempotent on a second pass.

Covers the three branches called out in the PR #131 review:
  * clean config (no non-orphan Fail findings) returns Ok(()).
  * a non-orphan Fail finding (MAILBOX-DIR-MISSING) produces
    PruneRefusal whose message names both the check ID and the
    offending mailbox.
  * orphan-only findings (ORPHAN-TEMPLATE-RUN_AS at Warn severity)
    do not block preflight, matching the intent of the function.
@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented Apr 22, 2026

Added direct unit tests for prune_preflight_check in src/hooks.rs::tests (commit 9173450) covering the three branches called out in the review:

  • prune_preflight_check_passes_on_clean_config — config with properly-provisioned mailbox storage and no orphans returns Ok(()).
  • prune_preflight_check_refuses_on_non_orphan_fail — missing mailbox storage dir triggers MAILBOX-DIR-MISSING (Fail, not in ORPHAN_CHECK_IDS); asserts PruneRefusal.message names both the check ID and the offending mailbox.
  • prune_preflight_check_ignores_orphan_only_findings — orphan-only findings (here ORPHAN-TEMPLATE-RUN_AS) do not block the prune, which matches the intent of the function (do not refuse on the very findings hooks prune --orphans exists to clean up). The test sanity-checks that the orphan finding is actually emitted before asserting preflight passes, so it cannot pass vacuously.

Verified locally:

  • cargo test — all 1205 unit + 94 integration tests pass (ignored counts unchanged).
  • cargo clippy --all-targets -- -D warnings — clean.
  • cargo fmt -- --check — clean.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review #2: prune_preflight_check unit tests (commit 9173450)

The previously-flagged nice-to-have from the prior re-review — "no direct unit test for prune_preflight_check" — is cleanly resolved. Verified the new commit against the diff, the function under test, and the full test suite.

Resolved

  • Direct prune_preflight_check coverage — three new unit tests in src/hooks.rs::tests, each asserts a distinct branch through the helper:
    1. prune_preflight_check_passes_on_clean_config — properly-provisioned mailbox (owner resolves via the mock, inbox/alice/ + sent/alice/ created at mode 0700), empty template list. run_checks returns no Fail findings → preflight returns Ok(()).
    2. prune_preflight_check_refuses_on_non_orphan_fail — deliberately omits create_mailbox_dirs, so check_mailbox_ownership emits MAILBOX-DIR-MISSING at Fail. The check ID is not in ORPHAN_CHECK_IDS, so preflight returns Err(PruneRefusal) with a message that names both the check ID and the offending mailbox (alice) — both pinned via explicit assert! contains checks.
    3. prune_preflight_check_ignores_orphan_only_findings — adds a HookTemplate { run_as: "bob", .. } while the mock resolver does not know "bob". The test then calls run_checks directly and asserts ORPHAN-TEMPLATE-RUN_AS is actually in the findings before asserting preflight passes. That sanity-check guards against the test passing vacuously if the finding ID is ever renamed or the check is dropped.

The three tests use the shared set_test_resolver / resolver_without_bob harness already used throughout hooks.rs::tests, so the seam is consistent with the rest of the file.

Still unresolved

None.

New issues found

None. The diff is +116 / -1 in src/hooks.rs and is entirely confined to mod tests. No production code was touched, so there is no regression surface.

Minor observation, not a finding: ORPHAN_CHECK_IDS today contains only Warn-severity findings, so the !ORPHAN_CHECK_IDS.contains(&f.check) filter in prune_preflight_check is strictly redundant with the severity == Fail filter. Test 3 therefore verifies the observable contract (orphan-caused findings don't block prune) rather than the ORPHAN_CHECK_IDS filter itself. That is fine — the observable contract is the right thing to pin, and the ORPHAN_CHECK_IDS filter exists precisely so that a future change that promotes an orphan finding from Warn to Fail does not silently start blocking hooks prune --orphans. The filter is defensive rather than dead.

Verification

  • cargo test — 1205 unit + 94 integration tests pass (matches the implementer's claim); 3 unit + 5 integration tests remain ignored as expected (real-user isolation + uds_authz).
  • Scoped run cargo test prune_preflight_check — 3 passed, 0 failed.
  • cargo clippy --all-targets -- -D warnings — clean.
  • cargo fmt -- --check — clean.
  • Verifier crate untouched.

Summary

Ready to merge. All blockers, non-blockers, and nice-to-haves from both prior reviews are now closed.

Posted as a comment (not --approve) because GitHub blocks self-approval on this repo; treat as an approval.

Recommended merge commit message

[Sprint 7] aimx doctor extensions + hooks prune --orphans (#131)

S7-1 adds check_mailbox_ownership: per-mailbox getpwnam(owner) check plus
inbox/<name>/ and sent/<name>/ existence, uid, gid, and 0700 mode checks.
Stale storage dirs surface as ORPHAN-STORAGE; config mailboxes missing
storage surface as ORPHAN-CONFIG. Findings carry severities and aimx
doctor now exits non-zero on any Fail.

S7-2 adds check_templates: every [[hook_template]] is validated — run_as
resolves (or is reserved), cmd[0] is an executable file, and run_as can
access(X_OK) cmd[0] via runuser-first / fork+seteuid fallback. Non-root
hosts without runuser surface an INFO finding rather than silently
skipping.

S7-3 adds check_hook_invariants, check_catchall_user, and
translate_load_warnings. Catchall mailboxes without an aimx-catchall
user fail; legacy aimx-hook surfaces as INFO; every LoadWarning variant
is translated into a matching doctor finding.

S7-4 adds aimx hooks prune --orphans [--dry-run]. Root-only, atomic
config rewrite via write_config_atomic + SIGHUP, refuses when doctor
reports non-orphan Fail findings, idempotent on a second pass.

…owned by the reserved user

The Sprint 7 `check_catchall_user` fired a FAIL finding whenever any
catchall mailbox was configured and the `aimx-catchall` system user was
absent, regardless of the catchall's actual owner. Ingest only chowns
mail as the catchall mailbox's configured owner, so the reserved user is
only required when the operator keeps the `aimx setup` default
(`owner = "aimx-catchall"`).

The integration fixture (`setup_test_env`) sets both mailbox owners to
the current test-runner username, which means the reserved user is not
needed, yet the check still fired on CI runners where `aimx-catchall`
does not exist — breaking `doctor_shows_domain_and_mailboxes` and
`doctor_renders_logs_pointer_section` on ubuntu-latest.

Narrow the check to only fire when a catchall mailbox is owned by the
reserved `aimx-catchall` user. Update the finding message to reflect
the refined semantics, adjust the matching unit tests, and add a new
unit test covering the non-reserved-owner path.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint 7 Re-Review (CI-fix cycle, commit 35253d9)

Verdict: Ready to merge

Scope of this re-review

Commit 35253d9 narrows check_catchall_user in src/doctor.rs so CATCHALL-USER-MISSING only fires when a catchall mailbox is actually owned by the reserved aimx-catchall user. When the operator has assigned a different owner to the catchall, ingest chowns as that owner and the reserved user is irrelevant — so the check is now a no-op on that configuration.

Correctness of the narrowing

Verified the architectural claim by reading the ingest path:

  • src/ingest.rs (chown_as_owner_or_warn) → src/ownership.rs::chown_as_owner resolves mb.owner, not aimx-catchall unconditionally. The narrowing correctly matches what ingest actually needs.
  • Hook-level run_as = aimx-catchall orphan detection remains covered by LoadWarning::OrphanHookRunAs + translate_load_warnings — a completely separate code path, unaffected by this change. No gap introduced.
  • is_catchall(&config) still gates the outer check correctly.

Test coverage

  • New unit test catchall_user_check_skips_when_catchall_owner_is_not_reserved directly exercises the narrowed branch: non-reserved catchall owner + unresolvable aimx-catchall → zero findings.
  • Two existing unit tests (catchall_user_check_fires_when_user_missing, catchall_user_check_passes_when_user_exists) correctly updated to use RESERVED_RUN_AS_CATCHALL as the owner so they now exercise the reserved-user path.
  • Integration tests doctor_shows_domain_and_mailboxes (and the sibling restored .success() assertion): setup_test_env sets the catchall owner to current_username(), so with the narrowed check CATCHALL-USER-MISSING no longer fires on CI hosts that lack aimx-catchall.

CI

Run 24766341612: core-tests, integration-isolation, and verifier-tests all green.

Findings

None. No blockers, no non-blockers, no nice-to-haves beyond what has already been deferred to Sprint 7.5.

Recommended merge commit message

[Sprint 7] aimx doctor extensions + hooks prune --orphans (#131)

Adds Sprint 7 doctor extensions (mailbox ownership/dir drift, catchall
user, hook invariants, load-warning surfacing) and `aimx hooks prune
--orphans` for removing hooks that reference missing templates or
unresolvable run_as users.

Doctor now exits non-zero when any FAIL-severity finding is emitted so
CI / `doctor && serve`-style wrappers catch misconfiguration.

CI-fix in 35253d9 scopes CATCHALL-USER-MISSING to catchall mailboxes
whose configured owner is the reserved `aimx-catchall`; when the
operator has assigned a non-reserved owner, the reserved user is not
required and the check is skipped. Adds a unit test for the new
branch and updates two prior tests to use the reserved owner.

@uzyn uzyn merged commit 68dcf4a into main Apr 22, 2026
3 checks passed
@uzyn uzyn deleted the sprint-7-doctor-prune branch April 22, 2026 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant