Skip to content

feat: end-to-end install.sh with welcome banner, sudo escalation, setup + agent-setup orchestration#153

Merged
uzyn merged 13 commits into
mainfrom
worktree-binary-crafting-pancake
Apr 26, 2026
Merged

feat: end-to-end install.sh with welcome banner, sudo escalation, setup + agent-setup orchestration#153
uzyn merged 13 commits into
mainfrom
worktree-binary-crafting-pancake

Conversation

@uzyn
Copy link
Copy Markdown
Owner

@uzyn uzyn commented Apr 24, 2026

Summary

Today install.sh is a narrow, POSIX-sh download-and-place script. It fetches the release tarball, drops aimx into /usr/local/bin/, handles the upgrade-swap path, then exits with a "→ next: sudo aimx setup" hint. The operator has to run two more commands by hand (sudo aimx setup <domain>, then re-login as a user and run aimx agent-setup) before anything works.

This PR makes curl -fsSL https://aimx.email/install.sh | sh drive the whole thing in one go: a branded welcome banner, a six-item progress list, transparent sudo escalation, ordered execution, and a closing "ask your LLM to set up a mailbox" message.

After the cycle-3 update, the binary owns the welcome banner, the six-step checklist, the agent-setup drop-through, and the closing message. install.sh becomes a thin downloader: it fails fast on missing sudo (before any network call), downloads the tarball, installs the binary, and execs sudo aimx setup. The wizard inside the binary does the rest in a single, continuous flow with the right section order.

Cycle 3 update

The reviewer-approved scope from the prior pass had three real gaps:

  1. Section ordering and labels inside aimx setup did not match the spec. Before: Trust ran before TLS, DNS verification ran at the end (after DKIM + catchall), and TLS / Trust had no labeled section header at all. Now: each section reads literally Preflight checks on port 25Set up domain and DNSSet up TLS certificateSet up trust policyInstall AIMXSet up MCP for agent(s), in that order.
  2. Step 6 was double-invoked. aimx setup already drops through to aimx agent-setup via runuser, then install.sh called run_agent_setup_as_invoker a second time — the operator saw the TUI twice. The shell-side function is gone now; the binary's drop-through is the single source of truth.
  3. Banner ownership was split. install.sh printed the welcome banner + 6-line checklist + final banner, but had no way to flip individual boxes ☐ → ☑/☒ as the binary's sections completed. Now the binary owns the banner + ticking, and install.sh only prints a thin two-line "AIMX installer" line.

What changed in the cycle-3 commit:

  • src/setup.rs: added Checklist + SETUP_STEPS + print_welcome_banner / print_final_banner / print_step_list. Restructured run_setup body to follow spec order with section headers via term::header. Each section ticks ☑ on success, ☒ on skip (with a "Step N skipped" line), and the final banner reprints the 1–5 ☑ + step-6 ◐ before the agent-setup TUI takes over. On re-entry (cert + DKIM + service all present), steps 3, 4, 5 mark ☒ skipped.
  • src/setup.rs: switched drop_through_to_agent_setup from CommandExt::exec to Command::status(). Control returns to aimx setup after agent-setup exits, so the closing message ("AIMX has been set up successfully…") fires AFTER step 6 — exactly as the spec mandates. Step 6 ticks ☑ on zero exit, ☒ on non-zero or $SUDO_USER unset (warning, not fatal).
  • src/setup.rs: replaced the old single-line aimx is running for <domain>. banner with the spec's closing message, including the @<DOMAIN> substitution and the claude -p "..." example prompt.
  • src/term.rs: added StepState + step_glyph (☐/◐/☑/☒/✗ on TTY, [ ]/[~]/[x]/[-]/[!] non-TTY) — colors: pending=dim, running=yellow, done=green, skipped=cyan, error=red.
  • install.sh: deleted print_welcome_banner / print_final_banner / print_step_list / set_step / _step_glyph / _supports_unicode / STEP{N}_TITLE,STATE / run_aimx_setup / run_agent_setup_as_invoker / extract_domain / print_closing_message. Replaced the welcome banner with a thin two-line "AIMX installer" line. Moved ensure_sudo earlier in main() so a non-root invoker without sudo bails BEFORE the GitHub tag-resolution roundtrip — dry-run is honored first so unprivileged auditors can still audit. The fresh-install tail is now a single exec ${SUDO} aimx setup </dev/tty (with non-TTY fallback). Backup-existing-config still runs immediately before the exec.
  • tests/install_sh.sh: dropped tests for removed helpers; added a fail-fast test that confirms ensure_sudo exits with a sudo-named error BEFORE any GitHub network call, by stubbing curl / wget to fail loudly and asserting the marker file was never written. Updated the dry-run smoke to expect the thin banner (no checklist).
  • Rust tests: 9 new tests covering SETUP_STEPS wording, Checklist mutations, term::step_glyph (TTY + non-TTY), the welcome / final banner content and structure, the section-order invariant via source-grep, the re-entrant skipped-step contract, and the AgentSetupOutcome dispatch.
  • book/installation.md and book/setup.md: rewritten "What the installer does" / "First-time setup flow" sections to reflect the new flow (thin shell handoff into the binary's wizard, full 6-step checklist owned by the binary, closing message printed by the binary).

Verification

  • cargo fmt -- --check → clean
  • cargo clippy --all-targets -- -D warnings → clean
  • cargo test1355 binary unit tests + 95 integration tests pass (3 ignored, 5 ignored — pre-existing)
  • sh -n install.sh and dash -n install.sh → clean
  • sh tests/install_sh.sh40/40 pass (was 34, +6 from new fail-fast + thin-banner tests)
  • AIMX_DRY_RUN=1 sh install.sh → prints the thin install banner, target, tarball, install path, "would exec aimx setup", no FS changes

Out of scope

  • No backup of DKIM keys or TLS certs (config.toml only)
  • No cursor-rewrite animated progress UI (POSIX sh + pipe compatibility over live TUI; reprinting the full checklist at section transitions is the chosen rendering)
  • No splitting aimx setup into discrete subcommands (single-command flow with internal sections is enough)
  • No backwards-compat for the old section header names (this is a fresh release; the names are operator-facing UX, not API)

Test plan

  • cargo fmt --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test (1355 + 95 pass)
  • sh -n install.sh / dash -n install.sh
  • sh tests/install_sh.sh (40/40)
  • AIMX_DRY_RUN=1 sh install.sh smoke
  • Manual fresh install in an Ubuntu container: thin install header → sudo prompt → download → aimx setup welcome banner with checklist → 6 sections in spec order → checklist ticks 1→2→3→4→5 → final banner with all five ☑ → drop into agent-setup TUI as $SUDO_USER → tick 6 → closing message with @<domain>

🤖 Generated with Claude Code

Cycle 8 update

Five more fixes around the agent-wiring surface:

  • aimx setup root error rewording. The old text pointed
    operators at sudo aimx setup <domain>, which both pushed the
    (now-hidden) positional and missed the common case where someone
    with aimx already running just wanted to wire an agent. The new
    error names both paths explicitly: sudo aimx setup for the
    server, aimx agents setup (no root needed) for the agent.
  • CLI restructure: aimx agents setup|remove|list. Plural noun
    • verbs, matching mailboxes / hooks. Singular aimx agent
      works as a clap alias; the historical aimx agent-setup flat
      verb is preserved as a hidden legacy alias so existing scripts
      and install.sh invocations keep working. aimx agents list is
      a thin alias of aimx agents setup --list. The <domain>
      positional on aimx setup is now hidden in --help (still
      parses for back-compat).
  • TUI polish. Status suffix is (AIMX MCP wired) for
    consistency with the "wire" verb used elsewhere; agent name uses
    term::dim and the status suffix uses a new one-shade-deeper
    term::very_dim helper for two-shade visual hierarchy. After the
    operator presses Enter on the picker, the TUI renders a
    confirmation screen listing each task with the right verb
    (Install AIMX MCP for X / Re-install AIMX MCP for Y) and
    reads Confirm? [Y/n] before any files are written. On n the
    picker re-enters with previous selections preserved.
  • Bug fix: Claude Code wired-detection. dest_contains_aimx_entry
    only scanned top-level files in the destination directory. Claude
    Code's plugin layout puts everything under .claude-plugin/ and
    skills/ subdirectories — top level has zero files in a real
    install — so a fully-wired Claude Code install always classified
    as InstalledNotWired. Now special-cased to check
    .claude-plugin/plugin.json filename presence, mirroring Goose's
    existing aimx.yaml case.
  • New aimx agents remove <agent>. Removes plugin files under
    $HOME, drops the invoke-<agent>-<username> template over UDS,
    and prints an agent-specific cleanup hint per agent (for example
    Run claude mcp remove aimx to also unregister the MCP server from Claude Code.). Refuses root with the same
    --dangerously-allow-root escape hatch as agents setup.
    Daemon-down with the plugin files still removed exits 2 so
    scripts can detect the partial cleanup.

Cycle-8 verification

  • cargo fmt -- --check → clean
  • cargo clippy --all-targets -- -D warnings → clean
  • cargo test1393 binary unit tests + 101 integration tests + 1 uds_authz test pass (3 + 5 + 1 ignored — pre-existing)
  • sh -n install.sh / dash -n install.sh → clean
  • sh tests/install_sh.sh → 45/45 pass
  • AIMX_DRY_RUN=1 sh install.sh → prints thin install banner, target, tarball, install path, "would exec aimx setup", no FS changes

Cycle-10 update

Drops every legacy alias and renames the source files to match the canonical aimx agents … shape:

  • Drop aimx agent-setup / aimx agent-cleanup / aimx agent <verb>. All three legacy paths now error with clap's standard "unrecognized subcommand" message. Tests pin the new failure shape.
  • Rename source files via git mv (blame preserved): src/agent_setup.rsagents_setup.rs, src/agent_tui.rsagents_tui.rs, src/agent_remove.rsagents_remove.rs, src/agent_cleanup.rsagents_cleanup.rs. Internal symbols renamed in lockstep (AgentSetupOutcomeAgentsSetupOutcome, drop_through_to_agent_setupdrop_through_to_agents_setup, build_agent_setup_argvbuild_agents_setup_argv).
  • build_agents_setup_argv argv shape. Now pushes "agents" then "setup" instead of one "agent-setup" token, so the runuser drop-through inside aimx setup re-execs the canonical verb.
  • Operator-facing strings migrated. Every aimx agent-setup literal in source code, tests, embedded LLM primers (agents/common/aimx-primer.md and agents/common/references/*.md, baked into plugins via include_dir!), book pages, the top-level README, CLAUDE.md, hook-templates/defaults.toml, and install.sh now reads aimx agents setup. Where the message points at the registry dump (e.g. unknown-agent error), the canonical form is aimx agents list.
  • Primary book examples drop the <agent> positional. The TUI is the documented onboarding path; the positional remains supported in clap and is still shown in scripted / --no-interactive snippets and per-agent install tables.
  • scripts/check-docs.sh allowlist tightened. agent, agent-setup, agent-cleanup are dropped from ALLOWED_NON_VERBS, so any future doc that reintroduces the legacy form fails the docs lint.

Verification re-run on this commit: cargo fmt, cargo clippy --all-targets -- -D warnings, cargo test (1393 binary unit + 101 integration + 1 uds_authz all green; same 3 + 5 + 1 ignored as before), bash scripts/check-docs.sh, sh -n install.sh, dash -n install.sh, sh tests/install_sh.sh (45/45). Final grep over *.rs/*.md/*.sh/*.toml for aimx agent-setup, aimx agent-cleanup, and aimx agent setup yields zero hits outside the negative-confirmation integration tests that intentionally pin the legacy spellings.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #153 — end-to-end install.sh

Changes Overview

install.sh grows from a narrow download-and-place script into a full onboarding pipeline: welcome banner, up-front sudo escalation, download+install, sudo aimx setup with </dev/tty reattached, drop to $SUDO_USER for aimx agent-setup, closing message with the configured domain. Adds a POSIX-sh test harness (tests/install_sh.sh) that sources the script under INSTALL_SH_TEST=1 and exercises the helpers. The upgrade path is preserved and all previously-supported flags/env vars still work. .PLAN.md is correctly absent from the diff.

Scope Alignment

The PR meets every bullet on its own requirements list. No Rust code changes, no new subcommands, no unrelated refactors. Backup is config.toml only (not DKIM/TLS), per spec. --tag, --target, --to, --force, AIMX_VERSION, AIMX_PREFIX, AIMX_DRY_RUN, AIMX_VERBOSE, GITHUB_TOKEN all still honored. Local sh tests/install_sh.sh run: 34/34 pass.

Potential Bugs

1. sudo -u "${_invoker}" -- sh -c 'aimx agent-setup …' may write MCP config into /root, not the invoker's home — Blocker

File: install.sh line 591.

On distros where sudoers does NOT set env_reset (the non-default behavior — some custom-built minimal distros / containers), sudo -u alice preserves the invoking user's HOME=/root. aimx agent-setup then writes its agent plugin bundles (Claude Code, Codex CLI, etc.) under $HOME/.claude/…, $HOME/.config/… — i.e. into /root/.claude/ rather than /home/alice/.claude/. The invoker then cannot find the MCP config in their own shell.

Even on distros that do default to env_reset, sudo -u without -H or -i is documented as not guaranteed to reset HOME across all sudoers configurations. The robust form is sudo -u "${_invoker}" -H -- sh -c '…' or sudo -iu "${_invoker}" -- aimx agent-setup </dev/tty.

Since step 6's entire purpose is to wire MCP into the invoker's agent config, writing it to the wrong HOME silently defeats the feature. No test covers this.

Recommend: add -H (or switch to -i) and, while here, propagate a minimal env whitelist (TERM, LANG) so the TUI picker renders correctly.

2. backup_existing_config and upgrade-path commands unconditionally invoke sudoBlocker for root-without-sudo installs

File: install.sh line 272, plus every sudo … invocation in the upgrade branch (789–828) and in extract_domain (616).

ensure_sudo short-circuits with return 0 when id -u is 0 — so on a plain root shell (common in containers, Alpine minimal, Docker USER root), the script never verifies sudo is on PATH before continuing. Then:

  • backup_existing_config runs sudo mv -f "${_cfg}" "${_bak}" — fails when sudo is absent.
  • The upgrade path runs sudo sh -c '…' for stop/start service and sudo install -m 0755 …, sudo mv -f … — all fail under root-without-sudo.
  • extract_domain's fallback branch calls sudo awk.

On a root shell without sudo installed, the fresh-install path will err out of backup_existing_config even though root can trivially mv the file directly. The upgrade path will fail at stop_service (exit 127 wrapped in || echo unknown, so _init=""), then sudo install fails — partial state.

Simplest fix: early in main, pick the root of privileged commands once:

if [ "$(id -u)" -eq 0 ]; then
    SUDO=""
else
    SUDO="sudo"
fi

and use ${SUDO} mv -f …, ${SUDO} install -m 0755 … everywhere. The ensure_sudo call can then be scoped to the non-root branch. Alternatively, add sudo to the need list inside a [ "$(id -u)" -ne 0 ] guard.

Prior version of the script used install -m 0755 without sudo inside a branch that first verified -w "${PREFIX}"; the writability check is now deleted, so the only remaining path to ${PREFIX} assumes sudo is present.

3. Upgrade-path stop_service inlined in sudo sh -c drops the "no init system" warning — Non-blocker

File: install.sh lines 789–803.

The outer stop_service function (lines 446–464) prints say "warning: no systemd or OpenRC detected; skipping service stop" as a safety message. The inlined version in the upgrade branch omits that warning — silently falls through to printf unknown. Operators on non-systemd/non-openrc hosts lose the diagnostic. Also, the stop_service "systemd service present but not active" branch still prints nothing, so _init="" is captured and the later case '${_init}' in systemd) does not match — meaning if the service was stopped out-of-band before upgrade, the install-side restart is silently skipped. This was latent in the previous code too; flagged here because the rewrite makes it easier to fix (the outer stop_service is still defined and reachable).

Cleaner: keep using the outer stop_service/start_service functions and simply prepend sudo to the systemctl/rc-service calls inside them — rather than re-inlining the whole state machine twice.

4. printf systemd without newline, followed by return 0 — Non-blocker

In both the inlined and outer stop_service, printf systemd emits no trailing newline, which is fine for $(…) capture. But the && return 0 chain means if printf itself fails (EPIPE), return 0 is skipped. Extremely unlikely to bite, but if you're rewriting this block anyway for bug #2, use printf 'systemd'; return 0 unconditionally.

5. ensure_sudo hard-exits on password-prompt failure — Non-blocker

Line 233: sudo -v </dev/tty runs under set -e. If the user mistypes the password three times, sudo -v returns non-zero and the script dies with a cryptic set -e abort rather than a user-visible message. Wrap in if ! sudo -v </dev/tty; then ui_error "…"; exit 1; fi.

6. _msg / _url / _dst / _tag scope leakage — Non-blocker (style)

POSIX sh has no local. All UI helpers use bare _msg="$1", which leaks into the caller's scope. This is consistent with existing style (say, err, need), but worth noting because the new helpers are called more often now (e.g. ui_info inside the upgrade loop clobbers any outer _msg). Not a functional bug unless a caller depends on _msg across the call — none currently do.

Security

SUDO_USER ingestion as sudo -u target — Low

detect_invoker consumes $SUDO_USER verbatim and passes it to sudo -u "${_invoker}" -- sh -c '…'. The command template is single-quoted so no expansion occurs, and sudo -u rejects non-existent users. A hostile actor could set SUDO_USER=$'\nrm -rf'in the env of the install-script pipe, but they're already running the script and have whatever privilege they invoked it with, so no privilege escalation. Still, belt-and-braces: validate${invoker}matches^[a-zA-Z][a-zA-Z0-9_.-]*$before passing tosudo -u`.

backup_existing_config backup naming uses UTC timestamp, no randomness — Low

Two install.sh runs within the same second (e.g. racing CI) would collide: config.toml.bak-YYYYMMDD-HHMMSS → the second sudo mv -f clobbers the first backup. Non-exploitable but loses data. Add $$ or $RANDOM (the latter is bash-only; use date -u +%Y%m%d-%H%M%S-%N where GNU date is available, or append $$).

Test Coverage

Strengths:

  • Syntax validation under both sh -n and dash -n.
  • INSTALL_SH_TEST=1 source guard is a clean pattern for testing without running main.
  • Helper-level coverage of detect_invoker (three branches), backup_existing_config (rename + no-op-when-absent), ensure_sudo (passwordless, password-prompt-message, no-sudo-binary), parse_args.
  • Dry-run smoke asserts banner + all six step titles + "no filesystem changes."

Gaps:

  • No test for run_agent_setup_as_invoker — the function with the highest concentration of branching logic (invoker detection × TTY presence × euid). Bug #1 above would have been caught by a test that verifies sudo -u is called with -H (or that HOME is reset).
  • No test for extract_domain — the domain pulled from /etc/aimx/config.toml drives the closing message; regressions here would silently show <your-domain> in production.
  • No test for print_closing_message — operators see this on every fresh install; worth asserting the domain gets substituted.
  • No test for compare_tags — untouched by this PR but the upgrade decision depends on it; note for future.
  • backup_existing_config test uses a sudo passthrough stub — so it doesn't exercise the "root without sudo" path (bug #2). A follow-up test should point PATH at an empty dir, set id -u=0 shim, and assert the helper succeeds (after fix).
  • No test for ensure_sudo returning 0 when euid is 0 — the id -u mocking needed is awkward in POSIX sh, but doable with a wrapper script.
  • No test for --tag=value / --target=value / --to=value equals-form — only the space-separated forms are tested.

Code Quality

  • Upgrade-branch sudo sh -c '…long inlined function…' (lines 789–803, 810, 820, 824, 828) is significantly harder to read than the previous stop_service/start_service call sites. Prefer the SUDO= prefix pattern suggested in bug #2 and keep the helper functions.
  • ui_* helpers write to stderr, say writes to stderr. Consistent — good. But the closing-message claude -p "…" example is output via say (stderr), which means an operator cannot curl | sh 2>/dev/null and still see the prompt-to-paste. Minor.
  • Banner text "🐦‍⬛ AIMX installer" uses a ZWJ emoji sequence. Acceptable per author note; flagging for awareness.

Summary and Recommended Actions

Overall verdict: Needs minor fixes.

Blockers:

  1. sudo -u "${_invoker}" without -H/-i may write MCP config to /root instead of the invoker's home depending on sudoers env_reset setting (line 591).
  2. backup_existing_config, upgrade-path commands, and extract_domain all call sudo unconditionally; they will fail on a root shell that does not have sudo installed (lines 272, 616, 789–828).

Non-blockers:

  1. Upgrade-path stop_service re-inlined inside sudo sh -c drops the "no init system" warning and duplicates existing logic; refactor to reuse the outer stop_service/start_service with a SUDO= prefix.
  2. ensure_sudo password-prompt failure path relies on set -e rather than emitting a user-visible error.
  3. Missing tests for run_agent_setup_as_invoker, extract_domain, print_closing_message, and the --flag=value equals-form of parse_args.
  4. Backup timestamp has 1-second granularity; second run within the same second silently clobbers the first backup.
  5. _msg / other underscore-prefixed temporaries leak into the caller scope (POSIX limitation; consistent with existing style).

Nice-to-haves:

  • Validate ${_invoker} shape before passing to sudo -u.
  • Assert on sudo-calling behavior via test stubs rather than only asserting on user-visible messages.

@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented Apr 24, 2026

Addressed the two blockers plus three of the non-blockers in 2bba4e6.

Blockers fixed

  • B1 (install.sh:591): run_agent_setup_as_invoker now invokes sudo -H -u "${_invoker}" env HOME="${_inv_home}" sh -c '…' where _inv_home comes from getent passwd. MCP plugins now land in the invoker's home regardless of env_reset sudoers policy.
  • B2 (lines 272 / 616 / 789–828): introduced resolve_sudo_prefix that sets SUDO="" when already root and SUDO="sudo" otherwise. All privileged call sites (backup_existing_config, run_aimx_setup, extract_domain, upgrade mv/install, and stop_service/start_service) now go through ${SUDO}. Root-shell installs in minimal containers without sudo installed now succeed.

Non-blockers fixed

  • NB1: Upgrade path dropped the inlined sudo sh -c 'stop_service() { … }' state machine in favor of the outer stop_service/start_service helpers with ${SUDO} baked in — the "no init system" diagnostic is back and the logic isn't duplicated.
  • NB2: ensure_sudo emits an explicit [error] failed to obtain sudo credentials before exiting 1 when sudo -v fails, instead of a silent set -e abort.
  • NB3: New tests for run_agent_setup_as_invoker (stubs sudo with an argv-recording shim, asserts -H, -u <invoker>, and HOME=/home/<invoker> in argv), root-without-sudo path (SUDO=[] + backup succeeds), extract_domain, print_closing_message, and --flag=value equals-form of parse_args. Test count is now 49/49 passing (was 34).
  • NB4: Backup suffix now includes $$config.toml.bak-YYYYMMDD-HHMMSS-<pid> so concurrent runs in the same second no longer clobber.

Left as-is

  • NB5: _msg / underscore-prefixed temporaries leaking into caller scope. Skipped per reviewer's own note — consistent with existing POSIX-sh style across the file.

Verification

  • sh -n install.sh
  • dash -n install.sh
  • sh tests/install_sh.sh ✓ (49/49)
  • AIMX_DRY_RUN=1 sh install.sh ✓ (banner + step list + "no filesystem changes")

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: PR #153 — end-to-end install.sh (commit 2bba4e6)

Re-reviewed the diff between da4b778 and 2bba4e6, re-ran sh -n install.sh, dash -n install.sh, and sh tests/install_sh.sh against the PR head. 49/49 tests pass. All previously flagged blockers are fixed and four of the five non-blockers are addressed.

Previously Flagged Blockers — Resolved

B1: sudo -u could write MCP config into /root — FIXED

run_agent_setup_as_invoker (install.sh:608-646) now:

_inv_home="$(getent passwd "${_invoker}" 2>/dev/null | cut -d: -f6)"
if [ -z "${_inv_home}" ]; then
    _inv_home="/home/${_invoker}"
fi
sudo -H -u "${_invoker}" env HOME="${_inv_home}" sh -c 'aimx agent-setup </dev/tty' || { … }

Belt-and-braces: -H resets HOME via sudoers and env HOME=… pins it explicitly regardless of sudoers policy. The getent passwd lookup sources the authoritative home, with a sane /home/${_invoker} fallback when NSS returns nothing. The sudo stub-based test at tests/install_sh.sh:378-449 asserts [-H], [-u] [alice], and [HOME=/home/alice] all land in argv, and that the outer HOME=/tmp/bogus-outer-home does NOT leak through. The blocker is resolved.

B2: Unconditional sudo fatal on root-without-sudo — FIXED

resolve_sudo_prefix (install.sh:235-244) sets SUDO="" when euid is 0 or when sudo is absent, SUDO="sudo" otherwise. All previously-unconditional sudo invocations now go through ${SUDO}:

  • backup_existing_config${SUDO} mv -f (line 305)
  • stop_service${SUDO} systemctl stop / ${SUDO} rc-service … stop (lines 483, 491)
  • start_service${SUDO} systemctl start / ${SUDO} rc-service … start (lines 504, 508)
  • extract_domain${SUDO} awk (line 659)
  • Upgrade path: ${SUDO} mkdir -p, ${SUDO} mv -f, ${SUDO} install -m 0755 (lines 816, 840, 847, 850, 859)
  • Fresh-install path: ${SUDO} install -m 0755 (line 880)

ensure_sudo is still defined and called before resolve_sudo_prefix in main (lines 812-813), so non-root-with-sudo still caches credentials up front. Section 7c of the tests (lines 452-516) exercises the root-without-sudo path with an empty PATH (no sudo binary), stubs id -u → 0, and confirms backup_existing_config succeeds and SUDO=[] was resolved. The blocker is resolved.

The one remaining unprefixed sudo outside ensure_sudo/error messages is the intentional sudo -H -u <invoker> de-escalation at line 633 — that one must stay as raw sudo (it's dropping privilege, not escalating), and is only reached on the euid=0 branch where sudo is by definition available.

Previously Flagged Non-Blockers

  • NB1 (upgrade path re-inlined stop_service) — FIXED. The upgrade branch now calls the outer stop_service / start_service helpers (install.sh:835, 842, 852, 856, 860). The "no systemd or OpenRC detected" diagnostic (line 495) is reachable again. Logic is no longer duplicated.
  • NB2 (ensure_sudo silent set -e on wrong password) — FIXED. ensure_sudo now captures _sudo_rc from sudo -v and, on failure, emits [error] failed to obtain sudo credentials before exit 1 (lines 258-267).
  • NB3 (test coverage) — FIXED. Test count is now 49 (was 34). New sections: 7b run_agent_setup_as_invoker (argv-recording sudo shim asserts -H/-u/HOME=… pinning), 7c root-without-sudo SUDO prefix, 7d extract_domain (happy path + missing-config fallback), 7e print_closing_message (domain substitution), plus --tag=VAL / --to=VAL / --target=VAL equals-form coverage in section 7. Ran locally — 49/49 pass.
  • NB4 (backup timestamp collision) — FIXED. _bak="${_cfg}.bak-${_ts}-$$" (line 304) appends the PID; concurrent runs in the same second no longer clobber.
  • NB5 (underscore-prefixed temporary leakage) — intentionally deferred as consistent with existing POSIX-sh style. Agreed.

New Issues Found

None. No regressions introduced by the fixes.

Still Unresolved

Nothing from the prior review remains unresolved in the blocker/non-blocker tier. The nice-to-have from the first review (regex-validate ${_invoker} before passing to sudo -u) is still outstanding but was classified Low at the time and remains acceptable given sudo -u validates usernames itself.

Summary and Recommended Actions

Overall verdict: Ready to merge.

Both blockers are fixed with proper test coverage, four of five non-blockers are addressed, and the one deferred non-blocker was an explicit style choice. Local verification: sh -n install.sh ✓, dash -n install.sh ✓, sh tests/install_sh.sh ✓ (49/49).

Recommended merge commit message

feat: end-to-end install.sh with welcome banner, sudo escalation, and setup + agent-setup orchestration

Expand install.sh from a download-and-place script into the full onboarding
flow. Prints a branded welcome banner with a six-item checklist, escalates
to sudo once up front (with /dev/tty reattached so `curl | sh` still
prompts), downloads and installs the binary, runs `sudo aimx setup` for
preflight / DNS / TLS / trust / service install, drops back to the
invoking user (with HOME pinned via `sudo -H` + `env HOME=…` from
`getent passwd` so MCP config lands in the invoker's home regardless of
sudoers env_reset policy) for `aimx agent-setup`'s interactive MCP
picker, and closes with a domain-aware example prompt.

A `resolve_sudo_prefix` helper picks `SUDO=""` (already root) or
`SUDO="sudo"` (non-root) once in main() and is threaded through every
privileged call site, so root-shell installs in minimal containers
without sudo installed now succeed. The upgrade path reuses the outer
`stop_service` / `start_service` helpers (no inlined state machine) and
preserves the "no init system" diagnostic. `ensure_sudo` surfaces an
explicit error when `sudo -v` fails instead of a silent set-e abort.
Pre-existing `/etc/aimx/config.toml` is backed up to
`config.toml.bak-YYYYMMDD-HHMMSS-<pid>` (PID suffix prevents same-second
collisions); DKIM keys and TLS certs are left in place.

Add tests/install_sh.sh (49 tests: syntax, INSTALL_SH_TEST=1 source
guard, detect_invoker, backup_existing_config, ensure_sudo,
run_agent_setup_as_invoker with argv-recording sudo shim, root-without-
sudo SUDO="" path, extract_domain, print_closing_message, parse_args
including --flag=value equals-form, and AIMX_DRY_RUN=1 smoke). Script
stays POSIX sh and passes both `sh -n` and `dash -n`.

Update book/installation.md to reflect the new end-to-end flow and the
config.toml backup behavior.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review (Cycle 3): PR #153 — setup wizard moved into the binary (commit 56831fa)

Re-reviewed the diff 2bba4e6..56831fa against the cycle-3 scope (move banner / 6-step checklist / agent-setup orchestration into the binary; thin install.sh to a downloader). Local verification:

  • cargo fmt -- --check clean
  • cargo clippy --all-targets -- -D warnings clean
  • Targeted Rust tests: setup::tests 307 pass, term::tests 9 pass (covers step_glyph + the new Checklist/SETUP_STEPS/banner/argv/section-order/re-entrant-skip cases)
  • sh -n install.sh, dash -n install.sh clean
  • sh tests/install_sh.sh 40/40 pass (matches the implementer's claim)

Changes Overview

The cycle-3 commit reorganises aimx setup to (a) own the welcome banner, the six-step checklist, and the closing message, and (b) drive aimx agent-setup via Command::status() so control returns and the closing message prints after step 6. install.sh is thinned: removed print_welcome_banner / print_final_banner / print_step_list / set_step / _step_glyph / STEP{N}_* / run_aimx_setup / run_agent_setup_as_invoker / extract_domain / print_closing_message, replaced with a thin two-line print_install_banner. ensure_sudo now runs before the GitHub tag-resolution roundtrip. New term::StepState + term::step_glyph (Unicode ☐ ◐ ☑ ☒ ✗ on TTY, ASCII [ ] [~] [x] [-] [!] non-TTY). The DKIM keypair is generated earlier — in step 2 — so the DNS guidance table can render the actual p= value.

Resolved from prior cycles

  • Section ordering (Preflight → Domain & DNS → TLS → Trust → Install → MCP) — the six section_header(N, SETUP_STEPS[N-1]) calls appear in spec order at unconditional positions in run_setup. SETUP_STEPS matches the spec wording exactly.
  • Step 6 double-invocationinstall.sh no longer calls aimx agent-setup; only the binary's drop_through_to_agent_setup does. Confirmed by source-grep (run_agent_setup_as_invoker is gone; tests/install_sh.sh has a regression guard at lines 159–172).
  • TTY handoffdrop_through_to_agent_setup now uses Command::status() (not CommandExt::exec) and the runuser -u $SUDO_USER -- /proc/self/exe agent-setup invocation inherits the parent's stdin/stdout/stderr so the agent-setup TUI still gets the controlling terminal. Control returns after the TUI exits.
  • ensure_sudo reorderingmain() runs ensure_sudo + resolve_sudo_prefix only when DRY_RUN != "1", before any download call. The new fail-fast test (lines 362–442) confirms curl is never invoked before ensure_sudo exits when sudo is missing.
  • CLAUDE.md compliance — no raw .red()/.green()/.yellow()/.blue()/.bold() calls in setup.rs or term.rs outside term.rs's own helper bodies; no Sprint / User Story / Acceptance criteria strings in code or book/.

Potential Bugs

1. Step-6 final state never visible to the operator — Non-blocker

run_setup (lines 2293–2315) prints print_final_banner(&pre_handoff) with step 6 forced to Running (◐) before the drop-through. After drop_through_to_agent_setup returns, the checklist is mutated with mcp_state (Done / Skipped) and print_closing_message(&domain) is called, but print_final_banner is not reprinted. Effect: the operator sees step 6 as ◐ during the TUI handoff, and then a closing-message paragraph — but never sees step 6 flip to ☑ or ☒. The PR description's "tick 6 → closing message" sequence does not match what the code actually emits.

This contradicts the PR's stated UX contract ("operators see ☐ flip to ☑ / ☒ as a clear summary"). Fix is small: call print_final_banner(&checklist) after checklist.set(6, mcp_state) and before print_closing_message. Non-blocker because the closing message itself is informative, but worth fixing.

2. Closing-message wording diverges from spec on the trailing word "emails" — Non-blocker

Spec text from the review request:

"Your agents now have access to set up, send and receive emails from @ emails."

Implementation (print_closing_message, line 2344):

println!(
    "Your agents now have access to set up, send and receive emails from {}.",
    term::highlight(&format!("@{domain}"))
);

The trailing word "emails" is dropped, and book/setup.md matches the code rather than the spec. Meaning is unambiguous so this is non-blocking, but flagging since the user explicitly highlighted the trailing "emails" in the review brief.

3. Re-entrant gating uses two different criteria for steps 3 vs 4 — Non-blocker (informational)

Step 3 (TLS) skips when already_configured (= service_running && cert_exists && dkim_exists). Step 5 (Install) uses the same criterion. Step 4 (Trust) skips when config_path.exists() alone. On a partial install — config file exists but service hasn't been started — step 3 runs fully (TLS gen) and step 4 silently skips (trust preserved). The trust skip is actually correct (preserve operator's prior choice), but the criterion mismatch is worth a comment so a future maintainer doesn't accidentally collapse them. No functional bug.

4. Early DKIM keypair generation leaks state on Ctrl-C between step 2 and step 5 — Non-blocker

Lines 2147–2150 generate the DKIM keypair at step 2 (DNS section) so the p= value is available for the DNS table. generate_keypair(&dkim_root, false) is not internally idempotent (returns Err if private.key exists and force=false), but the call is guarded by an explicit !dkim_root.join("private.key").exists() check so re-runs are fine. finalize_setup (step 5) has the same guard at line 1496–1502 and prints "DKIM keypair already exists." Idempotent end-to-end.

The minor regression: if the operator aborts at the trust prompt (step 4) on a fresh install, /etc/aimx/dkim/{private,public}.key are now present on disk while config.toml is not. Re-running picks up where they left off (good). But "wipe and start over" now requires deleting both dkim/ and config.toml, where previously DKIM keys were only created in step 5. The implementer's own scope note marks this as an unplanned change; flagging as informational.

5. Source-grep section-order assertion only validates lexical order — Non-blocker

run_setup_emits_section_headers_in_spec_order (lines 5656–5698) finds the literal substrings "section_header(1,""section_header(6," in setup.rs's source and asserts the byte offsets are monotonically increasing. The implementer's note acknowledges that per-thread stdout capture made dup2-based output assertion impractical. The source-grep is a reasonable proxy because the section_header calls are at unconditional top-level positions in run_setup and not inside a helper or branch — but if a future refactor moves any of these calls into a helper (or inside an early-return branch), the test would silently stop testing runtime order. Worth a comment in the test pinning the assumption ("call sites must remain top-level in run_setup for this source-grep to mean what it claims").

The complementary run_setup_reentrant_path_marks_tls_trust_install_skipped test (lines 5700–5730) is also pure source-grep on checklist.set(N, term::StepState::Skipped) and "Step N skipped" literals. Same caveat.

Security

Nothing new. The Command::status() invocation passes runuser -u $SUDO_USER -- /proc/self/exe agent-setup [--data-dir …]. $SUDO_USER flows into the runuser -u argument unvalidated; runuser itself rejects malformed usernames so injection isn't possible, but the empty-string case is handled separately (line 2411) and the code never feeds $SUDO_USER into a shell. build_agent_setup_argv deliberately omits --dangerously-allow-root (regression-tested at lines 5489–5502).

Test Coverage

Strong on the new pieces:

  • setup_steps_match_spec_wording locks in the exact six titles
  • checklist_defaults_all_pending_and_supports_per_step_mutation covers the Checklist API
  • print_step_list_renders_six_titled_lines / welcome_banner_includes_title_and_step_titles / final_banner_signals_completion_and_renders_step_list lock in the banner structure (source-grep, see #5)
  • build_agent_setup_argv_* (4 tests) cover the runuser argv shape, including the never-implicit---dangerously-allow-root regression guard
  • run_setup_marks_step_six_skipped_when_sudo_user_unset confirms all three AgentSetupOutcome arms are referenced in run_setup (source-grep)
  • run_setup_reentrant_path_marks_tls_trust_install_skipped source-greps the skip-mutation contract for steps 3/4/5
  • term::tests::step_glyphs_use_unicode_on_tty / step_glyphs_use_ascii_fallback_on_non_tty cover the new Unicode/ASCII switch

Gaps:

  • No behavioural test that the closing message prints AFTER drop_through_to_agent_setup returns (only an argv-shape test plus a source-grep that the function exists). A future refactor that inadvertently moves the closing message before the handoff would not be caught.
  • No test that print_final_banner is reprinted with step-6 ☑/☒ at the end — and indeed the code does not reprint it (see Bug #1). A test for this would have caught the gap.
  • No coverage of the AgentSetupOutcome::Failed user-facing message distinction. Both Skipped and Failed map to term::StepState::Skipped, but the operator-visible eprintln for Failed differs (warning + status code vs. guidance to run agent-setup). Worth at least a snapshot-style test.

Code Quality

  • AgentSetupOutcome::Failed and AgentSetupOutcome::Skipped both render as ☒. The user-facing distinction is preserved via different eprintln wording (drop_through_to_agent_setup lines 2412–2421 vs 2449–2456). Acceptable, but I'd argue the operator would benefit from AgentSetupOutcome::Failed rendering as StepState::Error (✗ red) instead of Skipped (☒ cyan) so a TUI crash is visually distinct from a "no SUDO_USER, run it manually" advisory. Currently they share the same glyph. Non-blocker.
  • The double banner — install.sh's "AIMX installer" two-liner immediately followed by the binary's "🐦‍⬛ AIMX setup" welcome — is documented as intentional in the PR description. Acceptable.
  • print_install_banner writes to stderr (lines 115–119) consistent with the rest of install.sh. The binary's banner / step list / closing message all write to stdout. Operators piping curl | sh > log.txt 2>&1 see them interleaved correctly; piping curl | sh > log.txt only would lose the install.sh banner but capture all the binary output. Acceptable for an installer.

Summary

Verdict: Ready to merge.

The cycle-3 refactor delivers what the PR description promises: the binary owns the wizard UX, install.sh is a thin downloader, the section order matches the spec, the agent-setup drop-through returns control, and the previously flagged blockers (B1: HOME pinning; B2: root-without-sudo) remain fixed. All 40 install-script tests pass; targeted Rust tests pass; no clippy warnings; no fmt drift.

The five non-blockers above are minor (closing-message phrasing trim, step-6 banner not reprinted, source-grep scope, early DKIM key write, gating-criterion drift). None are correctness or security issues; none block merge. The implementer should consider addressing #1 (reprint final banner with terminal step-6 state) and #2 (closing-message trailing "emails") in a follow-up since they are user-facing.

Recommended merge commit message

feat: end-to-end install.sh + binary-owned setup wizard

Move the operator-facing setup UX into the `aimx setup` binary so the
same flow runs whether the operator hit the installer one-liner or
invoked `aimx setup` directly. The binary now owns the welcome banner,
the six-step checklist (☐ → ◐ → ☑ / ☒ via `term::step_glyph`, with
ASCII fallback), the agent-setup drop-through, and the closing message.
Sections render in spec order: Preflight → Domain & DNS → TLS → Trust →
Install → MCP. Re-entry on a configured host marks steps 3, 4, 5 as
skipped (☒) instead of re-running. Drop-through uses `Command::status()`
so control returns after the agent-setup TUI exits and the closing
message prints after step 6.

`install.sh` is reduced to a thin downloader: parse args → two-line
"AIMX installer" banner → platform/need checks → AIMX_DRY_RUN early
exit → `ensure_sudo` (now BEFORE the GitHub tag-resolution call so a
non-root user without sudo bails before any network roundtrip) →
`resolve_sudo_prefix` (`SUDO=""` when already root or sudo absent;
`SUDO="sudo"` otherwise) → download + extract → install binary →
`backup_existing_config` → `exec ${SUDO} aimx setup </dev/tty`
(non-TTY fallback for CI). Upgrade path unchanged — non-interactive
binary swap + service restart, reusing the outer `stop_service` /
`start_service` helpers.

Tests: 40/40 in `tests/install_sh.sh` (syntax, source-guard, helper
removal regression, fail-fast-before-network, `parse_args` equals-form,
root-without-sudo SUDO=""); new Rust tests for `SETUP_STEPS` wording,
`Checklist` mutations, `term::step_glyph` (TTY + non-TTY), welcome /
final banner structure, section-order invariant, re-entrant skipped-
step contract, `AgentSetupOutcome` dispatch, and `build_agent_setup_argv`
shape (including never-implicit-`--dangerously-allow-root`). Updates
`book/installation.md` and `book/setup.md` to reflect the binary's
ownership of the wizard.

uzyn added a commit that referenced this pull request Apr 25, 2026
…rify re-entrant gating

Five non-blocker items from the cycle-3 review on PR #153:

setup.rs (run_setup):
  - Reprint print_final_banner AFTER drop_through_to_agent_setup
    returns, so the operator actually sees step 6 flip from ◐ to
    ☑ / ☒ before the closing message. Previously the banner was only
    printed BEFORE the drop-through with step 6 forced to Running, and
    the resolved step-6 state was never visible.
  - Restore the trailing word "emails" in the closing message so it
    reads "...emails from @<DOMAIN> emails." per the spec.
  - Add a comment above the trust-step skip-gate documenting why it
    uses `config_path.exists()` instead of the composite
    `already_configured` used by steps 3 and 5: the goal is to never
    overwrite an operator's prior trust decision, even on partial-
    install re-entries where `already_configured` would be false.
  - Add CAVEAT comments above the source-grep section-order and
    re-entrant-skip tests pinning the assumption that section_header
    and checklist.set calls remain literal top-level call sites in
    run_setup. Future helper extractions need to either inline the
    helper for the test or convert to stdout-capture invariants.
  - Add closing_message_preserves_trailing_emails_word test that
    locks in both the source template and the rendered substring.
  - Add run_setup_reprints_final_banner_after_drop_through test that
    asserts run_setup contains at least two print_final_banner calls
    and that the second one comes AFTER drop_through_to_agent_setup.

book/setup.md:
  - Sync the closing-message reference with the trailing "emails" word.

book/troubleshooting.md:
  - Add a "Restarting setup from scratch" section explaining that
    aborting at the trust prompt now leaves the DKIM keypair on disk
    (because step 2 generates it early so the DNS guidance table can
    print the actual `p=` value). A hard reset means clearing both
    /etc/aimx/config.toml and /etc/aimx/dkim/, plus optionally
    /etc/ssl/aimx/ for a fresh TLS cert.

Verification:
  - cargo fmt clean
  - cargo clippy --all-targets -- -D warnings clean
  - cargo test: 1357 + 95 + ... all green; setup::tests 309 (was 307)
  - sh -n install.sh / dash -n install.sh clean
  - sh tests/install_sh.sh: 40/40 pass
  - AIMX_DRY_RUN=1 sh install.sh prints the thin banner and the
    binary handoff line as before.
@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented Apr 25, 2026

Addressed the five non-blockers from the cycle-3 re-review (commit 3cac841):

  1. Step-6 final state now visiblerun_setup reprints print_final_banner(&checklist) AFTER drop_through_to_agent_setup returns, so the operator sees step 6 flip from ◐ to ☑ / ☒ before the closing message. New regression test run_setup_reprints_final_banner_after_drop_through enforces that there are two print_final_banner call sites and that the second one comes AFTER the drop-through.
  2. Closing message trailing "emails" restoredprint_closing_message now emits "...emails from @<DOMAIN> emails." to match the spec wording. book/setup.md is updated to agree with the code. New test closing_message_preserves_trailing_emails_word locks in both the source template and the rendered substring.
  3. Re-entrant gating discrepancy explained — added a comment above the trust-step skip-gate noting that it deliberately uses config_path.exists() instead of the composite already_configured used by steps 3 and 5, so an operator's prior trust decision is preserved even on partial-install re-entries.
  4. DKIM-on-Ctrl-C operator notebook/troubleshooting.md gains a "Restarting setup from scratch" section explaining that a hard reset now requires clearing /etc/aimx/config.toml and /etc/aimx/dkim/ together (and optionally /etc/ssl/aimx/ for a fresh TLS cert), because step 2 generates the DKIM keypair early so the DNS guidance table can render the actual p= value.
  5. Source-grep test scope pinned — added CAVEAT doc comments above run_setup_emits_section_headers_in_spec_order and run_setup_reentrant_path_marks_tls_trust_install_skipped documenting the assumption that the asserted call sites remain literal top-level calls in run_setup; future helper extractions need to either inline the helper for the test or convert to stdout-capture invariants.

Verification (all green):

  • cargo fmt -- --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test: 1357 unit + 95 integration tests pass; setup::tests 309 (was 307)
  • sh -n install.sh, dash -n install.sh
  • sh tests/install_sh.sh: 40/40
  • AIMX_DRY_RUN=1 sh install.sh unchanged

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: PR #153 — cycle 4 (3cac841)

Changes Overview

Cycle-4 commit (3cac841) addresses the 5 non-blockers from the cycle-3 review. Touches src/setup.rs (+96 lines, including 2 new tests), book/setup.md (1 line), and book/troubleshooting.md (+20 lines). No install.sh changes, so the previously-resolved cycle-2 blockers (HOME pinning for agent-setup, ${SUDO} prefix for root-without-sudo) cannot regress in this diff.

Resolution of Cycle-3 Non-Blockers

NB1 — Step 6 final state never visible — FIXED.
Verified runtime order in run_setup (src/setup.rs:2300-2326):

  1. line 2304: print_final_banner(&pre_handoff) with step 6 forced to Running (◐) — interim banner before handoff.
  2. line 2313: drop_through_to_agent_setup(...) — control returns when the TUI exits (because Command::status() is used, not exec).
  3. line 2319: checklist.set(6, mcp_state) resolves step 6 to Done / Skipped based on the agent-setup outcome.
  4. line 2323: print_final_banner(&checklist) reprints the banner with step 6 in its terminal state — operator now sees the ◐→☑/☒ flip.
  5. line 2326: print_closing_message(&domain) fires last.

The test run_setup_reprints_final_banner_after_drop_through (src/setup.rs:5462-5497) source-greps the run_setup body to assert (a) ≥2 print_final_banner( call sites, and (b) at least one of them comes AFTER drop_through_to_agent_setup(. Both assertions hold against the current source. The test passes (cargo test setup::tests::run_setup_reprints_final_banner_after_drop_through).

NB2 — Closing message dropped trailing "emails" — FIXED.
src/setup.rs:2354-2357 now reads:

println!(
    "Your agents now have access to set up, send and receive emails from {} emails.",
    term::highlight(&format!("@{domain}"))
);

Test closing_message_preserves_trailing_emails_word (src/setup.rs:5435-5459) asserts the literal substring "emails from @{domain} emails." against both the source template and the rendered output (with domain = "agent.example.com" substituted). Not a fuzzy match — it's rendered.contains(&format!("emails from @{domain} emails.")), which is a literal-string String::contains call. book/setup.md:76 agrees with the code.

NB3 — Re-entrant gating uses two different criteria — DOCUMENTED.
src/setup.rs:2244-2250 adds a 7-line comment above the trust-step skip-gate explaining that trust uses config_path.exists() (broader, preserves operator decision) while steps 3 and 5 use the composite already_configured (stricter, only skips when full prior install exists). Code-clarity fix only; no behavior change. Reads naturally and is in the right place.

NB4 — Early DKIM keys leak on Ctrl-C — DOCUMENTED.
book/troubleshooting.md:40-58 adds a "Restarting setup from scratch" section that explains why the DKIM keypair persists after an aborted run (step 2 generates it early so the DNS guidance table can render the actual p= value), and gives a copy-pasteable hard-reset recipe (rm -rf /etc/aimx/config.toml /etc/aimx/dkim/, optional /etc/ssl/aimx/). Operator-facing guidance is sufficient; the underlying lifecycle quirk is intentional.

NB5 — Source-grep tests don't validate runtime order — DOCUMENTED.
CAVEAT comments added above run_setup_emits_section_headers_in_spec_order (src/setup.rs:5731-5739) and run_setup_reentrant_path_marks_tls_trust_install_skipped (src/setup.rs:5784-5790). Each comment names the assumption (literal top-level call sites in run_setup) and states the recovery path if the assumption ever breaks (inline the helper for the test, OR convert to a stdout-capture invariant). Documentation-only, no behavior change.

Verification

All checks clean against 3cac841:

  • cargo fmt -- --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test — 1357 unit + 95 integration + smaller crates all pass; 0 failures
  • The two new named tests pass individually:
    • setup::tests::closing_message_preserves_trailing_emails_word — 1 passed
    • setup::tests::run_setup_reprints_final_banner_after_drop_through — 1 passed
  • sh -n install.sh — clean
  • dash -n install.sh — clean
  • sh tests/install_sh.sh40/40 passed

No Regressions

The cycle-4 commit only touches book/setup.md, book/troubleshooting.md, and src/setup.rs. The previously-resolved cycle-2 blockers (B1 HOME pinning via runuser -u <sudo_user>, which inherits HOME from the user's passwd entry — equivalent to the sudo -H flag; B2 ${SUDO} prefix for root-without-sudo) live in install.sh and drop_through_to_agent_setup's runuser invocation respectively, and neither is touched.

Summary and Recommended Actions

  • Overall verdict: Ready to merge.
  • Blockers: none.
  • Non-blockers: none — all 5 from the cycle-3 review are resolved or properly documented.
  • Nice-to-haves: none worth deferring.

Recommended merge commit message

feat: end-to-end install.sh + binary-owned setup wizard with branded onboarding

Expand the one-line installer from a narrow download-and-place script into a
full onboarding pipeline. install.sh becomes a thin downloader: ensure-sudo
fail-fast (before any GitHub roundtrip), download the tarball, install the
binary, exec sudo aimx setup. The binary owns the welcome banner, the six-step
checklist (☐/◐/☑/☒/✗ glyphs with ASCII fallback), the section ordering
(Preflight → Domain & DNS → TLS → Trust → Install → MCP), the runuser-based
drop-through to aimx agent-setup as $SUDO_USER, and the closing message with
@<DOMAIN> substitution.

Setup is idempotent on re-entry: TLS / DKIM / install steps skip when a prior
install is detected; trust is preserved whenever any config exists. Step 6
fires Command::status() so control returns to the wizard, the final banner
reprints with the resolved step-6 state, and the closing message lands last.

install.sh: HTTPS-only download, --tag/--target/--to/--force flags +
AIMX_VERSION/AIMX_PREFIX/AIMX_DRY_RUN/AIMX_VERBOSE/GITHUB_TOKEN env vars
preserved, atomic upgrade swap (stop_service → mv → start_service) intact,
config.toml backup with $$ collision suffix, ${SUDO} prefix for root-without-
sudo containers, sudo -v with /dev/tty reattached so curl|sh prompts work.

Tests: 1357 unit + 95 integration + 40 install.sh shell tests all green.
sh -n / dash -n clean; cargo fmt / clippy clean.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: PR #153 — cycle 5 (bcac18d)

Changes Overview

Cycle-5 commit (bcac18d) addresses four UX gaps:

  1. prompt_domain and prompt_trusted_senders are reworked into a block-prompt → echo → [Y/n] confirm → loop pattern via three new private helpers (prompt_confirm, prompt_confirm_list, read_yn_line). Invalid input no longer cancels — it loops; declining the echo re-prompts; empty trusted-senders surfaces an explicit "Leave hooks disabled? [Y/n]" instead of silently disabling them.
  2. Each Done / Skipped step in run_setup redraws the full six-row checklist via new helpers print_step_complete(n, &checklist) and print_step_skipped(n, &checklist, reason) that wrap a status line + print_step_list redraw + bracketing blanks.
  3. drop_through_to_agent_setup now resolves argv[0] via std::env::current_exe() instead of the literal /proc/self/exe so runuser no longer re-execs /usr/sbin/runuser and dies with exe: user agent-setup does not exist. On the rare current_exe() failure the wizard surfaces the error and lands step 6 as Skipped rather than hardcoding /usr/local/bin/aimx (which would trip the install_service_file_has_no_silent_fallback regression guard).
  4. agent_tui::render header is reworded to "Setting up MCP integration for AI agents for <USER>." + sub-line "Select which AI agents you want to set up AIMX MCP for:". The "Space toggles..." hint moved below the rows, right above the cursor. interact's clear_last_lines is bumped from rows.len() + 2 to rows.len() + 4 to match the new layout (3-line header band + N rows + 1-line trailing hint). Username flows in via Option<&str> (resolved at the run_tui callsite via env.caller_username()), with a literal <user> fallback when the lookup returns None. The doc-comment example in agent_tui.rs is updated to match.

14 new tests added (1357 → 1371 unit tests); install.sh tests grew 40 → 45 between cycle 4 and cycle 5 (the +5 are from ac36cf5's install.sh keyboard-hang fix, not from bcac18d).

Verification

All checks clean against bcac18d:

  • cargo fmt -- --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test1371 unit + 95 integration + others all pass; 0 failures; setup::tests::prompt_* family runs green; new agent_tui::tests::render_* cases pass
  • sh tests/install_sh.sh45/45 pass (the +5 vs cycle 4 are from ac36cf5's [ -t 0 ] gating fix for curl|sh, unrelated to cycle 5)

Resolution of Cycle-5 scope

1. Block-form prompt + echo + [Y/n] confirm — DONE

prompt_domain (src/setup.rs:1848-1883) wraps the read in a loop. The block form is:

Enter the domain you want to use for email
(e.g. agent.example.com).
> 

Then the operator's input goes through validate_domain (failure prints ✗ <reason> and continues — no abort), through prompt_confirm(reader, "Domain", &domain) (echo + [Y/n]), and finally through the existing read_yn_line for the DNS-access sanity check. Sanity-check kept its existing inline question form deliberately, which I think is correct — the secondary "Do you control this domain?" gate is supplementary to the echo+confirm pattern, not redundant with it.

prompt_trusted_senders (src/setup.rs:1771-1816) adopts the same shape: block prompt → > cursor → on Ok(senders) non-empty, prompt_confirm_list echoes Trusted senders: + bullet list + Confirm? [Y/n]; on Ok(empty), prints the warning then read_yn_line(reader, "Leave hooks disabled?") so empty input is no longer a silent commit. On invalid sender, the existing MAX_TRUSTED_SENDERS_ATTEMPTS budget still applies; the cycle-5 reset of attempt = 0 on operator-decline is correct (only invalid-parse entries should burn the budget; "let me try a different list" is operator-driven and shouldn't trip the abort threshold).

The three helpers (prompt_confirm, prompt_confirm_list, read_yn_line) all use the same !buf.trim().chars().next().is_some_and(|c| c == 'n' || c == 'N') form, which means:

  • blank Enter → YES (default)
  • y / yes / any input not starting with n/N → YES
  • n / no / nope / N → NO

Consistent with previous behavior; confirmed by the six dedicated prompt_confirm_yn_* tests at lines 4858-4906.

2. Per-step checklist redraw — DONE

print_step_complete and print_step_skipped (src/setup.rs:2112-2134) emit blank line + status line + blank + 6-row checklist + blank. Six call-sites in run_setup (steps 1, 2, 3, 4, 5; some Skipped) all route through these helpers. The print_step_complete_emits_full_checklist test (line 4894) source-greps to verify (a) the helper redraws the full list and (b) no straggling raw println!("Step N complete.") survived in run_setup.

This adds ~54 lines of output across the 6 steps (9 lines × 6) plus the welcome+final banner redraws (~12 lines). It's verbose but matches the implementer's stated "operators see ☐ flip to ☑/☒ as a clear summary after every section" UX promise. I flagged this in the brief as "potential redraw spam" — I think it's fine as-is for a one-shot wizard run; it would be different if it ran in a tight loop.

3. runuser /proc/self/exe bug fix — DONE, deviation is sound

drop_through_to_agent_setup (src/setup.rs:2547-2636) now resolves argv[0] in our process context via std::env::current_exe() (line 2575). The deviation from the plan's unwrap_or_else(_ -> /usr/local/bin/aimx) fallback is the right call:

  • The codebase already has a install_service_file_has_no_silent_fallback regression guard (src/setup.rs:5658-5754) that AST-walks setup.rs and asserts the literal /usr/local/bin/aimx does NOT appear outside #[cfg(test)] modules. The plan's hardcoded fallback would have tripped it.
  • The graceful-skip path (lines 2576-2590) is consistent with how get_aimx_binary_path (line 419) already handles current_exe() failures — that one propagates via ? and aborts service-file install. The drop-through can't abort because step 6 is best-effort (operator can re-run agent-setup manually); skipping with a guidance message is the right operator-visible behavior.
  • The operator-visible message is actionable: prints a warning: line with the underlying io::Error, then a prompt-mark line "Run aimx agent-setup as <sudo_user> to wire aimx into Claude Code, Codex, etc." — covers the recovery path.
  • On a healthy box, current_exe() essentially never fails (it's a thin wrapper around /proc/self/exe readlink on Linux; failure paths are chroot without /proc mounted, or sandboxes that block the syscall). So the only behavior change vs. the spec'd fallback is in scenarios where the spec's fallback would have also been wrong (the literal /usr/local/bin/aimx may not exist under custom AIMX_PREFIX).

The renamed test build_agent_setup_argv_uses_resolved_exe_path (src/setup.rs:5887-5926) replaces the old ..._proc_self_exe_shape and asserts (a) std::env::current_exe() is called somewhere in setup.rs, and (b) the literal PathBuf::from("/proc/self/exe") does NOT appear inside the drop_through_to_agent_setup body. Both assertions are scoped tightly enough not to false-positive on doc comments elsewhere.

4. agent_tui::render header rewording — DONE, deviation is acceptable

render (src/agent_tui.rs:339-396) now takes username: Option<&str> directly rather than &dyn AgentEnv. The interact function threads it through from run_tui's call site (line 131-132), where env.caller_username() is invoked once. The deviation from the plan's "thread &dyn AgentEnv" is fine:

  • AgentEnv is a trait with many methods, only one of which (caller_username) is needed in render. Threading the entire trait would couple render to the wider AgentEnv surface for no benefit.
  • Resolving once at the run_tui call site is correct — caller_username() is fixed for the lifetime of the process (it's getpwuid(geteuid())), so re-resolving on every redraw would be wasteful.
  • The Option<&str> shape signals "may be absent" at the type level. The unwrap_or("<user>") fallback at line 345 produces Setting up MCP integration for AI agents for \`.` when the passwd lookup returns None.

The <user> fallback isn't ideal (the user brief asked for "your user" or omit-entirely). It's not wrong — the angle-bracket form is a recognizable placeholder convention — but it's not maximally graceful either. Non-blocker, flagged below.

The redraw-bounds bump in interact (src/agent_tui.rs:294: rows.len() + 2rows.len() + 4) is correct: the new layout is banner-line + sub-line + blank + N rows + hint = N + 4. Verified by reading render body.

The doc-comment example at lines 14-25 matches the live render output exactly (including the Setting up MCP integration for AI agents for \ubuntu`.` header, the sub-line, the blank, the rows, the blank, and the trailing hint with the arrow).

New issues found

NB1 — <user> fallback when caller_username() returns None — Non-blocker

render falls back to a literal <user> token (src/agent_tui.rs:345):

Setting up MCP integration for AI agents for `<user>`.

This is a recognizable placeholder convention but reads slightly awkwardly. Two readable alternatives:

  • term_handle.write_line(&term::header("Setting up MCP integration for AI agents.").to_string())? when username.is_none() — drop the trailing for X. clause entirely.
  • Replace <user> with your user so the sentence reads naturally.

In practice this branch is essentially never taken (every container with passwd-resolvable user hits the Some(...) path), so this is purely cosmetic for the long tail of edge cases. Non-blocker.

NB2 — prompt_confirm first-character-only check accepts typos starting with y/anything-else — Non-blocker (informational, pre-existing)

The check !buf.trim().chars().next().is_some_and(|c| c == 'n' || c == 'N') means nope / no / n → NO, but anything else → YES, including typos like s (slip on the keyboard), q, 1, etc. The existing prompt_domain_* tests document this contract via prompt_confirm_yn_parses_no_word (covers no\n) but no test covers a deliberate typo. Pre-existing pattern in the codebase (the previous prompt_domain confirm logic was identical), so the cycle-5 commit is just inheriting it. Non-blocker.

NB3 — prompt_confirm_list and prompt_confirm have copy-pasted parser logic — Non-blocker (style)

Both helpers parse the response with the same first-char-of-trim is_some_and(|c| c == 'n' || c == 'N') chain. So does read_yn_line. Consider extracting a parse_yn_default_yes(buf: &str) -> bool helper. Three call sites is the threshold where extraction starts paying for itself. Non-blocker; readability tweak only.

NB4 — Output stream consistency: status messages on stdout, warnings on stderr — Non-blocker (informational)

print_step_complete / print_step_skipped use println! (stdout). The current_exe() failure path in drop_through_to_agent_setup uses eprintln! for the warning but println! for the recovery prompt. This is consistent with how the rest of run_setup handles things — happy-path narration on stdout, errors on stderr — so no change needed. Flagging only because the user brief asked me to spot-check.

Mergeable status

gh pr view 153 --json mergeable,headRefOid returns:

  • mergeable: "CONFLICTING"
  • mergeStateStatus: "DIRTY"
  • headRefOid: "bcac18d84601955a582b300c7232ab84638db7c0"

The PR currently conflicts with main. This is not a code-quality issue in the cycle-5 commit; it's because main advanced during the cycle. Operator action: rebase onto current main and force-push. No code change needed.

Things that remain fixed from earlier cycles

Spot-checked — all previously-resolved blockers from cycles 2-4 remain in place:

  • HOME pinning (cycle-2 B1): runuser -u $SUDO_USER -- <exe> agent-setup inherits HOME from getpwuid(SUDO_USER).pw_dir per runuser's standard behavior, equivalent to sudo -H -u. No regression.
  • Root-without-sudo (cycle-2 B2): ${SUDO} prefix in install.sh still resolved via resolve_sudo_prefix. No regression.
  • TTY-stdin-isatty gate (cycle-3 fix in ac36cf5): [ -t 0 ] gating on </dev/tty redirection still in place. No regression.
  • Step-6 final state visible (cycle-3 NB1): print_final_banner(&checklist) reprint after drop_through_to_agent_setup returns, at the same call-site (src/setup.rs around line 2459). Still in place.
  • Closing-message trailing "emails" (cycle-3 NB2): "...emails from @<DOMAIN> emails." still in print_closing_message.
  • Section ordering (cycle-3): six section_header(N, SETUP_STEPS[N-1]) calls in spec order. No regression.
  • build_agent_setup_argv no implicit --dangerously-allow-root (cycle-3): regression guard still asserted.

Summary and Recommended Actions

  • Overall verdict: Ready to merge — pending rebase.
  • Blockers: none.
  • Non-blockers:
    • NB1 (<user> placeholder fallback) — cosmetic
    • NB3 (copy-paste in three confirm helpers) — readability tweak
  • Nice-to-haves: none worth deferring.
  • Mergeable note: PR is currently CONFLICTING/DIRTY against main. Rebase and force-push before merging — no code changes required.

Recommended merge commit message

feat: end-to-end install.sh + binary-owned setup wizard with confirm-after-prompt UX

Expand the one-line installer from a narrow download-and-place script into
a full onboarding pipeline. install.sh becomes a thin downloader: ensure-
sudo fail-fast (before any GitHub roundtrip), download the tarball,
install the binary, exec sudo aimx setup. The binary owns the welcome
banner, the six-step checklist (☐/◐/☑/☒/✗ glyphs with ASCII fallback),
the section ordering (Preflight → Domain & DNS → TLS → Trust → Install →
MCP), the runuser-based drop-through to aimx agent-setup as $SUDO_USER,
and the closing message with @<DOMAIN> substitution.

Setup is idempotent on re-entry: TLS / DKIM / install steps skip when a
prior install is detected; trust is preserved whenever any config exists.
Step 6 fires Command::status() so control returns to the wizard, the
final banner reprints with the resolved step-6 state, and the closing
message lands last.

Operator UX:
- prompt_domain and prompt_trusted_senders use a block-prompt + echo +
  [Y/n] confirm + loop pattern. Invalid input re-prompts (no abort);
  declining the echo re-prompts; empty trusted-senders surfaces an
  explicit "Leave hooks disabled?" rather than silently disabling them.
- Each Done / Skipped step in run_setup redraws the full six-row
  checklist so progress is visible after every section.
- drop_through_to_agent_setup resolves argv[0] via std::env::current_exe()
  instead of the literal /proc/self/exe (runuser would otherwise re-exec
  /usr/sbin/runuser and parse "agent-setup" as a username).
- agent_tui header reworded to "Setting up MCP integration for AI agents
  for `<user>`." with a "Select which AI agents you want to set up AIMX
  MCP for:" sub-line; the Space/Enter/q hint moved below the rows.

install.sh: HTTPS-only download, --tag/--target/--to/--force flags +
AIMX_VERSION/AIMX_PREFIX/AIMX_DRY_RUN/AIMX_VERBOSE/GITHUB_TOKEN env vars
preserved, atomic upgrade swap (stop_service → mv → start_service)
intact, config.toml backup with $$ collision suffix, ${SUDO} prefix for
root-without-sudo containers, sudo -v with /dev/tty reattached so
curl|sh prompts work, [ -t 0 ] gating so direct `./install.sh` invocations
don't double-attach /dev/tty (which would deadlock under sudo use_pty).

Tests: 1371 unit + 95 integration + 45 install.sh shell tests all green.
sh -n / dash -n clean; cargo fmt / clippy clean.

uzyn added 8 commits April 25, 2026 08:27
… + agent-setup orchestration

Expand the one-line installer from a narrow download-and-place script
into the full onboarding flow. Prints a branded welcome banner with a
six-item checklist, escalates to sudo once up front (with /dev/tty
reattached so `curl | sh` still gets a password prompt), downloads and
installs the binary, runs `sudo aimx setup` for preflight / DNS / TLS /
trust / service install, drops back to the invoking user for
`aimx agent-setup`'s interactive MCP picker, and closes with a
domain-aware example prompt. Preserves the existing upgrade path
(service stop -> atomic swap -> restart), all flags and env vars
(--tag / --target / --to / --force, AIMX_VERSION / AIMX_PREFIX /
AIMX_DRY_RUN / AIMX_VERBOSE / GITHUB_TOKEN), and HTTPS-only download
policy. Backs up any pre-existing /etc/aimx/config.toml to
config.toml.bak-YYYYMMDD-HHMMSS before setup runs; DKIM keys and TLS
certs are left in place so deliverability survives re-runs.

Add tests/install_sh.sh which sources install.sh under
INSTALL_SH_TEST=1 (new guard that suppresses main auto-invocation)
and unit-tests detect_invoker, backup_existing_config, ensure_sudo,
parse_args, plus an AIMX_DRY_RUN=1 smoke that confirms the banner,
step list, and absence of filesystem changes. Script stays POSIX sh
(no bashisms) and passes both `sh -n` and `dash -n`.

Update book/installation.md to reflect the new end-to-end flow and the
config.toml backup behavior.
- Drop into invoker with `sudo -H -u <user> env HOME=<passwd-home>` so
  `aimx agent-setup` writes MCP config under the invoker's home instead
  of /root on distros that don't force env_reset in sudoers.
- Resolve a $SUDO prefix once in main() and thread it through every
  privileged call site (backup_existing_config, run_aimx_setup,
  extract_domain, stop_service/start_service, upgrade mv/install).
  Scripts running as root on containers without sudo now succeed.
- Refactor upgrade path to reuse stop_service/start_service helpers
  (removes the inlined sudo sh -c state machine and restores the
  "no init system" diagnostic).
- ensure_sudo surfaces an explicit error when sudo -v fails instead
  of a silent set -e abort.
- Append $$ to config.toml backup suffix so concurrent runs cannot
  collide within the same second.
- Tests: cover run_agent_setup_as_invoker (asserts -H / -u / HOME
  pinning via a sudo argv-recording shim), SUDO="" on root-without-sudo,
  --flag=value equals-form of parse_args, extract_domain, and
  print_closing_message.
The welcome banner now uses ☐ / ◐ / ☑ / ☒ / ✗ glyphs (with [ ] / [~] /
[x] / [-] / [!] ASCII fallbacks for non-UTF-8 locales) and is reprinted
once at the end with the final per-step states, so operators see ☐ flip
to ☑ / ☒ as a clear summary. Each of run_aimx_setup, the upgrade path,
and run_agent_setup_as_invoker mutates STEP{1..6}_STATE at the right
boundaries (running → done / skipped / error) so both the closing banner
and the upgrade-path banner reflect what actually happened.

Locale detection caches the UTF-8 decision in _AIMX_UTF8 once at import
time so banner prints stay cheap. Color emission still goes through
_ui_color_enabled, so NO_COLOR=1 and non-TTY stderr keep producing
plain glyphs.

Tests grow from 49 → 70: glyph coverage in both UTF-8 and ASCII-fallback
locales, set_step / print_step_list state mutation reflected in output,
print_welcome_banner asserts only pending glyphs render initially, and
print_final_banner asserts the post-run states render with the
completion title.
The shell script previously owned the welcome banner, the six-step
checklist, the agent-setup invocation, and the closing message. That
left two real problems: install.sh had no way to flip individual boxes
☐→☑ as `aimx setup`'s sections completed, and step 6 was being
double-invoked because the binary already drops through to
`aimx agent-setup` on its own.

This commit moves all of that into `aimx setup` so the binary works the
same end-to-end whether you ran it via the installer or directly.

setup.rs:
  - Adds Checklist + SETUP_STEPS plus print_welcome_banner /
    print_final_banner / print_step_list. Sections render in spec order:
    Preflight → Domain & DNS → TLS → Trust → Install → MCP, with each
    section header reading the literal spec wording.
  - Reorders run_setup so the trust prompt comes after TLS and the DNS
    guidance + verify loop sit under "Set up domain and DNS".
  - On re-entry (cert + DKIM + service all present) steps 3, 4, 5 mark
    ☒ skipped instead of re-running.
  - Switches drop_through_to_agent_setup from CommandExt::exec to
    Command::status() so control returns to `aimx setup` and the closing
    message can fire AFTER the agent-setup TUI exits. Step 6 ticks ☑ on
    zero exit, ☒ on non-zero or `$SUDO_USER` unset (warning, not fatal).
  - Adds a closing message with `@<DOMAIN>` substitution and the
    `claude -p "..."` example. Removes the old single-line success
    banner.

term.rs:
  - Adds StepState + step_glyph: ☐ pending / ◐ running / ☑ done /
    ☒ skipped / ✗ error on TTY, [ ] / [~] / [x] / [-] / [!] non-TTY.

install.sh:
  - Drops print_welcome_banner / print_final_banner / print_step_list /
    set_step / _step_glyph / _supports_unicode / STEP{N}_TITLE,STATE /
    run_aimx_setup / run_agent_setup_as_invoker / extract_domain /
    print_closing_message. The binary owns all of this now.
  - Replaces them with a thin two-line "AIMX installer" banner.
  - Moves ensure_sudo earlier in main() so a non-root invoker without
    sudo bails BEFORE the GitHub tag-resolution roundtrip. Dry-run is
    honored first so unprivileged auditors can still audit.
  - Fresh-install tail is now a single `exec ${SUDO} aimx setup
    </dev/tty` (with non-TTY fallback). Backup-existing-config still
    runs immediately before the exec.
  - Upgrade path stays non-interactive; binary swap + service restart
    only.

tests/install_sh.sh:
  - Drops tests for removed helpers; adds a fail-fast test that
    confirms ensure_sudo bails before the GitHub network call when
    sudo is missing on a non-root box.
  - Updates the dry-run smoke to expect the thin banner (no checklist).

book/installation.md and book/setup.md updated to reflect the binary's
ownership of the wizard UX, the spec section ordering, and the closing
message.
…rify re-entrant gating

Five non-blocker items from the cycle-3 review on PR #153:

setup.rs (run_setup):
  - Reprint print_final_banner AFTER drop_through_to_agent_setup
    returns, so the operator actually sees step 6 flip from ◐ to
    ☑ / ☒ before the closing message. Previously the banner was only
    printed BEFORE the drop-through with step 6 forced to Running, and
    the resolved step-6 state was never visible.
  - Restore the trailing word "emails" in the closing message so it
    reads "...emails from @<DOMAIN> emails." per the spec.
  - Add a comment above the trust-step skip-gate documenting why it
    uses `config_path.exists()` instead of the composite
    `already_configured` used by steps 3 and 5: the goal is to never
    overwrite an operator's prior trust decision, even on partial-
    install re-entries where `already_configured` would be false.
  - Add CAVEAT comments above the source-grep section-order and
    re-entrant-skip tests pinning the assumption that section_header
    and checklist.set calls remain literal top-level call sites in
    run_setup. Future helper extractions need to either inline the
    helper for the test or convert to stdout-capture invariants.
  - Add closing_message_preserves_trailing_emails_word test that
    locks in both the source template and the rendered substring.
  - Add run_setup_reprints_final_banner_after_drop_through test that
    asserts run_setup contains at least two print_final_banner calls
    and that the second one comes AFTER drop_through_to_agent_setup.

book/setup.md:
  - Sync the closing-message reference with the trailing "emails" word.

book/troubleshooting.md:
  - Add a "Restarting setup from scratch" section explaining that
    aborting at the trust prompt now leaves the DKIM keypair on disk
    (because step 2 generates it early so the DNS guidance table can
    print the actual `p=` value). A hard reset means clearing both
    /etc/aimx/config.toml and /etc/aimx/dkim/, plus optionally
    /etc/ssl/aimx/ for a fresh TLS cert.

Verification:
  - cargo fmt clean
  - cargo clippy --all-targets -- -D warnings clean
  - cargo test: 1357 + 95 + ... all green; setup::tests 309 (was 307)
  - sh -n install.sh / dash -n install.sh clean
  - sh tests/install_sh.sh: 40/40 pass
  - AIMX_DRY_RUN=1 sh install.sh prints the thin banner and the
    binary handoff line as before.
The post-install handoff was unconditionally re-pointing stdin at
/dev/tty when the controlling terminal existed:

    if has_tty; then
        exec ${SUDO} aimx setup </dev/tty

That breaks on modern Ubuntu/Debian where sudo ships with `Defaults
use_pty`. With use_pty, sudo allocates a pty and bridges its own
stdin↔pty-master to the child. When the operator runs `./install.sh`
directly (stdin already the terminal), redirecting `</dev/tty` opens a
separate fd to the same controlling terminal — sudo bridges from that
fd while keystrokes are queued on the original fd 0, the child blocks
on read forever, and the foreground process group disconnect prevents
SIGINT from reaching it. Operator sees a dead keyboard and a
non-responsive Ctrl+C.

Standard installer pattern (rustup, nvm, brew, OpenClaw): only
re-attach /dev/tty when stdin is NOT already a terminal — i.e., the
curl|sh case. Gate on `[ -t 0 ]` instead of /dev/tty existence. Apply
the same fix to ensure_sudo's `sudo -v` site since the latent defect
is identical (passwordless sudo on the OVH image masked it during
testing). Drop the now-unused has_tty helper.

Add five regression assertions in tests/install_sh.sh:
- `[ -t 0 ]` appears in 2+ sites
- unredirected `exec ${SUDO} aimx setup` form is present
- redirected `exec ${SUDO} aimx setup </dev/tty` appears exactly once
- redirected handoff is gated behind `elif [ -e /dev/tty ]`
- `sudo -v </dev/tty` is gated behind `elif [ -e /dev/tty ]`

Test count 40 → 45, all pass; sh -n / dash -n clean;
AIMX_DRY_RUN=1 smoke clean.
Four UX fixes for the `aimx setup` and `aimx agent-setup` flows:

- prompt_domain and prompt_trusted_senders now use the standard
  block-prompt + echo + [Y/n] confirm + loop pattern. The operator
  sees the parsed value (or list) before committing, can decline to
  re-enter, and empty trusted-senders surfaces an explicit "leave
  hooks disabled?" confirm rather than silently disabling them.
- Each `Done` / `Skipped` step in `run_setup` now redraws the full
  six-row checklist via new helpers `print_step_complete` and
  `print_step_skipped` so progress is visible after every section,
  not just at the welcome and final banners.
- `drop_through_to_agent_setup` resolves its argv[0] via
  `std::env::current_exe()` instead of passing the literal string
  "/proc/self/exe". `runuser` was re-execing /proc/self/exe in its
  own context — which points at /usr/sbin/runuser — causing the
  child runuser to parse "agent-setup" as a username and die with
  "exe: user agent-setup does not exist". Step 6 now lands as the
  multi-select TUI drop-through it was always meant to be.
- The agent-setup TUI header is reworded to "Setting up MCP
  integration for AI agents for `<user>`." with a sub-line "Select
  which AI agents you want to set up AIMX MCP for:". The username
  comes from the existing `AgentEnv::caller_username()`
  (getpwuid(geteuid())) helper. The "Space toggles, Enter confirms,
  q cancels." hint moved below the rows, right above the cursor.
- agent_tui::render: when caller_username() returns None, omit the
  trailing `for `<user>`` clause entirely instead of rendering the
  literal placeholder. Header becomes the clean "Setting up MCP
  integration for AI agents." Add a render_omits_for_clause_when_
  username_is_none test.
- setup::parse_yn: extract the !buf.trim().chars().next().is_some_and(
  |c| c == 'n' || c == 'N') idiom shared by prompt_confirm,
  prompt_confirm_list, and read_yn_line into a single private helper.
  Add a parse_yn_handles_default_yes_and_n_first_char test covering
  blank, y/Y/yes/random, n/N/no/NO/nope, and leading whitespace.
@uzyn uzyn force-pushed the worktree-binary-crafting-pancake branch from bcac18d to d80c89e Compare April 25, 2026 08:38
@uzyn
Copy link
Copy Markdown
Owner Author

uzyn commented Apr 25, 2026

Cycle 6 — review polish + rebase on main.

Non-blockers addressed

  • NB1 — agent_tui::render <user> fallback: when caller_username() returns None we now render the clean Setting up MCP integration for AI agents. instead of the literal <user> placeholder. New test render_omits_for_clause_when_username_is_none walks the render body and asserts both the no-username header line and the absence of the placeholder.
  • NB3 — extract parse_yn shared parser: the !buf.trim().chars().next().is_some_and(|c| c == 'n' || c == 'N') idiom that lived in three call sites (prompt_confirm, prompt_confirm_list, read_yn_line) is now a single private parse_yn(input: &str) -> bool helper. New test parse_yn_handles_default_yes_and_n_first_char exercises blank/Enter, y / Y / yes / random non-n input, n / N / no / NO / nope, and leading-whitespace cases.

Rebase

Branch was rebased on top of current origin/main (40 commits ahead). One conflict resolved in book/setup.md (kept this PR's checklist-aware re-entry wording — the cycle-5 ☒ marker explanation is the more accurate description). All other commits (mostly dependabot bumps + the aimx uninstall friendly-message fix) merged cleanly.

Verification

  • cargo fmt -- --check: clean
  • cargo clippy --all-targets -- -D warnings: clean
  • cargo test --bins: 1373 unit tests pass (1371 → 1373; +2 new tests for NB1 and NB3)
  • cargo test: all integration suites green
  • sh -n install.sh / dash -n install.sh: clean
  • sh tests/install_sh.sh: 45/45 pass
  • AIMX_DRY_RUN=1 sh install.sh: prints expected dry-run banner, no FS writes

PR is now MERGEABLE.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (cycle 6)

This re-review verifies cycle 6's claims: NB1 (<user> placeholder fallback) and NB3 (parse_yn extraction) are addressed, the rebase on main did not regress earlier cycles, and CI is now green.

Changes Overview

Cycle 6 polishes two non-blocker items from cycle 5 and rebases the branch on top of main:

  • src/agent_tui.rs: when caller_username() returns None, the header now reads Setting up MCP integration for AI agents. (no trailing for ... clause, no <user> placeholder). New test render_omits_for_clause_when_username_is_none asserts the forbidden placeholder string is not present in the render body and smoke-renders with username=None.
  • src/setup.rs: extracted a single parse_yn(&str) -> bool helper; prompt_confirm, prompt_confirm_list, and read_yn_line now all delegate to it. New test parse_yn_handles_default_yes_and_n_first_char covers blank/Enter/whitespace/y/Y/n/N/leading-whitespace cases.
  • 40 commits from main (mostly dependabot bumps + aed7c21 "friendly aimx uninstall") came in via rebase. The book/setup.md conflict was resolved in favor of the PR's wording, which accurately describes the cycle-5 wizard behavior (welcome banner, six-section ordering, re-entry skipping steps 3/4/5, step-6 drop-through via Command::status()).

Verification

NB1 (agent_tui placeholder) — fixed in src/agent_tui.rs:347-350. The match username produces a clean fallback string with no for ... clause when None. The new test (render_omits_for_clause_when_username_is_none) walks the render function body and asserts the literal <user> token is absent — a strong, source-grep-based regression guard. Verified.

NB3 (parse_yn extraction) — fixed in src/setup.rs:1719-1733 (helper definition) with all three callers (prompt_confirm, prompt_confirm_list, read_yn_line) now collapsed to Ok(parse_yn(&buf)). The new unit test exercises 18 input cases including blank, whitespace-only, y, Y, yes, Yeah, ok, 1, n, N, no, NO, nope\n, n, \tno\n, and N . The default-yes / first-char-n semantics are preserved exactly. Verified.

Rebase preservation spot-checks:

  • install.sh:776 still has the cycle-4 [ -t 0 ] keyboard-hang fix (lines 776–784: if [ -t 0 ]; then exec ${SUDO} aimx setup; elif [ -e /dev/tty ] && [ -r /dev/tty ]; ...).
  • src/setup.rs still has SETUP_STEPS, print_welcome_banner, print_final_banner, print_step_list, print_step_complete, and print_step_skipped. Each section uses section_header(N, SETUP_STEPS[N-1]) then print_step_complete(N, &checklist) to redraw the full ticking checklist.
  • src/setup.rs:2600 uses std::env::current_exe() with explicit error handling that surfaces a warning and returns AgentSetupOutcome::Skipped on failure — no literal /proc/self/exe is hardcoded into the runtime path. The book documentation reference to /proc/self/exe is descriptive, not a runtime constant.
  • src/agent_tui.rs:127-130 uses the spec wording Setting up MCP integration for AI agents and now correctly handles None per NB1.

book/setup.md conflict resolution — the PR's wording in lines 58–82 correctly describes the cycle-5 behavior (welcome banner + six-line checklist, six sections in spec order, ☒ skipped semantics on re-entry for steps 3/4/5, Command::status() drop-through, the closing message format with @<DOMAIN> substitution). Accurate.

Build / lint / test:

  • cargo fmt --check — clean.
  • cargo clippy --all-targets -- -D warnings — clean.
  • cargo test --bin aimx1373 passed, 0 failed, 3 ignored (matches claim).
  • cargo test --test '*'95 passed, 0 failed, 5+1+1 ignored = 7 ignored across integration suites (matches claim of 95 integration).
  • sh tests/install_sh.sh45/45 pass (the script reports 45, slightly higher than the 40 claimed in the PR body — additional cycle-5 tests landed; not a problem).

CI status (gh pr checks 153): all required checks PASS — Analyze (rust), Analyze (actions), Analyze (javascript-typescript), CodeQL, build, core-tests, docs-build, verifier-tests, dependabot. deploy is skipping (preview-only, expected on PRs). The earlier UNSTABLE state was simply checks-pending, not failures.

Mergeable state: MERGEABLE, mergeStateStatus: CLEAN (was CONFLICTING / UNSTABLE before this cycle).

Summary

  • Overall verdict: Ready to merge
  • Blockers: none
  • Non-blockers: none
  • Nice-to-haves: none

Both cycle-5 non-blockers are genuinely fixed (not just claimed) — NB1 has a new regression test that walks the render function body to forbid the placeholder string, and NB3's parse_yn is exercised by 18 input cases. The rebase preserved every cycle-1..5 change I spot-checked: the keyboard-hang [ -t 0 ] gate, the welcome banner / 6-step checklist / per-step print_step_complete redraw, the current_exe() fix in drop_through_to_agent_setup, and the agent_tui spec wording. The book/setup.md conflict resolution accurately describes the actual cycle-5 wizard behavior.

Recommended merge commit message

feat: end-to-end install.sh with welcome banner, sudo escalation, setup + agent-setup orchestration (#153)

Make `curl -fsSL https://aimx.email/install.sh | sh` drive the full
provisioning flow in one continuous wizard instead of leaving the
operator with two more manual commands. The binary owns the welcome
banner, the six-section checklist (Preflight → DNS → TLS → Trust →
Install → MCP), and the closing message. `install.sh` becomes a thin
downloader: fail fast on missing sudo before any network call,
download the tarball, install the binary, then `exec sudo aimx setup`
with stdin handled correctly for both interactive shells and
`curl|sh` pipelines. Re-entry preserves existing TLS / trust /
service config (steps 3-5 mark skipped) so `aimx setup` is also the
"wire another agent" path. Step 6 re-execs `aimx agent-setup` as
`$SUDO_USER` via `runuser` and `Command::status()` so control returns
to the wizard for the closing message; `current_exe()` resolves the
real binary path with explicit error handling instead of relying on a
`/proc/self/exe` constant. Trust prompt validates entries with up to
five re-prompts, defaults to empty list under `AIMX_NONINTERACTIVE=1`
with a logged warning. Adds `parse_yn` helper, Unicode/non-TTY step
glyphs, and 1373 unit + 95 integration + 45 install_sh tests.

When `aimx setup` drops through to `aimx agent-setup` via `runuser`, the
child inherits sudoers' `secure_path` — typically just system bin dirs.
Per-user installs of `claude`, `codex`, `gemini`, `goose`, etc. under
`~/.local/bin`, `~/.npm-global/bin`, or nvm shims aren't on that PATH,
so template registration fails even though the operator's interactive
shell sees the binary fine.

Add a `probe_login_shell` fallback that spawns `sh -lc 'command -v "$1"'`
(POSIX, not bash — works on Alpine / busybox), passes the binary name
positionally as `$1` to keep argv injection structurally impossible, and
filters the result to absolute paths pointing at executable regular
files. Wire it into `AgentEnv::probe_binary`'s default impl after the
fast `probe_path` walk; the fast path still wins when the binary lives
in `$PATH`.

Refresh the operator-facing error message to reflect that we now
search both PATH and the login shell.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle-7 re-review: PR #153

Cycle-7 scope (sole focus)

AgentEnv::probe_binary now falls back to a POSIX login-shell probe (sh -lc 'command -v "$1"' sh <name>) when probe_path returns None. All cycle-7 verification points check out:

Argv-injection invariant — confirmed safe

The spawn is Command::new("sh").arg("-lc").arg("command -v \"$1\"").arg("sh").arg(binary_name). With sh -lc <cmd>, the next positional arg becomes $0 and subsequent args become $1, $2, …. The binary name therefore reaches command -v as a single literal argv entry — never re-parsed by the shell. probe_login_shell_immune_to_argv_injection pins this by attempting a sentinel-deletion payload (foo; rm -f <sentinel>) and asserting the sentinel survives. The test would fail loudly if the implementation ever switched to string-interpolation, so the property is genuinely locked in.

Edge cases — all handled and (mostly) tested

  • Empty binary_name → guard at line 788–790 returns None. Tested by probe_login_shell_returns_none_for_empty_name.
  • Builtins (cd) → command -v prints the bare name cd, which fails the path.is_absolute() filter at line 809. Tested by probe_login_shell_rejects_non_absolute_resolution.
  • Functions / aliases → same path as builtins (bare name, not absolute). Same coverage.
  • Non-existent binary → exit status non-zero / empty stdout. Both branches covered (output.status.success() at 798, raw.is_empty() at 803). Tested by probe_login_shell_returns_none_for_missing.
  • Path resolves but isn't a regular file → meta.is_file() filter at 813. (Untested directly but the is_file() predicate is unambiguous.)
  • Path is a regular file but not executable → nix::unistd::access(..., X_OK) filter at 816–818 catches it. (Untested directly but the access call is the standard X_OK probe.)

Sequential probe ordering — fast path stays fast

probe_path(name).or_else(|| probe_login_shell(name)): when $PATH contains the binary, probe_path returns Some and or_else is never evaluated, so no sh is spawned. The fallback only fires when the inherited $PATH actually misses, which in practice is once per register_template call (≤7 agents).

Error message update — correctly threaded

src/agent_setup.rs:1377 now emits "Could not find '{bin}' in $PATH or your login shell. Install it, then re-run 'aimx agent-setup {}'." The pinning test at src/agent_setup.rs:2503 (msg.contains("Could not find 'claude' in $PATH or your login shell")) was updated to match. No stale assertions remain.

Integration-style fallback test — faithful runuser simulation

probe_binary_default_falls_back_when_path_misses is the right shape. It:

  1. Builds a tmp home/bin/<sentinel> outside any system PATH.
  2. Writes ~/.profile that prepends home/bin to PATH.
  3. Sets the parent $PATH to a secure_path-shaped string (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) that lets sh spawn but cannot resolve the sentinel.
  4. Sets HOME to the tmp dir so sh -l sources the test's ~/.profile.
  5. Calls env.probe_binary(sentinel) and asserts the sentinel resolves.

If probe_login_shell were absent or unreachable from probe_binary, probe_path would miss against the secure-like PATH and expect() would panic — so this is a real positive signal, not a tautology. Env restoration before the assertions prevents leakage. serial_test::serial correctly serializes the env mutation.

Other cycle-7 details

  • sh (POSIX) not bash — works on Alpine/musl. ✓
  • Returns canonicalised PathBuf matching probe_path's shape. ✓
  • Doc comment on probe_login_shell describes the use case (runuser secure_path) and the safety invariants (positional-arg passing). ✓

Prior cycles' work — spot check

  • install.sh:248,776 still gates the </dev/tty redirect on [ -t 0 ] (keyboard-hang fix from cycle 5).
  • src/setup.rs still has welcome banner, print_step_complete per-step redraw, current_exe() for the runuser drop-through, and the parse_yn extraction.
  • src/agent_tui.rs:348-349 still renders Setting up MCP integration for AI agents for \`.with theNonefallback toSetting up MCP integration for AI agents.`

No regressions to prior-cycle artifacts.

Verification (run locally)

Check Result
cargo fmt --check clean
cargo clippy --all-targets -- -D warnings clean
cargo test --bins 1379 passed (was 1373 → +6 cycle-7 tests)
cargo test --tests 95 passed
sh -n install.sh clean
dash -n install.sh clean
sh tests/install_sh.sh 45/45 pass
CI checks All 9 pass (build, core-tests, verifier-tests, docs-build, Analyze rust/actions/javascript-typescript, CodeQL); deploy correctly skipped
Mergeable MERGEABLE

Verdict: Ready to merge

No blockers, no non-blockers, no nice-to-haves. The cycle-7 fallback is implemented correctly, tested at the right granularity, doesn't regress the fast path, and tightens the error message to reflect the new behavior.

Fall back to sh -lc 'command -v X' for agent CLI path discovery

When `aimx setup` drops through to `aimx agent-setup` via runuser, the
child inherits sudoers' secure_path. Per-user installs of `claude`,
`codex`, `gemini`, `goose`, etc. under ~/.local/bin, ~/.npm-global/bin,
or nvm shims aren't on that PATH, so template registration fails even
though the operator's interactive shell sees the binary fine.

AgentEnv::probe_binary now falls back to a POSIX login-shell probe
(sh -lc 'command -v "$1"' sh <name>) after probe_path misses. The
binary name is passed positionally as $1, structurally preventing argv
injection; results are filtered to absolute paths pointing at
executable regular files. Refresh the operator-facing error message to
reflect that we now search both PATH and the login shell.

…ude Code detect fix

Five fixes bundled around the agent-wiring surface:

- Reword the `aimx setup` root-required error message. The old text
  pointed operators at `sudo aimx setup <domain>`, which both pushed
  the (now-hidden) positional and missed the common case where
  someone running aimx-already-up just wanted to wire an agent. The
  new error explicitly names both paths: `sudo aimx setup` for the
  server, `aimx agents setup` (no root needed) for the agent.
- CLI restructure: agent management moves to `aimx agents
  setup|remove|list` (plural noun + verbs, matching `mailboxes` /
  `hooks`). Singular `aimx agent` works as a clap alias, and the
  historical hyphenated `aimx agent-setup` flat verb is preserved as
  a hidden legacy alias so existing scripts and `install.sh`
  invocations keep working. `aimx agents list` is a thin alias of
  `aimx agents setup --list`.
- TUI polish: status suffix is now `(AIMX MCP wired)` for
  consistency with the "wire" verb used elsewhere; agent name uses
  `term::dim` and the status suffix uses a new one-shade-deeper
  `term::very_dim` helper for two-shade visual hierarchy. After the
  operator presses Enter on the picker the TUI renders a
  confirmation screen listing each task with the right verb
  ("Install AIMX MCP for X" / "Re-install AIMX MCP for Y") and
  reads `Confirm? [Y/n]` before any files are written. On `n` the
  picker re-enters with previous selections preserved.
- Bug fix: `dest_contains_aimx_entry` only scanned top-level files
  in the destination directory. Claude Code's plugin layout puts
  everything under `.claude-plugin/` and `skills/` subdirectories,
  so a fully-wired install always classified as `InstalledNotWired`.
  Special-case `claude-code` to check for `.claude-plugin/plugin.json`
  filename presence (mirroring goose's existing `aimx.yaml` case).
- New `aimx agents remove <agent>` command. Removes plugin files
  under `$HOME`, drops the `invoke-<agent>-<username>` template over
  UDS, and prints an agent-specific cleanup hint pointing at any
  external command the operator still needs to run (for example
  `claude mcp remove aimx`). Refuses root with the same
  `--dangerously-allow-root` escape hatch as `agents setup`.

Drops `<domain>` from the setup positional in --help (kept hidden as
a back-compat input). Updates `book/setup.md`,
`book/installation.md`, and `book/agent-integration.md` for the new
command names and the renamed status label.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: PR #153 cycle 8 — agent CLI restructure + bundled fixes

Cycle-8 scope verification

The five bundled changes are implemented well. Quick rundown of each, before getting to the blocker:

  1. aimx setup root error rewordingsrc/setup.rs:2215-2217 now names both paths explicitly: sudo aimx setup for the server, aimx agents setup for the agent path. Old <domain> reference is gone everywhere I checked. Good.
  2. CLI restructuresrc/cli.rs:163-216 defines the new Agents(AgentsCommand) (with agent clap alias) plus hidden legacy AgentSetup and AgentCleanup commands. src/main.rs:97-160 dispatches all three. agents list is a thin alias of agents setup --list (verified at main.rs:149-159). The four invocation forms (agent-setup, agent setup, agents setup, agents setup claude-code --list) all parse and dispatch correctly — confirmed by tests/integration.rs:5557-5612.
  3. TUI polishsrc/agent_tui.rs:366-394 defines render_confirmation with the right Install / Re-install verb selection. src/agent_tui.rs:138-166 runs the picker in a loop and re-enters with selections preserved on n. The (AIMX MCP wired) and (not detected) suffixes are wrapped in term::very_dim (agent_tui.rs:461,467); the agent display name uses term::dim for non-selectable rows (agent_tui.rs:456).
  4. Claude Code wired-detection bug fixsrc/agent_setup.rs:935-948 special-cases claude-code to check .claude-plugin/plugin.json, mirroring the goose aimx.yaml case. Pinned by detect_install_state_installed_wired_for_claude_code_plugin_layout at agent_setup.rs:2228. The fixture update on detect_install_state_installed_wired_when_dest_exists (agent_setup.rs:2196-2220) is correct, not a regression cover-up — the old test wrote a top-level plugin.json that no real install ever produces.
  5. aimx agents removesrc/agent_remove.rs refuses root with --dangerously-allow-root escape hatch (agent_remove.rs:55-57), errors clearly on unknown agent (agent_remove.rs:59-64), unconditionally removes plugin files via agent_cleanup::run_with_env in --full --yes mode (agent_remove.rs:95-109), unregisters the template via the same path, and prints an agent-specific cleanup hint via removal_hint (agent_remove.rs:131-153). The delegation to agent_cleanup::run_with_env is sound — MaskRootEnv (agent_remove.rs:158-199) cleanly skips the inner per-user refusal when --dangerously-allow-root is set without doubly applying the root override. removal_hint_covers_every_registered_agent enforces no orphan agents.

term::very_dim (src/term.rs:94-105) uses truecolor(88, 88, 88) and degrades to bright black on non-truecolor terminals; tested by very_dim_is_dimmer_than_dim_when_color_forced_on and very_dim_degrades_to_named_color_without_truecolor. Fallback chain is sound.

cargo fmt --check, cargo clippy --all-targets -- -D warnings, and the local test suite all pass: 1393 unit + 101 integration + 45 install.sh tests, matching the implementer's reported counts.

No prior-cycle regressions — install.sh's [ -t 0 ] gate, print_step_complete, current_exe() runuser fix, parse_yn, <user> placeholder fallback, and sh -lc PATH fallback are all still in place.


Blocker

B1. CI docs-build job fails — book/ references the now-hidden legacy verbs

The docs-build job (scripts/check-docs.sh) fails on this PR. Reproduced locally:

check-docs: book/ has command-line `aimx <verb>` references where <verb> is not in `aimx --help`:
  - aimx agent
  - aimx agent-cleanup
  - aimx agent-setup

Because AgentSetup, AgentCleanup, and the agent alias are all marked hide = true in cli.rs, they no longer appear in aimx --help. The lint script's ALLOWED_NON_VERBS (in scripts/check-docs.sh) currently allowlists only hook and mailbox aliases, not the agent legacy aliases, so every aimx agent-setup ..., aimx agent-cleanup ..., aimx agent setup ... reference inside book/ code blocks is flagged. The book has these references in many places — book/getting-started.md:70,72,102,105, book/setup.md:84, book/agent-integration.md (multiple), book/cli.md:205,220, book/mcp.md:23,183,185, book/faq.md:162, book/configuration.md:131, book/security.md:221, book/release-notes.md:24,25, book/README.md:52.

This is a Blocker because:

  • CI is red on the PR. Mergeable state aside, the project's own merge-quality bar is core-tests + docs-build green.
  • The fix is mechanical: either add agent, agent-setup, agent-cleanup to ALLOWED_NON_VERBS (since the implementer has explicitly committed to keeping these as hidden back-compat aliases) or rewrite the book references to use aimx agents setup / aimx agents remove. The PR's own commit message says these legacy aliases are retained "so existing scripts and install.sh invocations keep working" — adding them to the allowlist matches that intent, and is one-line.

The lint exists exactly to catch doc drift like this; it's working as intended. The PR needs to make a deliberate call: keep the hidden legacy verbs documented (extend the allowlist) or migrate the book over to the new names. Either is fine, but CI must go green before merge.


Summary

  • Verdict: CHANGES_REQUESTED (posted as --comment because GitHub blocks self---request-changes; treat as request-changes)
  • Resolved cycle-8 scope items: all five bundled changes verified end-to-end.
  • Blockers: B1 — docs-build CI failure on the legacy aimx agent-setup / aimx agent-cleanup / aimx agent references in book/.
  • Non-blockers: none.
  • Nice-to-haves: none.

CI status: core-tests pass (3m10s), verifier-tests pass, build pass, Analyze (actions) and Analyze (javascript-typescript) pass; docs-build fail; Analyze (rust) pending.

Once the book references / allowlist are reconciled and docs-build goes green, this is ready to merge. The implementation work itself is solid — the cycle-8 scope is completely fulfilled and the testing is appropriately tight (the new detect_install_state_installed_wired_for_claude_code_plugin_layout regression test alone earns a lot of trust). The one missing piece is the doc-lint update.

…nonical form

Extend `scripts/check-docs.sh`'s `ALLOWED_NON_VERBS` to allow `agent`,
`agent-setup`, and `agent-cleanup` — the cycle-8 restructure marked
these as `hide = true` clap aliases so they no longer appear in
`aimx --help`, but the lint flagged every reference in `book/`.

Audit the book and switch primary documentation to the canonical
`aimx agents setup` / `aimx agents remove` / `aimx agents list` forms,
keeping a single back-compat note per page where the legacy hyphenated
form is intentionally surfaced. Headers, troubleshooting sections, and
the cli.md reference now mirror the actual CLI shape post-cycle-8.
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (cycle 9): PR #153docs-build blocker fix (commit 64cb8ce)

Cycle-9 scope verification

The cycle-8 blocker (docs-build CI failing on hidden legacy aimx agent / aimx agent-setup / aimx agent-cleanup references) is resolved. The implementer took the hybrid path I sketched — extending ALLOWED_NON_VERBS AND rewriting the book's primary forms — which is the right call here:

  • scripts/check-docs.sh (lines 69–72): adds agent, agent-setup, agent-cleanup to ALLOWED_NON_VERBS with a comment pointing at this PR. The diff is narrow — only the allowlist array changed, the matching logic and the rest of the lint are untouched, so there's no risk of new false negatives. I re-read the whole file and confirmed that.
  • Book audit: book/agent-integration.md, book/cli.md, book/configuration.md, book/faq.md, book/getting-started.md, book/hook-recipes.md, book/hooks.md, book/mcp.md, book/README.md, book/release-notes.md, book/security.md all switch primary documentation from aimx agent-setupaimx agents setup (and agent-cleanupagents remove). The cli.md reference grew a ### aimx agents list section and rewrote the ### aimx agents remove <agent> reference to match the new shape.

bash scripts/check-docs.sh passes locally. cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test, and sh tests/install_sh.sh all green: 1393 unit + 101 integration + 1 uds_authz + 45 install.sh tests, matching the cycle-8 counts (no regressions). All earlier-cycle work (CLI restructure, TUI confirmation screen, Claude Code .claude-plugin/plugin.json detection, agent_remove module, aimx setup root error rewording) is still intact — I spot-checked the relevant lines.

CI is fully green: core-tests, verifier-tests, docs-build, build, Analyze (actions), Analyze (javascript-typescript), Analyze (rust), CodeQL all pass; deploy skipping (PR-only); no pending checks.


Non-blocker

NB1. book/cli.md:235 documents a -y, --yes flag on aimx agents remove that doesn't exist

The new "### aimx agents remove <agent>" section in book/cli.md lists this row:

| `-y`, `--yes` | Skip the interactive confirmation prompt. |

But AgentsCommand::Remove in src/cli.rs:430-437 only declares agent: String and dangerously_allow_root: bool — there is no --yes / -y flag. The actual binary rejects it:

$ aimx agents remove --yes claude-code
error: unexpected argument '--yes' found

The flag was carried over from the legacy agent-cleanup --yes (which exists on AgentCleanup at cli.rs:213-215 and gates the --full interactive prompt). The new aimx agents remove deliberately runs in unconditional full: true, yes: true mode (src/agent_remove.rs:96-99) — there's no prompt to skip, so the flag is both undocumented-on-the-binary and conceptually moot.

Two fixes are equivalent:

  1. Drop the row from book/cli.md — match the implementation. The "Refuses to run as root" + "removes plugin files unconditionally" model is fine; users don't need an opt-out.
  2. Add the flag to AgentsCommand::Remove — make the doc match. But there's nothing to opt out of, so this would be a no-op flag. Option (1) is cleaner.

Why this is a non-blocker, not a blocker: the wrong doc row doesn't crash anything, doesn't cause data loss, and doesn't bypass any security. A user copying the flag from the doc gets a clean clap-level error and can resolve it by re-reading the help. But it's still a doc accuracy bug introduced by this commit and the PR's stated intent is to make book/ accurate — keeping it merits a one-line fix.

NB2 (informational, not requested as a fix). Binary error message still embeds the legacy verb

In src/agent_setup.rs:1403-1406 and src/agent_setup.rs:1444-1447, the aimx serve is not running daemon-down error message is built with the literal token aimx agent-setup {} .... The book troubleshooting heading at book/agent-integration.md:449 correctly mirrors that error string verbatim, so the heading is accurate. But the binary itself is now telling users to run a hidden legacy alias instead of the canonical form. The PR's commit message explicitly says it "switch[ed] primary documentation to the canonical aimx agents setup" — that's true for the narrative copy; the binary's own user-visible error string was not updated.

Not in scope for this commit and the lint already passes, so I'm flagging it as informational only — keep or fix at your discretion. If you do touch it, the troubleshooting heading needs to track whatever the binary emits.


Summary

  • Verdict: CHANGES_REQUESTED (posted as --comment because GitHub blocks self---request-changes; treat as request-changes) — one non-blocker doc accuracy fix, no marker emitted.
  • Resolved: B1 (docs-build CI failure) — fully fixed.
  • Still unresolved: none from prior cycles.
  • Blockers: none.
  • Non-blockers: NB1 (book/cli.md documents nonexistent -y, --yes flag on aimx agents remove).
  • Nice-to-haves: NB2 (binary's daemon-down error embeds the legacy aimx agent-setup verb; the troubleshooting heading mirrors it correctly, so this is purely cosmetic — flagging only because the cycle-9 scope was "make the book canonical").

Once NB1 is reconciled (drop the row, or add the flag), this is ready to merge. The cycle-9 implementation work is otherwise correct, narrow, and well-justified — the allowlist comment cites this PR explicitly, which makes future audits trivial.

Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review verdict: Ready to merge

Commit 291c3af ("Drop nonexistent --yes flag from aimx agents remove docs") is a 1-line deletion of a single row in the aimx agents remove flag table in book/cli.md. It cleanly addresses the lone non-blocker (NB1) from the prior pass.

Resolved

  • NB1 (book/cli.md:235): the documented -y, --yes row for aimx agents remove is gone. The remaining flags for that command — <agent> positional and --dangerously-allow-root — now exactly match the AgentsCommand::Remove definition in src/cli.rs (lines 430–437).

Verification (all green)

  • book/cli.md aimx agents remove table is now consistent with src/cli.rs AgentsCommand::Remove.
  • Other -y, --yes references in book/cli.md (lines 46 / 129 / 193) belong to uninstall, mailboxes delete, and hooks delete, each of which actually accepts --yes. None are spurious.
  • The aimx agent-cleanup ... --full references in book/cli.md:237 and book/agent-integration.md:102 correctly describe the hidden legacy alias (AgentCleanup { full, yes } in src/cli.rs:205), so they are not drift.
  • bash scripts/check-docs.sh → ok
  • cargo fmt --check → clean
  • cargo clippy --all-targets -- -D warnings → clean
  • cargo test → 1393 unit + 101 integration + 1 uds_authz pass, 0 fail (3 / 5 / 1 ignored — pre-existing)
  • sh tests/install_sh.sh → 45/45 pass
  • gh pr checks 153 → CodeQL pass, Analyze (actions / javascript-typescript / rust) pass, build pass, core-tests pass, docs-build pass, verifier-tests pass; deploy skipping is expected for non-main.
  • mergeable: MERGEABLE, state: OPEN.

Still unresolved

None.

New findings

None. The change is a pure doc-table deletion with zero logic surface; no regression risk introduced.

NB2 from the prior pass (legacy aimx agent-setup nudge in src/agent_setup.rs error text) was explicitly informational and out of scope for this cycle, per the re-review brief. Not flagged here.

Recommended merge commit message

feat: end-to-end install.sh with welcome banner, sudo escalation, setup + agent-setup orchestration

Make `curl -fsSL https://aimx.email/install.sh | sh` drive the full
onboarding flow end-to-end. The binary now owns the welcome banner, the
six-step checklist (Preflight -> Domain & DNS -> TLS -> Trust ->
Install -> MCP), the agent-setup drop-through, and the closing message;
install.sh is a thin downloader that fails fast on missing sudo,
backs up any pre-existing /etc/aimx/config.toml, installs the binary,
and execs `sudo aimx setup`.

Restructures agent management into `aimx agents setup | remove | list`
(plural noun + verbs, matching `mailboxes` / `hooks`). Singular
`aimx agent` and the historical hyphenated `aimx agent-setup` /
`aimx agent-cleanup` forms are preserved as hidden legacy aliases so
existing scripts and `install.sh` invocations keep working. Adds new
`aimx agents remove <agent>` that removes plugin files under `$HOME`,
drops the `invoke-<agent>-<username>` template over UDS, and prints an
agent-specific cleanup hint.

Other fixes bundled in:
- pin HOME for agent-setup drop-through (sudo -H -u $invoker env HOME=...)
- make sudo optional when already root (resolve_sudo_prefix)
- use `[ -t 0 ]` instead of /dev/tty existence to gate stdin redirect
  (fixes keyboard-hang on distros where sudo defaults to use_pty)
- fix `runuser` exec path via std::env::current_exe()
- fix Claude Code wired-detection (special-case `.claude-plugin/plugin.json`)
- fall back to `sh -lc 'command -v X'` for agent CLI path discovery
- confirm-after-prompt UX for domain and trusted-senders
- per-step checklist redraw on every section transition

Test surface: 1393 unit + 101 integration tests pass; 45/45 install.sh
shell tests pass; cargo fmt clean; clippy clean (-D warnings); docs-build
green.

…nonical aimx agents setup

Renames source files src/agent_*.rs → src/agents_*.rs (preserved via
git mv), drops the hidden Command::AgentSetup / Command::AgentCleanup
clap variants and the singular `agent` alias on the Agents subcommand,
and migrates every literal occurrence (source code, tests, embedded
LLM-facing primers shipped inside agent plugins, README, the book,
hook-templates/defaults.toml, install.sh) to `aimx agents setup`.

Internal renames: drop_through_to_agent_setup → drop_through_to_agents_setup,
build_agent_setup_argv → build_agents_setup_argv (the constructed argv
now pushes "agents" then "setup" instead of one "agent-setup" token),
AgentSetupOutcome → AgentsSetupOutcome.

Tightens scripts/check-docs.sh by dropping the agent-setup / agent /
agent-cleanup back-compat allowlist entries so docs that reintroduce
the legacy form fail CI. Replaces the legacy-alias-keeps-working
integration test with two negative-confirmation tests asserting
that `aimx agent-setup` and `aimx agent setup` now error with
"unrecognized subcommand".
Copy link
Copy Markdown
Owner Author

@uzyn uzyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle 10 Re-review — APPROVED

All scope items from cycle 10 land cleanly. No blockers, no non-blockers, no nice-to-haves.

Cycle-10 scope verified

  1. Hard back-compat break — confirmed by running the binary locally:
    • aimx agent-setup --listerror: unrecognized subcommand 'agent-setup'
    • aimx agent setup --listerror: unrecognized subcommand 'agent'
    • aimx agent-cleanuperror: unrecognized subcommand 'agent-cleanup'
    • aimx agents setup --list → still works (canonical path)
  2. Renames detected as renames by gitgit diff --find-renames 61b74dd^..61b74dd --name-status | grep '^R' returns:
    • R089 src/agent_cleanup.rs → src/agents_cleanup.rs
    • R093 src/agent_remove.rs → src/agents_remove.rs
    • R098 src/agent_setup.rs → src/agents_setup.rs
    • R098 src/agent_tui.rs → src/agents_tui.rs
      All four detected as renames with high similarity. Blame is preserved.
  3. Zero straggler legacy references. All four greps from the prompt return empty (the only matches for the legacy literals are inside tests/integration.rs:5590,5606 doc comments and assertion fixtures that pin the negative-confirmation behavior).
  4. scripts/check-docs.sh allowlist tightened. agent, agent-setup, and agent-cleanup are dropped from ALLOWED_NON_VERBS; only hook and mailbox remain. bash scripts/check-docs.sh passes.
  5. Embedded primers (agents/common/aimx-primer.md, agents/common/references/{hooks,mcp-tools}.md). All aimx agents … references use the canonical plural form. The unrelated mailbox: "agent" mentions are mailbox-name examples, not CLI references — no ambiguity for downstream LLMs.
  6. build_agents_setup_argv argv shape correct. src/setup.rs:2540-2554 pushes "agents" then "setup" as separate args. Test fixtures at src/setup.rs:5905-5927 assert argv.len() == 3 without data-dir and argv.len() == 5 with --data-dir, with indices 1 = "agents", 2 = "setup".
  7. tests/e2e_agent_flow.rs migrated and passing. cargo test --test e2e_agent_flow -- --ignored runs and passes 1/1. Argv shape uses ["agents", "setup"] (line 362) and ["agents", "remove"] (line 595). The bare agents remove claude-code correctly relies on do_remove's implicit full: true, yes: true (verified at src/agents_remove.rs:97-98).
  8. agents_cleanup::run correctly removed. Module doc updated to "Internal cleanup core". Only pub(crate) fn run_with_env remains, called by agents_remove::do_remove. No Command::AgentSetup / Command::AgentCleanup clap variants remain in cli.rs / main.rs. The Agents(AgentsCommand) variant carries no singular agent alias attribute.
  9. Negative-confirmation tests are precise. tests/integration.rs:5596-5604 and 5611-5619 assert both failure() exit AND predicate::str::contains("unrecognized subcommand") — correctly catches the clap-error shape, not just any non-zero exit.
  10. Prior-cycle work intact. Spot-checked: parse_yn (setup.rs:1726), print_step_complete (setup.rs:2139), current_exe() use (setup.rs:419), term::very_dim (term.rs:94), sh -lc fallback (agents_setup.rs:526). All in place.

CI status (all green)

Check Status
CodeQL pass
Analyze (actions) pass
Analyze (javascript-typescript) pass
Analyze (rust) pass
build pass
core-tests pass
docs-build pass
verifier-tests pass

Local verification

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings (root + verifier) — clean
  • cargo test — 101 integration tests pass; 1393 binary unit tests pass; 5 + 1 ignored (pre-existing)
  • cargo test --test e2e_agent_flow -- --ignored — 1/1 pass
  • bash scripts/check-docs.shok (no unknown 'aimx <verb>' command lines in book/)
  • sh tests/install_sh.sh — 45/45 pass
  • gh pr view 153 --json mergeable,mergeStateStatusMERGEABLE / CLEAN

Summary

The cycle 10 migration is complete and surgical. The hard back-compat break is correctly enforced at clap parse time, all four file renames are tracked as renames (so blame is intact), every operator-facing string in shipped artifacts (source, tests, embedded LLM primers, README, book, hook templates, install.sh) has been migrated to the canonical aimx agents setup form, the docs lint allowlist is tightened so future regressions are caught, and the e2e test exercises the new argv shape end-to-end.

Verdict: Ready to merge.

Recommended merge commit message

feat: end-to-end install.sh with welcome banner, sudo escalation, setup + agents setup orchestration

- Move setup wizard UX into the binary; thin install.sh to a downloader
- Six-step interactive checklist with per-step redraw and Unicode glyphs
- TUI-driven `aimx agents setup` with per-agent wired detection
- New `aimx agents remove <agent>` with template + plugin cleanup over UDS
- Drop `aimx agent-setup` / `aimx agent-cleanup` / `aimx agent <verb>` legacy aliases
- Rename source files src/agent_*.rs → src/agents_*.rs (blame preserved via git mv)
- Tighten scripts/check-docs.sh allowlist; CI catches future doc drift
- `current_exe()` runuser fix; `sh -lc 'command -v X'` fallback for per-user agent CLIs
- Claude Code wired-detection special case (.claude-plugin/plugin.json)

@uzyn uzyn merged commit ab4076a into main Apr 26, 2026
9 checks passed
@uzyn uzyn deleted the worktree-binary-crafting-pancake branch April 26, 2026 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant