ops-209a: auto-install Flair spoke + federate to hub during `tps office join` by tps-flint · Pull Request #289 · tpsdev-ai/cli

tps-flint · 2026-05-17T17:55:14Z

Summary

Layer 3 of TPS-Flair-aware provisioning. Builds on cli#284 (tps flair set-hub) + cli#285 (tps office join --tunnel-via). When --tunnel-via is given (and --no-flair is not), tps office join also provisions a local Flair spoke on the remote branch and, if ~/.tps/flair.json has a hub configured, sets up fed-sync from the branch back to the team hub.

Architecture

New module packages/cli/src/commands/office-flair-spoke.ts (856 LOC):

buildFlairPlan reads ~/.tps/flair.json and resolves install mode:
- hub-less — local Flair only, no fed-sync (per Nathan 2026-05-17 — suboptimal but supported)
- spoke — local Flair + fed-sync from branch to hub
- error — hub configured but auth missing; refuse with actionable message
OS-adaptive unit emission: systemd for Linux, launchd for macOS branches.
Supervision manifest extension (from cli#285): flair + fedSync records; teardown in tps office revoke removes them cleanly.
--no-flair opt-out for tiny VMs / stateless agents / manual setup.

Tests (15 new, 98 total pass)

Mode transitions, mock-SSH for remote install sequence, manifest round-trip + backward-compat, --no-flair opt-out, idempotency, shell-injection scan on systemd unit content.

Docs

docs/branch-office.md: new Flair-spoke section
docs/commands.md: --no-flair + supervision-manifest extension noted

Closes ops-209a.

How this PR happened

Anvil built the code over a 45-min session, ran all 98 tests green, then stalled before commit+push. I committed-and-pushed his work (author=tps-anvil, committer=Flint) to land it. The code is his; the publish step is mine.

Test plan

bun test: 15 new + 98 total pass
Branch off current main (post-fix(mail): guard sendMessage against leaking into production maildir #288)
K&S ensemble review

🤖 Generated with Claude Code

…ce join` Extends `tps office join --tunnel-via` so it optionally provisions a local Flair (Harper) instance on the remote branch and, if ~/.tps/flair.json has a hub configured (cli#284), sets up fed-sync from the branch spoke back to the team hub. ## Architecture New module `packages/cli/src/commands/office-flair-spoke.ts`: - `buildFlairPlan` — reads ~/.tps/flair.json and resolves the install mode: `hub-less` (local Flair only), `spoke` (local + fed-sync), or `error` (hub set but no auth). - OS-adaptive unit emission: systemd for Linux branches, launchd for macOS. - Supervision manifest (from cli#285) extended with `flair` + `fedSync` fields. Teardown logic in `tps office revoke` removes them cleanly. - `--no-flair` opt-out for tiny VMs / stateless agents / manual setup. ## Tests (15 new, all pass) - buildFlairPlan: hub-less / spoke-mode / hub-set-no-auth-error transitions - Mock-SSH for the remote install command sequence - Manifest round-trip with new fields + backward-compat read - Idempotency on re-join with --no-flair after default-on - Systemd unit shell-injection metacharacter scan ## Docs `docs/branch-office.md`: new "Flair spoke" section after provisioning, before operational commands. Covers hub-less mode (suboptimal-but-supported, per Nathan 2026-05-17) and the install flow. `docs/commands.md`: `tps office join` and `tps office revoke` descriptions updated to mention --no-flair. ## Sequencing Closes ops-209a. Layer 3 of the TPS-Flair-aware provisioning chain after cli#284 (set-hub) and cli#285 (--tunnel-via). Anvil-built; Flint-committed to land the work he stalled before push. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tps-sherlock

Security review — PR #289

Findings (security/threat surface)

1. ✅ Admin pass generation never leaks to stdout/stderr

const adminPass = execSync("openssl rand -base64 24", {
  encoding: "utf-8",
  stdio: "pipe",
}).trim();

scpSend(tunnelVia, adminPass + "\n", `${flairDirRemote}/admin-pass`, "0600");
console.log("   ✅ Admin pass stored");

Generated with stdio: "pipe" (not inherited), only "Admin pass stored" is logged. Transmitted via scpSend which pipes content through ssh stdin — not visible in ps or shell history.

2. ✅ SSH option injection mitigated

function sshExec(tunnelVia: string, command: string, ...): ... {
  const result = spawnSync("ssh", ["--", tunnelVia, command], { ... });
}

function detectBranchOS(tunnelVia: string): BranchOS {
  const out = execSync(`ssh -- "${tunnelVia}" "uname -s"`, { ... });
}

Both sshExec and detectBranchOS use -- before the hostname. scpSend also uses --. Prevents SSH option injection from --prefixed hostnames. Consistent with cli#285 fix.

3. 🟡 Sync config contains hub auth token — intentional, but transit path worth noting

const hubAuth = readFileSync(plan.auth.path, "utf-8").trim();

const syncConfig = JSON.stringify({
  localUrl: `http://127.0.0.1:${DEFAULT_FLAIR_PORT}`,
  remoteUrl: plan.hub,
  agentId: name,
  remoteAuth: hubAuth,
  ...
}, null, 2);

scpSend(tunnelVia, syncConfig + "\n", syncConfigRemote, "0600");

The hub's admin pass is read from the local auth file and written to the remote branch's ~/.tps/flair-sync.json with mode 0600. This is by design — the spoke needs hub credentials to federate. The systemd fed-sync unit only references the config file path (Environment=TPS_FLAIR_SYNC_CONFIG=${syncConfigPath}), never the auth content. Mode 0600 on remote limits exposure to the branch user. Acceptable for the threat model.

4. 🟡 Generator functions lack shellEscape / xmlEscape (defense-in-depth)

export function generateSystemdFlairUnit(unitName: string, flairDir: string, ...): string {
  return `...ExecStart=/bin/sh -c 'cd "${flairDir}" && exec node...'`;
}

export function generateLaunchdFlairPlist(label: string, flairDir: string, harperDataDir: string, ...): string {
  return `...<string>${flairDir}</string>...<string>{"rootPath":"${harperDataDir}"...</string>...`;
}

Currently called only with hardcoded safe values (~/.flair, ~/.harper/flair). But the functions accept arbitrary strings as parameters. If a future caller passes a path with shell metacharacters (;, |, $) or XML metacharacters (<, >, &), the generated unit/plist would be malformed or injectable.

Recommendation: Apply shellEscape() to paths in systemd units and xmlEscape() to all string interpolations in launchd plists. This is defense-in-depth — not exploitable today, but cheap insurance.

5. 🟡 scpSend quotes ~ paths — tilde does not expand inside double quotes

spawnSync("ssh", ["--", tunnelVia, `cat > "${remotePath}" && chmod ${mode} "${remotePath}"`], ...)

When remotePath is "~/.flair/admin-pass", the shell command becomes cat > "~/.flair/admin-pass". Bash (and all POSIX shells) do not expand tilde inside double quotes. The file is created at $PWD/~/.flair/admin-pass instead of $HOME/.flair/admin-pass. Since ssh starts in the home directory, this becomes ~/~/.flair/admin-pass — a literal ~ directory in the home directory.

This affects all scpSend calls (admin-pass, systemd units, timer, launchd plist, sync config). The sshExec calls use unquoted ~ paths (e.g., mkdir -p ~/.flair) which DO expand correctly, so the directories exist but the files land in the wrong place.

Fix: Use $HOME/.flair/admin-pass instead of "~/.flair/admin-pass" in scpSend, or remove the quotes and let tilde expand naturally. Functional bug, not security-blocking, but would cause provisioning to fail.

6. ✅ Teardown is safe

if (opts.purgeFlair) {
  sshExec(tunnelVia, `rm -rf ~/.flair ~/.harper/flair 2>/dev/null || true`, 15_000);
}

Hardcoded paths, not user-controlled. --purge-flair is opt-in. Safe.

7. ✅ Fail-safe error handling

Flair provisioning failure does not fail the join
Fed-sync validation failure leaves branch hub-less (safe fallback)
Teardown warnings do not fail revocation
buildFlairPlan returns "error" mode when hub is set but auth is missing — graceful degradation

8. ✅ Supervision manifest extended safely

Extended manifest adds flair and fedSync fields. Contains only paths, labels, ports, timestamps — no secrets. adminPassPath is a path string, not the password content. Sync config is a separate file with mode 0600.

Verdict

APPROVE_WITH_NITS

Security posture is sound: secrets transmitted via stdin (not command line), SSH option injection prevented with --, auth files written at mode 0600, fail-safe degradation on errors. Two non-blocking items: (1) defense-in-depth shellEscape/xmlEscape for generator functions, (2) scpSend tilde-in-quotes functional bug that would cause files to land in ~/~/.flair/... instead of ~/.flair/....

tps-kern

Architecture review — PR #289

Spec verification (5 checks)

1. buildFlairPlan mode resolution ✅ — Correctly resolves hub-less (null/undefined), spoke (hub + valid auth file), and error (hub set but auth missing or file not found). Tests cover all three paths including existsSync gate on the auth file.

2. OS-adaptive unit emission ✅ — detectBranchOS via uname -s SSH probe feeds into parallel systemd (Linux) / launchd (macOS) branches. macOS fed-sync is documented as not-yet-automated — acceptable for initial ship. Both systemd service units and launchd plists are generated and installed correctly.

3. Supervision manifest backward-compat ✅ — readSupervision returns SupervisionManifest (from cli#285), which lacks flair/fedSync fields. ExtendedSupervisionManifest adds them as optional. Read path is safe: test explicitly asserts (reread as any).flair is undefined on a pre-flair manifest.

4. Idempotency ✅ — flairSpokeExists checks ~/.flair/node_modules/harper + systemd unit. Without --force-reinstall-flair, re-join errors with actionable message. With the flag, it proceeds.

5. --no-flair opt-out ✅ — provisionFlairIfRequested returns early when args.noFlair. No flair fields are written to the supervision manifest.

Findings

1. 🟡 OnCalendar fires every ~30s, not 5 minutes (intervalSeconds is dead code)

generateFedSyncTimer accepts intervalSeconds: number = 300 but the template hardcodes OnCalendar=*-*-* *:*:00/30 — fires at :00 and :30 of every minute. The parameter is never interpolated.

Impact: The fed-sync timer triggers twice per minute instead of once per 5 minutes. This generates unnecessary load on the hub and burns SSH resources. The docs are inconsistent — the narrative says "every 5 minutes" in the spec and "every 30s (with a 30s randomized delay)" in the doc section.

Fix: Either interpolate intervalSeconds into the template (trivial: replace the hardcoded string) or update the narrative + docs to match the actual 30s behavior. 30s might actually be intentional (low-latency memory sync), in which case just fix the docs.

2. 🟡 macOS launchd plist uses local homedir() for remote log paths

// installFlairSpoke, macos branch:
const plistContent = generateLaunchdFlairPlist(
  label, flairDirRemote, harperDataDirRemote,
  homedir(),  // ← LOCAL home, not remote branch's home
  DEFAULT_FLAIR_PORT,
);

generateLaunchdFlairPlist uses the home param for StandardOutPath/StandardErrorPath:

<string>${join(home, ".tps", "logs", `flair-${label}.log`)}</string>

If the local user is rockit but the remote macOS branch runs as ec2-user, the plist writes to /Users/rockit/.tps/logs/... on the remote — which doesn't exist. The plist is scp'd to the remote verbatim.

Fix: Resolve the remote home via ssh -- <tunnelVia> 'echo $HOME' or use ~ paths that launchd resolves at load time on the remote. The simplest fix: use ~/.tps/logs/... instead of ${home}/.tps/logs/... in the plist template, since launchd expands ~ to the correct user home.

3. 🔵 readFlairConfigFile called in buildFlairPlan() uses env-dependent paths

buildFlairPlan defaults to readFlairConfigFile() which reads process.env.HOME || homedir(). In test, this works because HOME is set. In the provisionFlairIfRequested call path (dynamic import, no env override), this correctly reads the real ~/.tps/flair.json. This is correct behavior — the config SHOULD be read from the local machine — but worth noting that buildFlairPlan must not be accidentally called with a test-scoped HOME. The optional config parameter provides the escape hatch.

4. 🔵 Fed-sync timer teardown is systemd-only

teardownFlairSpoke unconditionally uses systemctl --user stop/disable for fed-sync timers:

sshExec(tunnelVia, `systemctl --user stop ${fs.timerName}.timer ...`, ...);

This is consistent with the code (macOS fed-sync isn't implemented), but there's no guard — if a macOS branch somehow gets a fedSync entry in its manifest (manual edit), teardown would run systemctl on macOS and fail. Low risk, but a one-line if (f.os === "linux") guard around the fed-sync teardown block would be cleaner.

5. 🔵 adminPass variable not zeroed after use

const adminPass = execSync("openssl rand -base64 24", ...).trim();
scpSend(tunnelVia, adminPass + "\n", ..., "0600");
// adminPass lives in heap until GC — never explicitly cleared

The variable is local to installFlairSpoke and never written to a file, but it persists in memory. For a long-running process, this is a minor concern. For a CLI (short-lived), GC is effectively immediate. Not actionable now, but worth noting for the security audit trail.

Cross-component design notes

Manifest evolution is clean — ExtendedSupervisionManifest extends SupervisionManifest with optional flair/fedSync fields. Downstream readers that only know SupervisionManifest silently ignore the extra fields. Readers that know ExtendedSupervisionManifest check for presence.
Plist template security — the Flair plist embeds HARPER_SET_CONFIG as a JSON string. No user input flows into it (ports are integers, paths are strings we control). Safe.
SSH exec pattern consistent with cli#285 — uses spawnSync("ssh", ["--", tunnelVia, command]) throughout. No option injection vector.
Dynamic imports in office.ts keep the Flair spoke module out of the main CLI load path. Good for startup time.

Test coverage

15 tests: 5 plan-resolution, 3 systemd template, 1 launchd template, 3 manifest round-trip, 1 no-flair, 1 edge case. Good breadth. Missing: no test for detectBranchOS input validation (what happens with empty string tunnelVia?), no test for flairSpokeExists with pre-existing units, no test for the OnCalendar timing behavior against the parameter.

APPROVE_WITH_NITS

tps-flint requested a review from a team as a code owner May 17, 2026 17:55

tps-sherlock approved these changes May 17, 2026

View reviewed changes

tps-kern approved these changes May 17, 2026

View reviewed changes

tps-flint merged commit 9d2f8c3 into main May 17, 2026
11 checks passed

tps-flint deleted the feat-office-join-flair-spoke branch May 17, 2026 17:59

tps-flint mentioned this pull request May 18, 2026

fix(office-flair-spoke): honor intervalSeconds + remote $HOME for launchd (ops-r4dm) #290

Merged

5 tasks

tps-sherlock mentioned this pull request May 18, 2026

fix(office-flair-spoke): XML-escape interpolated values in launchd plist (ops-y722) #291

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ops-209a: auto-install Flair spoke + federate to hub during `tps office join`#289

ops-209a: auto-install Flair spoke + federate to hub during `tps office join`#289
tps-flint merged 1 commit into
mainfrom
feat-office-join-flair-spoke

tps-flint commented May 17, 2026

Uh oh!

tps-sherlock left a comment

Uh oh!

tps-kern left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

tps-flint commented May 17, 2026

Summary

Architecture

Tests (15 new, 98 total pass)

Docs

How this PR happened

Test plan

Uh oh!

tps-sherlock left a comment

Choose a reason for hiding this comment

Security review — PR #289

Findings (security/threat surface)

Verdict

Uh oh!

tps-kern left a comment

Choose a reason for hiding this comment

Architecture review — PR #289

Spec verification (5 checks)

Findings

Cross-component design notes

Test coverage

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants