9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,15 @@ All notable user-visible changes to the TMB plugin. Versions follow [SemVer](htt

## Unreleased

### Added: two more hard-enforcement hooks + ENFORCEMENT.md (#108)

Per the doctrine "prompt-only enforcement caps at the LLM compliance ceiling; promote load-bearing rules to a harder layer," two new hooks land:

- **`scripts/hooks/no-source-edit-from-main.sh`** (PreToolUse on `Edit|Write|MultiEdit|NotebookEdit`). Blocks the call when bro mode is active *and* the target is source code *and* the current shell isn't inside an SWE worktree. Allowlist covers markdown, `LICENSE`, `.gitignore`-class configs, agent/skill prompts, plugin/hooks manifests, `.github/`. Bypass via `TMB_ALLOW_SOURCE_EDIT=1` for emergencies. Enforces the "bro is a pure planner: every code change goes through SWE" rule that until now was prompt-only.
- **`scripts/hooks/session-start-regen-check.sh`** (SessionStart). Reads `regen_state.last_seen_sha`, computes drift to `HEAD`, and emits `additionalContext` suggesting `tmb_refresh-architecture` when drift reaches the threshold (default 25 commits, override via `TMB_REGEN_DRIFT_THRESHOLD`). Pre-empts the manual lazy-regen check bro is supposed to do at the start of every code-touching ask.
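
The threshold decision in the session-start hook can be pictured as a small pure function. The sketch below is illustrative only: `should_nudge` is a hypothetical helper name, not something the plugin ships. It follows the script's comparison, which stays quiet only while drift is below the threshold, and it assumes a non-numeric value should be ignored rather than raise an error.

```bash
#!/usr/bin/env bash
# Sketch only: `should_nudge` is a hypothetical helper, not part of the plugin.

# Prints "nudge" when drift has reached the threshold, "quiet" otherwise.
#   $1 = number of commits between last_seen_sha and HEAD
#   $2 = threshold (optional; falls back to TMB_REGEN_DRIFT_THRESHOLD, then 25)
should_nudge() {
  local drift="$1"
  local threshold="${2:-${TMB_REGEN_DRIFT_THRESHOLD:-25}}"

  # Assumption: a non-numeric drift means "no signal", so stay quiet.
  case "$drift" in ''|*[!0-9]*) echo quiet; return 0 ;; esac
  # Assumption: a non-numeric threshold falls back to the default of 25.
  case "$threshold" in ''|*[!0-9]*) threshold=25 ;; esac

  if [ "$drift" -ge "$threshold" ]; then
    echo nudge
  else
    echo quiet
  fi
}
```

Calling `should_nudge 30` with no override prints `nudge`, while `TMB_REGEN_DRIFT_THRESHOLD=50 should_nudge 30` prints `quiet`.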

New doc: **`docs/architecture/ENFORCEMENT.md`**, the canonical reference for the 6 enforcement layers (MCP middleware → hooks → frontmatter → tool-handler validation → skill `paths:` auto-load → prompts) plus a per-agent × per-interaction coverage matrix showing which layer covers what. Includes a section listing remaining Layer-6-only doctrine items as promotion candidates.

### Refactored: all plugin-shipped skills now use `tmb_` prefix

The 7 default workflow skills (`code-quality`, `docs-conventions`, `git-conventions`, `naming-conventions`, `review-findings`, `review-protocol`, `swe-checklist`) were the only plugin-shipped skills without the `tmb_` namespace prefix, an inconsistency with the rule "global plugin skills use `tmb_`; the open namespace is reserved for user/`tmb_skill-creator`-generated project-local skills." Renamed to `tmb_code-quality`, `tmb_docs-conventions`, …
115 changes: 115 additions & 0 deletions docs/architecture/ENFORCEMENT.md
@@ -0,0 +1,115 @@
# Enforcement layers

How TMB doctrine is enforced at runtime: what is structurally guaranteed by code, what depends on prompt discipline, and which mechanism covers which interaction.

> **Why this doc matters.** The h3 + h4 A/B scenarios proved that prompt-only doctrine has a hard compliance ceiling (0/10 in both wording arms for the activation routine, 0/5 in both for Direct Mode). Anything load-bearing should sit on a hard layer; soft prompts are for judgment, style, and rare-fire instructions.

## The 6 layers (hardest → softest)

| # | Layer | Mechanism | LLM-bypassable? | Cost to add |
|---|---|---|---|---|
| **1** | **MCP server middleware** | `requireRoles` in `mcp/trajectory-server/src/middleware/agent-scope.ts` rejects MCP calls violating role/scope at the wire | **No** (server-side, returns `is_error: true`) | TS handler addition |
| **2** | **Hooks** | Shell scripts under `scripts/hooks/` fired by CC on lifecycle events (Pre/PostToolUse, UserPromptSubmit, SessionStart, Stop, WorktreeCreate, …); can `permissionDecision: deny` or inject `additionalContext` | **No** (runs outside the LLM's context) | New `.sh` + `hooks.json` entry |
| **3** | **Frontmatter directives** | `disallowedTools`, `isolation: worktree`, `memory: false` in agent `.md` files; CC enforces structurally for subagents | **No** (host-enforced at spawn) | YAML edit |
| **4** | **Tool-handler validation** | The MCP tool handler itself rejects malformed input (e.g. `task_create_batch` requires non-empty body, `requireRoles` wrappers) | **No** (handler returns error result) | TS handler change |
| **5** | **Skill `paths:` auto-load** | Skill auto-loads when matching files are in the active context; reduces prompt noise | Soft (model can ignore the loaded skill) | `paths:` in skill frontmatter |
| **6** | **Prompts** | CLAUDE.md, agent prompts, skill SKILL.md prose | **Yes** (h3/h4 ceiling) | Markdown edit |

**Doctrine: prefer the hardest layer that fits.** When designing a new constraint:

1. Can the MCP server reject the call? → Layer 1.
2. Can a hook block / inject deterministically? → Layer 2.
3. Can the agent file's frontmatter close the door for subagents? → Layer 3.
4. Can the tool handler validate? → Layer 4.
5. Can a skill load only when relevant? → Layer 5.
6. Otherwise: prompt it. → Layer 6 (and accept the compliance ceiling).
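
For Layer 2, a blocking hook comes down to printing a decision object on stdout. The sketch below shows the deny shape that `no-source-edit-from-main.sh` emits. The shipped hooks build this JSON with `jq -nc`; this version uses `printf` only to keep the illustration dependency-free. `json_escape` and `emit_deny` are hypothetical helper names, and the escaping covers only backslashes and double quotes in a single-line reason, which is an assumption about the input.

```bash
#!/usr/bin/env bash
# Sketch only: `json_escape` and `emit_deny` are hypothetical helper names.

# Escapes backslashes and double quotes so a single-line string can sit inside
# a JSON string literal. Newlines and control characters are NOT handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Prints a PreToolUse deny decision with the given reason.
emit_deny() {
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"%s"}}\n' \
    "$(json_escape "$1")"
}
```

A hook that exits 0 without printing anything leaves the tool call untouched, which is how the allow paths in the shipped scripts behave.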

## Coverage matrix: agent × interaction × enforcement

The "enforcement" column names the **strongest layer currently deployed** for each interaction. `Layer 6 only` means "prompt-only: relies on LLM compliance and may need promotion to a harder layer if it fails to fire reliably."

### bro

| Interaction | Enforcement | Where |
|---|---|---|
| Activation routine (`identity_get + issue_resume` data flow on every triggered message) | Layer 2 (UserPromptSubmit hook) | `scripts/hooks/activation-routine.sh` |
| Bro never edits source code directly (every code change goes through SWE) | Layer 2 (PreToolUse hook on Edit/Write/MultiEdit/NotebookEdit) | `scripts/hooks/no-source-edit-from-main.sh` |
| Architecture-doc regen lazy nudge | Layer 2 (SessionStart hook) | `scripts/hooks/session-start-regen-check.sh` |
| MCP calls must include `agent: 'bro'` | Layer 1 (server `requireRoles`) | `mcp/trajectory-server/src/middleware/agent-scope.ts` |
| Welcome banner phrasing | Layer 6 only | CLAUDE.md `## Welcome banner` |
| Triage rule (`difficult` iff `docs/trustmybot/architecture/` touched) | Layer 6 only | CLAUDE.md `## Code-touching ask chain` |
| Verify-context check before answering | Layer 6 only | CLAUDE.md `## Before answering — verify context` |
| Standards check before recommending | Layer 6 only | CLAUDE.md `## Before answering — verify context` |
| Catchphrase rule ("Trust me bro" only after pass) | Layer 6 only | CLAUDE.md `## Catchphrase` |
| Voice / tone | Layer 6 only | CLAUDE.md `## Voice` |
| `file_registry` md5 + last-seen update on Read | Layer 6 only; **candidate for Layer 2** (PostToolUse hook, deferred pending `last_verified_at` schema column) | `tmb_project-prescan` skill |
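
The allowlist behind the "Bro never edits source code directly" row can be read as a path classifier. The sketch below is a simplified illustration, not the shipped script: `is_bro_editable` is a hypothetical name, and it reproduces only part of the allowlist (text and docs files, `.github/`, `hooks/hooks.json`, and a few dotfile-style configs).

```bash
#!/usr/bin/env bash
# Sketch only: `is_bro_editable` is a hypothetical name, and the patterns are a
# subset of the allowlist in no-source-edit-from-main.sh.

# Returns 0 when the path looks like something bro may edit, 1 otherwise.
is_bro_editable() {
  local target="$1"
  local base="${target##*/}"   # file name without its directory part

  # Path-based rules: documentation, GitHub metadata, the hooks manifest.
  case "$target" in
    *.md|*.markdown|*.txt|*.rst) return 0 ;;
    */docs/*|docs/*) return 0 ;;
    */.github/*|.github/*) return 0 ;;
    */hooks/hooks.json|hooks/hooks.json) return 0 ;;
  esac

  # Name-based rules: licence files and ignore-style configs.
  case "$base" in
    LICENSE|LICENSE.*|.gitignore|.gitattributes|.editorconfig) return 0 ;;
  esac

  return 1
}
```

In a `case` statement `*` also matches `/`, so `*.md` accepts `agents/swe.md` as well as a top-level `README.md`; anything that falls through both blocks is treated as source code.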

### swe

| Interaction | Enforcement | Where |
|---|---|---|
| Spawned only with valid `task_id` referencing a `pending`/`open` task with non-empty `spec_body` | Layer 2 (PreToolUse hook on Task) | `scripts/hooks/require-task-spec.sh` |
| Runs in an isolated worktree | Layer 2 (WorktreeCreate hook) + Layer 3 (`isolation: worktree` frontmatter) | `scripts/hooks/create-worktree.sh`, `agents/swe.md` |
| `disallowedTools` keeps SWE off MCP-write tools intended for bro | Layer 3 (frontmatter) | `agents/swe.md` |
| MCP calls must include `agent: 'swe'`, scope-restricted | Layer 1 (`requireRoles`) | `mcp/trajectory-server/src/middleware/agent-scope.ts` |
| Atomic close (`task_update_status` on return, never self-validation_record) | Layer 1 (`requireRoles` rejects bro/swe writing pr-reviewer-only tools) | server middleware |
| Task spec compliance (only edits `Files` listed in spec) | Layer 6 only | `agents/swe.md` + `tmb_swe-spawn-workflow` skill |

### pr-reviewer

| Interaction | Enforcement | Where |
|---|---|---|
| Push to protected branch blocked unless every unsigned commit has a passing `validation_attempts` row | Layer 2 (PreToolUse hook on Bash, parses `git push`) | `scripts/hooks/git-push-guard.sh` |
| MCP calls must include `agent: 'pr-reviewer'`, scope-restricted | Layer 1 (`requireRoles`) | server middleware |
| `validation_record(verdict='pass')` is the only way past the push gate | Layer 1 + Layer 2 (server requires `agent='pr-reviewer'`; hook checks the row exists) | both |
| Can't edit code (read-only review) | Layer 3 (frontmatter `tools:` allowlist excludes Edit/Write) | `agents/pr-reviewer.md` |
| Review output format (`tmb_review-findings`) | Layer 6 only | skill |

### Consultants (`architect`, `cto`, `ceo`, `pm`, project-local custom)

| Interaction | Enforcement | Where |
|---|---|---|
| Cannot write workflow state (`task_create_batch`, `validation_record`, `task_update_status`, `issue_create`) | Layer 1 (`requireRoles` rejects non-bro/non-pr-reviewer callers) | server middleware |
| Spawned only by bro, never by Human directly | Layer 6 only (CLAUDE.md routing rule) | CLAUDE.md `## Routing` |
| Return analyses, never decisions | Layer 6 only | consultant agent prompts |

### git operations (universal)

| Interaction | Enforcement | Where |
|---|---|---|
| Force-push to protected branches blocked | Layer 2 (PreToolUse hook on Bash) | `scripts/hooks/git-guards.sh` |
| Direct commits to `dev`/`main` from outside dev→main PR flow blocked | Layer 2 | `scripts/hooks/git-guards.sh` |
| Worktree branch creation safety (CC bug workaround) | Layer 2 (WorktreeCreate hook) | `scripts/hooks/create-worktree.sh` |
| Push gate (see pr-reviewer section) | Layer 2 | `scripts/hooks/git-push-guard.sh` |

## Open Layer-6-only items: promotion candidates

These currently rely on prompt discipline. Each is a candidate for promotion to a harder layer. **Cost of leaving on Layer 6**: the same compliance ceiling that h3/h4 demonstrated (0% reliability for high-frequency operations).

| Item | Possible promotion | Notes |
|---|---|---|
| `file_registry` md5 + last-seen update on Read | Layer 2 (PostToolUse on Read → direct sqlite3 update of `content_md5`, `last_verified_at`) | Needs schema migration to add `last_verified_at` column. Filed as follow-up. |
| Catchphrase audit ("Trust me bro" without passing validation) | Layer 2 (Stop hook → grep transcript + check `validation_attempts`, write a soft warning row) | Needs a no-FK table to write warnings into; can use `debug_trajectory` or add `bro_warnings` table. Low priority (style enforcement). |
| Triage rule (`simple` triage that ends up writing arch docs) | Layer 2 (PostToolUse on Edit → if path matches `docs/trustmybot/architecture/` and current task is `simple`-triaged, flag inconsistency) | Detection only, not blocking. |
| Welcome banner mandatory | Layer 2 (Stop hook → check first response after `Entering bro mode.` contained banner pattern; inject correction next turn) | Banner phrasing is conversational; only the *presence* of a banner can be enforced. |
| Standards check / verify-context check / triage decision | Stays Layer 6 | Pure judgment; can't be deterministically encoded. |
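
As an illustration of the triage-rule candidate above, the detection step could be as small as a path match plus a triage comparison. This is a sketch under stated assumptions: `check_triage_consistency` is a hypothetical name, and how the hook would look up the current task's triage value is deliberately left out because that lookup is not described here.

```bash
#!/usr/bin/env bash
# Sketch only: `check_triage_consistency` is hypothetical. The caller is assumed
# to already know the edited path and the current task's triage value.

# Prints a warning when a simple-triaged task edits an architecture doc,
# "ok" otherwise. Detection only: it never blocks, so it always returns 0.
check_triage_consistency() {
  local path="$1" triage="$2"

  case "$path" in
    docs/trustmybot/architecture/*|*/docs/trustmybot/architecture/*)
      if [ "$triage" = "simple" ]; then
        echo "inconsistent: simple-triaged task edited $path"
        return 0
      fi
      ;;
  esac

  echo "ok"
}
```

Because the promotion is described as detection only, the function reports and returns success rather than producing a deny decision.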

## How to add a new enforcement

1. **Identify the failure mode.** What's the bad thing that happens if the LLM doesn't comply?
2. **Pick the strongest layer that fits** (table at top of this doc).
3. **Implement**:
- Layer 1: add a `requireRoles` wrapper in the relevant tool handler; add MCP-integration test.
- Layer 2: write the hook script under `scripts/hooks/`, register it in `hooks/hooks.json`, add `tests/hooks/<name>.test.sh`.
- Layer 3: edit the agent's `.md` frontmatter; verify with subagent spawn.
- Layer 4: tighten the tool handler's input validation; add unit test.
- Layer 5: add `paths:` to the skill's SKILL.md frontmatter.
- Layer 6: edit CLAUDE.md / agent prompt / SKILL.md prose; **note explicitly in this doc that the item is Layer-6-only and may need promotion**.
4. **Update this matrix** with the new row.
5. **If demoting from a harder layer to a softer one (e.g. removing a hook), justify in the PR**: what changed about the failure mode that makes the softer layer acceptable?
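
For the Layer 2 step, a hook test mostly consists of piping a JSON payload into the script and inspecting stdout. The sketch below shows one possible shape for a `tests/hooks/<name>.test.sh` file. The repo's real test conventions are not shown in this document, so `assert_hook_output` and `make_stub_hook` are hypothetical, and the stub stands in for a real hook so that the example is self-contained.

```bash
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical; the stub replaces a real hook.

# Runs a hook script with a JSON payload on stdin and checks that stdout
# contains the expected substring. Prints PASS, or FAIL and returns 1.
assert_hook_output() {
  local hook="$1" payload="$2" expected="$3" actual
  actual=$(printf '%s' "$payload" | bash "$hook")
  case "$actual" in
    *"$expected"*) echo "PASS" ;;
    *) echo "FAIL: expected '$expected' in '$actual'"; return 1 ;;
  esac
}

# Writes a stand-in hook to a temp file and prints its path. The stub denies
# Edit calls and stays silent for everything else.
make_stub_hook() {
  local path
  path=$(mktemp)
  cat > "$path" <<'EOF'
input=$(cat)
case "$input" in
  *'"tool_name":"Edit"'*)
    echo '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny"}}'
    ;;
esac
EOF
  echo "$path"
}
```

A test file would then call something like `assert_hook_output "$hook" '{"tool_name":"Edit"}' '"permissionDecision":"deny"'` for each block case, and assert empty output for each allow case.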

## See also

- [`FLOWS.md`](FLOWS.md): workflow flowcharts; cross-references which hook fires when in each flow.
- [`FILES.md`](FILES.md): file-by-file map of `scripts/hooks/`, `hooks/hooks.json`, the MCP middleware.
- [`ERD.md`](ERD.md): schema; the role-by-tool matrix at the bottom is the source of truth for Layer 1's coverage.
7 changes: 6 additions & 1 deletion docs/architecture/FILES.md
@@ -114,10 +114,14 @@ plugin/
│   │   └── README.md # diagnostic usage guide
│   ├── lib/
│   │   └── query-task.sh # shared sqlite helpers (tmb_db_path, tmb_task_spec_status, …)
│   ├── activation-routine.sh # UserPromptSubmit hook: pre-fetches identity + pending issue when bro mode active
│   ├── create-worktree.sh # WorktreeCreate hook (workaround CC #27134/#44965)
│   ├── debug-trajectory.sh # PostToolUse capture for non-MCP calls (TMB_DEBUG_TRAJECTORY=1)
│   ├── git-guards.sh # protected-branch block, force-push block, dual-tier dev→main exception (v0.1.1)
│   ├── git-push-guard.sh # blocks `git push` on unsigned commits; replaces require-review-sign.sh
│   └── require-task-spec.sh # block SWE spawn unless task_id references a valid DB row
│   ├── no-source-edit-from-main.sh # blocks bro from editing source files outside an SWE worktree
│   ├── require-task-spec.sh # block SWE spawn unless task_id references a valid DB row
│   └── session-start-regen-check.sh # SessionStart hook: nudges to run tmb_refresh-architecture when arch docs are stale
│
├── # Bundled MCP server: SQLite trajectory persistence
├── mcp/
@@ -193,6 +197,7 @@ plugin/
├── docs/
│   ├── multi-platform.md # how the per-platform adapter pattern works
│   └── architecture/ # contributor-facing reference
│       ├── ENFORCEMENT.md # 6 enforcement layers + per-agent × per-interaction coverage matrix
│       ├── ERD.md # SQLite schema: Mermaid ER diagram + FK + soft-ref tables
│       ├── FILES.md # this file
│       └── FLOWS.md # workflow flowcharts
22 changes: 22 additions & 0 deletions hooks/hooks.json
@@ -1,5 +1,17 @@
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/hooks/session-start-regen-check.sh",
            "timeout": 5
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "matcher": "",
@@ -50,6 +62,16 @@
          }
        ]
      },
      {
        "matcher": "Edit|Write|MultiEdit|NotebookEdit",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/hooks/no-source-edit-from-main.sh",
            "timeout": 5
          }
        ]
      },
      {
        "matcher": "*",
        "hooks": [
94 changes: 94 additions & 0 deletions scripts/hooks/no-source-edit-from-main.sh
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
# No-source-edit-from-main hook (#108).
#
# Blocks Edit/Write tool calls when bro mode is active and the target is a
# source-code file outside an SWE worktree. Enforces the "bro never edits
# source code directly" doctrine: every code change goes through SWE.
#
# Block conditions (all must be true):
# 1. Trajectory DB exists (this is a TMB project)
# 2. Bro mode is active (transcript shows prior "Entering bro mode." with
# no later "exit bro mode" / "stop being bro")
# 3. CWD is NOT inside .claude/worktrees/ (so this is bro, not SWE)
# 4. Target file is NOT in the bro-allowlist (markdown, license, gitignore,
# git templates, plugin/agent/skill manifests, etc.)
#
# Allow conditions (any one allows):
# - DB missing (not a TMB project)
# - No bro mode (regular Claude Code session)
# - In worktree (SWE legitimately edits source there)
# - Target is in allowlist (docs / configs that are bro's job)
#
# Bypass: TMB_ALLOW_SOURCE_EDIT=1 (emergency override for hotfixes).

set -uo pipefail

INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // ""' 2>/dev/null)

case "$TOOL_NAME" in
  Edit|Write|MultiEdit|NotebookEdit) ;;
  *) exit 0 ;;
esac

if [ "${TMB_ALLOW_SOURCE_EDIT:-0}" = "1" ]; then
  exit 0
fi

DB_PATH="${TRAJECTORY_DB_PATH:-}"
if [ -z "$DB_PATH" ]; then
  PLUGIN_NAME="tmb"
  if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -f "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" ]; then
    PLUGIN_NAME=$(jq -r '.name // "tmb"' "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" 2>/dev/null || echo "tmb")
  fi
  DB_PATH="$PWD/.claude/$PLUGIN_NAME/trajectory.db"
fi

[ -f "$DB_PATH" ] || exit 0

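TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // ""' 2>/dev/null)
if [ -z "$TRANSCRIPT" ] || [ ! -f "$TRANSCRIPT" ]; then
  exit 0
fi
# Bro mode counts as active only when the last "Entering bro mode." line comes
# after the last exit phrase, matching the "no later exit" rule in the header.
# -F keeps the trailing dot literal. The comparison is by transcript line number.
LAST_ENTER=$(grep -nF 'Entering bro mode.' "$TRANSCRIPT" 2>/dev/null | tail -n 1 | cut -d: -f1)
[ -n "$LAST_ENTER" ] || exit 0
LAST_EXIT=$(grep -niE 'exit bro mode|stop being bro' "$TRANSCRIPT" 2>/dev/null | tail -n 1 | cut -d: -f1)
if [ -n "$LAST_EXIT" ] && [ "$LAST_EXIT" -gt "$LAST_ENTER" ]; then
  exit 0
fi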

case "$PWD" in
  *.claude/worktrees/*) exit 0 ;;
esac

TARGET=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.notebook_path // ""' 2>/dev/null)
[ -n "$TARGET" ] || exit 0

BASENAME=$(basename "$TARGET")

case "$TARGET" in
  *.md|*.markdown|*.txt|*.rst) exit 0 ;;
  */docs/*|docs/*) exit 0 ;;
  */templates/*|templates/*) exit 0 ;;
esac

case "$BASENAME" in
  LICENSE|LICENSE.*|.gitignore|.gitattributes|.editorconfig|.npmignore|.dockerignore) exit 0 ;;
  CHANGELOG|CHANGELOG.md|README|README.md) exit 0 ;;
esac

case "$TARGET" in
  *.claude-plugin/plugin.json|*.claude-plugin/marketplace.json) exit 0 ;;
  */hooks/hooks.json|hooks/hooks.json) exit 0 ;;
  */agents/*.md|agents/*.md) exit 0 ;;
  */skills/*/SKILL.md|skills/*/SKILL.md) exit 0 ;;
  *.github/*) exit 0 ;;
esac

REASON="BLOCKED: bro is a pure planner; every code change goes through SWE. Target '$TARGET' looks like source code; route via the code-touching ask chain (tmb_planning-simple or tmb_planning-difficult → task_create_batch → spawn SWE in a worktree). For emergency hotfix-style overrides, set TMB_ALLOW_SOURCE_EDIT=1."

jq -nc --arg reason "$REASON" '{
  hookSpecificOutput: {
    hookEventName: "PreToolUse",
    permissionDecision: "deny",
    permissionDecisionReason: $reason
  }
}'
59 changes: 59 additions & 0 deletions scripts/hooks/session-start-regen-check.sh
@@ -0,0 +1,59 @@
#!/usr/bin/env bash
# Session-start lazy-regen check (#108).
#
# On session start, computes git drift between regen_state.last_seen_sha
# and current HEAD. If drift reaches the threshold (default 25 commits,
# matching tmb_lazy-regen-check), emits additionalContext suggesting bro
# run tmb_refresh-architecture. Pre-empts the manual check that bro is
# supposed to do at the start of every code-touching ask but doesn't
# always remember to do.
#
# Silent no-op when:
# - DB doesn't exist (first activation in a fresh project)
# - regen_state has no row for 'architecture' target (never regen'd)
# - HEAD is missing or matches last_seen_sha (no drift)
# - drift is below threshold
# - sqlite3 / git missing
# Capture failures must never break the user's session.

set -uo pipefail

DB_PATH="${TRAJECTORY_DB_PATH:-}"
if [ -z "$DB_PATH" ]; then
  PLUGIN_NAME="tmb"
  if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -f "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" ]; then
    PLUGIN_NAME=$(jq -r '.name // "tmb"' "${CLAUDE_PLUGIN_ROOT}/.claude-plugin/plugin.json" 2>/dev/null || echo "tmb")
  fi
  DB_PATH="$PWD/.claude/$PLUGIN_NAME/trajectory.db"
fi

[ -f "$DB_PATH" ] || exit 0
command -v sqlite3 >/dev/null 2>&1 || exit 0
command -v git >/dev/null 2>&1 || exit 0
command -v jq >/dev/null 2>&1 || exit 0

LAST_SHA=$(sqlite3 "$DB_PATH" "SELECT last_seen_sha FROM regen_state WHERE target='architecture' LIMIT 1;" 2>/dev/null)
[ -n "$LAST_SHA" ] || exit 0

git rev-parse "$LAST_SHA" >/dev/null 2>&1 || exit 0

HEAD_SHA=$(git rev-parse HEAD 2>/dev/null)
[ -n "$HEAD_SHA" ] || exit 0
[ "$HEAD_SHA" != "$LAST_SHA" ] || exit 0

DRIFT=$(git rev-list --count "$LAST_SHA..HEAD" 2>/dev/null)
[ -n "$DRIFT" ] || exit 0

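THRESHOLD=${TMB_REGEN_DRIFT_THRESHOLD:-25}
# Fall back to the default if the override is not a plain non-negative integer,
# so a malformed value cannot make the numeric test below error out.
case "$THRESHOLD" in ''|*[!0-9]*) THRESHOLD=25 ;; esac
if [ "$DRIFT" -lt "$THRESHOLD" ]; then
  exit 0
fi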

CONTEXT="[tmb session-start regen check] Architecture docs are stale: ${DRIFT} commits since last regen (threshold: ${THRESHOLD}). When convenient, run \`tmb_refresh-architecture\` to bring docs/trustmybot/architecture/auto/ back in sync."

jq -nc --arg ctx "$CONTEXT" '{
  hookSpecificOutput: {
    hookEventName: "SessionStart",
    additionalContext: $ctx
  }
}'