Add a /tdd skill #7

Merged: mxriverlynn merged 6 commits into main from tdd-skill on May 18, 2026

mxriverlynn (Collaborator) commented May 18, 2026

Summary

This PR adds /tdd, the han plugin's first execution skill, so that Claude Code can implement features test-first through a disciplined red-green-refactor loop rather than producing another document.

  • Every prior han skill returns a markdown artifact. /tdd writes production code and tests directly into your working tree, driven by a behavior list framed in BDD (Behavior-Driven Development) terms and a strict observed-failure gate.
  • The skill is autonomous after the initial request: it reports scope (commands found, standards resolved, current branch) and immediately begins writing code without waiting for human confirmation. The one structural exception and the one hard dependency are described in Behavior changes below.
  • Documentation and indexes were updated to register the new skill and a new "Building" category, and skill counts were reconciled from 15 to 17 (correcting pre-existing drift, including a missing issue-triage entry in CLAUDE.md's catalog).
  • No plugin version bump and no CHANGELOG entry are included; both were explicit author constraints.

Behavior changes

Before this PR, han had no skill that wrote code. Every skill produced a report or document.

After this PR, /tdd works like this:

| Phase | What the skill does |
| --- | --- |
| Step 1: Scope | Resolves test/lint/build commands and coding standards from CLAUDE.md, then project-discovery.md, then a one-time inference script. Reports branch, commands, and standards, then proceeds. Not a gate. |
| Step 2: Test list | Builds a BDD behavior list ordered outside-in by user value. Reports the list, then proceeds. Not a gate. |
| Step 3: Loop | Picks one item, then runs RED (write one test, run it, paste the failing output), GREEN (write the minimum production code to pass, applying correctness and placement standards only), and REFACTOR (full standards conformance plus YAGNI, non-skippable). Repeats until the list is empty. |
| Step 4: Outer loop | Runs any outer acceptance tests deferred while the inner loop ran. |
| Step 5: Summary | Runs the full suite, lint, and build. Summarizes behaviors shipped, YAGNI deferrals, standards applied, and any scope warning. |
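The Step 3 loop can be modeled as a small driver that refuses to enter GREEN without an observed red. This is a hypothetical Python sketch, not code from the skill (which is a markdown prompt the agent follows). `run_inner_loop`, `red`, `green`, and `refactor` are illustrative names; the callables stand in for the agent's actions. The outer loop (Step 4) and final verification (Step 5) are left out.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_inner_loop(
    behaviors: Iterable[str],
    red: Callable[[str], bool],
    green: Callable[[str], None],
    refactor: Callable[[str], None],
) -> List[str]:
    """Drive one behavior at a time through RED, GREEN, and REFACTOR.

    red(behavior) writes and runs one test and returns True only when the
    runner output showed a failure for the intended reason.
    """
    todo = deque(behaviors)  # the Step 2 list, already ordered outside-in
    shipped: List[str] = []
    while todo:
        behavior = todo.popleft()
        if not red(behavior):
            # No observed red: production code must not change.
            raise RuntimeError(f"no observed failure for {behavior!r}; stop and diagnose")
        green(behavior)     # minimum code; correctness and placement standards only
        refactor(behavior)  # never skipped; full standards plus YAGNI
        shipped.append(behavior)
    return shipped


# Hypothetical usage: record the order in which phases run.
events = []


def fake_red(behavior: str) -> bool:
    events.append(("RED", behavior))
    return True  # pretend the pasted runner output showed the intended failure


shipped = run_inner_loop(
    ["returns 404 for an unknown slug", "renders the title for a known slug"],
    red=fake_red,
    green=lambda b: events.append(("GREEN", b)),
    refactor=lambda b: events.append(("REFACTOR", b)),
)
print([phase for phase, _ in events])
# ['RED', 'GREEN', 'REFACTOR', 'RED', 'GREEN', 'REFACTOR']
```

The point of the sketch is the ordering guarantee: GREEN is unreachable for a behavior unless RED reported an observed failure, and REFACTOR runs on every cycle rather than only when convenient.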

The observed-failure gate is the critical constraint: no production code may change until the skill has run the test, pasted the real runner output, and confirmed that the test fails for the intended reason. A first-run pass is a process violation: stop and diagnose, do not proceed. The gate is enforced by discipline and visible evidence (pasted runner output, strict sequencing, and a stop rule for first-run passes), not by a mechanism that can physically prevent an out-of-order write. The long-form doc is deliberately honest about this limitation.
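The three possible outcomes of a new test's first run can be written down as a classifier. This is a hypothetical sketch: it assumes a nonzero exit code means the run failed (true of common runners such as pytest, though not universal), and it uses a substring match as a crude stand-in for the agent's judgment about whether the failure happened for the intended reason. `RedVerdict` and `check_red` are illustrative names.

```python
from enum import Enum


class RedVerdict(Enum):
    CONFIRMED = "failure observed for the intended reason; GREEN may begin"
    FIRST_RUN_PASS = "process violation: the new test passed on its first run"
    WRONG_REASON = "test failed, but not for the predicted reason"


def check_red(exit_code: int, runner_output: str, expected_reason: str) -> RedVerdict:
    """Classify the first run of a newly written test.

    exit_code and runner_output are what the runner actually produced.
    expected_reason is a fragment the author predicted would appear in the
    failure output, such as an assertion message.
    """
    if exit_code == 0:
        return RedVerdict.FIRST_RUN_PASS  # stop and diagnose, do not proceed
    if expected_reason not in runner_output:
        return RedVerdict.WRONG_REASON    # fix the test before touching production code
    return RedVerdict.CONFIRMED


verdict = check_red(
    exit_code=1,
    runner_output="AssertionError: expected 'hello-world', got 'Hello World'",
    expected_reason="expected 'hello-world'",
)
print(verdict.name)  # CONFIRMED
```

Distinguishing WRONG_REASON from CONFIRMED matters because a test that fails on an import error or typo proves nothing about the behavior it names.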

The standards split reconciles two constraints. GREEN obeys the standards that govern correctness and architectural placement, because crossing an architectural decision record (ADR) boundary during green produces wrong code, not a mess that can be deferred. Stylistic and structural standards, plus YAGNI (You Aren't Gonna Need It) enforcement, belong to the REFACTOR hat. This keeps green minimal while ensuring that the code surviving each cycle respects the project's architecture.
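One way to picture the split is as a filter over tagged standards. The kinds `correctness`, `placement`, `style`, `structure`, and `yagni`, and the example standards (including ADR-0003), are hypothetical labels for illustration; the skill applies the split through written instructions, not a data structure.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical standard kinds; the PR describes the split in prose only.
GREEN_KINDS = frozenset({"correctness", "placement"})
REFACTOR_KINDS = GREEN_KINDS | {"style", "structure", "yagni"}  # full conformance


@dataclass(frozen=True)
class Standard:
    name: str
    kind: str


def standards_for(phase: str, standards: List[Standard]) -> List[str]:
    """Names of the standards that must hold when the given phase ends."""
    if phase == "GREEN":
        allowed = GREEN_KINDS
    elif phase == "REFACTOR":
        allowed = REFACTOR_KINDS
    else:
        # This sketch only models the two phases the split covers.
        raise ValueError(f"unmodeled phase: {phase!r}")
    return [s.name for s in standards if s.kind in allowed]


standards = [
    Standard("ADR-0003: controllers never query the database", "placement"),
    Standard("functions stay under 20 lines", "style"),
    Standard("no parameters without a current caller", "yagni"),
]
print(standards_for("GREEN", standards))
# ['ADR-0003: controllers never query the database']
```

REFACTOR is a superset of GREEN here, matching "full standards conformance": nothing checked in green is relaxed later.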

Two things can stop the otherwise autonomous run:

  1. The verify-plan exception. If the initial request or provided context explicitly asks the human to review, verify, or approve the plan or test list before implementation, the skill presents the Step 1 scope report and the Step 2 test list and waits for approval before writing any code.
  2. An unresolvable test command. If the command cannot be resolved from CLAUDE.md, project-discovery.md, the inference script, or manifest inference, the skill asks the user before the loop starts. TDD is impossible without a way to run tests.
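Both stop conditions can be modeled together: a first-match lookup over the command sources, and a check that returns the reasons to pause before Step 3. This is a hypothetical Python sketch. The lookup lambdas stand in for reading each real source, and `plan_review_requested` stands in for the agent's reading of the request, which is a judgment, not a flag.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (source name, lookup) pairs; each lookup returns a command or None.
Lookup = Tuple[str, Callable[[], Optional[str]]]


def resolve_test_command(lookups: Sequence[Lookup]) -> Tuple[Optional[str], Optional[str]]:
    """First match wins: return (command, source), or (None, None) if all are empty."""
    for source, lookup in lookups:
        command = lookup()
        if command:  # None and "" both mean "not found here"
            return command, source
    return None, None


def blockers_before_loop(plan_review_requested: bool,
                         test_command: Optional[str]) -> List[str]:
    """Reasons to pause before the Step 3 loop; an empty list means proceed."""
    reasons = []
    if plan_review_requested:
        reasons.append("present scope report and test list, wait for approval")
    if test_command is None:
        reasons.append("ask the user how to run the tests")
    return reasons


# Hypothetical lookups standing in for reading each real source, in order.
lookups = [
    ("CLAUDE.md", lambda: None),
    ("project-discovery.md", lambda: None),
    ("detect-tdd-context.sh", lambda: "pytest"),
]
command, source = resolve_test_command(lookups)
print(command, source)                       # pytest detect-tdd-context.sh
print(blockers_before_loop(False, command))  # []
```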

What to look at first

  • The discipline-not-mechanism honesty in docs/skills/tdd.md. The observed-failure gate cannot be enforced by the plugin model: a sub-agent could, in principle, write production code without ever observing a red. The skill makes failure visible and diagnosable instead of pretending to prevent it. This was an explicit design choice, and it is worth knowing before you rely on the skill for high-stakes builds.
  • The GREEN-vs-REFACTOR standards split in plugin/skills/tdd/SKILL.md. Splitting coding-standards conformance across two phases is a deliberate departure from "commit whatever sins necessary in green." The rationale: some standards (architectural placement, ADR boundaries) are not sins you can tidy later; they are wrong code. Check whether the split as written matches your intuition of what a "correctness standard" is.
  • Autonomous-by-default operation. An earlier internal validation pass recommended a human gate before code writes. This PR overrides that recommendation: the scope report is informational only, and the skill writes code immediately. The one exception (an explicit verify-plan request) is narrow. Satisfy yourself that the autonomy design is right for a code-writing skill.
  • The allowed-tools list breadth in plugin/skills/tdd/SKILL.md. The skill grants Bash access to npm, pytest, go, cargo, rake, mix, dotnet, gradle, mvn, and make. A language-agnostic TDD loop needs all of these because it cannot know the stack ahead of time. Check whether any runner is missing or the list is wider than needed.
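A quick way to review the breadth question is to check candidate commands against the listed runners by their leading executable. This is a simplified, hypothetical model: it assumes a grant matches on the first word of the command, which is not necessarily how Claude Code evaluates allowed-tools patterns. Under that assumption, wrapper or alternative commands such as `./gradlew test`, `bundle exec rspec`, or `yarn test` would not match, which is the kind of gap the review question asks about.

```python
import shlex
from typing import FrozenSet

# The ten runners the PR lists for the allowed-tools Bash grant.
ALLOWED_RUNNERS: FrozenSet[str] = frozenset(
    {"npm", "pytest", "go", "cargo", "rake", "mix", "dotnet", "gradle", "mvn", "make"}
)


def runner_allowed(command: str) -> bool:
    """True when the command's leading executable is one of the listed runners."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in ALLOWED_RUNNERS


for cmd in ["npm test", "go test ./...", "./gradlew test", "bundle exec rspec", "yarn test"]:
    print(f"{cmd!r}: {runner_allowed(cmd)}")
```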

How this was tested

  • ✅ The discovery script was syntax-checked with bash -n and run in this repo (no manifest present, correctly emits inferred-test-command: none and fails closed to the user-prompt path).
  • ✅ The SKILL.md frontmatter description was validated to fit within the 1024-character limit and tightened to four sentences with all four sibling-skill boundaries named.
  • ✅ Reference link targets were confirmed to exist on disk.
  • ✅ Doc counts (17 skills, 21 agents, 6 sizing-aware skills) and absence of em-dashes in the long-form doc were grep-verified.
  • ✅ A three-agent documentation audit (counts and links, /tdd accuracy against the skill definition, broad staleness) was run; its findings were fixed before this branch was finalized.
  • ✅ This repo has no automated test suite for plugin markdown or skills; all verification was static analysis and manual audit, not a test runner.
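The static checks in the list above are easy to script. This hypothetical Python version checks the 1024-character description limit, a four-sentence target using a naive regex sentence split, and em-dash (U+2014) occurrences per line. The PR reports grep and manual audit for these checks, not this code, and `description_problems` and `em_dash_lines` are illustrative names.

```python
import re
from typing import List

MAX_DESCRIPTION_CHARS = 1024  # frontmatter limit cited in the PR
EM_DASH = "\u2014"


def description_problems(description: str, max_sentences: int = 4) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(f"{len(description)} chars exceeds {MAX_DESCRIPTION_CHARS}")
    # Naive split: a sentence ends at ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", description.strip()) if s]
    if len(sentences) > max_sentences:
        problems.append(f"{len(sentences)} sentences exceeds {max_sentences}")
    return problems


def em_dash_lines(text: str) -> List[int]:
    """1-based numbers of lines that contain an em-dash."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if EM_DASH in line]


print(description_problems("One. Two. Three. Four. Five."))
# ['5 sentences exceeds 4']
print(em_dash_lines("plain line\nan em \u2014 dash\nanother plain line"))
# [2]
```

The naive splitter miscounts abbreviations such as "e.g. this", so treat its sentence count as a hint, not a verdict.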

Files of interest

  • plugin/skills/tdd/SKILL.md — the skill definition: the loop, the observed-failure gate, the standards split, and the autonomous-operation design
  • docs/skills/tdd.md — operator doc: when to use /tdd, the honest limitation on gate enforcement, and how to get the most out of it
  • plugin/skills/tdd/references/failure-modes.md — eight named agent-faking failure modes and the discipline that catches each
  • plugin/skills/tdd/scripts/detect-tdd-context.sh — the only non-document file: manifest inference for resolving test/lint/build commands when CLAUDE.md and project-discovery.md come up empty
  • plugin/skills/test-planning/SKILL.md — bidirectional boundary edit: /test-planning now points users toward /tdd for implementation
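For readers unfamiliar with manifest inference, the idea is a first-match scan for well-known build files at the repository root. The table below is hypothetical: detect-tdd-context.sh is a bash script whose exact manifests, ordering, and commands may differ. Only the `inferred-test-command: none` output line and the fail-closed behavior come from the PR. .csproj detection is omitted because it needs a recursive search rather than a fixed filename.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical manifest -> test command table, checked in order.
MANIFEST_COMMANDS = [
    ("package.json", "npm test"),
    ("pyproject.toml", "pytest"),
    ("go.mod", "go test ./..."),
    ("Cargo.toml", "cargo test"),
    ("mix.exs", "mix test"),
    ("pom.xml", "mvn test"),
    ("build.gradle", "gradle test"),
    ("Rakefile", "rake test"),
    ("Makefile", "make test"),
]


def infer_test_command(repo: Path) -> Optional[str]:
    """Return the command for the first manifest found at the repo root."""
    for manifest, command in MANIFEST_COMMANDS:
        if (repo / manifest).is_file():
            return command
    return None  # fail closed: the caller asks the user


def report_line(repo: Path) -> str:
    return f"inferred-test-command: {infer_test_command(repo) or 'none'}"


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    empty_report = report_line(repo)
    (repo / "go.mod").write_text("module example.com/demo\n")
    go_report = report_line(repo)

print(empty_report)  # inferred-test-command: none
print(go_report)     # inferred-test-command: go test ./...
```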

Execution skill that drives writing code through a disciplined
red-green-refactor loop with an enforced observed-failure gate, BDD
behavior framing, outside-in double loop for user-facing behavior, and
project coding standards + ADRs applied in green (correctness) and
refactor (full + YAGNI).

- SKILL.md: front-loaded constraints, Steps 1-5 (config/scope, test
  list, red-green-refactor loop, outer loop, final verification)
- references/tdd-loop.md: verbatim Three Laws, Canon TDD, gears, gate
- references/bdd-framing.md: behavior naming, GWT->AAA, double loop,
  mock-vs-stub, observable-only assertions
- references/failure-modes.md: agent TDD-faking modes + the discipline
  that catches each
- scripts/detect-tdd-context.sh: one-time git + manifest command
  inference (per-cycle runs use the resolved command directly)
- docs/skills/tdd.md: long-form operator doc per skill-long-form-template
  (writing-voice compliant: no em-dashes, second person)
- New 'Building' skill category in README.md and docs/skills/README.md
  (no existing category fits an execution skill)
- Skill counts reconciled to 17 across README.md, CLAUDE.md,
  docs/concepts.md (also corrects pre-existing issue-triage drift:
  CLAUDE.md/concepts.md were stale at 15 while the repo had 16; the
  CLAUDE.md catalog was missing issue-triage entirely)
- docs/quickstart.md: /tdd added to Path A and a combining example
- test-planning SKILL.md description: bidirectional /tdd boundary
- tdd SKILL.md description tightened to 4 sentences, all sibling
  skills named (test-planning, code-review, plan-a-feature, investigate)

Per instruction: no version bump, no CHANGELOG entry.

concepts.md and quickstart.md both said 'Five' sizing-aware skills and
omitted /architectural-analysis, which was registered as the sixth
sizing-aware skill in d85320f but missed in those two docs. Both now
say 'Six' and include /architectural-analysis, matching the list and
ordering in the authoritative docs/sizing.md.
- Sizing-aware skill list now includes /architectural-analysis (was
  the only nav line still listing five, after d85320f added it as the
  sixth and dcd11b8 fixed concepts/quickstart)
- 'What this plugin does' now names test-driven implementation and
  the build-it-test-first step in the compose chain, reflecting /tdd
- Fixed /issue-triage link to use the ./ prefix like every other entry

From a three-agent documentation audit (counts/links, /tdd accuracy,
broad staleness). Counts, links, sizing list, voice, and boundaries
verified clean; these are the actionable findings:

- plugin/skills/tdd/SKILL.md: replace !`which git` injection with
  !`git --version` (covered by existing Bash(git *)); the old form
  needed Bash(which *) and would stall or silently fail at launch
- docs/skills/tdd.md: state the scope confirmation is the only
  interactive checkpoint and the loop then runs unattended; note the
  discovery script's inferred commands are confirmed, not trusted
- detect-tdd-context.sh: .csproj detection now uses find (was a
  top-level glob that missed src/**/x.csproj layouts)
- docs/yagni.md: add the missing /tdd row (was a contradiction with
  the YAGNI list in docs/skills/README.md)
- docs/quickstart.md: '(two or three skills)' -> '(a few skills)'
  (Path A is now five steps); Path A scent now names build-test-first
- docs/concepts.md: fix link label to match its path
  (docs/guidance/plugin-entity-taxonomy.md)
- CONTRIBUTING.md: add a step to update skill counts in CLAUDE.md,
  concepts.md, and README when adding a skill (the gap that caused
  the stale-count drift this review reconciled)
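The .csproj fix is easy to reproduce. In this Python analogy, `Path.glob("*.csproj")` plays the role of the old top-level glob and `Path.rglob("*.csproj")` plays the role of `find . -name '*.csproj'`. The directory layout is made up for the demo, and this is not the script's actual code.

```python
import tempfile
from pathlib import Path
from typing import List


def top_level_csproj(repo: Path) -> List[str]:
    # Old behavior, analogous to a top-level shell glob: repo root only.
    return sorted(p.relative_to(repo).as_posix() for p in repo.glob("*.csproj"))


def recursive_csproj(repo: Path) -> List[str]:
    # Fixed behavior, analogous to `find . -name '*.csproj'`: any depth.
    return sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*.csproj"))


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "src" / "App").mkdir(parents=True)
    (repo / "src" / "App" / "App.csproj").write_text("<Project />")
    old, new = top_level_csproj(repo), recursive_csproj(repo)

print(old)  # []
print(new)  # ['src/App/App.csproj']
```

A recursive search can also pick up projects under vendored or generated directories, so a real script may want to prune paths such as bin/ and obj/.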

Skipped per scope: CHANGELOG entry (explicit user constraint);
pre-existing guidance-directory link convention (not introduced here).

The scope confirmation is no longer a human gate. It is a brief
informational report the skill emits and then proceeds past without
waiting. The branch offer becomes a recommendation, not a prompt. The
roughly ten-item 'pause and ask' list becomes a non-blocking scope
warning, recorded in the final summary while the loop continues.

The one exception: if the request or provided context explicitly says
the human wants to review/verify/approve the plan or test list before
implementation, Steps 1 and 2 present the scope report and the test list and wait for
approval before the Step 3 loop. The only other thing that can block is
a test command that cannot be resolved or inferred (a hard dependency,
not a discretionary checkpoint).

Updated SKILL.md Steps 1, 2, 3, 5; references/failure-modes.md; and
docs/skills/tdd.md (TL;DR concept, How to invoke, How to get the most
out of it, Cost and latency) to match. No version bump, no CHANGELOG.

mxriverlynn merged commit 8884547 into main on May 18, 2026
mxriverlynn deleted the tdd-skill branch on May 18, 2026 at 16:15
mxriverlynn added a commit that referenced this pull request on May 26, 2026
The description's final sentence ("It applies the project's coding
standards and ADRs during the green and refactor steps, and enforces
YAGNI during refactor") described internal behavior of the skill, not a
trigger phrase or a boundary against sibling skills. Per the plugin's
skill-description-frontmatter guidance, every sentence must improve
trigger accuracy or disambiguation.

The remaining four sentences cover what (red-green-refactor with the
observed-failure gate), when to use, trigger breadth (paraphrasings),
and boundary (test-planning, code-review, plan-a-feature, investigate).
Frontmatter is Level 1, paid in every conversation; the saved tokens
add up across all users.