Add a /tdd skill by mxriverlynn · Pull Request #7 · testdouble/han

mxriverlynn · 2026-05-18T15:21:27Z

Summary

This PR adds /tdd, the han plugin's first execution skill, so that Claude Code can implement features test-first through a disciplined red-green-refactor loop rather than producing another document.

Every prior han skill returns a markdown artifact. /tdd writes production code and tests directly into your working tree, driven by a BDD-framed (Behavior-Driven Development) behavior list and a strict observed-failure gate.
The skill is autonomous after the initial request: it reports scope (commands found, standards resolved, current branch) and immediately begins writing code without waiting for human confirmation. The one structural exception and the one hard dependency are described in Behavior changes below.
Documentation and indexes were updated to register the new skill and a new "Building" category, and skill counts were reconciled from 15 to 17 (correcting pre-existing drift, including a missing issue-triage entry in CLAUDE.md's catalog).
No plugin version bump and no CHANGELOG entry are included; both were explicit author constraints.

Behavior changes

Before this PR, han had no skill that wrote code. Every skill produced a report or document.

After this PR, /tdd works like this:

Phase	What the skill does
Step 1: Scope	Resolves test/lint/build commands and coding standards from CLAUDE.md, then `project-discovery.md`, then a one-time inference script. Reports branch, commands, and standards, then proceeds. Not a gate.
Step 2: Test list	Builds a BDD behavior list ordered outside-in by user value. Reports the list, then proceeds. Not a gate.
Step 3: Loop	Picks one item, runs RED (write one test, run it, paste the failing output), GREEN (minimum production code to pass, apply correctness and placement standards only), REFACTOR (full standards conformance + YAGNI, non-skippable). Repeats until the list is empty.
Step 4: Outer loop	Runs any outer acceptance tests deferred while the inner loop ran.
Step 5: Summary	Runs the full suite, lint, and build. Summarizes behaviors shipped, YAGNI deferrals, standards applied, and any scope warning.

The observed-failure gate is the critical constraint: no production code may change until the skill has run the test, pasted the real runner output, and confirmed the test fails for the intended reason. A first-run pass is a process violation: stop and diagnose, do not proceed. This gate is enforced by discipline and shown evidence (pasted runner output, strict sequencing, a first-run-pass stop rule), not by a mechanism that can physically prevent an out-of-order write. The long-form doc is deliberately honest about this limitation.
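
The sequencing the gate asks for can be sketched as below. This is an illustration only, not the skill's implementation: `observe_red`, `RedObservation`, and `may_edit_production_code` are hypothetical names, and treating a non-zero exit code as "failed" is an assumption about the runner. The sketch also cannot tell whether the test failed for the intended reason; that still requires reading the pasted output.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RedObservation:
    failed: bool  # True when the runner exited non-zero (assumed convention)
    output: str   # combined stdout/stderr: the evidence to paste


def observe_red(test_command: list[str]) -> RedObservation:
    """Run the test command once and record whether it failed."""
    result = subprocess.run(test_command, capture_output=True, text=True)
    return RedObservation(
        failed=result.returncode != 0,
        output=result.stdout + result.stderr,
    )


def may_edit_production_code(observation: RedObservation) -> bool:
    """A first-run pass is a stop-and-diagnose condition, not a green light."""
    return observation.failed


if __name__ == "__main__":
    # Stand-in for a real test runner: a child Python process that exits 1.
    failing = [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"]
    obs = observe_red(failing)
    print(obs.output.strip())
    print("may edit production code:", may_edit_production_code(obs))
```

As the paragraph above says, a check like this only makes the evidence visible; nothing in it physically prevents an out-of-order write.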

The standards split reconciles two constraints. GREEN obeys standards governing correctness and architectural placement (crossing an ADR -- architectural decision record -- boundary during green is wrong code, not deferrable mess). Stylistic and structural standards, plus YAGNI (You Aren't Gonna Need It) enforcement, are the REFACTOR hat. This keeps green minimal while ensuring the code that survives each cycle respects the project's architecture.

Two things can stop the otherwise autonomous run:

The verify-plan exception. If the initial request or provided context explicitly asks the human to review, verify, or approve the plan or test list before implementation, the skill presents the Step 1 scope report and the Step 2 test list and waits for approval before writing any code.
An unresolvable test command. If the command cannot be resolved from CLAUDE.md, project-discovery.md, the inference script, or manifest inference, the skill asks the user before the loop starts. TDD is impossible without a way to run tests.
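
The fallback order for the test command can be pictured as a chain of resolvers tried in sequence. The sketch below is hypothetical throughout: `resolve_test_command` and the lambda stand-ins are illustrative names, not anything in the skill, and the real sources are markdown files and a shell script rather than Python callables.

```python
from typing import Callable, Optional

# Each resolver returns a test command string, or None if it found nothing.
Resolver = Callable[[], Optional[str]]


def resolve_test_command(resolvers: list[tuple[str, Resolver]]) -> tuple[str, str]:
    """Return (source, command) from the first resolver that yields a command.

    Raises LookupError when every source comes up empty, which corresponds to
    the "ask the user before the loop starts" path.
    """
    for source, resolver in resolvers:
        command = resolver()
        if command:
            return source, command
    raise LookupError("no test command resolved; ask the user")


if __name__ == "__main__":
    # Hypothetical stand-ins for the real sources, in the order described above.
    chain = [
        ("CLAUDE.md", lambda: None),
        ("project-discovery.md", lambda: None),
        ("inference script", lambda: "pytest"),
        ("manifest inference", lambda: "npm test"),
    ]
    print(resolve_test_command(chain))
```

Raising instead of returning a default mirrors the fail-closed behavior: with no test command there is nothing to loop on, so the only sensible move is to stop and ask.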

What to look at first

The discipline-not-mechanism honesty in docs/skills/tdd.md. The observed-failure gate cannot be enforced by the plugin model: a sub-agent could, in principle, write production code without ever observing a red. The skill makes failure visible and diagnosable instead of pretending to prevent it. This was an explicit design choice; worth knowing before relying on the skill for high-stakes builds.
The GREEN-vs-REFACTOR standards split in plugin/skills/tdd/SKILL.md. Splitting coding-standards conformance across two phases is a deliberate departure from "commit whatever sins necessary in green." The rationale: some standards (architectural placement, ADR boundaries) are not sins you can tidy later; they are wrong code. Check whether the split as written matches your intuition of what a "correctness standard" is.
Autonomous-by-default operation. An earlier internal validation pass recommended a human gate before code writes. This PR overrides that: the scope report is informational only, and the skill writes code immediately. The one exception (explicit verify-plan request) is narrow. Satisfy yourself the autonomy design is right for a code-writing skill.
The allowed-tools list breadth in plugin/skills/tdd/SKILL.md. The skill grants Bash access to npm, pytest, go, cargo, rake, mix, dotnet, gradle, mvn, and make. A language-agnostic TDD loop needs all of these because it cannot know the stack ahead of time. Check whether any runner is missing or the list is wider than needed.

How this was tested

✅ The discovery script was syntax-checked with bash -n and run in this repo (no manifest present, correctly emits inferred-test-command: none and fails closed to the user-prompt path).
✅ The SKILL.md frontmatter description was validated to fit within the 1024-character limit and tightened to four sentences with all four sibling-skill boundaries named.
✅ Reference link targets were confirmed to exist on disk.
✅ Doc counts (17 skills, 21 agents, 6 sizing-aware skills) and absence of em-dashes in the long-form doc were grep-verified.
✅ A three-agent documentation audit (counts and links, /tdd accuracy against the skill definition, broad staleness) was run; its findings were fixed before this branch was finalized.
✅ This repo has no automated test suite for plugin markdown or skills; all verification was static analysis and manual audit, not a test runner.
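
Two of the static checks above (the 1024-character description limit and the em-dash scan) are simple enough to sketch. The PR says the counts and em-dashes were grep-verified and does not say how the description length was measured, so the Python below is an illustrative equivalent, not the commands that were actually run. The function names are hypothetical, and counting characters with `len()` is an assumption about how the limit is defined.

```python
MAX_DESCRIPTION_CHARS = 1024  # limit cited in the checklist above
EM_DASH = "\u2014"


def description_fits(description: str, limit: int = MAX_DESCRIPTION_CHARS) -> bool:
    """True when the frontmatter description is within the character limit."""
    return len(description) <= limit


def lines_with_em_dash(text: str) -> list[int]:
    """Return 1-based line numbers that contain an em-dash."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if EM_DASH in line
    ]


if __name__ == "__main__":
    sample = "First line is clean.\nSecond line has " + EM_DASH + " one."
    print(description_fits("Use this skill to build test-first."))
    print(lines_with_em_dash(sample))
```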

Files of interest

plugin/skills/tdd/SKILL.md — the skill definition: the loop, the observed-failure gate, the standards split, and the autonomous-operation design
docs/skills/tdd.md — operator doc: when to use /tdd, the honest limitation on gate enforcement, and how to get the most out of it
plugin/skills/tdd/references/failure-modes.md — eight named agent-faking failure modes and the discipline that catches each
plugin/skills/tdd/scripts/detect-tdd-context.sh — the only non-document file: manifest inference for resolving test/lint/build commands when CLAUDE.md and project-discovery.md come up empty
plugin/skills/test-planning/SKILL.md — bidirectional boundary edit: /test-planning now points users toward /tdd for implementation

Execution skill that drives writing code through a disciplined red-green-refactor loop with an enforced observed-failure gate, BDD behavior framing, outside-in double loop for user-facing behavior, and project coding standards + ADRs applied in green (correctness) and refactor (full + YAGNI).

- SKILL.md: front-loaded constraints, Steps 1-5 (config/scope, test list, red-green-refactor loop, outer loop, final verification)
- references/tdd-loop.md: verbatim Three Laws, Canon TDD, gears, gate
- references/bdd-framing.md: behavior naming, GWT->AAA, double loop, mock-vs-stub, observable-only assertions
- references/failure-modes.md: agent TDD-faking modes + the discipline that catches each
- scripts/detect-tdd-context.sh: one-time git + manifest command inference (per-cycle runs use the resolved command directly)

- docs/skills/tdd.md: long-form operator doc per skill-long-form-template (writing-voice compliant: no em-dashes, second person)
- New 'Building' skill category in README.md and docs/skills/README.md (no existing category fits an execution skill)
- Skill counts reconciled to 17 across README.md, CLAUDE.md, docs/concepts.md (also corrects pre-existing issue-triage drift: CLAUDE.md/concepts.md were stale at 15 while the repo had 16; the CLAUDE.md catalog was missing issue-triage entirely)
- docs/quickstart.md: /tdd added to Path A and a combining example
- test-planning SKILL.md description: bidirectional /tdd boundary
- tdd SKILL.md description tightened to 4 sentences, all sibling skills named (test-planning, code-review, plan-a-feature, investigate)

Per instruction: no version bump, no CHANGELOG entry.

docs/concepts.md and docs/quickstart.md both said 'Five' and omitted /architectural-analysis, which was registered as the sixth sizing-aware skill in d85320f but missed in those two files. They now say 'Six' and include /architectural-analysis, matching the authoritative docs/sizing.md list and ordering.

- Sizing-aware skill list now includes /architectural-analysis (was the only nav line still listing five, after d85320f added it as the sixth and dcd11b8 fixed concepts/quickstart)
- 'What this plugin does' now names test-driven implementation and the build-it-test-first step in the compose chain, reflecting /tdd
- Fixed /issue-triage link to use ./ prefix like every other entry

From a three-agent documentation audit (counts/links, /tdd accuracy, broad staleness). Counts, links, sizing list, voice, and boundaries verified clean; these are the actionable findings:

- plugin/skills/tdd/SKILL.md: replace !`which git` injection with !`git --version` (covered by existing Bash(git *)); the old form needed Bash(which *) and would stall or silently fail at launch
- docs/skills/tdd.md: state the scope confirmation is the only interactive checkpoint and the loop then runs unattended; note the discovery script's inferred commands are confirmed, not trusted
- detect-tdd-context.sh: .csproj detection now uses find (was a top-level glob that missed src/**/x.csproj layouts)
- docs/yagni.md: add the missing /tdd row (was a contradiction with the YAGNI list in docs/skills/README.md)
- docs/quickstart.md: '(two or three skills)' -> '(a few skills)' (Path A is now five steps); Path A scent now names build-test-first
- docs/concepts.md: fix link label to match its path (docs/guidance/plugin-entity-taxonomy.md)
- CONTRIBUTING.md: add a step to update skill counts in CLAUDE.md, concepts.md, and README when adding a skill (the gap that caused the stale-count drift this review reconciled)

Skipped per scope: CHANGELOG entry (explicit user constraint); pre-existing guidance-directory link convention (not introduced here).
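
The .csproj finding above comes down to the difference between a top-level glob and a recursive search. The actual fix lives in a bash script and uses find; the Python below is only an analogue showing why the old form missed nested project files. Both function names and the `src/App/App.csproj` layout are hypothetical.

```python
import tempfile
from pathlib import Path


def has_csproj_top_level(root: Path) -> bool:
    """Old behavior as described: only matches files directly under root."""
    return any(root.glob("*.csproj"))


def has_csproj_recursive(root: Path) -> bool:
    """find-style behavior: matches at any depth under root."""
    return any(root.rglob("*.csproj"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        nested = root / "src" / "App"
        nested.mkdir(parents=True)
        (nested / "App.csproj").write_text("<Project />")
        print("top-level glob:", has_csproj_top_level(root))
        print("recursive search:", has_csproj_recursive(root))
```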

The scope confirmation is no longer a human gate. It is a brief informational report the skill emits and then proceeds past without waiting. The branch offer becomes a recommendation, not a prompt. The ~10-item 'pause and ask' becomes a non-blocking scope warning recorded in the final summary while the loop continues. The one exception: if the request or provided context explicitly says the human wants to review/verify/approve the plan or test list before implementation, Step 1+2 present scope and the test list and wait for approval before the Step 3 loop. The only other thing that can block is a test command that cannot be resolved or inferred (a hard dependency, not a discretionary checkpoint). Updated SKILL.md Steps 1, 2, 3, 5; references/failure-modes.md; and docs/skills/tdd.md (TL;DR concept, How to invoke, How to get the most out of it, Cost and latency) to match. No version bump, no CHANGELOG.

The description's final sentence ("It applies the project's coding standards and ADRs during the green and refactor steps, and enforces YAGNI during refactor") was removed because it described internal behavior of the skill, not a trigger phrase or a boundary against sibling skills. Per the plugin's skill-description-frontmatter guidance, every sentence must improve trigger accuracy or disambiguation. The remaining four sentences cover what (red-green-refactor with the observed-failure gate), when to use, trigger breadth (paraphrasings), and boundary (test-planning, code-review, plan-a-feature, investigate). Frontmatter is Level 1, paid in every conversation; the saved tokens add up across all users.

mxriverlynn added 6 commits May 18, 2026 09:03

mxriverlynn merged commit 8884547 into main May 18, 2026

mxriverlynn deleted the tdd-skill branch May 18, 2026 16:15
