Add a /tdd skill #7
Merged
Execution skill that drives writing code through a disciplined red-green-refactor loop with an enforced observed-failure gate, BDD behavior framing, outside-in double loop for user-facing behavior, and project coding standards + ADRs applied in green (correctness) and refactor (full + YAGNI).
- SKILL.md: front-loaded constraints, Steps 1-5 (config/scope, test list, red-green-refactor loop, outer loop, final verification)
- references/tdd-loop.md: verbatim Three Laws, Canon TDD, gears, gate
- references/bdd-framing.md: behavior naming, GWT->AAA, double loop, mock-vs-stub, observable-only assertions
- references/failure-modes.md: agent TDD-faking modes + the discipline that catches each
- scripts/detect-tdd-context.sh: one-time git + manifest command inference (per-cycle runs use the resolved command directly)
- docs/skills/tdd.md: long-form operator doc per skill-long-form-template (writing-voice compliant: no em-dashes, second person)
- New 'Building' skill category in README.md and docs/skills/README.md (no existing category fits an execution skill)
- Skill counts reconciled to 17 across README.md, CLAUDE.md, docs/concepts.md (also corrects pre-existing issue-triage drift: CLAUDE.md/concepts.md were stale at 15 while the repo had 16; the CLAUDE.md catalog was missing issue-triage entirely)
- docs/quickstart.md: /tdd added to Path A and a combining example
- test-planning SKILL.md description: bidirectional /tdd boundary
- tdd SKILL.md description tightened to 4 sentences, all sibling skills named (test-planning, code-review, plan-a-feature, investigate)
Per instruction: no version bump, no CHANGELOG entry.
Both said 'Five' and omitted /architectural-analysis, which was registered as the sixth sizing-aware skill in d85320f but missed in concepts.md and quickstart.md. Now 'Six', including /architectural-analysis, matching the authoritative docs/sizing.md list and ordering.
- Sizing-aware skill list now includes /architectural-analysis (was the only nav line still listing five, after d85320f added it as the sixth and dcd11b8 fixed concepts/quickstart)
- 'What this plugin does' now names test-driven implementation and the build-it-test-first step in the compose chain, reflecting /tdd
- Fixed /issue-triage link to use ./ prefix like every other entry
From a three-agent documentation audit (counts/links, /tdd accuracy, broad staleness). Counts, links, sizing list, voice, and boundaries verified clean; these are the actionable findings:
- plugin/skills/tdd/SKILL.md: replace !`which git` injection with !`git --version` (covered by existing Bash(git *)); the old form needed Bash(which *) and would stall or silently fail at launch
- docs/skills/tdd.md: state the scope confirmation is the only interactive checkpoint and the loop then runs unattended; note the discovery script's inferred commands are confirmed, not trusted
- detect-tdd-context.sh: .csproj detection now uses find (was a top-level glob that missed src/**/x.csproj layouts)
- docs/yagni.md: add the missing /tdd row (was a contradiction with the YAGNI list in docs/skills/README.md)
- docs/quickstart.md: '(two or three skills)' -> '(a few skills)' (Path A is now five steps); Path A scent now names build-test-first
- docs/concepts.md: fix link label to match its path (docs/guidance/plugin-entity-taxonomy.md)
- CONTRIBUTING.md: add a step to update skill counts in CLAUDE.md, concepts.md, and README when adding a skill (the gap that caused the stale-count drift this review reconciled)
Skipped per scope: CHANGELOG entry (explicit user constraint); pre-existing guidance-directory link convention (not introduced here).
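The `.csproj` detection fix is easiest to see side by side. The following is a minimal sketch, not the contents of `detect-tdd-context.sh`: the function names `has_csproj_glob` and `has_csproj_find` and the `src/App/App.csproj` layout are hypothetical, chosen only to show why a top-level glob misses nested projects.

```shell
# Hypothetical helpers for illustration; not taken from detect-tdd-context.sh.

# Top-level glob: only sees *.csproj files directly inside the directory.
has_csproj_glob() {
  dir="$1"
  for f in "$dir"/*.csproj; do
    # An unmatched glob stays literal, so check that the file really exists.
    [ -e "$f" ] && return 0
  done
  return 1
}

# find-based check: sees *.csproj files at any depth below the directory.
has_csproj_find() {
  dir="$1"
  find "$dir" -type f -name '*.csproj' 2>/dev/null | grep -q .
}

# Demo with an assumed nested layout (src/App/App.csproj).
demo="$(mktemp -d)"
mkdir -p "$demo/src/App"
touch "$demo/src/App/App.csproj"

if has_csproj_glob "$demo"; then echo "glob: found"; else echo "glob: missed"; fi
if has_csproj_find "$demo"; then echo "find: found"; else echo "find: missed"; fi

rm -rf "$demo"
```

With the nested layout, the glob check reports a miss while the find check reports a hit, which is the behavior difference the commit describes.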
The scope confirmation is no longer a human gate. It is a brief informational report the skill emits and then proceeds past without waiting. The branch offer becomes a recommendation, not a prompt. The ~10-item 'pause and ask' becomes a non-blocking scope warning recorded in the final summary while the loop continues. The one exception: if the request or provided context explicitly says the human wants to review/verify/approve the plan or test list before implementation, Step 1+2 present scope and the test list and wait for approval before the Step 3 loop. The only other thing that can block is a test command that cannot be resolved or inferred (a hard dependency, not a discretionary checkpoint). Updated SKILL.md Steps 1, 2, 3, 5; references/failure-modes.md; and docs/skills/tdd.md (TL;DR concept, How to invoke, How to get the most out of it, Cost and latency) to match. No version bump, no CHANGELOG.
mxriverlynn added a commit that referenced this pull request May 26, 2026
The description's final sentence ("It applies the project's coding
standards and ADRs during the green and refactor steps, and enforces
YAGNI during refactor") described internal behavior of the skill, not a
trigger phrase or a boundary against sibling skills. Per the plugin's
skill-description-frontmatter guidance, every sentence must improve
trigger accuracy or disambiguation.
The remaining four sentences cover what (red-green-refactor with the
observed-failure gate), when to use, trigger breadth (paraphrasings),
and boundary (test-planning, code-review, plan-a-feature, investigate).
Frontmatter is Level 1, paid in every conversation; the saved tokens
add up across all users.
Summary
This PR adds `/tdd`, the han plugin's first execution skill, so that Claude Code can implement features test-first through a disciplined red-green-refactor loop rather than producing another document. `/tdd` writes production code and tests directly into your working tree, driven by a BDD-framed (Behavior-Driven Development) behavior list and a strict observed-failure gate.
[...] `issue-triage` entry in CLAUDE.md's catalog).
Behavior changes
Before this PR, han had no skill that wrote code. Every skill produced a report or document.
After this PR, `/tdd` works like this:
[...] `project-discovery.md`, then a one-time inference script. Reports branch, commands, and standards, then proceeds. Not a gate.
The observed-failure gate is the critical constraint: no production code may change until the skill has run the test, pasted the real runner output, and confirmed the test fails for the intended reason. A first-run pass is a process violation: stop and diagnose, do not proceed. This gate is enforced by discipline and shown evidence (pasted runner output, strict sequencing, a first-run-pass stop rule), not by a mechanism that can physically prevent an out-of-order write. The long-form doc is deliberately honest about this limitation.
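The red check at the heart of the gate can be sketched as a small shell function. This is not how the skill enforces the gate (the skill relies on discipline and pasted evidence, not a mechanism), and the function name `observe_red` and the sample commands are hypothetical. The sketch only shows the sequencing rule: show the runner output, and treat a first-run pass as a stop condition.

```shell
# Hypothetical illustration of the red check; not part of the /tdd skill.
# Runs the given test command, prints its real output as evidence, and
# succeeds only if the command FAILED (a red was observed).
observe_red() {
  local output status
  output="$("$@" 2>&1)"
  status=$?
  printf '%s\n' "$output"
  if [ "$status" -eq 0 ]; then
    echo "gate: test passed on first run; stop and diagnose" >&2
    return 1
  fi
  echo "gate: red observed (exit $status); confirm it fails for the intended reason"
  return 0
}

# A failing runner opens the gate; a passing one is blocked.
observe_red sh -c 'echo "1 test failing"; exit 1'
observe_red true || echo "blocked: no production code changes yet"
```

An exit status alone cannot tell whether the test failed for the intended reason, which is why the skill also requires reading the pasted runner output.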
The standards split reconciles two constraints. GREEN obeys standards governing correctness and architectural placement (crossing an ADR -- architectural decision record -- boundary during green is wrong code, not deferrable mess). Stylistic and structural standards, plus YAGNI (You Aren't Gonna Need It) enforcement, are the REFACTOR hat. This keeps green minimal while ensuring the code that survives each cycle respects the project's architecture.
Two things can stop the otherwise autonomous run:
- [...] `project-discovery.md`, the inference script, or manifest inference, the skill asks the user before the loop starts. TDD is impossible without a way to run tests.
What to look at first
- [...] `docs/skills/tdd.md`. The observed-failure gate cannot be enforced by the plugin model: a sub-agent could, in principle, write production code without ever observing a red. The skill makes failure visible and diagnosable instead of pretending to prevent it. This was an explicit design choice; worth knowing before relying on the skill for high-stakes builds.
- [...] `plugin/skills/tdd/SKILL.md`. Splitting coding-standards conformance across two phases is a deliberate departure from "commit whatever sins necessary in green." The rationale: some standards (architectural placement, ADR boundaries) are not sins you can tidy later, they are wrong code. Check whether the split as written matches your intuition of what a "correctness standard" is.
- `allowed-tools` list breadth in `plugin/skills/tdd/SKILL.md`. The skill grants Bash access to npm, pytest, go, cargo, rake, mix, dotnet, gradle, mvn, and make. A language-agnostic TDD loop needs all of these because it cannot know the stack ahead of time. Check whether any runner is missing or the list is wider than needed.
How this was tested
- [...] `bash -n` and run in this repo (no manifest present, correctly emits `inferred-test-command: none` and fails closed to the user-prompt path).
- [...] `/tdd` accuracy against the skill definition, broad staleness) was run; its findings were fixed before this branch was finalized.
Files of interest
- `plugin/skills/tdd/SKILL.md`: the skill definition: the loop, the observed-failure gate, the standards split, and the autonomous-operation design
- `docs/skills/tdd.md`: operator doc: when to use `/tdd`, the honest limitation on gate enforcement, and how to get the most out of it
- `plugin/skills/tdd/references/failure-modes.md`: eight named agent-faking failure modes and the discipline that catches each
- `plugin/skills/tdd/scripts/detect-tdd-context.sh`: the only non-document file: manifest inference for resolving test/lint/build commands when CLAUDE.md and project-discovery.md come up empty
- `plugin/skills/test-planning/SKILL.md`: bidirectional boundary edit: `/test-planning` now points users toward `/tdd` for implementation
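For readers who want a feel for what manifest inference involves, here is a rough sketch. It is not the actual script. The function name `infer_test_command`, the manifest-to-command mapping, the check order, and the output format for a successful match are all assumptions. Only the `inferred-test-command: none` line is quoted from this PR.

```shell
# Hypothetical sketch of manifest inference; the real detect-tdd-context.sh
# may check different files, in a different order, with different output.
infer_test_command() {
  dir="${1:-.}"
  if [ -f "$dir/package.json" ]; then
    echo "inferred-test-command: npm test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "inferred-test-command: cargo test"
  elif [ -f "$dir/go.mod" ]; then
    echo "inferred-test-command: go test ./..."
  elif [ -f "$dir/mix.exs" ]; then
    echo "inferred-test-command: mix test"
  elif find "$dir" -type f -name '*.csproj' 2>/dev/null | grep -q .; then
    echo "inferred-test-command: dotnet test"
  else
    # No known manifest: fail closed so the caller asks the user instead.
    echo "inferred-test-command: none"
  fi
}

# In a directory with no manifest, the sketch reports "none".
empty="$(mktemp -d)"
infer_test_command "$empty"
rm -rf "$empty"
```

Reporting `none` rather than guessing a command matches the fail-closed behavior described under "How this was tested": with no resolvable command, the skill falls back to asking the user.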