Test Planning: Usability and report output updates by mxriverlynn · Pull Request #43 · testdouble/han

mxriverlynn · 2026-06-02T14:34:53Z

Summary

This PR reworks the /test-planning skill so that its output leads with plain-language behavior and recommends only public-API behavioral tests. The goal is that a reader without code context can grasp the plan in thirty seconds, and that the recommended tests survive refactors.

Restructures the test-plan output so it opens with three reader-first sections (a prose summary, the testing work grouped into themes, and a plain-language line per test) and demotes all the technical detail (priority tiers, code paths, coverage counts, the scope table) into a ## Technical Reference region below.
Adds a review pass that dispatches two reviewer sub-agents against the freshly generated plan to confirm it actually reads well to an outsider, and applies their actionable edits before finalizing.
Threads a behavioral-depth rule through test generation and merging: every recommended test must verify observable behavior at a public boundary, not internal implementation, and the plan stops at the critical behaviors a caller depends on rather than testing every branch.
Reviewers should pay extra attention to whether these three changes reinforce each other: the public-API rule decides which tests exist, the plain-language restructure decides how they read, and the review pass verifies the result reads correctly to someone outside the code.

Behavior changes

When you run /test-planning today, the resulting test plan opens with a scope table and per-item technical detail (each test tagged with a priority tier, code paths, approach, and justification). After this PR, the same skill produces a plan that opens with plain-language sections written for a reader who has not seen the code: a summary paragraph, the work grouped into 2-4 themes, and one plain-language line per test led by its stable test ID. The old technical sections still exist but now live under a ## Technical Reference region at the bottom, with the scope table moved from first to last.

There are two further behavioral shifts. First, the skill now runs a fifth step that asks two reviewer sub-agents to read the generated plan and flag structure or comprehension problems. Because the plan lives in the chat rather than in a file, the full plan text is embedded directly in each reviewer's prompt. Second, the tests it recommends are now constrained to observable behavior at a public seam (caller inputs, outputs, side effects, and interactions with collaborators). Recommendations that target internal implementation are rewritten to go through the public boundary, dropped if no public seam exposes them, or collapsed into the one behavioral test that catches the same failures.
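To make the public-seam constraint concrete, here is a minimal sketch in Python. It is illustrative only and is not taken from the skill or its template: the calculator classes and the `check_discounted_total` helper are hypothetical names. It shows a test that inspects only caller inputs and the returned value, so it passes for two implementations with different internals.

```python
from typing import List, Protocol


class PriceCalculator(Protocol):
    """Hypothetical public boundary: the only surface the test may touch."""

    def total_cents(self, prices_cents: List[int], discount_pct: int) -> int: ...


class LoopCalculator:
    """Sums prices with an explicit loop, then applies the discount."""

    def total_cents(self, prices_cents: List[int], discount_pct: int) -> int:
        running = 0
        for price in prices_cents:
            running += price
        return running * (100 - discount_pct) // 100


class BuiltinCalculator:
    """Same observable behavior, different internals."""

    def total_cents(self, prices_cents: List[int], discount_pct: int) -> int:
        return sum(prices_cents) * (100 - discount_pct) // 100


def check_discounted_total(calc: PriceCalculator) -> bool:
    # Behavioral check: only the inputs and the returned value are inspected,
    # never private attributes or intermediate state.
    return calc.total_cents([1000, 2000, 3000], 10) == 5400


# The same check runs unchanged against both implementations.
results = [check_discounted_total(c) for c in (LoopCalculator(), BuiltinCalculator())]
```

A test that asserted on the loop's running total, by contrast, would break as soon as the implementation switched to `sum`, even though no caller could observe the difference.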

What to look at first

The behavioral-depth operating principle and its rule of thumb ("if two implementations produce the same observable behavior, the test must pass for both") plus the depth ceiling that stops the plan from over-specifying. This is the change with the most judgment baked in: it decides what tests get recommended at all, and the ceiling deliberately trades coverage breadth for refactor durability.
The new "behavioral sweep" step in the merge process, which rewrites, drops, or collapses recommendations that reach into internals. Worth checking whether "drop it if no public seam exposes it" could silently discard a legitimately important test.
How the review pass handles its two reviewers (an information-architect agent that audits findability and structure, and a junior-developer agent that checks the plain-language layer stands on its own). Actionable edits are auto-applied; judgment-call findings are surfaced with a recommended resolution. Confirm the auto-apply boundary feels right.
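The rewrite, drop, or collapse decision described for the behavioral sweep can be sketched as a small triage function. This is a hypothetical model for discussion, not the skill's actual logic: the `Recommendation` fields, the `TP-` style IDs, and the precedence order (collapse, then rewrite, then drop) are all assumptions. The sketch logs every drop so that a discarded test is visible rather than silent, which is one way to address the concern raised above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Recommendation:
    test_id: str
    targets_internals: bool
    # Hypothetical: public entry point through which the behavior is observable.
    public_seam: Optional[str] = None
    # Hypothetical: ID of a behavioral test that catches the same failures.
    covered_by: Optional[str] = None


def behavioral_sweep(
    recs: List[Recommendation],
) -> Tuple[List[Recommendation], List[Tuple[str, str]]]:
    """Return the surviving recommendations plus a log of every action taken."""
    kept: List[Recommendation] = []
    log: List[Tuple[str, str]] = []
    for rec in recs:
        if not rec.targets_internals:
            kept.append(rec)  # already behavioral, keep as-is
        elif rec.covered_by is not None:
            log.append((rec.test_id, "collapsed into " + rec.covered_by))
        elif rec.public_seam is not None:
            kept.append(replace(rec, targets_internals=False))
            log.append((rec.test_id, "rewritten via " + rec.public_seam))
        else:
            log.append((rec.test_id, "dropped: no public seam"))
    return kept, log


plan = [
    Recommendation("TP-1", targets_internals=False),
    Recommendation("TP-2", targets_internals=True, public_seam="checkout()"),
    Recommendation("TP-3", targets_internals=True, covered_by="TP-1"),
    Recommendation("TP-4", targets_internals=True),
]
kept, log = behavioral_sweep(plan)
```

In this sketch `TP-4` is the case worth reviewing by hand: it has no public seam and no covering test, so it leaves the plan, and the log entry is the only trace that it existed.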

Files of interest

han.core/skills/test-planning/references/template.md — the output template restructure; this is where the plain-language-first spine and the demoted Technical Reference region are defined.
han.core/skills/test-planning/SKILL.md — the skill definition carrying the new public-API operating principle, the Step 5 review pass, and the behavioral-sweep merge step.
docs/skills/test-planning.md — the operator-facing doc synced to describe all three changes and the now five-step process.

…on public-API tests

Rework the /test-planning output so it reads as a plain-language overview first instead of a TP-item dump. The template now leads with a Summary, a What Needs Testing and Why themes section, and a What Each Test Covers walkthrough, then drops the per-item Test Plan, Deferred, Dropped, Coverage counts, and Scope into a labeled Technical Reference region below the spine.

Add a Step 5 review pass that dispatches information-architect and junior-developer in parallel against the generated plan to confirm it leads with plain language and the plain-language layer stands on its own.

Add a behavioral-depth operating principle: every recommended test verifies observable behavior at a public seam (caller inputs, observed outputs and side effects, collaborator interactions), never private methods or internal state, and stops at the critical behaviors a caller depends on rather than over-specifying every branch. Thread it through the agent dispatch prompts, add a behavioral sweep to the merge step, and sync the long-form operator doc.

mxriverlynn force-pushed the test-plan-usability branch from 7bd9374 to d571966 June 2, 2026 14:36

mxriverlynn marked this pull request as ready for review June 2, 2026 14:46

mxriverlynn merged commit 2e89c04 into main Jun 2, 2026

mxriverlynn deleted the test-plan-usability branch June 2, 2026 14:46

mxriverlynn merged 1 commit into main from test-plan-usability
