Test Planning: Usability and report output updates #43

Merged: mxriverlynn merged 1 commit into main from test-plan-usability on Jun 2, 2026

mxriverlynn (Collaborator) commented Jun 2, 2026

Summary

This PR reworks the /test-planning skill so its output leads with plain-language behavior and recommends only public-API behavioral tests, so that a reader without code context can grasp the plan in thirty seconds and the recommended tests survive refactors.

  • Restructures the test-plan output so it opens with three reader-first sections (a prose summary, the testing work grouped into themes, and a plain-language line per test) and demotes all the technical detail (priority tiers, code paths, coverage counts, the scope table) into a ## Technical Reference region below.
  • Adds a review pass that dispatches two reviewer sub-agents against the freshly generated plan to confirm it actually reads well to an outsider, and applies their actionable edits before finalizing.
  • Threads a behavioral-depth rule through test generation and merging: every recommended test must verify observable behavior at a public boundary, not internal implementation, and the plan stops at the critical behaviors a caller depends on rather than testing every branch.
  • Reviewers should pay extra attention to whether these three changes reinforce each other: the public-API rule decides which tests exist, the plain-language restructure decides how they read, and the review pass verifies the result reads correctly to someone outside the code.
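
To make the behavioral-depth rule concrete, here is a small Python sketch (an illustration, not code from the skill) of its rule of thumb, "if two implementations produce the same observable behavior, the test must pass for both". The `dedupe_emails_v1`/`dedupe_emails_v2` functions and the `check_dedupe_behavior` test are hypothetical examples: the test only feeds inputs to the public function and checks its output, so it passes for both implementations even though their internals differ.

```python
# Hypothetical public function with two interchangeable implementations.
# A behavioral test must pass for both.

def dedupe_emails_v1(emails: list[str]) -> list[str]:
    """Return unique emails, case-insensitively, in first-seen order."""
    # dict keys keep insertion order, so duplicates collapse to the first one seen
    return list(dict.fromkeys(e.strip().lower() for e in emails))


def dedupe_emails_v2(emails: list[str]) -> list[str]:
    """Same contract, different internals: an explicit loop with a seen-set."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in emails:
        email = raw.strip().lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


def check_dedupe_behavior(dedupe) -> None:
    """Behavioral test: only caller inputs and observed outputs."""
    inputs = ["A@x.com", "b@x.com ", "a@X.com", "c@x.com"]
    assert dedupe(inputs) == ["a@x.com", "b@x.com", "c@x.com"]
    assert dedupe([]) == []


for impl in (dedupe_emails_v1, dedupe_emails_v2):
    check_dedupe_behavior(impl)
```

A test that instead depended on one version's internals (for example, expecting a seen-set to exist) could not pass for the other version, even though callers see no difference. That is the kind of internal-implementation recommendation the new rule rules out.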

Behavior changes

When you run /test-planning today, the resulting test plan opens with a scope table and per-item technical detail (each test tagged with a priority tier, code paths, approach, and justification). After this PR, the same skill produces a plan that opens with plain-language sections written for a reader who has not seen the code: a summary paragraph, the work grouped into 2-4 themes, and one plain-language line per test led by its stable test ID. The old technical sections still exist but now live under a ## Technical Reference region at the bottom, with the scope table moved from first to last.
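
As a rough illustration of the new ordering, the hypothetical check below reads a generated plan's markdown headings and confirms that the three plain-language sections come first, that a Technical Reference region exists, and that the scope table is last. The section names come from this PR's description; the exact wording and heading levels in `references/template.md` are assumptions, and this checker is a review aid, not code from this PR.

```python
import re

# Section names come from the PR description; the exact wording and heading
# levels in references/template.md are assumed here.
PLAIN_LANGUAGE = ["Summary", "What Needs Testing and Why", "What Each Test Covers"]
TECH_REF = "Technical Reference"
HEADING = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def headings(plan: str) -> list[str]:
    """Return the text of every markdown heading, in document order."""
    return [m.group(1).strip() for m in HEADING.finditer(plan)]


def check_plan_order(plan: str) -> list[str]:
    """Return layout problems; an empty list means the order matches the intent."""
    found = headings(plan)
    opening = found[:len(PLAIN_LANGUAGE)]
    problems: list[str] = []
    if opening != PLAIN_LANGUAGE:
        problems.append(f"plan should open with {PLAIN_LANGUAGE}, got {opening}")
    if TECH_REF not in found:
        problems.append("missing the Technical Reference region")
    if not found or found[-1] != "Scope":
        problems.append("the scope table should be the last section")
    return problems


sample = """## Summary
Two themes, five tests.
## What Needs Testing and Why
## What Each Test Covers
## Technical Reference
### Test Plan
### Deferred
### Dropped
### Coverage
### Scope
"""
assert check_plan_order(sample) == []
```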

Two further behavioral shifts. First, the skill now runs a fifth step that asks two reviewer sub-agents to read the generated plan and flag structure or comprehension problems; because the plan lives in the chat rather than in a file, the full plan text is embedded directly in each reviewer's prompt. Second, the tests it recommends are now constrained to observable behavior at a public seam (caller inputs, outputs, side effects, and interactions with collaborators): recommendations that reach into internal implementation are rewritten to go through the public boundary, dropped if no public seam exposes them, or collapsed into the one behavioral test that catches the same failures.
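
The review step can be pictured roughly as follows: both reviewer prompts carry the complete plan text, since there is no file for the sub-agents to open, and the two reviews run concurrently. `run_agent`, the prompt wording, and the delimiter lines are hypothetical stand-ins; the real skill's dispatch mechanism is not shown in this PR.

```python
from concurrent.futures import ThreadPoolExecutor

# Reviewer names come from the PR; the focus wording is a paraphrase.
REVIEWERS = {
    "information-architect": "Audit findability and structure. Does the plan lead with plain language?",
    "junior-developer": "Read only the plain-language sections. Do they stand on their own without the code?",
}


def build_prompt(focus: str, plan_text: str) -> str:
    # The plan lives only in the chat, so the full text is embedded in the
    # prompt instead of a file path the reviewer could open.
    return f"{focus}\n\n--- BEGIN TEST PLAN ---\n{plan_text}\n--- END TEST PLAN ---"


def run_agent(name: str, prompt: str) -> str:
    # Hypothetical stand-in for the host's sub-agent dispatch.
    return f"{name}: no findings ({len(prompt)} chars reviewed)"


def review_plan(plan_text: str) -> dict[str, str]:
    """Dispatch both reviewers in parallel and collect their replies by name."""
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        futures = {
            name: pool.submit(run_agent, name, build_prompt(focus, plan_text))
            for name, focus in REVIEWERS.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```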

What to look at first

  • The behavioral-depth operating principle and its rule of thumb ("if two implementations produce the same observable behavior, the test must pass for both") plus the depth ceiling that stops the plan from over-specifying. This is the change with the most judgment baked in: it decides what tests get recommended at all, and the ceiling deliberately trades coverage breadth for refactor durability.
  • The new "behavioral sweep" step in the merge process, which rewrites, drops, or collapses recommendations that reach into internals. Worth checking whether "drop it if no public seam exposes it" could silently discard a legitimately important test.
  • How the review pass handles its two reviewers (an information-architect agent that audits findability and structure, and a junior-developer agent that checks that the plain-language layer stands on its own). Actionable edits are auto-applied; judgment-call findings are surfaced with a recommended resolution. Confirm the auto-apply boundary feels right.
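
A minimal sketch of the auto-apply boundary, assuming each reviewer finding either carries a concrete text replacement (actionable) or only a recommendation (a judgment call). The `Finding` shape and the `triage` helper are hypothetical; the PR does not specify how findings are represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """A reviewer finding (hypothetical shape)."""
    reviewer: str
    summary: str
    old_text: Optional[str] = None   # text to replace, if the finding is actionable
    new_text: Optional[str] = None   # its replacement
    recommendation: str = ""         # suggested resolution for judgment calls


def triage(plan: str, findings: list[Finding]) -> tuple[str, list[str]]:
    """Auto-apply actionable edits; return the updated plan and surfaced findings."""
    surfaced: list[str] = []
    for f in findings:
        actionable = f.old_text is not None and f.new_text is not None
        if actionable and f.old_text in plan:
            plan = plan.replace(f.old_text, f.new_text, 1)
        else:
            # Judgment calls, and edits whose target text is gone, go to the user.
            surfaced.append(f"{f.reviewer}: {f.summary} (recommended: {f.recommendation or 'n/a'})")
    return plan, surfaced


plan, surfaced = triage(
    "## Sumary\nFive tests in two themes.",
    [
        Finding("information-architect", "heading typo", old_text="## Sumary", new_text="## Summary"),
        Finding("junior-developer", "theme name is jargon", recommendation="rename the 'seams' theme"),
    ],
)
```

One design choice in this sketch: an actionable edit whose target text is no longer in the plan is surfaced rather than skipped, so a conflicting edit never disappears silently.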

Files of interest

  • han.core/skills/test-planning/references/template.md — the output template restructure; this is where the plain-language-first spine and the demoted Technical Reference region are defined.
  • han.core/skills/test-planning/SKILL.md — the skill definition carrying the new public-API operating principle, the Step 5 review pass, and the behavioral-sweep merge step.
  • docs/skills/test-planning.md — the operator-facing doc synced to describe all three changes and the now five-step process.

…on public-API tests

Rework the /test-planning output so it reads as a plain-language overview
first instead of a TP-item dump. The template now leads with a Summary, a
What Needs Testing and Why themes section, and a What Each Test Covers
walkthrough, then drops the per-item Test Plan, Deferred, Dropped, Coverage
counts, and Scope into a labeled Technical Reference region below the spine.

Add a Step 5 review pass that dispatches information-architect and
junior-developer in parallel against the generated plan to confirm it leads
with plain language and that the plain-language layer stands on its own.

Add a behavioral-depth operating principle: every recommended test verifies
observable behavior at a public seam (caller inputs, observed outputs and
side effects, collaborator interactions), never private methods or internal
state, and stops at the critical behaviors a caller depends on rather than
over-specifying every branch. Thread it through the agent dispatch prompts,
add a behavioral sweep to the merge step, and sync the long-form operator doc.
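
The behavioral sweep can be sketched as a pass over recommendations that keeps public-seam tests, collapses an internal test into an existing behavioral test that already catches the same failures, rewrites it through a public seam when one exists, and otherwise drops it. The `Rec` fields and the order in which collapse, rewrite, and drop are tried are assumptions for illustration; the PR names the three outcomes but not their precedence.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rec:
    """One recommended test (hypothetical shape)."""
    test_id: str
    description: str
    internal: bool                       # targets private methods or internal state
    public_seam: Optional[str] = None    # public entry point that exposes the behavior
    catches: frozenset = frozenset()     # failure modes this test would detect


def behavioral_sweep(recs: list[Rec]) -> tuple[list[Rec], list[str], list[tuple[str, str]]]:
    """Return (kept, dropped_ids, collapsed_pairs). Assumed precedence:
    collapse first, then rewrite through a public seam, then drop."""
    kept = [r for r in recs if not r.internal]
    dropped: list[str] = []
    collapsed: list[tuple[str, str]] = []
    for r in recs:
        if not r.internal:
            continue
        # Collapse: an already-kept behavioral test catches every failure this one would.
        covering = next((k for k in kept if r.catches and r.catches <= k.catches), None)
        if covering is not None:
            collapsed.append((r.test_id, covering.test_id))
        elif r.public_seam:
            # Rewrite: re-target the test at the public seam instead of the internals.
            kept.append(replace(r, internal=False,
                                description=f"via {r.public_seam}: {r.description}"))
        else:
            # Drop: no public seam exposes this behavior.
            dropped.append(r.test_id)
    return kept, dropped, collapsed


recs = [
    Rec("TP-1", "checkout returns the order total", False,
        catches=frozenset({"wrong-total", "missing-tax"})),
    Rec("TP-2", "_sum_lines adds line prices", True, "checkout()", frozenset({"wrong-total"})),
    Rec("TP-3", "_rate_cache holds tax rates", True, catches=frozenset({"stale-cache"})),
    Rec("TP-4", "_round_money rounds to cents", True, "checkout()", frozenset({"rounding"})),
]
kept, dropped, collapsed = behavioral_sweep(recs)
# kept -> TP-1 and a rewritten TP-4; dropped -> ["TP-3"]; collapsed -> [("TP-2", "TP-1")]
```

Returning the dropped IDs instead of discarding them is one way to address the concern that "drop it if no public seam exposes it" could silently discard an important test: the drops stay visible for a human to check.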
mxriverlynn force-pushed the test-plan-usability branch from 7bd9374 to d571966 on June 2, 2026 14:36
mxriverlynn marked this pull request as ready for review on June 2, 2026 14:46
mxriverlynn merged commit 2e89c04 into main on Jun 2, 2026
mxriverlynn deleted the test-plan-usability branch on June 2, 2026 14:46