Add a /research skill by mxriverlynn · Pull Request #8 · testdouble/han

mxriverlynn · 2026-05-19T16:06:17Z

Summary

This PR adds /research, a sizing-aware skill that takes an open-ended question and returns an evidence-backed, adversarially validated report. The report recommends an option without producing any committed artifact. This gives Han a question-shaped sibling to /investigate instead of overloading the bug pipeline.

Adds the new /research skill (Han's 7th sizing-aware skill, 19th skill overall) plus a new research-analyst agent (22nd agent). It reuses codebase-explorer for codebase-grounded evidence and adversarial-validator to attack the result. The investigation behind this (recommendation.md) deliberately rejected expanding /investigate or building a two-mode skill, on Han's single-responsibility rule.
Hardens the shared adversarial-validator agent with a new, generally-applicable 4th strategy ("challenge the evidence-gathering integrity"). This is the load-bearing change: it moves the web-reach threat model (indirect prompt injection, astroturfing, single-source laundering) from brief text into the agent's hardcoded contract, and also strengthens /investigate. Reviewers should weigh whether this 4th strategy is correctly scoped as additive and low-risk for existing consumers.
Behavior controls worth scrutiny: web content is treated as data never instruction, the web-facing agent brief is isolated from codebase/operator context, evidence is required by default with an opt-in "exploratory" mode, and every report labels what does and does not have evidence.
Mechanical fan-out across the repo: bidirectional "Does not X — use research" routing added to 5 neighbor skills (and their long-form docs), skill/agent counts bumped 18→19 / 21→22, sizing lists moved six→seven, and ~36 doc footers updated.

Behavior changes

Before: an operator with an open-ended question ("what are my options for X", "should I use A or B", "how does Y work", "what's the prior art") had no skill that owned it. /investigate is a symptom→root-cause→fix pipeline; /plan-a-feature, /coding-standard, /gap-analysis, and /architectural-analysis each do research only as a bounded step toward a fixed artifact. There was no way to research options before committing to anything.
After: /research <question> classifies and sizes the question, fans research agents out across the codebase, the open web, and provided material, consolidates a numbered evidence/artifact registry, builds an options landscape, recommends one option, runs an adversarial-validation pass that can overturn the recommendation, and writes a fixed-structure report (plain-language Summary → Research Results → Options → Recommendation+evidence-basis → Validation → Artifacts registry → References). It never emits a spec, standard, gap report, architecture assessment, or code. Out-of-scope, hybrid, and compound requests are routed or split rather than forced through. /investigate and 4 other neighbors now route research-shaped requests back to /research.

What to look at first

The shared adversarial-validator 4th-strategy change (plugin/agents/adversarial-validator.md). This is the only edit that changes the behavior of an existing, multi-consumer agent. CRIT-001 in the code review found that the web-reach threat model depended on brief text overriding the agent's closed 3-strategy contract; the fix promotes the threat model into the agent itself. Confirm the "all three"/"minimum 5" wording was updated consistently and that the change stays additive for /investigate.
The web-reach trust model (plugin/skills/research/SKILL.md Operating Principles + Step 5, research-analyst.md anti-patterns). The skill reaches the live web, so untrusted content is a first-class input. Decisions D16 (data-not-instruction, context isolation, trust labeling) and D11 (corroboration, retrieval dates) are the defenses. WARN-001 specifically tightened Step 5's brief exclusion to match the stricter Operating Principle — worth confirming the two now agree.
Evidence mode and report structure (D23, D24) — evidence required by default, opt-in "exploratory" mode, and one fixed report structure with inline artifact-ID cross-referencing. Reviewers should weigh whether the strict/exploratory split and the always-present Artifacts/References sections match the spec.
Reviewer note (not a defect to fix here): the new files use em-dashes, which writing-voice.md bans unconditionally — but every existing plugin file already uses them. The code review surfaced this (WARN-004) as a repo-wide standard-vs-practice contradiction and deliberately did not de-em-dash /research in isolation, since that would make it the lone outlier. This wants a repo-wide decision, not a fix in this PR.

Files of interest

plugin/skills/research/SKILL.md — the skill's behavior: classification, sizing, roster, web-reach controls, 8-step flow.
plugin/agents/research-analyst.md — the new agent owning the web/prior-art and option-comparison angles, with the data-not-instruction anti-patterns.
plugin/agents/adversarial-validator.md — the shared-agent 4th-strategy change; the only behavior edit affecting an existing consumer (/investigate).
docs/plans/research-skill/recommendation.md — why /research is a separate skill, with the adversarial validation that corrected the original evidence.
docs/plans/research-skill/artifacts/decision-log.md — D1–D24, where every behavioral tradeoff (web reach, roster, sizing, untrusted-source controls, evidence mode, report structure) was settled.

Evidence-based investigation with adversarial validation. Recommends a separate /research skill scoped to open-ended, output-agnostic research. Includes plain-language summary, for/against evidence table, and four cross-referenced artifacts (investigation angles + adversarial pass).

plan-a-feature Steps 1-5: behavioral spec for a new /research skill (question -> evidence -> options landscape -> recommendation -> adversarial validation), 11 full + 3 trivial decisions. Three forks settled by user: web+codebase reach, new research agent + reuse, and small/medium/large swarm sizing. No tech-notes qualified.

plan-a-feature Steps 5.5-7. Medium-size review team (junior-developer, gap-analyzer, edge-case-explorer, adversarial-security-analyst). Resolved 16 major + 6 minor findings: added untrusted-web-source handling (D16), research sizing signals (D15), compound-question (D17), hybrid-routing (D18), output-collision guard (D19); strengthened evidence sourcing (D11) and validator charter (D7); dropped gap-analyzer from the roster per user (D4). Decision log + findings log updated and cross-referenced.

plan-a-feature Step 8. project-manager (synthesis mode) verified all 22 findings discharged in-file, confirmed cross-reference invariants and no mechanics leak, and fixed a broken anchor (D14 promoted to heading so the spec's #d14-invocation-surface link resolves).

D20: rollout plan owned by plan-implementation, ~14+ files with the count/sizing surfaces enumerated. D21: group /research next to /investigate under a relabeled "Investigation & research" grouping. Spec Open Items, Summary, and Out of Scope updated; decision log and findings log cross-referenced. OI-3 remains, pending the skills-calling-skills investigation.

Full /investigate run (3 evidence-based-investigators + claude-code-guide + adversarial-validator). Adversarial pass overturned the naive "blanket-ban" reading: data-fetch sub-skills are evidenced-unreliable, orchestration is underdetermined (unsupported assertion, no documented failure), recommended pattern is Agent-tool dispatch + inline discovery. Decisive for OI-3 (V8): /research invokes no skills (routing = naming a sibling, not calling it), so it already complies; only build-time check is that SKILL.md allowed-tools omits Skill. Broader six-file guidance contradiction tracked as a separate ADR-worthy Han maintenance item. Investigation artifact added; OI-3 closed; spec/decision-log/findings cross-referenced. All open items resolved.
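The build-time check described above (SKILL.md allowed-tools omits Skill) could be sketched roughly as follows. This is a minimal illustration, not the repository's actual check: the function names are hypothetical, and it assumes the frontmatter is delimited by `---` lines with `allowed-tools` written as a single comma-separated line.

```python
import re


def allowed_tools(skill_md: str) -> list[str]:
    """Return the tool names listed under `allowed-tools` in the frontmatter.

    Assumption: frontmatter sits between `---` lines at the top of the file,
    and `allowed-tools` is one comma-separated line.
    """
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if match is None:
        return []
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "allowed-tools":
            # Reduce entries such as "Bash(git:*)" to the bare tool name.
            return [
                item.strip().split("(")[0]
                for item in value.split(",")
                if item.strip()
            ]
    return []


def omits_skill_tool(skill_md: str) -> bool:
    """True when the frontmatter does not grant the Skill tool."""
    return "Skill" not in allowed_tools(skill_md)
```

A check like this could run in CI against plugin/skills/research/SKILL.md; the tool names other than Agent and Skill in any test input are illustrative.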

New swarming skill plugin/skills/research/ (SKILL.md + report template) and plugin/agents/research-analyst.md, implementing the spec at docs/plans/research-skill/feature-specification.md. Sized small/medium/large with research-specific signals (D15); question -> sourced evidence -> options landscape -> recommendation -> adversarial validation spine (D6/D7); untrusted-web-source controls: data-not-instruction, web/codebase isolation, corroboration (D11/D16); compound/hybrid/redirect classification (D8/D17/D18); output-collision guard (D19); allowed-tools includes web + Agent and omits Skill per D22.
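The corroboration control (D11) can be illustrated with a small sketch. The skill itself is prose instructions for agents, so nothing below is the PR's implementation; the `Source` type, its fields, and the status strings are all assumptions chosen for illustration. The idea shown is that several pages from one publisher count as a single voice, which is what guards against single-source laundering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    artifact_id: str  # hypothetical ID scheme, e.g. "A1"
    origin: str       # "web", "codebase", or "provided" (assumed labels)
    publisher: str    # who ultimately published it (domain, repo, author)


def corroboration_status(sources: list[Source]) -> str:
    """Classify a claim by how many independent publishers support it."""
    if not sources:
        return "no-evidence"
    publishers = {s.publisher for s in sources}
    if len(publishers) < 2:
        # Many URLs from one publisher still count as one source.
        return "single-source"
    return "corroborated"
```

For example, two artifacts from the same domain yield "single-source", while artifacts from two different publishers yield "corroborated".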

docs/skills/research.md and docs/agents/research-analyst.md (coverage rule). Reciprocal 'use research' boundary statements added to all five neighbors per D9 — investigate, plan-a-feature, coding-standard, gap-analysis, architectural-analysis — in both SKILL.md descriptions and long-form 'Do not invoke for' sections, completing the bidirectional disambiguation /research's own description already declares.

Counts bumped to 19 skills / 22 agents across CLAUDE.md, README.md, docs/concepts.md, and every long-form doc footer. /research registered as the 7th sizing-aware skill in sizing.md (enumeration + table), concepts.md, skills/README.md, README.md, and quickstart.md. Skills index grouping relabeled 'Investigation & research' per D21 with the /research entry; research-analyst added to the agents index. New quickstart Path E plus a combining-paths example. Bidirectional Related-docs link between investigate and research. No version bump, no CHANGELOG change, manifests auto-discover.

YAGNI is a planning/implementation gate, not a research standard. Drop the See-also breadcrumb, the dedicated ## YAGNI section, and the Related-docs bullet from docs/skills/research.md and docs/agents/research-analyst.md, matching the convention used by other non-YAGNI skill/agent docs (project-discovery, update-pr-description, project-scanner). /research was never registered in yagni.md or the concepts YAGNI list, so no index change is needed.

D23: evidence required by default; operator can opt into exploratory (evidence-optional) mode; report always labels every claim's evidence status and states the recommendation's evidence basis. D24: one fixed report structure — plain-language Summary at top, Research Results (minimal tech detail), indexed Options to Consider, Recommendation with evidence basis, Validation, an indexed Artifacts registry (link + summary per source), and a References section at the very bottom; all cross-referenced inline by artifact ID for full traceability. Spec Outcome/Primary Flow/Edge Cases/User Interactions and decision-log cross-refs (D1->D24, D11->D23) updated; user-input decision count 5->7.
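The strict/exploratory split in D23 can be sketched as a labeling function. This is an assumption-laden illustration: the function name, the mode strings, the bracketed label format, and the choice to raise an error in strict mode are not taken from the spec, which only states that evidence is required by default and that every claim's evidence status is labeled.

```python
def label_claim(text: str, artifact_ids: list[str], mode: str = "strict") -> str:
    """Return the claim annotated with its evidence status.

    In strict mode (the default) an unsupported claim is rejected; in
    exploratory mode it is kept but explicitly labeled as lacking evidence.
    """
    if mode not in ("strict", "exploratory"):
        raise ValueError(f"unknown evidence mode: {mode!r}")
    if artifact_ids:
        # Cross-reference the supporting artifacts inline by ID.
        return f"{text} [{', '.join(artifact_ids)}]"
    if mode == "strict":
        raise ValueError("strict mode: claim has no supporting artifact")
    return f"{text} [no evidence]"
```

Calling `label_claim("Option B avoids the migration", ["A1", "A3"])` returns the claim followed by `[A1, A3]`; the same call with an empty list succeeds only when `mode="exploratory"`.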

SKILL.md: detect strict (default) vs exploratory evidence mode in Step 1 and thread it through briefs; Step 6 compiles an indexed Artifacts registry (link + summary + trust class + corroboration status) instead of a flat evidence list; Step 7 synthesizes plain Research Results + indexed Options + Recommendation with explicit evidence basis; Step 8 renders the one fixed structure. Report template rebuilt: Summary (plain, top) -> Research Results -> Options to Consider -> Recommendation -> Validation -> Artifacts -> References (bottom), cross-referenced by artifact ID for full traceability. research-analyst agent output format + rules updated to artifacts/results/options with evidence-mode handling. Long-form docs (research.md, research-analyst.md) updated for the new structure and the evidence-mode override.
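The one fixed structure rendered in Step 8 can be pictured as an ordered list of sections that are always emitted, in the same order, whatever the inputs. The sketch below is illustrative only: the section names come from the description above, while the `render_report` helper, the `##` heading style, and the `(none)` placeholder are assumptions.

```python
SECTIONS = [
    "Summary",
    "Research Results",
    "Options to Consider",
    "Recommendation",
    "Validation",
    "Artifacts",
    "References",
]


def render_report(bodies: dict[str, str]) -> str:
    """Render every section in the fixed order, including empty ones."""
    unknown = set(bodies) - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = []
    for name in SECTIONS:
        # Missing sections are still present, with a placeholder body.
        parts.append(f"## {name}\n\n{bodies.get(name, '(none)')}")
    return "\n\n".join(parts)
```

Because the order lives in one list, a caller cannot reorder or drop sections; it can only supply their bodies.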

Full code-review (large; junior-developer + adversarial-security-analyst + manual conformance pass) at docs/plans/research-skill/artifacts/code-review.md.

Fixed:
- CRIT-001: extend shared adversarial-validator with a 4th, generally-applicable strategy (Challenge the Evidence-Gathering Integrity: injection/astroturfing/staleness/single-source) so D7's web-reach defense is enforced at the agent level, not only via brief text; vocabulary, anti-pattern, rules, and long-form doc updated. Additive and valuable for /investigate and planning consumers too.
- WARN-001: Step 5 brief exclusion now also bars operator/CLAUDE context, matching the Operating Principle (closes an exfiltration precondition).
- WARN-002: added the missing ## Sizing section to docs/skills/research.md.
- WARN-003: CLAUDE.md and docs/sizing.md six-skill enumerations now include /research (seven).
- SUGG-001..005: template Summary/cross-ref contradiction; codebase-explorer added to research-analyst Related docs; directory link now targets the file; role identity tightened to the token budget; argument-hint surfaces the evidence-mode opt-in.

Surfaced, not fixed: WARN-004 (em-dashes). writing-voice.md bans them but every plugin file uses them; project-pattern deference makes this a repo-wide reconciliation, not a /research-only correction.

allowed-tools is Level 1 frontmatter, always loaded in every conversation. The list previously enumerated 17 Bash runners covering every major stack. Pruning to the runners most users actually invoke (npm/npx/pnpm/yarn, pytest/python3, go, cargo, make, bundle/rake, plus git and find) drops four entries: mix, dotnet, gradle, mvn. Users on Elixir, .NET, or JVM stacks will see a one-time permission prompt for their build tool. Other stacks lose nothing. The cross-language posture is preserved where it matters most; the always-loaded token cost shrinks.

mxriverlynn added 13 commits May 19, 2026 08:44

mxriverlynn merged commit dbd3cbf into main May 19, 2026

mxriverlynn deleted the research-and-swarm branch May 19, 2026 16:49

mxriverlynn merged 13 commits into main from research-and-swarm
