Skip to content

Add a /research skill#8

Merged
mxriverlynn merged 13 commits into
mainfrom
research-and-swarm
May 19, 2026
Merged

Add a /research skill#8
mxriverlynn merged 13 commits into
mainfrom
research-and-swarm

Conversation

@mxriverlynn
Copy link
Copy Markdown
Collaborator

@mxriverlynn mxriverlynn commented May 19, 2026

Summary

This PR adds /research, a sizing-aware skill that takes an open-ended question and returns an evidence-backed, adversarially-validated report recommending an option without producing any committed artifact, so that Han has a question-shaped sibling to /investigate instead of overloading the bug pipeline.

  • Adds the new /research skill (Han's 7th sizing-aware skill, 19th skill overall) plus a new research-analyst agent (22nd agent). It reuses codebase-explorer for codebase-grounded evidence and adversarial-validator to attack the result. The investigation behind this (recommendation.md) deliberately rejected expanding /investigate or building a two-mode skill, on Han's single-responsibility rule.
  • Hardens the shared adversarial-validator agent with a new, generally-applicable 4th strategy ("challenge the evidence-gathering integrity"). This is the load-bearing change: it moves the web-reach threat model (indirect prompt injection, astroturfing, single-source laundering) from brief text into the agent's hardcoded contract, and also strengthens /investigate. Reviewers should weigh whether this 4th strategy is correctly scoped as additive and low-risk for existing consumers.
  • Behavior controls worth scrutiny: web content is treated as data never instruction, the web-facing agent brief is isolated from codebase/operator context, evidence is required by default with an opt-in "exploratory" mode, and every report labels what does and does not have evidence.
  • Mechanical fan-out across the repo: bidirectional "Does not X — use research" routing added to 5 neighbor skills (and their long-form docs), skill/agent counts bumped 18→19 / 21→22, sizing lists moved six→seven, and ~36 doc footers updated.

Behavior changes

  • Before: an operator with an open-ended question ("what are my options for X", "should I use A or B", "how does Y work", "what's the prior art") had no skill that owned it. /investigate is a symptom→root-cause→fix pipeline; /plan-a-feature, /coding-standard, /gap-analysis, and /architectural-analysis each do research only as a bounded step toward a fixed artifact. There was no way to research options before committing to anything.
  • After: /research <question> classifies and sizes the question, fans research agents out across the codebase, the open web, and provided material, consolidates a numbered evidence/artifact registry, builds an options landscape, recommends one option, runs an adversarial-validation pass that can overturn the recommendation, and writes a fixed-structure report (plain-language Summary → Research Results → Options → Recommendation+evidence-basis → Validation → Artifacts registry → References). It never emits a spec, standard, gap report, architecture assessment, or code. Out-of-scope, hybrid, and compound requests are routed or split rather than forced through. /investigate and 4 other neighbors now route research-shaped requests back to /research.

What to look at first

  • The shared adversarial-validator 4th-strategy change (plugin/agents/adversarial-validator.md). This is the only edit that changes behavior of an existing, multi-consumer agent. CRIT-001 in the code review found the web-reach threat model was depending on brief text overriding the agent's closed 3-strategy contract; the fix promotes it into the agent. Confirm the "all three"/"minimum 5" wording was updated consistently and that it stays additive for /investigate.
  • The web-reach trust model (plugin/skills/research/SKILL.md Operating Principles + Step 5, research-analyst.md anti-patterns). The skill reaches the live web, so untrusted content is a first-class input. Decisions D16 (data-not-instruction, context isolation, trust labeling) and D11 (corroboration, retrieval dates) are the defenses. WARN-001 specifically tightened Step 5's brief exclusion to match the stricter Operating Principle — worth confirming the two now agree.
  • Evidence mode and report structure (D23, D24) — evidence required by default, opt-in "exploratory" mode, and one fixed report structure with inline artifact-ID cross-referencing. Reviewers should weigh whether the strict/exploratory split and the always-present Artifacts/References sections match the spec.
  • Reviewer note (not a defect to fix here): the new files use em-dashes, which writing-voice.md bans unconditionally — but every existing plugin file already uses them. The code review surfaced this (WARN-004) as a repo-wide standard-vs-practice contradiction and deliberately did not de-em-dash /research in isolation, since that would make it the lone outlier. This wants a repo-wide decision, not a fix in this PR.

Files of interest

  • plugin/skills/research/SKILL.md — the skill's behavior: classification, sizing, roster, web-reach controls, 8-step flow.
  • plugin/agents/research-analyst.md — the new agent owning the web/prior-art and option-comparison angles, with the data-not-instruction anti-patterns.
  • plugin/agents/adversarial-validator.md — the shared-agent 4th-strategy change; the only behavior edit affecting an existing consumer (/investigate).
  • docs/plans/research-skill/recommendation.md — why /research is a separate skill, with the adversarial validation that corrected the original evidence.
  • docs/plans/research-skill/artifacts/decision-log.md — D1–D24, where every behavioral tradeoff (web reach, roster, sizing, untrusted-source controls, evidence mode, report structure) was settled.

Evidence-based investigation with adversarial validation. Recommends a
separate /research skill scoped to open-ended, output-agnostic research.
Includes plain-language summary, for/against evidence table, and four
cross-referenced artifacts (investigation angles + adversarial pass).
plan-a-feature Steps 1-5: behavioral spec for a new /research skill
(question -> evidence -> options landscape -> recommendation ->
adversarial validation), 11 full + 3 trivial decisions. Three forks
settled by user: web+codebase reach, new research agent + reuse, and
small/medium/large swarm sizing. No tech-notes qualified.
plan-a-feature Steps 5.5-7. Medium-size review team (junior-developer,
gap-analyzer, edge-case-explorer, adversarial-security-analyst).
Resolved 16 major + 6 minor findings: added untrusted-web-source
handling (D16), research sizing signals (D15), compound-question (D17),
hybrid-routing (D18), output-collision guard (D19); strengthened
evidence sourcing (D11) and validator charter (D7); dropped gap-analyzer
from the roster per user (D4). Decision log + findings log updated and
cross-referenced.
plan-a-feature Step 8. project-manager (synthesis mode) verified all
22 findings discharged in-file, confirmed cross-reference invariants and
no mechanics leak, and fixed a broken anchor (D14 promoted to heading so
the spec's #d14-invocation-surface link resolves).
D20: rollout plan owned by plan-implementation, ~14+ files with the
count/sizing surfaces enumerated. D21: group /research next to
/investigate under a relabeled "Investigation & research" grouping.
Spec Open Items, Summary, and Out of Scope updated; decision log and
findings log cross-referenced. OI-3 remains, pending the
skills-calling-skills investigation.
Full /investigate run (3 evidence-based-investigators + claude-code-guide
+ adversarial-validator). Adversarial pass overturned the naive
"blanket-ban" reading: data-fetch sub-skills are evidenced-unreliable,
orchestration is underdetermined (unsupported assertion, no documented
failure), recommended pattern is Agent-tool dispatch + inline discovery.
Decisive for OI-3 (V8): /research invokes no skills (routing = naming a
sibling, not calling it), so it already complies; only build-time check
is that SKILL.md allowed-tools omits Skill. Broader six-file guidance
contradiction tracked as a separate ADR-worthy Han maintenance item.
Investigation artifact added; OI-3 closed; spec/decision-log/findings
cross-referenced. All open items resolved.
New swarming skill plugin/skills/research/ (SKILL.md + report template)
and plugin/agents/research-analyst.md, implementing the spec at
docs/plans/research-skill/feature-specification.md. Sized small/medium/
large with research-specific signals (D15); question -> sourced evidence
-> options landscape -> recommendation -> adversarial validation spine
(D6/D7); untrusted-web-source controls — data-not-instruction, web/
codebase isolation, corroboration (D11/D16); compound/hybrid/redirect
classification (D8/D17/D18); output-collision guard (D19); allowed-tools
includes web + Agent and omits Skill per D22.
docs/skills/research.md and docs/agents/research-analyst.md (coverage
rule). Reciprocal 'use research' boundary statements added to all five
neighbors per D9 — investigate, plan-a-feature, coding-standard,
gap-analysis, architectural-analysis — in both SKILL.md descriptions and
long-form 'Do not invoke for' sections, completing the bidirectional
disambiguation /research's own description already declares.
Counts bumped to 19 skills / 22 agents across CLAUDE.md, README.md,
docs/concepts.md, and every long-form doc footer. /research registered
as the 7th sizing-aware skill in sizing.md (enumeration + table),
concepts.md, skills/README.md, README.md, and quickstart.md. Skills
index grouping relabeled 'Investigation & research' per D21 with the
/research entry; research-analyst added to the agents index. New
quickstart Path E plus a combining-paths example. Bidirectional
Related-docs link between investigate and research. No version bump,
no CHANGELOG change, manifests auto-discover.
YAGNI is a planning/implementation gate, not a research standard. Drop
the See-also breadcrumb, the dedicated ## YAGNI section, and the
Related-docs bullet from docs/skills/research.md and
docs/agents/research-analyst.md, matching the convention used by other
non-YAGNI skill/agent docs (project-discovery, update-pr-description,
project-scanner). /research was never registered in yagni.md or the
concepts YAGNI list, so no index change is needed.
D23: evidence required by default; operator can opt into exploratory
(evidence-optional) mode; report always labels every claim's evidence
status and states the recommendation's evidence basis. D24: one fixed
report structure — plain-language Summary at top, Research Results
(minimal tech detail), indexed Options to Consider, Recommendation with
evidence basis, Validation, an indexed Artifacts registry (link +
summary per source), and a References section at the very bottom; all
cross-referenced inline by artifact ID for full traceability. Spec
Outcome/Primary Flow/Edge Cases/User Interactions and decision-log
cross-refs (D1->D24, D11->D23) updated; user-input decision count 5->7.
SKILL.md: detect strict (default) vs exploratory evidence mode in
Step 1 and thread it through briefs; Step 6 compiles an indexed
Artifacts registry (link + summary + trust class + corroboration
status) instead of a flat evidence list; Step 7 synthesizes plain
Research Results + indexed Options + Recommendation with explicit
evidence basis; Step 8 renders the one fixed structure. Report template
rebuilt: Summary (plain, top) -> Research Results -> Options to Consider
-> Recommendation -> Validation -> Artifacts -> References (bottom),
cross-referenced by artifact ID for full traceability. research-analyst
agent output format + rules updated to artifacts/results/options with
evidence-mode handling. Long-form docs (research.md, research-analyst.md)
updated for the new structure and the evidence-mode override.
Full code-review (large; junior-developer + adversarial-security-analyst
+ manual conformance pass) at docs/plans/research-skill/artifacts/code-review.md.

Fixed:
- CRIT-001: extend shared adversarial-validator with a 4th, generally-
  applicable strategy (Challenge the Evidence-Gathering Integrity:
  injection/astroturfing/staleness/single-source) so D7's web-reach
  defense is enforced at the agent level, not only via brief text;
  vocabulary, anti-pattern, rules, and long-form doc updated. Additive
  and valuable for /investigate and planning consumers too.
- WARN-001: Step 5 brief exclusion now also bars operator/CLAUDE
  context, matching the Operating Principle (closes an exfiltration
  precondition).
- WARN-002: added the missing ## Sizing section to docs/skills/research.md.
- WARN-003: CLAUDE.md and docs/sizing.md six-skill enumerations now
  include /research (seven).
- SUGG-001..005: template Summary/cross-ref contradiction; codebase-
  explorer added to research-analyst Related docs; directory link now
  targets the file; role identity tightened to the token budget;
  argument-hint surfaces the evidence-mode opt-in.

Surfaced, not fixed: WARN-004 (em-dashes) — writing-voice.md bans them
but every plugin file uses them; project-pattern deference makes this a
repo-wide reconciliation, not a /research-only correction.
@mxriverlynn mxriverlynn merged commit dbd3cbf into main May 19, 2026
@mxriverlynn mxriverlynn deleted the research-and-swarm branch May 19, 2026 16:49
mxriverlynn added a commit that referenced this pull request May 26, 2026
allowed-tools is Level 1 frontmatter, always loaded in every
conversation. The list previously enumerated 17 Bash runners covering
every major stack. Pruning to the runners most users actually invoke
(npm/npx/pnpm/yarn, pytest/python3, go, cargo, make, bundle/rake, plus
git and find) drops four entries: mix, dotnet, gradle, mvn.

Users on Elixir, .NET, or JVM stacks will see a one-time permission
prompt for their build tool. Other stacks lose nothing. The
cross-language posture is preserved where it matters most; the
always-loaded token cost shrinks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant