Runbook skill by mxriverlynn · Pull Request #21 · testdouble/han

mxriverlynn · 2026-05-28T15:38:05Z

Summary

This PR adds the /runbook skill so engineers can capture a single operational procedure from a real trigger — a fired alert, an incident, a recurring task, or a live failure mode — without writing speculative on-call docs.

New deterministic skill that produces one runbook file per invocation at docs/runbooks/{slug}.md using a cross-format core template (symptom-first metadata block with Severity, Triggers, Reversible, Last validated, Last edited, Owner, Origin; then Symptoms, Prerequisites, imperative Resolve steps with expected output per step, Verify, Escalate, Rollback).
Hard YAGNI preflight gates the work: the user must point at a real alert that has fired, a documented incident or post-mortem, a scheduled recurring task, a live failure mode on a service receiving traffic, or a stakeholder commitment. With no evidence the skill recommends deferral and names the trigger that would justify revisiting; the user can override and the override is recorded in the runbook's Origin field.
Skill catalog grows from 20 to 21; new "Operations" group introduced in the skills index. About fifteen sibling skill docs gained one-to-three-line "this is not /runbook — use /runbook for that" cross-links in their negative-space sections.
Documentation-only PR. No runtime code paths in the plugin change; the PR adds a new skill definition, a template, a long-form operator doc, the underlying research report, and index updates.
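For reviewers who want the preflight rule in executable form, here is a minimal Python sketch of the gate described above. It is illustrative only: the skill itself is a SKILL.md definition, not code, and the names `Evidence`, `PreflightResult`, and `yagni_preflight`, along with the exact wording written to the Origin field, are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    """The five evidence types the PR description lists as acceptable."""
    FIRED_ALERT = "alert that has fired"
    INCIDENT = "documented incident or post-mortem"
    RECURRING_TASK = "scheduled recurring task"
    LIVE_FAILURE_MODE = "live failure mode on a service receiving traffic"
    STAKEHOLDER_COMMITMENT = "stakeholder commitment"


@dataclass
class PreflightResult:
    proceed: bool
    origin: str                            # text destined for the runbook's Origin field
    revisit_trigger: Optional[str] = None  # set only when the work is deferred


def yagni_preflight(
    evidence: Optional[Evidence],
    reference: str = "",
    override: bool = False,
    revisit_trigger: str = "a real trigger for this scenario appears",
) -> PreflightResult:
    """Gate runbook creation on real evidence (hypothetical helper)."""
    if evidence is not None:
        # Real evidence: proceed and record where the runbook came from.
        origin = f"{evidence.value} ({reference})" if reference else evidence.value
        return PreflightResult(True, origin)
    if override:
        # No evidence, but the user insisted: proceed and record the override.
        return PreflightResult(True, "YAGNI override: no evidence supplied")
    # No evidence and no override: defer and name what would justify revisiting.
    return PreflightResult(False, "", revisit_trigger)
```

A call such as `yagni_preflight(None)` defers, while `yagni_preflight(None, override=True)` proceeds and leaves a visible trace in Origin, which is the behavior the description asks reviewers to sanity-check.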

Behavior changes

Before: users had no operations-category skill; runbook-shaped requests landed in /project-documentation (wrong shape — describes systems, not procedures) or /architectural-decision-record (wrong shape — records decisions, not procedures). After: /runbook exists, refuses to write speculative runbooks by default, and discovers the project's existing runbook directory and filename convention (flat, per-service, or alert-keyed) instead of imposing one.

What to look at first

The YAGNI preflight in SKILL.md Step 2. This is the central design decision and the place the skill will most often surprise users. Worth checking whether the five accepted evidence types and the override-recording behavior match how the team wants on-call documentation to be gated.
The choice of a deterministic template installer over a multi-agent interview. The research report walks six options and explains why the elaborate interview-plus-review design did not survive adversarial validation. If the conclusion feels under-justified, that's the file to push back on.
Filename-convention discovery in Step 3 (flat / per-service / alert-keyed). The skill matches existing convention when the project already has more than two runbooks. Worth confirming the "two-runbook" threshold and the fallback order are sensible.
The negative-space cross-links scattered across sibling skill docs. They are short, but there are about fifteen of them; check that the wording is consistent and that none of them oversell what /runbook does.
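To make the Step 3 threshold question concrete, the sketch below shows one way filename-convention discovery could work. It is a guess at the shape of the logic, not the skill's actual procedure: the classification heuristics (a subdirectory means per-service, a file stem matching a known alert name means alert-keyed), the `alert_names` parameter, and the function names are all assumptions. The "more than two runbooks" threshold comes from the description; tie-breaking here simply takes the first-seen convention, since this page does not state the fallback order.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import FrozenSet, Iterable, Optional


def classify(path: str, alert_names: FrozenSet[str]) -> str:
    """Guess which convention a single runbook path follows (assumed heuristics)."""
    p = PurePosixPath(path)
    if p.stem in alert_names:
        return "alert-keyed"   # file is named after an alert
    if len(p.parts) > 1:
        return "per-service"   # file lives in a subdirectory
    return "flat"              # file sits directly in the runbook directory


def detect_convention(
    paths: Iterable[str],
    alert_names: FrozenSet[str] = frozenset(),
    threshold: int = 2,
) -> Optional[str]:
    """Return the dominant convention, or None when too few runbooks exist.

    `paths` are relative to the project's runbook directory.
    """
    paths = list(paths)
    if len(paths) <= threshold:
        # Not "more than two" runbooks: no established convention to match.
        return None
    counts = Counter(classify(p, alert_names) for p in paths)
    return counts.most_common(1)[0][0]
```

With two existing runbooks the function returns `None` and the caller would fall back to its default location; with three or more it matches whatever the project already does. Reviewers can use this to judge whether a threshold of two is too low for a mixed directory.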

Files of interest

plugin/skills/runbook/SKILL.md — The skill's seven-step process, including the YAGNI preflight and the mode-selection table. Primary decision surface.
plugin/...runbook/references/runbook-template.md — The output template every runbook will copy from; shapes every future runbook in any project that installs this plugin.
docs/research/runbook-skill-research.md — Evidence and adversarial validation behind picking a deterministic template installer over an interview-driven design.
docs/skills/runbook.md — Operator-facing long-form doc; the "When to use it" and "Do not invoke for" lists are where users will form their mental model.
CLAUDE.md — Skill count moves 20 to 21 and a new "Operations" group lands in the catalog; confirms the index header alignment fix.

Research for the new `/runbook` skill (issue #18). Surveys industry formats (Google SRE, GitLab, OpenShift, PagerDuty, Atlassian, Rootly, OneUptime, FireHydrant, incident.io, Nobl9, et al.) and Han codebase patterns. Recommends a deterministic template installer with a YAGNI preflight and mandatory staleness metadata, after adversarial validation collapsed an earlier interview-driven design. Refs #18

Adds a new `/runbook` skill that creates or updates a runbook for a single operational scenario (alert that has fired, incident, recurring task, known failure mode) using a consistent symptom-first template.

The template was reviewed by information-architect and junior-developer passes to land progressive disclosure on what problem the runbook resolves (Symptoms surfaced above secondary disambiguation) and how to resolve it (Quick fix → Resolve with imperative commands and expected output → Verify the fix landed → If a step fails → Escalate with condition-then-channel → Rollback). Metadata block carries the trust signals an on-call engineer needs at 2am: Severity, Triggers, Reversible, Last validated vs. Last edited (distinct), Owner, Origin.

Applies the project's YAGNI rule as a preflight: requires real evidence (alert that has fired, documented incident, recurring task, live failure mode on a service receiving traffic, or stakeholder commitment) before producing the runbook. Speculative runbooks are deferred; the user can override and the override is recorded in the runbook's Origin field.

Includes plugin/skills/runbook/SKILL.md, the reviewed references/runbook-template.md, docs/skills/runbook.md long-form operator doc, and index updates across CLAUDE.md, docs/skills/README.md, docs/concepts.md, README.md, and every long-form skill doc's "All N skills" reference. Bumps the skill count from 20 to 21 and adds an "Operations" purpose grouping in the skills index.

Refs #18
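As a rough illustration of the template shape this commit describes, the following Python sketch renders an empty runbook skeleton with the metadata fields and section order named above. The Markdown layout (heading levels, bullet style, `TODO` placeholders) and the helper names are assumptions; the real layout lives in references/runbook-template.md. The `has_unvalidated_edits` helper shows one possible reading of why Last validated and Last edited are kept distinct.

```python
from datetime import date
from typing import Dict

# Field and section names are taken from the commit message; order is preserved.
METADATA_FIELDS = [
    "Severity", "Triggers", "Reversible",
    "Last validated", "Last edited", "Owner", "Origin",
]
SECTIONS = [
    "Symptoms", "Quick fix", "Resolve", "Verify",
    "If a step fails", "Escalate", "Rollback",
]


def render_skeleton(title: str, metadata: Dict[str, str]) -> str:
    """Render a runbook skeleton; refuse to render if any metadata is missing."""
    missing = [f for f in METADATA_FIELDS if f not in metadata]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    lines = [f"# {title}", ""]
    # Metadata block comes first so trust signals are visible immediately.
    lines += [f"- {field}: {metadata[field]}" for field in METADATA_FIELDS]
    for section in SECTIONS:
        lines += ["", f"## {section}", "", "TODO"]
    return "\n".join(lines) + "\n"


def has_unvalidated_edits(last_validated: date, last_edited: date) -> bool:
    """True when the runbook changed after it was last exercised for real."""
    return last_edited > last_validated
```

Treating missing metadata as an error rather than filling in blanks mirrors the intent that staleness and ownership information is mandatory, not optional decoration.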

Add reverse boundary statements and Related documentation pointers so the new /runbook skill is discoverable from the skills it sits next to:

- project-documentation, architectural-decision-record, coding-standard now point users to /runbook for operational scenarios, in both the SKILL.md frontmatter and the long-form "Do not invoke for" section.
- investigate's Related documentation now names the investigate -> runbook composition pair.

The CLAUDE.md skill catalog described the last group as 'operational', while docs/skills/README.md uses 'Operations' and docs/concepts.md uses 'operations'. Align CLAUDE.md to match.

mxriverlynn added 4 commits May 28, 2026 09:28

docs(CLAUDE.md): align skill catalog group name with index header

09a84bb

mxriverlynn marked this pull request as ready for review May 28, 2026 15:53

mxriverlynn merged commit 489eeb3 into main May 28, 2026

mxriverlynn deleted the runbook-skill branch May 28, 2026 15:54

mxriverlynn mentioned this pull request May 28, 2026

Add /runbook skill for writing runbooks in a consistent format #18

Closed

mxriverlynn merged 4 commits into main from runbook-skill
