Runbook skill #21
Merged
Research for the new `/runbook` skill (issue #18). Surveys industry formats (Google SRE, GitLab, OpenShift, PagerDuty, Atlassian, Rootly, OneUptime, FireHydrant, incident.io, Nobl9, et al.) and Han codebase patterns. Recommends a deterministic template installer with a YAGNI preflight and mandatory staleness metadata, after adversarial validation collapsed an earlier interview-driven design. Refs #18
Adds a new `/runbook` skill that creates or updates a runbook for a single operational scenario (alert that has fired, incident, recurring task, known failure mode) using a consistent symptom-first template.

The template was reviewed by information-architect and junior-developer passes to land progressive disclosure on what problem the runbook resolves (Symptoms surfaced above secondary disambiguation) and how to resolve it (Quick fix → Resolve with imperative commands and expected output → Verify the fix landed → If a step fails → Escalate with condition-then-channel → Rollback). The metadata block carries the trust signals an on-call engineer needs at 2am: Severity, Triggers, Reversible, Last validated vs. Last edited (distinct), Owner, Origin.

Applies the project's YAGNI rule as a preflight: requires real evidence (alert that has fired, documented incident, recurring task, live failure mode on a service receiving traffic, or stakeholder commitment) before producing the runbook. Speculative runbooks are deferred; the user can override and the override is recorded in the runbook's Origin field.

Includes plugin/skills/runbook/SKILL.md, the reviewed references/runbook-template.md, docs/skills/runbook.md long-form operator doc, and index updates across CLAUDE.md, docs/skills/README.md, docs/concepts.md, README.md, and every long-form skill doc's "All N skills" reference. Bumps the skill count from 20 to 21 and adds an "Operations" purpose grouping in the skills index. Refs #18
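The YAGNI preflight described in this commit can be pictured as a small gate. The following is only an illustrative sketch: the skill itself is defined in SKILL.md prose, not Python, and the names `Evidence`, `PreflightResult`, `yagni_preflight`, and the exact Origin wording are hypothetical, chosen to mirror the five evidence types and the override-recording behavior listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    """The five accepted evidence types named in the commit message."""
    FIRED_ALERT = "alert that has fired"
    DOCUMENTED_INCIDENT = "documented incident"
    RECURRING_TASK = "recurring task"
    LIVE_FAILURE_MODE = "live failure mode on a service receiving traffic"
    STAKEHOLDER_COMMITMENT = "stakeholder commitment"


@dataclass
class PreflightResult:
    proceed: bool  # True if the runbook should be produced now
    origin: str    # hypothetical text for the runbook's Origin field


def yagni_preflight(evidence: Optional[Evidence],
                    user_override: bool = False) -> PreflightResult:
    """Gate runbook creation on real evidence; record overrides in Origin."""
    if evidence is not None:
        return PreflightResult(True, f"Evidence: {evidence.value}")
    if user_override:
        # Assumption: the override is noted verbatim in the Origin field.
        return PreflightResult(True, "User override: no evidence supplied")
    # No evidence and no override: the speculative runbook is deferred.
    return PreflightResult(False, "")
```

Under this sketch, `yagni_preflight(None)` defers the runbook, while `yagni_preflight(None, user_override=True)` proceeds and leaves a trace of the override in the Origin text.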
Add reverse boundary statements and Related documentation pointers so the new /runbook skill is discoverable from the skills it sits next to:
- project-documentation, architectural-decision-record, coding-standard now point users to /runbook for operational scenarios, in both the SKILL.md frontmatter and the long-form "Do not invoke for" section.
- investigate's Related documentation now names the investigate -> runbook composition pair.
The CLAUDE.md skill catalog described the last group as 'operational', while docs/skills/README.md uses 'Operations' and docs/concepts.md uses 'operations'. Align CLAUDE.md to match.
Summary
This PR adds the `/runbook` skill so engineers can capture a single operational procedure from a real trigger (a fired alert, an incident, a recurring task, or a live failure mode) without writing speculative on-call docs. The output is `docs/runbooks/{slug}.md`, built from a cross-format core template: a symptom-first metadata block with Severity, Triggers, Reversible, Last validated, Last edited, Owner, Origin; then Symptoms, Prerequisites, imperative Resolve steps with expected output per step, Verify, Escalate, Rollback.
Behavior changes
Before: users had no operations-category skill; runbook-shaped requests landed in `/project-documentation` (wrong shape: describes systems, not procedures) or `/architectural-decision-record` (wrong shape: records decisions, not procedures).
After: `/runbook` exists, refuses to write speculative runbooks by default, and discovers the project's existing runbook directory and filename convention (flat, per-service, or alert-keyed) instead of imposing one.
What to look at first
- `SKILL.md` Step 2. This is the central design decision and the place the skill will most often surprise users. Worth checking whether the five accepted evidence types and the override-recording behavior match how the team wants on-call documentation to be gated.
- ... `/runbook` does.
Files of interest
- `plugin/skills/runbook/SKILL.md`: The skill's seven-step process, including the YAGNI preflight and the mode-selection table. Primary decision surface.
- `plugin/...runbook/references/runbook-template.md`: The output template every runbook will copy from; shapes every future runbook in any project that installs this plugin.
- `docs/research/runbook-skill-research.md`: Evidence and adversarial validation behind picking a deterministic template installer over an interview-driven design.
- `docs/skills/runbook.md`: Operator-facing long-form doc; the "When to use it" and "Do not invoke for" lists are where users will form their mental model.
- `CLAUDE.md`: Skill count moves from 20 to 21 and a new "Operations" group lands in the catalog; confirms the index header alignment fix.
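For reviewers who want a concrete picture of the metadata block the template carries, here is a minimal sketch. It is not taken from `runbook-template.md`: the `RunbookMetadata` class, the `Field: value` rendering, the sample values, and the "edited since last validation" check are all assumptions, meant only to show why keeping Last validated and Last edited as distinct fields can be useful.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RunbookMetadata:
    """Hypothetical model of the seven metadata fields named in the Summary."""
    severity: str
    triggers: str
    reversible: bool
    last_validated: date
    last_edited: date
    owner: str
    origin: str

    def edited_since_validation(self) -> bool:
        # Assumption: an edit newer than the last validation is a staleness
        # signal, because the edited steps have not been re-run since.
        return self.last_edited > self.last_validated

    def render(self) -> str:
        # Hypothetical layout; the real template may format fields differently.
        rows = [
            ("Severity", self.severity),
            ("Triggers", self.triggers),
            ("Reversible", "yes" if self.reversible else "no"),
            ("Last validated", self.last_validated.isoformat()),
            ("Last edited", self.last_edited.isoformat()),
            ("Owner", self.owner),
            ("Origin", self.origin),
        ]
        return "\n".join(f"{name}: {value}" for name, value in rows)


# Example values below are invented for illustration only.
meta = RunbookMetadata(
    severity="high",
    triggers="example alert name",
    reversible=True,
    last_validated=date(2024, 1, 10),
    last_edited=date(2024, 2, 1),
    owner="example-team",
    origin="Evidence: alert that has fired",
)
```

With these sample dates, `meta.edited_since_validation()` is true, which is the kind of signal an on-call reader could use to decide how much to trust the steps.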