Auto-generate agent-optimized CLI docs from --help output — verified, compressed, ready for AGENTS.md
AI agents guess at CLI flags from training data instead of reading accurate docs. Hand-written tool docs go stale as CLIs change. A typical --help page is 48KB — that's ~12K tokens per context load.
A three-stage pipeline turns raw --help output into a verified, compressed skill doc:
CLI --help
↓
extract → raw docs + structured JSON (~48KB)
↓
distill → agent-optimized SKILL.md (~2KB)
↓
validate → multi-model score 9/10+
↓
SKILL.md → drop into AGENTS.md, CLAUDE.md, OpenClaw skills
Note: The distill and validate steps require an LLM — either a coding CLI (Claude Code, Codex, or Gemini CLI) logged in, or an API key set. See LLM Setup for details.
# Homebrew (macOS / Linux)
brew tap tychohq/tap && brew install skilldoc
# bun
bunx skilldoc run railway
# pnpm
pnpx skilldoc run railway
# npm
npx skilldoc run railway

# Full pipeline in one shot: generate → distill → validate
skilldoc run railway
# Your agent-optimized skill is at ~/.agents/skills/railway/SKILL.md

Drop ~/.agents/skills/railway/SKILL.md into your AGENTS.md, CLAUDE.md, or OpenClaw skills directory. Your agent has verified docs instead of guessing from training data.
You can also run each step individually:
skilldoc generate railway # extract raw docs from --help
skilldoc distill railway # compress into agent-optimized SKILL.md
skilldoc validate railway   # score quality with multi-model evaluation

The generate step works without an LLM. The distill and validate steps require one.
If you have a coding CLI installed and logged in, it just works — no config needed. The tool auto-detects Claude Code → Codex → Gemini CLI (first found on PATH wins).
If you prefer API keys, set any of these:
export ANTHROPIC_API_KEY=sk-ant-... # → Anthropic API
export OPENAI_API_KEY=sk-... # → OpenAI API
export GEMINI_API_KEY=... # → Google Gemini API
export OPENROUTER_API_KEY=sk-or-...   # → OpenRouter API

API keys are checked only if no CLI is found. Each provider uses a sensible default model.
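The resolution order above can be sketched as follows. This is an illustrative sketch, not skilldoc's actual internals — `resolveProvider` and its parameters are hypothetical names:

```typescript
// Sketch of the precedence described above: first coding CLI found on PATH
// wins; API keys are consulted only if no CLI is found.
const CLI_ORDER = ["claude", "codex", "gemini"];
const KEY_ORDER: Array<[envVar: string, provider: string]> = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GEMINI_API_KEY", "gemini"],
  ["OPENROUTER_API_KEY", "openrouter"],
];

function resolveProvider(
  env: Record<string, string | undefined>,
  onPath: (bin: string) => boolean, // e.g. a wrapper around `command -v`
): string | null {
  for (const cli of CLI_ORDER) {
    if (onPath(cli)) return `${cli}-cli`; // claude-cli | codex-cli | gemini-cli
  }
  for (const [envVar, provider] of KEY_ORDER) {
    if (env[envVar]) return provider; // keys checked only when no CLI exists
  }
  return null; // generate still works; distill/validate will refuse to run
}
```

A `~/.skilldoc/config.yaml` entry would short-circuit this lookup entirely, since the config file takes priority over auto-detection.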
CLI verification commands
echo 'say ok' | claude -p --output-format text # should print "ok"
echo 'say ok' | codex exec # should print "ok"
gemini -p 'say ok'                      # should print "ok"

Persistent config (pin provider/model)
Create ~/.skilldoc/config.yaml:
provider: claude-cli # claude-cli | codex-cli | gemini-cli | anthropic | openai | gemini | openrouter
model: claude-opus-4-6 # optional — overrides the provider's default model
apiKey: sk-ant-...        # optional — overrides env var for this provider

Config file takes priority over auto-detection. You can also override per-run with --model <model>.
For validation, --models <m1,m2> accepts a comma-separated list to test across multiple models.
Railway v4 overhauled its CLI — models trained on v3 still hallucinate `railway run` for deployments and miss the new `variable set` subcommand syntax. Here's the generated SKILL.md (~1.5KB, distilled from 52KB of --help):
# Railway CLI
Deploy and manage cloud applications with projects, services, environments, and databases.
## Critical Distinctions
- `up` uploads and deploys your code from the current directory
- `deploy` provisions a *template* (e.g., Postgres, Redis) — NOT for deploying your code
- `run` executes a local command with Railway env vars injected — it does NOT deploy anything
## Quick Reference
railway up # Deploy current directory
railway up -s my-api # Deploy to specific service
railway logs -s my-api # View deploy logs
railway variable set KEY=VAL # Set env var
railway connect # Open database shell (psql, mongosh, etc.)
## Key Commands
| Command | Purpose |
|---------|---------|
| `up [-s service] [-d]` | Deploy from current dir; `-d` to detach from log stream |
| `variable set KEY=VAL` | Set env var; add `--skip-deploys` to skip redeployment |
| `variable list [-s svc]` | List variables; `--json` for JSON output |
| `link [-p project] [-s svc]` | Link current directory to a project/service |
| `service status` | Show deployment status across services |
| `logs [-s service]` | View build/deploy logs |
| `connect` | Open database shell (auto-detects Postgres, MongoDB, Redis) |
| `domain` | Add custom domain or generate a Railway-provided domain |
## Common Patterns
Deploy with message: `railway up -m "fix auth bug"`
Set var without redeploying: `railway variable set API_KEY=sk-123 --skip-deploys`
Stream build logs then exit: `railway up --ci`
Run local dev with Railway env: `railway run npm start`

See examples/ for real generated output for railway, jq, gh, curl, ffmpeg, and rg.
Runs each tool's --help (and subcommand help) with LANG=C NO_COLOR=1 PAGER=cat for stable, deterministic output. Parses usage lines, flags, subcommands, examples, and env vars into structured JSON + Markdown. Stores a SHA-256 hash for change detection.
~/.agents/docs/skilldoc/<tool-id>/
tool.json # structured parse
tool.md # rendered markdown
commands/ # per-subcommand docs
<command>/
command.json
command.md
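The deterministic capture and hash-based change detection described above could look roughly like this (an illustrative sketch — function names are not skilldoc's actual API):

```typescript
// Capture help text under a stable environment so the SHA-256 hash only
// changes when the CLI's docs actually change.
import { execFileSync } from "node:child_process";
import { createHash } from "node:crypto";

function captureHelp(tool: string, args: string[] = ["--help"]): string {
  return execFileSync(tool, args, {
    env: { ...process.env, LANG: "C", NO_COLOR: "1", PAGER: "cat" },
    encoding: "utf8",
  });
}

function helpHash(helpText: string): string {
  return createHash("sha256").update(helpText).digest("hex");
}

// refresh and distill can skip work when the stored hash still matches:
function hasChanged(storedHash: string, helpText: string): boolean {
  return helpHash(helpText) !== storedHash;
}
```

The same hash comparison is what lets `skilldoc refresh` re-run generate + distill only for tools whose help output has changed.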
Passes raw docs to an LLM with a task-focused prompt. Output is a SKILL.md optimized for agents: quick reference, key flags, common patterns. Target size ~2KB. Skips re-distillation if help output is unchanged.
Runs scenario-based evaluation across multiple LLM models. Each model attempts realistic tasks using only the SKILL.md, then scores itself 1–10 on accuracy, completeness, and absence of hallucinations. Threshold: 9/10.
skilldoc validate railway --models claude-sonnet-4-6,claude-opus-4-6 --threshold 9
Re-runs generate + distill only for tools whose --help output has changed (by hash). Use --diff to see what changed in the SKILL.md.
skilldoc refresh --diff

Skills are evaluated by asking an LLM to complete realistic tasks using only the generated SKILL.md. Each scenario is graded 1–10 for correctness and absence of hallucinations.
Example report for railway:
validate railway (claude-sonnet-4-6, claude-opus-4-6)
claude-sonnet-4-6 average: 9.3/10
Scenario 1: "deploy the current directory to a specific service" → 10/10
Scenario 2: "set an env var without triggering a redeploy" → 9/10
Scenario 3: "connect to the project's Postgres database" → 9/10
claude-opus-4-6 average: 9.7/10
Scenario 1: "deploy the current directory to a specific service" → 10/10
Scenario 2: "set an env var without triggering a redeploy" → 10/10
Scenario 3: "connect to the project's Postgres database" → 9/10
overall: 9.5/10 — PASSED (threshold: 9)
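The report's averages and pass/fail decision could be computed along these lines (a sketch for illustration, not skilldoc's actual code):

```typescript
// model → per-scenario scores (1–10), as in the report above
type ModelScores = Record<string, number[]>;

function evaluate(scores: ModelScores, threshold: number) {
  // per-model average across scenarios
  const averages = Object.fromEntries(
    Object.entries(scores).map(([model, s]) => [
      model,
      s.reduce((a, b) => a + b, 0) / s.length,
    ]),
  );
  // overall = mean of the per-model averages, compared to the threshold
  const models = Object.values(averages);
  const overall = models.reduce((a, b) => a + b, 0) / models.length;
  return { averages, overall, passed: overall >= threshold };
}
```

Feeding in the scenario scores from the example report (10, 9, 9 and 10, 10, 9) reproduces the 9.3 and 9.7 per-model averages and the 9.5 overall.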
If validation fails, --auto-redist re-runs distillation with feedback and you can re-validate.
~/.agents/skills/<tool-id>/
SKILL.md # compressed, agent-optimized (drop into AGENTS.md)
docs/
advanced.md # extended reference
recipes.md # common patterns
troubleshooting.md
SKILL.md is the primary file — small enough to include inline in any agent system prompt. The docs/ subfolder holds overflow content for tools with complex help text.
For batch operations across all installed tools, register tools with skilldoc add. The lock file at ~/.skills/skilldoc-lock.yaml is the single source of truth:
skilldoc add jq # register jq in the lock file and generate its skill
skilldoc run        # full pipeline for all installed tools

You can also run individual steps for all installed tools:
skilldoc generate # extract docs for all installed tools
skilldoc distill    # distill all into agent-optimized skills

Use --only <tool> to process a single tool from the lock file:
skilldoc generate --only jq

skilldoc run <binary>   # full pipeline, score must be ≥ 9/10

Or use skilldoc add <binary> to register it in the lock file for batch operations.
bun test

bun run build   # outputs bin/skilldoc.js

MIT