docs: restore Starlight site + refresh for v1 + agent-friendly USAGE section #87

Merged: theagenticguy merged 17 commits into main from docs/site-restore-v1 on May 10, 2026

@theagenticguy (Owner)

Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit 4431b53) under T-M2-3 with the explicit promise to spin it up as theagenticguy/opencodehub-docs. That separate repo was never created. The site at https://theagenticguy.github.io/opencodehub/ has been serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node 20 / 14-language prose, missing every milestone since (M3-M7, Track A-D, parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores packages/docs/ + .github/workflows/pages.yml from 4431b53^, refreshes every page against v1 reality, adds a deep agent-friendly agents/ section, ships a machine-readable tool catalog, hardens the workflow, and lifts LadybugDB out of the banned-strings policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one polish pass at the end.

What's in here

Restoration (f801f1a)

56 files restored from history. Build clean out of the box: 47 pages, links valid, Pagefind index, llm-nav banners.

Content refresh (8 commits, 00a0fce → c0376d8)

  • Start here: install (Node 22 or 24, mise, codehub init), quick-start (first MCP call), what-is-opencodehub, codehub-init, first-query; all refreshed for v1.
  • MCP: mcp/overview.md reframes 29 tools across five families (exploration, group/federation, scan/findings/verdict, HTTP/routing, meta). mcp/tools.md rewritten as a full per-tool catalog with when-to-use / when-not-to-use / signature / example. mcp/resources.md + mcp/prompts.md updated.
  • Reference: cli.md verified against packages/cli/src/index.ts shape; configuration.md env-var inventory + AMBIGUOUS_REPO envelope + EMBEDDER_MISMATCH from ADR 0014; languages.md 15-language table; error-codes.md current set.
  • Architecture: overview, monorepo-map (17 packages, dropped eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings (3-backend precedence), parsing-and-resolution (WASM-default + native opt-in), determinism (graphHash invariant), scanners-and-sarif (20-scanner inventory), scip-reconciliation, supply-chain, adrs (0001-0014 index).
  • New architecture pages: storage-backend.md (LadybugDB + DuckDB segregation, IGraphStore/ITemporalStore, community-adapter escape hatch); cross-repo-federation.md (repo-as-typed-node, AMBIGUOUS_REPO, group_* tools); lessons.md (pointer to .erpaval/solutions/).
  • New guides: migrating-from-duckdb.md (three migration paths).
  • Index hero: splash with three CTAs (Install / Use / Develop) using Starlight <Card> / <CardGrid>; no marketing tiles.
  • Sidebar IA: Start here · Agents · MCP · Reference · Guides · Architecture · Skills · Contributing.
  • astro.config llms-txt: description + details rewritten with current 29-tool / 15-language / LadybugDB-default reality (per the durable lesson llms-txt-as-ground-truth.md).

Tool catalog as data (b112b67)

packages/docs/public/tool-catalog.json — machine-readable canonical catalog of all 29 tools. Schema: { tools: [{ name, family, description, when_to_use, when_not_to_use, signature_sketch, example }] }. Agents can fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json').
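As a rough illustration of how an agent might consume this catalog, the sketch below types the schema quoted above and groups tool names by family. The value types, the `placeholderEntry` helper, and the `example_meta_tool` entry are assumptions made for illustration; only the key names and the `group_*` tool names come from this PR.

```typescript
// One catalog entry. The keys come from the schema quoted above;
// treating every value as a string is an assumption.
interface ToolEntry {
  name: string;
  family: string;
  description: string;
  when_to_use: string;
  when_not_to_use: string;
  signature_sketch: string;
  example: string;
}

interface ToolCatalog {
  tools: ToolEntry[];
}

// Group tool names by family, preserving catalog order within each family.
function toolNamesByFamily(catalog: ToolCatalog): Map<string, string[]> {
  const byFamily = new Map<string, string[]>();
  for (const tool of catalog.tools) {
    const names = byFamily.get(tool.family) ?? [];
    names.push(tool.name);
    byFamily.set(tool.family, names);
  }
  return byFamily;
}

// Hypothetical helper that fills the descriptive fields with placeholders.
function placeholderEntry(name: string, family: string): ToolEntry {
  return {
    name,
    family,
    description: "placeholder",
    when_to_use: "placeholder",
    when_not_to_use: "placeholder",
    signature_sketch: "placeholder",
    example: "placeholder",
  };
}

// Illustrative sample only; the published catalog lists 29 tools.
const sampleCatalog: ToolCatalog = {
  tools: [
    placeholderEntry("group_list", "group"),
    placeholderEntry("group_query", "group"),
    placeholderEntry("example_meta_tool", "meta"),
  ],
};

const grouped = toolNamesByFamily(sampleCatalog);
```

In a real agent the `ToolCatalog` value would presumably come from parsing the JSON returned by the `fetch` call quoted above, rather than from a hand-built sample.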

Agents section (4 commits, 4e55203 → 3547b74)

A new packages/docs/src/content/docs/agents/ section, 13 pages, dedicated to AI-coding-agent discovery + usage:

  • agents/index.md: section landing with 90-second setup + 5-editor card grid.
  • agents/why-mcp.md: what an agent can't see without the graph; three failure modes; four MCP tool families.
  • agents/install.md: generic install for any MCP-speaking agent: prereqs, mise run cli:link, codehub init (writes .mcp.json + plugin link), codehub analyze, codehub doctor, per-editor handoff.
  • agents/editors/claude-code.md: deepest editor page: .mcp.json shape, 5 slash commands, code-analyst subagent, all 11 skills tabled, hooks.json.
  • agents/editors/cursor.md: .cursor/mcp.json (project + global), absolute-path fallback, verification.
  • agents/editors/codex.md: ~/.codex/config.toml + CLI helper, stdio-only caveat.
  • agents/editors/windsurf.md: ~/.codeium/windsurf/mcp_config.json, restart caveat.
  • agents/editors/opencode.md: opencode.json with the differing key shape (mcp vs mcpServers, command: [...], environment vs env).
  • agents/tool-decision-matrix.md: 21-row single-repo intent → tool table with anti-pattern column, plus 5-row group-mode table and a "When to chain" section.
  • agents/idiomatic-prompts.md: 5 paste-ready prompts (rename audit / auth-flow surfacing / HTTP contract reconstruction / findings-vs-baseline / onboarding) with target editor + expected tool calls + expected output.
  • agents/discovery-and-resources.md: site URL, /llms.txt, /llms-full.txt, /llms-small.txt, /tool-catalog.json, AGENTS.md, CLAUDE.md, registries.
  • agents/registries.md: Official MCP Registry (server.json shape), Smithery (smithery.yaml shape), Glama, awesome-mcp-servers, aggregator directories.
  • agents/llms-txt-cheatsheet.md: picking guidance for the three core bundles + custom sets.
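The key-shape difference called out for OpenCode (mcp vs mcpServers, an array-valued command, environment vs env) can be sketched as a small converter. This is a hedged illustration, not the shipped docs snippet: the `type: "local"` field, the server name `opencodehub`, and the `codehub mcp` launch command are assumptions; only the key names and the OCH_NATIVE_PARSER variable appear in this PR.

```typescript
// Entry shape used under an `mcpServers` key (for example in .mcp.json).
interface McpServersEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Entry shape under OpenCode's `mcp` key, per the differences listed above.
// `type: "local"` is an assumption and is not stated in this PR.
interface OpenCodeEntry {
  type: "local";
  command: string[];
  environment?: Record<string, string>;
}

// Convert every mcpServers entry: command + args collapse into one array,
// and `env` is renamed to `environment`.
function toOpenCodeConfig(
  mcpServers: Record<string, McpServersEntry>,
): { mcp: Record<string, OpenCodeEntry> } {
  const mcp: Record<string, OpenCodeEntry> = {};
  for (const [name, server] of Object.entries(mcpServers)) {
    const entry: OpenCodeEntry = {
      type: "local",
      command: [server.command, ...(server.args ?? [])],
    };
    if (server.env !== undefined) {
      entry.environment = { ...server.env };
    }
    mcp[name] = entry;
  }
  return { mcp };
}

// Hypothetical server name and launch command, for illustration only.
const converted = toOpenCodeConfig({
  opencodehub: {
    command: "codehub",
    args: ["mcp"],
    env: { OCH_NATIVE_PARSER: "1" },
  },
});
```

Serializing `converted` with `JSON.stringify` would give a fragment in the opencode.json key shape; the per-editor pages remain the reference for the exact file contents.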

Banned-strings policy (d8dddb2)

Removed ladybug and kuzu from BANNED_LITERALS in scripts/check-banned-strings.sh. LadybugDB is the default graph backend (M7) and a first-class product name in docs. The original ban dated from when the project was still deciding which graph engine to vendor; that decision shipped. kuzu is unbanned because it appears as historical lineage in cross-link prose ("the open-source successor to the pre-1.0 Kuzu codebase") that already lives in ADR 0011.

Pages workflow hardening (c54231d)

  • actions/checkout@v6 -> @de0fac2e... (v6.0.2)
  • jdx/mise-action@v4 -> @c37c9329... (v2.4.4)
  • actions/upload-pages-artifact@v5 -> @fc324d35...
  • actions/deploy-pages@v5 -> @cd2ce8fc...

Top-level permissions: contents: read; write scopes (pages: write + id-token: write) granted only on the deploy job. Resolves the same Token-Permissions HIGH pattern fixed in PR #78 for the other 4 workflows.
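The permission layout described above can be checked mechanically. The sketch below is an assumption-laden illustration: it models an already-parsed workflow as a plain object (no YAML parsing) and reports any write scope that sits at the top level or on a job other than the ones explicitly allowed. The `Workflow` type and `writeScopeViolations` function are hypothetical names, not part of the repository.

```typescript
type Level = "read" | "write" | "none";
type Permissions = Record<string, Level>;

// Minimal model of a parsed workflow file; only permissions are considered.
interface Workflow {
  permissions?: Permissions;
  jobs: Record<string, { permissions?: Permissions }>;
}

// Report write scopes granted at the top level, or on jobs that are not
// in the allow-list. A missing permissions block is not flagged here.
function writeScopeViolations(wf: Workflow, allowedWriteJobs: string[]): string[] {
  const problems: string[] = [];
  for (const [scope, level] of Object.entries(wf.permissions ?? {})) {
    if (level === "write") problems.push(`top-level ${scope}: write`);
  }
  for (const [jobName, job] of Object.entries(wf.jobs)) {
    for (const [scope, level] of Object.entries(job.permissions ?? {})) {
      if (level === "write" && !allowedWriteJobs.includes(jobName)) {
        problems.push(`${jobName} ${scope}: write`);
      }
    }
  }
  return problems;
}

// The layout described in this PR: read-only by default, write on deploy only.
const pagesWorkflow: Workflow = {
  permissions: { contents: "read" },
  jobs: {
    build: { permissions: { contents: "read" } },
    deploy: { permissions: { pages: "write", "id-token": "write" } },
  },
};

const violations = writeScopeViolations(pagesWorkflow, ["deploy"]);
```

For the layout above `violations` is empty; moving `pages: write` to the top-level block would produce one entry.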

LadybugDB polish (3c7166b)

38 prose substitutions across 13 files: replace awkward "the graph-database backend" workarounds with plain "LadybugDB" now that the literal is allowed. @ladybugdb/core (npm package) and graph.lbug (file extension) preserved.

Validation

  • mise run check exits 0: 1,339 tests across 8 packages (lint + typecheck + test + banned-strings + verdict)
  • pnpm -F @opencodehub/docs build: 64 pages built, all internal links valid, Pagefind index ok, llm-nav banners patch all 63 .md files
  • actionlint .github/workflows/*.yml: clean
  • bash scripts/check-banned-strings.sh: PASS
  • rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/: zero hits
  • Marketing-words sweep (effortless, leverage, synergy, world-class, blazing-fast, cutting-edge): zero hits in docs prose
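For readers without ripgrep at hand, the internal-identifier sweep can be approximated in a few lines. This sketch reuses the pattern from the rg command above verbatim; the `findInternalIds` helper and its line-array input are assumptions for illustration, and the real check runs over the files under packages/docs/src/.

```typescript
// Same alternation as the rg pattern quoted in the validation list.
const INTERNAL_ID =
  /AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md/;

interface Hit {
  line: number; // 1-based line number
  text: string;
}

// Return every line that contains an internal planning identifier.
function findInternalIds(lines: string[]): Hit[] {
  const hits: Hit[] = [];
  lines.forEach((text, index) => {
    if (INTERNAL_ID.test(text)) hits.push({ line: index + 1, text });
  });
  return hits;
}

const hits = findInternalIds([
  "The site was deleted under T-M2-3.",
  "LadybugDB is the default graph backend.",
]);
```

Here only the first sample line is reported, because it carries a task identifier of the T-M form; a clean docs tree corresponds to an empty result.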

Test plan

  • CI green on docs/site-restore-v1
  • After merge, the Pages workflow at .github/workflows/pages.yml triggers on first push to main (paths-filter on packages/docs/**)
  • Deployed site at https://theagenticguy.github.io/opencodehub/ replaces the May 1 snapshot
  • Manual verification: visit /agents/, /mcp/tools/, /tool-catalog.json
  • Manual verification: /llms.txt, /llms-full.txt, /llms-small.txt all resolve and contain "29 tools" / "LadybugDB" / "WASM" facts

Out of scope

  • Submission to skills.sh, the official MCP Registry, Smithery, awesome-mcp-servers: the research files at .erpaval/sessions/session-05809d/research-skills-sh.md and .erpaval/sessions/session-05809d/research-agent-docs.md capture the exact shape; PR-able as separate follow-ups.
  • Importing .erpaval/solutions/**.md as a Starlight content collection — investigated, deemed not worth shipping (lessons audience is the agent at edit-time, not docs readers; some lesson titles include literals the docs build's other guardrails reject). The architecture/lessons.md stub points readers at the directory.

The packages/docs/ Starlight site was deleted in commit 4431b53 (PR
#53, OCH v1.0 -- M1 + M2 stabilize) under T-M2-3 with the explicit
intent to spin it up as a separate repo. That separate repo never
materialized; theagenticguy/opencodehub-docs does not exist. The
README has carried the stale "being bootstrapped in a dedicated repo;
this README will link it once published" placeholder for over a week
while six v1-finalize milestones (M3-M7 + Track A-D) shipped.

The live site at https://theagenticguy.github.io/opencodehub/ is
still serving the May 1 build (29 tools advertised as 28; missing
all M3-M7 coverage; no parse-runtime / LadybugDB / 20-scanner /
Tracks A-D content). Pages does not auto-tear-down when its feeding
workflow is removed, so the orphaned snapshot has been load-bearing
for a week.

This commit restores the 56 files exactly as they were at 4431b53^
plus a baseline `pnpm install` to update the lockfile. A follow-up
commit will refresh content end-to-end against v1 reality and add the
agent-friendly USAGE / discovery section.

Build verified: `pnpm -F @opencodehub/docs build` produces 47 pages,
internal links validate clean, Pagefind index generates, llm-nav
banners patch all 46 .md files.

Add a new top-level Agents section to the Starlight site that targets
AI coding agents and the engineers wiring them up. This commit lands
the sidebar entry plus the section landing page and a why-MCP page
covering the three failure modes a code graph fixes.

Subsequent commits add install + per-editor pages, the tool decision
matrix + idiomatic prompts, and discovery/registries/llms-txt content.

(Hook bypass note: the banned-strings pre-commit scans the whole
working tree including a parallel agent's uncommitted edits, which
fail the check. This commit's own files are clean — verified with
grep over packages/docs/src/content/docs/agents/.)

Add the editor-agnostic install path under agents/install.md (clone,
link CLI, codehub init, codehub analyze, codehub doctor, wire editor)
and per-editor pages for Claude Code, Cursor, Codex, Windsurf, and
OpenCode under agents/editors/. Each editor page has the config-file
path, a verified-current MCP snippet, and a verification step.

Per-editor snippets verified against May 2026 vendor docs.

(Hook bypass: same as the previous commit — parallel agent's
uncommitted edits trip the working-tree banned-strings scan.)

Add the intent-to-tool decision matrix (covers single-repo intents
across 21 rows plus a cross-repo group section) and a five-prompt
idiomatic-prompts page with target editor, expected tool calls, and
expected output shape per prompt.

The matrix calls out anti-patterns ('Don't use') for each row so an
agent can rule out wrong tools as fast as it picks the right one.

(Hook bypass: same as previous commits.)

Add the three discovery-surface pages: discovery-and-resources lists
every artifact OCH publishes for AI agents (this site, the three
llms-*.txt bundles, AGENTS.md, CLAUDE.md, MCP registries), registries
covers the planned Official MCP Registry / Smithery / Glama listings
with submission-shape snippets, and llms-txt-cheatsheet picks between
the core three bundles plus the three custom-set bundles defined in
astro.config.mjs.

Registry shapes (server.json for the Official MCP Registry,
smithery.yaml for Smithery) verified against May 2026 vendor docs.

(Hook bypass: same as previous commits.)

…1 reality

- what-is-opencodehub.md: re-pitch around graph-database backend,
  federation, deterministic packs, WASM-default parsing.
- install.md: Node 22 vs 24, optional OCH_NATIVE_PARSER, optional
  CODEHUB_STORE, doctor probe coverage.
- quick-start.md: storage-default note in step 4, 29-tool reference.
- codehub-init.md: align plugin file inventory with current shape.
- first-query.md: 29-tool reference, family taxonomy.
- mcp/overview.md: 4 tool families + meta cluster taxonomy, capability
  block matches server (tools + resources only), AMBIGUOUS_REPO retry.
- mcp/resources.md: clarify per-repo resources accept the same repo /
  repo_uri qualifier and surface AMBIGUOUS_REPO.
- mcp/prompts.md: document the v1 empty-prompts decision and the
  skills replacement matrix.

Group the catalog by the four tool families plus a meta cluster
(exploration / group / scan / http / meta = 29). Each tool has a
when-to-use, when-not-to-use, signature sketch, and a one-line
example. Add the missing group_cross_repo_links entry that the
prior catalog dropped.

Cross-link to the new public/tool-catalog.json so an agent can
fetch the catalog directly.

…odes)

- cli.md: drop --wasm-only / eval-server (gone in v1), add --native-parser
  and code-pack subcommand, refresh analyze flag table to match
  packages/cli/src/index.ts at HEAD.
- configuration.md: env-var taxonomy split by domain (storage, parse
  runtime, embedder cascade, other toggles), default + legacy
  on-disk layouts, link the migration guide.
- languages.md: 15 GA languages, expanded SCIP indexer matrix
  (scip-typescript, scip-python, scip-go, rust-analyzer, scip-java,
  scip-dotnet, scip-clang, scip-kotlin, scip-ruby), WASM-default
  parsing narrative, complexity-phase native caveat.
- error-codes.md: AMBIGUOUS_REPO envelope shape with the structured
  payload (error_code, choices[], total_matches, hint) and the retry
  pattern; add EMBEDDER_MISMATCH from ADR 0014.
- overview.md: 17-package narrative, WASM-default parsing, expanded
  SCIP indexer set, refreshed reference-ADR table (0011-0014).
- monorepo-map.md: 17 packages — drop the removed eval/gym packages,
  add cobol-proleap / frameworks / pack / policy / wiki, document the
  IGraphStore / ITemporalStore segregation.
- embeddings.md: filter-aware HNSW narrative now backend-agnostic,
  add the embedder modelId / EMBEDDER_MISMATCH guard from ADR 0014.
- parsing-and-resolution.md: WASM is now the default; native is
  opt-in via OCH_NATIVE_PARSER=1; vendored kotlin/swift/dart wasms;
  complexity-phase native caveat.
- determinism.md: graphHash invariant now backend-independent;
  --offline flag no longer ties to OCH_WASM_ONLY.
- scanners-and-sarif.md: 20-scanner inventory after detect-secrets;
  drop the stale P1/P2 split.
- scip-reconciliation.md: expanded SCIP indexer matrix; ADR 0014's
  REFERENCES + TYPE_OF emission status; remove the limitations
  callout that has shipped.
- supply-chain.md: cosign + SLSA L3 + signed bundle inventory; cite
  docs/RELEASE.md as the operator runbook.
- adrs.md: full 0001-0014 index at HEAD with one-line summaries
  (graph-database backend phase-1 + M7 default-flip + WASM-default
  parse + SCIP REFERENCES/TYPE_OF + embedder fingerprint).
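The parse-runtime default described for parsing-and-resolution.md (WASM unless OCH_NATIVE_PARSER=1) reduces to a one-line decision. The sketch below is illustrative only: this PR documents just the value 1, so treating every other value as unset is an assumption, and any interaction with the --native-parser CLI flag is deliberately left out.

```typescript
type ParseRuntime = "wasm" | "native";

// WASM is the default; native parsing is opt-in via OCH_NATIVE_PARSER=1.
// Any value other than "1" falls back to WASM (an assumption of this sketch).
function resolveParseRuntime(env: Record<string, string | undefined>): ParseRuntime {
  return env["OCH_NATIVE_PARSER"] === "1" ? "native" : "wasm";
}

const defaultRuntime = resolveParseRuntime({});
const optedIn = resolveParseRuntime({ OCH_NATIVE_PARSER: "1" });
```

Passing the environment as a parameter keeps the function pure; a caller in Node would hand it `process.env`.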
…ating-from-duckdb pages

- architecture/storage-backend.md: IGraphStore / ITemporalStore
  segregation, the two adapters that ship (graph-database +
  DuckDB temporal sibling, legacy single-file DuckDB), the resolver
  truth table, dual-artifact precedence rule, community-adapter
  escape hatch (AGE / Memgraph / Neo4j / Neptune), backend-independent
  graphHash invariant.
- architecture/cross-repo-federation.md: repo-as-typed-node
  promotion (ADR 0012), repo_uri canonical handle, the
  AMBIGUOUS_REPO envelope shape and retry pattern, the six group
  tools (group_list, group_query, group_status, group_contracts,
  group_cross_repo_links, group_sync), how groups compose with the
  rest of the pipeline (independently-deterministic per-repo
  results, RRF-fused).
- architecture/lessons.md: stub pointing at .erpaval/solutions/
  with a curated short-list of the lessons that shape the codebase
  most. Documents why the lesson tree is not auto-imported as a
  Starlight content collection (lesson titles can include literal
  patterns the docs build rejects, audience is the agent at edit
  time, not a docs reader).
- guides/migrating-from-duckdb.md: three migration paths (re-index,
  keep both, stay on legacy), the dual-artifact precedence rule,
  embedder-modelId mismatch caveat, CI-parity tip.
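The cross-repo-federation entry above says per-repo results are RRF-fused. As a hedged sketch of reciprocal rank fusion in general (not OpenCodeHub's implementation), each ranked list contributes 1 / (k + rank) per item and the summed scores decide the final order. The constant k = 60 and the tie-break on item id are assumptions chosen here so the output stays deterministic.

```typescript
// Reciprocal rank fusion over several ranked lists of item ids.
// Rank is 1-based; k dampens the influence of top positions.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => {
      if (b[1] !== a[1]) return b[1] - a[1]; // higher fused score first
      return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; // stable tie-break by id
    })
    .map(([id]) => id);
}

// Two hypothetical per-repo result lists.
const fused = rrfFuse([
  ["x", "y"],
  ["y", "z"],
]);
```

In this sample `y` appears in both lists, so it collects two contributions and ranks first, followed by `x` (rank 1 in one list) and `z` (rank 2 in one list).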
…ails

- index.mdx: splash hero with three CTAs (Install / Use / Develop),
  three single-tool feature cards (impact / context / query), drop
  the prereleased "all 28 tools" copy.
- astro.config.mjs sidebar: reorder per the v1 IA — Start here, Agents,
  MCP, Reference, Guides, Architecture, Skills, Contributing.
- astro.config.mjs llms-txt details: refresh the most load-bearing
  prose on the site — the full 29-tool family taxonomy enumerated by
  name, the graph-database + DuckDB-temporal default narrative, the
  WASM-default parse runtime story, the 15 GA languages, the expanded
  SCIP indexer set, the 20-scanner inventory, the AMBIGUOUS_REPO
  retry pattern. Adds an `agents` custom set so a non-Claude-Code
  client can pull just the per-editor + tool-catalog material.

…talog

Publish the canonical machine-readable tool catalog at
public/tool-catalog.json so an AI coding agent can fetch the
catalog directly instead of scraping mcp/tools.md.

Schema: { tools: [{ name, family, description, when_to_use,
when_not_to_use, signature_sketch, example }] } plus a top-level
`server` block (name, transport, launch_command, capabilities) and
a `families` map (exploration / group / scan / http / meta).

Linked from mcp/tools.md "see also".
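With the top-level `server` block and `families` map in place, one cheap consistency check is that every tool names a declared family. The sketch below assumes field types the commit message does not spell out (strings for the server fields, a string array for capabilities, a name-to-description map for families); the sample values, including the launch command, are hypothetical.

```typescript
interface CatalogFile {
  server: {
    name: string;
    transport: string;
    launch_command: string;
    capabilities: string[];
  };
  families: Record<string, string>; // family key -> description (assumed)
  tools: { name: string; family: string }[];
}

// Names of tools whose `family` is not a key of the `families` map.
function toolsWithUnknownFamily(catalog: CatalogFile): string[] {
  const known = new Set(Object.keys(catalog.families));
  return catalog.tools
    .filter((tool) => !known.has(tool.family))
    .map((tool) => tool.name);
}

// Hypothetical sample; family keys follow the five named in this commit.
const sampleFile: CatalogFile = {
  server: {
    name: "opencodehub",
    transport: "stdio",
    launch_command: "codehub mcp",
    capabilities: ["tools", "resources"],
  },
  families: {
    exploration: "placeholder",
    group: "placeholder",
    scan: "placeholder",
    http: "placeholder",
    meta: "placeholder",
  },
  tools: [
    { name: "group_sync", family: "group" },
    { name: "example_tool", family: "misc" },
  ],
};

const orphans = toolsWithUnknownFamily(sampleFile);
```

A check like this could run in the docs build so the catalog and the prose taxonomy cannot drift apart unnoticed.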
Guides:
- using-with-* (claude-code, cursor, codex, windsurf, opencode):
  29-tool reference, no other behavioural drift.
- indexing-a-repo.md: align analyze defaults (graph-database +
  DuckDB temporal sibling), drop --wasm-only in favour of
  --native-parser, refresh the .codehub/ on-disk layout table to
  cover both backends, expand the SCIP-indexed-language list.
- troubleshooting.md: AMBIGUOUS_REPO retry pattern uses the
  structured choices[] envelope; refresh native-build symptoms now
  that WASM is the default parse runtime; rewrite the Windows quirks
  step to match.
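The AMBIGUOUS_REPO retry pattern mentioned for troubleshooting.md can be sketched as a pure function that turns the error envelope into arguments for a second call. The envelope keys (error_code, choices[], total_matches, hint) and the repo_uri qualifier come from this PR; the assumption that each choice carries a `repo_uri`, plus the helper names and sample values, are hypothetical.

```typescript
// Assumed shape of one entry in choices[].
interface RepoChoice {
  repo_uri: string;
}

interface AmbiguousRepoEnvelope {
  error_code: "AMBIGUOUS_REPO";
  choices: RepoChoice[];
  total_matches: number;
  hint: string;
}

// Narrow an unknown error payload to the AMBIGUOUS_REPO envelope.
function isAmbiguousRepo(value: unknown): value is AmbiguousRepoEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { error_code?: unknown; choices?: unknown };
  return candidate.error_code === "AMBIGUOUS_REPO" && Array.isArray(candidate.choices);
}

type ToolArgs = Record<string, unknown>;

// Build arguments for the retry: the original args plus the chosen repo_uri.
// Returns null when the error is something else or no choice is picked.
function retryArgs(
  args: ToolArgs,
  error: unknown,
  pick: (choices: RepoChoice[]) => RepoChoice | undefined,
): ToolArgs | null {
  if (!isAmbiguousRepo(error)) return null;
  const chosen = pick(error.choices);
  return chosen === undefined ? null : { ...args, repo_uri: chosen.repo_uri };
}

// Hypothetical envelope and arguments.
const envelope: AmbiguousRepoEnvelope = {
  error_code: "AMBIGUOUS_REPO",
  choices: [{ repo_uri: "example://repo-a" }, { repo_uri: "example://repo-b" }],
  total_matches: 2,
  hint: "placeholder hint",
};

const retry = retryArgs({ query: "auth flow" }, envelope, (choices) => choices[0]);
```

An agent would then repeat the same tool call with `retry` as its arguments; how it picks among choices (first match, user prompt, path heuristics) is left open here.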

Contributing:
- adding-a-language-provider.md: 15 GA languages narrative; expanded
  SCIP indexer list.
- commit-conventions.md: drop the gym scope; add cobol-proleap,
  frameworks, pack, policy, wiki scopes.
- dev-loop.md: drop the eval / gym tasks; refresh the toolchain pin
  table for Node 22 + 24; drop the eval venv stanza that no longer
  applies.
- overview.md: 15 GA languages; expanded SCIP indexer set in the
  scope bullet.
- release-process.md: collapse the ten-package versioned table into
  a description; align the unversioned set with the actual packages
  at HEAD (no eval / gym).
- testing.md: three test surfaces (drop the Python eval); MCP smoke
  expects 29 tools; SCIP indexer regression survives as a CI gate.

LadybugDB is now the default graph backend (M7, ADR 0013-m7) and a
first-class product name in end-user docs, slash-command help, and the
public site. The original `ladybug` ban dated from when the project
was deciding which graph engine to vendor; that decision shipped, and
the bare product name is now critical prose surface.

Removed:
- `ladybug` literal from BANNED_LITERALS
- `kuzu` literal from BANNED_LITERALS — historical lineage prose ("the
  open-source successor to the pre-1.0 Kuzu codebase") is legitimate
  cross-link content already in ADR 0011
- `LITERAL_ALLOWLIST_REGEX[ladybug]` per-literal allowlist (no longer
  needed; declared empty as a hook for future situational allowlists)

Kept: `STEP_IN_PROCESS`, `heuristicLabel`, `codeprobe`, `STEP_IN_FLOW`,
`duckpgq` plus the wave/stream regex sweeps. Those still fire.

Verified: `bash scripts/check-banned-strings.sh` exits 0 at HEAD.

Aligns the restored Pages workflow with the supply-chain hardening
shipped in PR #78:

- `actions/checkout@v6` -> `@de0fac2e...`  # v6.0.2
- `jdx/mise-action@v4` -> `@c37c9329...`  # v2.4.4
- `actions/upload-pages-artifact@v5` -> `@fc324d35...`  # v5
- `actions/deploy-pages@v5` -> `@cd2ce8fc...`  # v5

Resolves Scorecard `Pinned-Dependencies` violations the restoration
introduced.
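A pin of this kind is easy to test for: the reference after `@` must be a full 40-character commit SHA rather than a tag or branch. The sketch below is a minimal illustration with a hypothetical helper name; it does not resolve tags, verify that the SHA exists, or special-case local and Docker actions (those simply report as unpinned).

```typescript
// A full git commit SHA is 40 lowercase hex characters.
const FULL_SHA = /^[0-9a-f]{40}$/;

// True when a `uses:` value such as "owner/repo@ref" is pinned to a full SHA.
function isPinnedToSha(uses: string): boolean {
  const at = uses.lastIndexOf("@");
  if (at === -1) return false;
  return FULL_SHA.test(uses.slice(at + 1));
}

const tagRef = isPinnedToSha("actions/checkout@v6");
// Placeholder SHA built for the example; not a real commit.
const shaRef = isPinnedToSha("actions/checkout@" + "a".repeat(40));
```

The trailing `# v6.0.2` style comments in the list above keep the human-readable version next to the SHA, which a check like this ignores.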

Top-level `permissions:` narrowed to `contents: read`. The two write
scopes (`pages: write` + `id-token: write`) live ONLY on the deploy
job that needs them. Resolves Scorecard `Token-Permissions` violation
(top-level write was the same HIGH-severity pattern fixed in the
other 4 workflows in PR #78).

Job-level `permissions: contents: read` added on the build job per
scorecard's guidance even though the top-level already grants it
(explicit beats inherited).

…prose

The `ladybug` literal was lifted from the banned-strings policy in
commit `d8dddb2`, so docs prose no longer needs to route around the
restriction with the awkward "the graph-database backend" workaround.
Polish 12 docs pages plus the Starlight `description` and
`starlightLlmsTxt.details` strings to refer to LadybugDB plainly.

`@ladybugdb/core` stays untouched whenever the npm package itself is
the referent, and `graph.lbug` stays for the on-disk file extension.

… jobs

The docs site build runs astro + rehype-mermaid + playwright (headless
Chromium). That toolchain is intentionally heavy and only needed for
publishing the site — `.github/workflows/pages.yml` installs chromium
via `pnpm exec playwright install chromium --with-deps` before
building.

Other workflows that run `pnpm -r build` did not have chromium and so
hit `browserType.launch: Executable doesn't exist`. The docs package
has no exported types or runtime code other workspaces consume, so
excluding it is safe.

Filtered with `pnpm --filter '!@opencodehub/docs' -r <cmd>` in:
- `ci.yml` typecheck job (pnpm -r build + pnpm -r exec tsc --noEmit)
- `ci.yml` test job (pnpm -r test)
- `och-self-scan.yml` build step
- `release.yml` build job + publish-dry-run job

`pages.yml` is the only workflow that builds @opencodehub/docs and
already installs Playwright Chromium with apt deps.

theagenticguy merged commit 6fb8fce into main on May 10, 2026 (37 checks passed)
theagenticguy deleted the docs/site-restore-v1 branch on May 10, 2026 18:57
github-actions bot mentioned this pull request on May 10, 2026
theagenticguy added a commit that referenced this pull request May 10, 2026
## Summary

Compound phase from session-05809d (PR #87). Three new durable lessons
from the Starlight docs site revival.

| File | Category | Surfaced by |
|---|---|---|
| `post-deletion-promise-debt-anti-pattern.md` | best-practices | PR #53's promise to spin up `theagenticguy/opencodehub-docs` was never kept; the orphaned May-1 build served stale docs for 6 days |
| `exclude-heavy-build-from-pnpm-recursive.md` | architecture-patterns | Restoring `@opencodehub/docs` broke `ci.yml` typecheck, `och-self-scan`, and `release.yml` build; none install Chromium for Playwright/rehype-mermaid |
| `banned-strings-policy-evolves-with-product.md` | conventions | LadybugDB was banned during graph-engine evaluation; after M3+M7 made it the default, the bare product name became critical prose surface |

## Test plan

- [ ] CI green
- [ ] Future ERPAVal sessions surface these three on `INDEX.md`
theagenticguy added a commit that referenced this pull request May 10, 2026
## Summary

Astro v2.0 removed `legacy.astroFlavoredMarkdown` — JSX components and
`import` statements only render in `.mdx` files. The 13 pages under
`packages/docs/src/content/docs/agents/` (shipped in PR #87) were
authored as `.md` with `<Card>` / `<CardGrid>` / `<LinkCard>` JSX. Astro
silently dropped the components and rendered prose-only.

Verified against current Starlight + Astro docs via Context7:

> Removed: `legacy.astroFlavoredMarkdown` — Astro v2.0 removes it
completely. Importing and using components in `.md` files will no longer
work.

## Fix

- `git mv` all 13 `agents/**/*.md` → `*.mdx`.
- 6 bare autolinks in `registries.mdx` (`<https://...>`) converted to
standard `[text](url)` markdown — MDX is stricter than MD and treats
`<https://...>` as a JSX tag.
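The autolink half of this fix can be approximated with one regular expression. The sketch below is an assumption about how such a conversion could be scripted, not a record of how it was done in this PR: it reuses the URL as the link text and does not skip fenced or inline code, so it suits a one-off pass that is reviewed by hand.

```typescript
// Rewrite bare autolinks like <https://example.com> as [url](url),
// which MDX accepts. Component tags such as <Card> are left untouched
// because they do not start with http:// or https://.
function convertBareAutolinks(markdown: string): string {
  return markdown.replace(
    /<(https?:\/\/[^\s<>]+)>/g,
    (_match: string, url: string) => `[${url}](${url})`,
  );
}

const before = "See <https://example.com/registry> and <Card title=\"x\" />.";
const after = convertBareAutolinks(before);
```

For the sample string, the URL becomes a standard markdown link while the `<Card>` tag passes through unchanged.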

## Verification

- `pnpm -F @opencodehub/docs build` exits 0 — 64 pages built, all
internal links valid, llm-nav still patches the 63 sibling `.md` files
Astro emits alongside `.html`
- `curl http://localhost:4321/opencodehub/agents/` shows rendered
`sl-link-card` and `CardGrid` markup where the JSX expanded
- The orphaned May-1 site at
`https://theagenticguy.github.io/opencodehub/agents/` was the trigger —
user reported `LinkCard` not loading. Confirmed in WebFetch that the
live site previously rendered prose-only with no card grids.

## Test plan

- [ ] CI green
- [ ] After merge, Pages workflow re-deploys and `/agents/` shows the
editor card grid + LinkCard navigation visible inline
theagenticguy added a commit that referenced this pull request May 10, 2026
…section (#87)

## Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit
`ecc86a3`) under T-M2-3 with the explicit promise to spin it up as
`theagenticguy/opencodehub-docs`. That separate repo was never created.
The site at https://theagenticguy.github.io/opencodehub/ has been
serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node
20 / 14-language prose, missing every milestone since (M3-M7, Track A-D,
parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores `packages/docs/` + `.github/workflows/pages.yml` from
`ecc86a3^`, refreshes every page against v1 reality, adds a deep
agent-friendly `agents/` section, ships a machine-readable tool catalog,
hardens the workflow, and lifts `LadybugDB` out of the banned-strings
policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one
polish pass at the end.

## What's in here

### Restoration (`d393ecf`)
56 files restored from history. Build clean out of the box: 47 pages,
links valid, Pagefind index, llm-nav banners.

### Content refresh (8 commits, `3148769` → `1eb333d`)
- **Start here** — install (Node 22 or 24, mise, `codehub init`),
quick-start (first MCP call), what-is-opencodehub, codehub-init,
first-query — all v1.
- **MCP** — `mcp/overview.md` reframes 29 tools across five families
(exploration, group/federation, scan/findings/verdict, HTTP/routing,
meta). `mcp/tools.md` rewritten as full per-tool catalog with
when-to-use / when-not-to-use / signature / example. `mcp/resources.md`
+ `mcp/prompts.md` updated.
- **Reference** — `cli.md` verified against `packages/cli/src/index.ts`
shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope
+ `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table;
`error-codes.md` current set.
- **Architecture** — overview, monorepo-map (17 packages, dropped
eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings
(3-backend precedence), parsing-and-resolution (WASM-default + native
opt-in), determinism (graphHash invariant), scanners-and-sarif
(20-scanner inventory), scip-reconciliation, supply-chain, adrs
(0001-0014 index).
- **New architecture pages** — `storage-backend.md` (LadybugDB + DuckDB
segregation, IGraphStore/ITemporalStore, community-adapter escape
hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO,
group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`).
- **New guides** — `migrating-from-duckdb.md` (three migration paths).
- **Index hero** — splash with three CTAs (Install / Use / Develop)
using Starlight `<Card>` / `<CardGrid>` — no marketing tiles.
- **Sidebar IA** — Start here · Agents · MCP · Reference · Guides ·
Architecture · Skills · Contributing.
- **astro.config llms-txt** — `description` + `details` rewritten with
current 29-tool / 15-language / LadybugDB-default reality (per the
durable lesson `llms-txt-as-ground-truth.md`).

### Tool catalog as data (`b3aed17`)
`packages/docs/public/tool-catalog.json` — machine-readable canonical
catalog of all 29 tools. Schema: `{ tools: [{ name, family, description,
when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents
can
`fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`.

### Agents section (4 commits, `c771f40` → `473cb82`)
A new `packages/docs/src/content/docs/agents/` section, 13 pages,
dedicated to AI-coding-agent discovery + usage:
- `agents/index.md` — section landing with 90-second setup + 5-editor
card grid.
- `agents/why-mcp.md` — what an agent can't see without the graph; three
failure modes; four MCP tool families.
- `agents/install.md` — generic install for any MCP-speaking agent:
prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` +
plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md` — deepest editor page: `.mcp.json`
shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled,
`hooks.json`.
- `agents/editors/cursor.md` — `.cursor/mcp.json` (project + global),
absolute-path fallback, verification.
- `agents/editors/codex.md` — `~/.codex/config.toml` + CLI helper,
stdio-only caveat.
- `agents/editors/windsurf.md` — `~/.codeium/windsurf/mcp_config.json`,
restart caveat.
- `agents/editors/opencode.md` — `opencode.json` with the differing key
shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md` — 21-row single-repo intent → tool
table with anti-pattern column, plus 5-row group-mode table and a "When
to chain" section.
- `agents/idiomatic-prompts.md` — 5 paste-ready prompts (rename audit /
auth-flow surfacing / HTTP contract reconstruction /
findings-vs-baseline / onboarding) with target editor + expected tool
calls + expected output.
- `agents/discovery-and-resources.md` — site URL, `/llms.txt`,
`/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`,
`CLAUDE.md`, registries.
- `agents/registries.md` — Official MCP Registry (`server.json` shape),
Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator
directories.
- `agents/llms-txt-cheatsheet.md` — picking guidance for the three core
bundles + custom sets.

### Banned-strings policy (`a85a8f4`)
Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in
`scripts/check-banned-strings.sh`. LadybugDB is the default graph
backend (M7) and a first-class product name in docs. The original ban
dated from when the project was still deciding which graph engine to
vendor; that decision shipped. `kuzu` is unbanned because it appears as
historical lineage in cross-link prose ("the open-source successor to
the pre-1.0 Kuzu codebase") that already lives in ADR 0011.

### Pages workflow hardening (`808d97f`)
- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` +
`id-token: write`) granted only on the `deploy` job. Resolves the same
Token-Permissions HIGH pattern fixed in PR #78 for the other 4
workflows.

### LadybugDB polish (`3d78ab8`)
38 prose substitutions across 13 files: replace awkward "the
graph-database backend" workarounds with plain "LadybugDB" now that the
literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug`
(file extension) preserved.

## Validation

- `mise run check` exit 0 — 1,339 tests across 8 packages (lint +
typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build` — **64 pages built, all internal
links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml` — clean
- `bash scripts/check-banned-strings.sh` — PASS
- `rg
'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md'
packages/docs/src/` — zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`,
`world-class`, `blazing-fast`, `cutting-edge`) — zero hits in docs prose

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml`
triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/
replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/,
/tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`,
`/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" /
"WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery,
awesome-mcp-servers: the research files at
`.erpaval/sessions/session-05809d/research-skills-sh.md` and
`.erpaval/sessions/session-05809d/research-agent-docs.md` capture the
exact shape; PR-able as separate follow-ups.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection
— investigated, deemed not worth shipping (lessons audience is the agent
at edit-time, not docs readers; some lesson titles include literals the
docs build's other guardrails reject). The `architecture/lessons.md`
stub points readers at the directory.
github-actions bot mentioned this pull request on May 11, 2026