From 5b81e782630a97ee47d5421a5895d85c6f15a99d Mon Sep 17 00:00:00 2001 From: Laith Al-Saadoon Date: Sun, 10 May 2026 19:01:03 +0000 Subject: [PATCH] =?UTF-8?q?docs(repo):=20compound=20=E2=80=94=20durable=20?= =?UTF-8?q?lessons=20from=20docs=20site=20revival?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three new durable lessons extracted from session-05809d (PR #87): 1. post-deletion-promise-debt-anti-pattern (best-practices) — deleting an in-tree asset with the explicit intent to recreate elsewhere creates load-bearing orphans. The recreation almost never happens without a draft follow-up PR; the deleted artifact's last build keeps serving (Pages doesn't auto-tear-down) and silently rots. Surfaced when the May-1 docs snapshot served stale prose for 6 days after PR #53 deleted packages/docs/ with the promise to spin up theagenticguy/opencodehub-docs that never happened. 2. exclude-heavy-build-from-pnpm-recursive (architecture-patterns) — packages whose build pulls in heavy tooling (Playwright, browser binaries, model weights) should be filtered out of `pnpm -r build/test` in workflows that don't own that build. Surfaced when restoring @opencodehub/docs broke ci.yml typecheck + och-self-scan + release.yml build because none of them install Chromium. 3. banned-strings-policy-evolves-with-product (conventions) — a banned-string allowlist that was correct during decision-making becomes wrong once the decision ships. LadybugDB was banned during graph-engine evaluation; after M3+M7 made it the default backend, the bare product name became critical prose surface. Re-audit the ban list at every release. INDEX.md updated with the three new entries. --- .erpaval/INDEX.md | 3 + ...exclude-heavy-build-from-pnpm-recursive.md | 73 +++++++++++++++++++ ...post-deletion-promise-debt-anti-pattern.md | 58 +++++++++++++++ ...ned-strings-policy-evolves-with-product.md | 64 ++++++++++++++++ 4 files changed, 198 insertions(+) create mode 100644 .erpaval/solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md create mode 100644 .erpaval/solutions/best-practices/post-deletion-promise-debt-anti-pattern.md create mode 100644 .erpaval/solutions/conventions/banned-strings-policy-evolves-with-product.md diff --git a/.erpaval/INDEX.md b/.erpaval/INDEX.md index a98e481..a5be4b8 100644 --- a/.erpaval/INDEX.md +++ b/.erpaval/INDEX.md @@ -38,6 +38,9 @@ development sessions. Solutions are reusable; specs are per-feature. - [Bench dashboard ↔ acceptance script banner-text parity](solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md) — when a dashboard parses banners by exact-string match, the two artifacts must be edited together; add a roster-shape test that pulls the banner list from the script directly. Surfaced 9-of-17 gates rendering by Bug #4 in 2026-05-10 smoke campaign. - [Test env hermeticity for backend-precedence libraries](solutions/conventions/test-env-hermeticity-for-backend-precedence.md) — when an SDK picks a backend by env presence, tests must scope-stash every key in the chain via prefix glob, not just the one they assert on. Per Bug #7 in 2026-05-10 smoke: `CODEHUB_EMBEDDING_*` chain. - [Parallel docs subagents over-scrub ADR coordinates](solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md) — PR #74's carve-out for ADR text isn't visible in the durable lesson; brief docs subagents explicitly that `docs/adr/*` retains spec coordinates. +- [Post-deletion-promise debt creates load-bearing orphans](solutions/best-practices/post-deletion-promise-debt-anti-pattern.md) — when a milestone-PR deletes an in-tree asset with intent to recreate elsewhere, the recreation almost never happens; the deleted artifact's last build keeps serving and silently rots. PR #53 deleted `packages/docs/`; the orphaned May-1 Pages snapshot served stale prose for 6 days until PR #87 restored. +- [Exclude heavy-build packages from pnpm-recursive in non-owner workflows](solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md) — packages whose build pulls in Playwright / browser binaries / native model weights should be filtered out of `pnpm -r build/test` in workflows that don't own that build. Use `pnpm --filter '!@scope/heavy' -r `. +- [Banned-strings policy evolves with the product](solutions/conventions/banned-strings-policy-evolves-with-product.md) — a banned literal that worked during decision-making becomes a barrier when the decision ships and the banned name becomes the official product term. Re-evaluate per release; remove literals that became the product. ## Specs diff --git a/.erpaval/solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md b/.erpaval/solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md new file mode 100644 index 0000000..0c0e117 --- /dev/null +++ b/.erpaval/solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md @@ -0,0 +1,73 @@ +--- +name: "Exclude heavy-build packages (docs, e2e fixtures) from pnpm-recursive build/test in non-owner workflows" +description: When a workspace contains a package whose build pulls in heavy tooling (Playwright, headless Chromium, browser binaries), exclude it from `pnpm -r build/test` in workflows that don't own that build — use the dedicated workflow's --filter scope. +type: architecture-patterns +--- + +The `@opencodehub/docs` Starlight package's `build` script runs +`astro build` which invokes `rehype-mermaid` which boots Playwright + +headless Chromium to render mermaid fences to inline SVG. That +toolchain is intentionally heavy and only required when the docs site +is actually being published. + +`pages.yml` installs Chromium with apt deps: +```yaml +- run: pnpm --filter @opencodehub/docs exec playwright install chromium --with-deps +``` + +Other workflows (CI typecheck, OCH self-scan, release build, publish +dry-run) ran `pnpm -r build` which iterated INTO the docs package +with no Chromium installed. Failure mode: + +``` +browserType.launch: Executable doesn't exist at +/home/runner/.cache/ms-playwright/chromium_headless_shell-1217/... +``` + +The error was confusing because the docs build worked LOCALLY (dev had +Chromium cached) and the docs build worked in `pages.yml` (which +explicitly installs it). The non-docs workflows hit the gap. + +**Why:** pnpm's recursive flag `pnpm -r ` defaults to running the +command in every workspace package. Most consumers want this for +"build all the TS, run all the tests" style cross-package checks. But +it's the wrong default for cross-package tasks that don't actually +need the heavy package's output. + +**How to apply:** + +For every workflow other than the docs publish workflow: + +```yaml +- run: pnpm --filter '!@opencodehub/docs' -r build +- run: pnpm --filter '!@opencodehub/docs' -r exec tsc --noEmit +- run: pnpm --filter '!@opencodehub/docs' -r test +``` + +The `!` syntax negates a pnpm filter. Combined with `-r`, this +applies the command to every workspace EXCEPT the named one. Document +the why in a comment on the line so a future maintainer doesn't undo +it: + +```yaml +# Skip @opencodehub/docs — its build runs astro + rehype-mermaid + +# playwright (headless Chromium dep) and is exercised on the +# dedicated `pages.yml` workflow with --with-deps installed. +run: pnpm --filter '!@opencodehub/docs' -r build +``` + +This pattern generalizes to any "heavy package" that satisfies all of: +- The package builds successfully on its own dedicated workflow. +- The package exports nothing other workspaces consume (no shared + types, no runtime imports). +- The package's build pulls in browsers, simulators, native binaries, + large model weights, or any other heavy artifact. + +If the package DOES export shared types, narrow the exclusion: build +the types-only entry (`pnpm -r exec tsc --noEmit` works on every +package's `tsconfig.json`, including a hypothetical +`tsconfig.types-only.json`) without invoking the full `build` script. + +OCH applied the pattern to: `ci.yml` typecheck + test, `release.yml` +build + publish-dry-run, `och-self-scan.yml` build. `pages.yml` is the +sole owner of the docs build. diff --git a/.erpaval/solutions/best-practices/post-deletion-promise-debt-anti-pattern.md b/.erpaval/solutions/best-practices/post-deletion-promise-debt-anti-pattern.md new file mode 100644 index 0000000..dc59091 --- /dev/null +++ b/.erpaval/solutions/best-practices/post-deletion-promise-debt-anti-pattern.md @@ -0,0 +1,58 @@ +--- +name: "Post-deletion-promise debt: deleting an asset with a promise to spin it up elsewhere creates load-bearing orphans" +description: When a milestone-PR deletes an in-tree asset (docs site, package, fixture) with the explicit intent to recreate it elsewhere, the recreation almost never happens. The deleted asset's last build keeps serving and silently rots. +type: best-practices +--- + +OCH PR #53 (commit `4431b53`) deleted `packages/docs/` Starlight site +under T-M2-3 with the body line: + +> `packages/docs` removed from monorepo (moved to separate repo; +> `docs/adr/` at root stays) + +`theagenticguy/opencodehub-docs` was never created. Six milestones +shipped without docs coverage. The site at +`https://theagenticguy.github.io/opencodehub/` was still serving the +May 1 build — Pages does not auto-tear-down when its feeding workflow +is removed; the last successful deploy keeps serving until something +overwrites it. End users hitting the old URL got six-week-old prose +("28 tools", "DuckDB-default", "Node 20+"). Caching at search engines +amplified the problem. + +**Why:** PR deletion of an asset feels like cleanup; the user perceives +"that's done." The "we'll recreate in a separate repo" promise lives +in the PR body, which gets archived, never re-encountered. The +recreation is an unscheduled task with no owner. The deleted-but-still- +serving artifact is invisible in the working tree — `git log` shows +the deletion is final, but the URL is alive. + +**How to apply:** + +- **If you're going to move an asset out, make the recreation a + blocking task on the same PR or its immediate follow-up.** Open the + follow-up PR (even as a draft) before merging the deletion. A + draft-PR is harder to forget than a comment in a PR body. +- **If the asset has a public surface (URL, npm, registry listing), + redirect or sunset it explicitly.** For Pages: the deletion PR + should also empty the published artifact (`echo "Moved to ..." > + index.html` or push a redirect-only build). For npm: `npm deprecate`. + For registries: file a removal/migration ticket. +- **Add a "promised" field to the deletion PR's checklist.** "We + promised to recreate `` at `` by ``." The + date forces a follow-up. +- **For docs specifically: prefer in-monorepo over "separate repo" + unless you have a specific reason.** A docs site that lives in + `packages/docs/` or `apps/docs/` shares CI, lockfile, license sweep, + Dependabot, banned-strings, and PR review with the code it documents. + A separate repo is its own project to maintain — typically only + worth it for white-label or cross-product docs. + +When you do encounter a post-deletion-promise debt situation: restore +from `git show ^ -- ` is the fastest path. The +working scaffold is in history; only the content needs refreshing +against current reality. + +Pattern observed: 2026-05-04 deletion (PR #53) → 2026-05-10 restoration +(PR #87). Six days of silent rot. Restoration via `git checkout +4431b53^ -- packages/docs/ .github/workflows/pages.yml` recovered 56 +files in under a second. diff --git a/.erpaval/solutions/conventions/banned-strings-policy-evolves-with-product.md b/.erpaval/solutions/conventions/banned-strings-policy-evolves-with-product.md new file mode 100644 index 0000000..cc7a733 --- /dev/null +++ b/.erpaval/solutions/conventions/banned-strings-policy-evolves-with-product.md @@ -0,0 +1,64 @@ +--- +name: "Banned-strings policy must evolve when a banned literal becomes the product" +description: A banned-string allowlist that worked during decision-making becomes a barrier when the decision ships and the banned name becomes the official product term. Re-evaluate per release; remove literals when they cease to be prior-art references. +type: conventions +--- + +OCH's `scripts/check-banned-strings.sh` originally banned `ladybug` to +prevent prior-art prose from leaking in while the team evaluated graph +backends. After M3 (graph-DB phase 1) and M7 (default-flip), LadybugDB +became the actual default backend. The banned-strings policy still +held: `LadybugDB` in prose was a CI fail. + +The workaround agents reached for: write `the graph-database backend` +or `@ladybugdb/core` (the package, allowlisted) instead of the bare +product name. This produced awkward prose: + +> The default backend is the graph-database backend for the graph +> half + DuckDB for temporal. + +vs the polished form after the policy update: + +> The default backend is LadybugDB for the graph half + DuckDB for +> temporal. + +The policy was correctly aggressive when the team was deciding what to +vendor. It was incorrectly aggressive once the decision shipped and +the product name needed to be plain in end-user docs. + +**Why:** "Banned literal" is an aggressive guardrail. It assumes the +literal is never legitimate. That assumption holds during clean-room +evaluation (you don't want to inadvertently copy from a product +you're studying). It breaks once you adopt the product — the same +literal IS your product term. + +**How to apply:** + +- **At every release, audit `scripts/check-banned-strings.sh`** against + the actual product. Has any banned literal become the product? If + yes, remove it (and its allowlist regex if any). +- **Don't conflate "rejected as prior art" with "permanently banned."** + Decision history goes in ADRs (`docs/adr/`), not in a CI guardrail. + Once a vendoring decision is made, the literal's status flips from + "watch out" to "use plainly." +- **If a literal must remain banned but is occasionally legitimate, + prefer per-path exclusions over per-literal allowlists.** Already + done in OCH for `docs/adr/` (path-excluded — ADRs document + history). The path-exclusion is more robust than the + literal-allowlist because it honors the editorial role of the file. +- **Update the script's comment block** when removing a banned + literal. Future maintainers should read why a literal was once + banned and why it isn't anymore. + +In OCH, the same audit removed `kuzu` (legitimate as a historical +lineage citation: "LadybugDB is the open-source successor to the +pre-1.0 Kuzu codebase") and noted in the script comment that +`@ladybugdb/...` (npm package form) and `lbug` (env-var/file-extension +form) need no allowlist because they don't appear in the banned set +anymore. + +Counter-example: `STEP_IN_PROCESS`, `heuristicLabel`, `codeprobe`, +`STEP_IN_FLOW`, `duckpgq` are still banned because OCH does NOT use +those — they ARE prior-art references the project deliberately +doesn't copy from. The policy only removes literals that became the +product.