Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .erpaval/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,9 @@ development sessions. Solutions are reusable; specs are per-feature.
- [Bench dashboard ↔ acceptance script banner-text parity](solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md) — when a dashboard parses banners by exact-string match, the two artifacts must be edited together; add a roster-shape test that pulls the banner list from the script directly. Surfaced 9-of-17 gates rendering by Bug #4 in 2026-05-10 smoke campaign.
- [Test env hermeticity for backend-precedence libraries](solutions/conventions/test-env-hermeticity-for-backend-precedence.md) — when an SDK picks a backend by env presence, tests must scope-stash every key in the chain via prefix glob, not just the one they assert on. Per Bug #7 in 2026-05-10 smoke: `CODEHUB_EMBEDDING_*` chain.
- [Parallel docs subagents over-scrub ADR coordinates](solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md) — PR #74's carve-out for ADR text isn't visible in the durable lesson; brief docs subagents explicitly that `docs/adr/*` retains spec coordinates.
- [Post-deletion-promise debt creates load-bearing orphans](solutions/best-practices/post-deletion-promise-debt-anti-pattern.md) — when a milestone-PR deletes an in-tree asset with intent to recreate elsewhere, the recreation almost never happens; the deleted artifact's last build keeps serving and silently rots. PR #53 deleted `packages/docs/`; the orphaned May-1 Pages snapshot served stale prose for 6 days until PR #87 restored.
- [Exclude heavy-build packages from pnpm-recursive in non-owner workflows](solutions/architecture-patterns/exclude-heavy-build-from-pnpm-recursive.md) — packages whose build pulls in Playwright / browser binaries / native model weights should be filtered out of `pnpm -r build/test` in workflows that don't own that build. Use `pnpm --filter '!@scope/heavy' -r <cmd>`.
- [Banned-strings policy evolves with the product](solutions/conventions/banned-strings-policy-evolves-with-product.md) — a banned literal that worked during decision-making becomes a barrier when the decision ships and the banned name becomes the official product term. Re-evaluate per release; remove literals that became the product.

## Specs

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
name: "Exclude heavy-build packages (docs, e2e fixtures) from pnpm-recursive build/test in non-owner workflows"
description: When a workspace contains a package whose build pulls in heavy tooling (Playwright, headless Chromium, browser binaries), exclude it from `pnpm -r build/test` in workflows that don't own that build — use the dedicated workflow's --filter scope.
type: architecture-patterns
---

The `@opencodehub/docs` Starlight package's `build` script runs
`astro build` which invokes `rehype-mermaid` which boots Playwright +
headless Chromium to render mermaid fences to inline SVG. That
toolchain is intentionally heavy and only required when the docs site
is actually being published.

`pages.yml` installs Chromium with apt deps:
```yaml
- run: pnpm --filter @opencodehub/docs exec playwright install chromium --with-deps
```

Other workflows (CI typecheck, OCH self-scan, release build, publish
dry-run) ran `pnpm -r build` which iterated INTO the docs package
with no Chromium installed. Failure mode:

```
browserType.launch: Executable doesn't exist at
/home/runner/.cache/ms-playwright/chromium_headless_shell-1217/...
```

The error was confusing because the docs build worked LOCALLY (dev had
Chromium cached) and the docs build worked in `pages.yml` (which
explicitly installs it). The non-docs workflows hit the gap.

**Why:** pnpm's recursive flag `pnpm -r <cmd>` defaults to running the
command in every workspace package. Most consumers want this for
"build all the TS, run all the tests" style cross-package checks. But
it's the wrong default for cross-package tasks that don't actually
need the heavy package's output.

**How to apply:**

For every workflow other than the docs publish workflow:

```yaml
- run: pnpm --filter '!@opencodehub/docs' -r build
- run: pnpm --filter '!@opencodehub/docs' -r exec tsc --noEmit
- run: pnpm --filter '!@opencodehub/docs' -r test
```

The `!<name>` syntax negates a pnpm filter. Combined with `-r`, this
applies the command to every workspace EXCEPT the named one. Document
the why in a comment on the line so a future maintainer doesn't undo
it:

```yaml
# Skip @opencodehub/docs — its build runs astro + rehype-mermaid +
# playwright (headless Chromium dep) and is exercised on the
# dedicated `pages.yml` workflow with --with-deps installed.
run: pnpm --filter '!@opencodehub/docs' -r build
```

This pattern generalizes to any "heavy package" that satisfies all of:
- The package builds successfully on its own dedicated workflow.
- The package exports nothing other workspaces consume (no shared
types, no runtime imports).
- The package's build pulls in browsers, simulators, native binaries,
large model weights, or any other heavy artifact.

If the package DOES export shared types, narrow the exclusion: build
the types-only entry (`pnpm -r exec tsc --noEmit` works on every
package's `tsconfig.json`, including a hypothetical
`tsconfig.types-only.json`) without invoking the full `build` script.

OCH applied the pattern to: `ci.yml` typecheck + test, `release.yml`
build + publish-dry-run, `och-self-scan.yml` build. `pages.yml` is the
sole owner of the docs build.
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
---
name: "Post-deletion-promise debt: deleting an asset with a promise to spin it up elsewhere creates load-bearing orphans"
description: When a milestone-PR deletes an in-tree asset (docs site, package, fixture) with the explicit intent to recreate it elsewhere, the recreation almost never happens. The deleted asset's last build keeps serving and silently rots.
type: best-practices
---

OCH PR #53 (commit `4431b53`) deleted `packages/docs/` Starlight site
under T-M2-3 with the body line:

> `packages/docs` removed from monorepo (moved to separate repo;
> `docs/adr/` at root stays)

`theagenticguy/opencodehub-docs` was never created. Six milestones
shipped without docs coverage. The site at
`https://theagenticguy.github.io/opencodehub/` was still serving the
May 1 build — Pages does not auto-tear-down when its feeding workflow
is removed; the last successful deploy keeps serving until something
overwrites it. End users hitting the old URL got six-week-old prose
("28 tools", "DuckDB-default", "Node 20+"). Caching at search engines
amplified the problem.

**Why:** PR deletion of an asset feels like cleanup; the user perceives
"that's done." The "we'll recreate in a separate repo" promise lives
in the PR body, which gets archived, never re-encountered. The
recreation is an unscheduled task with no owner. The deleted-but-still-
serving artifact is invisible in the working tree — `git log` shows
the deletion is final, but the URL is alive.

**How to apply:**

- **If you're going to move an asset out, make the recreation a
blocking task on the same PR or its immediate follow-up.** Open the
follow-up PR (even as a draft) before merging the deletion. A
draft-PR is harder to forget than a comment in a PR body.
- **If the asset has a public surface (URL, npm, registry listing),
redirect or sunset it explicitly.** For Pages: the deletion PR
should also empty the published artifact (`echo "Moved to ..." >
index.html` or push a redirect-only build). For npm: `npm deprecate`.
For registries: file a removal/migration ticket.
- **Add a "promised" field to the deletion PR's checklist.** "We
promised to recreate `<thing>` at `<location>` by `<date>`." The
date forces a follow-up.
- **For docs specifically: prefer in-monorepo over "separate repo"
unless you have a specific reason.** A docs site that lives in
`packages/docs/` or `apps/docs/` shares CI, lockfile, license sweep,
Dependabot, banned-strings, and PR review with the code it documents.
A separate repo is its own project to maintain — typically only
worth it for white-label or cross-product docs.

When you do encounter a post-deletion-promise debt situation: restore
from `git show <deletion-commit>^ -- <path>` is the fastest path. The
working scaffold is in history; only the content needs refreshing
against current reality.

Pattern observed: 2026-05-04 deletion (PR #53) → 2026-05-10 restoration
(PR #87). Six days of silent rot. Restoration via `git checkout
4431b53^ -- packages/docs/ .github/workflows/pages.yml` recovered 56
files in under a second.
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
name: "Banned-strings policy must evolve when a banned literal becomes the product"
description: A banned-string allowlist that worked during decision-making becomes a barrier when the decision ships and the banned name becomes the official product term. Re-evaluate per release; remove literals when they cease to be prior-art references.
type: conventions
---

OCH's `scripts/check-banned-strings.sh` originally banned `ladybug` to
prevent prior-art prose from leaking in while the team evaluated graph
backends. After M3 (graph-DB phase 1) and M7 (default-flip), LadybugDB
became the actual default backend. The banned-strings policy still
held: `LadybugDB` in prose was a CI fail.

The workaround agents reached for: write `the graph-database backend`
or `@ladybugdb/core` (the package, allowlisted) instead of the bare
product name. This produced awkward prose:

> The default backend is the graph-database backend for the graph
> half + DuckDB for temporal.

vs the polished form after the policy update:

> The default backend is LadybugDB for the graph half + DuckDB for
> temporal.

The policy was correctly aggressive when the team was deciding what to
vendor. It was incorrectly aggressive once the decision shipped and
the product name needed to be plain in end-user docs.

**Why:** "Banned literal" is an aggressive guardrail. It assumes the
literal is never legitimate. That assumption holds during clean-room
evaluation (you don't want to inadvertently copy from a product
you're studying). It breaks once you adopt the product — the same
literal IS your product term.

**How to apply:**

- **At every release, audit `scripts/check-banned-strings.sh`** against
the actual product. Has any banned literal become the product? If
yes, remove it (and its allowlist regex if any).
- **Don't conflate "rejected as prior art" with "permanently banned."**
Decision history goes in ADRs (`docs/adr/`), not in a CI guardrail.
Once a vendoring decision is made, the literal's status flips from
"watch out" to "use plainly."
- **If a literal must remain banned but is occasionally legitimate,
prefer per-path exclusions over per-literal allowlists.** Already
done in OCH for `docs/adr/` (path-excluded — ADRs document
history). The path-exclusion is more robust than the
literal-allowlist because it honors the editorial role of the file.
- **Update the script's comment block** when removing a banned
literal. Future maintainers should read why a literal was once
banned and why it isn't anymore.

In OCH, the same audit removed `kuzu` (legitimate as a historical
lineage citation: "LadybugDB is the open-source successor to the
pre-1.0 Kuzu codebase") and noted in the script comment that
`@ladybugdb/...` (npm package form) and `lbug` (env-var/file-extension
form) need no allowlist because they don't appear in the banned set
anymore.

Counter-example: `STEP_IN_PROCESS`, `heuristicLabel`, `codeprobe`,
`STEP_IN_FLOW`, `duckpgq` are still banned because OCH does NOT use
those — they ARE prior-art references the project deliberately
doesn't copy from. The policy only removes literals that became the
product.
Loading