chore(repo): scrub ERPAVal spec coordinates from source #74
Merged
theagenticguy added a commit that referenced this pull request on May 10, 2026
## Summary

V1-launch readiness sweep: cherry-picks three known-good upstream bug fixes from the post-filter testbed, closes two residual smoke gaps, and deeply refreshes the v1 docs against current reality.

### Bug fixes (5 of 7 from UPSTREAM_BUGS.md)

| Severity | Bug | Fix |
|---|---|---|
| HIGH (data corruption) | #2 — `codehub scan <path>` ingested SARIF into operator's CWD instead of the scanned repo | `c43c5aa fix(cli): scan ingests SARIF into the scanned repo, not CWD` |
| HIGH (CI gate) | #3 — `scripts/smoke-mcp.sh` asserted EXPECTED_TOOLS=19; server registers 29 | `433f684 fix(repo): smoke-mcp asserts 29 tools, matching the v1.0 server` |
| HIGH (CI dashboard) | #4 — `codehub bench` surfaced 9 of 17 acceptance gates (some titles also stale) | `c5f9047 fix(cli): bench dashboard surfaces all 17 acceptance gates` |
| MEDIUM | #1 + #6 — `codehub doctor` false-WARN on tree-sitter / @duckdb / @LadybugDB under pnpm strict isolation; `duckdb close()` undefined on `@duckdb/node-api@1.x` | `c218c31 fix(cli): doctor resolves native bindings from owner workspaces` |
| LOW (test hygiene) | #7 — `http-embedder.test.ts` cases failed when `CODEHUB_EMBEDDING_*` env was set in operator's shell | `317bdf1 fix(embedder): isolate http-embedder tests from operator env` |

Bug #5 (testbed-only pytest-timeout) does not apply upstream. Bug fixes #1+#6, #2, #3 are direct cherry-picks of `def988b`, `6924b1b`, `ec66d4a` from the post-filter sibling — every changed file:line coordinate verified to match upstream HEAD before pick.

### Spec-coordinate hygiene

- `fad766f` — scrub `AC-A-7` / `AC-A-10` from `scripts/m7-parity-audit.sh` header (per the durable lesson; scripts are not ADRs).
- `e186aea` — restore ADR-permanent spec coordinates in `docs/adr/0013-m7-default-flip-and-abstraction.md` and `docs/adr/0014-scip-references-and-embedder-fingerprint.md` after an earlier docs-sweep commit over-scrubbed them.
  Per PR #74's carve-out, ADR text is the explicit place where coordinates ARE allowed.

Final sweep: `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` returns zero hits.

### Docs refresh

- `898192e` — README: status flipped from "v0.1.0 initial public release" to "v1 — feature-complete on M1–M7" (the prerelease caveat stays since `package.json` is still `0.1.x`); 28 → 29 MCP tools across the mermaid diagram, table heading, and mcp-package row; new "Parse runtime — WASM default" section cross-linking ADR `0013-parse-runtime-wasm-default.md`; Repository Layout regenerated against `ls packages/` (now 17 packages — adds `cobol-proleap`, `frameworks`, `pack`, `policy`, `wiki`; drops `eval` and `gym` with a sibling-testbed note); 14 → 15 GA languages (COBOL via regex provider); requirements bumped to Node 22-or-24; tool table expanded to enumerate the cross-repo federation tools and `pack_codebase`.
- `69eac8f` — ADR 0011 `Proposed → Accepted`; ADR 0013-m7 `Proposed → Accepted`; sibling-ADR cross-link banner on the duplicate-0013 collision (`0013-parse-runtime-wasm-default.md` and `0013-m7-default-flip-and-abstraction.md` both landed concurrently); ADR 0014 References block swapped from `.erpaval/specs/...` (gitignored, will rot once packet graduates) to durable code-path citations.
- `edb362e` — CHANGELOG `[Unreleased]` entry summarizing this PR; AGENTS.md 28 → 29 tools and a divergence banner where it intentionally drops session-local coordinates that CLAUDE.md still carries; OBJECTIVES.md tool count + language count + sibling-testbed note.
## Validation

- `pnpm install --frozen-lockfile` ✅
- `mise run check` (lint + typecheck + test + banned-strings + verdict) ✅
- `pnpm -F @opencodehub/cli test` — **236/236** pass (was 235; +1 from the new `[SKIP]` parsing case in `bench.test.ts`)
- `pnpm -F @opencodehub/embedder test` — 79 pass / 0 fail / 1 skipped
- `bash scripts/smoke-mcp.sh` — **PASS (29 tools listed)**
- `node packages/cli/dist/index.js doctor` — `tree-sitter native binding: OK`, `duckdb native binding: OK`, `graph-db native binding: FAIL` (real opt-in build status — the `@ladybugdb/core` binding is not installed on this dev box, which is what `doctor` is supposed to surface; the false-WARN this PR fixes is gone)
- `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` — zero hits

## Test plan

- [ ] CI green on `chore/v1-upstream-bug-sweep`
- [ ] `codehub doctor` reports OK on tree-sitter + duckdb in CI matrix (Node 22 + Node 24)
- [ ] `codehub scan /tmp/<fixture>` ingests into `<fixture>` not CWD (manual verification on a downstream repo)
- [ ] `codehub bench` table now renders all 17 rows, none stuck on "skipped — script crashed"
- [ ] License audit / banned-strings / commitlint stay green

## Out of scope

- Bug #5 (testbed-only pytest-timeout). Listed for reference in UPSTREAM_BUGS.md; does not affect upstream.
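Bug #4 in the bug-fix table came down to the dashboard matching acceptance-script banners by exact string, so gates whose banners did not match verbatim never rendered (9 of 17 shown). A minimal sketch of a more tolerant parser plus a parity check, assuming a hypothetical banner grammar of `[PASS]`, `[FAIL]`, or `[SKIP]` followed by a title; `parseBanners` and `assertParity` are illustrative names, not the actual `codehub bench` code:

```typescript
// Hypothetical banner grammar: "[PASS] <title>", "[FAIL] <title>", or "[SKIP] <title>".
type GateStatus = "pass" | "fail" | "skip";

interface GateResult {
  status: GateStatus;
  title: string;
}

// Key on the bracketed status tag (case-insensitive) and take the title from the
// script's own output, instead of matching each expected banner verbatim.
const BANNER = /^\s*\[(PASS|FAIL|SKIP)\]\s+(.+?)\s*$/i;

function parseBanners(output: string): GateResult[] {
  const results: GateResult[] = [];
  for (const line of output.split(/\r?\n/)) {
    const m = BANNER.exec(line);
    if (m === null) continue; // Log noise and progress lines are ignored.
    results.push({ status: m[1].toLowerCase() as GateStatus, title: m[2] });
  }
  return results;
}

// Parity check: every registered acceptance script should yield exactly one banner,
// so a gate that fails to parse surfaces as an error instead of a missing row.
function assertParity(scripts: readonly string[], results: readonly GateResult[]): void {
  if (results.length !== scripts.length) {
    throw new Error(`expected ${scripts.length} gate banners, parsed ${results.length}`);
  }
}
```

Keying on the status tag means a retitled gate still shows up, and the parity check turns a silently dropped row into a loud failure.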
theagenticguy added a commit that referenced this pull request on May 10, 2026
## Summary

Compound phase from session-6c091d (PR #76). Four new durable lessons extracted from the v1 upstream bug sweep, plus a clarification of the existing leakage lesson's sweep scope.

### New lessons

| File | Category | Surfaced by |
|---|---|---|
| `cherry-pick-from-sibling-testbed.md` | best-practices | Whole campaign — fetched the post-filter sibling, picked 3 fix commits directly |
| `bench-dashboard-acceptance-script-parity.md` | architecture-patterns | Bug #4 — dashboard parsed banners by exact-string match; 9-of-17 gates rendered |
| `test-env-hermeticity-for-backend-precedence.md` | conventions | Bug #7 — `CODEHUB_EMBEDDING_*` precedence chain leaked from operator's shell |
| `parallel-docs-subagent-overscrubs-adrs.md` | best-practices | The docs subagent stripped `AC-*` from `docs/adr/0013-m7` and `0014` despite PR #74's ADR carve-out — required a follow-up restore commit |

### Updated

- `no-spec-coordinate-leakage-into-source.md` — added a "Sweep scope is `packages/` and `scripts/`, NOT `docs/adr/*`" rule that names PR #74's carve-out, so future subagents reading the lesson see the constraint without PR archaeology.
- `INDEX.md` — pointers for the four new lessons.

## Test plan

- [ ] CI green on `chore/v1-compound-lessons`
- [ ] No spec-coordinate leakage in source: `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` returns zero hits.
- [ ] Future ERPAVal sessions that load `INDEX.md` at session start surface these four lessons.
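The `test-env-hermeticity-for-backend-precedence.md` lesson concerns tests whose backend selection reads a precedence chain from the environment, so an operator's `CODEHUB_EMBEDDING_*` settings can change the outcome. One way to keep such tests hermetic is to strip the prefixed variables for the duration of a test and restore them afterwards. A minimal sketch, assuming Node's `process.env`; `withoutEnvPrefix` is a hypothetical helper, not the embedder package's actual test utility:

```typescript
// Hypothetical helper: run `body` with every env var under `prefix` removed, then
// restore the operator's original values even if `body` throws or rejects.
async function withoutEnvPrefix<T>(prefix: string, body: () => T | Promise<T>): Promise<T> {
  const saved = new Map<string, string>();
  for (const [key, value] of Object.entries(process.env)) {
    if (key.startsWith(prefix) && value !== undefined) {
      saved.set(key, value);
      delete process.env[key];
    }
  }
  try {
    return await body();
  } finally {
    // Drop anything the body set under the prefix, then put the originals back.
    for (const key of Object.keys(process.env)) {
      if (key.startsWith(prefix)) delete process.env[key];
    }
    for (const [key, value] of saved) process.env[key] = value;
  }
}

// e.g. await withoutEnvPrefix("CODEHUB_EMBEDDING_", async () => { /* embedder test body */ });
```

Restoring in `finally` matters as much as clearing: a test that sets its own prefixed variable must not leak it into the next test either.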
theagenticguy pushed a commit that referenced this pull request on May 10, 2026
## Summary
Standalone scrub PR called for by the durable lesson at
`.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md`.
ERPAVal session-local prefixes (`AC-*`, `S-*`, `W-*`, `E-*`, `T-*`,
`CL-*`, `SUM-*`, `DOC-*`) plus references to `architecture-revised.md`
and `.erpaval/sessions/` / `.erpaval/specs/` paths leaked into
production source, JSDoc, CLI flag help, MCP tool option descriptions,
and test names. The spec packets that name those coordinates are
gitignored, so once the packet graduates the source citations rot — and
LLM clients pick the leakage up and start citing it back.
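To make the leak class concrete, here is a rough approximation of a detector for these coordinates and spec paths. The coordinate regex is an assumption inferred from examples in this PR (`AC-A-10`, `T-M4-5`) and the listed prefixes, not the pattern the actual sweep used; it requires a trailing number so ordinary words such as `E-mail` do not trip it. `findLeaks` is a hypothetical helper:

```typescript
// Assumed coordinate shape: a listed prefix, an optional uppercase segment, then a
// number (e.g. AC-A-10, T-M4-5). A heuristic, not the real sweep pattern.
const COORDINATE = /\b(?:AC|SUM|DOC|CL|S|W|E|T)-(?:[A-Z][A-Z0-9]*-)?[0-9]+\b/g;
// Gitignored spec artifacts that source should never cite.
const SPEC_PATH = /architecture-revised\.md|\.erpaval\/(?:sessions|specs)\//g;

interface Leak {
  line: number; // 1-based line number
  match: string;
}

function findLeaks(text: string): Leak[] {
  const leaks: Leak[] = [];
  text.split(/\r?\n/).forEach((lineText, index) => {
    for (const pattern of [COORDINATE, SPEC_PATH]) {
      // matchAll iterates over a copy of the global regex, so lastIndex state
      // on the shared pattern objects is never disturbed between lines.
      for (const m of lineText.matchAll(pattern)) {
        leaks.push({ line: index + 1, match: m[0] });
      }
    }
  });
  return leaks;
}
```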
This PR replaces every leaked coordinate with the underlying invariant
or behavior that the comment, test, or JSDoc actually documents. The
change touches **147 files**, covering every workspace package plus
plugin SKILL.md files, the determinism-contract reference doc, and
shell-level acceptance scripts.
Two pairs of source/test runtime strings updated in lockstep:
- `generatePack` production-store error in `pack/src/index.ts` ↔
`pack/src/index.test.ts`.
- COBOL fixture author line in `hello.cbl` ↔ inline test fixture in
`cobol-regex.test.ts` (`T-M4-5` → `INGESTION-FIXTURE`, still valid
COBOL syntax).
ADR text and `docs/adr/*` files retain coordinates where they cite the
permanent decision rationale; `P0[1-9]` packet IDs stay since they're
documented in ADRs.
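The carve-out amounts to a path-scope rule: sweep `packages/`, `plugins/`, and `scripts/`, and leave `docs/adr/*` alone. A minimal sketch of that rule; `inSweepScope` and `partitionForSweep` are hypothetical helpers (the PR itself ran `rg` over those directories):

```typescript
// Directories the sweep covers, and the ADR tree it deliberately leaves alone.
const SWEPT_ROOTS = ["packages/", "plugins/", "scripts/"];
const EXEMPT_ROOTS = ["docs/adr/"];

// Should a repo-relative path be scanned for spec coordinates?
function inSweepScope(repoRelativePath: string): boolean {
  // Normalize Windows separators and a leading "./" so prefixes compare cleanly.
  const p = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  // Check the exemption first so a future widening of SWEPT_ROOTS cannot pull ADRs in.
  if (EXEMPT_ROOTS.some((root) => p.startsWith(root))) return false;
  return SWEPT_ROOTS.some((root) => p.startsWith(root));
}

// Split a file list into paths to scan and paths the sweep must not touch.
function partitionForSweep(paths: readonly string[]): { swept: string[]; skipped: string[] } {
  const swept: string[] = [];
  const skipped: string[] = [];
  for (const path of paths) (inSweepScope(path) ? swept : skipped).push(path);
  return { swept, skipped };
}
```

Encoding the exemption explicitly, rather than relying on the swept roots happening not to include `docs/`, is the guard against the ADR over-scrub described in the follow-up lesson.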
## Test plan
- [x] `rg` for ERPAVal coordinate patterns across `packages/`,
  `plugins/`, `scripts/` — zero hits.
- [x] `mise run lint` (biome) — clean.
- [x] `pnpm -r exec tsc --noEmit` — clean.
- [x] `pnpm -r test` — all 1438 tests pass across every workspace
  package.
- [x] `bash scripts/check-banned-strings.sh` — PASS.
Co-authored-by: bonk-ai[bot] <269762587+bonk-ai[bot]@users.noreply.github.com>