chore(repo): scrub ERPAVal spec coordinates from source#74

Merged: theagenticguy merged 1 commit into main from chore/scrub-spec-coordinates on May 9, 2026
@bonk-ai (bot, Contributor) commented May 9, 2026

Summary

Standalone scrub PR called for by the durable lesson at
.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md.

ERPAVal session-local prefixes (AC-*, S-*, W-*, E-*, T-*, CL-*,
SUM-*, DOC-*) plus references to architecture-revised.md and
.erpaval/sessions/ / .erpaval/specs/ paths leaked into production source,
JSDoc, CLI flag help, MCP tool option descriptions, and test names. The
spec packets that name those coordinates are gitignored, so once the
packet graduates the source citations rot — and LLM clients pick the
leakage up and start citing it back.

This PR replaces every leaked coordinate with the underlying invariant or
behavior the comment / test / JSDoc actually documents. It touches 147 files,
covering every workspace package plus plugin SKILL.md files, the
determinism-contract reference doc, and shell-level acceptance scripts.

Two pairs of source/test runtime strings updated in lockstep:

  • generatePack production-store error in pack/src/index.ts ↔
    pack/src/index.test.ts.
  • COBOL fixture author line in hello.cbl ↔ inline test fixture in
    cobol-regex.test.ts (T-M4-5 → INGESTION-FIXTURE, still valid
    COBOL syntax).

ADR text and docs/adr/* files retain coordinates where they cite the
permanent decision rationale; `P0[1-9]` packet IDs stay since they're
documented in ADRs.

Test plan

  • `rg` for ERPAVal coordinate patterns across `packages/`,
    `plugins/`, `scripts/` — zero hits.
  • `mise run lint` (biome) — clean.
  • `pnpm -r exec tsc --noEmit` — clean.
  • `pnpm -r test` — all 1438 tests pass across every workspace package.
  • `bash scripts/check-banned-strings.sh` — PASS.
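
For illustration, the `rg` check above could be approximated by a small scanner. This is a minimal sketch, not the PR's actual check: the prefix list comes from the summary, but the token shape (prefix, hyphen, optional uppercase segments, trailing number) is an assumption, and `COORDINATE_PATTERN` and `findCoordinates` are hypothetical names.

```typescript
// Hypothetical sketch: flag ERPAVal-style coordinates in source text.
// Prefixes are taken from the PR summary; the token shape is an assumption.
// P0[1-9] packet IDs have no hyphenated prefix, so they are not matched.
const COORDINATE_PATTERN =
  /\b(?:AC|S|W|E|T|CL|SUM|DOC)-(?:[A-Z][A-Z0-9]*-)*[0-9]+\b/g;

interface CoordinateHit {
  line: number; // 1-based line number within the scanned text
  token: string; // the matched coordinate, e.g. "AC-A-7"
}

function findCoordinates(text: string): CoordinateHit[] {
  const hits: CoordinateHit[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    // With the global flag, match() returns every occurrence on the line.
    const tokens = content.match(COORDINATE_PATTERN) ?? [];
    for (const token of tokens) {
      hits.push({ line: index + 1, token });
    }
  });
  return hits;
}
```

A caller would run this over each file under the swept directories and fail the check when any hit is returned. Lowercase words such as "e-mail" are not flagged because the assumed pattern requires an uppercase prefix and a trailing number.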

@theagenticguy theagenticguy merged commit f09d804 into main May 9, 2026
17 checks passed
@theagenticguy theagenticguy deleted the chore/scrub-spec-coordinates branch May 9, 2026 23:40
theagenticguy added a commit that referenced this pull request May 10, 2026
## Summary

V1-launch readiness sweep: cherry-picks three known-good upstream bug
fixes from the post-filter testbed, closes two residual smoke gaps, and
deeply refreshes the v1 docs against current reality.

### Bug fixes (5 of 7 from UPSTREAM_BUGS.md)

| Severity | Bug | Fix |
|---|---|---|
| HIGH (data corruption) | #2: `codehub scan <path>` ingested SARIF into operator's CWD instead of the scanned repo | `c43c5aa fix(cli): scan ingests SARIF into the scanned repo, not CWD` |
| HIGH (CI gate) | #3: `scripts/smoke-mcp.sh` asserted EXPECTED_TOOLS=19; server registers 29 | `433f684 fix(repo): smoke-mcp asserts 29 tools, matching the v1.0 server` |
| HIGH (CI dashboard) | #4: `codehub bench` surfaced 9 of 17 acceptance gates (some titles also stale) | `c5f9047 fix(cli): bench dashboard surfaces all 17 acceptance gates` |
| MEDIUM | #1 + #6: `codehub doctor` false-WARN on tree-sitter / @duckdb / @ladybugdb under pnpm strict isolation; `duckdb close()` undefined on `@duckdb/node-api@1.x` | `c218c31 fix(cli): doctor resolves native bindings from owner workspaces` |
| LOW (test hygiene) | #7: `http-embedder.test.ts` cases failed when `CODEHUB_EMBEDDING_*` env was set in operator's shell | `317bdf1 fix(embedder): isolate http-embedder tests from operator env` |

Bug #5 (testbed-only pytest-timeout) does not apply upstream. Bug fixes
#1+#6, #2, #3 are direct cherry-picks of `def988b`, `6924b1b`, `ec66d4a`
from the post-filter sibling — every changed file:line coordinate
verified to match upstream HEAD before pick.

### Spec-coordinate hygiene
- `fad766f` — scrub `AC-A-7` / `AC-A-10` from
`scripts/m7-parity-audit.sh` header (per the durable lesson; scripts are
not ADRs).
- `e186aea` — restore ADR-permanent spec coordinates in
`docs/adr/0013-m7-default-flip-and-abstraction.md` and
`docs/adr/0014-scip-references-and-embedder-fingerprint.md` after an
earlier docs-sweep commit over-scrubbed them. Per PR #74's carve-out,
ADR text is the explicit place where coordinates ARE allowed.

Final sweep: `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` returns zero
hits.
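
The sweep scope and the ADR carve-out described above can be read as a path predicate. The following is a rough sketch under that reading; `isInSweepScope`, `SWEEP_ROOTS`, and `CARVE_OUTS` are hypothetical names, and the roots simply mirror the directories passed to the `rg` command in the final sweep.

```typescript
// Hypothetical sketch of the sweep scope: coordinates are scrubbed under
// packages/ and scripts/, while docs/adr/ is explicitly left alone.
const SWEEP_ROOTS = ["packages/", "scripts/"];
const CARVE_OUTS = ["docs/adr/"];

function isInSweepScope(repoRelativePath: string): boolean {
  // Normalize Windows separators and a leading "./" before comparing prefixes.
  const normalized = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  // The carve-out is checked first so it still wins if a root is ever broadened.
  if (CARVE_OUTS.some((prefix) => normalized.indexOf(prefix) === 0)) {
    return false;
  }
  return SWEEP_ROOTS.some((prefix) => normalized.indexOf(prefix) === 0);
}
```

With the current roots the carve-out check is redundant, since `docs/adr/` is not under either root. Stating it explicitly is a design choice: it keeps the exception visible to anyone who later widens the roots.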

### Docs refresh
- `898192e` — README: status flipped from "v0.1.0 initial public
release" to "v1 — feature-complete on M1–M7" (the prerelease caveat
stays since `package.json` is still `0.1.x`); 28 → 29 MCP tools across
the mermaid diagram, table heading, and mcp-package row; new "Parse
runtime — WASM default" section cross-linking ADR
`0013-parse-runtime-wasm-default.md`; Repository Layout regenerated
against `ls packages/` (now 17 packages — adds `cobol-proleap`,
`frameworks`, `pack`, `policy`, `wiki`; drops `eval` and `gym` with a
sibling-testbed note); 14 → 15 GA languages (COBOL via regex provider);
requirements bumped to Node 22-or-24; tool table expanded to enumerate
the cross-repo federation tools and `pack_codebase`.
- `69eac8f` — ADR 0011 `Proposed → Accepted`; ADR 0013-m7 `Proposed →
Accepted`; sibling-ADR cross-link banner on the duplicate-0013 collision
(`0013-parse-runtime-wasm-default.md` and
`0013-m7-default-flip-and-abstraction.md` both landed concurrently); ADR
0014 References block swapped from `.erpaval/specs/...` (gitignored,
will rot once packet graduates) to durable code-path citations.
- `edb362e` — CHANGELOG `[Unreleased]` entry summarizing this PR;
AGENTS.md 28 → 29 tools and a divergence banner where it intentionally
drops session-local coordinates that CLAUDE.md still carries;
OBJECTIVES.md tool count + language count + sibling-testbed note.

## Validation

- `pnpm install --frozen-lockfile` ✅
- `mise run check` (lint + typecheck + test + banned-strings + verdict)
✅
- `pnpm -F @opencodehub/cli test` — **236/236** pass (was 235; +1 from
the new `[SKIP]` parsing case in `bench.test.ts`)
- `pnpm -F @opencodehub/embedder test` — 79 pass / 0 fail / 1 skipped
- `bash scripts/smoke-mcp.sh` — **PASS (29 tools listed)**
- `node packages/cli/dist/index.js doctor` — `tree-sitter native
binding: OK`, `duckdb native binding: OK`, `graph-db native binding:
FAIL` (real opt-in build status — the `@ladybugdb/core` binding is not
installed on this dev box, which is what `doctor` is supposed to
surface; the false-WARN this PR fixes is gone)
- `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` — zero hits
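
To make the bench fix and the new `[SKIP]` parsing case more concrete, here is a minimal sketch of status-tag parsing. The real banner format emitted by the acceptance scripts is not shown in this PR, so the `[PASS]` / `[FAIL]` / `[SKIP]` line shape, `parseBanners`, and `GateResult` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: parse gate banners by status tag, not by exact title,
// so a renamed or stale title does not silently drop a gate from the table.
type GateStatus = "pass" | "fail" | "skip";

interface GateResult {
  status: GateStatus;
  title: string;
}

// Assumed banner shape: "[PASS] some gate title" (tag is case-insensitive).
const BANNER = /^\s*\[(PASS|FAIL|SKIP)\]\s+(.+?)\s*$/i;

function parseBanners(output: string): GateResult[] {
  const results: GateResult[] = [];
  for (const line of output.split(/\r?\n/)) {
    const match = BANNER.exec(line);
    if (match === null) continue; // ignore non-banner noise
    results.push({
      status: match[1].toLowerCase() as GateStatus,
      title: match[2],
    });
  }
  return results;
}
```

A dashboard built this way could compare `parseBanners(output).length` against the expected gate count (17 in this PR) and fail loudly on a mismatch instead of rendering a partial table.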

## Test plan

- [ ] CI green on `chore/v1-upstream-bug-sweep`
- [ ] `codehub doctor` reports OK on tree-sitter + duckdb in CI matrix
(Node 22 + Node 24)
- [ ] `codehub scan /tmp/<fixture>` ingests into `<fixture>` not CWD
(manual verification on a downstream repo)
- [ ] `codehub bench` table now renders all 17 rows, none stuck on
"skipped — script crashed"
- [ ] License audit / banned-strings / commitlint stay green

## Out of scope

- Bug #5 (testbed-only pytest-timeout). Listed for reference in
UPSTREAM_BUGS.md; does not affect upstream.
theagenticguy added a commit that referenced this pull request May 10, 2026
## Summary

Compound phase from session-6c091d (PR #76). Four new durable lessons
extracted from the v1 upstream bug sweep, plus a clarification of the
existing leakage lesson's sweep scope.

### New lessons

| File | Category | Surfaced by |
|---|---|---|
| `cherry-pick-from-sibling-testbed.md` | best-practices | Whole campaign: fetched the post-filter sibling, picked 3 fix commits directly |
| `bench-dashboard-acceptance-script-parity.md` | architecture-patterns | Bug #4: dashboard parsed banners by exact-string match; 9-of-17 gates rendered |
| `test-env-hermeticity-for-backend-precedence.md` | conventions | Bug #7: `CODEHUB_EMBEDDING_*` precedence chain leaked from operator's shell |
| `parallel-docs-subagent-overscrubs-adrs.md` | best-practices | The docs subagent stripped AC-* from `docs/adr/0013-m7` and `0014` despite PR #74's ADR carve-out, which required a follow-up restore commit |
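
The test-env hermeticity lesson (Bug #7) can be illustrated with a small helper that hides prefixed variables for the duration of a test. This is a sketch under assumptions, not the fix in `317bdf1`: `withoutPrefixedVars` is a hypothetical name, and the environment is passed in as a plain object so the example stays self-contained (in Node one would likely pass `process.env`).

```typescript
// Hypothetical sketch: run a callback with every variable matching a prefix
// removed from the environment object, then restore the original values.
type Env = Record<string, string | undefined>;

function withoutPrefixedVars<T>(env: Env, prefix: string, run: () => T): T {
  const saved: Env = {};
  for (const key of Object.keys(env)) {
    if (key.indexOf(prefix) === 0) {
      saved[key] = env[key];
      delete env[key]; // hide the operator's setting from the code under test
    }
  }
  try {
    return run();
  } finally {
    // Restore even if the callback throws, so later tests see the same shell.
    for (const key of Object.keys(saved)) {
      env[key] = saved[key];
    }
  }
}
```

This version is synchronous only: an async test body would need a variant that awaits the callback before restoring, otherwise the variables reappear while the test is still running.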

### Updated

- `no-spec-coordinate-leakage-into-source.md` — added a "Sweep scope is
`packages/` and `scripts/`, NOT `docs/adr/*`" rule that names PR #74's
carve-out, so future subagents reading the lesson see the constraint
without PR archaeology.
- `INDEX.md` — pointers for the four new lessons.

## Test plan

- [ ] CI green on `chore/v1-compound-lessons`
- [ ] No spec-coordinate leakage in source: `rg -n 'AC-[A-Z]-[0-9]'
packages/ scripts/` returns zero hits.
- [ ] Future ERPAVal sessions that load `INDEX.md` at session start
surface these four lessons.
theagenticguy pushed a commit that referenced this pull request May 10, 2026
Co-authored-by: bonk-ai[bot] <269762587+bonk-ai[bot]@users.noreply.github.com>