chore: release main #30
Closed
github-actions[bot] wants to merge 79 commits into
Apache-2.0 code intelligence for AI coding agents. Graph-based impact analysis, blast radius, and execution-flow tracing over MCP.
Outsider-review pass capturing:
- SPECS.md: EARS-format requirements across ingestion, storage, search, impact analysis, MCP, CLI, and quality/security gates.
- USECASE.md: customer problem statement, How-Might-We prompts, and three persona storyboards.
- OBJECTIVES.md: project objectives with justifications, success criteria, and non-goals.
e1a15b7 to
bd8e99d
…kets
Additive-only type changes used as the base commit for parallel worktree implementation of packets P01-P09. No behavioral change.
- LanguageId union extracted to core-types/src/language-id.ts (shared source of truth; ingestion copies can re-export in later packets).
- CallableShape gains optional halsteadVolume, description, deadness for P07 (Halstead), P07 (@doc capture), and P08 (dead-code persistence).
- ProjectProfileNode.frameworksDetected (optional) introduced as the v1.1 structured shape for P05 framework detection. The flat string[] `frameworks` field is retained for backward compat.
- FrameworkDetection + FrameworkCategory typed for P05 consumers.
All shape additions are optional fields; existing graphs and v1.0 consumers continue to type-check and hash deterministically.
…ullEmbedder throw
Pull the MCP `query` tool's embedder-resolution helpers into `@opencodehub/search` so the CLI can share the same smart-path logic (probe → try-open → warn-on-fail → BM25 fallback). `tryOpenEmbedder` is generic over the returned embedder type so MCP (richer `@opencodehub/embedder.Embedder`) and CLI paths both bind cleanly.
`NullEmbedder.embed()` now warns once and returns a zero-vector in production; the throw is retained under `NODE_ENV === "test"`.
Retire the "use --embeddings at v1.0" MVP tripwire wording across the package; search is production-ready after v1.0.
…rom @opencodehub/search
Delete the local copies of `embeddingsPopulated` and `tryOpenEmbedder` from the MCP `query` tool; they now live in `@opencodehub/search/open-embedder.js` so CLI and MCP share one source of truth. The MCP call site passes a `[mcp:query]` log prefix so existing stderr warnings retain their identifier.
No behavioural change: the extracted helpers are byte-equivalent to the originals (warn-on-fail, return null on any throw, probe returns false on schema mismatch). The MCP `query.test.ts` suite still passes, including the `EMBEDDER_NOT_SETUP → BM25 fallback` fixture.
…op-k
`codehub query` now mirrors the MCP `query` tool's smart path: probe `embeddingsPopulated`, try to open the embedder (HTTP env vars first, then local ONNX), call `hybridSearch` on success, collapse to BM25 with a single `[cli:query]` stderr warning on failure. Hits carry a `sources: ["bm25"|"vector"]` tag in `--json` mode so agents can tell which ranker contributed each result.
New flags:
  --bm25-only          Skip the embedder probe entirely.
  --rerank-top-k <n>   Number of fused hits RRF returns (default 50).
Tests: three new cases exercise the hybrid-by-default branch, the BM25 fallback-with-warn branch, and the `--bm25-only` skip-the-probe branch. The fake store now implements `.query()` so the `embeddings` count probe + node metadata hydration both work without DuckDB.
Unblocks P03 (hierarchical embeddings) and P04 (summarize enrichment); both assume the CLI query path is production-ready.
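As a rough illustration of how fused hits can carry a per-ranker `sources` tag, here is a minimal reciprocal rank fusion sketch. It is not the project's `hybridSearch`; the function name `fuseRrf`, the `FusedHit` shape, and the constant `k = 60` are assumptions made for the example.

```typescript
type Source = "bm25" | "vector";

interface FusedHit {
  id: string;
  score: number;
  sources: Source[];
}

// Reciprocal rank fusion: each ranker adds 1 / (k + rank) to a hit's score,
// where rank is 1-based. `k = 60` is a conventional choice, assumed here.
function fuseRrf(
  rankings: Record<Source, readonly string[]>,
  topK = 50,
  k = 60,
): FusedHit[] {
  const fused = new Map<string, FusedHit>();
  for (const source of ["bm25", "vector"] as const) {
    rankings[source].forEach((id, index) => {
      const hit = fused.get(id) ?? { id, score: 0, sources: [] };
      hit.score += 1 / (k + index + 1);
      hit.sources.push(source); // record which ranker contributed
      fused.set(id, hit);
    });
  }
  // Sort by score, breaking ties on id so the output is deterministic.
  return Array.from(fused.values())
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, topK);
}
```

A hit found by both rankers ends up with `sources: ["bm25", "vector"]` and outranks hits that only one ranker returned at a similar position.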
Gym matrix was red on every leg. Two independent failures:
- gym (go) installed gopls@v0.21.0 which requires Go 1.26 (per live golang/tools go.mod); we pin Go 1.23. v0.18.x is the newest line that still builds on Go 1.23.4+.
- Every non-Go leg failed `pnpm install` with `spawn node-gyp ENOENT`. pnpm v10 disables dependency lifecycle scripts by default; the Gym workflow had no global `node-gyp` on PATH and the workspace had no `pnpm.onlyBuiltDependencies` allowlist. Both are required.
Adds ADR 0003 with the gopls/Go matrix and the onlyBuiltDependencies policy so the next drift is self-diagnosing.
Surfaces `ScanOutput.submodulePaths`, the Linguist-canonical filter input for later phases that must avoid traversing into submodules.
- Primary path: `git ls-tree -r -z HEAD` filtered on mode 160000 (gitlink). Works on bare repos, detached worktrees, and partially-initialised submodules, unlike `git submodule` which depends on `.git/config` being populated via `init`.
- skipGit fallback: textual parse of `.gitmodules` (ING-S-001).
Fixes the required-field propagation through existing test scan-output builders (ownership, temporal, structure, incremental-scope).
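The primary path can be pictured as a small pure parser over the NUL-separated output of `git ls-tree -r -z HEAD`, where each record has the form `<mode> <type> <object>\t<path>`. The sketch below only covers the parsing step; the name `parseGitlinkPaths` is hypothetical and running the git command itself is left out.

```typescript
// Extract gitlink (submodule) paths from `git ls-tree -r -z HEAD` output.
// Each NUL-terminated record looks like "<mode> <type> <object>\t<path>".
function parseGitlinkPaths(lsTreeOutput: string): string[] {
  const paths: string[] = [];
  for (const record of lsTreeOutput.split("\0")) {
    if (record.length === 0) continue; // trailing NUL leaves an empty record
    const tab = record.indexOf("\t");
    if (tab === -1) continue; // malformed record, skip it
    const mode = record.slice(0, tab).split(" ")[0];
    if (mode === "160000") paths.push(record.slice(tab + 1));
  }
  return paths.sort(); // sorted for deterministic output
}
```

Because the path is everything after the first tab, paths containing spaces survive intact, which is the reason `-z` is preferable to the newline-separated default.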
Before this fix, `ownership` emitted 280+ `git blame failed` warnings per run on submodule contents under `packages/gym/corpus/repos/**`, drowning out real signal. Adds `OwnershipOptions.excludeSubmodules` (default true) and filters `sortedPaths` against `ScanOutput.submodulePaths` before `batchBlame`. Exports `filterOutSubmodules` as a pure helper for testability. Opt-out via `excludeSubmodules: false` preserves the old behaviour.
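A minimal sketch of what such a pure filter could look like follows. The real `filterOutSubmodules` signature is not shown in this commit, so the name `filterOutSubmodulesSketch` and its parameters are assumptions.

```typescript
// Drop every path that is a submodule root or lives beneath one.
// Hypothetical signature; the project's helper may differ.
function filterOutSubmodulesSketch(
  paths: readonly string[],
  submodulePaths: readonly string[],
): string[] {
  const roots = submodulePaths.map((p) => p.replace(/\/+$/, ""));
  return paths.filter(
    (p) => !roots.some((root) => p === root || p.startsWith(root + "/")),
  );
}
```

Matching on `root + "/"` rather than a bare prefix avoids excluding a sibling directory that merely shares a name prefix with a submodule.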
Two false-negatives on `codehub doctor` outside the monorepo layout:
- `resolveFromRoot` walked four dirs up from the CLI bin and asked that path to resolve dependencies. Under a global `pnpm i -g` install, tarball download, or symlinked `./node_modules/.bin/codehub`, that path has no `node_modules`. Now we resolve from the CLI's own `node_modules` first (`createRequire(import.meta.url)`) and fall back to the supplied `repoRoot`.
- `embedder weights` probed `model-int8.onnx` (hyphen). The embedder writes `model_int8.onnx` (underscore) per `embedder/src/paths.ts:49`. Fixed to match.
Exposes the existing `DoctorOptions.repoRoot` through a new `--repoRoot` flag so a user with a non-standard layout can force the fallback path.
Introduces structured framework detection covering 20 numbered entries from the research packet (23 distinct frameworks across the rows), profile-gated on ecosystem.
- `frameworks-catalog.ts`: typed 23-entry `FrameworkRule` table with per-entry manifest keys, file markers, variant axes, and parent linkages (e.g. React as parent of Next.js / React Native).
- `variant-detectors.ts`: pure resolvers for 9 variant axes: React scaffold (CRA/Vite/custom), Next.js router (app/pages/hybrid), NestJS adapter (Express/Fastify), FastAPI ORM (SQLAlchemy/SQLModel/Beanie/Tortoise), Spring Boot style (MVC/WebFlux), Tauri version (v1/v2), React Native flavor (bare/expo-managed/expo-prebuild), Rails style (api-only/standard), ASP.NET Core style (minimal-apis/MVC/Razor Pages).
- `framework-detector.ts`: catalog dispatcher that runs each rule, resolves variant + version + parent, and emits a sorted, deterministic `FrameworkDetection[]`. Skips entries whose ecosystem is absent from `ProjectProfile.languages` (FRM-S-001/002).
Implements requirements FRM-U-001..004, FRM-E-001..003, FRM-S-001/002, FRM-UN-001 (parent linkage for mutual-exclusion wrapping) and FRM-UN-002 (malformed-manifest tolerance).
Non-goals preserved for follow-ups: `--framework-profile` external YAML loader, framework-aware scanning rules, impact-analysis weighting changes.
Replaces the 13-rule legacy matcher with a delegator that calls the new `framework-detector.ts` dispatcher while keeping the legacy `detectFrameworks` entrypoint and its flat `string[]` return shape (backward compat).
`profile.ts` now emits both fields on the `ProjectProfile` node:
- `frameworks: readonly string[]`: projected from the detection names for v1.0 consumers.
- `frameworksDetected: readonly FrameworkDetection[]`: structured output with variant / version / confidence / parent.
The profile phase threads `detectedLanguages` to the detector so ecosystem gating kicks in; JS/Python/Ruby/Go/Rust/Java/PHP/C# can all be skipped when their language is not detected.
Determinism preserved: detections are sorted alphabetically by name, manifests are read in a fixed order, and variant resolvers are pure functions of the scanned snapshot.
Teaches the DuckDB write path to serialize `ProjectProfileNode`'s new
`frameworksDetected` field alongside the legacy flat array in the
`frameworks_json` TEXT column. The column becomes polymorphic:
- Legacy v1.0 rows — flat `string[]`, untouched.
- v2.0 rows — `{ flat: string[], detected: FrameworkDetection[] }`
when structured detections exist; falls back to the legacy flat
shape when the array is empty so no schema version bump is needed.
Read-back in `packages/mcp/src/tools/project-profile.ts` sniffs the
shape and surfaces both forms to clients. Existing round-trip tests
continue to pass because the flat-only shape is preserved when
`frameworksDetected` is absent.
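The shape-sniffing read-back can be sketched roughly as below. This is an illustration under assumptions: the `FrameworkDetection` interface is trimmed to three fields, and `decodeFrameworksJson` is a hypothetical name, not the function in `project-profile.ts`.

```typescript
// Trimmed, assumed shape; the real type also carries confidence, parent, etc.
interface FrameworkDetection {
  name: string;
  variant?: string;
  version?: string;
}

interface DecodedFrameworks {
  frameworks: string[];
  frameworksDetected: FrameworkDetection[];
}

const onlyStrings = (xs: unknown[]): string[] =>
  xs.filter((x): x is string => typeof x === "string");

// Decode the polymorphic column: either a flat string[] (legacy) or
// { flat, detected } (structured).
function decodeFrameworksJson(raw: string | null): DecodedFrameworks {
  const empty: DecodedFrameworks = { frameworks: [], frameworksDetected: [] };
  if (raw === null || raw === "") return empty;
  const parsed: unknown = JSON.parse(raw);
  if (Array.isArray(parsed)) {
    return { frameworks: onlyStrings(parsed), frameworksDetected: [] };
  }
  if (typeof parsed === "object" && parsed !== null) {
    const { flat, detected } = parsed as { flat?: unknown; detected?: unknown };
    return {
      frameworks: Array.isArray(flat) ? onlyStrings(flat) : [],
      frameworksDetected: Array.isArray(detected)
        ? (detected as FrameworkDetection[])
        : [],
    };
  }
  return empty;
}
```

Keeping the legacy array as the fallback branch is what lets old rows stay readable without a migration.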
Updates the `project_profile` MCP tool to decode the polymorphic `frameworks_json` column. The tool now returns both:
- `frameworks: string[]`: backward-compatible flat names.
- `frameworksDetected: FrameworkDetection[]`: structured detections with variant / version / confidence / parentName / signals.
Text rendering prefers the structured form when available, showing each framework with its variant inline (e.g. `nextjs:app-router`, `fastapi:sqlmodel`). Falls back to the flat list for legacy graphs.
…sm tests for framework detection
47 tests covering:
- Positive fixture per catalog entry (23 tests across the 20 numbered framework rows), confirming name / category / version / parent / confidence.
- 13 variant-axis tests across Next.js (app / pages / hybrid), React (CRA / Vite), NestJS (Express / Fastify), FastAPI (SQLModel / Beanie), Spring Boot (MVC / WebFlux), Tauri (v1 / v2), React Native (bare / expo-managed), Rails (api-only / standard).
- False-positive corpus with 3 repos (plain Node library, plain Python CLI, static HTML) that must detect zero frameworks.
- Profile-gating tests: Python/JS detectors skipped when their language is absent.
- Parent-linkage (FRM-UN-001): Next.js carries `parentName: "react"` when both are emitted.
- Determinism: two runs on identical input produce byte-identical JSON; output is sorted alphabetically by name.
- Malformed-manifest guardrail (FRM-UN-002): broken package.json does not abort and yields zero detections.
Catalog-coverage sanity test asserts the catalog names match the 23 expected entries, and every variant discriminator binds to a resolver.
Wire write paths for every v1.2 reserved column in the DuckDB adapter:
- `halstead_volume`, `input_schema_json`, `partial_fingerprint`, `baseline_state`, `suppressed_json` added to schema DDL + NODE_COLUMNS (append-only; column ordering preserved).
- `nodeToRow` now reads `halsteadVolume`, `inputSchemaJson`, `partialFingerprint`, `baselineState`, `suppressedJson`, `coveredLinesJson` off the GraphNode shape.
- `normalizeDeadness` rewrites the hyphenated "unreachable-export" that the analysis helper emits into the underscored enum the column stores so consumers query a single spelling.
- `coveredLinesOrNull` prefers a caller-supplied JSON string over the numeric array so callable-scoped coverage can ride alongside the File-level array without double-stringification.
- `CallableShape` gains `coveragePercent` and `coveredLinesJson` so the callable-scoped overlay compiles cleanly.
`FindingNode` and `ToolNode` doc comments drop the "v2.0 deferral" language now that every reserved field has a live writer.
`detectMcpTools` now scans for `inputSchema:` literals within a 10-line
window of each `{name, description}` pair and normalises the relaxed
JS-ish object literal (single quotes, unquoted keys, trailing commas)
into canonical, key-sorted JSON via `canonicalizeObjectLiteral`. The
`tools` phase threads the extracted string onto `ToolNode.inputSchemaJson`
so the storage adapter persists it to the `input_schema_json` column.
…Json
`codehub ingest-sarif` now enriches every SARIF result with `opencodehub/v1` + `primaryLocationLineHash` partial fingerprints via `enrichWithFingerprints` before building the graph, and overlays `baselineState` from `<repo>/.codehub/baseline.sarif` when one exists. Both values round-trip onto `FindingNode.partialFingerprint` / `FindingNode.baselineState`, alongside the existing `suppressedJson`, so the `partial_fingerprint` / `baseline_state` / `suppressed_json` columns are populated on every persisted Finding.
`codehub scan` was updated in lock-step: results are enriched before the baseline overlay so the write-through chain (scan → enrich → baseline → SARIF file → ingest-sarif) produces identical fingerprint values on both sides of the disk boundary.
When `scan.sarif` already carries per-result `baselineState` tags (the scan + ingest-sarif chain writes them today), reuse the stored tags directly instead of re-running the full `diffSarif` against the frozen baseline. This mirrors reading from the `baseline_state` DuckDB column without introducing a new SQL path: the SARIF file on disk is the source of truth for that column.
Adds round-trip + backward-compat + graphHash-determinism tests for the v1.2 reserved column set on the DuckDB adapter:
- every reserved column round-trips on a populated Function / Tool / Finding row.
- nodes that don't own any of the new fields read back NULL (v1.0 graph opened with a v1.2 reader).
- the adapter rewrites "unreachable-export" into "unreachable_export" on write so the column stores a single canonical spelling.
- `graphHash` stays stable when the new fields are populated in different literal orders.
…lper
Extends SearchResult with optional summary/signatureSummary fields that query surfaces populate after joining symbol_summaries. Adds the Map-returning getSymbolSummariesByNodeIds wrapper on DuckDbStore so the MCP and CLI query paths can fetch many rows in one round trip and pick the newest prompt version per node deterministically via the documented ORDER BY contract.
P04 deliverable 7 / storage-side support.
…maryModel
Recognizes the AWS SDK credential-missing error family
(CredentialsProviderError, NoCredentialsError, ExpiredTokenException,
and the matching message shapes from `from*` providers) at both factory
construction and first-call stages; converts them into a graceful
{enabled:false, skippedReason:"no-credentials"} phase output so analyze
stays green for contributors without Bedrock access (SUM-S-002 /
SUM-UN-001). Any non-credential error on later candidates continues to
surface through the per-hit `failed` counter as before.
Adds a `summaryModel` option on PipelineOptions and threads it into the
phase so `--summary-model <id>` can route to a non-default Bedrock
model. Defaults to DEFAULT_MODEL_ID when absent so production deploys
stay pinned.
Tests: credential-error soft-fail at factory and first-call sites, and
summaryModel override round-trip onto emitted rows.
P04 deliverable 3 / 4.
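The soft-fail idea can be sketched as a small wrapper that converts credential-family errors into a skipped phase output and rethrows everything else. This is a simplified, synchronous illustration: `isCredentialError` and `softFail` are hypothetical names, and only the error-name check is shown, not the message-shape matching the commit also mentions.

```typescript
const CREDENTIAL_ERROR_NAMES = new Set([
  "CredentialsProviderError",
  "NoCredentialsError",
  "ExpiredTokenException",
]);

// Name-based classification only; message-shape matching is omitted here.
function isCredentialError(err: unknown): boolean {
  return err instanceof Error && CREDENTIAL_ERROR_NAMES.has(err.name);
}

type PhaseOutput<T> =
  | { enabled: true; value: T }
  | { enabled: false; skippedReason: "no-credentials" };

// Run a step (factory construction or first call); credential errors become
// a graceful skip, any other error keeps propagating.
function softFail<T>(step: () => T): PhaseOutput<T> {
  try {
    return { enabled: true, value: step() };
  } catch (err) {
    if (isCredentialError(err)) {
      return { enabled: false, skippedReason: "no-credentials" };
    }
    throw err;
  }
}
```

Rethrowing non-credential errors matters: it keeps genuine failures visible instead of silently disabling the phase.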
After the hybrid / BM25 ranking runs, both the MCP `query` tool and the CLI `codehub query` command now fetch `symbol_summaries` rows for the top-K hits in a single `IN (...)` round trip and attach `summary` and `signatureSummary` to each result. Hits without a summary row stay unchanged, and any lookup failure (missing table, schema drift, I/O error) degrades silently — summaries are enrichment, not load-bearing. The CLI text formatter gains a SUMMARY column (capped at 120 chars with ellipsis) that renders only when at least one hit carries a summary, so older indexes that haven't run the summarize phase see byte-identical output. `--json` always carries the new fields when present. Tests: MCP + CLI both verify summaries appear on hits when rows exist, stay absent when the table probe reports none, and the CLI formatter suppresses the column when no hit has a summary. P04 deliverable 6.
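A small sketch of the formatter rule described above: show the column only when some hit has a summary, and cap each cell at 120 characters. Both helper names are hypothetical, and counting the ellipsis inside the 120-character budget is an assumption.

```typescript
interface Hit {
  name: string;
  summary?: string;
}

// Render the SUMMARY column only if at least one hit carries a summary.
function shouldShowSummaryColumn(hits: readonly Hit[]): boolean {
  return hits.some((h) => h.summary !== undefined && h.summary.length > 0);
}

// Cap at `max` characters; the ellipsis counts toward the cap (assumed).
function truncateSummary(text: string, max = 120): string {
  return text.length <= max ? text : text.slice(0, max - 1) + "…";
}
```

Gating the whole column, rather than printing empty cells, is what keeps output byte-identical for indexes without summaries.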
Flips the summarize phase default ON with commander's --no-summaries negation. --max-summaries accepts either a non-negative integer or the literal string "auto" (the default); auto resolves at run time to min(floor(lspConfirmedCallableCount * 0.1), 500) by counting prior-run Function/Method/Class nodes in the DuckDB index, falling back to a conservative cap of 50 when no prior index exists. Adds --summary-model <id> to override the Bedrock model id used by the phase. Adds the CODEHUB_BEDROCK_DISABLED=1 env kill-switch (SUM-S-001), which forces the phase off regardless of flags. The resolution helpers (resolveSummariesEnabled + resolveMaxSummariesCap) are exported for direct unit testing. Tests cover the auto cap math (10% floor, 500 clamp, first-run fallback, 0-clamp when disabled), the env kill-switch truth table, and the passthrough of explicit numeric caps. P04 deliverables 1, 2, 5, 8 (cap + env tests).
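The cap arithmetic and the kill-switch can be sketched as two pure functions. The names carry a `Sketch` suffix because the real `resolveSummariesEnabled` and `resolveMaxSummariesCap` signatures are not shown in this commit; parameter shapes are assumptions.

```typescript
// The env kill-switch wins over the flag (signature assumed).
function resolveSummariesEnabledSketch(
  flagEnabled: boolean,
  env: Record<string, string | undefined>,
): boolean {
  return flagEnabled && env["CODEHUB_BEDROCK_DISABLED"] !== "1";
}

// "auto" resolves to min(floor(count * 0.1), 500), or 50 with no prior index.
function resolveMaxSummariesCapSketch(
  flag: number | "auto",
  priorCallableCount: number | null, // null means no prior index exists
  enabled: boolean,
): number {
  if (!enabled) return 0; // clamp to 0 when the phase is disabled
  if (flag !== "auto") return flag; // explicit numeric caps pass through
  if (priorCallableCount === null) return 50; // first-run fallback
  return Math.min(Math.floor(priorCallableCount * 0.1), 500);
}
```

For example, a prior index with 1,234 callables gives a cap of 123, while 10,000 callables hit the 500 clamp.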
Replaces the hardcoded per-language switch in `phases/complexity.ts` with a lookup into `LanguageProvider.complexityDefinitionKinds`, adds `halsteadOperatorKinds` blocks to every provider, and wires a Halstead volume computation over the body of each callable. Populates the new `halsteadVolume` field on the CallableShape (core-types) alongside the existing cyclomatic / nesting / NLOC fields; the storage column is present but the field is silently dropped when the schema lacks it. Also unescapes the backtick-in-template-literal comment in storage/schema-ddl.ts (wave-0 breakage) so the TypeScript build passes. Retires the v1.0 "Halstead volume deferred" TODO and the "six v1.1 additions skipped" comment in the complexity phase.
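For orientation, the textbook Halstead volume is V = N · log2(n), where N is the total count of operators and operands and n is the number of distinct operators plus distinct operands. The sketch below applies that definition to pre-classified token lists; how the providers' `halsteadOperatorKinds` classify syntax nodes is not shown here, and the function name is hypothetical.

```typescript
// Halstead volume from already-classified tokens.
// N = total operators + operands, n = distinct operators + distinct operands.
function halsteadVolumeSketch(
  operators: readonly string[],
  operands: readonly string[],
): number {
  const vocabulary = new Set(operators).size + new Set(operands).size;
  const length = operators.length + operands.length;
  if (vocabulary === 0) return 0; // empty body: avoid log2(0)
  return length * Math.log2(vocabulary);
}
```

For `a = b + c`, the operators are `=` and `+` and the operands are `a`, `b`, `c`, so N = 5, n = 5, and the volume is 5 · log2(5).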
Adds four new framework-gated route detectors and wires them into the routes phase based on `ProjectProfile.frameworks`:
- FastAPI / Starlette (`@app.get`, `@router.post`, `@app.api_route(methods=...)`)
- Spring MVC + WebFlux (`@RequestMapping`, `@GetMapping` / `@PostMapping` / `@PutMapping` / `@DeleteMapping` / `@PatchMapping`, with class-level path prefix composition)
- NestJS (`@Controller` prefix + `@Get` / `@Post` / `@Put` / `@Delete` / `@Patch` / `@Options` / `@Head` / `@All` method decorators)
- Rails `config/routes.rb`: verb-level helpers, `resources` / `resource` expansion, `namespace` / `scope` prefix tracking
Each detector ships two+ positive fixtures and one negative fixture. Routes phase now depends on the profile phase so detected-framework gating is race-free.
Extends `ExtractedRoute.framework` with `fastapi` | `spring` | `nestjs` | `rails`. Retires the v1.0 "FastAPI / NestJS / Spring deferred" comment in extract/route-detector.ts.
Each per-ecosystem dep-parser now carries an optional `license` on the
ParsedDependency tuple, set directly from the manifest field when
present:
- npm: harvests v2/v3 package-lock.json + legacy v1 + pnpm lockfile
snapshots; handles string / `{type}` / array shapes.
- pypi: pyproject / uv.lock — PEP 621 string, PEP 621 `{text,file}`
table, PEP 639 `license-expression`, trove-classifier fallback.
- cargo: Cargo.lock v3+ `[[package]].license` when declared.
- maven: non-standard `<license>` child under `<dependency>`.
- nuget: non-standard `<License>` attribute / child on
`<PackageReference>`.
- go: left undefined (no manifest source without a network call;
documented).
The dependencies phase merges the parser-supplied license into the
DependencyNode, preferring a real value over `"UNKNOWN"` when the same
coordinate appears in multiple sources. `dedupAndSort` now keeps the
defined-license copy when two records collide. Retires the v1.0 "license
detection deferred" comment in dep-parsers/types.ts.
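The collision rule can be illustrated with a short dedup sketch. The `ParsedDependency` shape here is a trimmed assumption, and `dedupPreferLicensed` is a hypothetical stand-in for `dedupAndSort`, whose actual implementation is not shown.

```typescript
// Trimmed, assumed shape of a parsed dependency record.
interface ParsedDependency {
  ecosystem: string;
  name: string;
  version: string;
  license?: string;
}

const coordinate = (d: ParsedDependency): string =>
  `${d.ecosystem}:${d.name}@${d.version}`;

const hasRealLicense = (d: ParsedDependency): boolean =>
  d.license !== undefined && d.license !== "UNKNOWN";

// When two records share a coordinate, keep the one with a defined license.
function dedupPreferLicensed(
  deps: readonly ParsedDependency[],
): ParsedDependency[] {
  const byCoordinate = new Map<string, ParsedDependency>();
  for (const dep of deps) {
    const existing = byCoordinate.get(coordinate(dep));
    if (!existing || (!hasRealLicense(existing) && hasRealLicense(dep))) {
      byCoordinate.set(coordinate(dep), dep);
    }
  }
  // Sort by coordinate so the output order is deterministic.
  return Array.from(byCoordinate.values()).sort((a, b) =>
    coordinate(a) < coordinate(b) ? -1 : coordinate(a) > coordinate(b) ? 1 : 0,
  );
}
```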
Go: extractGoHeritage now builds the method set of every declared type, reads each interface's method set directly off the captured `@definition.interface` text, and emits an IMPLEMENTS edge when the type's method set is a superset of the interface's. Conservative guard: same-file is the proxy for same-package, and cross-export visibility is required when the file boundary separates declarations.
C++: extractCppImports now also captures C++20 `import` declarations: named modules (`import std;`), system header-units (`import <vector>;`), and user header-units (`import "utility.hpp";`). `export import` is treated as a regular import for graph purposes.
Retires the v1.0 deferral comments in providers/go.ts:230 and providers/cpp.ts:37.
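The Go structural check reduces to a superset test over method sets. The sketch below compares method names only and skips empty interfaces; both choices are assumptions for illustration, as is the function name, and the same-file / visibility guard is left out.

```typescript
// True when every interface method also appears in the type's method set.
// Name-only comparison; signatures are ignored in this sketch.
function satisfiesInterface(
  typeMethods: ReadonlySet<string>,
  interfaceMethods: ReadonlySet<string>,
): boolean {
  // Assumption: skip the empty interface, which every type would satisfy.
  if (interfaceMethods.size === 0) return false;
  return Array.from(interfaceMethods).every((m) => typeMethods.has(m));
}
```

A type with methods `Read`, `Write`, and `Close` would satisfy an interface declaring `Read` and `Close`, and an IMPLEMENTS edge could then be emitted for that pair.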
Unified queries now emit @doc captures for TypeScript / TSX / JavaScript (JSDoc), Python (docstring as first body expression), Go (godoc comment group), and Rust (rustdoc line_comment + block_comment). The parse phase consumes those captures and populates `node.description` via language-specific heuristics:
- Python: doc capture within the function body range.
- TS/JS/TSX: JSDoc block whose end line is 0-2 lines before the decl.
- Go: contiguous `//` group ending on the line above the decl.
- Rust: contiguous `///` / rustdoc block group ending above the decl.
Text is stripped of comment markers and leading `*` decorations before storage so downstream consumers can render it directly. Retires the v1.0 "no @doc capture at MVP" note in unified-queries.ts.
c33140f to
4c04577
Third Embedder backend alongside local ONNX and OpenAI-compatible HTTP. Opt-in via CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT; reuses the default AWS credential chain with soft-fail matching the Bedrock pattern in summarize.ts. Chunked batching ≤64 per TEI endpoint caps, 413 split-retry, response shape + dim validation, modelId stamped as gte-modernbert-base/sagemaker:<endpoint> so backend drift is visible in index metadata.
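The batching and 413 split-retry behaviour might look roughly like the sketch below. It is synchronous for brevity (a real client would be async), the `send` callback stands in for the endpoint call, and treating an error with `status === 413` as "payload too large" is an assumption. Response-shape and dimension validation are not shown.

```typescript
type EmbedBatch = (batch: readonly string[]) => number[][];

// Assumed convention: the transport throws an object with status 413.
function isPayloadTooLarge(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 413
  );
}

// On 413, halve the batch and retry each half; order is preserved.
function embedWithSplitRetry(
  batch: readonly string[],
  send: EmbedBatch,
): number[][] {
  try {
    return send(batch);
  } catch (err) {
    if (!isPayloadTooLarge(err) || batch.length <= 1) throw err;
    const mid = Math.ceil(batch.length / 2);
    return embedWithSplitRetry(batch.slice(0, mid), send).concat(
      embedWithSplitRetry(batch.slice(mid), send),
    );
  }
}

// Chunk the input into batches of at most `maxBatch` texts.
function embedAll(
  texts: readonly string[],
  send: EmbedBatch,
  maxBatch = 64,
): number[][] {
  const vectors: number[][] = [];
  for (let start = 0; start < texts.length; start += maxBatch) {
    const batch = texts.slice(start, start + maxBatch);
    for (const v of embedWithSplitRetry(batch, send)) vectors.push(v);
  }
  return vectors;
}
```

A single text that still triggers 413 cannot be split further, so the error is rethrown rather than retried forever.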
4c04577 to
8ea37c1
Three SCIP-ingest correctness fixes that together lift graph SCIP CALLS from 237 to 2,665 on this repo and resolve caller attribution for anonymous arrow-callback patterns (server.registerTool, withStore, etc).
1. Walk past local N defs in innermostEnclosing. When the tightest enclosing SCIP def at a call site is a local (e.g. the body of an anonymous arrow passed as a callback), fall through to the next tightest non-local enclosure. Callers inside arrow callbacks now attribute to the enclosing top-level function, not to nothing.
2. buildSymbolDefIndex now aliases src/<p>.ts defs under dist/<p>.d.ts. In a TS monorepo where each package declares types: "./dist/*.d.ts", cross-package refs carry dist-shape descriptors while definitions carry src-shape descriptors. Without the alias, ~8k of ~12k derived edges fail to resolve their callee.
3. Add +1 at the scip-ingest -> OCH boundary in emitEdges. SCIP ranges are 0-indexed; graph node startLine/endLine are 1-indexed (tree-sitter row + 1). Callee lookups were falling one line above the function span and silently dropping.
The emitEdges signature now takes the symbolDef map, replacing the previous first-call-site heuristic that silently routed cross-file name collisions to whichever call site was seen first.
Tests: scip-ingest 10/10, ingestion 576/576.
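Fixes 2 and 3 are easy to picture in isolation. The sketch below shows the off-by-one conversion at the boundary and a path-level src-to-dist alias; all three helper names are hypothetical, and the real alias operates on SCIP symbol descriptors rather than bare file paths.

```typescript
interface Span {
  startLine: number; // 1-indexed, inclusive
  endLine: number; // 1-indexed, inclusive
}

// SCIP ranges are 0-indexed; graph lines are 1-indexed.
function scipLineToGraphLine(scipLine: number): number {
  return scipLine + 1;
}

function spanContainsScipLine(span: Span, scipLine: number): boolean {
  const line = scipLineToGraphLine(scipLine);
  return line >= span.startLine && line <= span.endLine;
}

// Map "<pkg>/src/<p>.ts" to "<pkg>/dist/<p>.d.ts" (path-level illustration).
function distAliasFor(srcPath: string): string | null {
  if (srcPath.endsWith(".d.ts")) return null;
  const match = /^(.*\/)?src\/(.+)\.ts$/.exec(srcPath);
  if (!match) return null;
  return `${match[1] ?? ""}dist/${match[2]}.d.ts`;
}
```

With a function spanning graph lines 10 to 20, a SCIP definition on 0-indexed line 9 refers to graph line 10; without the conversion it would fall one line above the span and the lookup would miss.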
Replace the generic BM25 top-1 in runContext with a three-outcome resolver (resolved / ambiguous / not_found). Filter external re-export stubs (file_path = '<external>', kind = 'CodeElement') out of the SQL WHERE clause so they cannot out-rank the real function node on length-normalized BM25. Fall back to the old store.search path only when the exact-name query returns zero rows, preserving concept-phrase queries. Add --target-uid, --file-path, --kind flags to the context command, matching the disambiguation surface already on impact. Ambiguity emits a candidate list (JSON or TTY) and exits 1 instead of silently picking. Tests: cli 189/189 (4 new — stub rejection, ambiguity branch, targetUid short-circuit, BM25 fallback).
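The three-outcome resolution can be sketched over an in-memory candidate list. The real resolver runs as SQL against the store; here `resolveTarget`, the `Candidate` shape, and the filter parameter are illustrative assumptions, while the stub predicate follows the values quoted above.

```typescript
interface Candidate {
  uid: string;
  name: string;
  filePath: string;
  kind: string;
}

type Resolution =
  | { status: "resolved"; node: Candidate }
  | { status: "ambiguous"; candidates: Candidate[] }
  | { status: "not_found" };

// External re-export stubs must never win over real nodes.
const isExternalStub = (c: Candidate): boolean =>
  c.filePath === "<external>" && c.kind === "CodeElement";

function resolveTarget(
  name: string,
  nodes: readonly Candidate[],
  filters: { filePath?: string; kind?: string } = {},
): Resolution {
  const matches = nodes.filter(
    (n) =>
      n.name === name &&
      !isExternalStub(n) &&
      (filters.filePath === undefined || n.filePath === filters.filePath) &&
      (filters.kind === undefined || n.kind === filters.kind),
  );
  const first = matches[0];
  if (first === undefined) return { status: "not_found" };
  if (matches.length === 1) return { status: "resolved", node: first };
  return { status: "ambiguous", candidates: matches };
}
```

In this sketch the caller decides what to do with `not_found` (for example, fall back to a ranked search) and with `ambiguous` (print the candidates and exit non-zero).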
Six files across ingestion, analysis, and gym redeclared const CODEHUB_DIR = '.codehub' locally or hardcoded the literal in path.join calls, bypassing the META_DIR_NAME constant exported from @opencodehub/storage. Migrate them to import and compose from the shared constant so a future rename only touches one site.
Also:
- Add @opencodehub/storage: workspace:* to packages/gym so scip-factory can import META_DIR_NAME.
- Fix two stale .opencodehub/scip doc-comments in scip-ingest/src/runners/index.ts to say .codehub/scip.
Scope is limited to repo-local (per-project) paths. The user-home CODEHUB_HOME_DIR constant in registry.ts / group-resolver.ts / group-sync.ts / embedder/paths.ts is a semantically separate concept and is intentionally left alone.
The sibling .opencodehub -> META_DIR_NAME migration in ingestion/src/pipeline/phases/scip-index.ts landed in the prior fix(scip-ingest) commit because it was entangled with the symbolDef threading and was not cleanly separable.
- CLAUDE.md / AGENTS.md: remove the pre-existing ## Always Do /
## Never Do / ## Resources / ## CLI / <!-- gitnexus:end --> block
inherited when the repo was forked from a GitNexus template. The
tool-name references (gitnexus_impact / gitnexus_query / etc) are
wrong — this project ships codehub CLI + mcp__opencodehub__* MCP
tools. Keeps only the auto-regenerated ## OpenCodeHub MCP Tools
stanza at the top.
- Delete .claude/skills/gitnexus/ (not tracked, goes away silently) —
shadowed the live opencodehub-* skills that plugins/opencodehub/
already ships.
- .gitignore: add .gitnexus (local cache-dir left behind by an earlier
tool — no in-tree code writes there, entry is defensive).
- .erpaval/solutions/: four lessons capturing non-obvious findings
worth keeping across sessions.
- scip-callee-definition-site.md: SCIP ingest must resolve callees
from DEFINITION occurrences, not first call sites (prior session).
- scip-monorepo-dist-src-alias.md: cross-package TS refs carry dist/
descriptors while defs carry src/ — alias required.
- scip-0-indexed-vs-graph-1-indexed.md: +1 at the SCIP/OCH boundary.
- bm25-over-node-id-favors-stubs.md: length-normalized BM25 over a
nodes FTS index systematically picks synthetic re-export stubs
over real Function nodes; fix with exact-name SQL + stub filter +
disambiguation flags.
- .erpaval/INDEX.md: new pointer lines for the four lessons.
8ea37c1 to
a407c1a
## Summary
Consolidates all 12 open dependabot PRs into a single branch so they can land together with one CI cycle.
### npm deps
- `@aws-sdk/client-bedrock-runtime` 3.1035.0 → 3.1040.0 (closes #50)
- `@commitlint/cli` 20.5.0 → 20.5.3 (closes #49)
- `fast-xml-parser` 5.7.1 → 5.7.2 (closes #48)
- `sharp` ^0.34.1 → ^0.34.5 (closes #47)
- `astro` ^6.1.9 → ^6.2.1 (closes #46)
- `ts-morph` ^25.0.1 → ^28.0.0 (closes #45)
- `@bufbuild/protobuf` 2.11.0 → 2.12.0 (closes #44)
- `ajv` 8.18.0 → 8.20.0 (closes #43)
- typescript-tooling group (`@biomejs/biome` 2.4.12 → 2.4.13, `@types/node` 24.12.2 → 25.6.0) (closes #42)
### GitHub Actions
- `actions/cache` v4 → v5 (closes #41)
- `googleapis/release-please-action` v4 → v5 (closes #40)
- `actions/github-script` v7 → v9 (closes #39)
### Drive-by fixes
- Bump `biome.json` `$schema` URL to 2.4.13 to match the new CLI version.
- Remove `.gitnexus` from `.gitignore`: it was tripping the banned-strings guardrail (added in b848c2f) and blocking every local commit via lefthook.
## Test plan
- [x] `pnpm install`: no peer warnings introduced beyond those already present on main
- [x] `pnpm run build`: all 15 packages build
- [x] `pnpm run typecheck`: clean
- [x] `pnpm -r test`: 1627 pass, 0 fail across all packages
- [x] lefthook pre-commit (biome + banned-strings) passes
Does NOT touch the `pnpm.onlyBuiltDependencies` list (verified by diff), because prior sessions saw `pnpm approve-builds` destructively rewrite it.
Replace 6 bespoke doc-* subagents with 17 ERPAVal-style packet skeletons, one per output file. Orchestrator seeds each packet from templates/agents/, substitutes placeholders, and spawns general-purpose subagents that edit the packet in place per the write protocol. Phase 0 now runs as two parallel MCP waves (0a: independent calls in one message, 0b: schema/profile-dependent calls in one message) + inline Write. Phases AB + CD collapse into Phase 1 with priority-slice batching under Claude Code's ~10-concurrent-subagent ceiling. File-level fan-out shrinks blast radius from role to file, makes --refresh / --section trivial single-subagent dispatches, and cuts estimated wall-clock ~50% (single-repo ~3-4min to ~2min).
a407c1a to
bbf8373
Sweep across every CLAUDE.md / AGENTS.md / README.md plus the Starlight
site to fix factual drift, clunky install flow, and missing content.
Fixed drifts:
- MCP tool count 27 → 28 (README, OBJECTIVES, CLAUDE.md, AGENTS.md,
Starlight monorepo-map + mcp/tools pages).
- Eval harness 49 → 98 cases / 7 → 14 languages / 9 → 15 gates.
- Plugin README: "six doc-* subagents" → 17 with family grouping;
"two PostToolUse hooks" → PreToolUse + two PostToolUse.
- Gym README: removed dangling reference/metric-choice.md link; fixed
layout block to match actual dirs; clarified Java deferral.
- thiserror version drift: v2.0.0 → 2.0.17 across rust + repos READMEs.
- Added missing monorepo/electron-ws-python row to corpus/repos index.
Install flow cleanup:
- Root README quick-start now cli:link → codehub init → codehub analyze
(was: absolute-path `claude mcp add ... node /path/to/dist/index.js`).
- 5 editor guides (Claude Code, Codex, Cursor, OpenCode, Windsurf) now
show `"command": "codehub", "args": ["mcp"]` as primary, long-form
as fallback for unlinked checkouts.
- start-here/{quick-start,install,first-query}.md flipped to
`codehub <cmd>` primary with single fallback callout per page.
New content:
- architecture/overview.md — rewritten as six-phase index with mermaid
pipeline + DuckDB-table diagrams.
- architecture/parsing-and-resolution.md (new) — tree-sitter →
ParseCapture → per-language resolvers + sequence diagram.
- architecture/scip-reconciliation.md (new) — .scip ingest,
confidence-demote, provenance tagging, flagged REFERENCES/heritage
edges as unwired (future work).
- architecture/scanners-and-sarif.md (new) — P1/P2 tier membership,
license-incompatible wrappers, findings delta bucket math.
- architecture/embeddings.md (new) — ONNX/HTTP/SageMaker cascade,
three-tier + single filter-aware HNSW + RaBitQ.
- architecture/summarization-and-fusion.md (new) — Haiku 4.5 + Zod +
Bedrock cache, fusion formula at ingestion, cache-key discriminator.
Infra:
- rehype-mermaid@^3 + playwright@^1.59 wired with strategy: 'img-svg'.
39 inline SVGs rendered at build time. `syntaxHighlight.excludeLangs:
['mermaid']` required so Shiki doesn't claim the fence first.
- pages.yml: dropped --ignore-scripts (was blocking sharp's native
binary) and added `playwright install chromium --with-deps` step.
- Fixed 3 pre-existing mermaid parse errors surfaced once rendering
actually ran (node named `graph` collides with keyword; `;`/`{}` in
sequenceDiagram labels).
- Fixed 3 broken /opencodehub/reference/mcp-tools/ links (real path is
/opencodehub/mcp/tools/).
- Pagefind 404 on GH Pages diagnosed as NODE_ENV=development baked into
a local build, not a Starlight config issue — 0.38.4 already emits
bundlePath "/opencodehub/pagefind/" correctly in production. CI has
no NODE_ENV override, so the --ignore-scripts fix alone unblocks it.
ERPAVal meta:
- Persisted a novel lesson: solutions/conventions/llms-txt-as-ground-
truth.md — in a Starlight site with starlight-llms-txt, astro.config
description/details strings are more load-bearing than prose docs.
- Logged 4 follow-ups under .erpaval/debt.md — missing READMEs
(cli/mcp/ingestion/scanners), .gitmodules pin comment, dead eval
fallback code, unwired SCIP REFERENCES+heritage edges.
Final build: 47 pages, 653ms pagefind index, zero link errors, 46
llms-nav pages patched.
bbf8373 to
e4b5ad3
The prior commit added rehype-mermaid + playwright to the docs site, which broke CI jobs that run `pnpm -r build` without provisioning the Playwright chromium binary. Only pages.yml has the install step.
- ci.yml typecheck: filter out docs from the pre-typecheck build (`pnpm -r --filter='!@opencodehub/docs'`). The docs site doesn't export types to anything else, so skipping it is safe.
- gym.yml (matrix + monorepo jobs): same filter.
- package.json: add `pnpm.overrides` for `dompurify@<3.4.0`. Mermaid pulls in dompurify 3.3.3 which has 4 OSV findings (GHSA-39q2-94rc-95cp + 3 siblings, all Medium). Fixed version 3.4.0 exists; override cleans `mise run osv` to zero findings.
The previous commit added --filter='!@opencodehub/docs' to the CI and Gym workflow build steps, but pnpm -r still included the repo root in the scope. The root package.json had a build script 'pnpm -r build' which re-ran recursively without the filter, sweeping docs back in. Dropping the root script — pnpm -r build on 14 packages suffices.
Root tsconfig.json still referenced ./packages/lsp-oracle from the pre-SCIP era. The package was deleted in the SCIP-replaces-LSP migration but the project reference wasn't cleaned up. Surfaced by the CI filter change that put tsc --noEmit under pnpm -r scope, which now walks the root tsconfig.
e4b5ad3 to
d6b92cb
Surfaced by the CI hotfix chain today — root tsconfig.json still had a ./packages/lsp-oracle reference 8 days after the package was deleted in the SCIP-replaces-LSP migration. Failure only surfaced once the docs-filter CI change put tsc under pnpm -r scope including the root. Lesson captures the why (pnpm -r root inclusion + package-level tsconfig hiding) and the how-to-apply (grep before delete, check for TS6053 root causes before chasing surface errors). Added INDEX.md pointer.
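The "grep before delete" step from the lesson above could be automated. A minimal sketch, assuming the `references` array of the root tsconfig.json has already been parsed; `findDanglingReferences` and the `exists` predicate are hypothetical names, not part of the repo:

```typescript
// Hypothetical helper, not part of the OpenCodeHub repo.
// Mirrors the shape of one entry in tsconfig.json "references".
interface ProjectReference {
  path: string;
}

// Returns every reference path for which the injected predicate reports
// that no target exists. The predicate is injected so the function stays
// pure and easy to test.
function findDanglingReferences(
  references: readonly ProjectReference[],
  exists: (path: string) => boolean,
): string[] {
  return references.map((ref) => ref.path).filter((p) => !exists(p));
}

// Illustrative data modelled on the commit message: lsp-oracle was deleted,
// but the root tsconfig.json still referenced it.
const rootReferences: ProjectReference[] = [
  { path: "./packages/core-types" },
  { path: "./packages/lsp-oracle" },
];
const pathsOnDisk = new Set(["./packages/core-types"]);

const danglingRefs = findDanglingReferences(rootReferences, (p) =>
  pathsOnDisk.has(p),
);
console.log(danglingRefs);
```

In a real check the predicate would wrap a filesystem lookup, and a non-empty result would fail CI before tsc gets as far as reporting TS6053.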
d6b92cb to
53ab363
mise-action@v4 exports NODE_ENV=development by default. Astro reads NODE_ENV to decide dev vs production, and Starlight's Search component returns the 'Search is only available in production builds' stub when DEV=true. The live site at github.io/opencodehub/ was shipping that stub, so the search box opened a message instead of Pagefind. Setting NODE_ENV=production on the build job overrides mise-action's default and restores production behavior.
Job-level env: NODE_ENV: production didn't stick because mise-action@v4 writes NODE_ENV=development to $GITHUB_ENV after the job starts, and $GITHUB_ENV beats job-level env for subsequent steps. Move the override to the step level — step env beats $GITHUB_ENV — so astro build actually sees NODE_ENV=production and ships Pagefind instead of the dev stub.
Step-level env apparently isn't enough — deployed HTML still shows the dev-mode Search stub. Try inline via /usr/bin/env and log the value that actually reached the shell so we can diagnose if it's still being overridden somewhere.
mise-action@v4 populates $GITHUB_ENV with NODE_ENV=development at setup time, which beats step-level env: and inline env VAR=val when Vite is compiling Starlight's Search component (reads process.env at bundle time, not via step overrides). Fix: write NODE_ENV=production to $GITHUB_ENV in its own step before the build. That overrides mise-action's contribution for every subsequent step in the job.
Despite multiple CI-level NODE_ENV overrides (step-env, inline env VAR=val wrapper, $GITHUB_ENV writes), the deployed Starlight Search component still ships the dev-mode stub. Something in the CI toolchain is leaking NODE_ENV=development to astro/Vite's import.meta.env.DEV resolution at compile time. Set NODE_ENV=production literally in the package.json build script. This is the 'just make it correct' hammer that bypasses all upstream CI env plumbing.
…n CI
Captured the 5-commit diagnostic chase we just did. Four CI-level NODE_ENV overrides failed (job env, step env, inline env, $GITHUB_ENV) before hard-coding NODE_ENV in the package.json 'build' script finally made Starlight's Search component wire Pagefind in production. Adds an entry to .erpaval/INDEX.md pointing to the solution.
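A guard that fails fast when the build is about to run outside production mode might have shortened the chase described above. A minimal sketch, assuming it runs as a pre-build step; `assertProductionEnv` is a hypothetical helper, not something the repo ships:

```typescript
// Hypothetical pre-build guard, not part of the OpenCodeHub repo.
type Env = Record<string, string | undefined>;

// Throws unless NODE_ENV is exactly "production". The error message includes
// the value that was actually seen, which is the piece of information the
// diagnostic commits were trying to log.
function assertProductionEnv(env: Env): void {
  const value = env["NODE_ENV"];
  if (value !== "production") {
    throw new Error(
      `NODE_ENV is ${String(value)}; expected "production" so the docs build ships Pagefind instead of the dev-mode Search stub`,
    );
  }
}

// Demo with literal objects; a real script would pass its process environment.
assertProductionEnv({ NODE_ENV: "production" }); // passes silently

try {
  assertProductionEnv({ NODE_ENV: "development" });
} catch (err) {
  console.error((err as Error).message);
}
```

This only checks the environment visible to the guard's own process, so it complements the hard-coded NODE_ENV in the build script and does not replace it.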
53ab363 to
8846c15
🤖 Automated release via release-please
analysis: 0.2.0
0.2.0 (2026-05-01)
Features
Refactoring
Dependencies
cli: 1.0.0
1.0.0 (2026-05-01)
⚠ BREAKING CHANGES
Features
Bug Fixes
Performance
Dependencies
core-types: 1.0.0
1.0.0 (2026-05-01)
⚠ BREAKING CHANGES
Features
Refactoring
embedder: 0.2.0
0.2.0 (2026-05-01)
Features
Dependencies
ingestion: 1.0.0
1.0.0 (2026-05-01)
⚠ BREAKING CHANGES
Features
Bug Fixes
Performance
Refactoring
Dependencies
mcp: 1.0.0
1.0.0 (2026-05-01)
⚠ BREAKING CHANGES
Features
Refactoring
Dependencies
sarif: 0.2.0
0.2.0 (2026-05-01)
Features
scanners: 0.2.0
0.2.0 (2026-05-01)
Features
Dependencies
search: 0.2.0
0.2.0 (2026-05-01)
Features
Dependencies
storage: 0.2.0
0.2.0 (2026-05-01)
Features
Dependencies
root: 1.0.0
1.0.0 (2026-05-01)
⚠ BREAKING CHANGES
Features
Bug Fixes
Performance
Documentation
Refactoring
This PR was generated with Release Please. See documentation.