2 changes: 2 additions & 0 deletions .erpaval/INDEX.md
Original file line number Diff line number Diff line change
@@ -44,6 +44,8 @@ development sessions. Solutions are reusable; specs are per-feature.
- [Smoke-testing a workspace cli requires packing every publishable workspace dep](solutions/best-practices/workspace-tarball-pack-all-publishables.md) — `npm install -g <cli.tgz>` falls back to registry for un-packed transitive workspace deps, dragging in the previously-published versions and masking install-graph regressions. Pack everything publishable, every time.
- [GitHub Actions top-level permissions cap every job](solutions/conventions/workflow-call-permissions-ceiling.md) — workflow's top-level `permissions:` is a ceiling; per-job blocks can narrow but not grant. `id-token: write` declared at job level silently no-ops if missing from top level. Diagnostic: read the "GITHUB_TOKEN Permissions" group in the failing job's log.

- [lbug COPY FROM (subquery) bulk-load pattern](solutions/conventions/lbug-copy-from-subquery-bulk-load.md) — type-safe bulk inserts via COPY subquery; sentinel row, STRING[] never null + sentinel STRING[] never empty (LIST(ANY) trap), maxDBSize cap (8 TiB default exhausts VA), readOnly cannot run CREATE_FTS_INDEX, src/dst not from/to, eid not id alias.

## Specs

- [001-scip-replaces-lsp](specs/001-scip-replaces-lsp/spec.md) — rip-and-replace LSP with SCIP for TS/Py/Go/Rust/Java. Task map: [tasks.md](specs/001-scip-replaces-lsp/tasks.md).
102 changes: 102 additions & 0 deletions .erpaval/solutions/conventions/lbug-copy-from-subquery-bulk-load.md
@@ -0,0 +1,102 @@
---
name: lbug-copy-from-subquery-bulk-load
description: lbug v0.16.1 COPY FROM (subquery) pattern for type-safe bulk node+edge inserts — avoids INT64/DOUBLE confusion, sentinel pattern, and from/to keyword collision
metadata:
type: project
---

## Pattern: `COPY <Table> FROM (UNWIND $rows AS r ... RETURN ...)`

lbug v0.16.1's `UNWIND` + `CREATE/MERGE` path infers struct-field types per row
from JS values (`Number.isInteger(v) → INT64`, else `DOUBLE`). Any edge whose
`confidence` is `1.0` is therefore inferred as INT64, lands as an INT64 bit
pattern in a DOUBLE column, and round-trips as garbage.

**Fix**: `COPY <Table> FROM (UNWIND $rows AS r WITH r WHERE r.id <> '<SENTINEL>' RETURN ...)`.
COPY reads column types from the DDL; CAST in the RETURN clause converts string-encoded
numerics to the correct type; per-row inference never runs.

### Node inserts

```cypher
COPY CodeNode FROM (UNWIND $rows AS r
WITH r WHERE r.id <> '__SENTINEL__'
RETURN r.id, r.kind, ..., CAST(r.start_line AS INT32), ..., CAST(r.cohesion AS DOUBLE), ...)
```

Row encoding rules:
- INT32 columns: pass value as `String(Math.trunc(v))` or `null`
- DOUBLE columns: pass value as `String(v)` or `null`
- BOOL columns: pass `true`/`false`/`null`
- STRING[] columns: **never pass `null`** — pass `[]` for absent arrays, or lbug infers
LIST(ANY) and throws "Trying to create a vector with ANY type"
- STRING columns: pass string value or `null`
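
Under these rules, a row encoder might look like the sketch below. The column names (`start_line`, `cohesion`, `exported`, `keywords`) are illustrative, not the real CodeNode schema; only the per-column-type encoding follows the rules above.

```typescript
// Illustrative node-row encoder for the encoding rules above.
interface RawNode {
  id: string;
  startLine?: number | null;
  cohesion?: number | null;
  exported?: boolean | null;
  keywords?: string[] | null;
}

function encodeNodeRow(n: RawNode) {
  return {
    id: n.id,
    // INT32 column: string-encoded integer, or null when absent
    start_line: n.startLine == null ? null : String(Math.trunc(n.startLine)),
    // DOUBLE column: string-encoded numeric, or null when absent
    cohesion: n.cohesion == null ? null : String(n.cohesion),
    // BOOL column: pass the boolean (or null) straight through
    exported: n.exported ?? null,
    // STRING[] column: never null; empty array stands in for "absent"
    keywords: n.keywords ?? [],
  };
}
```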

Sentinel row requirement: prepend a row with `id = SENTINEL_ID` and concrete typed values
for every column (strings for numerics, `false` for bools, `[]` for string arrays, `""` for
strings). The `WITH r WHERE r.id <> SENTINEL_ID` filters it before any storage write.
Purpose: seeds struct-field type inference for lbug's binder so all-null batches don't fail.
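
Continuing the hypothetical columns from the encoder sketch, prepending the sentinel could look like:

```typescript
// Prepend a sentinel row carrying a concrete typed value in every column,
// so lbug's binder sees a non-null sample even for an all-null batch.
// Column names are hypothetical.
const SENTINEL_ID = "__SENTINEL__";

function withSentinel(dataRows: Record<string, unknown>[]) {
  const sentinel = {
    id: SENTINEL_ID,
    start_line: "0",        // string-encoded INT32 seed
    cohesion: "0",          // string-encoded DOUBLE seed
    exported: false,        // BOOL seed
    keywords: ["__seed__"], // STRING[] seed; non-empty on purpose
  };
  return [sentinel, ...dataRows];
}
```

The `WITH r WHERE r.id <> '__SENTINEL__'` clause in the COPY subquery drops this row before anything reaches storage.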

### Edge inserts

```cypher
COPY DEFINES(id, confidence, reason, step) FROM (UNWIND $rows AS r
WITH r WHERE r.eid <> '__EDGE_SENTINEL__'
RETURN r.src, r.dst, r.eid, CAST(r.confidence AS DOUBLE), r.reason, CAST(r.step AS INT32))
```

Critical rules:
- Use `src`/`dst` (not `from`/`to`) as struct field names — `from` and `to` are Cypher
reserved keywords; lbug silently misinterprets `r.from`/`r.to` in a RETURN clause
- Use `eid` (not `id`) for the edge id field in the row struct — lbug matches column name
`id` against the CodeNode primary key and tries to do a node lookup instead of treating
it as a rel property. The `r.eid` alias maps to the `id` rel property via the positional column list
- The `COPY E(id, confidence, reason, step)` column list is required to bind positional
RETURN columns to rel properties correctly
- Sentinel's `src`/`dst` can be empty strings `""` — filtered by `WHERE r.eid <> SENTINEL`
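
An edge-row encoder following these naming rules might look like this; the `RawEdge` input shape is an assumption, but the `src`/`dst`/`eid` renames are the point.

```typescript
// Edge-row encoder applying the keyword-collision rules above.
interface RawEdge {
  from: string;
  to: string;
  id: string;
  confidence: number;
  reason?: string | null;
  step: number;
}

function encodeEdgeRow(e: RawEdge) {
  return {
    src: e.from,                      // not `from`: reserved Cypher keyword
    dst: e.to,                        // not `to`: reserved Cypher keyword
    eid: e.id,                        // not `id`: collides with the node PK
    confidence: String(e.confidence), // CAST(... AS DOUBLE) in the RETURN
    reason: e.reason ?? null,
    step: String(e.step),             // CAST(... AS INT32) in the RETURN
  };
}
```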

### Compatibility notes

- `COPY FROM (subquery)` requires the subquery to have a RETURN clause
- `WITH r WHERE` inside UNWIND is valid Cypher inside a COPY subquery
- `IGNORE_ERRORS=true` silently drops rows where FROM/TO node lookup fails — avoid using
it as a crutch; fix the sentinel instead so the sentinel itself is filtered before lookup
- The mmap "Buffer manager exception: Mmap for size 8796093022208 failed" is virtual
address-space exhaustion. lbug's default `maxDBSize` is `1 << 43` = 8 TiB per
`Database`; on 64-bit Linux a process has ~128 TiB of user VA, so ~16 concurrent DBs
exhaust the address space (kernel `MAP_FAILED`). Fix: pass an explicit
`maxDBSize` (5th `Database` ctor arg, MUST be a power of 2) — 16 GiB is plenty
for OCH-scale graphs and drops virtual reserve 512×. Also pass `bufferManagerSize`
(2nd arg) — native default is `min(systemMem, maxDBSize) * 0.8`, often >50 GiB;
cap at 2 GiB for headroom across concurrent test pools without surfacing the
sibling error "Buffer manager exception: Unable to allocate memory! The buffer
pool is full and no memory could be freed!" Cite: kuzudb/kuzu#1826.
- `Database.close()` is what triggers `~VMRegion` → `munmap()`; relying on JS GC
alone leaks the mapping between tests. Always `await db.close()` before opening
the next instance pointing at a different path.
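
Assembling the constructor args described above could be sketched as follows. The notes state `bufferManagerSize` is the 2nd arg and `maxDBSize` the 5th; the middle two positions (compression, readOnly) are assumptions here, so verify against the installed `@ladybugdb/core` typings before copying.

```typescript
const GIB = 1024 ** 3;

// Build positional ctor args that cap the virtual reserve and buffer pool.
function lbugCtorArgs(dbPath: string) {
  const bufferManagerSize = 2 * GIB; // cap below the min(RAM, maxDBSize)*0.8 default
  const maxDBSize = 16 * GIB;        // shrinks the 8 TiB virtual reserve 512x
  if (!Number.isInteger(Math.log2(maxDBSize))) {
    throw new Error("maxDBSize must be a power of 2");
  }
  return [dbPath, bufferManagerSize, true, false, maxDBSize] as const;
}
// usage sketch: const db = new Database(...lbugCtorArgs(path));
// ...and always `await db.close()` so the mapping is munmap'd.
```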

### "Trying to create a vector with ANY type" — sentinel STRING[] must be non-empty

The lesson above says STRING[] columns must never be `null` and should be `[]` for
absent arrays. That handles per-row binding, but the **sentinel row's** STRING[]
fields must additionally be **non-empty** (e.g. `["__sentinel__"]`) — lbug's
struct-field type inference looks at the FIRST row's array contents to fix the
LIST element type. An empty-array sentinel (`[]`) yields `LIST(ANY)`, and any
later data row with a string then throws "Trying to create a vector with ANY type".
The seed value never reaches storage because `WITH r WHERE r.id <> SENTINEL`
filters the row before COPY. Reproduces with a 3-column table and 3 rows:
sentinel `kw=[]`, n1 `kw=[]`, n2 `kw=["a","b"]` → fails. Switching the
sentinel to `kw=["__seed__"]` fixes it.
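
The repro as row payloads, assuming a hypothetical 3-column table `T(id STRING, name STRING, kw STRING[])`:

```typescript
// Failing shape: the first (sentinel) row's empty array pins kw to LIST(ANY).
const failing = [
  { id: "__SENTINEL__", name: "", kw: [] as string[] }, // seeds LIST(ANY)
  { id: "n1", name: "a", kw: [] as string[] },
  { id: "n2", name: "b", kw: ["a", "b"] },              // throws on this row
];

// Fixed shape: a non-empty seed pins kw to LIST(STRING).
const fixed = [
  { id: "__SENTINEL__", name: "", kw: ["__seed__"] },
  { id: "n1", name: "a", kw: [] as string[] },
  { id: "n2", name: "b", kw: ["a", "b"] },
];
```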

### Read-only opens cannot run `CALL CREATE_FTS_INDEX` / `CALL CREATE_VECTOR_INDEX`

lbug rejects writes against a Database opened with `readOnly=true`, including the
`CALL CREATE_FTS_INDEX(...)` and `CALL CREATE_VECTOR_INDEX(...)` admin procedures
that adapters use to ensure search-side indexes exist. This surfaces as "Cannot
execute write operations in a read-only database!" the moment a reader calls
`search()` or `vectorSearch()`. Fix: build both indexes at the end of `bulkLoad`
(when the connection is read-write) and have the lazy-ensure helpers no-op in
readOnly mode. Readers query the existing index; the index already exists
because every write path runs through bulkLoad.

**Why:** [[scip-replaces-lsp]] — same pattern of lbug's binding layer having non-obvious
per-row type inference that requires careful workarounds.
32 changes: 19 additions & 13 deletions CLAUDE.md
@@ -80,19 +80,25 @@ This repo ships a Claude Code plugin at `plugins/opencodehub/` — it
provides a `code-analyst` subagent and 10 skills. Install via
`codehub init` (writes `.mcp.json` + links the plugin).

## Storage backend — graph-default

`CODEHUB_STORE` is unset by default. OpenCodeHub probes
`@ladybugdb/core` and uses the graph-database backend when the binding
is available; otherwise it falls back to DuckDB with a one-shot stderr
advisory (gated on TTY or `OCH_VERBOSE=1`). Set `CODEHUB_STORE=duck` to
force the legacy layout (single DuckDB file backs both graph + temporal
views) or `CODEHUB_STORE=lbug` to require the graph-database backend.

When both `graph.duckdb` and `graph.lbug` exist as siblings in the same
`<repo>/.codehub/`, the newer-mtime file wins. See ADR 0013
(`docs/adr/0013-m7-default-flip-and-abstraction.md`) for the rationale
and the AGE/Memgraph/Neo4j/Neptune community-adapter escape hatch.
## Storage backend — lbug graph + DuckDB temporal

The graph tier is always `@ladybugdb/core` (`graph.lbug`); the temporal
tier — cochanges, structured symbol summaries, and the
`codehub query --sql` escape hatch — is always DuckDB
(`temporal.duckdb`). Both files live under `<repo>/.codehub/`. There is
no env-var, no probe, no fallback; if the lbug binding fails to load,
`open()` throws `GraphDbBindingError` and the operation aborts. See
ADR 0016 (`docs/adr/0016-duckdb-graph-rip.md`) for the rationale and the
AGE/Memgraph/Neo4j/Neptune community-adapter contract that survives the
rip-out (the segregated `IGraphStore` / `ITemporalStore` interfaces stay
exactly because community-fork adapters are a deliberate escape hatch).

`IGraphStore` lives only on `GraphDbStore`; `DuckDbStore` implements
`ITemporalStore` only. Embeddings live in `graph.lbug` and stream into a
per-call DuckDB temp table at pack time so the byte-identical Parquet
sidecar still works (see `packages/pack/src/embeddings-sidecar.ts`).
Future temporal swap (e.g. SQLite-WASM) only needs a new `ITemporalStore`
implementor — no graph-tier change.

## Parse runtime — WASM-only, vendored grammars

7 changes: 6 additions & 1 deletion docs/adr/0011-graph-db-backend.md
@@ -1,6 +1,11 @@
# ADR 0011 — Graph-DB backend (LadybugDB phase-1)

- Status: **Accepted** — 2026-05-05 (Proposed) → flipped on the M3 merge.
- Status: **Partially superseded** by [ADR 0016](./0016-duckdb-graph-rip.md)
on 2026-05-16. The "DuckDB-default plus LadybugDB opt-in" framing is
obsolete; lbug is the unconditional graph backend after the rip. The
LadybugDB integration shape and `IGraphStore` design introduced here
are unchanged.
- Was: **Accepted** on 2026-05-05 and flipped on the M3 merge.
- Authors: Laith Al-Saadoon + Claude.
- Branch: `feat/v1-m3-m4`.
- Supersedes nothing. Interacts with ADR 0001 (DuckDB backend stays the
8 changes: 7 additions & 1 deletion docs/adr/0013-m7-default-flip-and-abstraction.md
@@ -5,7 +5,13 @@
> in-tree because they were authored in parallel branches and accepted
> on the same release. The next ADR uses 0014.

- Status: **Accepted** — 2026-05-09 (Proposed) → flipped on the
- Status: **Superseded** by [ADR 0016](./0016-duckdb-graph-rip.md)
on 2026-05-16. The auto-probe, dual-artifact arbitration, and
`CODEHUB_STORE` resolver introduced here are gone. lbug is the only
graph backend; DuckDB serves the temporal tier. The
IGraphStore/ITemporalStore segregation survives because community
adapters (AGE, Memgraph, Neo4j, Neptune) target it.
- Was: **Accepted** on 2026-05-09 and flipped on the
`feat/v1-finalize-track-a` merge (PR #71).
- Authors: Laith Al-Saadoon + Claude.
- Branch: `feat/v1-finalize-track-a`.
146 changes: 146 additions & 0 deletions docs/adr/0016-duckdb-graph-rip.md
@@ -0,0 +1,146 @@
# ADR 0016 — Rip out the DuckDB graph backend; lbug-only graph, DuckDB temporal-only

- Status: **Accepted** — 2026-05-16.
- Authors: Laith Al-Saadoon + Claude.
- Branch: `feat/duckdb-graph-rip`.
- Supersedes: [ADR 0013 — M7 default flip and storage abstraction](./0013-m7-default-flip-and-abstraction.md)
in its entirety; partially supersedes [ADR 0011 — graph-db backend](./0011-graph-db-backend.md)
(the "DuckDB-as-graph default" passages).

## Context

ADR 0011 introduced `@ladybugdb/core` (lbug) as a second `IGraphStore`
backend behind a `CODEHUB_STORE` env-var selector. ADR 0013 flipped the
default to graph-default with auto-probe-and-fallback semantics: when
`CODEHUB_STORE` was unset, the resolver imported `@ladybugdb/core` and
preferred lbug on success, otherwise fell back to DuckDB-as-graph. A
dual-artifact detector picked the newer-mtime file when both
`graph.duckdb` and `graph.lbug` existed in `<repo>/.codehub/`. The
DuckDB graph adapter therefore lived as a permanent fallback path,
maintained alongside the lbug adapter.

Two things changed at the start of the 2026-05 dogfood cycle:

1. **lbug bulk-load became feature-complete.** A separate session landed
the `COPY <Table> FROM (UNWIND $rows ...)` pattern that DuckDB
already had — type-safe ingestion of nodes and edges through lbug's
bulk path. After that, every `IGraphStore` surface — bulk-load, all
15 typed finders, BM25 search, HNSW vector search, traversals,
embeddings — runs on lbug; the v1.0 conformance suite passes against
lbug; the cross-adapter parity tests existed only to keep DuckDB
honest.
2. **The dual-write code carried real cost.** ~1900 LOC of graph-tier
code in `duckdb-adapter.ts`, the ~3500-LOC parity test suite, the
resolver/probe/dual-artifact apparatus, the env-var, the docs that
tried to keep `codehub-graph` as a backend axis. Every `analyze`
path took two branches and every architectural claim ("storage is
pluggable") had to defend a backend that nobody set explicitly.

The user's framing was "rip out the DuckDB fallback for graph store …
keep the generic / abstractions but I don't want all this code for
duckdb unless it's the temporal/genuine tabular type stuff. and in fact
maybe even that should be sqlite wasm or something."

## Decision

**`IGraphStore` lives only on `GraphDbStore`; `DuckDbStore` implements
`ITemporalStore` only.** The interface segregation introduced in
session-33f24f (see
[`solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md`](../../.erpaval/solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md))
was anticipating exactly this split — community AGE / Memgraph / Neo4j /
Neptune adapters target `IGraphStore` only and pair with the
DuckDB-backed `ITemporalStore`. After this rip-out, that's also the
in-tree shape: lbug owns `IGraphStore`, DuckDB owns `ITemporalStore`,
and the in-tree adapters stop demonstrating the
structural-typing-via-`implements both` case.

Concrete shape after the rip:

- `openStore({path})` always returns `{graph: GraphDbStore, temporal:
DuckDbStore, graphFile, temporalFile, close}`. No `backend` field on
the result envelope; no `backend?` option on the input.
- The graph artifact is `<dir>/graph.lbug`. The temporal artifact is
`<dir>/temporal.duckdb`. `paths.describeArtifacts()` takes no arguments
and returns `{graphFile: "graph.lbug", temporalFile: "temporal.duckdb"}`.
- `resolveDbPath` is renamed `resolveGraphPath` and returns the lbug
filename.
- `CODEHUB_STORE` is gone. The env var is no longer consulted anywhere
in storage. The resolver, the dynamic-import probe of `@ladybugdb/core`,
the dual-artifact mtime arbitration, the `_lbugFallbackWarned` /
`_dualArtifactWarned` advisory state, and the
`_resetStoreResolverCache` test escape hatch are all deleted.
- The MCP `sql` tool's `cypher` field becomes unconditionally available;
it routes to `store.graph.execCypher(...)`.
- Embeddings live in `graph.lbug`. The pack embeddings sidecar streams
rows from `store.graph.listEmbeddings()` into a per-call DuckDB
`CREATE TEMP TABLE embeddings_export` on `temporal.duckdb`, then runs
the existing deterministic
`COPY (... ORDER BY ...) TO '<path>' (FORMAT PARQUET, COMPRESSION ZSTD)`,
then drops the temp table. The byte-identity contract for
`embeddings.parquet` is preserved.
- The conformance suite at `packages/storage/src/test-utils/conformance.ts`
(`assertIGraphStoreConformance(name, factory)`) and the parity-harness
rebuilder at `packages/storage/src/test-utils/parity-harness.ts` stay.
They are the v1.0 contract for community adapters; deleting them
would contradict the segregation ADR's promise.
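
A minimal sketch of the renamed path helpers, assuming module-local implementations that match the bullets above:

```typescript
import { join } from "node:path";

// Post-rip artifact names; describeArtifacts() takes no arguments.
function describeArtifacts() {
  return { graphFile: "graph.lbug", temporalFile: "temporal.duckdb" } as const;
}

// resolveDbPath renamed to resolveGraphPath; returns the lbug artifact
// path under <repo>/.codehub/.
function resolveGraphPath(repoPath: string): string {
  return join(repoPath, ".codehub", describeArtifacts().graphFile);
}
```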

## Backwards compatibility

None. Existing `<repo>/.codehub/graph.duckdb` files are no longer read.
Users re-run `codehub analyze` to write `graph.lbug` from scratch.
There is no stale-artifact warning, no legacy alias, no kill-switch.
This is a single-user dogfood repo today; the cost of a hard cutover is
"re-run analyze once."

## Operational impact

- **Platform reach narrows to lbug's 5 prebuilt targets**:
`darwin-arm64`, `darwin-x64`, `linux-arm64`, `linux-x64`, `win32-x64`.
Alpine/musl and 32-bit Linux ARM users need a source build via
`cmake-js`. `codehub doctor` now hard-fails on missing binding (was
warn-and-continue in the auto-probe era).
- **lbug's 8 TiB virtual mmap per Database can exhaust the 47-bit user
virtual address space on 64-bit Linux** when many test pools open
concurrently — surfaces as `Buffer manager exception: Mmap for size
8796093022208 failed`. Wave 1 chased this in detail and confirmed
with a probe: lbug's `maxDBSize` defaults to `1 << 43` and the
request is reserved at `Database` construction (not lazy).
`bufferManagerSize` defaults to `min(systemMem, maxDBSize) * 0.8`.
The pool now passes both as explicit constructor args
(16 GiB `maxDbBytes`, 2 GiB `bufferManagerBytes`) so concurrent
Databases do not exhaust VA. See
[`solutions/conventions/lbug-copy-from-subquery-bulk-load.md`](../../.erpaval/solutions/conventions/lbug-copy-from-subquery-bulk-load.md)
for the citations (kuzudb/kuzu#1826, the upstream Database constructor,
`BufferPoolConstants::DEFAULT_VM_REGION_MAX_SIZE`).
- **Sentinel STRING[] columns must be non-empty in lbug bulk-load.**
An empty-array sentinel (`[]`) makes lbug's struct-field type
inference resolve to `LIST(ANY)`, and any later data row with a
string then throws "Trying to create a vector with ANY type". The
fix is `["__sentinel__"]`; the seed value is filtered before COPY by
the existing `WITH r WHERE r.id <> SENTINEL`.
- **lbug rejects writes against a Database opened with `readOnly=true`,
including `CALL CREATE_FTS_INDEX(...)` and `CALL CREATE_VECTOR_INDEX(...)`.**
These are now built at the end of `bulkLoad` (write phase). The
`ensureFtsIndex` / `ensureVectorIndex` lazy helpers no-op in
readOnly mode; readers query the existing index built by the most
recent write.

## Future work

The user's "maybe sqlite-wasm for temporal" comment is captured here as
forward-work: replacing `DuckDbStore` with a JS-only `ITemporalStore`
implementor (e.g. `sql.js`, `wa-sqlite`) would drop the last native
binding from the temporal tier and let OCH ship as pure-JS at the
distributed-boundary. The interface contract — `exec(sql, params)`,
`bulkLoadCochanges`, `lookupCochangesForFile`, `bulkLoadSymbolSummaries`,
`exportEmbeddingsToParquet` — is small enough to port; only the
deterministic Parquet writer would need investigation (sql.js does not
ship a `COPY ... TO PARQUET` analog out of the box). Not in scope for
this ADR.

## Numbers

Net diff for this rip: ~5,800 deletions, ~150 insertions. Workspace
test count after: 1931 passing, 0 failing, 2 skipped (one
platform-gated lbug vector probe + one platform-gated embedder probe).
Storage package: 148/0/1 over three consecutive runs — no flake.
2 changes: 1 addition & 1 deletion packages/analysis/src/test-utils.ts
@@ -150,7 +150,7 @@ function sortEdges(edges: readonly FakeEdge[]): FakeEdge[] {
* code under test.
*/
export class FakeStore implements IGraphStore {
readonly dialect: GraphDialect = "none";
readonly dialect: GraphDialect = "cypher";
readonly nodes: FakeNode[] = [];
readonly edges: FakeEdge[] = [];

11 changes: 6 additions & 5 deletions packages/cli/src/commands/analyze-carry-forward.test.ts
@@ -35,7 +35,7 @@ import {
KnowledgeGraph,
type NodeId,
} from "@opencodehub/core-types";
import { DuckDbStore, resolveDbPath, resolveRepoMetaDir } from "@opencodehub/storage";
import { openStore, resolveGraphPath, resolveRepoMetaDir } from "@opencodehub/storage";
import { loadPreviousGraph } from "./analyze.js";

/**
@@ -164,11 +164,12 @@ async function seedPriorIndex(repoPath: string): Promise<{
});

await mkdir(resolveRepoMetaDir(repoPath), { recursive: true });
const store = new DuckDbStore(resolveDbPath(repoPath));
const store = await openStore({ path: resolveGraphPath(repoPath) });
try {
await store.open();
await store.createSchema();
await store.bulkLoad(graph);
await store.graph.open();
await store.temporal.open();
await store.graph.createSchema();
await store.graph.bulkLoad(graph);
} finally {
await store.close();
}