feat(v0.21.x): cookbook v1 — BM25 retrieval over 12 curated kCAD patterns + lookup_cookbook MCP tool + eval --cookbook flag (#22) by w1ne · Pull Request #59 · w1ne/kernelCAD-web

w1ne · 2026-05-03T14:59:18Z

Summary

Workstream #22 from the v0.2-to-v1.0 gap-closure roadmap (§I4). Adds a self-growing pattern library that the agent can search at runtime.

12 curated .kcad.ts snippets under cookbook/snippets/ (edge features, booleans, holes, sketches, symmetry, parameters). Markdown + YAML frontmatter + fenced TS body. Tag whitelist at cookbook/tags.json.
Pure BM25 retrieval in src/cookbook/ — search(query, snippets, k=3) over title + tags + keywords + when_to_use. Score floor 0.5, k clamped to [1, 5]. ~60 LoC, no external deps. Snapshot test locks ranking on 5 queries.
MCP tool lookup_cookbook(query, k?) registered alongside the existing 14 tools. Empty-hits is a valid success.
SKILL.md cookbook index: build-generated between a pair of markers in SKILL.md. CI gate via diff-check.
Eval --cookbook flag — pre-injects top-3 hits into a separate cache_control block on the system prompt. A/B golden test (eval/cookbook.test.ts) locks deterministic ranking.
npm run eval:ab — runs the suite twice (off/on) and prints score / token delta.
CI gates wired into npm run qc: cookbook:validate + cookbook:evaluate + cookbook:build + SKILL.md diff-check.
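The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/cookbook/: the `Snippet` field names follow the frontmatter keys from the summary, while the tokenizer, the `K1`/`B` constants, and the Lucene-style IDF formula are assumptions.

```typescript
// Hypothetical snippet shape; field names follow the frontmatter keys in the PR summary.
export interface Snippet {
  id: string;
  title: string;
  tags: string[];
  keywords: string[];
  when_to_use: string;
}

export interface Hit {
  id: string;
  score: number;
}

const K1 = 1.2; // common BM25 defaults (assumed, not taken from the PR)
const B = 0.75;
const SCORE_FLOOR = 0.5; // floor stated in the PR summary

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Searchable text = title + tags + keywords + when_to_use.
function docTokens(s: Snippet): string[] {
  return tokenize([s.title, ...s.tags, ...s.keywords, s.when_to_use].join(" "));
}

export function search(query: string, snippets: Snippet[], k = 3): Hit[] {
  // Clamp k to [1, 5]; fall back to 3 for non-finite input.
  const limit = Number.isFinite(k) ? Math.min(5, Math.max(1, Math.trunc(k))) : 3;
  const docs = snippets.map((s) => ({ id: s.id, tokens: docTokens(s) }));
  const n = docs.length;
  if (n === 0) return [];
  const avgLen = docs.reduce((sum, d) => sum + d.tokens.length, 0) / n;

  // Document frequency: in how many snippets does each term occur?
  const df = new Map<string, number>();
  docs.forEach((d) => {
    Array.from(new Set(d.tokens)).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));
  });

  const terms = Array.from(new Set(tokenize(query)));
  const hits = docs.map((d) => {
    let score = 0;
    terms.forEach((t) => {
      const tf = d.tokens.filter((x) => x === t).length;
      if (tf === 0) return;
      const nt = df.get(t) ?? 0;
      const idf = Math.log(1 + (n - nt + 0.5) / (nt + 0.5));
      const norm = tf + K1 * (1 - B + (B * d.tokens.length) / avgLen);
      score += (idf * tf * (K1 + 1)) / norm;
    });
    return { id: d.id, score };
  });

  // Drop weak matches, then sort by score with id as a deterministic tie-break.
  return hits
    .filter((h) => h.score >= SCORE_FLOOR)
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, limit);
}
```

The id tie-break is one way to keep ranking stable enough for a snapshot test; the actual module may order ties differently.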

Test Plan

npm test: 959 passing, 25 skipped (rebased on develop @ c3de52d after #58, feat(v0.21): synchronized live-build demo automation + 3 demo sets, landed)
CI green on the PR (qc + e2e)
Spot-check lookup_cookbook via MCP returns top-3 BM25 hits
npm run eval:ab smoke run shows the cookbook injection diff
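For the MCP spot-check, a handler along these lines would show the two behaviours the summary calls out: k defaults to 3 and is clamped to [1, 5], and an empty hit list is returned as a normal success rather than an error. This is a sketch under assumptions: `lookupCookbook`, `SearchFn`, and the JSON payload layout are hypothetical names, and the result uses the text-content shape of MCP tool results without depending on any SDK.

```typescript
// Hypothetical hit shape returned by the retrieval layer.
export interface Hit {
  id: string;
  title: string;
  score: number;
}

// Injected search function so the sketch stays self-contained.
export type SearchFn = (query: string, k: number) => Hit[];

// Minimal subset of an MCP tool result: text content plus an optional error flag.
export interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

export function lookupCookbook(
  searchFn: SearchFn,
  args: { query?: unknown; k?: unknown },
): ToolResult {
  if (typeof args.query !== "string" || args.query.trim() === "") {
    return {
      content: [{ type: "text", text: "query must be a non-empty string" }],
      isError: true,
    };
  }
  // Default k = 3, clamped to [1, 5].
  const k =
    typeof args.k === "number" && Number.isFinite(args.k)
      ? Math.min(5, Math.max(1, Math.trunc(args.k)))
      : 3;
  const hits = searchFn(args.query, k);
  // An empty hits array is still a successful result: no isError flag is set.
  return { content: [{ type: "text", text: JSON.stringify({ hits }) }] };
}
```

Treating "no match" as success lets the agent distinguish "the cookbook has nothing relevant" from "the tool call failed".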

Curated library of canonical .kcad.ts pattern snippets, indexed for in-prompt retrieval. Distinct from corpus expansion (cookbook is for agents to reference, corpus is for evaluation). Continuous.

v1 scope:

- 12 starter pattern snippets seeded from eval expert solutions and the documented kernel surface
- Markdown frontmatter file format (id, title, tags, keywords, when_to_use, fenced TS body)
- BM25 retrieval over title/tags/keywords/when_to_use; score floor 0.5; ~40 LoC pure TS, no external deps
- Hybrid agent surface: build-generated SKILL.md cookbook index + MCP tool lookup_cookbook(query, k)
- Eval --cookbook flag with pre-injection (separate cache_control block); A/B golden test on bracket-holes
- CI gates: validate frontmatter, evaluate every body clean, diff-check generated SKILL.md section
- Continuous growth contract: same-PR additions, eval-driven additions, snapshot-test gate on ranking shifts

Native-framed per the no-competitor-refs rule. Lineage captured in ~/.claude/projects/-home-andrii/memory/kernelcad_design_lineage.md under '#22 cookbook with retrieval (2026-05-03) — design-time lineage'.
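To make the snippet file format concrete, here is a rough sketch of loading one snippet and checking its tags against the whitelist. It is an illustration only: the PR adds a yaml dependency for frontmatter, whereas this sketch hand-parses a flat `key: value` subset with inline `[a, b]` lists so that it stays dependency-free. `parseSnippet` and its error messages are hypothetical.

```typescript
export interface Snippet {
  id: string;
  title: string;
  tags: string[];
  keywords: string[];
  when_to_use: string;
  body: string; // contents of the fenced TS block
}

// Built at runtime so this example does not embed a literal code fence.
const FENCE = "`".repeat(3);

// Parses an inline list such as "[fillet, round]" (assumed subset of YAML).
function parseList(raw: string): string[] {
  return raw
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((s) => s.trim().replace(/^["']|["']$/g, ""))
    .filter((s) => s.length > 0);
}

export function parseSnippet(source: string, allowedTags: ReadonlySet<string>): Snippet {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---\r?\n/.exec(source);
  if (!fm) throw new Error("missing frontmatter block");

  // Collect flat "key: value" pairs from the frontmatter.
  const fields = new Map<string, string>();
  for (const line of fm[1].split(/\r?\n/)) {
    const m = /^([A-Za-z_]+):\s*(.*)$/.exec(line);
    if (m) fields.set(m[1], m[2].trim());
  }
  const need = (key: string): string => {
    const value = fields.get(key);
    if (!value) throw new Error(`missing frontmatter field: ${key}`);
    return value;
  };

  // Reject any tag that is not on the whitelist.
  const tags = parseList(need("tags"));
  const unknown = tags.filter((t) => !allowedTags.has(t));
  if (unknown.length > 0) throw new Error(`unknown tags: ${unknown.join(", ")}`);

  // The body is the first fenced ts/typescript block after the frontmatter.
  const rest = source.slice(fm[0].length);
  const fenceRe = new RegExp(FENCE + "(?:ts|typescript)\\r?\\n([\\s\\S]*?)\\r?\\n" + FENCE);
  const body = fenceRe.exec(rest);
  if (!body) throw new Error("missing fenced TS body");

  return {
    id: need("id"),
    title: need("title"),
    tags,
    keywords: parseList(fields.get("keywords") ?? ""),
    when_to_use: need("when_to_use"),
    body: body[1],
  };
}
```

Failing fast on an unknown tag is what would let a validate gate keep the tag vocabulary from drifting as new snippets are added.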

19 tasks (102 bite-sized steps) covering:

- yaml dep + tags whitelist + 12 starter snippets
- Pure BM25 retrieval module (~60 LoC, no deps)
- Snippet loader with frontmatter + tag-whitelist validation
- search() composition with score floor + k clamping
- 3 CI gates (validate, evaluate, build) wired into qc
- SKILL.md cookbook index generator + diff-check
- MCP tool lookup_cookbook
- TranscriptEvent kind cookbook_inject + renderer
- AgentClient systemAddendum (separate cache_control block)
- Eval --cookbook flag + per-task pre-injection
- A/B golden test on bracket-holes
- npm run eval:ab convenience script
- CHANGELOG entry

TDD throughout (test before implementation). Frequent commits (one per task). Ready for subagent-driven-development execution. Spec lineage: docs/superpowers/specs/2026-05-03-cookbook-with-retrieval-design.md (2ab8190).
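The systemAddendum step can be pictured with a small sketch. It assumes the system prompt is sent as an array of text blocks where each block may carry `cache_control: { type: "ephemeral" }`, as in the Anthropic Messages API; `buildSystemBlocks` and `formatAddendum` are hypothetical names, and the real AgentClient may assemble its request differently.

```typescript
// Assumed system-block shape (Anthropic Messages API style).
export interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical hit shape carried into the prompt.
export interface CookbookHit {
  id: string;
  title: string;
  body: string;
}

// Renders the pre-injected hits as plain text for the addendum block.
export function formatAddendum(hits: CookbookHit[]): string {
  return hits.map((h) => `## ${h.title} (${h.id})\n${h.body}`).join("\n\n");
}

export function buildSystemBlocks(basePrompt: string, systemAddendum?: string): TextBlock[] {
  // The base prompt gets its own cache breakpoint and never changes per task.
  const blocks: TextBlock[] = [
    { type: "text", text: basePrompt, cache_control: { type: "ephemeral" } },
  ];
  // The per-task cookbook text goes into a separate block after it.
  if (systemAddendum && systemAddendum.trim() !== "") {
    blocks.push({ type: "text", text: systemAddendum, cache_control: { type: "ephemeral" } });
  }
  return blocks;
}
```

Keeping the addendum in its own block means the first block is identical with and without --cookbook, which should let the cached base-prompt prefix be reused across tasks while only the injected text varies.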


…nto qc

- Add cookbook:validate, cookbook:evaluate, and cookbook:build to qc script
- Add git diff --exit-code check on src/skill/SKILL.md to catch drift
- Fix lint error: remove unused import of loadSnippets (re-exported via export statement)
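The drift check works because regenerating the index is deterministic: if the build step rewrites the generated section and the file is unchanged, git diff --exit-code passes. A minimal sketch of such a generator follows. The marker strings are passed in as parameters because the real markers are not shown here; `renderIndex`, `replaceGenerated`, and the bullet layout are hypothetical. The `_(empty)_` placeholder is the one named in the commit list.

```typescript
export interface IndexEntry {
  id: string;
  title: string;
}

// Renders one bullet per snippet, sorted by id so output is stable across runs.
export function renderIndex(entries: IndexEntry[]): string {
  if (entries.length === 0) return "_(empty)_";
  return entries
    .slice()
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((e) => `- ${e.id}: ${e.title}`)
    .join("\n");
}

// Replaces whatever sits between the start and end markers with `generated`.
export function replaceGenerated(
  doc: string,
  start: string,
  end: string,
  generated: string,
): string {
  const i = doc.indexOf(start);
  const j = i === -1 ? -1 : doc.indexOf(end, i + start.length);
  if (i === -1 || j === -1) throw new Error("generated-section markers not found");
  return doc.slice(0, i + start.length) + "\n" + generated + "\n" + doc.slice(j);
}
```

Running the generator a second time on its own output yields the same text, which is the property the diff-check relies on.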


w1ne and others added 26 commits May 3, 2026 16:55

chore(cookbook): add yaml dep + tags whitelist + snippets folder

efb98b2

feat(cookbook): pure BM25 tokenizer + scorer

f99ad62

feat(cookbook): snippet loader with frontmatter + tag validation

05e399d

feat(cookbook): public search() with score floor + k clamping

637d350

feat(cookbook): 12 starter pattern snippets

1bad139

fix(cookbook): use 3-backtick fences in snippet bodies

e81e0cd

fix(cookbook): drop no-op translate from mirror-half-part snippet

ccf7387

feat(cookbook): cookbook:validate npm script

c41b6d7

feat(cookbook): cookbook:evaluate gate (every body must kernelcad evaluate clean)

96884b2

test(cookbook): snapshot top-3 IDs for 5 hand-picked queries

c0b67d7

feat(cookbook): SKILL.md cookbook index generator

b418e21

fix(cookbook): use plan-verbatim empty-cookbook placeholder text

1f20e5b

revert: keep '_(empty)_' placeholder so plan-verbatim test still passes

3bc40a8

feat(cookbook): SKILL.md cookbook index — markers + generated section

89043b8

feat(mcp): lookup_cookbook tool — BM25 retrieval over cookbook

a3561bd

feat(eval): cookbook_inject TranscriptEvent + renderer

7e4bbc5

feat(eval): AgentClient supports optional systemAddendum (separate cache block)

8e6af46

feat(eval): cookbook injector — wraps search() for the harness

ab88edb

feat(eval): --cookbook flag wires per-task injection into runner

8729736

test(eval): A/B golden — bracket-holes identical with/without --cookbook

06bb236

chore(eval): eval:ab script — runs suite twice, prints score delta

1f6e179

fix(eval): eval:ab forwards argv to inner runs (e.g. --mock, task name)

c9fa596

docs(changelog): cookbook v1 — workstream #22

fab2c44

w1ne merged commit bbc4a9c into develop May 3, 2026
2 checks passed

w1ne merged 26 commits into develop from feat/cookbook-v1
