feat: ship feature-flow:consult-codex Phase 1+2 (opt-in) (#235)#240
Merged
feat: ship feature-flow:consult-codex Phase 1+2 (opt-in) (#235)#240
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses code-quality review of T1: mutators now fail with a clear 'state not initialized' error if load() was never called, and tests cover both that path and the existing setVerdict unknown-id throw.
T2 review found that quoted string values (model: "gpt-5.2") were passed through to callers with embedded quotes, which would silently break T3 model resolution. Adds stripValue helper. Also renames the malformed-yaml test to reflect that it tests throw-safety (the loader is intentionally lenient and silently drops unrecognized lines), and removes a dead currentKey variable.
T5 review found that Buffer slice mid-codepoint produced a U+FFFD replacement that overran the cap by 1-2 bytes. Strip the trailing replacement char so the result stays <= MAX_BRIEF_BYTES. Tighten the truncation test to use Buffer.byteLength and add a multi-byte case. Also fix the constraint-block test count (4->5) and add the missing wrong-goal substring check.
…isolation T6 review found that briefExcerpt would crash on non-string brief inputs (caller-side guard only handled falsy values), and there was no test pinning the multi-consultation rewrite isolation invariant that the rewrite path silently relies on. Add a String() coercion guard, an isolation test, and extract the PENDING_MARKER constant so the search and slice paths share one source of truth.
…d budget-exhaustion test T7 review found that the recordResponse error path bypassed state.load (skipping recovery branches and inconsistent with the rest of the codebase) and that start silently swallowed corrupt-state errors, which would let every consultation get a 'first call free' on a broken state file. Routes the error-path read through state.load, adds stderr warnings on the two corrupt-state catches in start, drops the unused sessionId destructure, hoists fs to the top of the file, and adds the missing budget-exhaustion cross-subcommand test.
…error status T8 review found that Phase 3's echo-pipe pattern would break on apostrophes/backticks in Codex's response, the error status was undocumented for both subprocess phases, and the references files listed deferred features without a deferred-status label. Replace echo with a quoted heredoc, add the error status bullet to both phases, and label modes.md / escape-hatch.md deferred features.
T10 review found that loadDoc had the same UTF-8 boundary overrun as T5's build-brief, the 10000-char truncation assertion was unsatisfiable against the 10240-byte cap, and the new T10 buildInputs throw broke the dispatcher's start path because consult.js's transientState didn't forward design_doc_path from the persisted state file. Strip trailing U+FFFD on doc truncation, switch the assertion to byte length + add a multi-byte case, read design_doc_path into transientState (read-only, preserves the no-touch-state invariant), and update the 2 dispatcher tests to set up a design doc and metadata.
…din cap, observability Panel review found two blockers and three cheap hardening wins: - setMetadata threw on missing state, breaking the design-document integration's first-use happy path. Self-initialize using env-var defaults that match the consult.js CLI wrapper. - loadDoc had no path-traversal containment, so a design_doc_path typo could silently exfiltrate ~/.aws/credentials or ~/.ssh/id_rsa via Codex. Reject paths outside the worktree. - Unbounded stdin in consult.js could OOM on a runaway 100MB Codex response. Cap at 2 MB and convert overflow to codex_call_failed. - verdict-gate's outer catch had zero observability — strict-tier enforcement could silently break. Add stderr warning. - config.js silently treated 'enabled: tru' (typo) as disabled with no feedback. Warn to stderr when the value is unparseable.
Architecture re-review caught a second-order bug in the prior fix:
setMetadata's auto-init used auto-${Date.now()} when env vars were
absent, but consult.js's CLI wrapper defaults to 'cli-session'. The
mismatch silently triggered GC on the next state.load, wiping the
design_doc_path the design-document integration just wrote. Use the
same 'cli-session' default and add a regression test that proves the
no-env-var integration path survives a consult.js-style load.
Closes the remaining panel follow-ups before push: - Extract state.peek helper to consolidate consult.js start's three duplicate raw-fs reads behind one read-only path with consistent corrupt-state handling. - Forward introspect error message via resolved.detail so operators see why model resolution failed instead of bare 'model_unresolvable'. - Persist mcpResult.error.detail in state.consultations[].error_detail for post-mortem audit (log render deferred — state is canonical). - Tag rewritePendingVerdict no-op branches (header / pending marker missing) with stderr warnings so log/state divergence is visible. - Tag verdict-gate's loadState and JSON.parse catches with stderr (kept readStdinSync silent — empty stdin is normal). - Tag state.js .bak rename with stderr so silent history wipes are traceable. - Widen verdict-gate args regex to tolerate leading whitespace and --id=c1 form. - Add 12 new test assertions (1 resolve-model, 3 consult, 6 verdict-gate, 1 review-design surrogate-pair, 1 record-exchange hand-edit). Total suite: 85 assertions across 9 files.
uta2000
added a commit
that referenced
this pull request
Apr 17, 2026
…236 #238 #239 Aligns version fields across .claude-plugin/plugin.json (already at 1.36.0 from #240), .claude-plugin/marketplace.json, and .feature-flow.yml. Cuts the [1.36.0] - 2026-04-17 release heading in CHANGELOG.md bundling the five features shipped today: - #238 Scope-critique pass for design-verification (PR #243, this PR) - #236 Advisor tool integration (PR #242) - #239 Senior developer panel, Phase 1c code review (PR #241) - #235 Codex consultation, Phase 1+2 opt-in (PR #240) - #234 Quick-path triage with code-aware scope confirmation (PR #237) Consolidates .changelogs/238.md fragment into CHANGELOG.md (per the Ship phase convention) and deletes the fragment. Updates README with a "What's New in 1.36.0" section listing all five PRs, adds a Codex Consultation subsection under Optional Enhancements, adds a Quick-path triage subsection, and updates the Skills table to include consult-codex and the scope-critique pass note for design-verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
uta2000
added a commit
that referenced
this pull request
Apr 17, 2026
* docs: add implementation plan for scope-critique-pass (#238) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design-verification): add scope-critique pass (#238) Adds a ## Scope critique section to skills/design-verification/SKILL.md and a supporting reference file with five strategic-shape questions (scope, dependencies, simpler alternatives, observability, config surface) that run after the existing compatibility check to catch oversized scope, phantom dependencies, and unobservable capability bets before a design reaches create-issue. Motivated by #236 post-creation review: 6 strategic issues bypassed existing brainstorming, design-verification, and self-review gates. The new pass fills that gap with a bias-resistance checklist, red-flag lists per question, and a findings table that promotes BLOCKER rows to the top-level Blockers list. Closes #238 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add changelog fragment for #238 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 1.36.0 and consolidate changelog for #234 #235 #236 #238 #239 Aligns version fields across .claude-plugin/plugin.json (already at 1.36.0 from #240), .claude-plugin/marketplace.json, and .feature-flow.yml. Cuts the [1.36.0] - 2026-04-17 release heading in CHANGELOG.md bundling the five features shipped today: - #238 Scope-critique pass for design-verification (PR #243, this PR) - #236 Advisor tool integration (PR #242) - #239 Senior developer panel, Phase 1c code review (PR #241) - #235 Codex consultation, Phase 1+2 opt-in (PR #240) - #234 Quick-path triage with code-aware scope confirmation (PR #237) Consolidates .changelogs/238.md fragment into CHANGELOG.md (per the Ship phase convention) and deletes the fragment. Updates README with a "What's New in 1.36.0" section listing all five PRs, adds a Codex Consultation subsection under Optional Enhancements, adds a Quick-path triage subsection, and updates the Skills table to include consult-codex and the scope-critique pass note for design-verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the
feature-flow:consult-codexskill — a Claude-orchestrated four-phase flow that consults the existingcodexMCP server as a second AI opinion at lifecycle checkpoints. Opt-in by default (codex.enabled: false), so existing installs see no behavioral change.Closes #235.
What ships in Phase 1+2
feature-flow:consult-codexskill — orchestration backbone (consult.jsCLI withstart/record-response/verdictsubcommands), state module (atomic JSON writes + GC), config loader (regex YAML, no new deps), model fallback chain, brief skeleton builder (12 KB UTF-8 byte cap), record-exchange (state + Markdown log with pending-section rewrite), and thereview-designproactive mode.verdict-gate.jshook — fail-open Skill gate that BLOCKs non-verdict calls when a strict consultation is pending. Wired in for the deferredstuckmode; soft tier ships without it.feature-flow:design-documentintegration — new "Optional codex review" step gated oncodex.enabledANDcodex.proactive_reviews.design_doc. No-op when codex is off.codex:block inskills/start/references/project-context.mdand the dogfooding.feature-flow.ymlso users discover the option.Architecture (load-bearing)
Claude-orchestrated four-phase flow. MCP tools like
mcp__codex__codexare only callable from Claude's own context, not subprocess scripts — so the skill is an orchestration GUIDE Claude follows, not a monolithic runner:consult.js start(subprocess) — builds brief, returns JSON, does NOT touch statemcp__codex__codexdirectly (not subprocess), per the SKILL.md error-classification tableconsult.js record-response(subprocess, stdin-driven) — appends consultation to state, writes log, increments budgetconsult.js verdict(separate Skill call) — records decision, rewrites pending log sectionBudget is incremented in Phase 3, NOT Phase 1, so failed MCP calls don't burn budget.
Tiered verdict enforcement
review-design): single-shot reminder; missing verdict surfaces as<not recorded>PR audit defect.stuck, deferred to follow-up plan): PreToolUse hook BLOCKs non-verdict Skill calls until verdict is recorded.Out of scope (deferred follow-ups)
review-planmode (afterverify-plan-criteria)review-codemode + PRfeature-flow-metadataintegrationstuckmode + signal collector + escape-hatch dispatcherQuality gates passed
loadDocsetMetadataself-init bug fixed (broke first-use happy path); session-id default harmonized with consult.js CLI to prevent silent GC wipeverdict-gatefail-open paths, dispatcher path-traversal, CLI shell-invocation, surrogate-pair truncation, hand-edited logstate.peek()to consolidate three duplicate raw-fs reads inconsult.js startReal bugs caught and fixed during review
load()was forgotten →readOrThrowhelperstripValuehelper\uFFFDstripstartswallowed corrupt-state silently → stderr warning; error path bypassedstate.load→ routed through itechopiping broke on apostrophes/backticks in Codex response → quoted heredoc;errorstatus undocumented → addeddesign_doc_path) → bridged via persisted-state readsetMetadataself-init session-id usedDate.now(), mismatched consult.js'scli-sessiondefault → would silently GC the metadata; harmonizedTest plan
for t in skills/consult-codex/scripts/*.test.js skills/consult-codex/scripts/modes/*.test.js hooks/scripts/verdict-gate.test.js; do node "$t" || exit 1; done)smoke.test.js) exercises four-phase orchestration with simulated MCP resultmain(resolved one CHANGELOG conflict, kept both### Addedentries — quick-path start: add quick-path triage with code-aware scope confirmation #234 and codex)mcp__codex__codexMCP server in isolated temp worktree — all four phases passed end-to-end, log/state invariants verified, Codex returned substantive design review (not rubber stamp). Result documented on #235.Known minor follow-ups (do not block merge)
.feature-flow.ymlandskills/start/references/project-context.md, but NOT toreferences/project-context-schema.md(the canonical schema doc). Adding it there improves discoverability for users browsing the schema.plugin.jsonversion bump reformatted the previously-inlinekeywordsarray onto multiple lines (spec-prescribedJSON.stringify(j,null,2); cosmetic noise).🤖 Generated with Claude Code