feat: ship feature-flow:consult-codex Phase 1+2 (opt-in) (#235) by uta2000 · Pull Request #240 · uta2000/feature-flow

uta2000 · 2026-04-17T17:07:04Z

Summary

Adds the feature-flow:consult-codex skill — a Claude-orchestrated four-phase flow that consults the existing codex MCP server as a second AI opinion at lifecycle checkpoints. Opt-in by default (codex.enabled: false), so existing installs see no behavioral change.

Closes #235.

What ships in Phase 1+2

feature-flow:consult-codex skill — orchestration backbone (consult.js CLI with start / record-response / verdict subcommands), state module (atomic JSON writes + GC), config loader (regex YAML, no new deps), model fallback chain, brief skeleton builder (12 KB UTF-8 byte cap), record-exchange (state + Markdown log with pending-section rewrite), and the review-design proactive mode.
PreToolUse verdict-gate.js hook — fail-open Skill gate that BLOCKs non-verdict calls when a strict consultation is pending. Wired in for the deferred stuck mode; soft tier ships without it.
feature-flow:design-document integration — new "Optional codex review" step gated on codex.enabled AND codex.proactive_reviews.design_doc. No-op when codex is off.
Config template — commented codex: block in skills/start/references/project-context.md and the dogfooding .feature-flow.yml so users discover the option.

Architecture (load-bearing)

Claude-orchestrated four-phase flow. MCP tools like mcp__codex__codex are only callable from Claude's own context, not subprocess scripts — so the skill is an orchestration GUIDE Claude follows, not a monolithic runner:

consult.js start (subprocess) — builds brief, returns JSON, does NOT touch state
Claude calls mcp__codex__codex directly (not subprocess), per the SKILL.md error-classification table
consult.js record-response (subprocess, stdin-driven) — appends consultation to state, writes log, increments budget
consult.js verdict (separate Skill call) — records decision, rewrites pending log section

Budget is incremented in Phase 3, NOT Phase 1, so failed MCP calls don't burn budget.

Tiered verdict enforcement

Soft tier (proactive review-design): single-shot reminder; missing verdict surfaces as <not recorded> PR audit defect.
Strict tier (reactive stuck, deferred to follow-up plan): PreToolUse hook BLOCKs non-verdict Skill calls until verdict is recorded.

Out of scope (deferred follow-ups)

Phase 3: review-plan mode (after verify-plan-criteria)
Phase 4: review-code mode + PR feature-flow-metadata integration
Phase 5: stuck mode + signal collector + escape-hatch dispatcher

Quality gates passed

24 commits on the branch with two-stage subagent reviews (spec-compliance + code-quality) on every implementation task
85 test assertions across 9 test files — state (13), config (10), resolve-model (6), build-brief (6), record-exchange (5), consult (13), review-design (8), smoke e2e (14), verdict-gate (10)
4-reviewer pre-push panel (security, architecture, silent-failure, test-coverage) — all blockers fixed and re-verified:
- Security: stdin size cap, path traversal containment in loadDoc
- Architecture: setMetadata self-init bug fixed (broke first-use happy path); session-id default harmonized with consult.js CLI to prevent silent GC wipe
- Silent-failure: stderr observability on F4-F8 paths
- Test coverage: verdict-gate fail-open paths, dispatcher path-traversal, CLI shell-invocation, surrogate-pair truncation, hand-edited log
Architecture refactor: extracted state.peek() to consolidate three duplicate raw-fs reads in consult.js start

Real bugs caught and fixed during review

T1: cryptic ENOENT in mutators when load() was forgotten → readOrThrow helper
T2: YAML loader retained quotes around values (would've broken T3 model lookup) → stripValue helper
T5: 12 KB byte cap overshoot on multi-byte UTF-8 codepoint boundary → trailing \uFFFD strip
T7: start swallowed corrupt-state silently → stderr warning; error path bypassed state.load → routed through it
T8: Phase 3 echo piping broke on apostrophes/backticks in Codex response → quoted heredoc; error status undocumented → added
T10: same UTF-8 boundary bug as T5; load-bearing T7/T10 integration gap (transientState didn't forward design_doc_path) → bridged via persisted-state read
Pre-push panel: setMetadata self-init session-id used Date.now(), mismatched consult.js's cli-session default → would silently GC the metadata; harmonized

Test plan

All 85 unit/integration test assertions pass locally (for t in skills/consult-codex/scripts/*.test.js skills/consult-codex/scripts/modes/*.test.js hooks/scripts/verdict-gate.test.js; do node "$t" || exit 1; done)
End-to-end smoke test (smoke.test.js) exercises four-phase orchestration with simulated MCP result
Verdict-gate hook tested for fail-open paths, args-regex tolerance, and skill-name mismatch BLOCK
Plugin minor-bumped to 1.36.0 (additive opt-in, no breaking changes)
Rebased onto current main (resolved one CHANGELOG conflict, kept both ### Added entries — quick-path start: add quick-path triage with code-aware scope confirmation #234 and codex)
Manual live-codex smoke test (spec sub-step 12.4): completed against real mcp__codex__codex MCP server in isolated temp worktree — all four phases passed end-to-end, log/state invariants verified, Codex returned substantive design review (not rubber stamp). Result documented on #235.

Known minor follow-ups (do not block merge)

The codex example block was added to dogfooding .feature-flow.yml and skills/start/references/project-context.md, but NOT to references/project-context-schema.md (the canonical schema doc). Adding it there improves discoverability for users browsing the schema.
plugin.json version bump reformatted the previously-inline keywords array onto multiple lines (spec-prescribed JSON.stringify(j,null,2); cosmetic noise).

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Addresses code-quality review of T1: mutators now fail with a clear 'state not initialized' error if load() was never called, and tests cover both that path and the existing setVerdict unknown-id throw.

T2 review found that quoted string values (model: "gpt-5.2") were passed through to callers with embedded quotes, which would silently break T3 model resolution. Adds stripValue helper. Also renames the malformed-yaml test to reflect that it tests throw-safety (the loader is intentionally lenient and silently drops unrecognized lines), and removes a dead currentKey variable.

T5 review found that Buffer slice mid-codepoint produced a U+FFFD replacement that overran the cap by 1-2 bytes. Strip the trailing replacement char so the result stays <= MAX_BRIEF_BYTES. Tighten the truncation test to use Buffer.byteLength and add a multi-byte case. Also fix the constraint-block test count (4->5) and add the missing wrong-goal substring check.

…isolation T6 review found that briefExcerpt would crash on non-string brief inputs (caller-side guard only handled falsy values), and there was no test pinning the multi-consultation rewrite isolation invariant that the rewrite path silently relies on. Add a String() coercion guard, an isolation test, and extract the PENDING_MARKER constant so the search and slice paths share one source of truth.

…mands

…d budget-exhaustion test T7 review found that the recordResponse error path bypassed state.load (skipping recovery branches and inconsistent with the rest of the codebase) and that start silently swallowed corrupt-state errors, which would let every consultation get a 'first call free' on a broken state file. Routes the error-path read through state.load, adds stderr warnings on the two corrupt-state catches in start, drops the unused sessionId destructure, hoists fs to the top of the file, and adds the missing budget-exhaustion cross-subcommand test.

…error status T8 review found that Phase 3's echo-pipe pattern would break on apostrophes/backticks in Codex's response, the error status was undocumented for both subprocess phases, and the references files listed deferred features without a deferred-status label. Replace echo with a quoted heredoc, add the error status bullet to both phases, and label modes.md / escape-hatch.md deferred features.

T10 review found that loadDoc had the same UTF-8 boundary overrun as T5's build-brief, the 10000-char truncation assertion was unsatisfiable against the 10240-byte cap, and the new T10 buildInputs throw broke the dispatcher's start path because consult.js's transientState didn't forward design_doc_path from the persisted state file. Strip trailing U+FFFD on doc truncation, switch the assertion to byte length + add a multi-byte case, read design_doc_path into transientState (read-only, preserves the no-touch-state invariant), and update the 2 dispatcher tests to set up a design doc and metadata.

…ersion bump

…din cap, observability Panel review found two blockers and three cheap hardening wins: - setMetadata threw on missing state, breaking the design-document integration's first-use happy path. Self-initialize using env-var defaults that match the consult.js CLI wrapper. - loadDoc had no path-traversal containment, so a design_doc_path typo could silently exfiltrate ~/.aws/credentials or ~/.ssh/id_rsa via Codex. Reject paths outside the worktree. - Unbounded stdin in consult.js could OOM on a runaway 100MB Codex response. Cap at 2 MB and convert overflow to codex_call_failed. - verdict-gate's outer catch had zero observability — strict-tier enforcement could silently break. Add stderr warning. - config.js silently treated 'enabled: tru' (typo) as disabled with no feedback. Warn to stderr when the value is unparseable.

Architecture re-review caught a second-order bug in the prior fix: setMetadata's auto-init used auto-${Date.now()} when env vars were absent, but consult.js's CLI wrapper defaults to 'cli-session'. The mismatch silently triggered GC on the next state.load, wiping the design_doc_path the design-document integration just wrote. Use the same 'cli-session' default and add a regression test that proves the no-env-var integration path survives a consult.js-style load.

Closes the remaining panel follow-ups before push: - Extract state.peek helper to consolidate consult.js start's three duplicate raw-fs reads behind one read-only path with consistent corrupt-state handling. - Forward introspect error message via resolved.detail so operators see why model resolution failed instead of bare 'model_unresolvable'. - Persist mcpResult.error.detail in state.consultations[].error_detail for post-mortem audit (log render deferred — state is canonical). - Tag rewritePendingVerdict no-op branches (header / pending marker missing) with stderr warnings so log/state divergence is visible. - Tag verdict-gate's loadState and JSON.parse catches with stderr (kept readStdinSync silent — empty stdin is normal). - Tag state.js .bak rename with stderr so silent history wipes are traceable. - Widen verdict-gate args regex to tolerate leading whitespace and --id=c1 form. - Add 12 new test assertions (1 resolve-model, 3 consult, 6 verdict-gate, 1 review-design surrogate-pair, 1 record-exchange hand-edit). Total suite: 85 assertions across 9 files.

…236 #238 #239 Aligns version fields across .claude-plugin/plugin.json (already at 1.36.0 from #240), .claude-plugin/marketplace.json, and .feature-flow.yml. Cuts the [1.36.0] - 2026-04-17 release heading in CHANGELOG.md bundling the five features shipped today: - #238 Scope-critique pass for design-verification (PR #243, this PR) - #236 Advisor tool integration (PR #242) - #239 Senior developer panel, Phase 1c code review (PR #241) - #235 Codex consultation, Phase 1+2 opt-in (PR #240) - #234 Quick-path triage with code-aware scope confirmation (PR #237) Consolidates .changelogs/238.md fragment into CHANGELOG.md (per the Ship phase convention) and deletes the fragment. Updates README with a "What's New in 1.36.0" section listing all five PRs, adds a Codex Consultation subsection under Optional Enhancements, adds a Quick-path triage subsection, and updates the Skills table to include consult-codex and the scope-critique pass note for design-verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add implementation plan for scope-critique-pass (#238) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design-verification): add scope-critique pass (#238) Adds a ## Scope critique section to skills/design-verification/SKILL.md and a supporting reference file with five strategic-shape questions (scope, dependencies, simpler alternatives, observability, config surface) that run after the existing compatibility check to catch oversized scope, phantom dependencies, and unobservable capability bets before a design reaches create-issue. Motivated by #236 post-creation review: 6 strategic issues bypassed existing brainstorming, design-verification, and self-review gates. The new pass fills that gap with a bias-resistance checklist, red-flag lists per question, and a findings table that promotes BLOCKER rows to the top-level Blockers list. Closes #238 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add changelog fragment for #238 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 1.36.0 and consolidate changelog for #234 #235 #236 #238 #239 Aligns version fields across .claude-plugin/plugin.json (already at 1.36.0 from #240), .claude-plugin/marketplace.json, and .feature-flow.yml. Cuts the [1.36.0] - 2026-04-17 release heading in CHANGELOG.md bundling the five features shipped today: - #238 Scope-critique pass for design-verification (PR #243, this PR) - #236 Advisor tool integration (PR #242) - #239 Senior developer panel, Phase 1c code review (PR #241) - #235 Codex consultation, Phase 1+2 opt-in (PR #240) - #234 Quick-path triage with code-aware scope confirmation (PR #237) Consolidates .changelogs/238.md fragment into CHANGELOG.md (per the Ship phase convention) and deletes the fragment. Updates README with a "What's New in 1.36.0" section listing all five PRs, adds a Codex Consultation subsection under Optional Enhancements, adds a Quick-path triage subsection, and updates the Skills table to include consult-codex and the scope-critique pass note for design-verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

uta2000 and others added 24 commits April 17, 2026 13:04

feat: add consult-codex state module with atomic writes and GC

fe8c5fc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: harden state.js mutators with readOrThrow + add error-path tests

9cb1f8c

Addresses code-quality review of T1: mutators now fail with a clear 'state not initialized' error if load() was never called, and tests cover both that path and the existing setVerdict unknown-id throw.

chore: drop unused 's' binding in setMetadata test

8065029

feat: add codex config loader with regex-parsed YAML subset

b38532f

feat: add codex model fallback chain (config → introspection → null)

0960450

feat: add consult-codex brief skeleton builder with 12 KB cap

14b0b43

feat: add record-exchange for state + codex-log.md append

5ae006d

chore: drop unused c1 binding in isolation test

1d91de4

feat: add consult-codex CLI with start/record-response/verdict subcom…

afe09af

…mands

docs: add consult-codex SKILL.md with tiered enforcement guidance

bca7e41

feat: add verdict-gate PreToolUse hook for strict codex consultations

c1d4c4a

feat: implement review-design mode brief builder with doc truncation

b4cea80

feat: integrate consult-codex review-design into design-document skill

a742e77

test: add end-to-end smoke test for review-design consultation

d9cf255

feat: ship codex consultation Phase 1+2 (opt-in) with CHANGELOG and v…

d9c3c0e

…ersion bump

uta2000 mentioned this pull request Apr 17, 2026

codex: consult codex MCP at 4 lifecycle checkpoints #235

Closed

uta2000 merged commit 2c289be into main Apr 17, 2026

uta2000 mentioned this pull request Apr 17, 2026

advisor: tiered stuck mode + soft checkpoints + settings hint #236

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ship feature-flow:consult-codex Phase 1+2 (opt-in) (#235)#240

feat: ship feature-flow:consult-codex Phase 1+2 (opt-in) (#235)#240
uta2000 merged 24 commits intomainfrom
feat/codex-consultation

uta2000 commented Apr 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

uta2000 commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What ships in Phase 1+2

Architecture (load-bearing)

Tiered verdict enforcement

Out of scope (deferred follow-ups)

Quality gates passed

Real bugs caught and fixed during review

Test plan

Known minor follow-ups (do not block merge)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

uta2000 commented Apr 17, 2026 •

edited

Loading