Skip to content

feat(agent): codex adapter stage 2 with documented scope expansion#154

Merged
winoooops merged 23 commits into
mainfrom
feat/codex-adapter-stage-2-impl
May 4, 2026
Merged

feat(agent): codex adapter stage 2 with documented scope expansion#154
winoooops merged 23 commits into
mainfrom
feat/codex-adapter-stage-2-impl

Conversation

@winoooops
Copy link
Copy Markdown
Owner

Summary

  • Codex Adapter Stage 2 lands behind the existing AgentAdapter trait so a PTY running codex populates the same status panel as a Claude session — model, context window (driven by last_token_usage), 5h/7d rate limits, durations. SQLite-primary session locator (schema-driven logs / threads discovery, named-placeholder tuple comparison on (ts, ts_nanos) >= pty_start) with FS-scan fallback. New rusqlite (bundled). cost.total_cost_usd: f64 → Option<f64> with frontend override + BudgetMetrics null rendering. ManagedSession.started_at + BindContext + bounded start_for retry on BindError::Pending. Six rollout JSONL fixtures pin the spec's locked parser rules.
  • Three deviations from the spec are explicitly ratified in docs/decisions/2026-05-04-codex-adapter-stage-2-scope-expansion.md: the codex transcript tailer landed in v1 (originally Non-Goal chore: reset harness files to placeholders #2; reuses claude_code/test_runners/* to emit AgentToolCallEvent / AgentTurnEvent / test-run signals); /proc/<pid>/fd/* and /proc/<pid>/cmdline contribute Linux fast-paths when the SQLite logs query returns no rows (every fd candidate cross-checked against threads.rollout_path); BindContext.pid carries the detected agent PID, not the shell PID, because Codex's logs.process_uuid indexes by the codex child PID.
  • Spec, plan, CHANGELOG, and ADR index all updated so the docs match the code; the spec's frontmatter status is now "Implemented (with documented scope expansion)" and the four points the implementation diverged are amended in place with cross-references to the ADR.

Documents

Followups tracked in the new ADR's "Known risks & mitigations"

Not blocking this PR; logged so they aren't lost:

  • Schema-drift dispatch in CompositeLocator::resolve_rollout matches on substring "schema drift" in the Unresolved reason field. Fragile to refactors. Refactor to typed LocatorError::SchemaDrift variant.
  • ValidateTranscriptError::OutsideRoot Display still says "outside Claude directory" even when triggered by codex's validate_transcript_path. Cosmetic; rename to "outside agent root".
  • Mutex<Option<PathBuf>> lifecycle invariant in CodexAdapter (status_source writes, parse_status reads). Holds by construction because each attach gets its own Arc<CodexAdapter> via for_type; document in the struct's doc-comment.

Test plan

  • cd src-tauri && cargo test --workspace --all-features passes locally.
  • npm run test passes locally (vitest).
  • npm run lint and npm run type-check clean.
  • npm run tauri:dev — fresh codex session populates the status panel (model gpt-5.x, context window, 5h/7d rate limits, cost row shows ).
  • codex resume --last from a fresh terminal pane populates the panel within ~1s from rolled-up history.
  • Activity feed fires for codex tool calls (function_call / function_call_output / custom_tool_call*).
  • Test runs surface from codex exec_command_end / patch_apply_end.
  • Claude session in another terminal pane unaffected; cost row shows \$x.yz with no regression.
  • Rapid tab switching between agent sessions doesn't trigger overlapping start_agent_watcher calls (covered by the new useAgentStatus watcherStartInFlightRef guard).

🤖 Generated with Claude Code

winoooops and others added 18 commits May 3, 2026 23:13
Design for CodexAdapter implementation against the AgentAdapter trait
landed in Stage 1. Covers the SQLite-first session locator (with
schema-driven DB discovery + FS-scan fallback), the rollout JSONL
parser shape, the IPC contract bump for cost.total_cost_usd:
Option<f64>, and deferred follow-up criteria for cross-adapter parser
helper extraction per the 2026-05-03 ADR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four targeted fixes from the review:

1. context_window mapping switched from total_token_usage (lifetime
   spend, would pin gauge at 100% on long sessions) to last_token_usage
   so the "CURRENT CONTEXT" gauge reflects actual context size. Added
   task_started.model_context_window fallback for the early-session
   window before the first token_count.info arrives. Pinned by a new
   long-session regression test + fixture.

2. LocatorError::Fatal corrected so missing-schema is Ok(None) routing
   to FS fallback, not an immediate Fatal. Added a "Fatal precedence"
   subsection. Fatal now reserved for SQLite + FS fallback both
   exhausted permanently.

3. Explicit partial-update rule for token_count events: payload.info =
   null preserves prior context state and only absorbs rate_limits;
   payload.rate_limits = null does the inverse. Pinned by a new
   info-null fixture and unit test.

4. total_api_duration_ms emits 0 (was: aliased to total_duration_ms).
   Aliasing would print a wrong "API Time" cell in BudgetMetrics. A
   future Option<u64> bump is tracked as a follow-up so the UI can
   render "—" instead of "0ms".

Plus a layered-test split for the locator: locator-internal asserts
LocatorError variants; adapter-level asserts the LocatorError →
BindError mapping; orchestration-level (start_for retry) was already
correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from cycle-2 review:

1. HIGH — logs.ts is verified Unix seconds with subsecond precision
   in ts_nanos, not milliseconds. Earlier draft annotated the gate as
   "ms-since-epoch" which would always return zero rows. Updated the
   SQL sketch to bind (pty_start_secs, pty_start_nanos) and use a
   tuple comparison matching the existing idx_logs_ts index ordering.

2. MEDIUM — Reconciled the FS-fallback exhaustion rule. Zero matches
   on the FS scan is unconditionally NotYetReady (transient — the
   rollout file isn't on disk yet); Fatal in the FS path is only
   I/O errors (permission denied, unreadable file). New "Fatal
   precedence" decision tree spells out which empty result maps to
   which LocatorError variant. Test bullets updated to match the
   single locked rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier sketch mixed an anonymous ? for the PID predicate with
explicit ?1/?2 for the (ts, ts_nanos) tuple. In SQLite the anonymous
? aliases to the next sequential numbered slot starting at 1, so the
PID predicate and pty_start_secs would share slot 1 — a literal
implementation would bind the timestamp value into the PID filter and
misbind the attach path.

Switched to named placeholders (:pid, :pty_start_secs,
:pty_start_nanos) which rusqlite supports and the spec reads more
clearly. Added an explicit warning against mixing anonymous and
numbered placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier the primary bind path step 4 said missing threads row →
LocatorError::Unresolved, but the Fatal-precedence decision tree
said empty SQLite → NotYetReady. The two branches map to opposite
frontend behaviors: Unresolved → BindError::Fatal short-circuits
the start_for retry; NotYetReady → BindError::Pending stays in the
retry budget.

Codex commits the logs row before the corresponding threads row
during session bootstrap, so the gap between them is a race-
transient. Locking the rule: zero rows on either SQLite query is
NotYetReady, period. Unresolved is reserved for the FS fallback's
multi-candidate ambiguity case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier paragraph contradicted itself: said discovery was "cached for
the lifetime of the adapter instance" and then "v1 just resolves
once per status_source call". Those are different designs.

Locked: memoize via OnceLock on the CodexAdapter struct, populated
on first status_source call, reused inside start_for's retry loop.
Each attach constructs a fresh Arc<CodexAdapter> via for_type, so
the cache scope is one attach — discovery re-runs across attaches.
Trade-off accepted because the per-attach scan is a few ms and the
local-only cache scope avoids any shared-state staleness questions.
A process-global cache is a profiling-driven follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17 tasks executing the spec at
docs/superpowers/specs/2026-05-03-codex-adapter-stage-2-design.md.

Phases:
- A (foundation, Tasks 1-7): rusqlite dep, ManagedSession.started_at,
  CostMetrics IPC bump (Rust + frontend), BindContext/BindError, trait
  signature change, start_for retry loop.
- B (codex module, Tasks 8-15): module skeleton, parser (happy path
  + 6 fixtures covering edge cases), CodexSessionLocator (DB
  discovery, SqliteFirstLocator, FsScanFallback, CompositeLocator),
  CodexAdapter impl with locator memoized via OnceLock, wire into
  for_type.
- C (verification, Tasks 16-17): hook regression test, manual
  end-to-end sign-off (fresh codex, codex resume, Claude regression,
  cold-start tolerance).

Each task is TDD-shaped: failing test → run → minimal impl → run
→ commit. Six rollout JSONL fixtures pin the spec's locked rules
(last_token_usage not lifetime, info-null partial update,
incomplete trailing line, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from cycle-1 review of the implementation plan:

1. HIGH — Task 6 introduced the new adapter.start(..., pid, pty_start)
   call shape but deferred the matching base::start_for signature
   change to Task 7, leaving Task 6 in a guaranteed compile-broken
   state. Restructured: Task 6 now updates trait + Claude/NoOp impls
   + start_agent_watcher + the inherent start AND start_for's
   signature. start_for's body just calls status_source once
   (no retry); Task 7 keeps the signature stable and only adds the
   retry loop body. Both tasks end in a green workspace.

2. MEDIUM — Task 4's failing tests passed `rateLimits` and asserted
   on a Cost cell, but BudgetMetrics' SubscriberVariant (rendered
   when rateLimits is present, applies to Codex AND Claude
   subscribers) has no Cost cell. Only ApiKeyVariant renders cost.
   Reworked the tests to route to ApiKeyVariant (rateLimits: null)
   and added an explicit SubscriberVariant test pinning the
   no-Cost-cell behavior. Step 5 narrowed to a single-line
   formatCost signature change in BudgetMetrics.tsx instead of an
   imagined cross-variant edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add ManagedSession.started_at, expose PtyState::get_started_at, and
thread the timestamp through PTY/session test construction. Also add
rusqlite (bundled) for the upcoming codex SQLite locator.
Rust now models CostMetrics.total_cost_usd as Option<f64>, Claude's
status parser distinguishes missing-vs-present cost blocks, bindings
are regenerated, and the frontend preserves and renders null cost
without coercing it to zero.
Add BindContext/BindError, make AgentAdapter::status_source fallible,
and thread pid plus pty_start through start_agent_watcher and
base::start_for so codex can bind by session metadata instead of a
pure cwd-derived path.
base::start_for now retries BindError::Pending within a 500ms
budget and includes focused tests for pending-then-success and
budget exhaustion. This keeps watcher startup below the frontend's
2s detection poll while tolerating delayed codex metadata commits.
Three deviations from the original Stage 2 spec landed during
implementation: the codex transcript tailer (specced as Non-Goal #2),
/proc-driven Linux fast-paths in the locator (specced as
verifier-only), and BindContext.pid carrying the detected agent PID
rather than the shell PID. Implementation has been manually verified
end-to-end and the deviations are correctness-driven, not stylistic.

This commit ratifies the deviations rather than reverting them:

- New ADR docs/decisions/2026-05-04-codex-adapter-stage-2-scope-
  expansion.md formalizes the scope changes (context, decision,
  alternatives rejected, risks, mitigations).
- docs/superpowers/specs/2026-05-03-codex-adapter-stage-2-design.md
  amended at the four points the implementation diverged: Non-Goal
  #2 (transcript tailer), /proc rule under "Sources to ignore",
  "Optional Linux verifier" section renamed to "Linux fast-paths"
  with the three layered strategies described, and the
  start_agent_watcher BindContext build documented as using the
  detected agent PID. Status moved to "Implemented (with documented
  scope expansion)".
- docs/superpowers/plans/2026-05-04-codex-adapter-stage-2.md notes
  added at Tasks 5 (BindContext.pid semantics), 8 (transcript was
  not stubbed), and 14 (CodexAdapter Mutex<Option<PathBuf>> field
  for transcript-path threading).
- CHANGELOG.md and CHANGELOG.zh-CN.md gain a Phase 4 entry citing
  the spec, plan, ADR, and the scope expansion. ADR index updated.

Out of scope of this commit: the schema-drift string-match dispatch
in CompositeLocator, the "outside Claude directory" Display message
in ValidateTranscriptError::OutsideRoot. Both tracked as follow-up
nits in the new ADR's "Known risks & mitigations" section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5790f9336e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/features/agent-status/hooks/useAgentStatus.ts Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟠 [HIGH] Blocking std::thread::sleep on Tokio async executor thread

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/base/mod.rs L81-90
🎯 Confidence: 95%

resolve_status_source_with_retry in base/mod.rs calls std::thread::sleep up to 5 times (100ms each = 500ms max). It is called synchronously — without spawn_blocking — from the async fn start_agent_watcher Tauri command. Tauri async commands run on the Tokio multi-thread executor; blocking a Tokio thread with a synchronous sleep stalls that thread for the full retry budget, reducing the pool's capacity for concurrent operations. Under rapid tab-switching with several Codex sessions starting simultaneously, multiple threads can be stuck sleeping at once, degrading UI responsiveness and potentially queuing other async commands. Fix: wrap resolve_status_source_with_retry in tokio::task::spawn_blocking, or refactor start_for to accept an async sleep so the caller can use tokio::time::sleep().await.

💡 IDEA
  • I — Intent: Retry the bind call a handful of times before the watcher starts, giving Codex a moment to write its first DB row without making the frontend wait.
  • D — Danger: Tokio's async thread pool is finite. 500ms of synchronous sleep on a thread executing an async command starves that thread. If N sessions start simultaneously, N threads are blocked for up to 500ms each, stalling Tauri IPC including event delivery.
  • E — Explain: The retry logic was written synchronously because status_source and start_for are sync functions. Bridging into async would require changing the AgentAdapter trait signature, a larger refactor the author deferred to keep the diff scoped.
  • A — Alternatives: Wrap the entire resolve_status_source_with_retry + start_for body in tokio::task::spawn_blocking from inside start_agent_watcher. This is a one-line fix at the call site that keeps the inner code synchronous. The cleaner long-term fix is making start_for async and using tokio::time::sleep().await.

🟡 [MEDIUM] Schema drift fallback dispatch by fragile string-contains match

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L603-613
🎯 Confidence: 93%

CompositeLocator::resolve_rollout gates its FsScanFallback path on reason.contains("schema drift"). Both the threads-missing and logs-missing cases in SqliteFirstLocator produce Unresolved with that prefix. Any future rename of those strings silently breaks the dispatch: the Unresolved case no longer falls through to the FS scan and instead surfaces as a fatal BindError. The PR's own ADR already tracks this as a known risk; this finding flags it as a correctness risk rather than cosmetic. Fix: add a LocatorError::SchemaDrift variant and match on it.

💡 IDEA
  • I — Intent: Degrade gracefully when the SQLite schema doesn't match expectations: fall back to the FS scan rather than returning a hard error.
  • D — Danger: Any refactor that renames the schema-drift reason strings (extracting to a constant, rewording, adding a prefix) silently disables the fallback with no compiler error or test failure, surfacing a Fatal error to users instead of attempting the FS scan.
  • E — Explain: The Unresolved variant already carried a reason string for logging; reusing it for dispatch avoided adding a new enum arm. The author was aware of the fragility and deferred the fix to a followup.
  • A — Alternatives: Add LocatorError::SchemaDrift (or Unresolved { schema_drift: bool, reason: String }) and match on the typed variant. The change is contained to locator.rs with no external API impact since LocatorError is pub(super).

🔵 [LOW] cfg_attr(test, ts(optional)) produces divergent binding shapes

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/types.rs L94-95
🎯 Confidence: 82%

#[cfg_attr(test, ts(optional))] on total_cost_usd means ts-rs only applies optional during cargo test. Test-build binding: totalCostUsd?: number; non-test build: totalCostUsd: number | null. The committed src/bindings/CostMetrics.ts matches the test shape. If a non-test binding-regen step is ever added to CI or a future feature imports the binding directly (the current code removed that import and uses a local override), it would receive the wrong shape silently. Fix: use #[ts(optional)] unconditionally; since the parent derive(TS) is also behind cfg_attr(test, ...), it has no effect outside test builds anyway.

💡 IDEA
  • I — Intent: Generate totalCostUsd?: number in the TypeScript binding to match the local override in types/index.ts rather than the ts-rs default number | null for Option<T>.
  • D — Danger: Any non-test binding regeneration produces a different file than what is committed. Downstream consumers relying on === undefined guards would silently pass null through.
  • E — Explain: The project gates all derive(TS) behind cfg_attr(test, ...), so binding generation happens only during cargo test. Within that convention the attribute works. The wrapper was added to match the committed binding shape.
  • A — Alternatives: Drop the cfg_attr(test, ...) wrapper and write #[ts(optional)] directly. Since derive(TS) itself is test-gated, the attribute is a no-op in non-test builds, making the unconditional form strictly clearer.

Overall: ⚠️ patch has issues (confidence: 90%)

The PR lands a substantial Codex adapter (SQLite locator, JSONL parser, transcript tailer) with solid test coverage and correctly implements Option<f64> cost propagation across the Rust/TypeScript boundary. The core logic — session binding via SQLite then /proc fast-paths then FS fallback, the watcher-dedup guards in useAgentStatus, and the null cost rendering — is functionally correct. The one blocking concern is architectural: resolve_status_source_with_retry performs up to 500ms of synchronous thread::sleep inside an async fn Tauri command without a spawn_blocking wrapper, which starves Tokio executor threads during concurrent session starts. Two non-blocking issues are also noted: the schema-drift fallback dispatch relying on a string-contains check (explicitly acknowledged in the ADR), and a #[cfg_attr(test, ts(optional))] pattern that generates different binding shapes between test and non-test builds.

The previous Stage 2 zh-CN entry wrapped a continuation line such
that "- FS 扫描回退" landed at column 2, which prettier interpreted as
an unintended sub-list item, mangling the bullet structure on the
following lines. Local lint-staged prettier-write didn't catch it
because of how the line wrapped before staging, but
`prettier --check .` on CI flagged it.

Rewrote the Stage 2 entry as a single flowing paragraph (no internal
line breaks within the parent bullet) plus two clean sub-bullets,
matching the English entry's logical structure. No content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟡 [MEDIUM] Blocking sleep in async start_agent_watcher Tauri command

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/base/mod.rs L63-109
🎯 Confidence: 92%

resolve_status_source_with_retry calls std::thread::sleep(100 ms) up to 5 times inside the synchronous adapter.start(...) call made from the async start_agent_watcher Tauri command. This blocks the tokio worker thread for up to 500 ms. The git-watcher command in the same codebase correctly uses tokio::task::spawn_blocking for its blocking inner function. Fix: wrap adapter.start(...) in tokio::task::spawn_blocking inside start_agent_watcher.

💡 IDEA
  • I — Intent: Author wanted a bounded retry window so the Codex SQLite locator has time to see the first log row before returning an error to the frontend.
  • D — Danger: Calling std::thread::sleep on a tokio worker thread starves other futures scheduled on that thread. With the 5×100 ms worst case, start_agent_watcher can hold a tokio thread for 500 ms. The git-watcher command in the same codebase already shows the correct pattern (tokio::task::spawn_blocking); the inconsistency creates a trap for future contributors.
  • E — Explain: The retry was written as a straightforward blocking loop because start_for is already synchronous. Switching to async would require threading async through the AgentAdapter trait, so std::thread::sleep was the path of least resistance.
  • A — Alternatives: Wrap adapter.start(...) in tokio::task::spawn_blocking(|| adapter.start(...)) — the pattern the git-watcher already uses at src-tauri/src/git/watcher.rs:399. Alternatively, expose an async variant that uses tokio::time::sleep, keeping the synchronous start_for for non-async call sites.

🟡 [MEDIUM] cfg_attr(test, ts(optional)) generates ?: number, not number | null

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/types.rs L93-96
🎯 Confidence: 88%

The #[cfg_attr(test, ts(optional))] attribute is only active during test-mode compilation. When ts-rs generates CostMetrics.ts by running tests, it emits totalCostUsd?: number (may be undefined). But serde always serializes Option<f64>::None as JSON null, not as an absent field — so at runtime consumers receive null where the TypeScript type says number | undefined. Code that type-narrows with === null against a ?: number field is flagged by TypeScript as a dead branch. The hook-level ?? null and the local CostMetrics override in types/index.ts work around it, but any code importing CostMetrics from bindings/CostMetrics.ts directly inherits the wrong type. Fix: either drop ts(optional) entirely (ts-rs then generates the correct number | null) or add #[serde(skip_serializing_if = "Option::is_none")] alongside unconditional #[ts(optional)] so the wire format and type both say "possibly absent".

💡 IDEA
  • I — Intent: Author wanted totalCostUsd optional in TypeScript so Codex cost-free sessions render an em-dash without changing the serde wire format.
  • D — Danger: TypeScript consumers that import CostMetrics from bindings/CostMetrics.ts see ?: number (undefined-or-number) but receive null on the wire. A strict-mode check if (cost.totalCostUsd === null) is typed as dead code, while === undefined is typed correctly but is never true at runtime. If the ts-rs generation step is ever run outside test mode, the regenerated file contradicts the committed version.
  • E — Explain: The cfg_attr(test, ...) pattern for ts-rs is a known workaround because ts-rs generation tests run under #[cfg(test)]. The author used it to produce an optional binding without restructuring serde serialization.
  • A — Alternatives: Drop ts(optional) — ts-rs generates totalCostUsd: number | null, the correct type that matches what serde sends. The hook's ?? null stays valid. Alternatively add #[serde(skip_serializing_if = "Option::is_none")] + unconditional #[ts(optional)] so both wire and type agree the field may be absent.

🔵 [LOW] Dead-code WAL/SHM guard conditions in discover_db filter

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L57-63
🎯 Confidence: 95%

The conditions || name.ends_with(".sqlite-wal") || name.ends_with(".sqlite-shm") are unreachable. A filename ending in .sqlite-wal or .sqlite-shm does not end in .sqlite, so !name.ends_with(".sqlite") is already true and short-circuits before the trailing OR-branches are evaluated. Remove the two redundant clauses.

💡 IDEA
  • I — Intent: Author wanted to document that WAL and SHM sidecar files are excluded, mirroring some SQLite library examples that list sidecar extensions explicitly.
  • D — Danger: No runtime impact — the behaviour is correct. The risk is maintenance: a refactor that restructures the condition may accidentally enable one of the dead branches without noticing it was already unreachable.
  • E — Explain: The conditions were probably written from a mental model of 'also skip these extensions' without noticing that !ends_with(".sqlite") already subsumes them, since .sqlite-wal does not have .sqlite as a suffix.
  • A — Alternatives: Delete the two trailing OR-clauses. The filter becomes a single !name.ends_with(".sqlite") guard which is self-explanatory and avoids the dead-code confusion.

Overall: ⚠️ patch has issues (confidence: 88%)

The Codex adapter stage 2 is a substantial, well-structured addition. The SQLite-primary locator with /proc fast-paths, the transcript tailer reusing claude_code test-runner infrastructure, the bounded retry on BindError::Pending, and the frontend null-cost handling all work correctly end-to-end. The six JSONL fixtures and exhaustive unit tests give strong confidence in the parser and locator logic. Two MEDIUM issues warrant attention before merge: (1) the retry loop in resolve_status_source_with_retry calls std::thread::sleep directly on a tokio executor thread inside the async start_agent_watcher command — the git-watcher command in the same repo uses tokio::task::spawn_blocking as the correct pattern; (2) the cfg_attr(test, ts(optional)) annotation generates a ?: number TypeScript binding that doesn't include null in its union, creating a type mismatch with what serde actually serializes (null for None). The hook-layer normalization (?? null) and the local CostMetrics override in types/index.ts paper over it at runtime, but direct consumers of bindings/CostMetrics.ts see the wrong type. One LOW issue: two unreachable conditions in the discover_db WAL/SHM filter.

Four findings from Claude Code Review and chatgpt-codex-connector:

F1 [Claude MEDIUM] start_agent_watcher blocked the tokio worker via
std::thread::sleep in the synchronous retry chain inside adapter.start.
Wrap adapter.start(...) in tokio::task::spawn_blocking inside the async
Tauri command, mirroring src/git/watcher.rs:399.

F2 [Claude MEDIUM] CostMetrics.total_cost_usd carried
cfg_attr(test, ts(optional)) which made ts-rs emit `totalCostUsd?: number`
even though serde always serializes None as JSON null. Drop the attribute
so the binding emits `number | null`, matching what serde sends.
Regenerated src/bindings/CostMetrics.ts.

F3 [Claude LOW] Removed dead-code WAL/SHM OR-clauses in
discover_db's filename suffix filter. `name.ends_with(".sqlite-wal")`
and `name.ends_with(".sqlite-shm")` were unreachable because
`!name.ends_with(".sqlite")` already excludes them.

F4 [Codex P1] When start_agent_watcher resolved AFTER a newer same-sid
start had already registered a backend watcher, the stale-guard branch
unconditionally called stopWatchers(sid, ptySessionId). Backend stop
keyed only by session ID would tear down the newer generation's still-
active watcher. Refined gate: skip stop ONLY when
(currentSessionIdRef.current === sid) ∧ (gen mismatch) ∧
watcherStartedRef.current — three conjuncts identify the codex-flagged
"newer same-sid registrant succeeded" case and only that case. On every
other bail path (sid switch, unmount, agent exit, our own bumped-gen
with no successor) the stop must still fire — sid-switch and unmount
cleanups can't help when knownPtyIdRef is still unset, so the stale-
resolution branch is the only place that has the captured ptySessionId.
Added regression test pinning the IF-branch.

Codex-Verify: pass_with_deferred_LOW (cycle-1 retry-3)
Deferred: regression test for F4 uses a simplified mock that does not
fully drive the exit/re-detect sequence; it pins the IF-branch but a
follow-up test should drive the full exit/re-detect race per codex's
note. Tracked as test-quality follow-up.

Pattern-Append-Decisions:

- F1 → patterns/tokio-blocking-on-async.md (NEW pattern, category: backend)
- F2 → patterns/generated-artifacts.md (existing, theme: ts-rs binding shape)
- F4 → patterns/async-race-conditions.md (existing, theme: stale async resolution)

Pattern-Files-Touched:
 docs/reviews/patterns/async-race-conditions.md
 docs/reviews/patterns/generated-artifacts.md
 docs/reviews/patterns/tokio-blocking-on-async.md
 docs/reviews/CLAUDE.md

Processed-Claude-Comment-IDs: 4370640003
Processed-Codex-Review-IDs: 4219498038
Processed-Codex-Inline-IDs: 3181102251
Closes-Codex-Threads: PRRT_kwDOR06LW85_UtUP

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟡 [MEDIUM] CompositeLocator schema-drift dispatch uses fragile string matching

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L603-613
🎯 Confidence: 92%

CompositeLocator::resolve_rollout (locator.rs:607) routes to the FS-scan fallback only when the reason string contains("schema drift"). The two emitters of that string are lines 351 and 364. If either is renamed or reformatted, the guard silently falls through to Err(other) instead of delegating to FsScanFallback, giving every affected user a permanent BindError::Fatal instead of graceful degradation. The PR's ADR already identifies this and proposes a typed LocatorError::SchemaDrift variant — that refactor should be tracked with an issue number so it isn't dropped.

💡 IDEA
  • I — Intent: Route to the FS fallback only on schema-drift (no threads/logs table), while propagating all other Unresolved errors as Fatal so callers don't retry genuinely ambiguous multi-candidate results.
  • D — Danger: Any future edit that renames the error strings at lines 351/364 silently disables the FS fallback. The failure mode is a BindError::Fatal with no indication that a working fallback exists. The existing test at line 1044 would also break, but only incidentally — it's not a reliable guard.
  • E — Explain: Distinguishing schema-drift Unresolved from ambiguity Unresolved required no enum change. A typed variant was deferred because the PR was already large and the current strings are stable.
  • A — Alternatives: Add LocatorError::SchemaDrift as the ADR proposes and match on it directly: Err(LocatorError::SchemaDrift) => self.fallback.resolve_rollout(ctx). This makes the dispatch visible to the type system and lets rustc flag missed patterns on future enum additions.

🔵 [LOW] now_iso8601/days_to_date duplicate UTC formatting already in chrono dep

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/transcript.rs L724-757
🎯 Confidence: 88%

transcript.rs defines a 35-line pair of functions (now_iso8601 + days_to_date, lines 724–757) to manually compute a UTC ISO-8601 timestamp. chrono is already in Cargo.toml and imported by the sibling locator.rs. Replace both with chrono::Utc::now().format("%Y-%m-%dT%H:%M:%SZ").to_string(). The custom Gregorian algorithm (days_to_date) has no direct unit tests.

💡 IDEA
  • I — Intent: Provide a last-resort UTC timestamp fallback in extract_timestamp when a JSONL event lacks a timestamp field.
  • D — Danger: The days_to_date algorithm is correct for modern dates but is untested in isolation. A subtle off-by-one would silently produce wrong timestamps in the activity feed. More practically, it is unmaintained complexity that every reader must verify by hand.
  • E — Explain: The author likely wanted to avoid adding a new import to this file, or wasn't aware chrono was already in scope. The algorithm is the standard Proleptic Gregorian implementation by Howard Hinnant, but that attribution is absent.
  • A — Alternatives: use chrono::Utc; Utc::now().format("%Y-%m-%dT%H:%M:%SZ").to_string() — one line, zero new deps. If chrono is ever removed, humantime::format_rfc3339 (already in the workspace) or time::OffsetDateTime are drop-in alternatives.

Overall: ⚠️ patch has issues (confidence: 85%)

The Codex adapter stage 2 implementation is functionally sound: SQLite-first session binding, /proc fast-paths, FS-scan fallback, rollout JSONL parser, transcript tailer, Option cost field, and the watcherStartInFlightRef/generation guard in useAgentStatus are all correctly implemented and well-tested. Two maintainability issues stand out. First, CompositeLocator dispatches to the FS fallback by matching the substring "schema drift" inside a plain string, creating a silent breakage risk if either of the two emitter sites is ever edited — the PR's own ADR flags this but defers the typed-variant refactor without a tracking issue. Second, transcript.rs reimplements UTC date formatting with a bespoke 35-line Gregorian algorithm despite chrono already being a workspace dependency. No security vulnerabilities, logic errors, or Tauri IPC boundary violations were found. The finish_replay concern is moot — TestRunEmitter is explicitly tested as idempotent.

Comment thread src-tauri/src/agent/adapter/claude_code/test_runners/matcher.rs Outdated
Comment thread src-tauri/src/agent/adapter/claude_code/statusline.rs
Comment thread src-tauri/src/git/mod.rs Outdated
Comment thread src/features/agent-status/components/BudgetMetrics.test.tsx
Comment thread src/features/agent-status/hooks/useAgentStatus.ts Outdated
Five new comments from @winoooops focused on scope discipline:

H1 [matcher.rs:8] "we should try to eliminate the changes to the
code base with just reformatting or line changes ... See also for
other files within this PR, if it's only change of position in
importing and stuff, plz kindly remove them" → ADDRESSED. Reverted
17 pure-rustfmt drive-by files to origin/main:

  src-tauri/src/agent/adapter/base/watcher_runtime.rs
  src-tauri/src/agent/adapter/claude_code/test_runners/build.rs
  src-tauri/src/agent/adapter/claude_code/test_runners/cargo.rs
  src-tauri/src/agent/adapter/claude_code/test_runners/emitter.rs
  src-tauri/src/agent/adapter/claude_code/test_runners/matcher.rs
  src-tauri/src/agent/adapter/claude_code/test_runners/vitest.rs
  src-tauri/src/agent/detector.rs
  src-tauri/src/agent/mod.rs
  src-tauri/src/git/mod.rs
  src-tauri/src/git/watcher.rs
  src-tauri/src/lib.rs
  src-tauri/src/terminal/cache.rs
  src-tauri/tests/transcript_cargo_e2e.rs
  src-tauri/tests/transcript_turns.rs
  src-tauri/tests/transcript_vitest_e2e.rs
  src-tauri/tests/transcript_vitest_replay.rs

For terminal/commands.rs (mixed: had real Stage 2 work + drive-by
formatting), surgically reverted to main and re-applied only the
two `started_at: SystemTime::now()` fields required by Task 2
(ManagedSession.started_at).

H3 [git/mod.rs:784] "I see no point in changing the testings here
I didn't see the scope of this PR to be consist of the git diff
features" → ADDRESSED. The git/mod.rs and git/watcher.rs diff was
pure rustfmt reflows, no semantic changes. Reverted (covered by H1).

H5 [useAgentStatus.ts:248] "if you feel the code requirement
comments like this long, maybe the code itself is hard to
understand and makes it hard to maintain ... Treat this like a
red flag" → ADDRESSED. Replaced the 26-line stale-stop justification
with a 5-line summary and renamed the variable from
`newerSameSidSucceeded` to `newerSameSidWatcherIsActive` so the
predicate is self-documenting. The full per-bail-path reasoning lives
in the patterns/async-race-conditions.md #34 entry where it can be
referenced rather than embedded.

H2 [statusline.rs:602] "why change claude code adaptor testing in
this PR?" → REPLY-ONLY (not a revert). The statusline.rs diff is
required by F2 from the round-1 review: `CostMetrics.total_cost_usd`
changed from `f64` to `Option<f64>`, so the Claude parser had to be
updated too (defaults branch returns `None`, present branch wraps
in `Some(_)`) and existing tests had to assert against the `Option`
shape rather than the bare `f64`. Not Stage-2 scope creep; an
unavoidable side-effect of the IPC contract bump.

H4 [BudgetMetrics.test.tsx:103] "This is a legit addition to the
testing since codex will have no totalCost like Claude, I will call
this a necessary testing update" → REPLY-ONLY (positive ack).

Codex-Verify: skipped (revert-only diff against round-1 codex-
verified work; no new findings introduced).

Processed-Human-Inline-Comment-IDs: 3181495520 3181502600 3181518695 3181527310 3181541747

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟡 [MEDIUM] CompositeLocator dispatches to FS fallback via fragile string match

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L604-612
🎯 Confidence: 95%

CompositeLocator::resolve_rollout (line 607) routes to the FS fallback by checking reason.contains("schema drift") on the Unresolved variant's message string. The two error messages that trigger this path are "schema drift: threads table not found" (locator.rs:351) and "schema drift: logs table not found" (locator.rs:364). If either string is reworded, the guard silently stops matching: CompositeLocator falls through to Err(other) and returns the raw Unresolved, which CodexAdapter::status_source maps to BindError::Fatal. That causes an immediate bind failure with no FS-fallback attempt — users whose Codex instance has no SQLite database yet (or has a different schema) see the status panel fail immediately instead of binding via the FS scan. Fix: introduce LocatorError::SchemaDrift as a typed variant (already tracked in the ADR's "Known risks" section) and match on the variant, not the string.

💡 IDEA
  • I — Intent: The author wanted CompositeLocator to delegate to the FS fallback only when the SQLite locator failed because the Codex schema doesn't match (a known deployment variation), while propagating all other errors unchanged.
  • D — Danger: If SqliteFirstLocator renames either "schema drift: threads table not found" or "schema drift: logs table not found" (or if a refactor wraps the message), the guard silently stops firing. The FS fallback never runs, and Codex sessions on installations without a SQLite state database bind with a Fatal error on every poll cycle — the status panel never populates.
  • E — Explain: The author chose string matching because LocatorError::Unresolved currently carries a free-form String reason rather than a typed sub-variant. Adding a new variant mid-implementation would have required touching all match arms across three locator types; the string match was a pragmatic shortcut scoped to a single spot, with the refactor explicitly queued in the ADR.
  • A — Alternatives: Add LocatorError::SchemaDrift as a dedicated unit variant (no payload needed, since the message is already captured in the Unresolved arm when propagating outward). CompositeLocator matches on Err(LocatorError::SchemaDrift) — a compiler-checked structural pattern instead of a substring test. Zero callsite churn: SqliteFirstLocator replaces the two Unresolved("schema drift: ...") constructions with SchemaDrift.

🔵 [LOW] totalCostUsd ?? null is a no-op given the declared type

📍 /home/runner/work/vimeflow/vimeflow/src/features/agent-status/hooks/useAgentStatus.ts L398-398
🎯 Confidence: 92%

At useAgentStatus.ts:398, p.cost.totalCostUsd is already typed number | null in both the updated src/bindings/CostMetrics.ts binding and the local CostMetrics override in types/index.ts. The ?? null coalescence therefore never changes the value — null ?? null is null, and any numeric value short-circuits. The PR's own comment in agent/types.rs states the field "Serializes to JSON null, never absent", so undefined is explicitly ruled out on the wire. The expression misleads readers into thinking undefined can arrive here, contradicting the documented invariant. Remove the ?? null and express the intent purely through the type.

💡 IDEA
  • I — Intent: The author wanted to guarantee null is preserved across the normalization step, guarding against any wire format that might send the field as undefined rather than null.
  • D — Danger: No runtime danger — the coalescing is truly a no-op. The risk is reader confusion: future maintainers may infer that p.cost.totalCostUsd can be undefined at runtime and compensate elsewhere (e.g., extra guard conditions), adding noise without value.
  • E — Explain: The ?? guard was probably written while the type was still number (pre-PR), when null from the wire would genuinely have been undefined after TypeScript's strict mode disagreement with the old binding. The author updated the type but left the defensive coalesce, which was correct at draft time but is now redundant.
  • A — Alternatives: Replace p.cost.totalCostUsd ?? null with p.cost.totalCostUsd. If truly defensive coding against future type drift is desired, a // cost is number | null per CostMetrics; never undefined comment is clearer than a runtime expression.

Overall: ✅ patch is correct (confidence: 86%)

The Stage 2 Codex adapter lands cleanly. The SQLite-first locator correctly uses parameterized queries (named_params!, params![]) with read-only flags throughout; no SQL injection or path-traversal surface is opened. The BindError::Pending → retry / Fatal → bail split is correct, and the spawn_blocking migration ensures the 500 ms retry budget doesn't starve async workers. The frontend watcherStartInFlightRef / watcherStartGenerationRef guards handle the rapid-switch and exit/re-detect races correctly: generation increments and in-flight flag resets are consistently co-located, the stale-guard's newerSameSidWatcherIsActive check is sound (a stale stop targets the old ptySessionId, not the new session's watcher), and the finally block's generation check prevents a stale completion from clearing a newer generation's in-flight flag. The CostMetrics Option<f64>number | null threading from Rust through ts-rs binding, local type override, hook normalization, and BudgetMetrics render is consistent end-to-end. Two issues found: one MEDIUM design coupling (schema-drift dispatch via string matching, already tracked in the ADR) and one LOW code smell (redundant ?? null coalesce). No CRITICAL or HIGH bugs.

Comment thread src-tauri/src/agent/types.rs Outdated
Comment thread src-tauri/src/agent/adapter/base/mod.rs
Comment thread src-tauri/src/agent/adapter/claude_code/mod.rs
Per round-3 review (winoooops): the previous 7-line doc-comment
explained the ts-rs/serde mismatch in detail. The mismatch is
captured in the docs/reviews/patterns/generated-artifacts.md #2
entry; the inline comment can be one line stating the codex
semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟡 [MEDIUM] query_candidate_rows: LIMIT 64 before Rust rollout_paths filter

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L220-265
🎯 Confidence: 85%

In resolve_from_proc_fds (locator.rs:330), query_candidate_rows is called with updated_since_ms = None and Some(&rollout_paths). The SQL fetches the 64 most recently updated threads globally, then Rust filters by path membership. If the active session's threads.updated_at_ms ranks 65th or lower — possible for heavy users with many sessions — the row is silently excluded, choose_state_candidate returns None, and the multi-fd fast-path falls through to the slower logs-based binding. The single-fd branch at line 315 avoids this by calling query_thread_by_rollout_path (a direct WHERE rollout_path = ? lookup). Fix: in the multi-fd branch, iterate query_thread_by_rollout_path for each path in the set, taking the first Some result.

💡 IDEA
  • I — Intent: Author wanted a single DB round-trip to fetch candidate rows and apply multiple filters in memory, avoiding a dynamic SQL IN clause for variable-length path sets.
  • D — Danger: Any session whose threads.updated_at_ms falls outside the top-64 is invisible to the multi-fd fast-path. On platforms where the logs DB also has no rows yet, the watcher binds noticeably later, visible as a delayed status panel.
  • E — Explain: Fetching a fixed batch and filtering in Rust looks simpler when the session count is small, and building a parameterized IN (?, ?, ...) clause in rusqlite requires dynamic SQL string construction.
  • A — Alternatives: Iterate query_thread_by_rollout_path for each path — already implemented and tested for the single-fd case — and collect the first Some. At most two SQL lookups, no LIMIT blindspot.

🟡 [MEDIUM] discover_db: NotFound maps to Fatal, bypasses FsScanFallback

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L47-52
🎯 Confidence: 82%

discover_db uses ? at line 49 to propagate io::Error from std::fs::read_dir(codex_home). Both callers at lines 342-346 wrap all errors as LocatorError::Fatal. CompositeLocator only routes Unresolved(reason) where reason.contains("schema drift") to FsScanFallback; Fatal propagates immediately and exits resolve_status_source_with_retry without retrying. When ~/.codex doesn't exist (agent detected before codex finishes initialising), each watcher-start attempt fails hard rather than returning a retryable NotYetReady. Fix: in discover_db, catch io::ErrorKind::NotFound from read_dir and return Ok(None) — an absent directory has no SQLite files, so Ok(None) is semantically correct. Callers then see state_db = NoneUnresolved("schema drift") → FsScanFallback.

💡 IDEA
  • I — Intent: Using ? for all I/O errors is idiomatic; the distinction between NotFound (transient) and permission/corruption (fatal) was not made.
  • D — Danger: First-time codex users where ~/.codex doesn't exist at detection time see Fatal: scan ~/.codex: No such file or directory in logs and the 5-attempt backend retry budget is bypassed. The frontend 2s poll loop recovers, but error semantics are wrong.
  • E — Explain: The ? covers all io::Error variants without distinguishing transient absence from genuine failure. Callers wrap everything as Fatal without inspecting the error kind.
  • A — Alternatives: Inside discover_db, match e.kind() == io::ErrorKind::NotFound on the read_dir call and return Ok(None), re-propagating all other kinds via Err(e).

🟡 [MEDIUM] Regression test mock missing exit path; assertion never executes

📍 /home/runner/work/vimeflow/vimeflow/src/features/agent-status/hooks/useAgentStatus.test.ts L1010-1098
🎯 Confidence: 90%

In useAgentStatus.test.ts:1010, the test for 'older start resolves after newer succeeded' states in its mock comment that detect_agent_in_session returns undetected on the second call and re-detected on the third. The mock always returns { agentType: 'claudeCode', ... } with no call counter. Consequently newerStartCompleted (line 1016) stays false and the assertion at line 1089 (if (newerStartCompleted) { expect(invoke).not.toHaveBeenCalledWith('stop_agent_watcher', ...) }) never executes. The test passes vacuously. Fix: add a detectCallCount in the mock closure, return null on the second detection call (simulating exit) and the agent payload on the third; remove the if (newerStartCompleted) guard and assert unconditionally after the older start resolves.

💡 IDEA
  • I — Intent: The test was meant to pin the regression where an older in-flight start_agent_watcher resolves after a newer same-sid start has completed, verifying the stale-guard skips the stop call.
  • D — Danger: The watcherStartGenerationRef logic has no automated coverage for its defining scenario. A future refactor breaking the generation-based stale guard would not be caught by this test.
  • E — Explain: Simulating an exit/re-detect cycle while a start is in-flight requires careful timer and mock state interleaving. The author added the if (newerStartCompleted) guard as an escape hatch, but it also prevents the assertion from ever running.
  • A — Alternatives: Use a call counter in the mock, advance vitest fake timers between calls, and assert stop_agent_watcher is not called unconditionally — no if (newerStartCompleted) guard needed.

🔵 [LOW] now_iso8601: manual Gregorian calendar duplicates chrono already in scope

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/transcript.rs L724-760
🎯 Confidence: 95%

transcript.rs:724-760 implements a 35-line custom now_iso8601 / days_to_date pair (Hinnant civil_from_days) for an ISO-8601 fallback timestamp. chrono = "0.4" is already declared in Cargo.toml and used in locator.rs. Adding use chrono::Utc; and calling Utc::now().format("%Y-%m-%dT%H:%M:%SZ").to_string() replaces all 35 lines with no added compile cost. The function is only called when a JSONL event lacks a timestamp field, so runtime impact is nil, but the bespoke integer arithmetic is an unnecessary maintenance burden.

💡 IDEA
  • I — Intent: The author needed a fallback ISO-8601 timestamp and chose to avoid a use chrono import to keep the module self-contained.
  • D — Danger: Minimal runtime risk — the Hinnant algorithm is correct for all Unix timestamps. Future auditors may spend time verifying the custom arithmetic when a well-tested library call would be self-explanatory.
  • E — Explain: Some modules deliberately avoid pulling in new crate imports; transcript.rs has no other chrono usage, so the author may have wanted to avoid the dependency coupling.
  • A — Alternatives: chrono is already compiled for the crate. Adding use chrono::Utc; costs nothing at link time and replaces the entire function pair with a one-liner.

Overall: ⚠️ patch has issues (confidence: 83%)

The Codex adapter stage 2 is architecturally sound: the SQLite-first locator with FsScanFallback, the BindContext/BindError trait refactor, the watcherStartInFlightRef/watcherStartGenerationRef race-condition fix, and the Option cost propagation are all correct. No CRITICAL or HIGH issues found. Three MEDIUM findings: (1) LIMIT-before-filter ordering in query_candidate_rows silently misses the target session on the multi-fd path for heavy users; (2) discover_db maps a transient NotFound (absent ~/.codex) to Fatal, bypassing retries and FsScanFallback; (3) the regression test for the stale-generation guard never executes its key assertion because the mock doesn't implement the exit/re-detect cycle it documents. One LOW: unnecessary custom Gregorian calendar when chrono is available. No blockers; findings 1-2 worth fixing before production Codex users hit them.

`d0c7145` shortened the `total_cost_usd` doc-comment in
`src-tauri/src/agent/types.rs` but didn't regenerate the ts-rs
binding. ts-rs embeds rust doc-comments in the generated `.ts`,
so CI's "Verify TypeScript Bindings" step (which regenerates and
asserts no diff) failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Claude Code Review

🟠 [HIGH] Regression test for stale-watcher stop guard never executes its key assertion

📍 /home/runner/work/vimeflow/vimeflow/src/features/agent-status/hooks/useAgentStatus.test.ts L1089-1096
🎯 Confidence: 93%

newerStartCompleted can never be true in this test's setup, so the expect on lines 1090-1092 is dead code. For a second start_agent_watcher call to fire, watcherStartInFlightRef.current must be false AND watcherStartedRef.current must be false. The first start sets watcherStartInFlightRef = true before await invoke(...). The only resets happen on session change, agent exit, or unmount — none of which the test drives. When resolveOlderStart is finally called the first start succeeds, watcherStartedRef = true prevents a second start, and newerStartCompleted stays false forever. The if (newerStartCompleted) guard means the assertion against stop_agent_watcher is never reached. The expect(startCalls).toHaveLength(1) on line 834 does validate the watcherStartInFlightRef duplicate-prevention path, but the newerSameSidWatcherIsActive three-condition guard at useAgentStatus.ts:253-260 is completely untested. A typo in any of its three conditions would pass CI undetected. Fix: drive an actual exit/re-detect cycle with fake timers (advance past the polling interval, flip detect_agent_in_session to return null then back to an agent), or at minimum unconditionally assert stop_agent_watcher is not called in the basic scenario.

💡 IDEA
  • I — Intent: Pin the invariant that a stale (older generation) start_agent_watcher completion does not call stopWatchers when a newer same-sid watcher is already active — the cited P1 regression for Codex.
  • D — Danger: The newerSameSidWatcherIsActive guard at useAgentStatus.ts:253-260 has three conditions (currentSessionIdRef === sid, generation !== startGeneration, watcherStartedRef = true). Zero of them are exercised by this test. If any condition is inverted or dropped in a future refactor, the regression (stale start tears down an active newer watcher, blanking the status panel) silently reappears.
  • E — Explain: The full exit/re-detect cycle for the same session is hard to drive deterministically in a hook-render-with-fake-timers harness; the author noted this explicitly and tried to partially approximate it. The if (newerStartCompleted) guard was probably added to avoid a spurious failure on a path that wasn't fully simulated, but it also nullifies the assertion entirely.
  • A — Alternatives: Use vi.useFakeTimers() with vi.advanceTimersByTimeAsync to drive the detection poll past a second interval; mock detect_agent_in_session to return null on the second call (exit) and an agent on the third (re-detect), then yield via await vi.runAllTimersAsync(). This produces two real start_agent_watcher calls without the newerStartCompleted guard, and the stop_agent_watcher assertion becomes unconditional.

🟡 [MEDIUM] now_iso8601() / days_to_date() duplicates chrono already in the workspace

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/transcript.rs L724-757
🎯 Confidence: 90%

transcript.rs hand-rolls a 40-line ISO 8601 formatter (now_iso8601 + days_to_date, lines 724-757) while chrono is already a workspace dependency — it is imported directly in the sibling locator.rs file as use chrono::{Datelike, Duration as ChronoDuration, Local}. The same result with subsecond precision is one line: chrono::Utc::now().format("%Y-%m-%dT%H:%M:%SZ").to_string(). The days_to_date algorithm is Howard Hinnant's civil_from_days and is arithmetically correct for all plausible inputs, so this is not a functional bug. But it doubles the code surface for timestamp generation in this crate, and a future change to sub-second formatting or timezone handling would require updating two independent implementations. Fix: use chrono::Utc; at the top of transcript.rs and replace both functions with the one-liner.

💡 IDEA
  • I — Intent: Provide a fallback timestamp when the Codex rollout JSONL event lacks a timestamp field, so extract_timestamp always returns a valid ISO 8601 string.
  • D — Danger: No immediate runtime correctness issue — the algorithm is correct. The risk is maintenance: if the call site later needs millisecond precision or local timezone, a developer may patch only one of the two implementations, leaving them silently out of sync.
  • E — Explain: The author may have wanted to avoid adding a chrono import to this file (perhaps to keep the module's dependency footprint lighter), or may not have noticed chrono was already available. The inline implementation avoids a use statement but adds significant code.
  • A — Alternatives: chrono is already in Cargo.toml and imported in locator.rs; add use chrono::Utc; and replace lines 724-757 with a one-liner. Alternatively, expose a shared utils::now_iso8601() in the adapter crate if more than one module needs it.

🟡 [MEDIUM] Schema-drift fallback dispatched via fragile substring match on error message

📍 /home/runner/work/vimeflow/vimeflow/src-tauri/src/agent/adapter/codex/locator.rs L607-609
🎯 Confidence: 88%

CompositeLocator::resolve_rollout (line 607) routes to FsScanFallback when the Unresolved reason string contains the substring "schema drift". Both producers of that string live in the same method (SqliteFirstLocator::resolve_rollout, lines 351 and 364), but the dispatch is tightly coupled to the exact text. If either message is renamed in a refactor — including an automated rename that changes capitalization or phrasing — the contains("schema drift") guard silently fails to match, and the fallback is never invoked. Codex sessions that lack a SQLite threads/logs table would then return a permanent Unresolved error instead of falling back to the FS scan, preventing any session from binding. The PR description already tracks this as a followup: "Refactor to typed LocatorError::SchemaDrift variant". Fix: add LocatorError::SchemaDrift as a dedicated variant, produce it from the two callsites, and match on Err(LocatorError::SchemaDrift) in CompositeLocator.

💡 IDEA
  • I — Intent: Distinguish 'the Codex SQLite schema changed or the expected table does not exist yet' from other Unresolved conditions so the composite locator can fall back to FS scanning instead of hard-failing.
  • D — Danger: If a refactor renames either error string at lines 351 or 364 without updating the contains check, the fallback is silently disabled. No test currently covers the case where the substring does NOT match a schema-drift message but the reason happens to contain a different word — meaning the inverse (false-positive dispatch to the fallback) is also untested.
  • E — Explain: The author chose the substring approach as the minimal-change solution to land the fallback dispatch quickly; the ADR explicitly flags it as a known fragility and defers the typed-variant refactor to a follow-up. The implementation works today because both producers are co-located.
  • A — Alternatives: Add SchemaDrift to LocatorError before or alongside this PR, or at minimum use a const SCHEMA_DRIFT_TAG: &str = "schema drift" so all three sites — two producers and one consumer — reference the same constant, making a rename a compile error.

Overall: ✅ patch is correct (confidence: 84%)

The Codex adapter implementation is architecturally sound: SQLite queries are fully parameterized, path security uses canonicalization before starts_with checks, proc-FD paths are only used as SQLite query keys (never opened directly), and blocking I/O is correctly moved to spawn_blocking. The total_cost_usd: Option<f64> promotion and the watcherStartInFlightRef/watcherStartGenerationRef frontend guards are correct by construction. Three issues stand out: a regression test whose primary assertion is guarded by a condition that is never true in the test setup (leaving the newerSameSidWatcherIsActive guard entirely uncovered), a hand-rolled ISO 8601 formatter that duplicates chrono functionality already in the workspace, and a string-substring dispatch for the fallback locator that the PR's own ADR flags as fragile — but which should be addressed in-code rather than only in docs.

@winoooops winoooops merged commit 759d1d0 into main May 4, 2026
14 checks passed
@winoooops winoooops deleted the feat/codex-adapter-stage-2-impl branch May 4, 2026 13:36
winoooops added a commit that referenced this pull request May 4, 2026
…le README (#157)

PR #154 landed Codex support behind the AgentAdapter trait. Bring
the user-facing documentation up to date so the as-merged contract
is visible without having to read the spec or the adapter source.

Changes by file:

README.md (+ README.zh-CN.md mirror)
- Tagline: "AI coding agents like Claude Code and Codex" (was Claude only)
- Phase 4 section rewritten to describe the multi-agent surface:
  AgentAdapter trait, Claude vs Codex sourcing paths (statusline JSON
  vs SQLite locator + rollout JSONL), per-variant BudgetMetrics
  routing (Subscriber for Codex, ApiKey for Claude API-key), and
  ContextBucket source semantics (last_token_usage for Codex,
  total_input_tokens for Claude)
- Caption under the panel screenshot: now noted as agent-agnostic
- Module table row for `agent-status`: trait-based multi-agent
- Added cross-links to the Stage 1 / Stage 2 specs and the
  scope-expansion ADR

src-tauri/src/agent/README.md
- Top-of-file framing: Stage 1+2 both shipped, Aider future
- Module tree adds the codex/ subtree (mod, locator, parser, transcript)
  and its responsibilities
- types.rs entry calls out the BindContext / BindError / ValidateTranscriptError
  additions from Stage 2
- for_type returns CodexAdapter for AgentType::Codex (was Stage 2
  pending); cross-references issue #156 for the planned shape
  simplification (BindContext + retry into CodexAdapter)
- start signature reflects the as-merged shape (pid, pty_start added);
  documents that pid is the detected agent PID (not shell PID) and
  pty_start filters PID reuse / stale-loaded-thread matches

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
winoooops added a commit that referenced this pull request May 8, 2026
Codifies the rule "a PR's diff must answer one question — what does
the spec/plan/issue say to do?" Anything not answering that question
splits into a separate PR.

Why now: PR #154 (codex adapter stage 2) bundled 17 pure-rustfmt
drive-by files plus a 999-LOC out-of-scope codex transcript tailer
in with the spec-aligned attach work. The reviewer caught it; we
reverted the formatting drive-bys and ratified the tailer expansion
in an ADR. Both fixes were after-the-fact and cost cycles. The rule
codifies the pre-PR checklist that would have caught it sooner:
walk the diff, reconcile each file against the plan's File-touch
list, surface and revert drive-bys, escalate genuine deviations
into an ADR rather than letting them ride silently.

Lives in rules/common/ because scope discipline shapes every PR,
not just the current one — auto-loaded into every coding session.
References git-workflow.md, code-review.md, and the codex-adapter
scope-expansion ADR as a worked example of "when a deviation is
unavoidable, ratify it explicitly".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
winoooops added a commit that referenced this pull request May 8, 2026
Codifies the rule "a PR's diff must answer one question — what does
the spec/plan/issue say to do?" Anything not answering that question
splits into a separate PR.

Why now: PR #154 (codex adapter stage 2) bundled 17 pure-rustfmt
drive-by files plus a 999-LOC out-of-scope codex transcript tailer
in with the spec-aligned attach work. The reviewer caught it; we
reverted the formatting drive-bys and ratified the tailer expansion
in an ADR. Both fixes were after-the-fact and cost cycles. The rule
codifies the pre-PR checklist that would have caught it sooner:
walk the diff, reconcile each file against the plan's File-touch
list, surface and revert drive-bys, escalate genuine deviations
into an ADR rather than letting them ride silently.

Lives in rules/common/ because scope discipline shapes every PR,
not just the current one — auto-loaded into every coding session.
References git-workflow.md, code-review.md, and the codex-adapter
scope-expansion ADR as a worked example of "when a deviation is
unavoidable, ratify it explicitly".

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant