Skip to content

fix(codex): surface thread system errors#519

Merged
tiann merged 1 commit intotiann:mainfrom
Liu-KM:liu/surface-codex-thread-system-error
Apr 23, 2026
Merged

fix(codex): surface thread system errors#519
tiann merged 1 commit intotiann:mainfrom
Liu-KM:liu/surface-codex-thread-system-error

Conversation

@Liu-KM
Copy link
Copy Markdown
Contributor

@Liu-KM Liu-KM commented Apr 23, 2026

Summary

  • convert Codex app-server thread/status/changed systemError notifications into terminal task failures
  • allow explicitly thread-scoped terminal failures to complete the active turn even when they do not carry a turn id
  • show the failure as a visible HAPI session message and avoid resurrecting a stale turn id after the terminal event races with startTurn()

Why

Codex app-server can report thread-level systemError without a turn id. Previously HAPI ignored/filtered that terminal signal while a turn was in flight, leaving the UI looking disconnected instead of surfacing the failure and returning to ready.

Test plan

  • cd cli && bun run vitest run src/codex/codexRemoteLauncher.test.ts src/codex/utils/appServerEventConverter.test.ts src/codex/utils/terminalEventGuard.test.ts
  • cd cli && bun run typecheck

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Thread remains reusable after a thread-scoped systemError — the new thread/status/changed mapping only emits task_failed, but the launcher keeps hasThread/currentThreadId and will reuse that same thread on the next turn (cli/src/codex/utils/appServerEventConverter.ts:269, cli/src/codex/codexRemoteLauncher.ts:629, cli/src/codex/codexRemoteLauncher.ts:687). See inline comment for a minimal fix.

Summary
Review mode: initial

  • 1 Major issue found in the Codex thread-system-error recovery path.

Testing

  • Not run (automation; review instruction forbids executing PR code)

HAPI Bot

events.push({
type: 'task_failed',
...(threadId ? { thread_id: threadId } : {}),
terminal_source: 'thread_status',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] thread/status/changed is a thread-scoped failure, but this new path only converts it into task_failed. The launcher never clears currentThreadId or hasThread, so the next queued message will still call turn/start on the same thread that already entered systemError (cli/src/codex/codexRemoteLauncher.ts:629, cli/src/codex/codexRemoteLauncher.ts:687). That leaves the session stuck on a broken thread instead of recovering on the next turn.

Suggested fix:

const isThreadStatusFailure = msgType === 'task_failed' && msg.terminal_source === 'thread_status';

if (isTerminalEvent) {
    this.currentTurnId = null;
    allowAnonymousTerminalEvent = false;
    if (isThreadStatusFailure) {
        this.currentThreadId = null;
        hasThread = false;
    }
}

@Liu-KM Liu-KM force-pushed the liu/surface-codex-thread-system-error branch from bad9c69 to 7085137 Compare April 23, 2026 06:15
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • No actionable issues found in the updated diff.

Summary
Review mode: follow-up after new commits

  • No actionable issues found on the latest head.
  • Residual risk: added coverage is unit-level only; not found in repo/docs: integration coverage against a real app-server thread/status/changed sequence.

Testing

  • Not run (automation)

HAPI Bot

@tiann tiann merged commit 19e7445 into tiann:main Apr 23, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants