fix(codex): surface thread system errors #519
Merged
tiann merged 1 commit into tiann:main from Apr 23, 2026
Findings
- [Major] Thread remains reusable after a thread-scoped `systemError`: the new `thread/status/changed` mapping only emits `task_failed`, but the launcher keeps `hasThread`/`currentThreadId` and will reuse that same thread on the next turn (cli/src/codex/utils/appServerEventConverter.ts:269, cli/src/codex/codexRemoteLauncher.ts:629, cli/src/codex/codexRemoteLauncher.ts:687). See inline comment for a minimal fix.
Summary
Review mode: initial
- 1 Major issue found in the Codex thread-system-error recovery path.
Testing
- Not run (automation; review instruction forbids executing PR code)
HAPI Bot
```ts
events.push({
    type: 'task_failed',
    ...(threadId ? { thread_id: threadId } : {}),
    terminal_source: 'thread_status',
```
[MAJOR] thread/status/changed is a thread-scoped failure, but this new path only converts it into task_failed. The launcher never clears currentThreadId or hasThread, so the next queued message will still call turn/start on the same thread that already entered systemError (cli/src/codex/codexRemoteLauncher.ts:629, cli/src/codex/codexRemoteLauncher.ts:687). That leaves the session stuck on a broken thread instead of recovering on the next turn.
Suggested fix:
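```ts
// A task_failed event that originated from thread/status/changed is thread-scoped.
const isThreadStatusFailure = msgType === 'task_failed' && msg.terminal_source === 'thread_status';
if (isTerminalEvent) {
    this.currentTurnId = null;
    allowAnonymousTerminalEvent = false;
    if (isThreadStatusFailure) {
        // Forget the failed thread so the next turn does not reuse it.
        this.currentThreadId = null;
        hasThread = false;
    }
}
```

bad9c69 to 7085137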
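To illustrate why the reset matters, here is a minimal, self-contained sketch of the recovery behaviour the suggested fix aims for. It is not the actual `codexRemoteLauncher.ts` code: the `LauncherState` class, the `CodexEvent` type, the `task_complete` event name, and the generated thread ids are all hypothetical stand-ins. Only `task_failed`, `terminal_source: 'thread_status'`, `currentThreadId`, `currentTurnId`, and `hasThread` come from the review above.

```ts
// Hypothetical, simplified event shape (assumption, not the real HAPI type).
type CodexEvent = {
    type: string;
    thread_id?: string;
    terminal_source?: string;
};

// 'task_complete' is an assumed name for a successful terminal event.
const TERMINAL_TYPES = new Set<string>(['task_complete', 'task_failed']);

// Hypothetical model of the launcher's per-session thread/turn bookkeeping.
class LauncherState {
    currentThreadId: string | null = null;
    currentTurnId: string | null = null;
    hasThread = false;
    private nextThreadNumber = 1;

    // Starts a turn, reusing the current thread only if one is still tracked.
    startTurn(): { threadId: string; reused: boolean } {
        let reused = true;
        if (!this.hasThread || this.currentThreadId === null) {
            this.currentThreadId = `thread-${this.nextThreadNumber++}`;
            this.hasThread = true;
            reused = false;
        }
        const threadId = this.currentThreadId;
        this.currentTurnId = `turn-on-${threadId}`;
        return { threadId, reused };
    }

    // Clears turn state on any terminal event, and thread state only when
    // the failure was reported at thread scope.
    handleEvent(msg: CodexEvent): void {
        if (!TERMINAL_TYPES.has(msg.type)) {
            return;
        }
        const isThreadStatusFailure =
            msg.type === 'task_failed' && msg.terminal_source === 'thread_status';
        this.currentTurnId = null;
        if (isThreadStatusFailure) {
            this.currentThreadId = null;
            this.hasThread = false;
        }
    }
}
```

In this sketch, a plain `task_failed` leaves the thread in place, so the next `startTurn()` reuses it. A `task_failed` carrying `terminal_source: 'thread_status'` drops the thread, so the next `startTurn()` allocates a new one.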
Findings
- No actionable issues found in the updated diff.
Summary
Review mode: follow-up after new commits
- No actionable issues found on the latest head.
- Residual risk: added coverage is unit-level only; not found in repo/docs: integration coverage against a real app-server `thread/status/changed` sequence.
Testing
- Not run (automation)
HAPI Bot
Summary
- `thread/status/changed` `systemError` notifications into terminal task failures
- `startTurn()`

Why
Codex app-server can report thread-level `systemError` without a turn id. Previously HAPI ignored/filtered that terminal signal while a turn was in flight, leaving the UI looking disconnected instead of surfacing the failure and returning to ready.

Test plan
- `cd cli && bun run vitest run src/codex/codexRemoteLauncher.test.ts src/codex/utils/appServerEventConverter.test.ts src/codex/utils/terminalEventGuard.test.ts`
- `cd cli && bun run typecheck`
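The mapping described in this PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `appServerEventConverter.ts`: the `AppServerNotification` shape (including `params.threadId` and `params.status.type`) and the `convertThreadStatusChanged` function name are hypothetical. Only the `thread/status/changed` method, the `systemError` status, and the emitted `task_failed` fields are taken from the diff excerpt in the review.

```ts
// Hypothetical notification shape; the real app-server payload may differ.
type AppServerNotification = {
    method: string;
    params?: {
        threadId?: string;
        status?: { type?: string };
    };
};

// Event fields as shown in the reviewed diff excerpt.
type TaskFailedEvent = {
    type: 'task_failed';
    thread_id?: string;
    terminal_source: 'thread_status';
};

// Converts a thread-level systemError into a terminal task failure.
// Any other notification yields no events in this sketch.
function convertThreadStatusChanged(n: AppServerNotification): TaskFailedEvent[] {
    const events: TaskFailedEvent[] = [];
    if (n.method !== 'thread/status/changed') {
        return events;
    }
    if (n.params?.status?.type !== 'systemError') {
        return events;
    }
    const threadId = n.params?.threadId;
    events.push({
        type: 'task_failed',
        // The notification may arrive without ids, so thread_id is optional.
        ...(threadId ? { thread_id: threadId } : {}),
        terminal_source: 'thread_status',
    });
    return events;
}
```

Tagging the event with `terminal_source: 'thread_status'` lets a downstream consumer tell a thread-scoped failure apart from a turn-scoped one, even when no turn id is present.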