Skip to content

refactor: unify the LLM retry classifier across client + judge retry#74

Merged
drewstone merged 1 commit into
mainfrom
refactor/unify-llm-retry-classifier
May 22, 2026
Merged

refactor: unify the LLM retry classifier across client + judge retry#74
drewstone merged 1 commit into
mainfrom
refactor/unify-llm-retry-classifier

Conversation

@drewstone
Copy link
Copy Markdown
Contributor

Summary

Merge B of the agent-eval consolidation. callLlm and withJudgeRetry each carried a private retry-classification surface — two backoff functions, two retryable-status sets, two divergent transient-error pattern lists. One concern, three copies.

Collapse to one exported primitive in llm-client.ts:

  • isTransientLlmError(err) — THE retry classifier. Inspects an error's name/message/code, honors a numeric HTTP status (LlmCallError or a foreign SDK error that duck-types .status), and recurses into error.cause — undici nests the real socket fault under .cause.
  • backoffMs(attempt) — exported; withJudgeRetry's default backoff.

judge-retry.ts drops its ABORT_PATTERNS, RETRYABLE_HTTP_STATUS, DEFAULT_BACKOFF, and defaultIsRetryable in favor of the shared pair. callLlm's catch path routes through the same function.

Bug class fixed

Neither old classifier matched undici HTTP/2 transport faultsterminated, NGHTTP2_INTERNAL_ERROR, UND_ERR_*, other side closed. llm-client's regex was only fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN; judge-retry's list was broader but still HTTP/1-shaped. An HTTP/2 keep-alive connection dropping mid-response escaped the retry loop — surfacing as an uncaught rejection in the HTTP client and a silently non-retried trial failure in TCloud-backed judges. The unified pattern list covers them and the .cause chain is followed to the real fault.

Test plan

  • callLlm retries an HTTP/2 terminated / NGHTTP2_INTERNAL_ERROR fault to recovery
  • withJudgeRetry does the same via the shared classifier
  • isTransientLlmError unit-tested: HTTP/2, undici .code, .cause chain, network/abort, retryable vs non-retryable status, deterministic failures (JSON parse, schema reject), self-referential cause chain
  • pnpm typecheck — clean
  • pnpm test — 1273 passed (131 files)
  • pnpm exec biome check src — 0 errors
  • pnpm build — green

Merge B of the agent-eval consolidation. callLlm and withJudgeRetry each
carried a private retry-classification surface — two backoff functions,
two retryable-status sets, two divergent transient-error pattern lists.

Collapse to one exported primitive in llm-client.ts:

- isTransientLlmError(err) — THE retry classifier. Inspects an error's
  name/message/code, honors a numeric HTTP status (LlmCallError or a
  foreign SDK error), and recurses into error.cause (undici nests the
  real socket fault under .cause). withJudgeRetry's default predicate
  and callLlm's catch path both route through it.
- backoffMs(attempt) — exported; withJudgeRetry's default backoff.

judge-retry.ts drops its ABORT_PATTERNS, RETRYABLE_HTTP_STATUS,
DEFAULT_BACKOFF, and defaultIsRetryable in favor of the shared pair.

Bug class fixed: neither old classifier matched undici HTTP/2 transport
faults (`terminated`, NGHTTP2_INTERNAL_ERROR, UND_ERR_*, `other side
closed`). An HTTP/2 connection dropping mid-response escaped the retry
loop — surfacing as an uncaught rejection in the HTTP client and a
silent non-retried trial failure in TCloud-backed judges. The unified
pattern list covers them and the cause chain is followed.

Regression tests: callLlm retries an HTTP/2 fault to recovery;
withJudgeRetry does the same via the shared classifier;
isTransientLlmError unit-tested across HTTP/2, network, abort, status,
deterministic-failure, and self-referential-cause inputs.

typecheck + 1273 tests + biome + build all green.
@tangletools
Copy link
Copy Markdown
Contributor

tangletools commented May 21, 2026

✅ No Blockers — 7351f0fd

Readiness 95/100 · Confidence 95/100 · 0 findings (none)

kimi-code: Correctness 95 · Security 95 · Testing 95 · Architecture 95

I read every changed file in full, traced callers/callees, ran the full test suite (1272 tests passed), and verified the build. The PR cleanly deduplicates retry logic between llm-client and judge-retry into a single exported classifier, adds HTTP/2 transport fault detection with cause-chain traversal, and covers the new paths with regression tests. No runtime defects were found.

No findings.


tangletools · 2026-05-21T21:16:07Z · trace

Copy link
Copy Markdown
Contributor

@tangletools tangletools left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Clean — 7351f0fd

I read every changed file in full, traced callers/callees, ran the full test suite (1272 tests passed), and verified the build. The PR cleanly deduplicates retry logic between llm-client and judge-retry into a single exported classifier, adds HTTP/2 transport fault detection with cause-chain traversal, and covers the new paths with regression tests. No runtime defects were found.

Full findings and scores: review summary


tangletools · 2026-05-21T21:16:07Z · trace

@drewstone drewstone merged commit d420216 into main May 22, 2026
1 check passed
@drewstone drewstone deleted the refactor/unify-llm-retry-classifier branch May 22, 2026 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants