refactor: unify the LLM retry classifier across client + judge retry by drewstone · Pull Request #74 · tangle-network/agent-eval

drewstone · 2026-05-21T21:07:45Z

Summary

Merge B of the agent-eval consolidation. callLlm and withJudgeRetry each carried a private retry-classification surface: two backoff functions, two retryable-status sets, and two divergent transient-error pattern lists. One concern, duplicated three ways.

Collapse to one exported pair in llm-client.ts:

- isTransientLlmError(err): THE retry classifier. Inspects an error's name/message/code, honors a numeric HTTP status (LlmCallError or a foreign SDK error that duck-types .status), and recurses into error.cause, because undici nests the real socket fault under .cause.
- backoffMs(attempt): exported; withJudgeRetry's default backoff.
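
For readers without the diff open, the shape of the shared pair can be sketched roughly as follows. This is an illustration only, not the code in llm-client.ts: the pattern list is assembled from the fault names quoted in this description, while the retryable status set, the rule that a numeric status is decisive, and the backoff constants are assumptions.

```typescript
// Hypothetical sketch. The real pattern list, status set, and backoff
// constants live in llm-client.ts and may differ from these assumed values.
const TRANSIENT_PATTERN =
  /fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN|\bterminated\b|NGHTTP2_INTERNAL_ERROR|UND_ERR_|other side closed/i;

// Assumed set of retryable HTTP statuses.
const RETRYABLE_STATUS = new Set<number>([408, 429, 500, 502, 503, 504]);

export function isTransientLlmError(
  err: unknown,
  seen: Set<unknown> = new Set<unknown>(),
): boolean {
  // Non-objects cannot carry name/code/cause; `seen` stops self-referential chains.
  if (err === null || typeof err !== "object" || seen.has(err)) return false;
  seen.add(err);

  const e = err as {
    name?: unknown;
    message?: unknown;
    code?: unknown;
    status?: unknown;
    cause?: unknown;
  };

  // Assumption: a numeric HTTP status, when present, decides the outcome.
  if (typeof e.status === "number") return RETRYABLE_STATUS.has(e.status);

  // Match the transient patterns against name, message, and code together.
  const text = [e.name, e.message, e.code]
    .filter((v) => typeof v === "string")
    .join(" ");
  if (TRANSIENT_PATTERN.test(text)) return true;

  // Follow the cause chain down to the underlying socket fault.
  return isTransientLlmError(e.cause, seen);
}

// Exponential backoff with a cap (assumed: 500 ms base, 8 s ceiling).
export function backoffMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000);
}
```

The word boundaries around `terminated` are a deliberate choice in this sketch: without them, a deterministic JSON parse error such as "Unterminated string in JSON" would be misclassified as transient.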

judge-retry.ts drops its ABORT_PATTERNS, RETRYABLE_HTTP_STATUS, DEFAULT_BACKOFF, and defaultIsRetryable in favor of the shared pair. callLlm's catch path routes through the same function.
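
A minimal sketch of what routing a retry wrapper through a shared predicate and backoff can look like. The name `retryWith`, its options, and the two stand-in functions are hypothetical and exist only to keep the block self-contained; the actual withJudgeRetry signature is not shown in this description.

```typescript
// Hypothetical stand-ins for the shared pair, kept local so the sketch runs alone.
const sharedIsTransient = (err: unknown): boolean =>
  err instanceof Error && /\bterminated\b|ECONNRESET/.test(err.message);
const sharedBackoffMs = (attempt: number): number =>
  Math.min(500 * 2 ** attempt, 8000);

// Assumed option names; callers may override either default.
interface RetryOptions {
  maxAttempts?: number;
  isRetryable?: (err: unknown) => boolean;
  backoff?: (attempt: number) => number;
  sleep?: (ms: number) => Promise<void>;
}

export async function retryWith<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const {
    maxAttempts = 3,
    isRetryable = sharedIsTransient,
    backoff = sharedBackoffMs,
    sleep = (ms: number) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Rethrow when attempts are exhausted or the failure is deterministic.
      const exhausted = attempt + 1 >= maxAttempts;
      if (exhausted || !isRetryable(err)) throw err;
      await sleep(backoff(attempt));
    }
  }
}
```

Because both defaults come from one place, a new transient pattern added to the classifier reaches every retry loop without a second list to update.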

Bug class fixed

Neither old classifier matched undici HTTP/2 transport faults — terminated, NGHTTP2_INTERNAL_ERROR, UND_ERR_*, other side closed. llm-client's regex was only fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN; judge-retry's list was broader but still HTTP/1-shaped. An HTTP/2 keep-alive connection dropping mid-response escaped the retry loop — surfacing as an uncaught rejection in the HTTP client and a silently non-retried trial failure in TCloud-backed judges. The unified pattern list covers them and the .cause chain is followed to the real fault.
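
To make the failure mode concrete, the sketch below builds a hand-made stand-in for a dropped HTTP/2 response (a top-level `terminated` error whose cause carries a `UND_ERR_SOCKET` code) and runs the old llm-client regex quoted above against it. The error objects are assumptions shaped after the fault names in this description, not captured undici output, and `describeChain` is a hypothetical helper.

```typescript
// The old llm-client pattern, as quoted in this description.
const OLD_PATTERN = /fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN/;

// Hand-built stand-ins: the socket fault sits under .cause of the surfaced error.
const socketFault = Object.assign(new Error("other side closed"), {
  code: "UND_ERR_SOCKET",
});
const http2Drop = Object.assign(new TypeError("terminated"), {
  cause: socketFault,
});

// Collect "name code message" for each error in the cause chain, cycle-safe.
function describeChain(err: unknown): string[] {
  const seen = new Set<unknown>();
  const out: string[] = [];
  let cur: unknown = err;
  while (cur !== null && typeof cur === "object" && !seen.has(cur)) {
    seen.add(cur);
    const e = cur as {
      name?: unknown;
      message?: unknown;
      code?: unknown;
      cause?: unknown;
    };
    out.push(
      [e.name, e.code, e.message]
        .filter((v) => typeof v === "string")
        .join(" "),
    );
    cur = e.cause;
  }
  return out;
}

// The old regex only saw the top-level message, so the fault was not retried.
const oldVerdict = OLD_PATTERN.test(http2Drop.message);
const chain = describeChain(http2Drop);

console.log(oldVerdict); // false
console.log(chain); // ["TypeError terminated", "Error UND_ERR_SOCKET other side closed"]
```

Neither link in the chain contains any of the four old tokens, which is why widening the pattern list and walking `.cause` are both needed.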

Test plan

- callLlm retries an HTTP/2 terminated / NGHTTP2_INTERNAL_ERROR fault to recovery
- withJudgeRetry does the same via the shared classifier
- isTransientLlmError unit-tested: HTTP/2, undici .code, .cause chain, network/abort, retryable vs non-retryable status, deterministic failures (JSON parse, schema reject), self-referential cause chain
- pnpm typecheck: clean
- pnpm test: 1273 passed (131 files)
- pnpm exec biome check src: 0 errors
- pnpm build: green

Merge B of the agent-eval consolidation. callLlm and withJudgeRetry each carried a private retry-classification surface: two backoff functions, two retryable-status sets, two divergent transient-error pattern lists.

Collapse to one exported primitive in llm-client.ts:

- isTransientLlmError(err): THE retry classifier. Inspects an error's name/message/code, honors a numeric HTTP status (LlmCallError or a foreign SDK error), and recurses into error.cause (undici nests the real socket fault under .cause). withJudgeRetry's default predicate and callLlm's catch path both route through it.
- backoffMs(attempt): exported; withJudgeRetry's default backoff.

judge-retry.ts drops its ABORT_PATTERNS, RETRYABLE_HTTP_STATUS, DEFAULT_BACKOFF, and defaultIsRetryable in favor of the shared pair.

Bug class fixed: neither old classifier matched undici HTTP/2 transport faults (`terminated`, NGHTTP2_INTERNAL_ERROR, UND_ERR_*, `other side closed`). An HTTP/2 connection dropping mid-response escaped the retry loop, surfacing as an uncaught rejection in the HTTP client and a silent non-retried trial failure in TCloud-backed judges. The unified pattern list covers them and the cause chain is followed.

Regression tests: callLlm retries an HTTP/2 fault to recovery; withJudgeRetry does the same via the shared classifier; isTransientLlmError unit-tested across HTTP/2, network, abort, status, deterministic-failure, and self-referential-cause inputs. typecheck + 1273 tests + biome + build all green.

tangletools · 2026-05-21T21:12:54Z

✅ No Blockers — `7351f0fd`

Readiness 95/100 · Confidence 95/100 · 0 findings (none)

kimi-code: Correctness 95 · Security 95 · Testing 95 · Architecture 95

I read every changed file in full, traced callers/callees, ran the full test suite (1272 tests passed), and verified the build. The PR cleanly deduplicates retry logic between llm-client and judge-retry into a single exported classifier, adds HTTP/2 transport fault detection with cause-chain traversal, and covers the new paths with regression tests. No runtime defects were found.

No findings.

_{tangletools · 2026-05-21T21:16:07Z · trace}

tangletools

✅ Clean — `7351f0fd`

Full findings and scores: review summary


tangletools approved these changes May 21, 2026


drewstone merged commit d420216 into main May 22, 2026
1 check passed

drewstone deleted the refactor/unify-llm-retry-classifier branch May 22, 2026 00:02

drewstone mentioned this pull request May 22, 2026

chore(0.34.0): release — eval scorecard + agent profile cells #84

Merged

3 tasks
