fix(agent): bound cached resume transcript by max_history_messages#2224
Conversation
Added a new method to limit the number of cached transcript messages to the configured maximum while preserving the leading system message if present. Updated the resume logic to utilize this method, ensuring that the cached messages do not exceed the defined history window. Added logging to warn when messages are trimmed.
…essages Introduced a new test to verify that the transcript resume functionality correctly limits the number of cached messages to the configured maximum, ensuring the leading system message is preserved. This test checks the integrity of the resumed messages after persisting a session transcript with more messages than the limit.
Enhanced the test suite by adding two new tests to verify the behavior of the transcript message bounding functionality. The first test ensures that the history window limit is respected when resuming messages, while the second test checks that the cached messages retain the correct tail when exceeding the maximum limit. These tests help confirm the integrity of message handling in various scenarios.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR enforces history window limits on resumed cached transcripts. A new helper method bounds ChatMessage vectors to ChangesTranscript history window bounding
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
CodeGhost21
left a comment
There was a problem hiding this comment.
Non-blocking review — fix is sound, scoped, and tested. Two observations worth surfacing:
1. Message-count bound vs. token-count root cause. The Sentry case is a 203,783 / 202,752 token overflow. With any non-trivial max_history_messages, a handful of large tool-result messages can still exceed the context window on iteration 1, because reduce_before_call only fires on iteration 2+. This change reduces the failure rate substantially but doesn't fully eliminate the class of bug. The PR description acknowledges this and defers a token-aware bound — worth filing a follow-up issue for that path if one doesn't already exist, so it doesn't get lost.
2. max_history_messages = 1 + system prefix is degenerate. With max=1 and a leading system message, bound_cached_transcript_messages returns [system] only — no user/assistant message at all, which most providers will reject with a different 400. The .max(1) floor protects against underflow but the implicit contract is really "max ≥ 2 when a system message is present." Nobody sets max=1 in practice, so this is theoretical, but a .max(2) floor in the system-prefix branch would make the helper self-consistent. Trivial change if desired.
LGTM otherwise — helper is well-placed, doc-commented for the non-obvious "why," and the symmetric application at both resume entry points is the right shape for the fix.
Summary
config.max_history_messageswhenever a resumed session is primed, so resume paths can no longer ship an unbounded message log to the provider on iteration 1.seed_resume_from_messages(cold-boot priming from caller-supplied messages) andtry_load_session_transcript(transcript-file load).systemmessage is preserved when present; otherwise the tailNmessages are kept.bound_cached_transcript_messageshelper onAgentwith three unit tests covering: system-prefix bound on seed, system-prefix bound on transcript load, and the no-system-message tail-keep branch.warnlog when the bound actually trims, so over-long transcripts are visible in diagnostics.Problem
Sentry issue OPENHUMAN-TAURI-QX —
custom_openai400 Bad Request: "Requested token count exceeds the model's maximum context length of 202752 tokens. You requested a total of 203783 tokens". 48 events, first seen 2026-05-15, last 2026-05-17, releaseopenhuman@0.53.43.Root cause: on resume,
cached_transcript_messagesis consumed verbatim on the first iteration of the agent loop (turn.rs:561) — only a single new-tail user message is appended. This path bypasses bothtrim_historyandreduce_before_call, so a long persisted transcript blows straight through the model's context window on the very first provider call after a resume.Solution
Agent::bound_cached_transcript_messages(Vec<ChatMessage>) -> Vec<ChatMessage>that caps the slice atconfig.max_history_messages(.max(1)floor). When the first message issystem, it is preserved and the lastmax-1messages are kept; otherwise the lastmaxmessages are kept.Design notes / tradeoffs:
trim_historysemantics — not by token count. A single oversized tool-result message can still in theory exceed context; the autocompactor (reduce_before_call) is what handles token-level pressure on subsequent iterations. A token-aware bound for the cached path is a sensible follow-up but was intentionally deferred to keep this change behavior-conservative and Sentry-targeted.ChatMessageisrole+contentonly (no structuredtool_calls), so slicing in the middle is structurally safe at the wire level; same orphan-tool-reference risk thattrim_historyalready carries.Submission Checklist
diff-cover) meet the gate enforced by.github/workflows/coverage.yml. Runpnpm test:coverageandpnpm test:rustlocally; PRs below 80% on changed lines will not merge.docs/TEST-COVERAGE-MATRIX.mdreflect this change (orN/A: behaviour-only change) — N/A: behaviour-only change to an existing resume code path; no new feature surface.## Related— N/A: no matrix rows affected.docs/RELEASE-MANUAL-SMOKE.md) — N/A: internal agent harness bound; no release-cut surface touched.Closes #NNNin the## Relatedsection — N/A: Sentry issue OPENHUMAN-TAURI-QX, no GitHub issue tracking item.Impact
src/openhuman/agent/harness/session/). Affects desktop (Tauri host) andopenhuman-coreCLI equally — both paths share the agent harness. No frontend, no mobile, no web surface touched.Vecslice/clone on resume entry; negligible.max_history_messagesof that transcript on iteration 1. Older context is dropped at the wire layer, mirroring whattrim_historyalready does for in-process histories. Logged atwarnwhen this actually triggers.Related
Summary by CodeRabbit
Bug Fixes
Tests