
fix(agent): bound cached resume transcript by max_history_messages #2224

Merged
senamakel merged 3 commits into tinyhumansai:main from YellowSnnowmann:fix/agent-resume-bound-cached-transcript
May 19, 2026

Conversation

@YellowSnnowmann
Contributor

@YellowSnnowmann YellowSnnowmann commented May 19, 2026

Summary

  • Cap the cached transcript prefix at config.max_history_messages whenever a resumed session is primed, so resume paths can no longer ship an unbounded message log to the provider on iteration 1.
  • Applied at both resume entry points: seed_resume_from_messages (cold-boot priming from caller-supplied messages) and try_load_session_transcript (transcript-file load).
  • Leading system message is preserved when present; otherwise the tail N messages are kept.
  • Added bound_cached_transcript_messages helper on Agent with three unit tests covering: system-prefix bound on seed, system-prefix bound on transcript load, and the no-system-message tail-keep branch.
  • Resume paths now emit a warn log when the bound actually trims, so over-long transcripts are visible in diagnostics.

Problem

Sentry issue OPENHUMAN-TAURI-QX: custom_openai 400 Bad Request: "Requested token count exceeds the model's maximum context length of 202752 tokens. You requested a total of 203783 tokens". 48 events, first seen 2026-05-15, last 2026-05-17, release openhuman@0.53.43.

Root cause: on resume, cached_transcript_messages is consumed verbatim on the first iteration of the agent loop (turn.rs:561) — only a single new-tail user message is appended. This path bypasses both trim_history and reduce_before_call, so a long persisted transcript blows straight through the model's context window on the very first provider call after a resume.

Solution

  • Introduce Agent::bound_cached_transcript_messages(Vec<ChatMessage>) -> Vec<ChatMessage> that caps the slice at config.max_history_messages (.max(1) floor). When the first message is system, it is preserved and the last max-1 messages are kept; otherwise the last max messages are kept.
  • Wire the helper at both resume entry points so the bound applies before the cached transcript is handed to the provider.
  • Diagnostics: warn-level log when bounding actually trims, plus the existing info log now reports the post-bound count so logs don't overstate what was primed.
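The bounding rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the `ChatMessage` field names, the free-function form (the PR describes a method on `Agent` that reads `config.max_history_messages`), and the `"system"` role string are all assumptions made to keep the example self-contained.

```rust
/// Stand-in for the harness message type. The PR describes `ChatMessage`
/// as role + content only; the field names here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Cap `messages` at `max_history_messages`, keeping a leading system
/// message (if present) plus the most recent tail.
pub fn bound_cached_transcript_messages(
    messages: Vec<ChatMessage>,
    max_history_messages: usize,
) -> Vec<ChatMessage> {
    // Floor of 1, mirroring the `.max(1)` mentioned in the PR text.
    let max = max_history_messages.max(1);
    if messages.len() <= max {
        return messages; // already within the window, nothing to trim
    }

    let has_system_prefix = matches!(messages.first(), Some(m) if m.role == "system");
    if has_system_prefix {
        // Keep the system message plus the last `max - 1` messages.
        let tail_start = messages.len() - (max - 1);
        let mut bounded = Vec::with_capacity(max);
        bounded.push(messages[0].clone());
        bounded.extend_from_slice(&messages[tail_start..]);
        bounded
    } else {
        // No system prefix: keep the last `max` messages.
        let tail_start = messages.len() - max;
        messages[tail_start..].to_vec()
    }
}
```

Because the early return covers every input no longer than `max`, the tail slice in the system-prefix branch can never overlap index 0, so the system message is not duplicated.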

Design notes / tradeoffs:

  • Bound is by message count, mirroring the existing trim_history semantics — not by token count. A single oversized tool-result message can still in theory exceed context; the autocompactor (reduce_before_call) is what handles token-level pressure on subsequent iterations. A token-aware bound for the cached path is a sensible follow-up but was intentionally deferred to keep this change behavior-conservative and Sentry-targeted.
  • ChatMessage is role + content only (no structured tool_calls), so slicing in the middle of a transcript is structurally safe at the wire level. It carries the same orphan-tool-reference risk that trim_history already does.
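For readers wondering what the deferred token-aware bound might look like, here is one possible shape. It is purely hypothetical and not part of this PR: the function name `bound_by_token_budget`, the `token_budget` parameter, and the four-characters-per-token estimate are all assumptions, and a real implementation would use the provider's tokenizer rather than a character count.

```rust
/// Stand-in message type; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Crude estimate of roughly 4 characters per token. This is an
/// assumption for illustration, not a real tokenizer.
fn estimate_tokens(message: &ChatMessage) -> usize {
    (message.content.chars().count() + 3) / 4
}

/// Hypothetical token-aware bound: always keep a leading system message,
/// then keep the newest messages while the running estimate fits the budget.
pub fn bound_by_token_budget(
    messages: Vec<ChatMessage>,
    token_budget: usize,
) -> Vec<ChatMessage> {
    let has_system_prefix = matches!(messages.first(), Some(m) if m.role == "system");
    let (system, rest) = if has_system_prefix {
        (Some(messages[0].clone()), &messages[1..])
    } else {
        (None, &messages[..])
    };

    // The system message counts against the budget but is never dropped.
    let mut used = system.as_ref().map_or(0, estimate_tokens);
    let mut kept: Vec<ChatMessage> = Vec::new();

    // Walk from newest to oldest and stop at the first message that
    // would push the estimate over the budget.
    for message in rest.iter().rev() {
        let cost = estimate_tokens(message);
        if used + cost > token_budget {
            break;
        }
        used += cost;
        kept.push(message.clone());
    }
    kept.reverse(); // restore chronological order

    let mut bounded = Vec::with_capacity(kept.len() + 1);
    bounded.extend(system);
    bounded.extend(kept);
    bounded
}
```

Note that in this sketch a single newest message larger than the budget yields an empty tail, so a caller would still need a policy for that case (for example, truncating the message content).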

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
  • Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change) — N/A: behaviour-only change to an existing resume code path; no new feature surface.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related. N/A: no matrix rows affected.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: internal agent harness bound; no release-cut surface touched.
  • Linked issue closed via Closes #NNN in the ## Related section — N/A: Sentry issue OPENHUMAN-TAURI-QX, no GitHub issue tracking item.

Impact

  • Runtime/platform: Rust core only (src/openhuman/agent/harness/session/). Affects desktop (Tauri host) and openhuman-core CLI equally — both paths share the agent harness. No frontend, no mobile, no web surface touched.
  • Performance: net positive — resume requests with long transcripts will now fit inside the context window instead of 400ing. Bounding cost is a single Vec slice/clone on resume entry; negligible.
  • Security: none.
  • Migration / compatibility: no schema, RPC, or persistence changes. Cached transcripts on disk are unaffected — they're just truncated in memory at load time when oversized.
  • Behavioral change for resumed sessions: a resumed agent that previously sent (and provider-rejected) a 200k+-token transcript will now see only the most recent max_history_messages of that transcript on iteration 1. Older context is dropped at the wire layer, mirroring what trim_history already does for in-process histories. Logged at warn when this actually triggers.

Related

Summary by CodeRabbit

  • Bug Fixes

    • Session transcript history is now properly limited to the configured maximum message count when resuming conversations. This ensures that long prior conversation histories don't unnecessarily consume memory or impact performance during session restoration.
  • Tests

    • Added tests covering transcript history bounding and session resume behavior.

Review Change Stack

Added a new method to limit the number of cached transcript messages to the configured maximum while preserving the leading system message if present. Updated the resume logic to utilize this method, ensuring that the cached messages do not exceed the defined history window. Added logging to warn when messages are trimmed.
…essages

Introduced a new test to verify that the transcript resume functionality correctly limits the number of cached messages to the configured maximum, ensuring the leading system message is preserved. This test checks the integrity of the resumed messages after persisting a session transcript with more messages than the limit.
Enhanced the test suite by adding two new tests to verify the behavior of the transcript message bounding functionality. The first test ensures that the history window limit is respected when resuming messages, while the second test checks that the cached messages retain the correct tail when exceeding the maximum limit. These tests help confirm the integrity of message handling in various scenarios.
@coderabbitai
Contributor

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29437015-b149-439c-831a-029a3f4efcd1

📥 Commits

Reviewing files that changed from the base of the PR and between 1f98614 and 3cee419.

📒 Files selected for processing (4)
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/tests.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/turn_tests.rs

📝 Walkthrough


This PR enforces history window limits on resumed cached transcripts. A new helper method bounds ChatMessage vectors to max_history_messages while preserving system messages. The helper is integrated into session resume and seed resume paths, each with logging and tests confirming tail-message preservation.

Changes

Transcript history window bounding

| Layer / File(s) | Summary |
|---|---|
| Bounded transcript cache helper (src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/session/tests.rs) | New bound_cached_transcript_messages method limits ChatMessage vectors to max_history_messages, preserving a leading system message and otherwise keeping only the most recent messages. Unit test validates behavior when input lacks system prefix. |
| Session transcript resume bounding (src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/session/turn_tests.rs) | try_load_session_transcript applies the bounding helper to loaded session.messages and logs when truncation occurs. Integration test persists an overlong transcript, resumes with max_history_messages=5, and asserts the cached window contains system message plus last two user/assistant pairs. |
| Seed resume context bounding (src/openhuman/agent/harness/session/runtime.rs, src/openhuman/agent/harness/session/tests.rs) | seed_resume_from_messages bounds the prepared cached transcript before storage and logs with before/after counts when history is reduced. Unit test configures the history limit, seeds multiple prior turns, and asserts truncation preserves system message and tail turns only. |
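The resume-path behavior summarized in the table (bound first, then log before/after counts only when something was trimmed) might look roughly like the sketch below. Everything here is illustrative: `prime_resume`, `PrimedTranscript`, and the compact `bound` helper are hypothetical names, and `eprintln!` stands in for whatever warn-level logging facade the harness actually uses.

```rust
/// Stand-in message type; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Hypothetical result of priming a resumed session.
pub struct PrimedTranscript {
    pub messages: Vec<ChatMessage>,
    pub original_len: usize,
}

impl PrimedTranscript {
    /// Number of messages dropped by the bound.
    pub fn trimmed(&self) -> usize {
        self.original_len - self.messages.len()
    }
}

/// Compact version of the bounding rule: keep a leading system message
/// plus the newest messages, up to `max` in total (floor of 1).
fn bound(messages: Vec<ChatMessage>, max: usize) -> Vec<ChatMessage> {
    let max = max.max(1);
    if messages.len() <= max {
        return messages;
    }
    let keep_system = messages[0].role == "system";
    let tail_len = if keep_system { max - 1 } else { max };
    let skip = messages.len() - tail_len;
    messages
        .into_iter()
        .enumerate()
        .filter(|(i, _)| (keep_system && *i == 0) || *i >= skip)
        .map(|(_, m)| m)
        .collect()
}

/// Bound a loaded transcript and report the trim, if any.
pub fn prime_resume(loaded: Vec<ChatMessage>, max_history_messages: usize) -> PrimedTranscript {
    let original_len = loaded.len();
    let messages = bound(loaded, max_history_messages);
    if messages.len() < original_len {
        // Stand-in for a warn-level log with before/after counts.
        eprintln!(
            "warn: resume transcript trimmed from {} to {} messages (max_history_messages={})",
            original_len,
            messages.len(),
            max_history_messages
        );
    }
    PrimedTranscript { messages, original_len }
}
```

With a transcript of one system message followed by six user/assistant pairs and a limit of 5, this sketch keeps the system message and the last two pairs, which matches the scenario the integration test in the table describes.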

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 When old transcripts seek to return,
We trim their tales at history's turn,
System whispers stay pristine and clear,
While recent echoes ring most dear,
Messages bounded by the cap,
No lengthy past can close that gap!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: bounding cached resume transcripts by the max_history_messages configuration, which is the primary objective of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 19, 2026 15:52
@YellowSnnowmann YellowSnnowmann requested a review from a team May 19, 2026 15:52
@CodeGhost21 CodeGhost21 self-requested a review May 19, 2026 18:08
Contributor

@CodeGhost21 CodeGhost21 left a comment


Non-blocking review — fix is sound, scoped, and tested. Two observations worth surfacing:

1. Message-count bound vs. token-count root cause. The Sentry case is a 203,783 / 202,752 token overflow. With any non-trivial max_history_messages, a handful of large tool-result messages can still exceed the context window on iteration 1, because reduce_before_call only fires on iteration 2+. This change reduces the failure rate substantially but doesn't fully eliminate the class of bug. The PR description acknowledges this and defers a token-aware bound — worth filing a follow-up issue for that path if one doesn't already exist, so it doesn't get lost.

2. max_history_messages = 1 + system prefix is degenerate. With max=1 and a leading system message, bound_cached_transcript_messages returns [system] only — no user/assistant message at all, which most providers will reject with a different 400. The .max(1) floor protects against underflow but the implicit contract is really "max ≥ 2 when a system message is present." Nobody sets max=1 in practice, so this is theoretical, but a .max(2) floor in the system-prefix branch would make the helper self-consistent. Trivial change if desired.
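The degenerate case in observation 2 can be made concrete with a small index-level sketch. The names `effective_history_cap` and `kept_indices` are hypothetical and exist only for this illustration; they model which positions of a transcript survive the bound, under the assumption that the suggested `.max(2)` floor applies only when a system prefix is present.

```rust
/// Hypothetical cap selection: floor of 2 when a system message leads the
/// transcript (so at least one non-system message survives), else floor of 1.
pub fn effective_history_cap(max_history_messages: usize, has_system_prefix: bool) -> usize {
    if has_system_prefix {
        max_history_messages.max(2)
    } else {
        max_history_messages.max(1)
    }
}

/// Indices of a `len`-message transcript that survive a bound of `cap`.
pub fn kept_indices(len: usize, cap: usize, has_system_prefix: bool) -> Vec<usize> {
    let cap = cap.max(1); // guard against underflow for cap == 0
    if len <= cap {
        return (0..len).collect();
    }
    if has_system_prefix {
        // Index 0 (system) plus the last `cap - 1` positions.
        let tail_start = len - (cap - 1);
        std::iter::once(0).chain(tail_start..len).collect()
    } else {
        // The last `cap` positions.
        (len - cap..len).collect()
    }
}
```

With a five-message transcript and a system prefix, a cap of 1 keeps only index 0, while the floored cap of 2 keeps indices 0 and 4, so the newest message still reaches the provider.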

LGTM otherwise — helper is well-placed, doc-commented for the non-obvious "why," and the symmetric application at both resume entry points is the right shape for the fix.

@senamakel senamakel merged commit 525d7c7 into tinyhumansai:main May 19, 2026
27 checks passed

3 participants