Skip to content

fix(agent): guard agent prompts against model max-context limit (#2074)#2100

Merged
senamakel merged 4 commits into
tinyhumansai:mainfrom
CodeGhost21:cursor/a01-2074-agent-token-budget
May 20, 2026
Merged

fix(agent): guard agent prompts against model max-context limit (#2074)#2100
senamakel merged 4 commits into
tinyhumansai:mainfrom
CodeGhost21:cursor/a01-2074-agent-token-budget

Conversation

@CodeGhost21
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 commented May 18, 2026

Summary

  • Add inference::model_context to resolve known per-model context windows (OpenHuman tier aliases, Copilot Haiku 200K, common Claude/GPT patterns).
  • Add agent::harness::token_budget to estimate prompt tokens (~4 chars/token) and trim oldest non-system history before LLM dispatch.
  • Wire pre-dispatch trimming + warning logs into run_tool_call_loop and the main Agent session turn path.
  • Populate Route::context_window when building model routes from config.

Problem

The agent harness built conversation context (system prompt + history + tool results) without checking total token count against the target model's context window before dispatching to the LLM. Long conversations or large tool outputs exceeded limits (e.g. github_copilot/claude-haiku-4.5 with a 200K cap), causing upstream 400 BadRequest errors (~160 Sentry events on Windows).

Solution

Before each provider chat() call, look up the model's context window via context_window_for_model, estimate total prompt tokens, and drop the oldest non-system messages until the payload fits (reserving ~10% / 8K tokens for completion and tool-schema overhead). When trimming occurs, emit a log::warn! with model id, context window, original vs final token estimates, and messages removed. ContextGuard is seeded with the known window when available so post-response utilization checks remain meaningful.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — enforced server-side by .github/workflows/coverage.yml (Vitest + cargo-llvm-cov merged via diff-cover). Local full pnpm test:coverage / pnpm test:rust not run in this session; new Rust modules include dedicated unit tests for pass-through, over-limit truncation, and model lookup edge cases. CI is the authoritative gate.
  • Coverage matrix updated — N/A: Rust-only agent/inference change; no user-facing feature row added/removed
  • All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no matrix feature IDs
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Desktop agent turns and channel/tool-loop paths now proactively trim oversized histories before upstream LLM calls when the model window is known.
  • Unknown models skip pre-dispatch trimming (no guessing); existing post-response ContextGuard / autocompaction behavior is unchanged.
  • No migration or config changes required.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: cursor/a01-2074-agent-token-budget
  • Commit SHA: 168f0cd5976111b3fb5d71b951918bc56bad98fe

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: cargo test --lib token_budget:: (5 passed), cargo test --lib model_context:: (4 passed)
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml --all --check; pre-push pnpm rust:check (Tauri shell) passed
  • Tauri fmt/check (if changed): included in pre-push hook

Validation Blocked

  • command: pnpm test:coverage and pnpm test:rust (full diff-cover gate)
  • error: Not run locally in this session — CI will enforce ≥80% on changed lines
  • impact: Merge gate depends on GitHub Actions coverage workflow; new Rust modules include dedicated unit tests for pass-through, over-limit truncation, and model lookup edge cases

Behavior Changes

  • Intended behavior change: Pre-dispatch token budget enforcement for known models
  • User-visible effect: Long agent conversations should stop failing with upstream context-length 400 errors; users may see slightly shorter retained history after trimming (logged at warn level)

Parity Contract

  • Legacy behavior preserved: Unknown models and post-response compaction/summarization paths unchanged
  • Guard/fallback/dispatch parity checks: ContextGuard still updated from provider usage; trimming only runs when context_window_for_model returns Some

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): N/A

Made with Cursor

Summary by CodeRabbit

  • New Features
    • Automatic conversation-history trimming before each model request to respect model context limits.
    • Per-model context-window detection and application, including a mapping of known models to their token windows.
    • Preservation of important system messages when trimming; trimming events are logged with token/message statistics.
    • Enhanced routing metadata now carries model context-window information for more accurate dispatching.

Review Change Stack

…humansai#2074)

Estimate conversation tokens before LLM dispatch using known per-model
context windows, trim oldest non-system history when over budget, and log
warnings with original vs trimmed counts.

Co-authored-by: Cursor <cursoragent@cursor.com>
@CodeGhost21 CodeGhost21 requested a review from a team May 18, 2026 12:09
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 829d6233-ec59-4f98-8d7e-07888a224d72

📥 Commits

Reviewing files that changed from the base of the PR and between 168f0cd and 1c0cc95.

📒 Files selected for processing (3)
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/token_budget.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/token_budget.rs

📝 Walkthrough

Walkthrough

Adds model-context-window discovery, token estimation, and pre-dispatch budgeting: resolves per-model context windows, estimates tokens, trims oldest non-system messages (preserving system messages) in conversation history and provider-bound chat before each LLM request, and surfaces trimming outcomes via logs.

Changes

Token Budgeting and Context Window Integration

Layer / File(s) Summary
Model context window resolution
src/openhuman/inference/model_context.rs, src/openhuman/inference/mod.rs
Adds context_window_for_model(model: &str) -> Option<u64> with tier constants and substring mappings; includes unit tests and re-exports the function from inference.
Token estimation and message trimming
src/openhuman/agent/harness/token_budget.rs, src/openhuman/agent/harness/mod.rs
Implements ~4-char-per-token estimators for ChatMessage and ConversationMessage, DEFAULT_OUTPUT_RESERVE_TOKENS, and two public trim entrypoints (trim_chat_messages_to_budget, trim_conversation_history_to_budget) that remove oldest non-system messages until budget fits; includes TokenBudgetOutcome and unit tests.
Provider routing context window integration
src/openhuman/inference/provider/router.rs, src/openhuman/inference/provider/ops.rs, src/openhuman/inference/provider/router_test.rs
Adds context_window: Option<u64> to Route and populates it during provider creation via context_window_for_model for each configured model route.
Agent harness pre-dispatch trimming
src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/mod.rs
Imports context lookup and trimming helpers, computes per-iteration context_window, trims self.history and provider-bound messages to budget before dispatch, and derives ContextGuard using the model context window when available.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus

Tokens tallied, old lines gently pruned, 🐇
Context windows set so requests stay tuned,
System lines kept, oldest go to sleep,
Trimmed with a hop so responses fit deep.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding token-budget guards to prevent agent prompts from exceeding model context limits, directly addressing the issue reference #2074.
Linked Issues check ✅ Passed All objectives from #2074 are met: per-model context limits added to routing, pre-dispatch token estimation and truncation implemented, oldest messages trimmed first, and warning logs added for truncation events.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing token-budget guards: model context resolution, token estimation/trimming logic, integration into dispatch paths, and routing configuration updates.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@CodeGhost21 CodeGhost21 added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug labels May 18, 2026
@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 18, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/inference/provider/router.rs (1)

63-69: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Route.context_window is discarded during router construction.

At Line 63 onward, RouterProvider only stores (provider_index, model), so the new context_window metadata is dropped immediately. That prevents downstream budget enforcement from using the configured routed-model window.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/inference/provider/router.rs` around lines 63 - 69, The router
is dropping Route.context_window because resolved_routes is built as
HashMap<String, (usize, String)> and only stores (provider_index, model); update
the resolved_routes shape and the RouterProvider construction to preserve the
context window (e.g., include a third element or a small struct/tuple containing
the context_window), propagate Route.context_window from routes.into_iter()
(using name_to_index to map provider_name) into that stored value, and adjust
any uses of RouterProvider/resolved_routes to read the context window for
downstream budget enforcement (referencing resolved_routes,
Route.context_window, name_to_index, and RouterProvider).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 427-440: The pre-dispatch trimming is using
context_window_for_model(&effective_model) (and the same pattern around the
later block at 528-541) which uses the alias/tier’s window instead of the actual
routed provider model’s window; change the call sites to compute the context
window from the resolved provider model ID used for the provider call in this
iteration (the same resolved model/route value used when making the provider
request) and pass that into trim_conversation_history_to_budget instead of
&effective_model so trimming reflects the real upstream model limit.

In `@src/openhuman/agent/harness/token_budget.rs`:
- Around line 111-143: The current logic in trim_messages_to_budget (the block
using system_messages and other_messages, is_system, estimate, and max_tokens)
rebuilds messages as system_messages followed by other_messages which breaks
original ordering when system messages appear after user/other messages; instead
identify non-system message indices in the original messages vector, compute how
many oldest non-system indices must be removed to meet max_tokens (using
estimate and keeping all system messages), then reconstruct *messages by
iterating the original messages in order and skipping only the selected
non-system indices (removing oldest-first) so relative order of all remaining
messages is preserved; update any counters like removed accordingly.

---

Outside diff comments:
In `@src/openhuman/inference/provider/router.rs`:
- Around line 63-69: The router is dropping Route.context_window because
resolved_routes is built as HashMap<String, (usize, String)> and only stores
(provider_index, model); update the resolved_routes shape and the RouterProvider
construction to preserve the context window (e.g., include a third element or a
small struct/tuple containing the context_window), propagate
Route.context_window from routes.into_iter() (using name_to_index to map
provider_name) into that stored value, and adjust any uses of
RouterProvider/resolved_routes to read the context window for downstream budget
enforcement (referencing resolved_routes, Route.context_window, name_to_index,
and RouterProvider).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7a6f7f6d-758a-44a2-8ea3-7b6460aae33f

📥 Commits

Reviewing files that changed from the base of the PR and between 70fdedc and 168f0cd.

📒 Files selected for processing (9)
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/token_budget.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/inference/mod.rs
  • src/openhuman/inference/model_context.rs
  • src/openhuman/inference/provider/ops.rs
  • src/openhuman/inference/provider/router.rs
  • src/openhuman/inference/provider/router_test.rs

Comment on lines +427 to +440
if let Some(context_window) = context_window_for_model(&effective_model) {
let budget_outcome =
trim_conversation_history_to_budget(&mut self.history, context_window);
if budget_outcome.trimmed {
log::warn!(
"[agent_loop] pre-dispatch history trimmed model={} context_window={} original_tokens={} final_tokens={} messages_removed={}",
effective_model,
context_window,
budget_outcome.original_tokens,
budget_outcome.final_tokens,
budget_outcome.messages_removed
);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Budgeting uses alias model window, not the actually routed model window.

Both trims use context_window_for_model(&effective_model). When effective_model is a tier alias (e.g., reasoning-v1) but routing resolves to a different provider model, the budget can be wrong and still exceed the real upstream limit.

The pre-dispatch budget should use the resolved route/model context window (or resolved model id) used by the provider call in the same iteration.

Also applies to: 528-541

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/session/turn.rs` around lines 427 - 440, The
pre-dispatch trimming is using context_window_for_model(&effective_model) (and
the same pattern around the later block at 528-541) which uses the alias/tier’s
window instead of the actual routed provider model’s window; change the call
sites to compute the context window from the resolved provider model ID used for
the provider call in this iteration (the same resolved model/route value used
when making the provider request) and pass that into
trim_conversation_history_to_budget instead of &effective_model so trimming
reflects the real upstream model limit.

Comment on lines +111 to +143
let mut system_messages: Vec<T> = Vec::new();
let mut other_messages: Vec<T> = Vec::new();
for msg in messages.drain(..) {
if is_system(&msg) {
system_messages.push(msg);
} else {
other_messages.push(msg);
}
}

let mut removed = 0usize;
while !other_messages.is_empty() {
let total: usize = system_messages
.iter()
.map(&estimate)
.chain(other_messages.iter().map(&estimate))
.sum();
if total <= max_tokens {
break;
}
other_messages.remove(0);
removed += 1;
}

let final_tokens: usize = system_messages
.iter()
.map(&estimate)
.chain(other_messages.iter().map(&estimate))
.sum();

*messages = system_messages;
messages.extend(other_messages);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve original message order while trimming.

trim_messages_to_budget currently rebuilds the vector as system_messages + other_messages, which reorders history if any system message appears after non-system messages. That can change prompt semantics unexpectedly. Keep original relative order and only drop selected non-system indices oldest-first.

Suggested fix
-    let mut system_messages: Vec<T> = Vec::new();
-    let mut other_messages: Vec<T> = Vec::new();
-    for msg in messages.drain(..) {
-        if is_system(&msg) {
-            system_messages.push(msg);
-        } else {
-            other_messages.push(msg);
-        }
-    }
+    let mut kept: Vec<T> = messages.drain(..).collect();
+    let mut removable_positions: Vec<usize> = kept
+        .iter()
+        .enumerate()
+        .filter_map(|(idx, msg)| (!is_system(msg)).then_some(idx))
+        .collect();

-    let mut removed = 0usize;
-    while !other_messages.is_empty() {
-        let total: usize = system_messages
-            .iter()
-            .map(&estimate)
-            .chain(other_messages.iter().map(&estimate))
-            .sum();
+    let mut removed = 0usize;
+    while !removable_positions.is_empty() {
+        let total: usize = kept.iter().map(&estimate).sum();
         if total <= max_tokens {
             break;
         }
-        other_messages.remove(0);
+        let remove_at = removable_positions.remove(0) - removed;
+        kept.remove(remove_at);
         removed += 1;
     }

-    let final_tokens: usize = system_messages
-        .iter()
-        .map(&estimate)
-        .chain(other_messages.iter().map(&estimate))
-        .sum();
-
-    *messages = system_messages;
-    messages.extend(other_messages);
+    let final_tokens: usize = kept.iter().map(&estimate).sum();
+    *messages = kept;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/token_budget.rs` around lines 111 - 143, The
current logic in trim_messages_to_budget (the block using system_messages and
other_messages, is_system, estimate, and max_tokens) rebuilds messages as
system_messages followed by other_messages which breaks original ordering when
system messages appear after user/other messages; instead identify non-system
message indices in the original messages vector, compute how many oldest
non-system indices must be removed to meet max_tokens (using estimate and
keeping all system messages), then reconstruct *messages by iterating the
original messages in order and skipping only the selected non-system indices
(removing oldest-first) so relative order of all remaining messages is
preserved; update any counters like removed accordingly.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — graycyrus

Solid fix for a real production issue (160 Sentry events from context-window overflows). The approach is sound: look up per-model context windows, estimate tokens, trim oldest non-system messages before dispatch, and log when trimming occurs. Tests cover the key paths, CI is green, and the ContextGuard integration in tool_loop.rs is a nice touch.

CodeRabbit already flagged the alias-vs-routed-model mismatch in turn.rs and the message reordering in trim_messages_to_budget — both valid, won't repeat here.

Change summary

File Change Description
agent/harness/mod.rs module registration Adds mod token_budget
agent/harness/session/turn.rs feature Pre-dispatch trimming of conversation history + provider messages
agent/harness/token_budget.rs new module Generic trim-to-budget with ~4 chars/token estimation, 5 unit tests
agent/harness/tool_loop.rs feature Pre-dispatch trimming + ContextGuard seeded with known window
inference/mod.rs wiring Exports model_context module
inference/model_context.rs new module Model-to-context-window lookup table + tier alias resolution, 4 unit tests
inference/provider/ops.rs wiring Populates Route::context_window from context_window_for_model
inference/provider/router.rs schema Adds context_window: Option<u64> to Route
inference/provider/router_test.rs test fix Adds context_window: None to test fixture

Additional finding

Trim + warn log pattern duplication (minor): The same lookup → trim → warn-log block is repeated three times across turn.rs (lines 427-438 and 528-541) and tool_loop.rs (lines 243-264). Consider extracting a small helper (e.g. try_trim_to_budget(messages, model, label)) to keep the log format and reserve logic in one place — right now a change to the log fields or reserve policy requires updating three call sites.

("gpt-4-turbo", 128_000),
("gpt-4", 128_000),
("gpt-3.5", 16_385),
("o1", 200_000),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] "o1" and "o3" are dangerously broad for a contains() match.

Any model ID that happens to include the substring o1 or o3 will get a 200K context window — e.g. a local model named "solo1-7b", "proto3-chat", or a provider-prefixed path like "ollama/mistral-for-o1-benchmark". Since this sets the budget for trimming, a false positive means either:

  • Under-trimming (real window < 200K) → the exact 400 errors this PR fixes
  • Over-trimming (real window > 200K) → unnecessary context loss

Suggestion: use word-boundary-aware matching or exact prefixes:

// Option A: require the pattern to be a full segment
("o1", 200_000),   // match "o1", "o1-mini", "o1-preview", NOT "proto3"
("o3", 200_000),   // match "o3", "o3-mini", NOT "solo3"
// In the matching logic:
let segments: Vec<&str> = lower.split(&['/', '-', '_', ':'][..]).collect();
for (pattern, window) in MODEL_CONTEXT_PATTERNS {
    if segments.iter().any(|s| s == pattern) || lower.contains(pattern) {
        // ...
    }
}

Or simply use longer patterns: "o1-", "o1" (exact), "/o1" depending on how model IDs are formatted in practice.

@senamakel senamakel self-assigned this May 19, 2026
senamakel added 2 commits May 19, 2026 16:47
trim_messages_to_budget previously rebuilt the vector as
system_messages ++ other_messages, which reordered history whenever a
system message appeared after non-system messages — changing prompt
semantics unexpectedly. Walk the original vector instead and drop only
the oldest non-system indices, preserving relative order of every
retained message. Adds a regression test for a late-positioned system
message.

Addresses CodeRabbit review on token_budget.rs:143.
…ives

The "o1" and "o3" substring patterns matched any model id containing
those two characters — e.g. "solo1-7b", "proto3-chat", or
"ollama/octo3thing" would inherit OpenAI's 200K window and either
under-trim (real window < 200K, exact 400-error class this PR is meant
to fix) or over-trim. Mark these patterns as segment-matched so they
only trigger when "o1"/"o3" appears as a full `-`/`_`/`/`/`:`/`.`
delimited segment. Adds tests for legitimate ids (`o1`, `o1-mini`,
`o3-mini`, `openai/o1-preview`) and false-positive guards (`solo1-7b`,
`proto3-chat`, `octo3thing`).

Addresses graycyrus review on model_context.rs:32.
"[agent_loop] pre-dispatch token budget ok"
);
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted — deferring to a follow-up.

Agreed the trim+warn block is duplicated across three call sites. The cleanest extraction needs to handle both ChatMessage and ConversationMessage (different trim functions), plus the ContextGuard seeding in tool_loop, so it's a small refactor rather than a one-liner. Keeping this PR scoped to the production fix; will follow up with a try_trim_to_budget(label, model, …) helper in a separate change.

self.history.len()
);

if let Some(context_window) = context_window_for_model(&effective_model) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — needs router plumbing.

This is a valid concern, but fixing it cleanly requires plumbing the resolved-route context window from RouterProvider out to the harness (the harness only sees the alias model id; route resolution happens inside Provider::chat). Route::context_window is now populated, but no public API exposes it to the caller.

For the OpenHuman backend path (no model_routes), context_window_for_model(alias) already returns the correct tier window (200K for reasoning-v1 / agentic-v1 / coding-v1) via tier_context_window, so this PR's behavior is correct there.

The remaining gap — user-configured model_routes pointing a tier at a smaller-context custom-provider model (e.g. reasoning-v1gpt-3.5-turbo) — would trim too late. Worth a follow-up that exposes the resolved context window through Provider (e.g. fn resolve_context_window(&self, model: &str) -> Option<u64>) and threads it into the harness. Tracking separately to keep this PR scoped to the 200K-haiku Sentry fix.

@senamakel senamakel merged commit 389d699 into tinyhumansai:main May 20, 2026
29 checks passed
mtkik pushed a commit to mtkik/openhuman-meet that referenced this pull request May 21, 2026
…humansai#2074) (tinyhumansai#2100)

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>
CodeGhost21 added a commit to CodeGhost21/openhuman that referenced this pull request May 22, 2026
…humansai#2074) (tinyhumansai#2100)

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Steven Enamakel <enamakel@tinyhumans.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(agent): no token count guard — prompt exceeds model max context

3 participants