
fix(llm): include system prompt tokens in memory compressor budget #381

Open
0xhis wants to merge 2 commits into usestrix:main from 0xhis:fix/memory-compressor-token-budget

Conversation

0xhis commented Mar 21, 2026

Summary

The memory compressor was not accounting for system prompt and agent identity tokens when calculating the conversation budget. This caused premature history truncation on long scans with large system prompts.

Changes

  • Add reserved_tokens parameter to compress_history() that subtracts already-accounted tokens from the budget before applying limits
  • Calculate reserved_tokens from system prompt and agent identity messages in _prepare_messages()
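The shape of the change can be sketched as follows. Only `compress_history` and `reserved_tokens` come from the diff; the `MAX_TOTAL_TOKENS` value, the token counter, and the fallback truncation are illustrative stand-ins:

```python
MAX_TOTAL_TOKENS = 100_000  # assumed value; the real limit lives in memory_compressor.py

def get_message_tokens(msg: dict, model_name: str) -> int:
    # Crude stand-in for the real per-message token counter.
    return len(str(msg.get("content", ""))) // 4

def compress_history(messages, model_name="gpt-4", reserved_tokens=0):
    # Frame tokens (system prompt, agent identity) now count toward the budget.
    total_tokens = reserved_tokens + sum(
        get_message_tokens(m, model_name) for m in messages
    )
    if total_tokens <= MAX_TOTAL_TOKENS * 0.9:
        return messages  # under budget: history passes through untouched
    # Over budget: the real code summarizes; here we just keep the newest messages.
    return messages[-10:]
```

With the default `reserved_tokens=0`, existing callers see identical behavior.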

Files Changed

  • strix/llm/llm.py (+7/-2)
  • strix/llm/memory_compressor.py (+8/-1)

Split from #328.

@0xhis 0xhis marked this pull request as ready for review March 21, 2026 08:02
Copilot AI review requested due to automatic review settings March 21, 2026 08:02
greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR fixes a budget-accounting gap in the memory compressor: the system prompt and agent-identity messages were not included in the token total, so compression could be triggered too late (or not at all) on long scans with large system prompts.

Changes:

  • compress_history() gains a reserved_tokens: int = 0 parameter that is added to the conversation-history token count before the 90 % threshold check. The default of 0 is fully backward-compatible.
  • _prepare_messages() computes reserved_tokens from the frame messages it builds (system prompt + optional agent-identity block) before calling compress_history(), correctly separating the two accounting domains.
  • _get_message_tokens is re-exported from memory_compressor.py and imported in llm.py to avoid duplicating the token-counting logic. The symbol carries a leading _, which conventionally marks it as a private/internal helper — consider renaming it to get_message_tokens or exposing it through the MemoryCompressor class to make the public-export intent clear.
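The caller-side accounting described above can be sketched like this; the function and parameter names beyond `reserved_tokens` and the frame concept are hypothetical, and the counter and compressor are passed in to keep the sketch self-contained:

```python
def prepare_messages(system_prompt, agent_identity, history,
                     count_tokens, compress_history):
    # Build the frame: system prompt plus an optional agent-identity block.
    frame = [{"role": "system", "content": system_prompt}]
    if agent_identity:
        frame.append({"role": "system", "content": agent_identity})
    # Count the frame's tokens up front and hand them to the compressor,
    # which only ever sees the conversation history itself.
    reserved = sum(count_tokens(m) for m in frame)
    compressed = compress_history(history, reserved_tokens=reserved)
    return frame + compressed
```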

Confidence Score: 5/5

  • Safe to merge; the fix is correct, backward-compatible, and well-scoped.
  • The root cause (missing frame-token accounting) is addressed directly and the approach is sound. The reserved_tokens default of 0 preserves existing behavior for all other callers. The only finding is a non-blocking style suggestion about exporting a private symbol across module boundaries, which does not affect correctness or safety.
  • No files require special attention.

Important Files Changed

Filename Overview
strix/llm/llm.py Calculates reserved_tokens from the system-prompt and agent-identity messages before compression, and forwards them to compress_history(). Logic is correct; minor style concern around importing the private _get_message_tokens symbol.
strix/llm/memory_compressor.py Adds reserved_tokens: int = 0 parameter to compress_history() and folds it into the total_tokens sum before the 90 % threshold check. Backward-compatible default and clear docstring. No issues found.

Last reviewed commit: "fix(llm): include sy..."

strix/llm/llm.py (Outdated)

```diff
  from strix.config import Config
  from strix.llm.config import LLMConfig
- from strix.llm.memory_compressor import MemoryCompressor
+ from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
```
P2 Importing a private helper across module boundaries

_get_message_tokens is prefixed with _, which conventionally signals it is an internal implementation detail of memory_compressor.py. Importing it directly into llm.py couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.

Consider either:

  1. Renaming it to get_message_tokens (dropping the leading _) to make the public-export intent explicit, or
  2. Exposing a small helper on MemoryCompressor itself (e.g. MemoryCompressor.count_tokens(messages)) so the compressor owns all token-counting logic.
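Option 2 could look like the sketch below: the compressor owns all token counting and exposes a public method, so callers never import the private helper. `count_tokens` is the reviewer's suggested name; the counting logic here is a placeholder, not the real tokenizer:

```python
class MemoryCompressor:
    def __init__(self, model_name: str = "gpt-4"):
        self.model_name = model_name

    def _get_message_tokens(self, msg: dict) -> int:
        # Placeholder for the real tokenizer-backed counter.
        return len(str(msg.get("content", ""))) // 4

    def count_tokens(self, messages: list[dict]) -> int:
        # Public entry point: total tokens across a list of messages.
        return sum(self._get_message_tokens(m) for m in messages)
```

`llm.py` would then call `compressor.count_tokens(frame)` instead of importing the underscore-prefixed function.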
Suggested change:

```diff
- from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
+ from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens
```

Copilot AI left a comment

Pull request overview

This PR fixes token budgeting in the LLM memory compressor so that system prompt and agent identity “framing” tokens are accounted for when deciding whether (and how much) to compress conversation history, preventing unexpected truncation behavior on long runs with large system prompts.

Changes:

  • Added a reserved_tokens parameter to MemoryCompressor.compress_history() to incorporate non-history prompt tokens into the budget calculation.
  • Calculated reserved_tokens in LLM._prepare_messages() from the system prompt + agent identity messages and passed it into the compressor.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
strix/llm/memory_compressor.py Extends history compression budgeting with a reserved_tokens parameter and uses it in token calculations.
strix/llm/llm.py Computes reserved/system framing tokens during message preparation and passes them to the memory compressor.
Comments suppressed due to low confidence (1)

strix/llm/memory_compressor.py:216

  • compress_history() can still return a history that exceeds the token budget when reserved_tokens (or the non-compressible system/recent messages) already consume most/all of the limit. Consider explicitly handling reserved_tokens >= MAX_TOTAL_TOKENS * 0.9 (and/or re-checking after summarization) by dropping more of the history (e.g., reduce recent messages, return only system messages, or return an empty history) so the final prompt stays within limits.
```python
        total_tokens = reserved_tokens + sum(
            _get_message_tokens(msg, model_name) for msg in system_msgs + regular_msgs
        )

        if total_tokens <= MAX_TOTAL_TOKENS * 0.9:
            return messages
```
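One way to implement the suggested guard, as a sketch: bail out to system-only output when the reserved frame alone exhausts the budget, then keep the newest history messages while they still fit. All names except `reserved_tokens` and `MAX_TOTAL_TOKENS` are illustrative, and the limit value is assumed:

```python
MAX_TOTAL_TOKENS = 100_000  # assumed value
BUDGET = int(MAX_TOTAL_TOKENS * 0.9)

def clamp_history(system_msgs, regular_msgs, reserved_tokens, tokens_of):
    if reserved_tokens >= BUDGET:
        # The frame alone blows the budget: drop all regular history.
        return system_msgs
    remaining = BUDGET - reserved_tokens - sum(tokens_of(m) for m in system_msgs)
    kept = []
    for msg in reversed(regular_msgs):  # walk newest-first
        cost = tokens_of(msg)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return system_msgs + list(reversed(kept))
```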


Comment on lines +173 to +178
Args:
messages: Conversation history messages to compress.
reserved_tokens: Tokens already reserved for system prompt and
other framing messages outside the conversation history.
Subtracted from the budget before checking limits.

Copilot AI commented Mar 21, 2026

The docstring says reserved_tokens is “Subtracted from the budget before checking limits”, but the implementation adds it into total_tokens and compares against the fixed budget. Either update the wording to match the behavior (reserved tokens are included in the total prompt token count) or change the logic to compute an available_budget = MAX_TOTAL_TOKENS * 0.9 - reserved_tokens and compare history tokens against that.
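The second option described here can be sketched as below; deducting reserved tokens from the budget is arithmetically equivalent to folding them into the total, so only the docstring wording changes meaning. The `MAX_TOTAL_TOKENS` value is assumed:

```python
MAX_TOTAL_TOKENS = 100_000  # assumed value

def needs_compression(history_tokens: int, reserved_tokens: int = 0) -> bool:
    # Budget available to the history after the frame is accounted for.
    available_budget = MAX_TOTAL_TOKENS * 0.9 - reserved_tokens
    return history_tokens > available_budget
```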

strix/llm/llm.py (Outdated)

```diff
  from strix.config import Config
  from strix.llm.config import LLMConfig
- from strix.llm.memory_compressor import MemoryCompressor
+ from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
```
Copilot AI commented Mar 21, 2026

llm.py imports _get_message_tokens from memory_compressor.py, but the leading underscore indicates this is intended as a private helper. To reduce coupling, consider exposing a public token-counting helper (e.g., get_message_tokens) or adding a MemoryCompressor.get_message_tokens() method, and have LLM call that instead of importing a private symbol.

Suggested change:

```diff
- from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
+ from strix.llm.memory_compressor import MemoryCompressor
```

0xhis force-pushed the fix/memory-compressor-token-budget branch from 2502a59 to 22d978e on March 21, 2026 08:28
0xhis added 2 commits March 21, 2026 01:49
The memory compressor was not accounting for system prompt and agent
identity tokens when calculating the conversation budget. This caused
premature history truncation on long scans with large system prompts.

Adds a reserved_tokens parameter to compress_history() that subtracts
already-accounted tokens from the budget before applying limits.
0xhis force-pushed the fix/memory-compressor-token-budget branch from 22d978e to 63035db on March 21, 2026 08:52