fix(llm): include system prompt tokens in memory compressor budget #381

0xhis wants to merge 2 commits into `usestrix:main`
Conversation
Greptile Summary

This PR fixes a budget-accounting gap in the memory compressor: the system prompt and agent-identity messages were not included in the token total, so compression could be triggered too late (or not at all) on long scans with large system prompts.

Changes:
- Added a `reserved_tokens` parameter to `compress_history()` so non-history prompt tokens count toward the budget.
- Computed `reserved_tokens` from the system prompt and agent identity messages in `_prepare_messages()` and passed it to the compressor.
Confidence Score: 5/5
Important Files Changed
Prompt To Fix All With AI

This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 13
Comment:
**Importing a private helper across module boundaries**
`_get_message_tokens` is prefixed with `_`, which conventionally signals it is an internal implementation detail of `memory_compressor.py`. Importing it directly into `llm.py` couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.
Consider either:
1. Renaming it to `get_message_tokens` (dropping the leading `_`) to make the public-export intent explicit, or
2. Exposing a small helper on `MemoryCompressor` itself (e.g. `MemoryCompressor.count_tokens(messages)`) so the compressor owns all token-counting logic.
```suggestion
from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens
```
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix(llm): include sy..."
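Option 2 from the review could be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual code: the `count_tokens` name and signature are assumptions, and a crude stand-in tokenizer replaces the real `_get_message_tokens`.

```python
def _get_message_tokens(msg: dict, model_name: str) -> int:
    # Crude stand-in tokenizer (~4 characters per token), not the real helper.
    return max(1, len(str(msg.get("content", ""))) // 4)


class MemoryCompressor:
    def count_tokens(self, messages: list[dict], model_name: str = "model") -> int:
        # Public entry point: callers count tokens through the compressor,
        # and _get_message_tokens stays an internal detail of this module.
        return sum(_get_message_tokens(m, model_name) for m in messages)
```

`llm.py` would then call something like `compressor.count_tokens(framing_msgs)` instead of importing the private symbol across module boundaries.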
strix/llm/llm.py (outdated)

```diff
 from strix.config import Config
 from strix.llm.config import LLMConfig
-from strix.llm.memory_compressor import MemoryCompressor
+from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
```
Importing a private helper across module boundaries
_get_message_tokens is prefixed with _, which conventionally signals it is an internal implementation detail of memory_compressor.py. Importing it directly into llm.py couples the two modules to an internal symbol that could be freely refactored or removed without a public-API change notice.
Consider either:
- Renaming it to
get_message_tokens(dropping the leading_) to make the public-export intent explicit, or - Exposing a small helper on
MemoryCompressoritself (e.g.MemoryCompressor.count_tokens(messages)) so the compressor owns all token-counting logic.
| from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens | |
| from strix.llm.memory_compressor import MemoryCompressor, get_message_tokens |
Pull request overview
This PR fixes token budgeting in the LLM memory compressor so that system prompt and agent identity “framing” tokens are accounted for when deciding whether (and how much) to compress conversation history, preventing unexpected truncation behavior on long runs with large system prompts.
Changes:
- Added a `reserved_tokens` parameter to `MemoryCompressor.compress_history()` to incorporate non-history prompt tokens into the budget calculation.
- Calculated `reserved_tokens` in `LLM._prepare_messages()` from the system prompt + agent identity messages and passed it into the compressor.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `strix/llm/memory_compressor.py` | Extends history compression budgeting with a `reserved_tokens` parameter and uses it in token calculations. |
| `strix/llm/llm.py` | Computes reserved/system framing tokens during message preparation and passes them to the memory compressor. |
Comments suppressed due to low confidence (1)
strix/llm/memory_compressor.py:216
`compress_history()` can still return a history that exceeds the token budget when `reserved_tokens` (or the non-compressible system/recent messages) already consume most or all of the limit. Consider explicitly handling `reserved_tokens >= MAX_TOTAL_TOKENS * 0.9` (and/or re-checking after summarization) by dropping more of the history (e.g., reduce recent messages, return only system messages, or return an empty history) so the final prompt stays within limits.
```python
total_tokens = reserved_tokens + sum(
    _get_message_tokens(msg, model_name) for msg in system_msgs + regular_msgs
)
if total_tokens <= MAX_TOTAL_TOKENS * 0.9:
    return messages
```
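A guard along the lines this comment suggests might look like the sketch below. Everything here is an assumption for illustration: the `MAX_TOTAL_TOKENS` value, the message shapes, the fallback policy (keep only system messages), and the stand-in tokenizer.

```python
MAX_TOTAL_TOKENS = 1000  # assumed value; the real constant lives in memory_compressor.py


def _get_message_tokens(msg: dict, model_name: str) -> int:
    # Crude stand-in tokenizer: ~4 characters per token.
    return max(1, len(str(msg.get("content", ""))) // 4)


def compress_history(messages: list[dict], model_name: str = "model",
                     reserved_tokens: int = 0) -> list[dict]:
    budget = int(MAX_TOTAL_TOKENS * 0.9)
    # Guard: framing tokens alone exhaust the budget, so no amount of
    # history compression can help -- keep only system messages.
    if reserved_tokens >= budget:
        return [m for m in messages if m.get("role") == "system"]
    total = reserved_tokens + sum(_get_message_tokens(m, model_name) for m in messages)
    if total <= budget:
        return messages
    # Re-check after trimming: drop oldest non-system messages until it fits.
    system = [m for m in messages if m.get("role") == "system"]
    history = [m for m in messages if m.get("role") != "system"]
    while history and total > budget:
        total -= _get_message_tokens(history.pop(0), model_name)
    return system + history
```

The real implementation would summarize rather than drop messages, but the shape of the guard is the same: check `reserved_tokens` against the budget before doing any per-message work.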
```python
        Args:
            messages: Conversation history messages to compress.
            reserved_tokens: Tokens already reserved for system prompt and
                other framing messages outside the conversation history.
                Subtracted from the budget before checking limits.
```
The docstring says `reserved_tokens` is "Subtracted from the budget before checking limits", but the implementation adds it into `total_tokens` and compares against the fixed budget. Either update the wording to match the behavior (reserved tokens are included in the total prompt token count) or change the logic to compute an `available_budget = MAX_TOTAL_TOKENS * 0.9 - reserved_tokens` and compare history tokens against that.
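The two formulations are arithmetically equivalent (`reserved + history <= budget` exactly when `history <= budget - reserved`), so this is a documentation-accuracy choice rather than a behavior change. A minimal sketch of both variants, with an assumed `MAX_TOTAL_TOKENS`:

```python
MAX_TOTAL_TOKENS = 1000  # assumed value for illustration


def within_budget_add(history_tokens: int, reserved_tokens: int) -> bool:
    # Current behavior: reserved tokens are added into the total.
    return reserved_tokens + history_tokens <= MAX_TOTAL_TOKENS * 0.9


def within_budget_subtract(history_tokens: int, reserved_tokens: int) -> bool:
    # Docstring's wording: reserved tokens are subtracted from the budget.
    available_budget = MAX_TOTAL_TOKENS * 0.9 - reserved_tokens
    return history_tokens <= available_budget
```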
strix/llm/llm.py (outdated)

```diff
 from strix.config import Config
 from strix.llm.config import LLMConfig
-from strix.llm.memory_compressor import MemoryCompressor
+from strix.llm.memory_compressor import MemoryCompressor, _get_message_tokens
```
`llm.py` imports `_get_message_tokens` from `memory_compressor.py`, but the leading underscore indicates this is intended as a private helper. To reduce coupling, consider exposing a public token-counting helper (e.g., `get_message_tokens`) or adding a `MemoryCompressor.get_message_tokens()` method, and have `LLM` call that instead of importing a private symbol.

```suggestion
from strix.llm.memory_compressor import MemoryCompressor
```
Force-pushed from `2502a59` to `22d978e`
The memory compressor was not accounting for system prompt and agent identity tokens when calculating the conversation budget. This caused premature history truncation on long scans with large system prompts. Adds a `reserved_tokens` parameter to `compress_history()` that subtracts already-accounted tokens from the budget before applying limits.
Force-pushed from `22d978e` to `63035db`
Summary
The memory compressor was not accounting for system prompt and agent identity tokens when calculating the conversation budget. This caused premature history truncation on long scans with large system prompts.
Changes

- Added a `reserved_tokens` parameter to `compress_history()` that subtracts already-accounted tokens from the budget before applying limits
- Computed `reserved_tokens` from system prompt and agent identity messages in `_prepare_messages()`

Files Changed

- `strix/llm/llm.py` (+7/−2)
- `strix/llm/memory_compressor.py` (+8/−1)

Split from #328.
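The caller-side half of the change described above might look roughly like this sketch. The function name, arguments, and message shapes are assumptions (the real method is `LLM._prepare_messages()` in `strix/llm/llm.py`), and both helpers are stand-ins rather than the repository's implementations.

```python
def _get_message_tokens(msg: dict, model_name: str) -> int:
    # Crude stand-in tokenizer: ~4 characters per token.
    return max(1, len(str(msg.get("content", ""))) // 4)


def compress_history(messages, model_name, reserved_tokens=0):
    # Stand-in for MemoryCompressor.compress_history(); passes through here.
    return messages


def prepare_messages(system_prompt: str, identity_msgs: list[dict],
                     history: list[dict], model_name: str = "model") -> list[dict]:
    framing = [{"role": "system", "content": system_prompt}, *identity_msgs]
    # The fix: count the framing tokens and hand them to the compressor, so
    # the budget reflects the whole prompt, not just the conversation history.
    reserved_tokens = sum(_get_message_tokens(m, model_name) for m in framing)
    compressed = compress_history(history, model_name, reserved_tokens=reserved_tokens)
    return framing + compressed
```

Without the `reserved_tokens` hand-off, a large system prompt silently eats into the same budget the compressor believes is fully available to history, which is the truncation bug this PR addresses.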