
Feature/configurable memory compression#454

Open
seanturner83 wants to merge 4 commits into usestrix:main from seanturner83:feature/configurable-memory-compression

Conversation

@seanturner83
Contributor

Summary

Adds three environment variables to tune the memory compressor for large-scale scanning campaigns, plus an additional prompt cache breakpoint for improved cache hit rates on Anthropic models.

New environment variables

| Variable | Default | Purpose |
| --- | --- | --- |
| `STRIX_MAX_CONTEXT_TOKENS` | 100000 | Token threshold before compression triggers |
| `STRIX_MIN_RECENT_MESSAGES` | 15 | Messages preserved from compression |
| `STRIX_MAX_TOOL_OUTPUT_CHARS` | 0 (off) | Truncate oversized tool outputs at ingestion, keeping the first 60% + last 40% with a notice |

Prompt caching improvement

Adds a second cache_control breakpoint on the agent identity message (<agent_identity> tag), which is stable for the lifetime of each agent. This complements the existing system prompt breakpoint. Related to #279.
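A minimal sketch of how such a breakpoint might be attached, assuming Anthropic-style message dicts where `cache_control` must live on block-form content; the function name and message shapes are illustrative, not the PR's exact code:

```python
def add_identity_cache_breakpoint(messages: list[dict]) -> list[dict]:
    """Mark the <agent_identity> message as a prompt-cache breakpoint."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and "<agent_identity>" in content:
            # Convert to block form so cache_control metadata can be attached.
            msg["content"] = [
                {
                    "type": "text",
                    "text": content,
                    "cache_control": {"type": "ephemeral"},
                }
            ]
            break
    return messages
```

Since the identity message never changes during an agent's lifetime, everything up to this breakpoint can be served from cache on every subsequent call.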

Motivation

Strix's agentic architecture (6 agents, ~600 LLM calls per scan, full history resent every call) can produce 1.5M–26M input tokens per scan on large repos. The memory compressor currently only triggers at 90% of 100K tokens — by which point the cost damage is already done. Oversized tool outputs (nmap scans, large file reads) accumulate in history and get resent hundreds of times.

These changes make compression tunable and add tool output truncation at ingestion to prevent context bloat at the source.

A/B test results

Tested on a production corpus of 800 repositories at a crypto infrastructure company. Settings: MAX_CONTEXT_TOKENS=40000, MIN_RECENT_MESSAGES=10, MAX_TOOL_OUTPUT_CHARS=8000.

Per-scan comparison (large TypeScript repo with confirmed critical findings):

| Metric | Stock | Optimized | Delta |
| --- | --- | --- | --- |
| Findings | 15 (10C/5H) | 16 (9C/7H) | +1 finding |
| Cost | $20.62 | $9.55 | −54% |
| Input tokens | 8.7M | 5.5M | −37% |
| Cache hit rate | 34% | 56% | +22pp |

At scale (15-scan sample): Average cost dropped from $8.66 to $4.52 per scan.

No findings were lost. The optimised configuration actually found one additional vulnerability that the stock configuration missed (likely because the stock run hit context limits and lost relevant earlier context).

Changes

  • strix/config/config.py — 3 new config variables
  • strix/llm/memory_compressor.py — Configurable thresholds, truncate_tool_outputs() method called before compression in compress_history(), _truncate_tool_output() helper preserving head + tail
  • strix/llm/llm.py — Second cache breakpoint on agent identity message

All changes are backwards compatible — default behaviour is unchanged when env vars are not set.

Test plan

  • A/B tested on production corpus (800 repos, multiple repo sizes/languages)
  • Verified no finding quality regression
  • Confirmed defaults match current behaviour (no env vars = no change)
  • Unit tests for _truncate_tool_output() edge cases (happy to add if wanted)

@greptile-apps
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR adds three environment variables (STRIX_MAX_CONTEXT_TOKENS, STRIX_MIN_RECENT_MESSAGES, STRIX_MAX_TOOL_OUTPUT_CHARS) to make the memory compressor tunable, plus a second cache_control breakpoint on the agent identity message. All defaults preserve existing behaviour.

Confidence Score: 5/5

Safe to merge; only P2 findings, defaults are unchanged, and the previous-thread blocking issues have all been addressed.

No P0 or P1 findings. The double-truncation original_len inaccuracy and the persisted-config note are both P2 quality/observability concerns that don't affect correctness or security. The earlier concerns about role-guard mutations and missing list-branch caching are resolved in this revision.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `strix/config/config.py` | Adds three new nullable config class attributes for memory/context tuning; correctly picked up by `_tracked_names` and thus persisted/loaded. Not added to `_LLM_CANONICAL_NAMES`, which is intentional. |
| `strix/llm/llm.py` | Adds a second `cache_control` breakpoint for the agent identity message; previous-thread issues (list-content path not caching, in-place mutation) are now addressed with an `elif` branch and idempotent mutation. |
| `strix/llm/memory_compressor.py` | Configurable thresholds and tool-output truncation are implemented correctly with role/type guards; truncation is called pre-threshold-check so it always runs, causing `original_len` in the notice to go stale on repeated `compress_history` calls. |
Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 271

Comment:
**Tool output `original_len` becomes stale after first truncation**

`compress_history` is called on every LLM iteration with the full (already-mutated) history. Since `truncate_tool_outputs` runs before the early-return token check, a message that was truncated on a previous call (resulting in `head + notice + tail ≈ max_chars + len(notice)` chars) will be slightly above `max_tool_output_chars` on the next call and will be re-processed. The data bytes stabilise (same head/tail are re-selected), but `original_len` in the notice will reflect the already-truncated size rather than the true original length from the second call onwards — e.g., "of 8140-character output" instead of "of 10000-character output".

A simple guard avoids re-processing content that already carries the marker:

```python
# Direct tool-role messages (string content)
if (
    role == "tool"
    and isinstance(content, str)
    and len(content) > self.max_tool_output_chars
    and "[Output truncated:" not in content
):
    msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
```

The same guard should be applied in the `tool_result` branches inside the `elif isinstance(content, list)` block.

How can I resolve this? If you propose a fix, please make it concise.
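A hedged sketch of how that same marker guard could carry over to the list-content branch; the block shapes and the inline truncation stand-in are assumptions, not the PR's exact code:

```python
def truncate_tool_result_blocks(blocks: list[dict], max_chars: int) -> None:
    """Apply the already-truncated marker guard to list-form tool results.

    Illustrative only: real code would call the PR's _truncate_tool_output
    helper instead of the inline stand-in below.
    """
    for block in blocks:
        text = block.get("content")
        if (
            block.get("type") == "tool_result"
            and isinstance(text, str)
            and len(text) > max_chars
            and "[Output truncated:" not in text
        ):
            # Stand-in truncation; the marker makes repeat passes no-ops.
            block["content"] = text[:max_chars] + "\n[Output truncated: tail omitted]"
```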

---

This is a comment left during a code review.
Path: strix/config/config.py
Line: 24-26

Comment:
**New vars missing from `_LLM_CANONICAL_NAMES` persisted-config clearing logic**

The three new variables are correctly picked up by `_tracked_names()` (lowercase, `None` default) so they are saved and loaded through `Config.save_current()` / `Config.apply_saved()`. However, `_LLM_CANONICAL_NAMES` drives the stale-config reset in `_llm_env_changed()`: when a saved config file is detected alongside different live LLM env vars the whole LLM block is cleared. The memory-tuning variables are **not** LLM-auth config, so excluding them here is correct.

Worth noting though: a user who has `STRIX_MAX_CONTEXT_TOKENS` persisted in `~/.strix/cli-config.json` and then unsets it by clearing the env var will have it silently re-applied on the next run (the `cleared_vars` logic only strips vars that are set to `""` in the environment, not vars that are simply absent). This is existing behaviour for all nullable config vars and not unique to this PR, but may surprise operators tuning these settings interactively.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "chore: retrigger review"

@bearsyankees
Collaborator

@greptile

seanturner83 added a commit to seanturner83/strix that referenced this pull request Apr 29, 2026
Pulls usestrix#467 (feat: add resume
session feature) ahead of upstream merge. Enables unattended scans
to survive crashes or forced termination (e.g. AWS STS session
expiry mid-scan) by appending every LLM message to
strix_runs/<run>/conversation.jsonl and replaying on --resume.

New CLI:
  strix --list-sessions         # tabulate past scans
  strix --continue (or -c)      # resume most recent
  strix --resume <run_name>     # resume by name
  strix --resume                # open interactive picker

Motivates:
  https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499
  — trade-api flag-aware scan exited at 71min with
    APIConnectionError ("security token included in the request is
    expired"). Session had produced 10 findings; all lost once the
    report upload also hit ExpiredToken. With this commit merged,
    the CI composite action can wrap strix in a re-assume + --continue
    loop so each sub-session fits inside the 60min STS budget and
    total scan duration is bounded only by the overall workflow
    timeout.

Upstream integration:
  - Once usestrix#467 merges, drop from the seedcx-build recipe
  - Until then, include alongside usestrix#454 (memory compression), usestrix#460
    (truncate retry), usestrix#468 (thinking retry), this one, and the
    adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts. Smoke tests:
  - uv sync succeeds
  - strix --help surfaces --resume / --continue / --list-sessions
  - imports (strix.sessions.resume, strix.telemetry.conversation_log) OK

Follow-up (separate): strix-scan-workflow README recipe + composite
action consumer changes to invoke resume on re-dispatch.
