Feature/configurable memory compression #454
Conversation
Greptile Summary
This PR adds three environment variables (STRIX_MAX_CONTEXT_TOKENS, STRIX_MIN_RECENT_MESSAGES, STRIX_MAX_TOOL_OUTPUT_CHARS) to make memory compression tunable, plus an additional prompt cache breakpoint.
Confidence Score: 5/5
Safe to merge; only P2 findings, defaults are unchanged, and the previous-thread blocking issues have all been addressed. No P0 or P1 findings. The double-truncation `original_len` inaccuracy and the persisted-config note are both P2 quality/observability concerns that don't affect correctness or security. The earlier concerns about role-guard mutations and missing list-branch caching are resolved in this revision. No files require special attention.
This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 271
Comment:
**Tool output `original_len` becomes stale after first truncation**
`compress_history` is called on every LLM iteration with the full (already-mutated) history. Since `truncate_tool_outputs` runs before the early-return token check, a message that was truncated on a previous call (resulting in `head + notice + tail ≈ max_chars + len(notice)` chars) will be slightly above `max_tool_output_chars` on the next call and will be re-processed. The data bytes stabilise (same head/tail are re-selected), but `original_len` in the notice will reflect the already-truncated size rather than the true original length from the second call onwards — e.g., "of 8140-character output" instead of "of 10000-character output".
A simple guard avoids re-processing content that already carries the marker:
```python
# Direct tool-role messages (string content)
if (
    role == "tool"
    and isinstance(content, str)
    and len(content) > self.max_tool_output_chars
    and "[Output truncated:" not in content
):
    msg["content"] = _truncate_tool_output(content, self.max_tool_output_chars)
```
The same guard should be applied in the `tool_result` branches inside the `elif isinstance(content, list)` block.
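Applied to the list branch, the fragment could continue along these lines (the `{"type": "tool_result", "content": str}` block shape is an assumption; adjust to the actual structures in `compress_history`):

```python
# Continuation of the snippet above: the same already-truncated guard applied
# to tool_result blocks in list-shaped content. The block shape is assumed.
elif isinstance(content, list):
    for block in content:
        if not (isinstance(block, dict) and block.get("type") == "tool_result"):
            continue
        text = block.get("content")
        if (
            isinstance(text, str)
            and len(text) > self.max_tool_output_chars
            and "[Output truncated:" not in text  # skip re-processing
        ):
            block["content"] = _truncate_tool_output(
                text, self.max_tool_output_chars
            )
```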
---
This is a comment left during a code review.
Path: strix/config/config.py
Line: 24-26
Comment:
**New vars missing from `_LLM_CANONICAL_NAMES` persisted-config clearing logic**
The three new variables are correctly picked up by `_tracked_names()` (lowercase, `None` default), so they are saved and loaded through `Config.save_current()` / `Config.apply_saved()`. However, `_LLM_CANONICAL_NAMES` drives the stale-config reset in `_llm_env_changed()`: when a saved config file is detected alongside different live LLM env vars, the whole LLM block is cleared. The memory-tuning variables are **not** LLM-auth config, so excluding them here is correct.
Worth noting though: a user who has `STRIX_MAX_CONTEXT_TOKENS` persisted in `~/.strix/cli-config.json` and then unsets it by clearing the env var will have it silently re-applied on the next run (the `cleared_vars` logic only strips vars that are set to `""` in the environment, not vars that are simply absent). This is existing behaviour for all nullable config vars and not unique to this PR, but may surprise operators tuning these settings interactively.
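An illustrative reduction of that merge behaviour, with function and key names as assumptions rather than the real `config.py` internals:

```python
# Illustrative reduction of the behaviour described above; saved keys are
# lowercase (per the review note), env vars are their uppercase counterparts.
import os

def merge_saved(saved: dict[str, str | None]) -> dict[str, str | None]:
    merged = dict(saved)
    for name in merged:
        env_value = os.environ.get(name.upper())
        if env_value == "":
            merged[name] = None       # explicitly cleared: "" strips the saved value
        elif env_value is not None:
            merged[name] = env_value  # live env var wins
        # env var merely absent -> saved value is silently re-applied
    return merged
```

The concise interactive workaround is therefore to run with the variable explicitly set to empty (e.g. `STRIX_MAX_CONTEXT_TOKENS=""`) rather than merely unset.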
Pulls usestrix#467 (feat: add resume session feature) ahead of upstream merge. Enables unattended scans to survive crashes or forced termination (e.g. AWS STS session expiry mid-scan) by appending every LLM message to `strix_runs/<run>/conversation.jsonl` and replaying on `--resume`.

New CLI:

```
strix --list-sessions       # tabulate past scans
strix --continue (or -c)    # resume most recent
strix --resume <run_name>   # resume by name
strix --resume              # open interactive picker
```

Motivates: https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499 — trade-api flag-aware scan exited at 71min with APIConnectionError ("security token included in the request is expired"). The session had produced 10 findings; all were lost once the report upload also hit ExpiredToken. With this commit merged, the CI composite action can wrap strix in a re-assume + `--continue` loop so each sub-session fits inside the 60min STS budget and total scan duration is bounded only by the overall workflow timeout.

Upstream integration:
- Once usestrix#467 merges, drop from the seedcx-build recipe
- Until then, include alongside usestrix#454 (memory compression), usestrix#460 (truncate retry), usestrix#468 (thinking retry), this one, and the adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts.

Smoke tests:
- `uv sync` succeeds
- `strix --help` surfaces `--resume` / `--continue` / `--list-sessions`
- imports (`strix.sessions.resume`, `strix.telemetry.conversation_log`) OK

Follow-up (separate): strix-scan-workflow README recipe + composite action consumer changes to invoke resume on re-dispatch.
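A minimal sketch of the append/replay mechanism described above (the `strix_runs/<run>/conversation.jsonl` layout comes from the comment; function names and record shape are illustrative):

```python
# Sketch only: one JSON object per line, flushed per message, so a crash
# mid-scan loses at most the in-flight message.
import json
from pathlib import Path

def append_message(run_dir: Path, message: dict) -> None:
    with (run_dir / "conversation.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")

def replay_messages(run_dir: Path) -> list[dict]:
    path = run_dir / "conversation.jsonl"
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line]
```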
Summary
Adds three environment variables to tune the memory compressor for large-scale scanning campaigns, plus an additional prompt cache breakpoint for improved cache hit rates on Anthropic models.
New environment variables
- `STRIX_MAX_CONTEXT_TOKENS`
- `STRIX_MIN_RECENT_MESSAGES`
- `STRIX_MAX_TOOL_OUTPUT_CHARS`

Prompt caching improvement
Adds a second `cache_control` breakpoint on the agent identity message (`<agent_identity>` tag), which is stable for the lifetime of each agent. This complements the existing system prompt breakpoint. Related to #279.
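A minimal sketch of that second breakpoint, using Anthropic's prompt-caching convention (`cache_control: {"type": "ephemeral"}`); the message shapes and identity detection here are assumptions, not the exact code in `strix/llm/llm.py`:

```python
# Hedged sketch: convert the agent identity message into a content block
# carrying an ephemeral cache_control marker (Anthropic prompt caching).
def add_identity_cache_breakpoint(messages: list[dict]) -> None:
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and "<agent_identity>" in content:
            msg["content"] = [
                {
                    "type": "text",
                    "text": content,
                    "cache_control": {"type": "ephemeral"},
                }
            ]
            break  # only the first identity message gets the breakpoint
```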
Motivation
Strix's agentic architecture (6 agents, ~600 LLM calls per scan, full history resent every call) can produce 1.5M–26M input tokens per scan on large repos. The memory compressor currently only triggers at 90% of 100K tokens — by which point the cost damage is already done. Oversized tool outputs (nmap scans, large file reads) accumulate in history and get resent hundreds of times.
These changes make compression tunable and add tool output truncation at ingestion to prevent context bloat at the source.
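The head + tail shape can be sketched as below; this is a minimal sketch consistent with the "[Output truncated: ...]" notice referenced in the review thread, with the split ratio and exact notice wording as assumptions:

```python
# Sketch of ingestion-time head+tail truncation. Result length is
# max_chars + len(notice), matching the review thread's observation that
# head + notice + tail ≈ max_chars + len(notice).
def _truncate_tool_output(content: str, max_chars: int) -> str:
    if len(content) <= max_chars:
        return content
    head_len = max_chars // 2
    tail_len = max_chars - head_len
    notice = (
        f"\n[Output truncated: kept {max_chars} of "
        f"{len(content)}-character output]\n"
    )
    return content[:head_len] + notice + content[-tail_len:]
```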
A/B test results
Tested on a production corpus of 800 repositories at a crypto infrastructure company. Settings: `MAX_CONTEXT_TOKENS=40000`, `MIN_RECENT_MESSAGES=10`, `MAX_TOOL_OUTPUT_CHARS=8000`.

Per-scan comparison (large TypeScript repo with confirmed critical findings):
At scale (15-scan sample): Average cost dropped from $8.66 to $4.52 per scan.
No findings were lost. The optimised configuration actually found one additional vulnerability that the stock configuration missed (likely because the stock run hit context limits and lost relevant earlier context).
Changes
- `strix/config/config.py` — 3 new config variables
- `strix/llm/memory_compressor.py` — Configurable thresholds, `truncate_tool_outputs()` method called before compression in `compress_history()`, `_truncate_tool_output()` helper preserving head + tail
- `strix/llm/llm.py` — Second cache breakpoint on agent identity message

All changes are backwards compatible — default behaviour is unchanged when env vars are not set (see the sketch below).
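A sketch of what unchanged defaults could look like: nullable reads where `None` means "unset, keep stock behaviour". The actual declaration style in `strix/config/config.py` is assumed; the review thread notes these vars are tracked lowercase with `None` defaults.

```python
# Illustrative nullable config reads; None leaves stock behaviour in place.
import os

def _opt_int(name: str) -> int | None:
    raw = os.environ.get(name)
    return int(raw) if raw else None

max_context_tokens = _opt_int("STRIX_MAX_CONTEXT_TOKENS")
min_recent_messages = _opt_int("STRIX_MIN_RECENT_MESSAGES")
max_tool_output_chars = _opt_int("STRIX_MAX_TOOL_OUTPUT_CHARS")
```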
Test plan
- `_truncate_tool_output()` edge cases (happy to add if wanted)