feat: File-backed tool results to reduce context token waste #145

@yokoszn

Description

Problem

Tool outputs (nmap, ffuf, sqlmap, etc.) are currently kept in conversation history and sent to the LLM on every subsequent iteration until memory compression triggers at 90K tokens.

Current behavior:

Iteration 1:  [system] + [nmap_output]              → 50KB sent
Iteration 2:  [system] + [nmap_output] + [response] → 55KB sent  
Iteration 3:  [system] + [nmap_output] + [...]      → 60KB sent
...
Iteration 50: Still sending that nmap output        → 150KB sent

This leads to:

  • Wasted tokens: Same output sent 50+ times before compression
  • Increased latency: Larger context = slower LLM responses
  • Higher costs: Paying for redundant input tokens
  • Late compression: 90K threshold means ~50-80 iterations of waste
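A rough back-of-envelope calculation makes the scale of the waste concrete. Assuming the common ~4 chars/token heuristic (the exact ratio is model-dependent), a single 50KB output kept in context for 50 iterations costs:

```python
CHARS_PER_TOKEN = 4  # rough heuristic; the actual ratio depends on the tokenizer

def resent_tokens(output_chars: int, iterations: int) -> int:
    """Tokens consumed by re-sending one tool output on every iteration."""
    return (output_chars // CHARS_PER_TOKEN) * iterations

# A single 50KB nmap output kept in context for 50 iterations:
print(resent_tokens(50_000, 50))  # 625000 tokens for one tool call
```

And that is per tool call; a run with several large outputs multiplies this.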

Current Mitigations (Insufficient)

  1. Truncation (executor.py:187-190): Outputs > 10K chars truncated to 8K
  2. Memory compression (memory_compressor.py): Summarizes at 90K tokens

These help, but they don't address the core issue: tool outputs remain in context long after the agent has processed them.

Proposed Solution: File-Backed Tool Results

Concept

Store large tool outputs to disk immediately, and include only a reference plus a short summary in conversation history.

Implementation Sketch

# In executor.py - process_tool_invocations()

from pathlib import Path
from uuid import uuid4

async def _format_tool_result(tool_name: str, result: str, run_dir: Path) -> str:
    """Format a tool result, backing large outputs to file."""

    INLINE_THRESHOLD = 2000  # chars; small results stay inline in context

    if len(result) <= INLINE_THRESHOLD:
        return f"<tool_result><tool_name>{tool_name}</tool_name><result>{result}</result></tool_result>"

    # Store the full output to disk under the run directory
    tool_results_dir = run_dir / "tool_results"
    tool_results_dir.mkdir(exist_ok=True)

    result_id = f"{tool_name}_{uuid4().hex[:8]}"
    result_file = tool_results_dir / f"{result_id}.txt"
    result_file.write_text(result)

    # Generate a short summary to keep in conversation history
    summary = await _summarize_tool_output(tool_name, result)  # or use heuristics

    return f"""<tool_result>
<tool_name>{tool_name}</tool_name>
<result_file>{result_file}</result_file>
<summary>{summary}</summary>
<hint>Use read_tool_result("{result_id}") to access full output if needed</hint>
</tool_result>"""

New Tool: read_tool_result

Allow agents to retrieve the full output when needed:

@register_tool
async def read_tool_result(result_id: str, lines: str = "all") -> str:
    """Retrieve stored tool output.
    
    Args:
        result_id: ID from tool_result reference
        lines: "all", "first:100", "last:50", "grep:pattern", etc.
    """
    ...
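The elided body could be filled in roughly as follows. This is a sketch, not the final implementation: the `TOOL_RESULTS_DIR` constant, the on-disk `{result_id}.txt` naming, and the exact filter grammar (`"all"`, `"first:N"`, `"last:N"`, `"grep:pattern"`) are assumptions, and the function is shown synchronous for brevity:

```python
import re
from pathlib import Path

# Assumed location of stored results; the real path would come from run_dir.
TOOL_RESULTS_DIR = Path("tool_results")

def read_tool_result(result_id: str, lines: str = "all") -> str:
    """Retrieve stored tool output, optionally filtered by a lines spec."""
    path = TOOL_RESULTS_DIR / f"{result_id}.txt"
    if not path.exists():
        return f"No stored result with id {result_id!r}"
    content = path.read_text().splitlines()
    if lines == "all":
        selected = content
    elif lines.startswith("first:"):
        selected = content[: int(lines.split(":", 1)[1])]
    elif lines.startswith("last:"):
        selected = content[-int(lines.split(":", 1)[1]) :]
    elif lines.startswith("grep:"):
        pattern = re.compile(lines.split(":", 1)[1])
        selected = [ln for ln in content if pattern.search(ln)]
    else:
        return f"Unknown lines filter: {lines!r}"
    return "\n".join(selected)
```

The `grep:` filter in particular keeps the agent from pulling a 50KB scan back into context just to find a handful of matching lines.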

Benefits

| Metric                | Current              | With File-Backing               |
| --------------------- | -------------------- | ------------------------------- |
| Context per iteration | 50KB+                | ~2KB (summary only)             |
| Token cost (100 iter) | ~500K input tokens   | ~50K input tokens               |
| Full data accessible  | ✅ Always in context | ✅ On-demand via tool           |
| Agent autonomy        | N/A                  | Can request details when needed |

Alternative Approaches

1. Immediate Summarization

Summarize tool outputs right after execution instead of at 90K threshold.

  • Pro: Simpler, no new tool needed
  • Con: Lossy, agent can't access original details

2. Sliding Window

Only keep last N tool outputs in context.

  • Pro: Simple to implement
  • Con: May lose relevant earlier outputs
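For comparison, the sliding-window alternative can be sketched in a few lines. The message shape (`role`/`content` dicts with an `is_tool_result` flag) is illustrative, not the codebase's actual history format:

```python
def apply_sliding_window(history: list[dict], keep_last: int = 3) -> list[dict]:
    """Keep only the last N tool outputs verbatim; stub out older ones."""
    tool_indices = [i for i, m in enumerate(history) if m.get("is_tool_result")]
    # Guard against keep_last=0: [:-0] would keep nothing evicted.
    evict = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": f"[tool output evicted, {len(m['content'])} chars]"}
        if i in evict
        else m
        for i, m in enumerate(history)
    ]
```

The simplicity is appealing, but the "Con" above is visible in the sketch: eviction is purely positional, so an early output the agent still needs is lost just as readily as a stale one.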

3. RAG-based Retrieval

Embed tool outputs, retrieve semantically relevant chunks.

  • Pro: Smart retrieval based on current task
  • Con: Complex, adds latency, embedding costs

Recommendation

File-backed results with summaries (the proposed solution above, not Alternative 1) provide the best balance:

  • Preserves full data (no information loss)
  • Drastically reduces context size
  • Gives agent control over when to access details
  • Simple implementation with clear mental model

Implementation Considerations

  • Threshold tuning: 2KB inline vs file-backed
  • Summary generation: LLM-based vs heuristic (first/last N lines + stats)
  • File cleanup: Delete after run or retain for debugging
  • Tool schema for read_tool_result with filtering options
  • Backwards compatibility: Existing tool handlers unchanged
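For the summary-generation choice, the heuristic option (first/last N lines plus stats) avoids an extra LLM call entirely. A minimal sketch, with illustrative function name and defaults:

```python
def heuristic_summary(output: str, head: int = 10, tail: int = 5) -> str:
    """Summarize a tool output as stats + first/last lines, no LLM call."""
    lines = output.splitlines()
    stats = f"[{len(lines)} lines, {len(output)} chars total]"
    if len(lines) <= head + tail:
        return output  # short enough to keep whole
    omitted = len(lines) - head - tail
    return "\n".join(
        [stats, *lines[:head], f"... ({omitted} lines omitted) ...", *lines[-tail:]]
    )
```

This tends to work well for scanners like nmap and ffuf, whose most useful lines cluster at the start (target/options) and end (result tables); an LLM-generated summary would be higher quality but adds latency and cost per tool call.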

Related

  • Memory compression: strix/llm/memory_compressor.py
  • Tool execution: strix/tools/executor.py
  • Agent state: strix/agents/state.py
