feat: File-backed tool results to reduce context token waste #145

@yokoszn

Description

Problem

Tool outputs (nmap, ffuf, sqlmap, etc.) are currently kept in conversation history and sent to the LLM on every subsequent iteration until memory compression triggers at 90K tokens.

Current behavior:

Iteration 1:  [system] + [nmap_output]              → 50KB sent
Iteration 2:  [system] + [nmap_output] + [response] → 55KB sent  
Iteration 3:  [system] + [nmap_output] + [...]      → 60KB sent
...
Iteration 50: Still sending that nmap output        → 150KB sent

This leads to:

  • Wasted tokens: Same output sent 50+ times before compression
  • Increased latency: Larger context = slower LLM responses
  • Higher costs: Paying for redundant input tokens
  • Late compression: 90K threshold means ~50-80 iterations of waste
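A rough back-of-envelope calculation makes the scale of the waste concrete. Assuming the common ~4 chars/token heuristic (the exact ratio is model-dependent), a single 50KB output kept in context for 50 iterations costs:

```python
CHARS_PER_TOKEN = 4  # rough heuristic; the actual ratio depends on the tokenizer

def resent_tokens(output_chars: int, iterations: int) -> int:
    """Tokens consumed by re-sending one tool output on every iteration."""
    return (output_chars // CHARS_PER_TOKEN) * iterations

# A single 50KB nmap output kept in context for 50 iterations:
print(resent_tokens(50_000, 50))  # 625000 tokens for one tool call
```

And that is per tool call; a run with several large outputs multiplies this.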

Current Mitigations (Insufficient)

  1. Truncation (executor.py:187-190): Outputs > 10K chars truncated to 8K
  2. Memory compression (memory_compressor.py): Summarizes at 90K tokens

These help, but they don't address the core issue: tool outputs remain in context long after the agent has processed them.

Proposed Solution: File-Backed Tool Results

Concept

Store large tool outputs to disk immediately, and include only a reference plus a short summary in conversation history.

Implementation Sketch

# In executor.py - process_tool_invocations()

from pathlib import Path
from uuid import uuid4

async def _format_tool_result(tool_name: str, result: str, run_dir: Path) -> str:
    """Format a tool result, backing large outputs to file."""

    INLINE_THRESHOLD = 2000  # chars; small results stay inline in context

    if len(result) <= INLINE_THRESHOLD:
        return f"<tool_result><tool_name>{tool_name}</tool_name><result>{result}</result></tool_result>"

    # Store the full output to disk under the run directory
    tool_results_dir = run_dir / "tool_results"
    tool_results_dir.mkdir(exist_ok=True)

    result_id = f"{tool_name}_{uuid4().hex[:8]}"
    result_file = tool_results_dir / f"{result_id}.txt"
    result_file.write_text(result)

    # Generate a short summary to keep in conversation history
    summary = await _summarize_tool_output(tool_name, result)  # or use heuristics

    return f"""<tool_result>
<tool_name>{tool_name}</tool_name>
<result_file>{result_file}</result_file>
<summary>{summary}</summary>
<hint>Use read_tool_result("{result_id}") to access full output if needed</hint>
</tool_result>"""

New Tool: read_tool_result

Allow agents to retrieve the full output when needed:

@register_tool
async def read_tool_result(result_id: str, lines: str = "all") -> str:
    """Retrieve stored tool output.
    
    Args:
        result_id: ID from tool_result reference
        lines: "all", "first:100", "last:50", "grep:pattern", etc.
    """
    ...
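The elided body could be filled in roughly as follows. This is a sketch, not the final implementation: the `TOOL_RESULTS_DIR` constant, the on-disk `{result_id}.txt` naming, and the exact filter grammar (`"all"`, `"first:N"`, `"last:N"`, `"grep:pattern"`) are assumptions, and the function is shown synchronous for brevity:

```python
import re
from pathlib import Path

# Assumed location of stored results; the real path would come from run_dir.
TOOL_RESULTS_DIR = Path("tool_results")

def read_tool_result(result_id: str, lines: str = "all") -> str:
    """Retrieve stored tool output, optionally filtered by a lines spec."""
    path = TOOL_RESULTS_DIR / f"{result_id}.txt"
    if not path.exists():
        return f"No stored result with id {result_id!r}"
    content = path.read_text().splitlines()
    if lines == "all":
        selected = content
    elif lines.startswith("first:"):
        selected = content[: int(lines.split(":", 1)[1])]
    elif lines.startswith("last:"):
        selected = content[-int(lines.split(":", 1)[1]) :]
    elif lines.startswith("grep:"):
        pattern = re.compile(lines.split(":", 1)[1])
        selected = [ln for ln in content if pattern.search(ln)]
    else:
        return f"Unknown lines filter: {lines!r}"
    return "\n".join(selected)
```

The `grep:` filter in particular keeps the agent from pulling a 50KB scan back into context just to find a handful of matching lines.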

Benefits

| Metric                | Current              | With File-Backing               |
| --------------------- | -------------------- | ------------------------------- |
| Context per iteration | 50KB+                | ~2KB (summary only)             |
| Token cost (100 iter) | ~500K input tokens   | ~50K input tokens               |
| Full data accessible  | ✅ Always in context | ✅ On-demand via tool           |
| Agent autonomy        | N/A                  | Can request details when needed |

Alternative Approaches

1. Immediate Summarization

Summarize tool outputs right after execution instead of at 90K threshold.

  • Pro: Simpler, no new tool needed
  • Con: Lossy, agent can't access original details

2. Sliding Window

Only keep last N tool outputs in context.

  • Pro: Simple to implement
  • Con: May lose relevant earlier outputs
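For comparison, the sliding-window alternative can be sketched in a few lines. The message shape (`role`/`content` dicts with an `is_tool_result` flag) is illustrative, not the codebase's actual history format:

```python
def apply_sliding_window(history: list[dict], keep_last: int = 3) -> list[dict]:
    """Keep only the last N tool outputs verbatim; stub out older ones."""
    tool_indices = [i for i, m in enumerate(history) if m.get("is_tool_result")]
    # Guard against keep_last=0: [:-0] would keep nothing evicted.
    evict = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": f"[tool output evicted, {len(m['content'])} chars]"}
        if i in evict
        else m
        for i, m in enumerate(history)
    ]
```

The simplicity is appealing, but the "Con" above is visible in the sketch: eviction is purely positional, so an early output the agent still needs is lost just as readily as a stale one.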

3. RAG-based Retrieval

Embed tool outputs, retrieve semantically relevant chunks.

  • Pro: Smart retrieval based on current task
  • Con: Complex, adds latency, embedding costs

Recommendation

File-backed results with summaries (the proposed solution above, not Alternative 1) provide the best balance:

  • Preserves full data (no information loss)
  • Drastically reduces context size
  • Gives agent control over when to access details
  • Simple implementation with clear mental model

Implementation Considerations

  • Threshold tuning: 2KB inline vs file-backed
  • Summary generation: LLM-based vs heuristic (first/last N lines + stats)
  • File cleanup: Delete after run or retain for debugging
  • Tool schema for read_tool_result with filtering options
  • Backwards compatibility: Existing tool handlers unchanged
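For the summary-generation choice, the heuristic option (first/last N lines plus stats) avoids an extra LLM call entirely. A minimal sketch, with illustrative function name and defaults:

```python
def heuristic_summary(output: str, head: int = 10, tail: int = 5) -> str:
    """Summarize a tool output as stats + first/last lines, no LLM call."""
    lines = output.splitlines()
    stats = f"[{len(lines)} lines, {len(output)} chars total]"
    if len(lines) <= head + tail:
        return output  # short enough to keep whole
    omitted = len(lines) - head - tail
    return "\n".join(
        [stats, *lines[:head], f"... ({omitted} lines omitted) ...", *lines[-tail:]]
    )
```

This tends to work well for scanners like nmap and ffuf, whose most useful lines cluster at the start (target/options) and end (result tables); an LLM-generated summary would be higher quality but adds latency and cost per tool call.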

Related

  • Memory compression: strix/llm/memory_compressor.py
  • Tool execution: strix/tools/executor.py
  • Agent state: strix/agents/state.py
