Token-efficient compression for AI agent instruction files. Reduces context window usage by 40-50% while preserving behavioral compliance.
Works with: Kiro CLI, Claude Code, Cursor, Windsurf, Gemini CLI
📖 Interactive demo & article | 📝 Full article on DEV
AI coding agents load instruction files (CLAUDE.md, steering, skills) into context on every session. A typical power-user setup consumes 15-20K tokens before the first prompt. Most of that is formatting, redundancy, and prose the model doesn't need.
Other approaches (Token Trim, caveman prompting) apply uniform compression and claim "zero behavior change" without testing it. In our tests that claim did not hold: aggressive compression breaks behavioral rules such as safety compliance and preference adherence.
This tool uses semantic-aware compression: different aggressiveness per content type, validated by automated A/B testing.
| Content Type | Strategy | Safe Reduction |
|---|---|---|
| Paths, references, lists | Maximum compression | 60-70% |
| Personality, style rules | Heavy compression | 50-60% |
| Safety rules, preferences | Light compression (formatting only) | 20-30% |
| Code examples | No compression | 0% |
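The table can be read as a lookup from content type to a compression budget. A minimal sketch, assuming hypothetical type labels and a hypothetical `within_safe_range` helper (neither is part of the context-compress API); only the percentage ranges come from the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    min_reduction: float  # lower bound of the "safe reduction" range
    max_reduction: float  # upper bound of the "safe reduction" range


# Ranges copied from the table above; the dictionary keys are assumed labels.
STRATEGIES = {
    "reference": Strategy("maximum", 0.60, 0.70),
    "personality": Strategy("heavy", 0.50, 0.60),
    "safety": Strategy("light", 0.20, 0.30),
    "code": Strategy("none", 0.0, 0.0),
}


def within_safe_range(content_type: str, original_len: int, compressed_len: int) -> bool:
    """Return True if the achieved reduction stays at or below the safe upper bound."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    strategy = STRATEGIES[content_type]
    reduction = 1 - compressed_len / original_len
    return reduction <= strategy.max_reduction
```

A check like this would flag a safety section that shrank by half, while accepting the same reduction for a list of paths.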
```bash
pip install context-compress
```

Or clone and use directly:

```bash
git clone https://github.com/vidanov/context-compress.git
cd context-compress
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

```bash
# Single file
context-compress llm CLAUDE.md -o CLAUDE.compressed.md
# Entire directory
context-compress llm .kiro/steering/ -o .kiro/steering-compressed/
```

Requires kiro-cli installed and configured.

```bash
# Single file
context-compress compress CLAUDE.md -o CLAUDE.compressed.md
# Directory
context-compress compress-dir .kiro/steering/ -o .kiro/steering-compressed/
```

```bash
context-compress dedup .kiro/steering/
```

```bash
context-compress stats .kiro/steering/
```

```json
{
  "hooks": {
    "agentSpawn": [
      {
        "command": "context-compress compress-dir ~/.kiro/steering/ -o ~/.kiro/steering-compressed/ --quiet",
        "description": "Compress steering files on session start"
      }
    ]
  }
}
```

Then point your agent resources to the compressed directory.

Add to your shell profile or run before sessions:

```bash
context-compress compress CLAUDE.md -o .claude/CLAUDE.compressed.md
```

```bash
# .git/hooks/pre-commit
context-compress compress-dir docs/agent-instructions/ -o .kiro/steering/
```

Create `compress.yaml` to customize rules per file:

```yaml
defaults:
  strip_markdown: true
  remove_blanks: true
  collapse_lists: true
  deduplicate: true
overrides:
  "RULES.md":
    preserve_safety_rules: true
    compression_level: light
  "cli-tools.md":
    preserve_code_blocks: true
    compress_prose_only: true
  "writing-lab.md":
    compression_level: medium
```

- Classify each section by content type (safety, reference, personality, code, procedure)
- Apply type-appropriate compression rules
- Deduplicate across files (finds repeated instructions)
- Validate output preserves key behavioral markers
- Report token savings and any flagged risks
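The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: the keyword heuristic, the three content types used here, and the function names `classify`, `compress_section`, and `missing_markers` are all hypothetical.

```python
import re

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable

# Assumed heuristic: imperative keywords mark a section as safety-relevant.
SAFETY_WORDS = re.compile(r"\b(never|must not|do not|always)\b", re.IGNORECASE)


def classify(section: str) -> str:
    """Assign a coarse content type to a section of an instruction file."""
    if FENCE in section:
        return "code"
    if SAFETY_WORDS.search(section):
        return "safety"
    return "reference"


def compress_section(section: str, kind: str) -> str:
    """Apply type-appropriate rules: none for code, formatting-only for safety."""
    if kind == "code":
        return section  # code examples are left untouched
    # Formatting-only pass: drop blank lines and trailing whitespace.
    lines = [line.rstrip() for line in section.splitlines() if line.strip()]
    text = "\n".join(lines)
    if kind == "safety":
        return text
    # Heavier pass for everything else: strip bold/italic markers, collapse spaces.
    text = re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", text)
    return re.sub(r"[ \t]+", " ", text)


def missing_markers(original: str, compressed: str) -> set[str]:
    """Return safety keywords present in the original but absent after compression."""
    before = {m.lower() for m in SAFETY_WORDS.findall(original)}
    after = {m.lower() for m in SAFETY_WORDS.findall(compressed)}
    return before - after
```

A non-empty result from `missing_markers` is the kind of flagged risk the last step would report.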
Tested on a real 61KB agent context stack (SOUL + 10 steering files + 3 skills):
Best for heavily-formatted, prose-heavy files. Limited on already-lean files.
| File type | Typical reduction |
|---|---|
| Prose-heavy (errors, guides) | 40-70% |
| Already-lean (steering) | 2-7% |
LLM compression uses an LLM to rewrite instructions in compressed form while preserving meaning.
| File | Original | Compressed | Reduction |
|---|---|---|---|
| obsidian-integration.md | 5,634 | 4,287 | 24% |
| RULES.md | 4,265 | 3,440 | 19% |
| linkedin-drafter.md | 6,724 | 5,396 | 20% |
| writing-lab.md | 5,572 | 4,376 | 21% |
| cli-tools.md | 5,448 | 3,603 | 34% |
| Total | 27,643 | 21,102 | 24% |
Deduplication finds repeated content across files. In our test setup:
- `writing-lab.md` (steering) was a 90% duplicate of `writing-editing-lab/SKILL.md` (skill) → 5.5KB wasted every session
- Safety rules duplicated across SOUL.md and RULES.md → 0.5KB
- Obsidian paths in SOUL.md and obsidian-integration.md → 0.8KB
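One simple way to detect overlap like this is to compare normalized lines between two files. The sketch below is an assumption about how such a check could work, not a description of the `dedup` command; `normalize` and `duplicate_ratio` are hypothetical names.

```python
def normalize(line: str) -> str:
    """Lowercase and collapse whitespace so trivially different lines compare equal."""
    return " ".join(line.lower().split())


def duplicate_ratio(a: str, b: str) -> float:
    """Fraction of non-blank lines in `a` that also appear (normalized) in `b`."""
    lines_a = [normalize(line) for line in a.splitlines() if line.strip()]
    lines_b = {normalize(line) for line in b.splitlines() if line.strip()}
    if not lines_a:
        return 0.0
    shared = sum(1 for line in lines_a if line in lines_b)
    return shared / len(lines_a)


steering = "Never push to main\nUse tabs\n"
skill = "never  push to MAIN\nPrefer spaces\n"
ratio = duplicate_ratio(steering, skill)  # one of two lines is shared
```

Exact line matching misses paraphrased duplicates, so a real check would likely need fuzzier comparison.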
| Strategy | Savings |
|---|---|
| LLM compression | ~24% |
| Deduplication | ~18% |
| Combined | ~37% |
On a 61KB context stack: ~22KB saved → ~6,000 fewer tokens per session
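As a sanity check on the combined figure, the sketch below assumes the two strategies act independently, so the fractions of content they leave behind multiply. That independence is an assumption, not something the measurements establish; `combined_reduction` is a hypothetical helper.

```python
def combined_reduction(*reductions: float) -> float:
    """Combine independent reductions by multiplying the fractions that remain."""
    remaining = 1.0
    for reduction in reductions:
        remaining *= 1 - reduction
    return 1 - remaining


# Figures from the table above: ~24% (LLM compression) and ~18% (deduplication).
combined = combined_reduction(0.24, 0.18)
print(f"{combined:.1%}")  # 37.7%

# Ratio implied by the reported savings: ~22KB for ~6,000 tokens.
bytes_per_token = 22_000 / 6_000
```

Under that assumption the estimate lands close to the reported ~37%, and the reported savings imply roughly 3.7 bytes per token.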
MIT
