tl;dr
caveman cuts output tokens by 65%. This fork adds tokenstransfer (LLMLingua-2, runs 100% local) + tokenstranslation (BPE language-tax fix) as peer skills. Same one-line install. Two sides of the bill, one stack.
On a typical RAG/agent call (Haiku 4.5): -55% per-call cost stacked. Numbers below.
What's new in 2.0
- π
/tokenstransferβ input compression via LLMLingua-2, 100% local by default (pip install llmlingua torch tiktoken), API fallback to transfer.tokenstree.com - π
/tokenstranslationβ multilingual prompt β English (cheap-tokens language) via translation.tokenstree.com - π
/caveman-fullstackβ caveman dialect + LLMLingua-2 input compression in one command - π
server/β self-hostable LLMLingua-2 backend (FastAPI + Dockerfile + compose) for teams that want it as a microservice - π Markdown-aware segmenter: headings, bullets, code blocks, URLs, file paths, frontmatter preserved exactly
- π Fail-soft: API down? Falls back to passthrough instead of breaking the downstream LLM call
- β
Inherits everything upstream caveman ships:
/caveman4 levels,/caveman-compress,/caveman-review,/caveman-commit,/caveman-stats,/cavecrew, the 30+ agent installer, MIT license
Benchmark β 25 prompts, 4 suites, Haiku 4.5 cost model
| Suite | caveman only | tokenstransfer only | fullstack |
|---|---|---|---|
| coding | -65% | -2% | -65% |
| RAG | -62% | -27% | -64% |
| agent | -63% | -42% | -69% |
| multilingual | -65% | -3% (en-en) | -65% |
| overall avg | -60.2% | -3.7% | -63.9% |
For short-prompt / long-answer workloads (codegen), caveman alone wins. For long-context / short-answer (RAG, agents), tokenstransfer is the bigger lever. Stack both for the wide-net.
Install
curl -fsSL https://raw.githubusercontent.com/vfalbor/caveman_tokenstransfer.2.0/main/install.sh | bash
pip install llmlingua torch tiktoken # enable 100% local modeThen /caveman, /tokenstransfer, /tokenstranslation, /caveman-fullstack from any of the 30+ supported agents.
Credits
Output side and 95% of the installer plumbing: @JuliusBrussee. Input side, suite glue, server, benchmarks: @vfalbor.
β this fork if it cut your bill. β JuliusBrussee/caveman too β they did the hard half.
Why use many token when few token do trick.