Release v2.0.0 — input + output, both local, both free · vfalbor/caveman_tokenstransfer.2.0

tl;dr

caveman cuts output tokens by 65%. This fork adds tokenstransfer (LLMLingua-2, runs 100% local) + tokenstranslation (BPE language-tax fix) as peer skills. Same one-line install. Two sides of the bill, one stack.

On a typical RAG/agent call (Haiku 4.5): -55% per-call cost stacked. Numbers below.

What's new in 2.0

🆕 /tokenstransfer — input compression via LLMLingua-2, 100% local by default (pip install llmlingua torch tiktoken), API fallback to transfer.tokenstree.com
🆕 /tokenstranslation — multilingual prompt → English (cheap-tokens language) via translation.tokenstree.com
🆕 /caveman-fullstack — caveman dialect + LLMLingua-2 input compression in one command
🆕 server/ — self-hostable LLMLingua-2 backend (FastAPI + Dockerfile + compose) for teams that want it as a microservice
🆕 Markdown-aware segmenter: headings, bullets, code blocks, URLs, file paths, frontmatter preserved exactly
🆕 Fail-soft: API down? Falls back to passthrough instead of breaking the downstream LLM call
✅ Inherits everything upstream caveman ships: /caveman 4 levels, /caveman-compress, /caveman-review, /caveman-commit, /caveman-stats, /cavecrew, the 30+ agent installer, MIT license

Benchmark — 25 prompts, 4 suites, Haiku 4.5 cost model

Suite	caveman only	tokenstransfer only	fullstack
coding	-65%	-2%	-65%
RAG	-62%	-27%	-64%
agent	-63%	-42%	-69%
multilingual	-65%	-3% (en-en)	-65%
overall avg	-60.2%	-3.7%	-63.9%

For short-prompt / long-answer workloads (codegen), caveman alone wins. For long-context / short-answer (RAG, agents), tokenstransfer is the bigger lever. Stack both for the wide-net.

Install

curl -fsSL https://raw.githubusercontent.com/vfalbor/caveman_tokenstransfer.2.0/main/install.sh | bash
pip install llmlingua torch tiktoken    # enable 100% local mode

Then /caveman, /tokenstransfer, /tokenstranslation, /caveman-fullstack from any of the 30+ supported agents.

Credits

Output side and 95% of the installer plumbing: @JuliusBrussee. Input side, suite glue, server, benchmarks: @vfalbor.

⭐ this fork if it cut your bill. ⭐ JuliusBrussee/caveman too — they did the hard half.

Why use many token when few token do trick.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.0.0 — input + output, both local, both free

Choose a tag to compare

Sorry, something went wrong.