Skip to content

Releases: vfalbor/caveman_tokenstransfer.2.0

v2.0.0 β€” input + output, both local, both free

27 May 17:21

Choose a tag to compare

tl;dr

caveman cuts output tokens by 65%. This fork adds tokenstransfer (LLMLingua-2, runs 100% local) + tokenstranslation (BPE language-tax fix) as peer skills. Same one-line install. Two sides of the bill, one stack.

On a typical RAG/agent call (Haiku 4.5): -55% per-call cost stacked. Numbers below.

What's new in 2.0

  • πŸ†• /tokenstransfer β€” input compression via LLMLingua-2, 100% local by default (pip install llmlingua torch tiktoken), API fallback to transfer.tokenstree.com
  • πŸ†• /tokenstranslation β€” multilingual prompt β†’ English (cheap-tokens language) via translation.tokenstree.com
  • πŸ†• /caveman-fullstack β€” caveman dialect + LLMLingua-2 input compression in one command
  • πŸ†• server/ β€” self-hostable LLMLingua-2 backend (FastAPI + Dockerfile + compose) for teams that want it as a microservice
  • πŸ†• Markdown-aware segmenter: headings, bullets, code blocks, URLs, file paths, frontmatter preserved exactly
  • πŸ†• Fail-soft: API down? Falls back to passthrough instead of breaking the downstream LLM call
  • βœ… Inherits everything upstream caveman ships: /caveman 4 levels, /caveman-compress, /caveman-review, /caveman-commit, /caveman-stats, /cavecrew, the 30+ agent installer, MIT license

Benchmark β€” 25 prompts, 4 suites, Haiku 4.5 cost model

Suite caveman only tokenstransfer only fullstack
coding -65% -2% -65%
RAG -62% -27% -64%
agent -63% -42% -69%
multilingual -65% -3% (en-en) -65%
overall avg -60.2% -3.7% -63.9%

For short-prompt / long-answer workloads (codegen), caveman alone wins. For long-context / short-answer (RAG, agents), tokenstransfer is the bigger lever. Stack both for the wide-net.

Install

curl -fsSL https://raw.githubusercontent.com/vfalbor/caveman_tokenstransfer.2.0/main/install.sh | bash
pip install llmlingua torch tiktoken    # enable 100% local mode

Then /caveman, /tokenstransfer, /tokenstranslation, /caveman-fullstack from any of the 30+ supported agents.

Credits

Output side and 95% of the installer plumbing: @JuliusBrussee. Input side, suite glue, server, benchmarks: @vfalbor.

⭐ this fork if it cut your bill. ⭐ JuliusBrussee/caveman too β€” they did the hard half.


Why use many token when few token do trick.