The public, Apache-2.0 audit copy of Tokenmin. This is the code that decides
what (if anything) leaves your machine when you run tokenmin. About 5 minutes
of reading, end to end.
The deal: Tokenmin is free during the friends-and-family preview in exchange for your anonymized usage data. The scanner that does the anonymization is open precisely so the trade is verifiable, not just promised. Read it. Diff it against your install. Then decide if you trust the bargain.
curl --proto '=https' --tlsv1.2 -fsSL https://tokenmin.ai/install.sh | bash
~ tokenmin
▶ scanning ~/.claude
✓ found 57 sessions in last 14 days
✓ anonymized
✓ analyzed
Tokenmin Claude usage audit
────────────────────────────────────────────────────────────────────────
scanned 57 sessions over 14 days
est. spend (window): $6,860
model mix: Opus 99% · Sonnet 1%
────────────────────────────────────────────────────────────────────────
Headline ~$7,151/mo recoverable across 7 fix(es), ~4.8 hrs total
1. A lot of your spend is on Opus — route by tier
$$$$ ▮▮▮▮▮▮▮▮▮▮ $7,055/mo 0.1 hrs · conf 55% · model routing
evidence: 100% of $6,860 weekly spend on Opus across 52 sessions.
→ tokenmin show model_overspend
...
The output is a rich terminal card with the headline dollar figure, ranked
findings, severity pills, and a per-finding next action. From there,
tokenmin show <id> drills into one finding's evidence and fix, and
tokenmin watch runs a live dashboard while you work.
| Concern | Where |
|---|---|
| Walking ~/.claude sessions, settings, agents, skills, MCP config | This repo: skills/tokenmin/analyzer.py |
| Parsing claude.ai / Claude Desktop chat exports | This repo: skills/tokenmin/analyzer_chat_export.py |
| Anonymization (paths, secrets, labels, identifiers) | This repo: skills/tokenmin/anonymize.py |
| Orchestrator CLI: collect → anonymize → submit → render | This repo: skills/tokenmin/tokenmin.py |
| Wrapper script + auto-update + version + doctor | This repo: tokenmin |
| Tests + CI | This repo: tests/, .github/workflows/ci.yml |
| Detection rules, scoring, report rendering | Not here. Lives in proprietary watsonrm/tokenmin-core. |
| Hosted submission server | Not here. Bundled in the F&F preview. |
This repo is scanner-only. Running it produces an anonymized snapshot. It
does not produce a report (that's the engine's job). The scanner is fully
functional without the engine — you can write the snapshot to disk with
--snapshot snap.json and audit what would be sent before deciding to submit
anywhere.
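One way to audit a snapshot before submitting is to walk the JSON and flag any string that still looks like plaintext. The sketch below is illustrative only. The snapshot schema is not documented here, so it assumes nothing about field names, and the three red-flag patterns (home-directory paths, emails, IPv4 addresses) as well as the function names find_plaintext and audit_snapshot are hypothetical, not part of the scanner.

```python
import json
import re

# Hypothetical red-flag patterns; the scanner's real rules live in anonymize.py.
SUSPICIOUS = {
    "path": re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_plaintext(node, trail="$"):
    """Yield (json_path, kind, value) for every suspicious string in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Keys can leak too, so check them as well as values.
            yield from find_plaintext(key, f"{trail}.<key>")
            yield from find_plaintext(value, f"{trail}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_plaintext(value, f"{trail}[{i}]")
    elif isinstance(node, str):
        for kind, pattern in SUSPICIOUS.items():
            if pattern.search(node):
                yield trail, kind, node


def audit_snapshot(path):
    """Load a snapshot file and return every suspicious string found in it."""
    with open(path, encoding="utf-8") as fh:
        return list(find_plaintext(json.load(fh)))
```

After tokenmin --snapshot snap.json, calling audit_snapshot("snap.json") and getting an empty list back is weak evidence that nothing obvious slipped through. It does not replace reading the file yourself.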
Quick (trusts the network all the way to GitHub):
curl --proto '=https' --tlsv1.2 -fsSL https://tokenmin.ai/install.sh | bash

Verify-then-run (recommended if you don't trust the network all the way to GitHub):
curl --proto '=https' --tlsv1.2 -fsSL -o install.sh https://tokenmin.ai/install.sh
curl --proto '=https' --tlsv1.2 -fsSL -o install.sh.sha256 https://tokenmin.ai/install.sh.sha256
shasum -a 256 -c install.sh.sha256
less install.sh
bash install.sh

The installer detects every Claude variant on your machine (Code / Desktop on
macOS / Linux / Windows), drops a single tokenmin command on PATH, and offers
to add it to your shell rc with consent. No gh, no brew, no auth setup.
After install:
tokenmin --selfcheck # see the anonymizer rules without reading Python
tokenmin # scan + render inline (the magic moment)
tokenmin watch # live dashboard
tokenmin show <id> # drill into one finding
tokenmin help # 30-second walkthrough

Identifiers (file paths, project names, MCP server names, custom agent /
skill / command names) are hashed with HMAC-SHA256, keyed by a 32-byte salt
generated on first run at ~/.tokenmin/.salt (chmod 0600, refuses to
overwrite via O_EXCL). Output is 16 hex chars (64 bits), which is
collision-resistant for any realistic corpus: by the birthday bound, even a
million distinct identifiers have roughly a 3-in-100-million chance of any
collision.
An adversary who guesses common path names like ~/.ssh/known_hosts cannot
precompute its hash without your salt. Cross-snapshot correlation works within
your install (so the engine can flag "same file re-read 12×"); cross-user
correlation is broken.
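The salted hashing described above can be sketched as follows. This is a minimal illustration under the stated parameters (32-byte salt, O_EXCL creation, mode 0600, 16 hex chars), not the code in anonymize.py. The function names load_or_create_salt and hash_identifier are hypothetical.

```python
import hashlib
import hmac
import os
import secrets
from pathlib import Path


def load_or_create_salt(salt_path: Path) -> bytes:
    """Return the 32-byte salt, creating it exactly once with mode 0600."""
    salt_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the file already exists,
        # so an existing salt is never overwritten.
        fd = os.open(salt_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return salt_path.read_bytes()
    salt = secrets.token_bytes(32)
    with os.fdopen(fd, "wb") as fh:
        fh.write(salt)
    return salt


def hash_identifier(identifier: str, salt: bytes) -> str:
    """HMAC-SHA256 over the whole string, truncated to 16 hex chars (64 bits)."""
    digest = hmac.new(salt, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Because the salt is the HMAC key, the same path hashes identically across runs on one install but differently on any other install, which is what keeps within-install correlation while breaking cross-user correlation.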
Stricter mode: TOKENMIN_STRICT_ANONYMIZE=1 adds a per-run salt. This also
breaks within-user cross-run correlation, at the cost of losing
across-days findings.
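A rough sketch of what a per-run salt implies is shown below. How the scanner actually combines the two salts is not documented here, so deriving the run key as an HMAC of a fresh random value under the install salt is purely an assumption, and run_key is a hypothetical name.

```python
import hashlib
import hmac
import os
import secrets


def run_key(install_salt: bytes, environ=os.environ) -> bytes:
    """Return the HMAC key to use for this run.

    Assumption: strict mode mixes a fresh random per-run salt into the
    install salt. The real combination rule may differ.
    """
    if environ.get("TOKENMIN_STRICT_ANONYMIZE") == "1":
        per_run = secrets.token_bytes(32)  # kept in memory only
        return hmac.new(install_salt, per_run, hashlib.sha256).digest()
    return install_salt


def hash_identifier(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 truncated to 16 hex chars, as in the default mode."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
```

With the variable unset, two runs produce the same hash for the same identifier. With it set, each run gets a different key, so hashes from separate runs can no longer be joined.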
Pathological JSONL inputs (oversized lines, regex-bomb strings, malformed JSON) can't hang the scrubber: every regex sees inputs truncated to 64 KiB max; bad lines are dropped, not raised on; recursion depth is capped.
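The three defenses named above (input truncation before any regex, dropping bad lines, a recursion cap) can be sketched like this. Only the 64 KiB figure comes from the text. The depth limit of 32, the single AWS-style pattern, and the names iter_records and scrub are illustrative assumptions.

```python
import json
import re

MAX_REGEX_INPUT = 64 * 1024  # cap from the text, applied here per string
MAX_DEPTH = 32               # hypothetical recursion cap

# One illustrative secret pattern with a literal prefix (AWS access key ID).
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def iter_records(lines):
    """Yield parsed JSON values, dropping lines that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except (ValueError, RecursionError):
            continue  # bad line: drop it instead of raising


def scrub(node, depth=0):
    """Redact keys in every string, bounding regex input and recursion."""
    if depth > MAX_DEPTH:
        return None  # too deep: discard the subtree rather than recurse
    if isinstance(node, str):
        # Truncate first so no regex ever sees more than the cap.
        return AWS_KEY.sub("<aws-key>", node[:MAX_REGEX_INPUT])
    if isinstance(node, list):
        return [scrub(v, depth + 1) for v in node]
    if isinstance(node, dict):
        return {k: scrub(v, depth + 1) for k, v in node.items()}
    return node
```

Truncation trades completeness for a bounded worst case: anything past the cap is cut off rather than scanned, so it can neither hang the scrubber nor reach the output.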
Every snapshot build and every submission appends a JSON line to
~/.tokenmin/audit.log (chmod 0600) with a UTC timestamp, the event, and the
SHA-256 digest of the payload. It never records user content. After the fact
you can verify exactly which payload you sent and when, by comparing the
digest of a saved snapshot against the log.
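A minimal sketch of such a digest-only audit log follows. The field names ts, event and sha256, the event strings, and the function names are assumptions for illustration. The actual log format may differ.

```python
import hashlib
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def append_audit_entry(log_path: Path, event: str, payload: bytes) -> dict:
    """Append one JSON line holding a timestamp, event and payload digest."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "sha256": hashlib.sha256(payload).hexdigest(),  # digest only, no content
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # O_APPEND keeps every write at the end; 0o600 applies when the file is created.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def was_sent(log_path: Path, payload: bytes) -> bool:
    """True if the log records a payload with this exact SHA-256 digest."""
    digest = hashlib.sha256(payload).hexdigest()
    with open(log_path, encoding="utf-8") as fh:
        return any(json.loads(line).get("sha256") == digest for line in fh)
```

Keeping a copy of the snapshot file alongside the log is what makes the check useful: hashing the saved file and finding the same digest in the log confirms that those bytes, and no others, were the payload.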
- Default tokenmin mode is local: no network calls at all
- --submit-url refuses http:// for non-localhost
- --api-key-env VAR keeps bearer tokens out of ps / shell history
- --no-anonymize requires --i-know-what-im-doing AND refuses to combine with --submit-url
- --snapshot FILE writes chmod 0600 + refuses to overwrite without --force
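Three of the guards in the list above can be sketched in a few lines. This is an illustration of the stated behavior, not the CLI's code. The function names, the set of hosts treated as local, and the error messages are all assumptions.

```python
import os
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumed definition of "localhost"


def check_submit_url(url: str) -> str:
    """Accept https anywhere; accept plain http only for local hosts."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and parts.hostname in LOCAL_HOSTS:
        return url
    raise ValueError(f"refusing non-HTTPS submit URL: {url}")


def read_api_key(env_var: str) -> str:
    """Read the bearer token from the environment, never from argv."""
    try:
        return os.environ[env_var]
    except KeyError:
        raise ValueError(f"environment variable {env_var} is not set") from None


def write_snapshot(path: str, data: bytes, force: bool = False) -> None:
    """Write with mode 0600; without force, fail if the file already exists."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_TRUNC if force else os.O_EXCL)
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.chmod(path, 0o600)  # also tighten a pre-existing file when force is used
```

Passing only the variable name on the command line means the token itself never appears in the process list or shell history. Only the name does.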
Every push runs across Python 3.10 / 3.11 / 3.12:
- 13 property + CLI tests including idempotent scrub, secret-pattern coverage
  (Anthropic / OpenAI / Stripe / JWT / npm / Google / AWS / GitHub / Slack),
  ReDoS input cap, salt sensitivity + stability, HTTPS-only enforcement,
  double-flag on --no-anonymize, chmod 0600 on snapshot writes
- Deterministic --selfcheck output diffed against tests/fixtures/selfcheck.expected.json
- Synthetic-leak gate: builds a fake ~/.claude/ with planted client names + paths,
  runs the scanner, fails CI if any plaintext survives the scrubber
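The synthetic-leak gate can be pictured with the sketch below. It is not the repository's test: toy_scan is a stand-in for the real scanner, the planted names are made up, and the directory layout under the fake ~/.claude is an assumption.

```python
import hashlib
import hmac
import json
import tempfile
from pathlib import Path

PLANTED = ["AcmeCorp", "project-nightingale"]  # made-up names to plant


def toy_scan(root: Path, salt: bytes) -> dict:
    """Stand-in scanner: count files and emit only salted path hashes."""
    def h(text: str) -> str:
        return hmac.new(salt, text.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {"file_count": len(files), "paths": [h(str(p)) for p in files]}


def leak_gate(scan) -> list:
    """Build a fake ~/.claude tree, run scan, return planted names that leaked."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / ".claude"
        for name in PLANTED:
            session = root / "projects" / name / "session.jsonl"
            session.parent.mkdir(parents=True)
            session.write_text(json.dumps({"prompt": f"notes for {name}"}) + "\n")
        output = json.dumps(scan(root, b"\x00" * 32))
    # Case-insensitive search of the serialized output for every planted string.
    return [name for name in PLANTED if name.lower() in output.lower()]
```

In CI the gate would fail the build whenever leak_gate returns a non-empty list. Searching the serialized output as one string catches leaks in keys and nested values alike.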
The F&F bundle mirrors these scanner files; its CI fails if they drift.
main is protected: no force-push, no branch deletion, linear history required.
SECURITY.md covers threat model, response targets (2-day ack,
5-day triage, 14-day patch for confirmed high-severity), supported versions,
named limitations.
| Field | Form |
|---|---|
| Session counts, turn counts | integer |
| Tool call counts by name | integer per tool name (MCP tool names hashed) |
| File paths from Read/Write/Edit | whole-string HMAC hash, no suffix leak |
| Project field, MCP servers, custom agents/skills/commands | whole-string HMAC hash |
| Models used, token usage, USD cost estimate | as-is (public info) |
| Permission denies, error results, redo signals | integer count |
| Timestamps | session start/end |
What never reaches the snapshot:
| Field | Why not |
|---|---|
| Raw text of user prompts | only lowercased + keyword-counted in memory for the redo-signal scan, then discarded |
| Raw assistant responses | scanner never reads them |
| Tool results | scanner never reads them |
| Anything outside ~/.claude/ | not in scan scope |
| Secrets (Anthropic / OpenAI / Stripe / JWT / npm / Google / AWS / GitHub / Slack tokens, PEM blocks, emails, IPs) | scrubbed by anonymize.py before any write |
Run tokenmin --selfcheck to see the exact anonymization output for a fixed
set of sample inputs. No collection happens.
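For intuition about the secret scrubbing listed in the table above, here is a small pattern-based sketch. The regexes are illustrative approximations covering only a few of the listed secret types, and the bracketed placeholders are invented for this example. The authoritative rules are the ones in anonymize.py, visible through --selfcheck.

```python
import re

# Illustrative approximations only; the real rules live in anonymize.py.
RULES = [
    ("pem", re.compile(r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.S)),
    ("aws-key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("github-token", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def scrub_text(text: str) -> str:
    """Replace every match of every rule with a labeled placeholder."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text
```

The placeholders are chosen so that no rule matches them, which makes the function idempotent: scrubbing already-scrubbed text changes nothing.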
The F&F bargain ("free for anonymized data") only works if the install is trivial. That's the lead priority.
- Native Claude Desktop adapter: today Desktop users go through the chat-export path (same as web). Live Electron-store parsing is in progress.
- Hosted endpoint: today the bundled server/ stub runs locally for testing. A real https://api.tokenmin.ai/analyze endpoint with persistence + auth lands when F&F invitees cross ~5 yes-RSVPs.
- Engine v0.5 detectors: cache hit ratio, mid-session /model cache flush, output style presence, ENABLE_TOOL_SEARCH not set, subagent return-size overruns, 1M-context trap past 256K. (See the public guide for the underlying patterns.)
- GPG-signed releases for TOKENMIN_REQUIRE_SIGNED=1 users.
- One-line installer with embedded signature verification.
| Repo | Visibility | Purpose |
|---|---|---|
| watsonrm/tokenmin-scanner (this) | public, Apache-2.0 | scanner + anonymizer + CLI |
| watsonrm/tokenmin | private | F&F preview bundle (mirrors scanner files + ships engine + server) |
| watsonrm/tokenmin-core | private | proprietary engine + rule base + positioning |
| watsonrm/tokenmin-site | public | static site served at https://tokenmin.ai |
Apache-2.0. See LICENSE and NOTICE. The proprietary
engine in watsonrm/tokenmin-core is under a separate, non-OSS license. See
LICENSING.md for the boundary.