A transparent proxy that strips IPs, credentials, hostnames, and PII from every request before it reaches the AI β and restores them on the way back.
flowchart TD
shell["π₯οΈ Your Shell\nnmap -sV dc01.acmecorp.local"]
proxy["π‘οΈ DontFeedTheAI\ndc01.acmecorp.local β srv-0042.pentest.local\n10.20.0.10 β 203.0.113.47\nAdmin@Acme2024! β [CRED_XK9A2B3C]"]
api["βοΈ LLM API\nsees only\nsrv-0042.pentest.local\n203.0.113.47"]
shell -- "β real data" --> proxy
proxy -- "β‘ surrogates only" --> api
api -- "β’ response + surrogates" --> proxy
proxy -- "β£ real data restored" --> shell
| Layer | Detects |
|---|---|
| π§ Ollama (local LLM) | hostnames, org names, credentials in prose |
| π Regex | IPs, hashes, tokens, API keys |
Both run on your machine. Nothing sensitive crosses the boundary.
| Who | How it helps |
|---|---|
| Pentesters | Run nmap, mimikatz, bloodhound output through Claude without exposing client infrastructure |
| Developers & SREs | Debug with production data or internal configs in regulated environments |
| Legal & consulting | Anonymize client contracts, case files, or proprietary IP in AI-assisted reviews |
| Finance & compliance | Analyze reports or audit scripts without exposing account details |
| Researchers | Query LLMs on confidential datasets |
β Cloud anonymization API + LLM β two bills, two third parties. Your sensitive data still leaves the machine, just through more hands.
flowchart LR
s0["π₯οΈ Your Shell\nreal data"] --> a0["βοΈ Anonymization API\nsees everything\nbill #1"]
a0 --> c0["βοΈ LLM API\nbill #2"]
β Ollama alone β your data never leaves the machine, but Ollama has no awareness of what's sensitive. It reasons on whatever you paste: real IPs, real credentials, real hostnames.
flowchart LR
s1["π₯οΈ Your Shell\nreal data"] --> o1["π§ Ollama\nno interception\nreasons on real data"]
β Claude / OpenAI directly β best reasoning quality, but everything lands in their infrastructure. Real client IPs, credentials, org names in API logs β one policy change or breach away from a problem.
flowchart LR
s2["π₯οΈ Your Shell\nreal data"] --> c1["βοΈ LLM API\nsees everything\nlogs your real data"]
β DontFeedTheAI β cloud reasoning quality, local detection, nothing sensitive crosses the boundary. Works with Claude Code, OpenAI SDK, OpenRouter, or any OpenAI-compatible client.
flowchart LR
s3["π₯οΈ Your Shell\nreal data"] --> p["π‘οΈ DontFeedTheAI"]
o2["π§ Ollama\nlocal detector\nnever leaves machine"] --> p
p --> c2["βοΈ LLM API\nsees only surrogates"]
β See docs/architecture.md for the full technical breakdown. For supported LLM clients and upstream configuration, see docs/providers.md.
With a VPS (recommended for team use or persistent engagements):
git clone https://github.com/zeroc00I/DontFeedTheAI
cd DontFeedTheAI
python3 wizard.pyThe wizard asks everything β engagement name, VPS address, model β then deploys, opens the SSH tunnel, and launches Claude with the proxy active.
Locally without a VPS:
python3 wizard.py setup # create venv + install dependencies
python3 wizard.py docker up # start proxy + Ollama in Docker
export ANTHROPIC_BASE_URL=http://localhost:8080
export ENGAGEMENT_ID=my-engagement
claude # or any OpenAI-compatible clientWorks on Windows, macOS, and Linux.
python3 wizard.py --help # all available commands| Doc | About |
|---|---|
| Architecture | Two-layer pipeline, what gets anonymized and what doesn't, config reference |
| Providers | Supported LLM clients: Claude Code, OpenAI SDK, OpenRouter |
| Contributing | How to add fixtures, run the improvement loop, open areas |
| Threat Model | What this protects against, what it doesn't, limitations, roadmap |
Two tools ship with DontFeedTheAI to help you validate coverage and extend it.
Visual audit β open in browser while the proxy is running:
python3 wizard.py tunnel --auditShows every ORIGINAL β SURROGATE mapping logged during the session, filterable by entity type (DOMAIN, CREDENTIAL, TOKEN, HASHβ¦) with per-request timing breakdown. Use it to spot leaks at a glance instead of grepping logs.
The audit page is a debug tool. It exposes the full surrogate β original lookup table, which is why it only runs behind the SSH tunnel. Making this write-only (no reverse lookup over HTTP) is on the roadmap β see Threat Model.
Testing the full pipeline β requires Ollama running:
python3 wizard.py test --integrationRuns all 53 fixtures through the complete pipeline (LLM + regex) and asserts zero leaks. Without --integration, the LLM is mocked and only the regex layer is validated β useful for fast iteration but not a substitute for the full run.
Auto-improvement loop β regex layer only, no Ollama required:
python3 wizard.py improve --cycles 3Runs all fixtures through the regex layer, reports leaks and false positives, and tells you exactly which strings slipped through. The contribution cycle is: add a fixture for a real tool you use β run the loop β add a regex pattern for each leak β repeat. See Contributing.
The two commands complement each other: improve tightens the regex floor fast; test --integration confirms the full pipeline holds.
I'm a pentester, not a software architect.
This wasn't built to be innovative β there are already cloud APIs that do LLM-based anonymization. But that means sending your data to yet another third party, and I refuse. If you work in security, you already know why.
I built this so the architecture would be available to everyone, and so the community could help expand its effectiveness for free. You're paying for context processing β the AI doesn't need your real data for that.
β zeroc00I

