Skip to content

thepixelabs/sanitai

Repository files navigation

Nix the Raccoon — SanitAI mascot

Find secrets in your LLM chat history before someone else does.

SanitAI scans your Claude and ChatGPT conversation exports for leaked API keys, credentials, and personal data — entirely on your machine. No network calls. No cloud. No copies of your data leave the device.

License: MIT Platform: Linux macOS


The problem

You paste a .env file into Claude to debug a config issue. You ask ChatGPT to review a script that contains a database URL. Six months later, you export your conversation history and realise those secrets have been sitting in plain text inside a JSON file on your laptop — and in every backup, sync, and export you've made since.

SanitAI finds them. In seconds.


Install

Homebrew (macOS and Linux)

brew install thepixelabs/tap/sanitai

curl installer (Linux and macOS)

curl -fsSL https://releases.sanitai.dev/install.sh | sh

Verify the binary signature before running anything (see Trust model).

Cargo (build from source)

cargo install sanitai

Requires Rust 1.78 or later.


Your first scan

sanitai scan ~/Downloads/claude_export.zip
SanitAI v0.1.0 — local scan, no network
Parsing: claude_export.zip (Claude format)

Scanning 1,842 messages across 94 conversations...

FINDINGS (4)
────────────────────────────────────────────────────────────────
 HIGH  AWS Access Key        conversation_47.json:line 312
       AKIA[REDACTED]        matched: aws_access_key_id pattern

 HIGH  Generic API Key       conversation_12.json:line 88
       sk-[REDACTED]         matched: high-entropy bearer token

 MED   Email address         conversation_91.json:line 201
       j****@example.com     matched: RFC 5322 address heuristic

 LOW   Internal hostname     conversation_03.json:line 44
       db.internal.corp      matched: internal TLD heuristic
────────────────────────────────────────────────────────────────
4 findings in 1,842 messages (0.8 s)

Run `sanitai redact` to remove findings from a copy of the export.

No finding left the machine. The original file was not modified.


Supported sources

Source Export format Parser
Claude (claude.ai) conversations.json ZIP export Built-in
ChatGPT (chat.openai.com) conversations.json ZIP export Built-in

Additional parsers are planned for future releases. The parser is separate from the detector — adding a new source does not change detection logic.


Detection capabilities

Category Examples Method
Cloud provider keys AWS, GCP, Azure, Stripe, Twilio Regex against known prefixes
Generic secrets High-entropy strings in key=value context Shannon entropy + heuristic
Bearer tokens Authorization: headers, sk- prefixes Regex + context heuristic
Database URLs postgres://, mysql://, mongodb+srv:// Regex
Private keys PEM blocks (BEGIN RSA PRIVATE KEY, etc.) Regex
Email addresses RFC 5322 addresses Heuristic
Phone numbers E.164 and regional formats Regex + heuristic
Custom patterns User-defined YAML rules Regex + optional entropy

Detectors run locally in process. No pattern or matched text is sent anywhere.


Redaction modes

# Write a redacted copy alongside the original (default)
sanitai redact ~/Downloads/claude_export.zip

# Redact in place (replaces the original — use with caution)
sanitai redact --in-place ~/Downloads/claude_export.zip

# Replace matched text with a fixed mask
sanitai redact --mask "***REDACTED***" ~/Downloads/claude_export.zip

# Replace matched text with the finding type label
sanitai redact --mask-with-type ~/Downloads/claude_export.zip
# e.g. "[AWS_ACCESS_KEY]", "[EMAIL_ADDRESS]"

Configuration

SanitAI reads ~/.config/sanitai/config.toml (XDG-compliant). All fields are optional.

# ~/.config/sanitai/config.toml

[scan]
min_severity = "low"   # "low" | "medium" | "high"
disable_detectors = []

[redact]
mask = "[REDACTED]"
mask_with_type = false

[rules]
extra_rules_dirs = ["~/.config/sanitai/rules"]

Validate your config:

sanitai config validate

Custom rules

# ~/.config/sanitai/rules/acme-api-key.yaml
id: acme_api_key
name: "ACME Corp API Key"
severity: high
pattern: "acme_[a-zA-Z0-9]{32}"
min_entropy: 3.5
context_keywords: ["api_key", "apikey", "token"]

See docs/custom-rules.md for the complete schema and examples.


How it works

1. PARSE                  2. DETECT                  3. REPORT
─────────────             ─────────────              ─────────────
Read export ZIP     →     For each message:    →     Print findings
Identify format           run regex detectors         by severity
Extract message           run entropy scorer
text per turn             run heuristics
                          apply custom rules

Everything happens in a single local process. No state is persisted between runs.


Trust model

Offline by design. SanitAI makes zero network connections at runtime. Verify:

# Linux
strace -e trace=network sanitai scan /path/to/export.zip 2>&1 | grep -v "^strace"

# macOS
sudo fs_usage -w -f network sanitai

A clean scan produces no output from either command.

No telemetry. No analytics, no crash reporting, no usage pinging.

Signed binaries. Release binaries are signed with cosign.

cosign verify-blob \
  --certificate sanitai-linux-amd64.cert \
  --signature sanitai-linux-amd64.sig \
  --certificate-identity \
    "https://github.com/sanitai/sanitai/.github/workflows/release.yml@refs/heads/main" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  sanitai-linux-amd64

Contributing

See CONTRIBUTING.md. Security vulnerabilities: see SECURITY.md.


License

MIT — see LICENSE.

About

Find secrets in your LLM chat history before someone else does.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages