A git pre-commit and pre-push hook that scans what you're about to
commit or push with OpenAI's
privacy-filter (opf) model and
blocks the operation when it detects any of:
| category | covers |
|---|---|
| account_number | bank / card numbers |
| private_address | street addresses |
| private_date | birthdates, etc. |
| private_email | email addresses |
| private_person | names |
| private_phone | phone numbers |
| private_url | URLs treated as private (e.g. webhooks) |
| secret | API keys, tokens, passwords, credentials |
Unlike regex-based tools (detect-secrets, gitleaks, ggshield), this
uses opf's learned 1.5 B-param sparse-MoE classifier, so it catches novel
credential formats and PII the regex rules don't know about — at the cost of
a ~3 GB first-run model download and a few seconds of per-commit inference.
You need Python 3.11+ and the opf
package available at hook-invocation time. git-privacy-filter declares opf
as a dependency, so installing this tool pulls opf with it.
Pick one of:

```shell
# uv users (recommended)
uv tool install git+https://github.com/<owner>/git-privacy-filter

# pipx users
pipx install git+https://github.com/<owner>/git-privacy-filter

# pre-commit framework users — see "Via the pre-commit framework" below
```

Then preflight:

```shell
git-privacy-filter doctor
```

`doctor` reports whether opf is importable and whether the model weights
are on disk. First run triggers a ~3 GB download from HuggingFace into
`~/.opf/privacy_filter/`; `doctor --load-model` additionally warms the torch caches.
git-privacy-filter pins torch to the CPU-only PyTorch wheels
(torch==X.Y.Z+cpu), routed through https://download.pytorch.org/whl/cpu
via [tool.uv.sources] in our pyproject.toml. This cuts ~3.7 GB of
transitive NVIDIA CUDA libraries (nvidia-cudnn, nvidia-cublas,
nvidia-cusolver, cuda-toolkit, triton, …) that the default PyPI
torch wheel drags in on Linux.
Rough footprint with the CPU-only default:
| component | size |
|---|---|
| uv tool venv | ~900 MB |
| opf model weights | ~2.8 GB |
| total | ~3.7 GB |
Versus roughly ~7.5 GB if you install with the default CUDA-enabled
torch.
GPU users — at runtime we auto-detect via torch.cuda.is_available()
(PyTorch exposes both NVIDIA CUDA and AMD ROCm devices under the
same cuda namespace), so once the right torch wheel is in the
environment no code changes are needed. GIT_PRIVACY_FILTER_DEVICE
is the env-var escape hatch if auto-detect picks wrong.
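The device pick described above can be sketched as a pure function (an assumed implementation, not the tool's actual code; `cuda_available` stands in for `torch.cuda.is_available()`, which reports ROCm/HIP devices under the `"cuda"` name too):

```python
def resolve_device(env: dict, cuda_available: bool) -> str:
    """Pick the inference device: the GIT_PRIVACY_FILTER_DEVICE env var
    always wins; otherwise fall back to torch's auto-detect result."""
    override = env.get("GIT_PRIVACY_FILTER_DEVICE")
    if override:
        return override
    return "cuda" if cuda_available else "cpu"
```

Keeping the logic a pure function of `(env, cuda_available)` makes the escape hatch trivially testable without importing torch.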
```shell
uv tool install \
  --index https://download.pytorch.org/whl/cu121 \
  git+https://github.com/<owner>/git-privacy-filter
```

Short version — host side, then Python side.
Host-side (Fedora 40+, no third-party repos):

```shell
sudo dnf install rocminfo rocm-hip rocm-smi rocm-clinfo rocm-opencl
sudo usermod -aG render,video "$USER"  # log out / back in for groups
rocminfo | grep -A1 "Marketing Name"   # confirm your GPU shows up
rpm -q rocm-hip --qf '%{VERSION}\n'    # note the major.minor (e.g. 6.2)
```

On Ubuntu/Debian, `sudo apt install rocm` reaches roughly the same
state (AMD's own apt repo is the fallback when distros don't package
ROCm natively).
Python-side — dedicated venv (recommended for ROCm):
uv tool install on this project doesn't work cleanly with the rocm6.2
torch wheels today: those wheels only cover cp39-cp312, and
requires-python = ">=3.11" forces uv's lockfile to also resolve on cp313,
which has no rocm6.2 torch wheel. Until the rocm index catches up, use a
dedicated venv for the ROCm setup instead:
```shell
cd /path/to/git-privacy-filter

# Pin to Python 3.12 (rocm6.2 torch wheels don't exist for 3.13).
uv venv --python 3.12 .venv-rocm
source .venv-rocm/bin/activate

# Install torch + its rocm-only transitive deps straight from the rocm index.
# Match the rocm6.x minor to whatever `rpm -q rocm-hip` printed.
uv pip install \
  --extra-index-url https://download.pytorch.org/whl/rocm6.2 \
  --index-strategy unsafe-best-match \
  'torch>=2.5' pytorch-triton-rocm

# Install opf + git-privacy-filter itself (torch is already satisfied).
uv pip install \
  --extra-index-url https://download.pytorch.org/whl/rocm6.2 \
  --index-strategy unsafe-best-match \
  -e .

# Verify
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
# Expect: 2.5.1+rocm6.2 True (PyTorch exposes HIP devices under "cuda")
```

To use it globally, symlink the venv's binary onto your PATH (or source the venv in your shell init):
```shell
ln -sf "$PWD/.venv-rocm/bin/git-privacy-filter" ~/.local/bin/git-privacy-filter
```

The hook scripts call plain `git-privacy-filter`, so whichever copy is first
on `$PATH` wins — the ROCm-venv symlink above takes precedence over any
previously `uv tool install`-ed CPU version.
The commented pytorch-rocm blocks in pyproject.toml are kept as a
reference for when the rocm6.2 index catches up to cp313; at that point
the uv tool install --reinstall --python 3.12 flow becomes the cleaner
path.
Troubleshooting: if torch.cuda.is_available() prints False,
check group membership (id -nG should include render),
/dev/kfd permissions, and dmesg | grep amdgpu. If you see
HIP error: no kernel image is available, your GPU's gfx target
isn't in the wheel's compiled set — set
HSA_OVERRIDE_GFX_VERSION=11.0.0 (RDNA3) or 10.3.0 (RDNA2).
Note for PyPI consumers: [tool.uv.sources] is a uv-only directive;
pip install git-privacy-filter (once we publish) will still pull the
default CUDA-enabled torch. If you're a pip user on a CPU-only machine
and want the small install, pre-install the CPU wheel:
```shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install git-privacy-filter
```

Add to your repo's `.pre-commit-config.yaml`:
```yaml
repos:
  - repo: https://github.com/<owner>/git-privacy-filter
    rev: v0.1.0
    hooks:
      - id: git-privacy-filter       # runs on every commit
      - id: git-privacy-filter-push  # runs on every push
```

Then `pre-commit install --hook-type pre-commit --hook-type pre-push`.
From inside any git repo:

```shell
git-privacy-filter install
```

This writes small shell wrappers into the repo's hooks directory (respecting
`core.hooksPath`) that exec `git-privacy-filter precommit` / `prepush`. A
magic comment on the second line lets `git-privacy-filter uninstall` safely
remove our hooks while leaving any other hooks you have in place untouched.
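The safe-uninstall check could look roughly like this (a sketch — the actual marker string and implementation are assumptions, not the tool's code):

```python
from pathlib import Path

# Hypothetical marker; the real magic comment string isn't documented here.
MARKER = "# installed by git-privacy-filter"

def is_our_hook(hook_path: Path) -> bool:
    """True only if line 2 of the hook file carries our magic comment,
    so uninstall never deletes a hook someone else wrote."""
    try:
        lines = hook_path.read_text().splitlines()
    except OSError:
        return False
    return len(lines) >= 2 and lines[1].strip() == MARKER
```

Checking a fixed line rather than grepping the whole file keeps the test cheap and avoids false positives on hooks that merely mention the tool.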
Drop a `.git-privacy-filter.toml` at your repo root:

```toml
[allowlist]
# Skip these paths entirely — the scanner never even runs on them.
# Useful for test fixtures that intentionally contain credential-like
# patterns, lockfiles, vendored third-party code, etc.
paths = [
    "tests/fixtures/**",
    "**/*.lock",
    "vendor/**",
]

[categories]
# Optional. Omit both keys to enable all 8 categories.
# Use EITHER `enabled` OR `disabled`, not both.
enabled = ["secret", "private_email"]  # only these two block
# disabled = ["private_person"]        # or: block everything except these
```

Malformed config is a hard error, not a silent "revert to defaults" —
a mistyped `allowlist_paths` that disabled every filter would be the worst
kind of bug.
For one-off exceptions, annotate the offending line:

```python
# Bare form — allows all categories on this line:
AWS_FAKE_KEY = "AKIA..."  # git-privacy-filter: allow

# Category-scoped — allows only `secret` here; private_email on the same
# line would still block:
USER_AND_TOKEN = ("alice@example.com", "ghp_...")  # git-privacy-filter: allow secret

# For a line that can't carry a trailing comment (e.g. a generated blob),
# use the previous line:
# git-privacy-filter: allow-next-line
1600_Pennsylvania_Avenue_NW_Washington_DC_20500
```

Any comment prefix works — `#`, `//`, `--`, `;`, `/*` — we match on the
`git-privacy-filter:` token, not on the comment syntax.
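Comment-prefix-agnostic matching can be sketched with a single regex anchored on the token itself (an assumed implementation, not the tool's actual code):

```python
import re

# Match the marker verb plus an optional trailing category list; everything
# before `git-privacy-filter:` (the comment prefix) is deliberately ignored.
MARKER = re.compile(
    r"git-privacy-filter:\s*(allow(?:-next-line)?)\b"  # verb
    r"((?:[ \t]+[a-z_]+)*)"                            # optional categories
)

def parse_marker(line: str):
    """Return (verb, categories) for a line carrying an allow marker,
    or None. An empty category tuple means 'allow all categories'."""
    m = MARKER.search(line)
    if not m:
        return None
    return m.group(1), tuple(m.group(2).split())
```

Because the regex searches anywhere in the line, `# …`, `// …`, `-- …`, `; …`, and `/* … */` prefixes all behave identically.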
git commit --no-verify skips all hooks including this one. Use sparingly;
there's always a paper trail (the commit exists) but no block.
```shell
# Scan the currently staged diff without committing:
git-privacy-filter scan

# Scan a range of commits (what a pre-push would see):
# (no CLI flag yet — invoke the pre-push code path directly):
echo "refs/heads/feature $(git rev-parse HEAD) refs/heads/feature $(git rev-parse origin/main)" \
  | git-privacy-filter prepush
```

Clone + sync:
```shell
git clone https://github.com/<owner>/git-privacy-filter
cd git-privacy-filter
uv sync --group test --group dev
```

Fast unit tests (the default — mocked opf, milliseconds):

```shell
uv run pytest
```

Tier-2 integration tests (tmp git repos, still mocked opf):

```shell
uv run pytest -m integration
```

Tier-3 real-model tests (loads the actual ~3 GB model — slow, requires the model to be cached and opf + torch installed):

```shell
uv run pytest -m real_model
```

On a machine without a GPU, set `GIT_PRIVACY_FILTER_TEST_DEVICE=cpu` (the
default) so torch doesn't try to initialize CUDA.
Lint + format:

```shell
uv run ruff check
uv run ruff format
```

For `git commit`:

- `git diff --cached -U0 --diff-filter=ACMR` → the lines this commit would add (context and deletions are ignored).
- Group additions by path, drop allowlisted paths, join per-file additions with `\n`, feed each file's blob to `opf.OPF.redact(text)` once.
- Map each returned span's character offset back to `(file, line, col)` via the per-file line-start table we built during the join.
- Drop findings whose category is disabled in config, or whose line has a matching inline allow marker.
- Print a `path:line:col [category]` report and exit 1 if anything is left.
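The offset-to-position step above reduces to a line-start table plus a binary search; a minimal sketch (names are illustrative, not the tool's internals):

```python
import bisect

def line_start_table(text: str) -> list[int]:
    """Character offset at which each line of `text` begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def offset_to_line_col(starts: list[int], offset: int) -> tuple[int, int]:
    """Map a flat character offset into 1-based (line, col)."""
    line = bisect.bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line] + 1
```

Building the table once per file during the join makes each span lookup O(log n) instead of rescanning the blob per finding.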
For git push, step 1 is git log -p --reverse -U0 <remote_sha>..<local_sha>
per ref being pushed, and everything else is the same. A brand-new branch
(zero remote_sha) is scanned as "everything reachable from local_sha";
a branch deletion (zero local_sha) is a no-op.
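The ref handling above can be sketched as a pure parser over the stdin lines git feeds a pre-push hook (`<local_ref> <local_sha> <remote_ref> <remote_sha>` per ref); this is an assumed implementation of the described behavior, not the tool's code:

```python
ZERO = "0" * 40  # all-zero sha: new branch (remote side) or deletion (local side)

def plan_ranges(stdin_lines: list[str]) -> list[str]:
    """Turn pre-push stdin lines into git rev ranges to scan."""
    ranges = []
    for line in stdin_lines:
        local_ref, local_sha, remote_ref, remote_sha = line.split()
        if local_sha == ZERO:
            continue                                # branch deletion: no-op
        if remote_sha == ZERO:
            ranges.append(local_sha)                # new branch: all of local_sha
        else:
            ranges.append(f"{remote_sha}..{local_sha}")
    return ranges
```

Each returned range is then fed to the `git log -p --reverse -U0` step unchanged.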
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option. This is the same dual-licensing convention used by
rust-lang/rust, lightningdevkit/*, and most of the Rust ecosystem —
it lets consumers pick the license that matches their own project's
licensing obligations.
Upstream opf (code and model weights) is Apache-2.0; pulling it in as
a runtime dependency doesn't change the licensing of our own code, but
downstream redistributors of a combined work obviously inherit opf's
Apache-2.0 requirements.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.