git-privacy-filter

A git pre-commit and pre-push hook that scans what you're about to commit or push with OpenAI's privacy-filter (opf) model and blocks the operation when it detects any of:

| category        | covers                                   |
|-----------------|------------------------------------------|
| account_number  | bank / card numbers                      |
| private_address | street addresses                         |
| private_date    | birthdates, etc.                         |
| private_email   | email addresses                          |
| private_person  | names                                    |
| private_phone   | phone numbers                            |
| private_url     | URLs treated as private (e.g. webhooks)  |
| secret          | API keys, tokens, passwords, credentials |

Unlike regex-based tools (detect-secrets, gitleaks, ggshield), this uses opf's learned 1.5B-parameter sparse-MoE classifier, so it catches novel credential formats and PII that regex rules don't know about. The cost: a ~3 GB first-run model download and a few seconds of inference per commit.

Install

You need Python 3.11+ and the opf package available at hook-invocation time. git-privacy-filter declares opf as a dependency, so installing this tool pulls opf with it.

Pick one of:

# uv users (recommended)
uv tool install git+https://github.com/<owner>/git-privacy-filter

# pipx users
pipx install git+https://github.com/<owner>/git-privacy-filter

# pre-commit framework users — see "Via the pre-commit framework" below

Then preflight:

git-privacy-filter doctor

doctor reports whether opf is importable and whether the model weights are on disk. The first run triggers a ~3 GB download from HuggingFace into ~/.opf/privacy_filter/; doctor --load-model additionally loads the model to warm the torch caches.
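The two preflight checks doctor performs can be sketched in a few lines of Python. This is illustrative only: the function name, return shape, and weights path below are assumptions, not the tool's real internals.

```python
from importlib.util import find_spec
from pathlib import Path


def doctor_report(weights_dir: str = "~/.opf/privacy_filter") -> dict:
    """Sketch of the two preflight checks: is opf importable, and are
    the model weights already cached on disk?
    (Hypothetical helper -- not git-privacy-filter's actual code.)"""
    return {
        # find_spec returns None when the package can't be found,
        # without actually importing (and paying for) it.
        "opf_importable": find_spec("opf") is not None,
        "weights_cached": Path(weights_dir).expanduser().is_dir(),
    }


print(doctor_report())
```

Probing with find_spec rather than a bare import keeps the check cheap: it never triggers opf's own (heavy) import-time work.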

Install size and CPU-only default

git-privacy-filter pins torch to the CPU-only PyTorch wheels (torch==X.Y.Z+cpu), routed through https://download.pytorch.org/whl/cpu via [tool.uv.sources] in our pyproject.toml. This cuts ~3.7 GB of transitive NVIDIA CUDA libraries (nvidia-cudnn, nvidia-cublas, nvidia-cusolver, cuda-toolkit, triton, …) that the default PyPI torch wheel drags in on Linux.

Rough footprint with the CPU-only default:

| component         | size    |
|-------------------|---------|
| uv tool venv      | ~900 MB |
| opf model weights | ~2.8 GB |
| total             | ~3.7 GB |

Versus roughly 7.5 GB if you install with the default CUDA-enabled torch.

GPU users: at runtime the tool auto-detects via torch.cuda.is_available() (PyTorch exposes both NVIDIA CUDA and AMD ROCm devices under the same cuda namespace), so once the right torch wheel is in the environment, no code changes are needed. If auto-detection picks the wrong device, the GIT_PRIVACY_FILTER_DEVICE environment variable is the escape hatch.
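The resolution order just described (env-var override first, then auto-detect, then CPU) can be sketched as follows. pick_device is a hypothetical name, and the CUDA probe result is passed in explicitly so the sketch runs without torch installed:

```python
import os


def pick_device(cuda_available: bool) -> str:
    """Resolve the inference device: GIT_PRIVACY_FILTER_DEVICE wins,
    otherwise use "cuda" when torch reports an accelerator (NVIDIA
    CUDA or AMD ROCm), else fall back to "cpu".
    Illustrative sketch only, not the project's actual API."""
    override = os.environ.get("GIT_PRIVACY_FILTER_DEVICE")
    if override:
        return override
    return "cuda" if cuda_available else "cpu"


# At runtime the boolean would come from torch.cuda.is_available().
print(pick_device(cuda_available=False))  # cpu (unless the env var is set)
```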

NVIDIA / CUDA

uv tool install \
    --index https://download.pytorch.org/whl/cu121 \
    git+https://github.com/<owner>/git-privacy-filter

AMD / ROCm

Short version: host side first, then the Python side.

Host-side (Fedora 40+, no third-party repos):

sudo dnf install rocminfo rocm-hip rocm-smi rocm-clinfo rocm-opencl
sudo usermod -aG render,video "$USER"   # log out / back in for groups
rocminfo | grep -A1 "Marketing Name"    # confirm your GPU shows up
rpm -q rocm-hip --qf '%{VERSION}\n'     # note the major.minor (e.g. 6.2)

On Ubuntu/Debian: sudo apt install rocm reaches roughly the same state (AMD's own apt repo is the fallback when distros don't package ROCm natively).

Python-side — dedicated venv (recommended for ROCm):

uv tool install on this project doesn't work cleanly with the rocm6.2 torch wheels today: those wheels only cover cp39-cp312, and requires-python = ">=3.11" forces uv's lockfile to also resolve on cp313, which has no rocm6.2 torch wheel. Until the rocm index catches up, use a dedicated venv for the ROCm setup instead:

cd /path/to/git-privacy-filter

# Pin to Python 3.12 (rocm6.2 torch wheels don't exist for 3.13).
uv venv --python 3.12 .venv-rocm
source .venv-rocm/bin/activate

# Install torch + its ROCm-only transitive dependencies straight from the rocm index.
# Match the rocm6.x minor to whatever `rpm -q rocm-hip` printed.
uv pip install \
    --extra-index-url https://download.pytorch.org/whl/rocm6.2 \
    --index-strategy unsafe-best-match \
    'torch>=2.5' pytorch-triton-rocm

# Install opf + git-privacy-filter itself (torch is already satisfied).
uv pip install \
    --extra-index-url https://download.pytorch.org/whl/rocm6.2 \
    --index-strategy unsafe-best-match \
    -e .

# Verify
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
# Expect: 2.5.1+rocm6.2  True   (PyTorch exposes HIP devices under "cuda")

To use it globally, symlink the venv's binary onto your PATH (or source the venv in your shell init):

ln -sf "$PWD/.venv-rocm/bin/git-privacy-filter" ~/.local/bin/git-privacy-filter

The hook scripts call plain git-privacy-filter, so whichever copy is first on $PATH wins; the ROCm-venv symlink above therefore takes precedence over any previously uv tool install-ed CPU version.

The commented pytorch-rocm blocks in pyproject.toml are kept as a reference for when the rocm6.2 index catches up to cp313; at that point the uv tool install --reinstall --python 3.12 flow becomes the cleaner path.

Troubleshooting: if torch.cuda.is_available() prints False, check group membership (id -nG should include render), /dev/kfd permissions, and dmesg | grep amdgpu. If you see HIP error: no kernel image is available, your GPU's gfx target isn't in the wheel's compiled set; set HSA_OVERRIDE_GFX_VERSION=11.0.0 (RDNA3) or 10.3.0 (RDNA2).

Note for PyPI consumers: [tool.uv.sources] is a uv-only directive; pip install git-privacy-filter (once we publish) will still pull the default CUDA-enabled torch. If you're a pip user on a CPU-only machine and want the small install, pre-install the CPU wheel:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install git-privacy-filter

Via the pre-commit framework

Add to your repo's .pre-commit-config.yaml:

repos:
  - repo: https://github.com/<owner>/git-privacy-filter
    rev: v0.1.0
    hooks:
      - id: git-privacy-filter         # runs on every commit
      - id: git-privacy-filter-push    # runs on every push

Then pre-commit install --hook-type pre-commit --hook-type pre-push.

Standalone (without the pre-commit framework)

From inside any git repo:

git-privacy-filter install

This writes small shell wrappers into the repo's hooks directory (respecting core.hooksPath) that exec git-privacy-filter precommit / prepush. A magic comment on the second line lets git-privacy-filter uninstall safely remove our hooks while leaving any other hooks you have in place untouched.
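The uninstall-safety check can be sketched like this. The marker text below is made up for illustration; the real magic comment may differ:

```python
MAGIC = "# managed by git-privacy-filter"  # hypothetical marker text


def is_our_hook(script: str) -> bool:
    """True only when the second line of a hook script carries the
    magic comment, so `uninstall` never touches hooks we didn't write.
    (Illustrative sketch, not the tool's real code.)"""
    lines = script.splitlines()
    return len(lines) >= 2 and lines[1].strip() == MAGIC


ours = "#!/bin/sh\n# managed by git-privacy-filter\nexec git-privacy-filter precommit\n"
theirs = "#!/bin/sh\nmake lint\n"
print(is_our_hook(ours), is_our_hook(theirs))  # True False
```

Keying on an exact marker line (rather than, say, grepping for the tool's name anywhere) means a user's hand-written hook that merely mentions git-privacy-filter is never mistaken for one of ours.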

Configure

Drop a .git-privacy-filter.toml at your repo root:

[allowlist]
# Skip these paths entirely — the scanner never even runs on them.
# Useful for test fixtures that intentionally contain credential-like
# patterns, lockfiles, vendored third-party code, etc.
paths = [
    "tests/fixtures/**",
    "**/*.lock",
    "vendor/**",
]

[categories]
# Optional. Omit both keys to enable all 8 categories.
# Use EITHER `enabled` OR `disabled`, not both.
enabled  = ["secret", "private_email"]   # only these two block
# disabled = ["private_person"]          # or: block everything except these

Malformed config is a hard error, not a silent "revert to defaults": a mistyped allowlist_paths that disabled every filter would be the worst kind of bug.
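A sketch of those fail-hard semantics (function name hypothetical): unknown keys, and enabled and disabled given together, raise instead of silently defaulting:

```python
ALL_CATEGORIES = {
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
}


def resolve_categories(section: dict) -> set:
    """Turn a parsed [categories] table into the set of blocking
    categories, raising on anything malformed rather than falling back
    to defaults. (Illustrative sketch, not the tool's real code.)"""
    unknown = set(section) - {"enabled", "disabled"}
    if unknown:
        raise ValueError(f"unknown [categories] keys: {sorted(unknown)}")
    if "enabled" in section and "disabled" in section:
        raise ValueError("use either `enabled` or `disabled`, not both")
    if "enabled" in section:
        bad = set(section["enabled"]) - ALL_CATEGORIES
        if bad:
            raise ValueError(f"unknown categories: {sorted(bad)}")
        return set(section["enabled"])
    if "disabled" in section:
        bad = set(section["disabled"]) - ALL_CATEGORIES
        if bad:
            raise ValueError(f"unknown categories: {sorted(bad)}")
        return ALL_CATEGORIES - set(section["disabled"])
    return ALL_CATEGORIES  # both keys omitted: everything blocks
```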

Inline allow markers

For one-off exceptions, annotate the offending line:

# Bare form — allows all categories on this line:
AWS_FAKE_KEY = "AKIA..."  # git-privacy-filter: allow

# Category-scoped — allows only `secret` here; private_email on the same
# line would still block:
USER_AND_TOKEN = ("alice@example.com", "ghp_...")  # git-privacy-filter: allow secret

# For a line that can't carry a trailing comment (e.g. a generated blob),
# use the previous line:
# git-privacy-filter: allow-next-line
1600_Pennsylvania_Avenue_NW_Washington_DC_20500

Any comment prefix works (#, //, --, ;, /*): we match on the git-privacy-filter: token, not on the comment syntax.
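Marker matching can be sketched with a single regex keyed on the token rather than the comment prefix. The pattern and function name below are illustrative, not the tool's real implementation:

```python
import re

# Key on the `git-privacy-filter:` token itself, so any comment syntax
# (#, //, --, ;, /*) in front of it works. Illustrative pattern only.
MARKER = re.compile(r"git-privacy-filter:\s*allow(?:-next-line)?\b((?:\s+\w+)*)")


def allow_marker(line: str):
    """None if the line has no marker; an empty set for the bare form
    (all categories allowed); a set of names for the scoped form."""
    m = MARKER.search(line)
    return set(m.group(1).split()) if m else None


print(allow_marker('key = "AKIA..."  # git-privacy-filter: allow'))        # set()
print(allow_marker('pair = (...)  // git-privacy-filter: allow secret'))   # {'secret'}
```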

Emergency bypass

git commit --no-verify skips all hooks, including this one. Use it sparingly: the commit itself leaves a paper trail, but nothing gets blocked.

Manual scan

# Scan the currently staged diff without committing:
git-privacy-filter scan

# Scan a range of commits (what a pre-push would see):
# (no CLI flag yet; invoke the pre-push code path directly):
echo "refs/heads/feature $(git rev-parse HEAD) refs/heads/feature $(git rev-parse origin/main)" \
    | git-privacy-filter prepush
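Each stdin line in that format is <local_ref> <local_sha> <remote_ref> <remote_sha>, which is what git itself writes to a pre-push hook. The per-ref decision logic can be sketched as follows (plan_refs is a hypothetical name):

```python
ZERO_SHA = "0" * 40  # git's all-zeros sha marks a missing ref


def plan_refs(stdin_text: str) -> list:
    """Parse pre-push stdin lines and decide what to scan per ref:
    deletions are skipped, brand-new branches scan everything reachable
    from the local sha, and updates scan the remote..local range.
    (Illustrative sketch, not the tool's real code.)"""
    plans = []
    for line in stdin_text.strip().splitlines():
        local_ref, local_sha, remote_ref, remote_sha = line.split()
        if local_sha == ZERO_SHA:      # branch deletion: nothing to scan
            continue
        if remote_sha == ZERO_SHA:     # new branch: scan all of local_sha
            plans.append((local_ref, local_sha))
        else:                          # update: scan remote_sha..local_sha
            plans.append((local_ref, f"{remote_sha}..{local_sha}"))
    return plans
```

Each (ref, range) pair would then feed the git log -p invocation described under "How it works".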

Development

Clone + sync:

git clone https://github.com/<owner>/git-privacy-filter
cd git-privacy-filter
uv sync --group test --group dev

Fast unit tests (the default — mocked opf, milliseconds):

uv run pytest

Tier-2 integration tests (tmp git repos, still mocked opf):

uv run pytest -m integration

Tier-3 real-model tests (loads the actual ~3 GB model — slow, requires the model to be cached, requires opf + torch installed):

uv run pytest -m real_model

On a machine without a GPU, set GIT_PRIVACY_FILTER_TEST_DEVICE=cpu (the default) so torch doesn't try to initialize CUDA.

Lint + format:

uv run ruff check
uv run ruff format

How it works, briefly

For git commit:

  1. git diff --cached -U0 --diff-filter=ACMR → the lines this commit would add (context and deletions are ignored).
  2. Group additions by path, drop allowlisted paths, join per-file additions with \n, feed each file's blob to opf.OPF.redact(text) once.
  3. Map each returned span's character offset back to (file, line, col) via the per-file line-start table we built during the join.
  4. Drop findings whose category is disabled in config, or whose line has a matching inline allow marker.
  5. Print a path:line:col [category] report and exit 1 if anything is left.

For git push, step 1 is git log -p --reverse -U0 <remote_sha>..<local_sha> per ref being pushed, and everything else is the same. A brand-new branch (zero remote_sha) is scanned as "everything reachable from local_sha"; a branch deletion (zero local_sha) is a no-op.
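Step 3's offset mapping is a classic line-start table plus binary search. A minimal sketch, with illustrative helper names:

```python
import bisect


def line_starts(blob: str) -> list:
    """Character offset at which each line of `blob` begins."""
    starts = [0]
    for i, ch in enumerate(blob):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def locate(offset: int, starts: list) -> tuple:
    """Map a flat character offset in the joined per-file blob back to
    a 1-based (line, col) pair via binary search over the table.
    (Sketch of step 3 above; not the tool's real code.)"""
    idx = bisect.bisect_right(starts, offset) - 1
    return idx + 1, offset - starts[idx] + 1


blob = "clean line\ntoken = ghp_...\n"
starts = line_starts(blob)
print(locate(blob.index("ghp_"), starts))  # (2, 9)
```

Building the table once per file makes each of opf's returned spans an O(log n) lookup instead of a rescan of the blob.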

License

Licensed under either of

  * Apache License, Version 2.0 (LICENSE-APACHE)
  * MIT license (LICENSE-MIT)

at your option. This is the same dual-licensing convention used by rust-lang/rust, lightningdevkit/*, and most of the Rust ecosystem; it lets consumers pick the license that matches their own project's licensing obligations.

Upstream opf (code and model weights) is Apache-2.0; pulling it in as a runtime dependency doesn't change the licensing of our own code, but downstream redistributors of a combined work obviously inherit opf's Apache-2.0 requirements.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.
