
wfps60412/Preupload-Guard

PreUpload Guard

Local-only sensitive-data scanner and sanitizer for files, ZIP/Office archives, and folders.

Before publishing, replace OWNER/REPOSITORY in the badge URL, package metadata, and GitHub templates.

What it does

  • Scans a file, a folder, ZIP archives, and ZIP-based Office formats such as DOCX/XLSX/PPTX.

  • Detects common keys, tokens, private-key files, passwords, credentialed connection strings, payment data, personal data, device/network identifiers, local paths, diagnostics, and selected proprietary-data indicators.

  • Shows nearby source lines locally in the GUI and highlights the matching span.

  • Lets you select all, select none, or act on an individual category or finding.

  • Creates a sanitized copy by masking text, excluding files, or removing archive metadata.

  • Creates a full timestamped backup before any overwrite operation.

  • Provides a headless CLI with exit codes for CI or pre-upload gates.

  • Supports Traditional Chinese and English UI switching.

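Because DOCX, XLSX, and PPTX files are ZIP containers, one archive walker can cover all of them. The following is a minimal sketch of that idea using only the standard library. It is not the tool's actual implementation: `TOKEN_PATTERN`, `scan_zip_bytes`, and `build_demo_archive` are hypothetical names, and the pattern is an illustrative stand-in for the shipped rules.

```python
import io
import re
import zipfile

# Hypothetical pattern for illustration only; not one of the tool's shipped rules.
TOKEN_PATTERN = re.compile(r"(?i)\b(?:api[_-]?key|token)\s*[:=]\s*[A-Za-z0-9_\-]{16,}")


def scan_zip_bytes(data: bytes) -> list[tuple[str, int]]:
    """Return (member name, line number) pairs for lines matching TOKEN_PATTERN."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently so binary members do not abort the scan.
            text = archive.read(info).decode("utf-8", errors="replace")
            for line_number, line in enumerate(text.splitlines(), start=1):
                if TOKEN_PATTERN.search(line):
                    findings.append((info.filename, line_number))
    return findings


def build_demo_archive() -> bytes:
    """Build a small in-memory ZIP so the sketch can run without touching disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.txt", "nothing to see here\n")
        archive.writestr("config/settings.ini", "[auth]\napi_key = ABCDEFGHIJKLMNOP1234\n")
    return buffer.getvalue()
```

The sketch records only the member name and line number, never the matched value, which mirrors the reporting posture described in this README.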

Privacy and security posture

  • No network calls. The scanner and sanitizer use Python’s standard library only. They do not upload source files, telemetry, or reports.

  • No automatic dependency installation. The launchers do not create environments or run pip.

  • No raw matched text in JSON reports. Reports omit matched values, source snippets, absolute target paths, and scan timestamps.

  • Save As is recommended. Scanning never changes source files; an overwrite always writes a full timestamped backup first.

  • Heuristic only. A clean result is not proof that a file is safe to share.

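To make the "no raw matched text" rule concrete, here is a minimal sketch of a report serializer that drops the matched value before writing JSON. The `Finding` class, its field names, and the report layout are assumptions made for illustration; the tool's real report schema is not documented in this README.

```python
import json
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical in-memory finding; field names are illustrative assumptions."""
    category: str
    severity: str
    relative_path: str
    line: int
    matched_text: str  # kept in memory for local display only, never serialized


def to_report(findings: list[Finding]) -> str:
    """Serialize findings without matched values, snippets, absolute paths, or timestamps."""
    entries = [
        {
            "category": f.category,
            "severity": f.severity,
            "path": f.relative_path,
            "line": f.line,
        }
        for f in findings
    ]
    return json.dumps({"findings": entries, "total": len(entries)}, indent=2, sort_keys=True)
```

Building the output from an explicit allow-list of fields, rather than deleting sensitive keys from a full dump, means a newly added field stays out of the report unless someone deliberately includes it.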

Requirements

  • Python 3.10 or newer.

  • Tkinter is required only for the GUI. The CLI and tests can run without Tkinter.

  • No required third-party Python packages.

Quick start

GUI

macOS:

chmod +x run_mac.command
./run_mac.command

Windows:

Double-click run_windows.bat

The optional tkinterdnd2 package enables native drag-and-drop on many systems. It is not required; file/folder chooser buttons always work. Install it only if you explicitly choose to do so:

python -m pip install -r requirements-optional.txt

CLI

# Scan only; never modifies the source.
python -m preupload_guard --scan ./release.zip --report ./scan-report.json

# Fail a CI step when High or Critical findings exist.
python -m preupload_guard --scan ./release.zip --fail-on high --strict-unscannable

# Create a separate sanitized copy. The original is untouched.
python -m preupload_guard --scan ./project --sanitize ./project_sanitized

# Run tests without loading the GUI.
python -m preupload_guard --self-test

For complete CLI details, see docs/CLI.md.

Exit codes

Code  Meaning
0     Passed the selected policy
10    A finding met the --fail-on threshold
11    --strict-unscannable found unresolved manual-review content
12    Sanitization or overwrite failed
13    Invalid arguments or invalid rule pack
14    Runtime or test failure
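A CI step can wrap the CLI and turn these exit codes into readable messages. The sketch below assumes only the flags and codes listed in this README; `EXIT_MEANINGS`, `describe_exit`, and `run_gate` are hypothetical helper names, not part of the tool.

```python
import subprocess
import sys

# Meanings copied from the exit-code table in this README.
EXIT_MEANINGS = {
    0: "Passed the selected policy",
    10: "A finding met the --fail-on threshold",
    11: "--strict-unscannable found unresolved manual-review content",
    12: "Sanitization or overwrite failed",
    13: "Invalid arguments or invalid rule pack",
    14: "Runtime or test failure",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning, with a fallback for unknown codes."""
    return EXIT_MEANINGS.get(code, f"Unrecognized exit code {code}")


def run_gate(target: str) -> int:
    """Run the scanner as a pre-upload gate and return its exit code unchanged."""
    command = [
        sys.executable, "-m", "preupload_guard",
        "--scan", target,
        "--fail-on", "high",
        "--strict-unscannable",
    ]
    result = subprocess.run(command, check=False)
    print(describe_exit(result.returncode))
    return result.returncode
```

A pipeline script could end with `sys.exit(run_gate("./release.zip"))` so that any non-zero code fails the step while the log still explains why.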

Rule packs

The public release intentionally ships with generic rules only. You can add an organization- or project-specific JSON rule pack locally without putting private rules or secrets into the public repository.

python -m preupload_guard --scan ./candidate.zip --rules ./my-local-rules.json --fail-on high

See rules/README.md and rules/example-rules.json.

Limitations

  • PDF, images, audio, video, encrypted archives, nested archives, unknown binary files, and oversized files are not treated as safe merely because text scanning found nothing.

  • With --strict-unscannable, those files block the command until you exclude or manually review them.

  • Rotate any credential that has already been exposed. This tool is not a substitute for rotation, and a sanitized copy does not undo an existing leak.

  • Sanitized copies may not compile or run. Their purpose is safe review or sharing, not production deployment.

  • Documentation placeholders such as /Users/<USER> are not reported as real local home paths; a concrete account path is still reported.

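The placeholder rule in the last limitation can be illustrated with a short sketch. It shows one plausible way to tell a documentation placeholder from a concrete home path. The two patterns and the `concrete_home_paths` name are assumptions for illustration, and the tool's real heuristics may differ.

```python
import re

# Hypothetical patterns: a macOS-style home path, and account-name placeholders
# such as <USER>, $USER, or ${USER}.
HOME_PATH = re.compile(r"/Users/([^/\s]+)")
PLACEHOLDER = re.compile(r"<[^>]+>|\$\{?\w+\}?")


def concrete_home_paths(text: str) -> list[str]:
    """Return home-path prefixes whose account segment is not a placeholder."""
    return [
        match.group(0)
        for match in HOME_PATH.finditer(text)
        if not PLACEHOLDER.fullmatch(match.group(1))
    ]
```

For example, `concrete_home_paths("/Users/<USER>/notes and /Users/alice/notes")` returns only the `/Users/alice` prefix, because the first path's account segment is a placeholder.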

Development

python -m unittest discover -v
python -m compileall preupload_guard
python -m preupload_guard --self-test

See CONTRIBUTING.md, SECURITY.md, and docs/RELEASE_CHECKLIST.md.

License

MIT License. You may use, modify, distribute, sublicense, and sell copies, subject to retaining the copyright and license notice. See LICENSE.


About

A local-first, offline pre-upload scanner that detects and redacts potential secrets, credentials, personal data, sensitive files, and metadata before sharing files with AI, repositories, or third parties.
