Local-only sensitive-data scanner and sanitizer for files, ZIP/Office archives, and folders.
完全在本機執行的敏感資料檢查與去識別化工具,可掃描檔案、ZIP/Office 壓縮文件與資料夾。
Before publishing, replace `OWNER/REPOSITORY` in the badge URL, package metadata, and GitHub templates with the actual repository location.
發布前請將 Badge URL、套件 metadata 與 GitHub 範本中的 `OWNER/REPOSITORY` 改成實際儲存庫位置。
- Scans a file, a folder, ZIP archives, and ZIP-based Office formats such as DOCX/XLSX/PPTX.
- Detects common keys, tokens, private-key files, passwords, credentialed connection strings, payment data, personal data, device/network identifiers, local paths, diagnostics, and selected proprietary-data indicators.
- Shows nearby source lines locally in the GUI and highlights the matching span.
- Lets you select all, select none, or act on an individual category or finding.
- Creates a sanitized copy by masking text, excluding files, or removing archive metadata.
- Creates a full timestamped backup before any overwrite operation.
- Provides a headless CLI with exit codes for CI or pre-upload gates.
- Supports Traditional Chinese and English UI switching.
- 可掃描單一檔案、資料夾、ZIP,以及 DOCX/XLSX/PPTX 等 ZIP 結構文件。
- 可偵測常見金鑰/Token、私鑰檔、密碼、含帳密連線字串、支付資料、個資、裝置/網路識別、本機路徑、診斷資料與部分專有資料跡象。
- GUI 僅在本機顯示命中行前後原始碼,並醒目標示命中範圍。
- 支援全部選取、全部取消、依類別選取,以及逐項調整處理方式。
- 可建立遮罩、排除檔案、移除壓縮檔中繼資料後的安全副本。
- 覆寫前會先建立完整時間戳備份。
- 具備無頭 CLI 與結束碼,可作為 CI 或上傳前 Gate。
- GUI 可切換繁體中文與 English。
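
As a rough illustration of the archive scanning described above, the sketch below reads ZIP members in memory with the standard library and reports which lines match a pattern. It is not the tool's implementation: `PASSWORD_PATTERN`, `scan_zip_bytes`, and `build_demo_zip` are hypothetical names, and the single regex stands in for a real rule set. Only the member name and line number are recorded, not the matched text.

```python
import io
import re
import zipfile

# Hypothetical stand-in rule: a password-style assignment. Not the tool's rule set.
PASSWORD_PATTERN = re.compile(r"(?i)password\s*=\s*\S+")


def scan_zip_bytes(data: bytes) -> list[tuple[str, int]]:
    """Return (member name, 1-based line number) for every matching line."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently so undecodable bytes do not abort the scan.
            text = archive.read(info).decode("utf-8", errors="replace")
            for line_no, line in enumerate(text.splitlines(), start=1):
                if PASSWORD_PATTERN.search(line):
                    findings.append((info.filename, line_no))
    return findings


def build_demo_zip() -> bytes:
    """Build a small in-memory archive with one obviously fake secret."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("config/settings.env", "DEBUG=1\nDB_PASSWORD=example-only\n")
        archive.writestr("README.txt", "nothing sensitive here\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(scan_zip_bytes(build_demo_zip()))
```

Because DOCX/XLSX/PPTX files are ZIP containers, the same loop would also walk their XML parts. A real scanner would need additional handling for binary members, size limits, and nested archives.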
- No network calls. The scanner and sanitizer use Python’s standard library only. They do not upload source files, telemetry, or reports.
- No automatic dependency installation. The launchers do not create environments or run `pip`.
- No raw matched text in JSON reports. Reports omit matched values, source snippets, absolute target paths, and scan timestamps.
- Save As is recommended. Scanning never changes source files; overwrite first writes a timestamped backup.
- Heuristic only. A clean result is not proof that a file is safe to share.
- 不進行網路連線。 核心掃描與去識別化僅使用 Python 標準庫,不會上傳原始檔、遙測或報告。
- 不自動安裝相依套件。 啟動器不會建立環境,也不會執行 `pip`。
- JSON 報告不含命中原文。 報告不保存命中值、原始碼片段、絕對目標路徑或掃描時間。
- 建議使用另存新檔。 掃描不會修改來源檔;覆寫前一定先備份。
- 它是啟發式安全網。 沒有發現不等於可保證安全。
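
The backup-before-overwrite guarantee can be pictured with a minimal standard-library sketch. This is an assumption about the general technique, not the tool's actual code: the function name `backup_then_overwrite` and the `.bak` naming scheme are hypothetical, and it handles a single file rather than a whole folder or archive.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_then_overwrite(target: Path, new_text: str) -> Path:
    """Copy `target` to a timestamped sibling file, then overwrite it."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = target.with_name(f"{target.name}.{stamp}.bak")  # hypothetical naming
    shutil.copy2(target, backup)  # copy2 also preserves file metadata where possible
    target.write_text(new_text, encoding="utf-8")
    return backup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp) / "notes.txt"
        notes.write_text("token=example-only\n", encoding="utf-8")
        saved = backup_then_overwrite(notes, "token=[MASKED]\n")
        print(saved.name, "holds the original;", notes.name, "holds the masked text")
```

The backup is written before the target is touched, so a failure during the overwrite still leaves the original content recoverable.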
- Python 3.10 or newer.
- Tkinter is required only for the GUI. The CLI and tests can run without Tkinter.
- No required third-party Python packages.
- Python 3.10 以上。
- GUI 才需要 Tkinter;CLI 與測試不需要 Tkinter。
- 沒有必要的第三方 Python 相依套件。
macOS:

```bash
chmod +x run_mac.command
./run_mac.command
```

Windows:

```text
Double-click run_windows.bat
```
The optional `tkinterdnd2` package enables native drag-and-drop on many systems. It is not required; file/folder chooser buttons always work. Install it only if you explicitly choose to do so:

```bash
python -m pip install -r requirements-optional.txt
```

`tkinterdnd2` 可在許多系統啟用原生拖放,但不是必要條件;檔案/資料夾選擇按鈕始終可用。只有你明確選擇時才自行安裝:

```bash
python -m pip install -r requirements-optional.txt
```

```bash
# Scan only; never modifies the source.
python -m preupload_guard --scan ./release.zip --report ./scan-report.json

# Fail a CI step when High or Critical findings exist.
python -m preupload_guard --scan ./release.zip --fail-on high --strict-unscannable

# Create a separate sanitized copy. The original is untouched.
python -m preupload_guard --scan ./project --sanitize ./project_sanitized

# Run tests without loading the GUI.
python -m preupload_guard --self-test
```

完整 CLI 說明請見 docs/CLI.md。
For complete CLI details, see docs/CLI.md.
| Code | Meaning / 意義 |
|---|---|
| `0` | Passed the selected policy / 通過指定 Gate |
| `10` | A finding met `--fail-on` / 有命中項達到 `--fail-on` 門檻 |
| `11` | `--strict-unscannable` found unresolved manual-review content / 嚴格模式發現未處理的人工審核檔案 |
| `12` | Sanitization or overwrite failed / 另存或覆寫失敗 |
| `13` | Invalid arguments or invalid rule pack / 參數或規則包無效 |
| `14` | Runtime or test failure / 執行或測試失敗 |
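
A CI wrapper can turn these exit codes into readable messages. The sketch below is one possible shape, not part of the project: `EXIT_MEANINGS`, `describe_exit`, and `run_gate` are hypothetical names, while the command-line flags and code meanings are taken from the examples and table above.

```python
import subprocess
import sys

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "Passed the selected policy",
    10: "A finding met --fail-on",
    11: "--strict-unscannable found unresolved manual-review content",
    12: "Sanitization or overwrite failed",
    13: "Invalid arguments or invalid rule pack",
    14: "Runtime or test failure",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_gate(target: str) -> int:
    """Run the scanner as a pre-upload gate and report the outcome."""
    command = [
        sys.executable, "-m", "preupload_guard",
        "--scan", target,
        "--fail-on", "high",
        "--strict-unscannable",
    ]
    # check=False: a non-zero code is an expected gate result, not an exception.
    result = subprocess.run(command, check=False)
    print(describe_exit(result.returncode))
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_gate(sys.argv[1] if len(sys.argv) > 1 else "./release.zip"))
```

Returning the scanner's own exit code keeps the CI step's pass/fail status identical to running the command directly.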
The public release intentionally ships with generic rules only. You can add an organization- or project-specific JSON rule pack locally without putting private rules or secrets into the public repository.
公開版刻意只附帶通用規則。你可在本機額外載入組織或專案規則包,不必把私有規則或機密放進公開儲存庫。
```bash
python -m preupload_guard --scan ./candidate.zip --rules ./my-local-rules.json --fail-on high
```

See rules/README.md and rules/example-rules.json.
- PDF, images, audio, video, encrypted archives, nested archives, unknown binary files, and oversized files are not treated as safe merely because text scanning found nothing.
- With `--strict-unscannable`, those files block the command until you exclude or manually review them.
- Do not use this tool as a substitute for rotating a credential that has already been exposed; a sanitized copy does not undo an existing leak.
- Sanitized copies may not compile or run. Their purpose is safe review or sharing, not production deployment.
- Document placeholders such as `/Users/<USER>` are not treated as a real local home path; a concrete account path is still reported.
- PDF、圖片、音訊、影片、加密壓縮檔、巢狀壓縮檔、未知二進位檔與過大檔案,不會因為沒有文字命中就被視為安全。
- 使用 `--strict-unscannable` 時,這些檔案會阻擋流程,直到被排除或人工確認。
- 已外流的憑證仍應立即輪換;遮罩副本無法讓既有外洩重新安全。
- 遮罩副本可能無法編譯或執行;其用途是安全稽核與分享,不是正式部署。
- `/Users/<USER>` 等文件範例不會被當成真實本機家目錄;具體帳號路徑仍會被回報。
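
The placeholder rule in the last item can be illustrated with a small regex sketch. This is only an assumption about how such a distinction could be drawn, not the tool's actual rule: `HOME_PATH`, `PLACEHOLDER`, and `concrete_home_paths` are hypothetical names, and only macOS-style `/Users/...` paths are considered.

```python
import re

# Hypothetical patterns for illustration only.
HOME_PATH = re.compile(r"/Users/(?P<user>[^/\s]+)")
PLACEHOLDER = re.compile(r"^<[A-Za-z_]+>$")  # e.g. <USER>


def concrete_home_paths(text: str) -> list[str]:
    """Return home-directory prefixes that name a concrete account."""
    return [
        match.group(0)
        for match in HOME_PATH.finditer(text)
        if not PLACEHOLDER.match(match.group("user"))
    ]


if __name__ == "__main__":
    sample = "See /Users/<USER>/docs, but the log mentions /Users/alice/project."
    print(concrete_home_paths(sample))
```

Here the documentation-style `/Users/<USER>` is skipped, while `/Users/alice` is returned as a finding.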
```bash
python -m unittest discover -v
python -m compileall preupload_guard
python -m preupload_guard --self-test
```

See CONTRIBUTING.md, SECURITY.md, and docs/RELEASE_CHECKLIST.md.
MIT License. You may use, modify, distribute, sublicense, and sell copies, subject to retaining the copyright and license notice. See LICENSE.
採 MIT License。可自由使用、修改、散布、再授權與販售副本,但須保留版權與授權聲明。詳見 LICENSE。