
fix(security): clean up code scanning and runtime findings #1596

Merged
MaojiaSheng merged 2 commits into main from fix/code-scanning-cleanups
Apr 21, 2026

@qin-ctx (Collaborator) commented Apr 20, 2026

Description

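This PR is not a single-point fix. It consolidates the current batch of GitHub code scanning security findings that can be fixed directly, and handles the related Python runtime stability and cleanup items at the same time. The changes fall into three groups:

  1. Security risk reduction: prioritize high-value alerts such as path injection, clear-text logging of sensitive data, directory listing/XSS, tar extraction escaping the target directory, and URL host checks.
  2. Python runtime correctness and stability: fix issues that affect real runtime behavior, for example metrics collection failures that interrupt the main flow, swallowed exceptions that leave no diagnostic information, and overly implicit initialization semantics.
  3. Python cleanup and maintainability: remove issues that keep generating scanner noise and maintenance cost, for example unused imports, dead variables, default factory style, and lazy export / TYPE_CHECKING adjustments.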
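
The best-effort behavior described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `best_effort`, `handle_request`, and the debug-level logging policy are hypothetical names and choices.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def best_effort(record: Callable[[], None], what: str) -> None:
    """Run a metrics callback; log and swallow any failure."""
    try:
        record()
    except Exception:
        # Assumed policy: keep a trace for troubleshooting, never re-raise.
        logger.debug("metrics collection failed: %s", what, exc_info=True)


def handle_request(payload: str, record_metric: Callable[[], None]) -> str:
    """Hypothetical main flow that must not fail because of metrics."""
    result = payload.upper()  # stand-in for the real business logic
    best_effort(record_metric, "handle_request")
    return result
```

The main flow returns its result even when the callback raises, while the logged traceback keeps the failure diagnosable instead of silently swallowed.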

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

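  • Security risk reduction
    • Tighten path access for py/path-injection, covering the directory recovery and creation logic for vectordb local project / collection / index, as well as the file access boundaries of the console and werewolf demo.
    • Address py/clear-text-logging-sensitive-data by removing clear-text output of api_key, root_api_key, headers, and full config contents from the main code, example scripts, and test scripts; output is now masked or only shows whether a value is configured.
    • Fix py/reflective-xss and py/bad-tag-filter by HTML-escaping the werewolf directory listing output and replacing some regex-based HTML handling with a parser-based implementation.
    • Fix py/tarslip by adding path traversal and link validation to tar extraction.
    • Fix py/incomplete-url-substring-sanitization by replacing substring checks with urlparse-based host and path validation.
  • Python runtime correctness and stability
    • Adjust the metrics and observability logic in the encryption / session / tracing paths so that it is best-effort and collection failures no longer affect the main business flow.
    • Fix some empty except / swallowed-exception issues by keeping the necessary logs or making the failure semantics explicit, which lowers troubleshooting cost.
    • Adjust some initialization behavior, for example QueueManager no longer initializes implicitly when it has not been initialized, so that a default state does not mask problems.
    • Correct several initialization and runtime details related to embedder / lock / session / vectordb.
  • Python cleanup and maintainability
    • Clean up low-value noise such as unused imports, dead variables, meaningless pass statements, and default_factory=lambda: ... patterns.
    • Use lazy imports and TYPE_CHECKING adjustments to ease some circular dependency and export side-effect problems.
    • Add tests/service/test_session_service_metrics.py, and update related tests and scripts so that this batch of issues does not recur.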
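
To illustrate the py/incomplete-url-substring-sanitization item: a substring test accepts any URL that merely mentions a trusted domain, whereas parsing first lets the check look at the actual host. The following is only a sketch under assumptions. `ALLOWED_HOSTS` and `is_allowed_url` are hypothetical and not taken from this PR, and the path checks mentioned above are omitted.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real hosts checked in this PR are not shown here.
ALLOWED_HOSTS = {"example.com"}


def is_allowed_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Compare the parsed host, never the raw string: "example.com" may appear
    # in the query string, the path, or the userinfo of a hostile URL.
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)
```

With this check, `https://evil.test/?next=example.com` and `https://example.com.evil.test/` are rejected even though both contain the trusted domain as a substring.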

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Ran locally:

  • python -m py_compile openviking/storage/vectordb/project/local_project.py openviking/storage/vectordb/project/project_group.py openviking/storage/vectordb/collection/local_collection.py bot/vikingbot/console/web_console.py openviking/utils/agfs_utils.py bot/demo/werewolf/werewolf_server.py openviking/parse/parsers/feishu.py openviking/models/vlm/backends/litellm_vlm.py bot/vikingbot/agent/tools/web.py openviking/parse/parsers/epub.py benchmark/RAG/scripts/download_dataset.py openviking/eval/ragas/__init__.py examples/cloud/alice.py examples/cloud/bob.py tests/api_test/tools/tests/test_simple_startup.py tests/api_test/tools/tests/test_load_config.py tests/api_test/tools/tests/test_lifespan.py tests/api_test/tools/tests/test_headers.py tests/api_test/tools/tests/test_full_startup.py tests/api_test/tools/tests/test_create_app.py tests/api_test/tools/tests/test_config_value.py tests/api_test/tools/tests/test_admin_api.py tests/api_test/tools/config/generate_config.py bot/tests/test_minimax_provider.py
  • python -m pytest --override-ini addopts='' bot/tests/test_werewolf_server_security.py tests/misc/test_config_validation.py tests/unit/crypto/test_providers_mock.py tests/service/test_session_service_metrics.py -q
  • python -m pytest --override-ini addopts='' bot/tests/test_minimax_provider.py -q

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

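This PR prioritizes the findings that can be fixed directly and whose risk is clear. py/jinja2/autoescape-false, py/weak-sensitive-data-hashing, and the vendor alerts under third_party/ are not handled in this PR; they should be reviewed and split out separately later.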

Harden path and logging boundaries, remove noisy cleanup issues,
and keep observability failures from breaking runtime flows.
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 Security concerns

Zip archive path traversal vulnerability:
The extract_archive function in benchmark/RAG/scripts/download_dataset.py properly validates tar archive members for path traversal, but does not perform equivalent checks for zip archives. A malicious zip file could contain entries with ../ or absolute paths to escape the extraction directory.

✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix werewolf server security issues (path traversal, XSS)

Relevant files:

  • bot/demo/werewolf/werewolf_server.py

Sub-PR theme: Fix dataset downloader security (tar path traversal)

Relevant files:

  • benchmark/RAG/scripts/download_dataset.py

Sub-PR theme: Improve storage path safety and type checking

Relevant files:

  • openviking/storage/vectordb/project/project_group.py
  • openviking/storage/__init__.py

⚡ Recommended focus areas for review

Zip Archive Path Traversal

The extract_archive function uses zipfile.ZipFile.extractall() without validating zip members, which could allow path traversal attacks via malicious zip archives containing entries with ../ or absolute paths.

if archive_path.suffix == ".zip":
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(temp_extract_dir)
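
One possible way to address this is to validate every member before extracting. This is a minimal sketch, not the project's implementation: `safe_extract_zip` is a hypothetical helper, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
import zipfile
from pathlib import Path


def safe_extract_zip(archive_path: Path, dest_dir: Path) -> None:
    """Extract a zip archive, refusing members that resolve outside dest_dir."""
    dest_root = dest_dir.resolve()
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        # Check every member first so nothing is written if any entry is unsafe.
        for member in zip_ref.infolist():
            target = (dest_root / member.filename).resolve()
            if not target.is_relative_to(dest_root):
                raise ValueError(f"unsafe zip member: {member.filename!r}")
        zip_ref.extractall(dest_root)
```

CPython's zipfile documentation describes some sanitizing of member names during extraction. The explicit check above instead rejects a suspicious archive outright, which mirrors the fail-loud approach the PR takes for tar archives.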

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

Block the remaining path traversal bypass in the werewolf demo,
and validate Feishu hosts on the main parse() entry point.
@MaojiaSheng MaojiaSheng merged commit 38c324b into main Apr 21, 2026
5 of 6 checks passed
@MaojiaSheng MaojiaSheng deleted the fix/code-scanning-cleanups branch April 21, 2026 02:06