Skip to content

fix(parse): preserve source_name when zip single root differs#1991

Merged
qin-ctx merged 2 commits into
volcengine:mainfrom
zexuyann:fix/zip-parser-collapse-source-name
May 12, 2026
Merged

fix(parse): preserve source_name when zip single root differs#1991
qin-ctx merged 2 commits into
volcengine:mainfrom
zexuyann:fix/zip-parser-collapse-source-name

Conversation

@zexuyann
Copy link
Copy Markdown
Contributor

Description

ZipParser collapsed any zip whose top level held a single directory and unconditionally dropped source_name. For CLI add-resource on a directory whose only child differs in name (e.g. user uploads "access/" containing just "access-dns/"), this silently dropped the user-named parent layer: the resource landed at "/access-dns" instead of "/access/access-dns".

Only collapse when source_name is absent or its stem matches the single root dir name (the legacy "tt_b.zip" wrapping "tt_b/" case). When source_name names a distinct outer layer, parse the extract root and keep source_name so DirectoryParser preserves it as dir_name.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • openviking/parse/parsers/zip_parser.py: gate the single-root-directory collapse on source_name being absent or its stem matching the inner directory's name; otherwise parse the extract root and keep source_name.
  • tests/misc/test_media_processor_zip_root.py: add regression test_zip_single_top_level_dir_preserves_distinct_source_name for the access/ -> access-dns/ case.

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

ZipParser collapsed any zip whose top level held a single directory and
unconditionally dropped source_name. For CLI add-resource on a directory
whose only child differs in name (e.g. user uploads "access/" containing
just "access-dns/"), this silently dropped the user-named parent layer:
the resource landed at "<parent>/access-dns" instead of
"<parent>/access/access-dns".

Only collapse when source_name is absent or its stem matches the single
root dir name (the legacy "tt_b.zip" wrapping "tt_b/" case). When
source_name names a distinct outer layer, parse the extract root and
keep source_name so DirectoryParser preserves it as dir_name.
@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 92
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Copy Markdown
Collaborator

@qin-ctx qin-ctx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

本次 review 发布 2 条评论:

  • blocking:source_name 使用 Path(...).stem 可能导致 v1.2 这类同名目录不再折叠,产生 v1.2/v1.2 路径回归。
  • non-blocking:建议增加 processor/API 级测试覆盖最终 root_uri,避免只验证 parser 调用参数。

Comment thread openviking/parse/parsers/zip_parser.py Outdated
Comment thread tests/misc/test_media_processor_zip_root.py
Path(source_name).stem drops trailing dotted segments, so source_name
"v1.2" against a zip whose single root is "v1.2/" would fail the
equality check and re-wrap the content, producing
"viking://resources/v1.2/v1.2/...". Compare the leaf name first so
dotted names match exactly, and keep the stem fallback for the
"tt_b.zip" wrapping "tt_b/" case.

Also adds end-to-end regression tests that drive ZipParser +
DirectoryParser + TreeBuilder.finalize_from_temp against an in-memory
VikingFS and assert the final root_uri, so future changes in
DirectoryParser or TreeBuilder can't silently drop the outer layer.
@qin-ctx qin-ctx merged commit 19ae28a into volcengine:main May 12, 2026
5 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 12, 2026
ZaynJarvis pushed a commit that referenced this pull request May 13, 2026
* fix(parse): preserve source_name when zip single root differs

ZipParser collapsed any zip whose top level held a single directory and
unconditionally dropped source_name. For CLI add-resource on a directory
whose only child differs in name (e.g. user uploads "access/" containing
just "access-dns/"), this silently dropped the user-named parent layer:
the resource landed at "<parent>/access-dns" instead of
"<parent>/access/access-dns".

Only collapse when source_name is absent or its stem matches the single
root dir name (the legacy "tt_b.zip" wrapping "tt_b/" case). When
source_name names a distinct outer layer, parse the extract root and
keep source_name so DirectoryParser preserves it as dir_name.

* fix(parse): compare basename before stem in zip single-root collapse

Path(source_name).stem drops trailing dotted segments, so source_name
"v1.2" against a zip whose single root is "v1.2/" would fail the
equality check and re-wrap the content, producing
"viking://resources/v1.2/v1.2/...". Compare the leaf name first so
dotted names match exactly, and keep the stem fallback for the
"tt_b.zip" wrapping "tt_b/" case.

Also adds end-to-end regression tests that drive ZipParser +
DirectoryParser + TreeBuilder.finalize_from_temp against an in-memory
VikingFS and assert the final root_uri, so future changes in
DirectoryParser or TreeBuilder can't silently drop the outer layer.

---------

Co-authored-by: jinze <yanzexu.yzx@antgroup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants