feat(parse): extract shared upload utilities #87

Merged
MaojiaSheng merged 1 commit into volcengine:main from ze-mu-zhou:feature/shared-upload-utils on Feb 7, 2026

ze-mu-zhou (Contributor) commented Feb 7, 2026

Description

Extract encoding detection and file upload logic from CodeRepositoryParser._upload_directory() into a new shared module openviking/parse/parsers/upload_utils.py for reuse by the upcoming DirectoryParser (Issue #80, Discussion #83 T2).
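For readers who have not seen the original inline code, the encoding step can be pictured roughly as below. This is a minimal sketch and not the code in upload_utils.py: the function name detect_and_convert_sketch, the candidate list CANDIDATE_ENCODINGS, and the bytes-in/bytes-out signature are all assumptions made for illustration. The real detect_and_convert_encoding and the project's TEXT_ENCODINGS constant may differ.

```python
# Hypothetical candidate list, tried in order. The project's actual
# TEXT_ENCODINGS / UTF8_VARIANTS constants are not shown in this PR text.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gbk")


def detect_and_convert_sketch(raw: bytes) -> bytes:
    """Return UTF-8 bytes for `raw`, or `raw` unchanged if nothing decodes.

    "utf-8-sig" decodes plain UTF-8 and also strips a leading BOM, so it
    covers both UTF-8 variants before any legacy encoding is tried.
    """
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
        return text.encode("utf-8")
    # No candidate matched: leave the content as-is rather than corrupt it.
    return raw
```

Trying UTF-8 first matters in a sketch like this because legacy multi-byte encodings often accept byte sequences that are also valid UTF-8, which would silently produce wrong text.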

Related Issue

Relates to #80
Discussion #83 (T2: shared upload utilities)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • New module openviking/parse/parsers/upload_utils.py with the shared functions is_text_file, detect_and_convert_encoding, should_skip_file, should_skip_directory, upload_text_files, and upload_directory, plus OS-independent path traversal protection via _sanitize_rel_path
  • Refactored CodeRepositoryParser._upload_directory() from a 108-line inline implementation to a 3-line delegation to the shared upload_directory
  • Removed unused imports from code.py (ADDITIONAL_TEXT_EXTENSIONS, TEXT_ENCODINGS, UTF8_VARIANTS)
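
To illustrate what "OS-independent path traversal protection" can mean in practice, here is a minimal sketch. It is not the body of _sanitize_rel_path: the name sanitize_rel_path_sketch, its signature, and the choice to raise ValueError are assumptions for illustration only. The idea is to treat "/" and "\" as equivalent separators so the same check behaves identically on Linux, macOS, and Windows.

```python
import re


def sanitize_rel_path_sketch(rel_path: str) -> str:
    """Hypothetical helper: return a clean "/"-separated relative path.

    Raises ValueError for absolute paths, ".." components, or empty input.
    """
    # Normalize separators so the result does not depend on the host OS.
    normalized = rel_path.replace("\\", "/")

    # Reject POSIX roots and Windows drive prefixes such as "C:".
    # (This also rejects names starting with one letter and a colon.)
    if normalized.startswith("/") or re.match(r"[A-Za-z]:", normalized):
        raise ValueError(f"absolute path not allowed: {rel_path!r}")

    parts = []
    for part in normalized.split("/"):
        if part in ("", "."):
            # Skip empty segments ("a//b") and current-dir markers ("./a").
            continue
        if part == "..":
            # Any parent reference could escape the upload root.
            raise ValueError(f"path traversal not allowed: {rel_path!r}")
        parts.append(part)

    if not parts:
        raise ValueError("empty relative path")
    return "/".join(parts)
```

With this sketch, sanitize_rel_path_sketch("src\\pkg\\mod.py") yields "src/pkg/mod.py", while "../etc/passwd" and "C:\\secrets.txt" are rejected. Rejecting ".." outright, instead of resolving it, is a deliberately strict choice that keeps the check independent of the filesystem.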

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

52 test cases covering all functions and edge cases (path traversal, encoding conversion, file filtering, upload failures, etc.)

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

This PR is the T2 sub-task from Discussion #83 (DirectoryParser RFC), providing shared infrastructure for the upcoming T3/T5 implementations.

Extract encoding detection and file upload logic from
CodeRepositoryParser._upload_directory() into a new shared module
openviking/parse/parsers/upload_utils.py for reuse by the upcoming
DirectoryParser (Issue volcengine#80, Discussion volcengine#83 T2).

Shared functions: is_text_file, detect_and_convert_encoding,
should_skip_file, should_skip_directory, upload_text_files,
upload_directory, with OS-independent path traversal protection.

CodeRepositoryParser._upload_directory now delegates to the shared
upload_directory function (108 lines -> 3 lines).

Includes 52 test cases covering all functions and edge cases.

CLAassistant commented Feb 7, 2026

CLA assistant check
All committers have signed the CLA.

MaojiaSheng merged commit 9c1679f into volcengine:main on Feb 7, 2026
1 check passed