fix(clp-s): Identify empty files in try_deduce_reader_type; Accept empty files as input in clp-s and log-converter (fixes #1993, #2063). #2138

Merged
gibber9809 merged 2 commits into y-scope:main from gibber9809:fix-1993-2063
Mar 30, 2026

@gibber9809 gibber9809 commented Mar 26, 2026

Description

This PR fixes an oversight in clp-s' input utilities: the utilities that identify the type of an input file don't account for empty files. We add support for detecting empty files as a file type, and use it to support ingesting empty files in clp-s and log-converter.

While fixing this issue I noticed that we currently don't support reading archives composed exclusively of empty files -- this issue is tracked in #2137.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. Passed an empty file to log-converter and observed that it produces a kv-ir file with zero records, as expected.
    1.a Passed that kv-ir file with zero records to clp-s and observed that it creates an archive with zero records and metadata recording the empty file, as expected.
  2. Passed an empty JSON file to clp-s and observed that it produces an archive with zero records and metadata recording the empty file, as expected.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of empty input files. The system now explicitly detects empty files and processes them gracefully without errors during log conversion, enhancing reliability when working with potentially empty source files.

@gibber9809 gibber9809 requested a review from a team as a code owner March 26, 2026 19:09
@gibber9809 gibber9809 linked an issue Mar 26, 2026 that may be closed by this pull request

coderabbitai Bot commented Mar 26, 2026

Walkthrough

This pull request introduces handling for empty files in the CLP compression pipeline. A new FileType::EmptyFile enum variant was added to detect and properly route empty files through existing ingestion pathways, preventing them from being treated as unknown file types.
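As a rough illustration of the routing described in this walkthrough, the C++ sketch below shows how an EmptyFile variant placed before Unknown could be sent down existing ingestion paths instead of being rejected. Only EmptyFile, Unknown, and LogText are named in this PR; the other variants and both helper functions are hypothetical and are not taken from the clp-s sources.

```cpp
// Simplified, hypothetical mirror of the FileType enum. Json and KeyValueIr
// are assumed names; EmptyFile is positioned before Unknown as in the PR.
enum class FileType {
    Json,
    KeyValueIr,
    LogText,
    EmptyFile,
    Unknown
};

// Hypothetical helper: the JSON parser sends empty files through the same
// path as JSON input, so they yield an archive with zero records.
constexpr bool uses_json_ingestion_path(FileType type) {
    switch (type) {
        case FileType::Json:
        case FileType::EmptyFile:
            return true;
        default:
            return false;
    }
}

// Hypothetical helper: log-converter treats an empty file as a non-error
// case, matching how it handles LogText.
constexpr bool accepted_by_log_converter(FileType type) {
    switch (type) {
        case FileType::LogText:
        case FileType::EmptyFile:
            return true;
        default:
            return false;
    }
}
```

The point of the sketch is that no new ingestion path is needed: the empty file only has to be recognized as a distinct type so that it no longer falls into the Unknown branch.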

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Definition: components/core/src/clp_s/InputConfig.hpp | Added EmptyFile enum variant to FileType, positioned before Unknown. |
| Detection & Routing Logic: components/core/src/clp_s/InputConfig.cpp | Enhanced peek_start_and_deduce_type to detect empty files via try_read when buffered data is unavailable; returns FileType::EmptyFile if read yields zero bytes with ErrorCode_EndOfFile. Updated try_deduce_reader_type to treat EmptyFile as a terminal deduced type. |
| Format Handlers: components/core/src/clp_s/JsonParser.cpp, components/core/src/clp_s/log_converter/log_converter.cpp | Routes FileType::EmptyFile through existing JSON ingestion path in parser and treats it as non-error case in log converter, matching LogText handling. |
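The detection rule summarized for InputConfig.cpp (a read that yields zero bytes together with an end-of-file condition means the file is empty) can be sketched as follows. This is an analogy built on std::istream, not the clp-s reader interface: DeducedType and peek_and_classify are hypothetical names, and the stream's eof() flag stands in for ErrorCode_EndOfFile.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch only.
enum class DeducedType {
    EmptyFile,               // Zero bytes read and end-of-file reached
    NeedsContentInspection,  // Some bytes were peeked; inspect them further
    ReadError                // Zero bytes read for a reason other than EOF
};

// Hypothetical stand-in for the peek step: tries to read the first bytes of
// the input and classifies the outcome before any format sniffing happens.
DeducedType peek_and_classify(std::istream& input, std::string& peeked) {
    char buffer[64];
    input.read(buffer, sizeof(buffer));
    auto const num_bytes_read = static_cast<std::size_t>(input.gcount());

    if (0 == num_bytes_read) {
        // Only "nothing read AND end-of-file" counts as an empty file; any
        // other failure is reported as an error rather than silently accepted.
        return input.eof() ? DeducedType::EmptyFile : DeducedType::ReadError;
    }

    peeked.assign(buffer, num_bytes_read);
    return DeducedType::NeedsContentInspection;
}
```

Checking for end-of-file in addition to the zero byte count keeps a failed read from being mistaken for an empty input.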

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: identifying empty files and accepting them as input in clp-s and log-converter, with specific issue references. |

@gibber9809 gibber9809 changed the title from "feat(clp-s): Identify empty files in try_deduce_reader_type; accept empty files as input in clp-s and log-converter (fixes #1993, #2063)." to "fix(clp-s): Identify empty files in try_deduce_reader_type; accept empty files as input in clp-s and log-converter (fixes #1993, #2063)." Mar 26, 2026
@junhaoliao junhaoliao added this to the March 2026 milestone Mar 26, 2026
@LinZhihao-723 LinZhihao-723 changed the title from "fix(clp-s): Identify empty files in try_deduce_reader_type; accept empty files as input in clp-s and log-converter (fixes #1993, #2063)." to "fix(clp-s): Identify empty files in try_deduce_reader_type; Accept empty files as input in clp-s and log-converter (fixes #1993, #2063)." Mar 30, 2026

@LinZhihao-723 LinZhihao-723 left a comment

lgtm. Directly modified the PR title to change accept to Accept.

@gibber9809 gibber9809 merged commit ca7b1ce into y-scope:main Mar 30, 2026
27 checks passed

Development

Successfully merging this pull request may close these issues.

  • Compressing a directory with --unstructured fails on empty files with the error
  • [clp-s] Empty files cause compression to stop.

3 participants