Skip to content

fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187).#2192

Merged
gibber9809 merged 10 commits into
y-scope:mainfrom
gibber9809:fix-2187
Apr 17, 2026
Merged

Conversation

@gibber9809
Copy link
Copy Markdown
Contributor

@gibber9809 gibber9809 commented Apr 9, 2026

Description

This PR fixes an issue with the timezone regex used by log-converter where " " on its own was accepted as a valid timezone. This leads to log-converter parsing timestamps without a timezone as the timestamp plus a trailing space. Since the new timestamp parser in clp-s is strict (in order to provide lossless compression), clp-s ends up rejecting the timestamp since the trailing space prevents it from matching any known timestamp pattern.

Medium-term, once log-surgeon has stabilized a bit more, we can solve this problem by writing some utilities that convert clp-s timestamp patterns into log-surgeon regex. That way, we eliminate the possibility for log-converter to parse timestamps in a way that clp-s can not interpret.

For now, the fix is to update the timezone regex in log-converter to only accept space as part of the timezone if it is followed by at least some timezone content.

This PR also adds an integration test to validate to give us some basic assurances that the log conversion flow is working as expected.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Validated that log-converter could convert, and clp-s could subsequently compress and search the following unstructured log file, and added this test as a new integration test:
2015-03-23 05:48:30,122 TEST1
2015-03-23 05:48:30,122Z TEST2
2015-03-23 05:48:30,122 Z TEST3
2015-03-23 05:48:30,122+00 TEST4
2015-03-23 05:48:30,122+00Z TEST5
2015-03-23 05:48:30,122 +00 TEST6
2015-03-23 05:48:30,122 +00Z TEST7
2015-03-23 05:48:30,122UTC+00 TEST8
2015-03-23 05:48:30,122UTC+00Z TEST9
2015-03-23 05:48:30,122 UTC+00 TEST10
2015-03-23 05:48:30,122 UTC+00Z TEST11

Summary by CodeRabbit

  • Bug Fixes

    • Improved timestamp parsing to handle additional timezone/UTC/Z variations for more reliable extraction.
  • Tests

    • Added end-to-end integration tests to verify log conversion and compression, including validation of event counts and message markers.
  • Chores

    • Added a reusable test fixture to produce sample unstructured logs and new test path/config utilities to manage test inputs and outputs.

…wed by at least some actual timezone content.
@gibber9809 gibber9809 requested a review from a team as a code owner April 9, 2026 16:30
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Expanded the timestamp regex in LogConverter to accept additional timezone suffix variants; added a session-scoped test fixture that writes 11 varied-timestamp log lines; added an end-to-end integration test that runs log-converter then clp-s; introduced test path/config datatypes and helpers for conversion/compression outputs.

Changes

Cohort / File(s) Summary
Timestamp Regex Enhancement
components/core/src/clp_s/log_converter/LogConverter.cpp
Rewrote the timestamp timezone suffix portion of cTimestampSchema from a single optional fragment to explicit alternation branches covering combinations of UTC, signed offsets (±HH with optional :MM), standalone Z, and variants with leading space. No other parsing or control flow changed.
Integration Tests & Fixture
integration-tests/tests/fixtures/integration_test_logs.py, integration-tests/tests/test_log_converter.py
Added simple_unstructured session fixture that writes 11 hardcoded log lines with varied timestamp formats; added test_log_converter_transform which runs the log-converter binary, compresses using clp-s c --timestamp-key timestamp, optionally inspects compressed output via clp-s s, and asserts event counts and message markers.
Test Path/Config Types
integration-tests/tests/utils/config.py
Added ClpCorePathConfig.log_converter_binary_path property; extended IntegrationTestLogs with `num_log_events: int

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Harness
    participant LC as "log-converter (binary)"
    participant FS as File System (conversion_dir)
    participant CLP as "clp-s (binary)"
    participant FS2 as File System (compression_dir)

    Test->>LC: invoke with source log file(s)
    LC-->>FS: write converted logs (conversion_dir)
    Test->>CLP: invoke 'clp-s c' on conversion_dir (--timestamp-key timestamp)
    CLP-->>FS2: write compressed output (compression_dir)
    alt num_log_events provided
        Test->>CLP: invoke 'clp-s s' to read compressed output (filter timestamp > 0)
        CLP-->>Test: stdout lines (events)
        Test->>Test: count lines and assert equals expected
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and specifically describes the main change: updating the timezone regex to accept spaces only when followed by actual timezone content, with a reference to the fixed issue.
Docstring Coverage ✅ Passed Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/log_converter/LogConverter.cpp`:
- Around line 33-35: Add automated regression tests that exercise the
LogConverter parsing/serialization boundary for the exact timezone edge cases
mentioned: no timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30".
Create unit tests that call the LogConverter::parseTimestamp (or the equivalent
parsing function) with inputs for each case and then call
LogConverter::serializeTimestamp (or the corresponding serializer) on the parsed
result, asserting round-trip equality (or normalized canonical form) for each
case so future changes to the regex in LogConverter.cpp are covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8dd09ec5-4f55-46e3-b28f-8338aaff0a49

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1329d and 4263f24.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/LogConverter.cpp

Comment thread components/core/src/clp_s/log_converter/LogConverter.cpp Outdated
@gibber9809 gibber9809 changed the title fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187)." fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187). Apr 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and replace the
sequential calls with a try { _convert_and_compress(...) } finally {
test_paths.clear_test_outputs() } pattern to guarantee cleanup.
- Around line 75-77: The event counting treats an empty stdout as one event
because len(output.stdout.strip().split("\n")) returns 1 for "", so update the
logic around run_and_log_subprocess/output.stdout to explicitly check for empty
or whitespace-only stdout: set num_events = 0 if output.stdout is None or
output.stdout.strip() == "" else len(output.stdout.strip().split("\n")) (or use
output.stdout.splitlines() after confirming it's non-empty); modify the code
that computes num_events (referencing variables output, num_events and
test_paths.num_log_events) so empty output yields 0.

In `@integration-tests/tests/utils/config.py`:
- Around line 399-400: The docstring for
ConversionTestPathConfig.clear_test_outputs is misleadingly referring to a
"compression test"; update the docstring to reference the conversion workflow
instead—e.g., change "Remove any existing output directories created by this
compression test." to something like "Remove any existing output directories
created by this conversion workflow." Ensure you update the docstring text in
the ConversionTestPathConfig.clear_test_outputs method so it accurately reflects
the class purpose.
- Around line 386-398: The __post_init__ of ConversionTestPathConfig fails to
validate the provided logs_source_dir which leads to later subprocess failures;
update __post_init__ to retrieve integration_test_path_config.logs_source_dir,
call validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
- Around line 318-319: The field num_log_events currently allows negative
values; add a validation on construction in the class that declares
num_log_events to ensure it is either None or a non-negative integer. If this is
a dataclass, implement a __post_init__ that raises ValueError when
self.num_log_events is not None and self.num_log_events < 0; if it’s a pydantic
model add a `@validator` for "num_log_events" performing the same check. Update
any callers/tests expecting errors accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 456b879a-da17-478e-8eab-cfc25919e21a

📥 Commits

Reviewing files that changed from the base of the PR and between 4263f24 and 180ee22.

📒 Files selected for processing (3)
  • integration-tests/tests/fixtures/integration_test_logs.py
  • integration-tests/tests/test_log_converter.py
  • integration-tests/tests/utils/config.py

Comment thread integration-tests/tests/test_log_converter.py Outdated
Comment thread integration-tests/tests/test_log_converter.py Outdated
Comment on lines +318 to +319
#: Optional number of log events in the downloaded logs.
num_log_events: int | None = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate num_log_events bounds at construction.

num_log_events currently accepts negative values, which can produce invalid expectations in event-count assertions downstream.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
         raise ValueError(err_msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 318 - 319, The field
num_log_events currently allows negative values; add a validation on
construction in the class that declares num_log_events to ensure it is either
None or a non-negative integer. If this is a dataclass, implement a
__post_init__ that raises ValueError when self.num_log_events is not None and
self.num_log_events < 0; if it’s a pydantic model add a `@validator` for
"num_log_events" performing the same check. Update any callers/tests expecting
errors accordingly.

Comment on lines +386 to +398
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
"""Initialize and set required directory paths for conversion tests."""
test_name = self.test_name.strip()
if 0 == len(test_name):
err_msg = "`test_name` cannot be empty."
raise ValueError(err_msg)
test_root_dir = integration_test_path_config.test_root_dir
validate_dir_exists(test_root_dir)

object.__setattr__(self, "test_name", test_name)
object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted")
object.__setattr__(self, "compression_dir", test_root_dir / f"{test_name}-archives")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add existence validation for logs_source_dir.

ConversionTestPathConfig stores logs_source_dir but does not validate it. Failing fast here gives clearer diagnostics than letting subprocess calls fail later.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 386 - 398, The
__post_init__ of ConversionTestPathConfig fails to validate the provided
logs_source_dir which leads to later subprocess failures; update __post_init__
to retrieve integration_test_path_config.logs_source_dir, call
validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.

Comment thread integration-tests/tests/utils/config.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
integration-tests/tests/utils/config.py (2)

386-394: ⚠️ Potential issue | 🟡 Minor

Fail fast by validating logs_source_dir existence in ConversionTestPathConfig.

Without this check, failures are deferred to subprocess calls with less actionable errors.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 386 - 394, In
ConversionTestPathConfig.__post_init__ (the method currently validating
test_name and test_root_dir), also validate that self.logs_source_dir (or the
logs_source_dir obtained from the provided integration_test_path_config) exists
by calling validate_dir_exists(logs_source_dir) and raise a clear ValueError if
missing; add the check immediately after validate_dir_exists(test_root_dir) so
failures fail fast with an actionable error message referencing logs_source_dir.

318-323: ⚠️ Potential issue | 🟡 Minor

Validate num_log_events as non-negative at construction.

num_log_events currently accepts negative values, which can make downstream event-count checks invalid.

Proposed fix
 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 318 - 323, The
num_log_events attribute currently allows negative values; add a validation in
the __post_init__ method to ensure that if self.num_log_events is not None it is
>= 0, otherwise raise a ValueError with a clear message referencing
num_log_events; update the IntegrationTestPathConfig initialization flow (within
__post_init__) to perform this check so invalid negatives are rejected at
construction.
integration-tests/tests/test_log_converter.py (1)

52-55: ⚠️ Potential issue | 🟡 Minor

Ensure cleanup always runs, even when conversion/compression fails.

If _convert_and_compress raises, post-test cleanup is skipped.

Proposed fix
-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test
currently calls _convert_and_compress(clp_core_path_config, test_paths) directly
so if it raises the subsequent test_paths.clear_test_outputs() is skipped; wrap
the conversion/compression call in a try/finally (or use a pytest fixture
finalizer) so test_paths.clear_test_outputs() always runs in the finally block;
reference the call to _convert_and_compress and the cleanup method
test_paths.clear_test_outputs() when applying the change so cleanup is
guaranteed even on exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-4: The module docstring currently claims the tests verify
lossless compression and decompression which is inaccurate; update the
module-level docstring in test_log_converter.py to state that the integration
tests verify CLP core log conversion, clp-s compression, and searchable event
counts (not decompression), so the description matches the actual assertions and
scope of the tests.

---

Duplicate comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) directly so if it raises
the subsequent test_paths.clear_test_outputs() is skipped; wrap the
conversion/compression call in a try/finally (or use a pytest fixture finalizer)
so test_paths.clear_test_outputs() always runs in the finally block; reference
the call to _convert_and_compress and the cleanup method
test_paths.clear_test_outputs() when applying the change so cleanup is
guaranteed even on exceptions.

In `@integration-tests/tests/utils/config.py`:
- Around line 386-394: In ConversionTestPathConfig.__post_init__ (the method
currently validating test_name and test_root_dir), also validate that
self.logs_source_dir (or the logs_source_dir obtained from the provided
integration_test_path_config) exists by calling
validate_dir_exists(logs_source_dir) and raise a clear ValueError if missing;
add the check immediately after validate_dir_exists(test_root_dir) so failures
fail fast with an actionable error message referencing logs_source_dir.
- Around line 318-323: The num_log_events attribute currently allows negative
values; add a validation in the __post_init__ method to ensure that if
self.num_log_events is not None it is >= 0, otherwise raise a ValueError with a
clear message referencing num_log_events; update the IntegrationTestPathConfig
initialization flow (within __post_init__) to perform this check so invalid
negatives are rejected at construction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b24d69b9-5ffd-43c5-9389-f1cdf9a79cd4

📥 Commits

Reviewing files that changed from the base of the PR and between 180ee22 and a04da68.

📒 Files selected for processing (2)
  • integration-tests/tests/test_log_converter.py
  • integration-tests/tests/utils/config.py

Comment thread integration-tests/tests/test_log_converter.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
integration-tests/tests/test_log_converter.py (1)

52-54: ⚠️ Potential issue | 🟡 Minor

Ensure output cleanup always runs when conversion/compression fails.

If _convert_and_compress(...) throws, Line 54 is skipped and test artifacts can leak across runs. Wrap the call in try/finally.

Proposed fix
-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 52 - 54, The call
to _convert_and_compress can raise and currently prevents
test_paths.clear_test_outputs() from running; wrap the invocation of
_convert_and_compress(clp_core_path_config, test_paths) in a try/finally so that
test_paths.clear_test_outputs() is executed in the finally block regardless of
exceptions, ensuring test artifacts are always cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-54: The call to _convert_and_compress can raise and currently
prevents test_paths.clear_test_outputs() from running; wrap the invocation of
_convert_and_compress(clp_core_path_config, test_paths) in a try/finally so that
test_paths.clear_test_outputs() is executed in the finally block regardless of
exceptions, ensuring test artifacts are always cleaned up.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 850ce341-e0b2-4956-9639-b9f0fcc4d884

📥 Commits

Reviewing files that changed from the base of the PR and between a04da68 and 552bfc2.

📒 Files selected for processing (2)
  • components/core/src/clp_s/log_converter/LogConverter.cpp
  • integration-tests/tests/test_log_converter.py

LinZhihao-723
LinZhihao-723 previously approved these changes Apr 16, 2026
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex changes lgtm. Need someone else familiar with integration tests to review the testing code changes.

Comment on lines +33 to +35
R"(((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|)"
R"((( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|)"
R"(((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}))"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit convoluted, but I can't think of a better way to simplify it. I hope we can leverage log-surgeon to simplify this a bit in the future, lol

Comment thread components/core/src/clp_s/log_converter/LogConverter.cpp Outdated
Copy link
Copy Markdown
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest of the PR lgtm and i don't have more comments. Let's invite @quinntaylormitchell to take a look at whether the changes are inline with the overall integration test suite design

Comment thread integration-tests/tests/test_log_converter.py Outdated
Comment thread integration-tests/tests/test_log_converter.py Outdated
name = "simple_unstructured"
integration_test_logs = IntegrationTestLogs(
name=name,
tarball_url=f"{name}.tar.gz",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quinntaylormitchell

IntegrationTestLogs was originally designed only for downloaded logs, and now also wraps generated ones. Since tarball_url is never read by any consumer, do you think we can make it optional with a default of "" (then simple_unstructured doesn't need to provide a dummy value here)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just a note) if so, we shall also update the docstring of IntegrationTestLogs because it does mention "download" at the moment.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao You're right, IntegrationTestLogs was originally designed only for downloaded logs. This constraint won't be present with the IntegrationTestDataset class introduced by #2181. For now, I'm fine with how this is used, and I'll change it once #2181 is merged (IntegrationTestLogs will be deprecated and removed).

compression_path = str(test_paths.compression_dir)
run_and_log_subprocess([log_converter_bin_path, src_path, "--output-dir", conversion_path])
run_and_log_subprocess(
[clp_s_bin_path, "c", compression_path, conversion_path, "--timestamp-key", "timestamp"]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while it should be fairly obvious to people who have worked with the log-convertor, do you see value in extracting a named constant

# Matching `LogSerializer::cTimestampKey`.
LOG_CONVERTER_OUTPUT_TIMESTAMP_KEY = "timestamp"

Comment thread integration-tests/tests/test_log_converter.py Outdated
junhaoliao and others added 2 commits April 17, 2026 04:00
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-14: The test uses json.loads in
integration-tests/tests/test_log_converter.py (see the parameterized test that
parses lines) but never imports the json module; add "import json" to the module
imports so json.loads(...) does not raise NameError and the parameterized test
can proceed to compression/search assertions. Locate the top-level imports in
this file and insert the json import alongside pytest and the tests.utils
imports.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4de86338-4882-47db-ac9b-89e1a754e72e

📥 Commits

Reviewing files that changed from the base of the PR and between 552bfc2 and e816a99.

📒 Files selected for processing (2)
  • components/core/src/clp_s/log_converter/LogConverter.cpp
  • integration-tests/tests/test_log_converter.py

Comment thread integration-tests/tests/test_log_converter.py
Copy link
Copy Markdown
Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: My review applies only to the files under integration-tests/.

Right now we're making some changes to the testing suite design, but the PRs haven't been merged yet. From the perspective of the new design, there are a number of changes that I would make in this PR, but from the perspective of the currently-implemented design, everything in this PR is appropriate.

Because the core of this PR is a small (and, from the sound of it, important) fix, I think it's appropriate to merge now. I will expand my integration test PR plan to update these log-ingestor tests when the redesign is fully implemented.

With all the above in mind, LGTM.

name = "simple_unstructured"
integration_test_logs = IntegrationTestLogs(
name=name,
tarball_url=f"{name}.tar.gz",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao You're right, IntegrationTestLogs was originally designed only for downloaded logs. This constraint won't be present with the IntegrationTestDataset class introduced by #2181. For now, I'm fine with how this is used, and I'll change it once #2181 is merged (IntegrationTestLogs will be deprecated and removed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trailing space in timestamp-only log lines causes timestamp parsing failure

4 participants