fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187). by gibber9809 · Pull Request #2192 · y-scope/clp

gibber9809 · 2026-04-09T16:30:06Z

Description

This PR fixes an issue with the timezone regex used by log-converter where " " on its own was accepted as a valid timezone. This leads to log-converter parsing timestamps without a timezone as the timestamp plus a trailing space. Since the new timestamp parser in clp-s is strict (in order to provide lossless compression), clp-s ends up rejecting the timestamp since the trailing space prevents it from matching any known timestamp pattern.
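To illustrate the failure mode, here is a minimal sketch that uses Python's `datetime.strptime` as a stand-in for a strict timestamp parser. It is not the clp-s parser, and the format string is an assumption chosen to match the sample timestamps in this PR. It only shows that a strict matcher rejects a timestamp once a trailing space is attached to it.

```python
from datetime import datetime

# Assumed format for timestamps such as "2015-03-23 05:48:30,122".
# A stand-in for one strict timestamp pattern; not clp-s's actual pattern syntax.
STRICT_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parses_strictly(timestamp: str) -> bool:
    """Return True only if the whole string matches STRICT_FORMAT."""
    try:
        datetime.strptime(timestamp, STRICT_FORMAT)
    except ValueError:
        return False
    return True


clean = "2015-03-23 05:48:30,122"
with_trailing_space = clean + " "  # what a lone-space "timezone" would produce

print(parses_strictly(clean))
print(parses_strictly(with_trailing_space))
```

The second call fails because the strict matcher leaves the trailing space unconsumed, which mirrors the rejection described above.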

Medium-term, once log-surgeon has stabilized a bit more, we can solve this problem by writing some utilities that convert clp-s timestamp patterns into log-surgeon regexes. That way, we eliminate the possibility of log-converter parsing timestamps in a way that clp-s cannot interpret.

For now, the fix is to update the timezone regex in log-converter to only accept space as part of the timezone if it is followed by at least some timezone content.

This PR also adds an integration test to give us some basic assurance that the log conversion flow is working as expected.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Validated that log-converter could convert, and clp-s could subsequently compress and search the following unstructured log file, and added this test as a new integration test:

2015-03-23 05:48:30,122 TEST1
2015-03-23 05:48:30,122Z TEST2
2015-03-23 05:48:30,122 Z TEST3
2015-03-23 05:48:30,122+00 TEST4
2015-03-23 05:48:30,122+00Z TEST5
2015-03-23 05:48:30,122 +00 TEST6
2015-03-23 05:48:30,122 +00Z TEST7
2015-03-23 05:48:30,122UTC+00 TEST8
2015-03-23 05:48:30,122UTC+00Z TEST9
2015-03-23 05:48:30,122 UTC+00 TEST10
2015-03-23 05:48:30,122 UTC+00Z TEST11

Summary by CodeRabbit

Bug Fixes
- Improved timestamp parsing to handle additional timezone/UTC/Z variations for more reliable extraction.
Tests
- Added end-to-end integration tests to verify log conversion and compression, including validation of event counts and message markers.
Chores
- Added a reusable test fixture to produce sample unstructured logs and new test path/config utilities to manage test inputs and outputs.

coderabbitai · 2026-04-09T16:30:21Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Expanded the timestamp regex in LogConverter to accept additional timezone suffix variants; added a session-scoped test fixture that writes 11 varied-timestamp log lines; added an end-to-end integration test that runs log-converter then clp-s; introduced test path/config datatypes and helpers for conversion/compression outputs.

Changes

Cohort / File(s)	Summary
Timestamp Regex Enhancement `components/core/src/clp_s/log_converter/LogConverter.cpp`	Rewrote the timestamp timezone suffix portion of `cTimestampSchema` from a single optional fragment to explicit alternation branches covering combinations of `UTC`, signed offsets (±HH with optional `:MM`), standalone `Z`, and variants with leading space. No other parsing or control flow changed.
Integration Tests & Fixture `integration-tests/tests/fixtures/integration_test_logs.py`, `integration-tests/tests/test_log_converter.py`	Added `simple_unstructured` session fixture that writes 11 hardcoded log lines with varied timestamp formats; added `test_log_converter_transform` which runs the `log-converter` binary, compresses using `clp-s c --timestamp-key timestamp`, optionally inspects compressed output via `clp-s s`, and asserts event counts and message markers.
Test Path/Config Types `integration-tests/tests/utils/config.py`	Added `ClpCorePathConfig.log_converter_binary_path` property; extended `IntegrationTestLogs` with `num_log_events: int

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Harness
    participant LC as "log-converter (binary)"
    participant FS as File System (conversion_dir)
    participant CLP as "clp-s (binary)"
    participant FS2 as File System (compression_dir)

    Test->>LC: invoke with source log file(s)
    LC-->>FS: write converted logs (conversion_dir)
    Test->>CLP: invoke 'clp-s c' on conversion_dir (--timestamp-key timestamp)
    CLP-->>FS2: write compressed output (compression_dir)
    alt num_log_events provided
        Test->>CLP: invoke 'clp-s s' to read compressed output (filter timestamp > 0)
        CLP-->>Test: stdout lines (events)
        Test->>Test: count lines and assert equals expected
    end
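The flow in the diagram can be sketched as a small command builder. The convert and compress argument lists follow the invocations quoted in the review thread of this PR. The search invocation is an assumption: the argument order for `clp-s s` and the query string `timestamp > 0` are inferred from the diagram, and `build_pipeline_commands` is a hypothetical helper, not part of the test suite.

```python
from pathlib import Path

# Matches the key passed via `--timestamp-key` in this PR's test.
TIMESTAMP_KEY = "timestamp"


def build_pipeline_commands(
    log_converter_bin: Path,
    clp_s_bin: Path,
    src_path: Path,
    conversion_dir: Path,
    compression_dir: Path,
) -> list:
    """Build the argument lists for the convert, compress, and search steps."""
    convert_cmd = [str(log_converter_bin), str(src_path), "--output-dir", str(conversion_dir)]
    compress_cmd = [
        str(clp_s_bin), "c", str(compression_dir), str(conversion_dir),
        "--timestamp-key", TIMESTAMP_KEY,
    ]
    # Assumption: the search step takes the archive directory followed by a query string.
    search_cmd = [str(clp_s_bin), "s", str(compression_dir), f"{TIMESTAMP_KEY} > 0"]
    return [convert_cmd, compress_cmd, search_cmd]


# Placeholder paths for illustration only.
commands = build_pipeline_commands(
    Path("bin/log-converter"), Path("bin/clp-s"),
    Path("logs/simple_unstructured"), Path("out/converted"), Path("out/archives"),
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run` (or the suite's own subprocess wrapper) in order, with the search step's stdout used for the event count.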

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and specifically describes the main change: updating the timezone regex to accept spaces only when followed by actual timezone content, with a reference to the fixed issue.
Docstring Coverage	✅ Passed	Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/log_converter/LogConverter.cpp`:
- Around line 33-35: Add automated regression tests that exercise the
LogConverter parsing/serialization boundary for the exact timezone edge cases
mentioned: no timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30".
Create unit tests that call the LogConverter::parseTimestamp (or the equivalent
parsing function) with inputs for each case and then call
LogConverter::serializeTimestamp (or the corresponding serializer) on the parsed
result, asserting round-trip equality (or normalized canonical form) for each
case so future changes to the regex in LogConverter.cpp are covered.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8dd09ec5-4f55-46e3-b28f-8338aaff0a49

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1329d and 4263f24.

📒 Files selected for processing (1)

components/core/src/clp_s/log_converter/LogConverter.cpp

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and replace the
sequential calls with a try { _convert_and_compress(...) } finally {
test_paths.clear_test_outputs() } pattern to guarantee cleanup.
- Around line 75-77: The event counting treats an empty stdout as one event
because len(output.stdout.strip().split("\n")) returns 1 for "", so update the
logic around run_and_log_subprocess/output.stdout to explicitly check for empty
or whitespace-only stdout: set num_events = 0 if output.stdout is None or
output.stdout.strip() == "" else len(output.stdout.strip().split("\n")) (or use
output.stdout.splitlines() after confirming it's non-empty); modify the code
that computes num_events (referencing variables output, num_events and
test_paths.num_log_events) so empty output yields 0.

In `@integration-tests/tests/utils/config.py`:
- Around line 399-400: The docstring for
ConversionTestPathConfig.clear_test_outputs is misleadingly referring to a
"compression test"; update the docstring to reference the conversion workflow
instead—e.g., change "Remove any existing output directories created by this
compression test." to something like "Remove any existing output directories
created by this conversion workflow." Ensure you update the docstring text in
the ConversionTestPathConfig.clear_test_outputs method so it accurately reflects
the class purpose.
- Around line 386-398: The __post_init__ of ConversionTestPathConfig fails to
validate the provided logs_source_dir which leads to later subprocess failures;
update __post_init__ to retrieve integration_test_path_config.logs_source_dir,
call validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
- Around line 318-319: The field num_log_events currently allows negative
values; add a validation on construction in the class that declares
num_log_events to ensure it is either None or a non-negative integer. If this is
a dataclass, implement a __post_init__ that raises ValueError when
self.num_log_events is not None and self.num_log_events < 0; if it’s a pydantic
model add a `@validator` for "num_log_events" performing the same check. Update
any callers/tests expecting errors accordingly.
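The event-counting finding above can be sketched as follows. `count_events` is a hypothetical helper, not a function from the test suite. It counts only non-empty lines, which is one possible reading of the suggested fix, and it treats `None` or whitespace-only output as zero events.

```python
from typing import Optional


def count_events(stdout: Optional[str]) -> int:
    """Count non-empty output lines; empty or whitespace-only output is zero events."""
    if stdout is None:
        return 0
    return sum(1 for line in stdout.splitlines() if line.strip())


# The naive expression from the finding reports one event for empty output,
# because splitting an empty string still yields a single empty element.
naive_count = len("".strip().split("\n"))

print(naive_count)
print(count_events(""))
print(count_events('{"timestamp": 1}\n{"timestamp": 2}\n'))
```

With a helper like this, an assertion against the expected event count would fail on empty search output instead of passing when one event is expected.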

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 456b879a-da17-478e-8eab-cfc25919e21a

📥 Commits

Reviewing files that changed from the base of the PR and between 4263f24 and 180ee22.

📒 Files selected for processing (3)

integration-tests/tests/fixtures/integration_test_logs.py
integration-tests/tests/test_log_converter.py
integration-tests/tests/utils/config.py

coderabbitai · 2026-04-13T19:52:27Z

+    #: Optional number of log events in the downloaded logs.
+    num_log_events: int | None = None


⚠️ Potential issue | 🟡 Minor

Validate num_log_events bounds at construction.

num_log_events currently accepts negative values, which can produce invalid expectations in event-count assertions downstream.

Proposed fix

 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."
         raise ValueError(err_msg)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 318 - 319, The field num_log_events currently allows negative values; add a validation on construction in the class that declares num_log_events to ensure it is either None or a non-negative integer. If this is a dataclass, implement a __post_init__ that raises ValueError when self.num_log_events is not None and self.num_log_events < 0; if it’s a pydantic model add a `@validator` for "num_log_events" performing the same check. Update any callers/tests expecting errors accordingly.

coderabbitai · 2026-04-13T19:52:27Z

+    def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
+        """Initialize and set required directory paths for conversion tests."""
+        test_name = self.test_name.strip()
+        if 0 == len(test_name):
+            err_msg = "`test_name` cannot be empty."
+            raise ValueError(err_msg)
+        test_root_dir = integration_test_path_config.test_root_dir
+        validate_dir_exists(test_root_dir)
+
+        object.__setattr__(self, "test_name", test_name)
+        object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted")
+        object.__setattr__(self, "compression_dir", test_root_dir / f"{test_name}-archives")
+


⚠️ Potential issue | 🟡 Minor

Add existence validation for logs_source_dir.

ConversionTestPathConfig stores logs_source_dir but does not validate it. Failing fast here gives clearer diagnostics than letting subprocess calls fail later.

Proposed fix

 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 386 - 398, The __post_init__ of ConversionTestPathConfig fails to validate the provided logs_source_dir which leads to later subprocess failures; update __post_init__ to retrieve integration_test_path_config.logs_source_dir, call validate_dir_exists on it (like test_root_dir), and then set the instance attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the logs directory existence is checked early.
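For readers who want to run the two validation suggestions above (a non-negative `num_log_events` and an existing `logs_source_dir`), here is a self-contained sketch. `ExampleConversionPaths` is a hypothetical, simplified class, and `validate_dir_exists` below is a stand-in whose behaviour is assumed. Neither is the repository's actual code.

```python
import tempfile
from dataclasses import InitVar, dataclass, field
from pathlib import Path
from typing import Optional


def validate_dir_exists(path: Path) -> None:
    """Stand-in for the repository helper of the same name (behaviour assumed)."""
    if not path.is_dir():
        err_msg = f"Directory does not exist: {path}"
        raise ValueError(err_msg)


@dataclass(frozen=True)
class ExampleConversionPaths:
    """Hypothetical, simplified analogue of the test path config classes."""

    test_name: str
    logs_source_dir: Path
    test_root_dir: InitVar[Path]
    num_log_events: Optional[int] = None
    conversion_dir: Path = field(init=False)

    def __post_init__(self, test_root_dir: Path) -> None:
        if self.num_log_events is not None and self.num_log_events < 0:
            err_msg = "`num_log_events` cannot be negative."
            raise ValueError(err_msg)
        test_name = self.test_name.strip()
        if 0 == len(test_name):
            err_msg = "`test_name` cannot be empty."
            raise ValueError(err_msg)
        # Fail fast on missing directories instead of in a later subprocess call.
        validate_dir_exists(self.logs_source_dir)
        validate_dir_exists(test_root_dir)
        # Frozen dataclasses need object.__setattr__ to assign derived fields.
        object.__setattr__(self, "test_name", test_name)
        object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    paths = ExampleConversionPaths(" demo ", logs_source_dir=root, test_root_dir=root)
    print(paths.conversion_dir.name)
    try:
        ExampleConversionPaths("demo", logs_source_dir=root / "missing", test_root_dir=root)
    except ValueError as err:
        print(f"rejected: {err}")
```

Checking the inputs in `__post_init__` means a bad configuration surfaces as a `ValueError` at construction time rather than as a less specific failure from a later subprocess call.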

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

integration-tests/tests/utils/config.py (2)

386-394: ⚠️ Potential issue | 🟡 Minor

Fail fast by validating logs_source_dir existence in ConversionTestPathConfig.

Without this check, failures are deferred to subprocess calls with less actionable errors.

Proposed fix

 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set required directory paths for conversion tests."""
     test_name = self.test_name.strip()
     if 0 == len(test_name):
         err_msg = "`test_name` cannot be empty."
         raise ValueError(err_msg)
+    validate_dir_exists(self.logs_source_dir)
     test_root_dir = integration_test_path_config.test_root_dir
     validate_dir_exists(test_root_dir)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 386 - 394, In
ConversionTestPathConfig.__post_init__ (the method currently validating
test_name and test_root_dir), also validate that self.logs_source_dir (or the
logs_source_dir obtained from the provided integration_test_path_config) exists
by calling validate_dir_exists(logs_source_dir) and raise a clear ValueError if
missing; add the check immediately after validate_dir_exists(test_root_dir) so
failures fail fast with an actionable error message referencing logs_source_dir.

318-323: ⚠️ Potential issue | 🟡 Minor

Validate num_log_events as non-negative at construction.

num_log_events currently accepts negative values, which can make downstream event-count checks invalid.

Proposed fix

 def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
     """Initialize and set tarball and extraction paths for integration test logs."""
+    if self.num_log_events is not None and self.num_log_events < 0:
+        err_msg = "`num_log_events` cannot be negative."
+        raise ValueError(err_msg)
+
     name = self.name.strip()
     if 0 == len(name):
         err_msg = "`name` cannot be empty."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/utils/config.py` around lines 318 - 323, The
num_log_events attribute currently allows negative values; add a validation in
the __post_init__ method to ensure that if self.num_log_events is not None it is
>= 0, otherwise raise a ValueError with a clear message referencing
num_log_events; update the IntegrationTestPathConfig initialization flow (within
__post_init__) to perform this check so invalid negatives are rejected at
construction.

integration-tests/tests/test_log_converter.py (1)

52-55: ⚠️ Potential issue | 🟡 Minor

Ensure cleanup always runs, even when conversion/compression fails.

If _convert_and_compress raises, post-test cleanup is skipped.

Proposed fix

-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test
currently calls _convert_and_compress(clp_core_path_config, test_paths) directly
so if it raises the subsequent test_paths.clear_test_outputs() is skipped; wrap
the conversion/compression call in a try/finally (or use a pytest fixture
finalizer) so test_paths.clear_test_outputs() always runs in the finally block;
reference the call to _convert_and_compress and the cleanup method
test_paths.clear_test_outputs() when applying the change so cleanup is
guaranteed even on exceptions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-4: The module docstring currently claims the tests verify
lossless compression and decompression which is inaccurate; update the
module-level docstring in test_log_converter.py to state that the integration
tests verify CLP core log conversion, clp-s compression, and searchable event
counts (not decompression), so the description matches the actual assertions and
scope of the tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b24d69b9-5ffd-43c5-9389-f1cdf9a79cd4

📥 Commits

Reviewing files that changed from the base of the PR and between 180ee22 and a04da68.

📒 Files selected for processing (2)

integration-tests/tests/test_log_converter.py
integration-tests/tests/utils/config.py

coderabbitai

♻️ Duplicate comments (1)

integration-tests/tests/test_log_converter.py (1)

52-54: ⚠️ Potential issue | 🟡 Minor

Ensure output cleanup always runs when conversion/compression fails.

If _convert_and_compress(...) throws, Line 54 is skipped and test artifacts can leak across runs. Wrap the call in try/finally.

Proposed fix

-    _convert_and_compress(clp_core_path_config, test_paths)
-
-    test_paths.clear_test_outputs()
+    try:
+        _convert_and_compress(clp_core_path_config, test_paths)
+    finally:
+        test_paths.clear_test_outputs()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@integration-tests/tests/test_log_converter.py` around lines 52 - 54, The call
to _convert_and_compress can raise and currently prevents
test_paths.clear_test_outputs() from running; wrap the invocation of
_convert_and_compress(clp_core_path_config, test_paths) in a try/finally so that
test_paths.clear_test_outputs() is executed in the finally block regardless of
exceptions, ensuring test artifacts are always cleaned up.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 850ce341-e0b2-4956-9639-b9f0fcc4d884

📥 Commits

Reviewing files that changed from the base of the PR and between a04da68 and 552bfc2.

📒 Files selected for processing (2)

components/core/src/clp_s/log_converter/LogConverter.cpp
integration-tests/tests/test_log_converter.py

LinZhihao-723

The regex changes lgtm. Need someone else familiar with integration tests to review the testing code changes.

LinZhihao-723 · 2026-04-16T22:19:22Z

+        R"(((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|)"
+        R"((( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|)"
+        R"(((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}))"


This looks a bit convoluted, but I can't think of a better way to simplify it. I hope we can leverage log-surgeon to simplify this a bit in the future, lol
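As a reading aid for the alternation above, the following sketch translates the three quoted fragments into a Python `re` pattern and checks it against the timezone suffixes from the validation log lines in the PR description. This rests on assumptions: log-surgeon's regex engine is not Python's `re`, the fragments are only the timezone portion of `cTimestampSchema`, and the final `)` of the last fragment is dropped here because it closes a group opened earlier in the full pattern, which is not shown in this thread.

```python
import re

# Python translation of the three raw-string fragments quoted above, joined in order.
# The trailing ")" of the last fragment is omitted because its opening parenthesis
# belongs to an earlier part of the pattern that is not quoted here.
TIMEZONE_SUFFIX = (
    r"((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|"
    r"(( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|"
    r"((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}"
)

# Suffixes that follow "2015-03-23 05:48:30,122" in the validation log lines.
suffixes = [
    "", "Z", " Z", "+00", "+00Z", " +00", " +00Z",
    "UTC+00", "UTC+00Z", " UTC+00", " UTC+00Z",
]

accepted = [s for s in suffixes if re.fullmatch(TIMEZONE_SUFFIX, s) is not None]
lone_space_rejected = re.fullmatch(TIMEZONE_SUFFIX, " ") is None

print(len(accepted))
print(lone_space_rejected)
```

In this translation, every branch that begins with a space also requires `UTC`, a signed offset, or `Z` after it, so a lone space cannot match on its own.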

junhaoliao

the rest of the PR lgtm and i don't have more comments. Let's invite @quinntaylormitchell to take a look at whether the changes are in line with the overall integration test suite design

junhaoliao · 2026-04-17T07:37:51Z

+    name = "simple_unstructured"
+    integration_test_logs = IntegrationTestLogs(
+        name=name,
+        tarball_url=f"{name}.tar.gz",


@quinntaylormitchell

IntegrationTestLogs was originally designed only for downloaded logs, and now also wraps generated ones. Since tarball_url is never read by any consumer, do you think we can make it optional with a default of "" (then simple_unstructured doesn't need to provide a dummy value here)?

(just a note) if so, we shall also update the docstring of IntegrationTestLogs because it does mention "download" at the moment.

junhaoliao · 2026-04-17T07:41:49Z

+    compression_path = str(test_paths.compression_dir)
+    run_and_log_subprocess([log_converter_bin_path, src_path, "--output-dir", conversion_path])
+    run_and_log_subprocess(
+        [clp_s_bin_path, "c", compression_path, conversion_path, "--timestamp-key", "timestamp"]


while it should be fairly obvious to people who have worked with the log-converter, do you see value in extracting a named constant?

# Matching `LogSerializer::cTimestampKey`.
LOG_CONVERTER_OUTPUT_TIMESTAMP_KEY = "timestamp"

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-14: The test uses json.loads in
integration-tests/tests/test_log_converter.py (see the parameterized test that
parses lines) but never imports the json module; add "import json" to the module
imports so json.loads(...) does not raise NameError and the parameterized test
can proceed to compression/search assertions. Locate the top-level imports in
this file and insert the json import alongside pytest and the tests.utils
imports.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4de86338-4882-47db-ac9b-89e1a754e72e

📥 Commits

Reviewing files that changed from the base of the PR and between 552bfc2 and e816a99.

📒 Files selected for processing (2)

components/core/src/clp_s/log_converter/LogConverter.cpp
integration-tests/tests/test_log_converter.py

quinntaylormitchell

Note: My review applies only to the files under integration-tests/.

Right now we're making some changes to the testing suite design, but the PRs haven't been merged yet. From the perspective of the new design, there are a number of changes that I would make in this PR, but from the perspective of the currently-implemented design, everything in this PR is appropriate.

Because the core of this PR is a small (and, from the sound of it, important) fix, I think it's appropriate to merge now. I will expand my integration test PR plan to update these log-ingestor tests when the redesign is fully implemented.

With all the above in mind, LGTM.

quinntaylormitchell · 2026-04-17T14:38:53Z

+    name = "simple_unstructured"
+    integration_test_logs = IntegrationTestLogs(
+        name=name,
+        tarball_url=f"{name}.tar.gz",


@junhaoliao You're right, IntegrationTestLogs was originally designed only for downloaded logs. This constraint won't be present with the IntegrationTestDataset class introduced by #2181. For now, I'm fine with how this is used, and I'll change it once #2181 is merged (IntegrationTestLogs will be deprecated and removed).

Update log-converter timezone pattern to ensure that a space is follo…

4263f24

…wed by at least some actual timezone content.

gibber9809 requested a review from a team as a code owner April 9, 2026 16:30

coderabbitai Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread components/core/src/clp_s/log_converter/LogConverter.cpp Outdated

gibber9809 linked an issue Apr 9, 2026 that may be closed by this pull request

Trailing space in timestamp-only log lines causes timestamp parsing failure #2187

Closed

gibber9809 and others added 2 commits April 13, 2026 19:44

Add integration test for log conversion flow.

32e4101

Merge branch 'main' into fix-2187

180ee22

gibber9809 changed the title ~~fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187)."~~ fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187). Apr 13, 2026

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

Address rabbit comments.

a04da68

coderabbitai Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread integration-tests/tests/test_log_converter.py

gibber9809 added 2 commits April 16, 2026 15:17

Fix incorrect docstring.

43522d1

Merge remote-tracking branch 'upstream/main' into fix-2187

552bfc2

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

Merge branch 'main' into fix-2187

2889698

LinZhihao-723 previously approved these changes Apr 16, 2026

View reviewed changes

junhaoliao requested changes Apr 17, 2026

View reviewed changes

junhaoliao and others added 2 commits April 17, 2026 04:00

Merge branch 'main' into fix-2187

10df859

Apply suggestions from code review

e816a99

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com> Co-authored-by: Junhao Liao <junhao@junhao.ca>

gibber9809 dismissed LinZhihao-723’s stale review via e816a99 April 17, 2026 13:20

Address remaining review comment; add missing include.

7167c69

coderabbitai Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread integration-tests/tests/test_log_converter.py

gibber9809 requested review from LinZhihao-723 and junhaoliao April 17, 2026 13:31

quinntaylormitchell approved these changes Apr 17, 2026

View reviewed changes

junhaoliao approved these changes Apr 17, 2026

View reviewed changes

gibber9809 merged commit f373a1d into y-scope:main Apr 17, 2026
27 checks passed

junhaoliao added this to the Mid-April 2026 milestone Apr 24, 2026

quinntaylormitchell mentioned this pull request May 14, 2026

refactor(integration-tests): Migrate to SampleDataset and IntegrationTestPathConfig from classes.py; Deactivate IntegrationTestLogs; Deprecate config.IntegrationTestPathConfig. #2241

Merged

3 tasks
