fix(clp-s::log_converter): Update timezone regex to only accept spaces when followed by an actual timezone (fixes #2187).#2192
Conversation
…wed by at least some actual timezone content.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughExpanded the timestamp regex in LogConverter to accept additional timezone suffix variants; added a session-scoped test fixture that writes 11 varied-timestamp log lines; added an end-to-end integration test that runs Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Test Harness
participant LC as "log-converter (binary)"
participant FS as File System (conversion_dir)
participant CLP as "clp-s (binary)"
participant FS2 as File System (compression_dir)
Test->>LC: invoke with source log file(s)
LC-->>FS: write converted logs (conversion_dir)
Test->>CLP: invoke 'clp-s c' on conversion_dir (--timestamp-key timestamp)
CLP-->>FS2: write compressed output (compression_dir)
alt num_log_events provided
Test->>CLP: invoke 'clp-s s' to read compressed output (filter timestamp > 0)
CLP-->>Test: stdout lines (events)
Test->>Test: count lines and assert equals expected
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/src/clp_s/log_converter/LogConverter.cpp`:
- Around line 33-35: Add automated regression tests that exercise the
LogConverter parsing/serialization boundary for the exact timezone edge cases
mentioned: no timezone, lone space, " Z", " UTC", " +05:30", and "UTC+05:30".
Create unit tests that call the LogConverter::parseTimestamp (or the equivalent
parsing function) with inputs for each case and then call
LogConverter::serializeTimestamp (or the corresponding serializer) on the parsed
result, asserting round-trip equality (or normalized canonical form) for each
case so future changes to the regex in LogConverter.cpp are covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8dd09ec5-4f55-46e3-b28f-8338aaff0a49
📒 Files selected for processing (1)
components/core/src/clp_s/log_converter/LogConverter.cpp
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) then
test_paths.clear_test_outputs(), but if _convert_and_compress raises the cleanup
is skipped; wrap the call to _convert_and_compress in a try/finally so
test_paths.clear_test_outputs() is invoked in the finally block regardless of
errors. Locate the invocation of _convert_and_compress and replace the
sequential calls with a try { _convert_and_compress(...) } finally {
test_paths.clear_test_outputs() } pattern to guarantee cleanup.
- Around line 75-77: The event counting treats an empty stdout as one event
because len(output.stdout.strip().split("\n")) returns 1 for "", so update the
logic around run_and_log_subprocess/output.stdout to explicitly check for empty
or whitespace-only stdout: set num_events = 0 if output.stdout is None or
output.stdout.strip() == "" else len(output.stdout.strip().split("\n")) (or use
output.stdout.splitlines() after confirming it's non-empty); modify the code
that computes num_events (referencing variables output, num_events and
test_paths.num_log_events) so empty output yields 0.
In `@integration-tests/tests/utils/config.py`:
- Around line 399-400: The docstring for
ConversionTestPathConfig.clear_test_outputs is misleadingly referring to a
"compression test"; update the docstring to reference the conversion workflow
instead—e.g., change "Remove any existing output directories created by this
compression test." to something like "Remove any existing output directories
created by this conversion workflow." Ensure you update the docstring text in
the ConversionTestPathConfig.clear_test_outputs method so it accurately reflects
the class purpose.
- Around line 386-398: The __post_init__ of ConversionTestPathConfig fails to
validate the provided logs_source_dir which leads to later subprocess failures;
update __post_init__ to retrieve integration_test_path_config.logs_source_dir,
call validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
- Around line 318-319: The field num_log_events currently allows negative
values; add a validation on construction in the class that declares
num_log_events to ensure it is either None or a non-negative integer. If this is
a dataclass, implement a __post_init__ that raises ValueError when
self.num_log_events is not None and self.num_log_events < 0; if it’s a pydantic
model add a `@validator` for "num_log_events" performing the same check. Update
any callers/tests expecting errors accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 456b879a-da17-478e-8eab-cfc25919e21a
📒 Files selected for processing (3)
integration-tests/tests/fixtures/integration_test_logs.pyintegration-tests/tests/test_log_converter.pyintegration-tests/tests/utils/config.py
| #: Optional number of log events in the downloaded logs. | ||
| num_log_events: int | None = None |
There was a problem hiding this comment.
Validate num_log_events bounds at construction.
num_log_events currently accepts negative values, which can produce invalid expectations in event-count assertions downstream.
Proposed fix
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
"""Initialize and set tarball and extraction paths for integration test logs."""
+ if self.num_log_events is not None and self.num_log_events < 0:
+ err_msg = "`num_log_events` cannot be negative."
+ raise ValueError(err_msg)
+
name = self.name.strip()
if 0 == len(name):
err_msg = "`name` cannot be empty."
raise ValueError(err_msg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/utils/config.py` around lines 318 - 319, The field
num_log_events currently allows negative values; add a validation on
construction in the class that declares num_log_events to ensure it is either
None or a non-negative integer. If this is a dataclass, implement a
__post_init__ that raises ValueError when self.num_log_events is not None and
self.num_log_events < 0; if it’s a pydantic model add a `@validator` for
"num_log_events" performing the same check. Update any callers/tests expecting
errors accordingly.
| def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None: | ||
| """Initialize and set required directory paths for conversion tests.""" | ||
| test_name = self.test_name.strip() | ||
| if 0 == len(test_name): | ||
| err_msg = "`test_name` cannot be empty." | ||
| raise ValueError(err_msg) | ||
| test_root_dir = integration_test_path_config.test_root_dir | ||
| validate_dir_exists(test_root_dir) | ||
|
|
||
| object.__setattr__(self, "test_name", test_name) | ||
| object.__setattr__(self, "conversion_dir", test_root_dir / f"{test_name}-converted") | ||
| object.__setattr__(self, "compression_dir", test_root_dir / f"{test_name}-archives") | ||
|
|
There was a problem hiding this comment.
Add existence validation for logs_source_dir.
ConversionTestPathConfig stores logs_source_dir but does not validate it. Failing fast here gives clearer diagnostics than letting subprocess calls fail later.
Proposed fix
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None:
"""Initialize and set required directory paths for conversion tests."""
test_name = self.test_name.strip()
if 0 == len(test_name):
err_msg = "`test_name` cannot be empty."
raise ValueError(err_msg)
+ validate_dir_exists(self.logs_source_dir)
test_root_dir = integration_test_path_config.test_root_dir
validate_dir_exists(test_root_dir)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration-tests/tests/utils/config.py` around lines 386 - 398, The
__post_init__ of ConversionTestPathConfig fails to validate the provided
logs_source_dir which leads to later subprocess failures; update __post_init__
to retrieve integration_test_path_config.logs_source_dir, call
validate_dir_exists on it (like test_root_dir), and then set the instance
attribute (object.__setattr__(self, "logs_source_dir", logs_source_dir)) so the
logs directory existence is checked early.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
integration-tests/tests/utils/config.py (2)
386-394:⚠️ Potential issue | 🟡 MinorFail fast by validating
logs_source_direxistence inConversionTestPathConfig.Without this check, failures are deferred to subprocess calls with less actionable errors.
Proposed fix
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None: """Initialize and set required directory paths for conversion tests.""" test_name = self.test_name.strip() if 0 == len(test_name): err_msg = "`test_name` cannot be empty." raise ValueError(err_msg) + validate_dir_exists(self.logs_source_dir) test_root_dir = integration_test_path_config.test_root_dir validate_dir_exists(test_root_dir)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 386 - 394, In ConversionTestPathConfig.__post_init__ (the method currently validating test_name and test_root_dir), also validate that self.logs_source_dir (or the logs_source_dir obtained from the provided integration_test_path_config) exists by calling validate_dir_exists(logs_source_dir) and raise a clear ValueError if missing; add the check immediately after validate_dir_exists(test_root_dir) so failures fail fast with an actionable error message referencing logs_source_dir.
318-323:⚠️ Potential issue | 🟡 MinorValidate
num_log_eventsas non-negative at construction.
num_log_eventscurrently accepts negative values, which can make downstream event-count checks invalid.Proposed fix
def __post_init__(self, integration_test_path_config: IntegrationTestPathConfig) -> None: """Initialize and set tarball and extraction paths for integration test logs.""" + if self.num_log_events is not None and self.num_log_events < 0: + err_msg = "`num_log_events` cannot be negative." + raise ValueError(err_msg) + name = self.name.strip() if 0 == len(name): err_msg = "`name` cannot be empty."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/utils/config.py` around lines 318 - 323, The num_log_events attribute currently allows negative values; add a validation in the __post_init__ method to ensure that if self.num_log_events is not None it is >= 0, otherwise raise a ValueError with a clear message referencing num_log_events; update the IntegrationTestPathConfig initialization flow (within __post_init__) to perform this check so invalid negatives are rejected at construction.integration-tests/tests/test_log_converter.py (1)
52-55:⚠️ Potential issue | 🟡 MinorEnsure cleanup always runs, even when conversion/compression fails.
If
_convert_and_compressraises, post-test cleanup is skipped.Proposed fix
- _convert_and_compress(clp_core_path_config, test_paths) - - test_paths.clear_test_outputs() + try: + _convert_and_compress(clp_core_path_config, test_paths) + finally: + test_paths.clear_test_outputs()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/test_log_converter.py` around lines 52 - 55, The test currently calls _convert_and_compress(clp_core_path_config, test_paths) directly so if it raises the subsequent test_paths.clear_test_outputs() is skipped; wrap the conversion/compression call in a try/finally (or use a pytest fixture finalizer) so test_paths.clear_test_outputs() always runs in the finally block; reference the call to _convert_and_compress and the cleanup method test_paths.clear_test_outputs() when applying the change so cleanup is guaranteed even on exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-4: The module docstring currently claims the tests verify
lossless compression and decompression which is inaccurate; update the
module-level docstring in test_log_converter.py to state that the integration
tests verify CLP core log conversion, clp-s compression, and searchable event
counts (not decompression), so the description matches the actual assertions and
scope of the tests.
---
Duplicate comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-55: The test currently calls
_convert_and_compress(clp_core_path_config, test_paths) directly so if it raises
the subsequent test_paths.clear_test_outputs() is skipped; wrap the
conversion/compression call in a try/finally (or use a pytest fixture finalizer)
so test_paths.clear_test_outputs() always runs in the finally block; reference
the call to _convert_and_compress and the cleanup method
test_paths.clear_test_outputs() when applying the change so cleanup is
guaranteed even on exceptions.
In `@integration-tests/tests/utils/config.py`:
- Around line 386-394: In ConversionTestPathConfig.__post_init__ (the method
currently validating test_name and test_root_dir), also validate that
self.logs_source_dir (or the logs_source_dir obtained from the provided
integration_test_path_config) exists by calling
validate_dir_exists(logs_source_dir) and raise a clear ValueError if missing;
add the check immediately after validate_dir_exists(test_root_dir) so failures
fail fast with an actionable error message referencing logs_source_dir.
- Around line 318-323: The num_log_events attribute currently allows negative
values; add a validation in the __post_init__ method to ensure that if
self.num_log_events is not None it is >= 0, otherwise raise a ValueError with a
clear message referencing num_log_events; update the IntegrationTestPathConfig
initialization flow (within __post_init__) to perform this check so invalid
negatives are rejected at construction.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b24d69b9-5ffd-43c5-9389-f1cdf9a79cd4
📒 Files selected for processing (2)
integration-tests/tests/test_log_converter.pyintegration-tests/tests/utils/config.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
integration-tests/tests/test_log_converter.py (1)
52-54:⚠️ Potential issue | 🟡 MinorEnsure output cleanup always runs when conversion/compression fails.
If
_convert_and_compress(...)throws, Line 54 is skipped and test artifacts can leak across runs. Wrap the call intry/finally.Proposed fix
- _convert_and_compress(clp_core_path_config, test_paths) - - test_paths.clear_test_outputs() + try: + _convert_and_compress(clp_core_path_config, test_paths) + finally: + test_paths.clear_test_outputs()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration-tests/tests/test_log_converter.py` around lines 52 - 54, The call to _convert_and_compress can raise and currently prevents test_paths.clear_test_outputs() from running; wrap the invocation of _convert_and_compress(clp_core_path_config, test_paths) in a try/finally so that test_paths.clear_test_outputs() is executed in the finally block regardless of exceptions, ensuring test artifacts are always cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 52-54: The call to _convert_and_compress can raise and currently
prevents test_paths.clear_test_outputs() from running; wrap the invocation of
_convert_and_compress(clp_core_path_config, test_paths) in a try/finally so that
test_paths.clear_test_outputs() is executed in the finally block regardless of
exceptions, ensuring test artifacts are always cleaned up.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 850ce341-e0b2-4956-9639-b9f0fcc4d884
📒 Files selected for processing (2)
components/core/src/clp_s/log_converter/LogConverter.cppintegration-tests/tests/test_log_converter.py
LinZhihao-723
left a comment
There was a problem hiding this comment.
The regex changes lgtm. Need someone else familiar with integration tests to review the testing code changes.
| R"(((( UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|)" | ||
| R"((( [\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})|(( Z){0,1})|)" | ||
| R"(((UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1})){0,1}))" |
There was a problem hiding this comment.
This looks a bit convoluted, but I can't think of a better way to simplify it. I hope we can leverage log-surgeon to simplify this a bit in the future, lol
junhaoliao
left a comment
There was a problem hiding this comment.
the rest of the PR lgtm and i don't have more comments. Let's invite @quinntaylormitchell to take a look at whether the changes are inline with the overall integration test suite design
| name = "simple_unstructured" | ||
| integration_test_logs = IntegrationTestLogs( | ||
| name=name, | ||
| tarball_url=f"{name}.tar.gz", |
There was a problem hiding this comment.
IntegrationTestLogs was originally designed only for downloaded logs, and now also wraps generated ones. Since tarball_url is never read by any consumer, do you think we can make it optional with a default of "" (then simple_unstructured doesn't need to provide a dummy value here)?
There was a problem hiding this comment.
(just a note) if so, we shall also update the docstring of IntegrationTestLogs because it does mention "download" at the moment.
There was a problem hiding this comment.
@junhaoliao You're right, IntegrationTestLogs was originally designed only for downloaded logs. This constraint won't be present with the IntegrationTestDataset class introduced by #2181. For now, I'm fine with how this is used, and I'll change it once #2181 is merged (IntegrationTestLogs will be deprecated and removed).
| compression_path = str(test_paths.compression_dir) | ||
| run_and_log_subprocess([log_converter_bin_path, src_path, "--output-dir", conversion_path]) | ||
| run_and_log_subprocess( | ||
| [clp_s_bin_path, "c", compression_path, conversion_path, "--timestamp-key", "timestamp"] |
There was a problem hiding this comment.
while it should be fairly obvious to people who have worked with the log-convertor, do you see value in extracting a named constant
# Matching `LogSerializer::cTimestampKey`.
LOG_CONVERTER_OUTPUT_TIMESTAMP_KEY = "timestamp"Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com> Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/test_log_converter.py`:
- Around line 1-14: The test uses json.loads in
integration-tests/tests/test_log_converter.py (see the parameterized test that
parses lines) but never imports the json module; add "import json" to the module
imports so json.loads(...) does not raise NameError and the parameterized test
can proceed to compression/search assertions. Locate the top-level imports in
this file and insert the json import alongside pytest and the tests.utils
imports.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4de86338-4882-47db-ac9b-89e1a754e72e
📒 Files selected for processing (2)
components/core/src/clp_s/log_converter/LogConverter.cppintegration-tests/tests/test_log_converter.py
quinntaylormitchell
left a comment
There was a problem hiding this comment.
Note: My review applies only to the files under integration-tests/.
Right now we're making some changes to the testing suite design, but the PRs haven't been merged yet. From the perspective of the new design, there are a number of changes that I would make in this PR, but from the perspective of the currently-implemented design, everything in this PR is appropriate.
Because the core of this PR is a small (and, from the sound of it, important) fix, I think it's appropriate to merge now. I will expand my integration test PR plan to update these log-ingestor tests when the redesign is fully implemented.
With all the above in mind, LGTM.
| name = "simple_unstructured" | ||
| integration_test_logs = IntegrationTestLogs( | ||
| name=name, | ||
| tarball_url=f"{name}.tar.gz", |
There was a problem hiding this comment.
@junhaoliao You're right, IntegrationTestLogs was originally designed only for downloaded logs. This constraint won't be present with the IntegrationTestDataset class introduced by #2181. For now, I'm fine with how this is used, and I'll change it once #2181 is merged (IntegrationTestLogs will be deprecated and removed).
Description
This PR fixes an issue with the timezone regex used by log-converter where " " on its own was accepted as a valid timezone. This leads to
log-converterparsing timestamps without a timezone as the timestamp plus a trailing space. Since the new timestamp parser in clp-s is strict (in order to provide lossless compression), clp-s ends up rejecting the timestamp since the trailing space prevents it from matching any known timestamp pattern.Medium-term, once log-surgeon has stabilized a bit more, we can solve this problem by writing some utilities that convert clp-s timestamp patterns into log-surgeon regex. That way, we eliminate the possibility for log-converter to parse timestamps in a way that clp-s can not interpret.
For now, the fix is to update the timezone regex in
log-converterto only accept space as part of the timezone if it is followed by at least some timezone content.This PR also adds an integration test to validate to give us some basic assurances that the log conversion flow is working as expected.
Checklist
breaking change.
Validation performed
log-convertercould convert, andclp-scould subsequently compress and search the following unstructured log file, and added this test as a new integration test:Summary by CodeRabbit
Bug Fixes
Tests
Chores