feat(clp-s::timestamp_parser): Add support for specifying locale-dependent weekday and month names.#1708
Conversation
WalkthroughReplaced static month/weekday name tables with bracket-pattern parsing: added bracket-pattern extraction and matching overloads, extended TimestampPattern with offsets/lengths and bracket lengths, refactored parsing/marshalling to use pattern-driven month/weekday lookup, and updated tests to use string-based specifiers containing name lists. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (1)
99-99: Usefalse ==instead of!for negation.As per coding guidelines, prefer
false == <expression>rather than!<expression>.- REQUIRE_FALSE(timestamp_pattern_result.has_error()); + REQUIRE(false == timestamp_pattern_result.has_error());Note: This applies to multiple
REQUIRE_FALSEusages throughout the file. However, sinceREQUIRE_FALSEis a Catch2 test macro that provides better assertion messages thanREQUIRE(false == ...), you may choose to keep it as-is for test readability. Based on coding guidelines, preferfalse == <expression>.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(13 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp(10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hppcomponents/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (1)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
nodiscard(43-43)nodiscard(45-48)nodiscard(50-52)nodiscard(54-54)nodiscard(56-59)nodiscard(61-64)nodiscard(66-68)nodiscard(70-72)pattern(30-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (21)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)
5-5: LGTM!Addition of
<cstdint>is appropriate for the newuint16_tmember types.
56-72: LGTM!The new accessors are well-designed:
- Return
const&for vector members to avoid copies.- Use
[[nodiscard]]consistently with the existing codebase style.- Use
uint16_tfor bracket pattern lengths which aligns with the size constraints inextract_bracket_pattern_list.
76-93: LGTM!Constructor properly initializes new members using
std::movefor vectors and value semantics foruint16_tlengths. The initializer list order matches the member declaration order.
98-101: LGTM!New private members are appropriately typed and initialized with default values. The
uint16_ttype for lengths is consistent with the constraints enforced during pattern extraction.
128-132: LGTM!Documentation correctly describes the new parameterized
\B{...}and\A{...}format specifiers, replacing the old\Band\apatterns.components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6)
54-57: LGTM!The function signature change from
chartostd::stringappropriately supports the new bracket-pattern specifiers while maintaining backward compatibility with single-character specifiers through implicit conversion.Also applies to: 90-93
143-147: LGTM!Test cases properly updated to use string-based specifiers for year formats.
163-186: LGTM!Comprehensive test coverage for both full month names (
B{January,...,December}) and abbreviated month names (B{Jan,...,Dec}).
199-223: LGTM!The day-in-week test correctly includes additional date information to ensure the day of week matches the actual date, as the parser validates this constraint.
470-498: LGTM!Expected parsing results are correctly updated to use the new bracket-pattern specifiers. The patterns match the timestamp strings and the expected epoch timestamps.
527-546: LGTM!Timezone-related test cases correctly use the new
\B{...}bracket-pattern specifier for month names combined with timezone offsets.components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (10)
58-59: LGTM!Semantic constants
cDaysInWeekandcMonthsInYearimprove readability and enable validation of bracket pattern lists.
90-115: LGTM!Default date-time patterns updated to use the new bracket-pattern specifiers for month and weekday names. The patterns are comprehensive and cover various timestamp formats.
606-612: LGTM!The
'B'case correctly retrieves month name offsets from the pattern metadata and emits the appropriate month name by extracting it from the raw pattern string.
629-641: LGTM!The
'A'case correctly handles weekday name emission, mirroring the month name logic.
863-866: LGTM!New local variables for storing bracket pattern metadata are properly initialized.
886-907: LGTM!The
'B'case in pattern creation correctly:
- Extracts the bracket pattern
- Parses the comma-separated month names
- Validates the count matches
cMonthsInYear(12)- Stores offsets, lengths, and bracket pattern length
- Advances
pattern_idxappropriately
914-932: LGTM!The
'A'case mirrors the'B'case logic correctly, validating againstcDaysInWeek(7) instead.
1117-1130: LGTM!The
'B'parsing case correctly uses the new overload offind_first_matching_prefixwith bracket pattern metadata, and properly advances bothpattern_idxandtimestamp_idx.
1186-1199: LGTM!The
'A'parsing case correctly handles weekday name parsing using bracket pattern metadata, mirroring the month name logic.
1025-1034: LGTM!The
TimestampPatternconstruction correctly passes all new metadata fields to the constructor.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Generally speaking the way to store the month/day name using an offset list lgtm. Had some questions of whether we should wrap it for cleaness, lol. Check the comment for details.
| std::vector<std::pair<uint16_t, uint16_t>> m_month_name_offsets_and_lengths; | ||
| std::vector<std::pair<uint16_t, uint16_t>> m_weekday_name_offsets_and_lengths; | ||
| uint16_t m_month_name_bracket_pattern_length{}; | ||
| uint16_t m_weekday_name_bracket_pattern_length{}; |
There was a problem hiding this comment.
Some design level comment: I might write a class of DateTypeRepresentation to wrap these, and make this class optional and drop m_uses_date_type_representation flag as we can use the optional value to determine whether it's a daty-type representation. This ensures that m_uses_data_type_representation can't be false when the date-related members are provided.
In terms of parsing: I may move find_first_matching_prefix as a member function of TimestampPattern or DateTypeRepresentation as sth like find_matching_month or find_matching_week_day, which wraps the underlying offset list inside the class itself (working on the offset in the parsing function is a bit hard to read imo)
In terms of marshaling: we can add a member method to return a result of month/day string's view given by the offset, which also avoids directly operating on the offset list.
Open to discussion if u think the current way is preferred.
There was a problem hiding this comment.
I like the idea of adding a find_first_matching_month/find_first_matching_weekday/get_month/get_weekday function to TimestampPattern to make the parsing/marshalling code more readable.
I'm not as sure about the encapsulation inside an optional class -- main reasons being that:
- The
m_uses_date_type_representationflag is independent from those other date-time timestamp related properties, in the sense that date time timestamps don't necessarily need to have a timezone, month name, etc. - I think it will make the create function/members of
TimestampPatternmore complicated
Tentatively going to go ahead with just adding the helpers to simplify the parsing/marshalling code, but open to discussing the encapsulation approach more.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (3)
167-171: Const reference applied per previous feedback.The
candidate_substringsparameter is now passed by const reference, avoiding unnecessary copies.
327-339: Implementation correctly uses const reference.The implementation matches the declaration with const reference parameter.
492-523: Break statement added per previous feedback.The
case ','now properly has an explicitbreakstatement, and thedefaultcase also usesbreakinstead ofcontinue, making the switch behaviour clear.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(13 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (3)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
nodiscard(43-43)nodiscard(45-48)nodiscard(50-52)nodiscard(54-54)pattern(30-31)pattern_idx(65-68)pattern_idx(79-82)pattern_idx(94-95)pattern_idx(107-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (15)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (10)
58-59: LGTM!Constants correctly defined for the number of days in a week and months in a year, used for validating bracket pattern lists.
90-115: Updated default patterns look correct.The patterns now properly use the new bracket-pattern syntax
\B{...}and\A{...}with comma-separated month and weekday names.
607-614: Month marshalling correctly delegates to pattern helper.The implementation properly uses
get_month_and_advance_pattern_idxand calculates the 0-based month index from the 1-based date library value.
630-640: Weekday marshalling correctly delegates to pattern helper.The weekday index calculation using
date::Sundayas the reference correctly produces a 0-6 range matching the Sunday-first ordering in the bracket pattern.
886-906: Month bracket pattern extraction looks correct.The validation for exactly 12 month entries and proper pattern index advancement are correctly implemented. The offset extraction logic properly handles the
{content}structure.
912-930: Weekday bracket pattern extraction looks correct.The validation for exactly 7 weekday entries and proper pattern index advancement mirror the month handling appropriately.
1035-1061: LGTM - Proper pattern index advancement.The
+2ULLoffset correctly skips the specifier character and opening brace, and the advancement bym_*_bracket_pattern_length + 2ULLproperly accounts for the content plus braces.
1063-1090: LGTM - Bounds checking and string view extraction.The bounds checks on
month_idxandweekday_idxprevent out-of-bounds access, and the returnedstring_viewcorrectly references the pattern substring.
1172-1181: Parsing case 'B' correctly uses the new helper.The month index is properly converted to 1-based
parsed_monthand the timestamp index is advanced by the matched month name length.
1237-1246: Parsing case 'A' correctly uses the new helper.The weekday index is stored for later validation against the computed date, maintaining the Sunday=0 convention.
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)
5-5: LGTM - necessary include foruint16_t.
56-68: Documentation and declaration look correct.The method properly documents the expected input (pattern_idx pointing at 'B' in
\B{}) and return value (month index and length pair).
110-129: Constructor properly initializes new members.The constructor signature and initializer list correctly handle the new offset/length vectors and bracket pattern length fields using move semantics for vectors.
134-137: LGTM - new private members appropriately typed.Using
uint16_tfor offsets and lengths is appropriate given the validation inextract_bracket_pattern_listthat ensures content fits within these bounds.
164-172: Documentation correctly reflects new bracket-pattern syntax.The format specifier documentation properly describes the
\B{January,...,December}and\A{Sunday,...,Saturday}patterns with ordering and count requirements.
| /** | ||
| * Finds the first matching month as a prefix of `timestamp`, and advances `pattern_idx` to the | ||
| * closing bracket of the month name format specifier if found. | ||
| * @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format | ||
| * specifier. | ||
| * @return A result containing the index and length of the first matching month, or an error | ||
| * code indicating the failure: | ||
| * - ErrorCodeEnum::IncompatibleTimestampPattern if no matching month is found. | ||
| */ |
There was a problem hiding this comment.
How about:
| /** | |
| * Finds the first matching month as a prefix of `timestamp`, and advances `pattern_idx` to the | |
| * closing bracket of the month name format specifier if found. | |
| * @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format | |
| * specifier. | |
| * @return A result containing the index and length of the first matching month, or an error | |
| * code indicating the failure: | |
| * - ErrorCodeEnum::IncompatibleTimestampPattern if no matching month is found. | |
| */ | |
| /** | |
| * Finds the first matching month as a prefix of `timestamp`. | |
| * @param timestamp | |
| * @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format | |
| * specifier. On success, it returns the index of the closing bracket in the raw pattern. | |
| * @return A result containing the index and length of the first matching month, or an error | |
| * code indicating the failure: | |
| * - Forwards `find_first_matching_prefix`'s return values on failure. | |
| */ |
@param timestampis missing.- Notice that we should order
pattern_idxaftertimestamp, as it's an output parameter. I think this is a doxygen convention that we follow.
- Notice that we should order
- I move the
advaces ...part topattern_idx. - The error documentation should go with
Forwards xxx. It should still be fine even the function is private.
U can apply it to other methods if it looks good.
There was a problem hiding this comment.
Might do a little bit of rewording, but looks good to me. I'll update the docstrings.
| std::vector<std::pair<uint16_t, uint16_t>> m_month_name_offsets_and_lengths; | ||
| std::vector<std::pair<uint16_t, uint16_t>> m_weekday_name_offsets_and_lengths; |
There was a problem hiding this comment.
With the helpers, it might be worth to store these as a vector of string views for simplicity. It also means we don't need to maintain two versions of find_first_matching_prefix.
There was a problem hiding this comment.
The problem is that a vector of string views breaks whenever the TimestampPattern is moved/copied since the address of the underlying string changes (can change even with move because of the small string optimization). Simpler to just store the offsets and derive the string views when needed.
There was a problem hiding this comment.
oh true I forgot that. My bad.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)
97-108: Documentation typo:month_idxshould beweekday_idx.Line 104 states "IncompatibleTimestampPattern if
month_idxis out of bounds" but the function parameter isweekday_idx.* @param weekday_idx An index into the weekdays in a `\A{}` format specifier. * @return A result containing the weekday, or an error code indicating the failure: - * - ErrorCodeEnum::IncompatibleTimestampPattern if `month_idx` is out of bounds. + * - ErrorCodeEnum::IncompatibleTimestampPattern if `weekday_idx` is out of bounds. */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(13 hunks)components/core/src/clp_s/timestamp_parser/TimestampParser.hpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (6)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hppcomponents/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-11-12T18:56:31.067Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
nodiscard(43-43)nodiscard(45-48)nodiscard(50-52)nodiscard(54-54)month_idx(94-95)pattern(30-31)weekday_idx(107-108)timestamp(65-68)timestamp(79-82)components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (1)
static_cast(243-243)
🔇 Additional comments (21)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (7)
5-5: LGTM!The addition of
<cstdint>is appropriate for the newuint16_ttypes used in the member variables.
56-68: LGTM!The documentation and signature for
find_first_matching_month_and_advance_pattern_idxare well-structured. The docstring clarifies the parameter ordering and return semantics as discussed in past reviews.
70-82: LGTM!The documentation and signature for
find_first_matching_weekday_and_advance_pattern_idxfollow the same consistent pattern as the month variant.
84-95: LGTM!The
get_month_and_advance_pattern_idxfunction is well-documented with clear error semantics.
112-129: LGTM!The expanded constructor signature cleanly incorporates the new bracket-pattern metadata. Moving the parameters via
std::moveis appropriate for the vectors.
134-137: LGTM!The new member variables appropriately store the bracket-pattern metadata. Storing offsets/lengths rather than
std::string_viewavoids lifetime issues after move/copy operations, as discussed in past reviews.
164-172: LGTM!The updated documentation for
\B{...}and\A{...}format specifiers clearly describes the expected format and ordering requirements.components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (14)
58-59: LGTM!The new constants
cDaysInWeekandcMonthsInYearare appropriately defined and used for validation in the pattern creation logic.
90-115: LGTM!The default date-time patterns have been updated to use the new bracket-pattern format (
\B{...}and\A{...}). The patterns cover various locale-dependent month and weekday name formats.
158-171: LGTM!The new overload declaration for
find_first_matching_prefixcorrectly takescandidate_substringsby const reference, addressing the past review comment.
327-339: LGTM!The implementation correctly iterates through candidate substrings and matches against the input string prefix. The const reference parameter addresses the past review comment.
492-523: LGTM!The
extract_bracket_pattern_listimplementation correctly:
- Validates the string length fits in
uint16_t- Rejects invalid characters (
' 'and'\\')- Rejects empty elements
- Adds explicit
breakafter the comma case (addressing past review comment)
607-613: LGTM!The month marshalling case correctly uses the new
get_month_and_advance_pattern_idxhelper to retrieve the month name string, making the code more readable as discussed in past design comments.
630-641: LGTM!The weekday marshalling case correctly uses the new
get_weekday_and_advance_pattern_idxhelper, following the same pattern as the month case.
861-906: LGTM!The
case 'B'handling inTimestampPattern::createcorrectly:
- Extracts the bracket pattern
- Parses the month names list
- Validates exactly 12 entries exist
- Stores the bracket pattern length and advances the pattern index
912-930: LGTM!The
case 'A'handling follows the same correct pattern ascase 'B', validating exactly 7 weekday entries.
1023-1032: LGTM!The
TimestampPatternconstructor call correctly passes the new metadata fields.
1035-1047: LGTM!The
find_first_matching_month_and_advance_pattern_idximplementation correctly extracts the month names substring from the pattern and delegates tofind_first_matching_prefix. The+ 2ULLoffset accounts for theB{prefix.
1049-1061: LGTM!The
find_first_matching_weekday_and_advance_pattern_idximplementation follows the same correct pattern as the month variant.
1172-1181: LGTM!The month parsing case correctly uses the new helper and extracts both the month index and length for timestamp advancement.
1237-1246: LGTM!The weekday parsing case correctly uses the new helper and stores the day-of-week index for later validation.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Two nit comment otherwise lgtm.
The PR title also lgtm.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (6)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (6)
158-171: LGTM!The function declaration correctly takes
candidate_substringsby const reference, which avoids unnecessary copying. The past review comment suggesting this change has been addressed.
327-339: LGTM!The implementation correctly takes
candidate_substringsby const reference and efficiently matches prefixes usingstarts_with. The past review comment has been addressed.
492-523: LGTM!The implementation correctly parses comma-separated lists from bracket patterns with appropriate validation:
- Rejects backslash and space characters in entries
- Rejects empty entries
- Ensures no trailing commas
- Validates size constraints
The missing
breakstatement flagged in the past review has been addressed.
228-239: Fix typos in docstring.The docstring contains two typos on line 231: "results" should be "result" and "offets" should be "offsets".
Apply this diff:
- * @return A results containing the offets and lengths of every element in the list, or an error + * @return A result containing the offsets and lengths of every element in the list, or an error
1061-1073: Consider usingoperator[]instead of.at()to avoid redundant bounds check.Line 1070 uses
.at(month_idx)after explicitly validatingmonth_idxis in bounds at line 1064. Usingoperator[]would be consistent with the bounds check and avoid redundant checking.Apply this diff:
auto const month_names{ m_pattern.substr(pattern_idx + 2ULL, m_month_name_bracket_pattern_length) }; - auto const [month_offset, month_length] = m_month_name_offsets_and_lengths.at(month_idx); + auto const [month_offset, month_length] = m_month_name_offsets_and_lengths[month_idx]; pattern_idx += m_month_name_bracket_pattern_length + 2ULL; return month_names.substr(month_offset, month_length);
1075-1088: Consider usingoperator[]instead of.at()to avoid redundant bounds check.Lines 1084-1085 use
.at(weekday_idx)after explicitly validatingweekday_idxis in bounds at line 1078. Usingoperator[]would be consistent with the bounds check and avoid redundant checking.Apply this diff:
auto const weekday_names{ m_pattern.substr(pattern_idx + 2ULL, m_weekday_name_bracket_pattern_length) }; auto const [weekday_offset, weekday_length] - = m_weekday_name_offsets_and_lengths.at(weekday_idx); + = m_weekday_name_offsets_and_lengths[weekday_idx]; pattern_idx += m_weekday_name_bracket_pattern_length + 2ULL; return weekday_names.substr(weekday_offset, weekday_length);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp(13 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (3)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (9)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (9)
58-59: LGTM!The constants
cDaysInWeekandcMonthsInYearare appropriately defined and will be used for validation of bracket pattern lists.
90-114: LGTM!The default patterns have been correctly updated to use the new bracket-pattern syntax for locale-dependent month and weekday names. All lists contain the correct number of entries (12 months, 7 weekdays).
607-612: LGTM!The month name marshalling correctly uses the new pattern-driven approach with the
TRYXmacro directly in the append call, which is cleaner than using a temporary variable.
629-638: LGTM!The weekday name marshalling correctly follows the same pattern-driven approach as the month case, using the
TRYXmacro directly in the append call.
859-862: LGTM!The new member variables appropriately store bracket pattern metadata with suitable types. The
uint16_tlength fields align with the size validation inextract_bracket_pattern_list.
884-904: LGTM!The bracket pattern extraction and validation for month names is correctly implemented:
- Properly strips brackets from the pattern
- Validates that exactly 12 month names are provided
- Stores metadata for later use during parsing/marshalling
910-928: LGTM!The bracket pattern extraction and validation for weekday names follows the same correct pattern as the month case, validating that exactly 7 weekday names are provided.
1170-1179: LGTM!The month name parsing correctly uses the new pattern-driven approach, properly converting the 0-based index to a 1-based month value and advancing the timestamp index by the matched length.
1235-1244: LGTM!The weekday name parsing correctly uses the new pattern-driven approach, storing the weekday index for later validation against the computed date.
Description
This PR removes the
\aand\Bpatterns, replacing them with\A{...}and\B{...}, in order to offer support for locale-dependent weekday and month names.\A{...}accepts a list of seven comma-separated weekday names starting with Sunday and ending with Saturday.\B{...}accepts a list of twelve comma-separated month names starting with January and ending with December.E.g.,
\B{January,February,March,April,May,June,July,August,September,October,November,December}\B{Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec}\A{Sun,Mon,Tue,Wed,Thu,Fri,Sat}\A{Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday}This PR follows up on some discussion while wrapping up #1673 about how we should make the new timestamp pattern format locale-independent before including it in the archive format. We will also make one more PR that prevents users from including control characters or JSON escape sequences in their timestamp patterns, in order to remove several edge cases.
Checklist
breaking change.
Validation performed
\Aand\Bsequences accept valid content\Aand\Bpatterns.Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.