Skip to content

feat(clp-s::timestamp_parser): Add support for specifying locale-dependent weekday and month names.#1708

Merged
gibber9809 merged 13 commits intoy-scope:mainfrom
gibber9809:timestamp-parser-locale-support
Dec 3, 2025
Merged

feat(clp-s::timestamp_parser): Add support for specifying locale-dependent weekday and month names.#1708
gibber9809 merged 13 commits intoy-scope:mainfrom
gibber9809:timestamp-parser-locale-support

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented Dec 1, 2025

Description

This PR removes the \a and \B patterns, replacing them with \A{...} and \B{...}, in order to offer support for locale-dependent weekday and month names. \A{...} accepts a list of seven comma-separated weekday names starting with Sunday and ending with Saturday. \B{...} accepts a list of twelve comma-separated month names starting with January and ending with December.

E.g.,

  • \B{January,February,March,April,May,June,July,August,September,October,November,December}
  • \B{Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec}
  • \A{Sun,Mon,Tue,Wed,Thu,Fri,Sat}
  • \A{Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday}

This PR follows up on some discussion while wrapping up #1673 about how we should make the new timestamp pattern format locale-independent before including it in the archive format. We will also make one more PR that prevents users from including control characters or JSON escape sequences in their timestamp patterns, in order to remove several edge cases.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added tests verifying that \A and \B sequences accept valid content
  • Modified end-to-end parsing and marshalling tests to use \A and \B patterns.

Summary by CodeRabbit

  • New Features

    • Bracket-style month/day specifiers (e.g., B{...}, A{...}) for customizable month and weekday names, improving date-time formatting and parsing flexibility.
  • Refactor

    • Replaced fixed name tables with dynamic, pattern-driven name resolution and enriched pattern metadata to drive parsing and marshalling.
  • Bug Fixes

    • Improved validation and clearer error reporting for invalid bracket patterns and mismatches.
  • Tests

    • Expanded tests to cover string-based specifiers and bracket-pattern parsing/formatting.

✏️ Tip: You can customize this high-level summary in your review settings.

@gibber9809 gibber9809 requested a review from a team as a code owner December 1, 2025 20:17
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

Replaced static month/weekday name tables with bracket-pattern parsing: added bracket-pattern extraction and matching overloads, extended TimestampPattern with offsets/lengths and bracket lengths, refactored parsing/marshalling to use pattern-driven month/weekday lookup, and updated tests to use string-based specifiers containing name lists.

Changes

Cohort / File(s) Summary
TimestampParser header
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
Extended TimestampPattern constructor to accept and store bracket-pattern metadata (month_name_offsets_and_lengths, weekday_name_offsets_and_lengths, month_name_bracket_pattern_length, weekday_name_bracket_pattern_length). Added public methods for bracket-based resolution: find_first_matching_month_and_advance_pattern_idx, find_first_matching_weekday_and_advance_pattern_idx, get_month_and_advance_pattern_idx, get_weekday_and_advance_pattern_idx. Added <cstdint> and private members; updated docs for \B{...} and \A{...}.
TimestampParser implementation
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
Added cDaysInWeek and cMonthsInYear; removed static month/day name arrays. Implemented extract_bracket_pattern_list() and a new overload of find_first_matching_prefix() that accepts candidate-substring metadata. Populated TimestampPattern with offsets/lengths and bracket-pattern lengths. Refactored marshal_parse, parse_timestamp, and marshal_date_time_timestamp to use bracket-pattern metadata and new helper methods for month/weekday resolution; updated pattern-index advancement and error handling for invalid bracket patterns.
TimestampParser tests
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
Changed assert_specifier_accepts_valid_content signature to accept std::string; updated many tests to use string-based specifiers embedding month/day lists (e.g., "B{Jan,...}", "A{Sun,...}"). Adjusted expected transformed patterns and assertions to reflect bracket-pattern parsing and lookup behaviour.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus on:
    • correctness and edge cases of extract_bracket_pattern_list() (escaping, empty items, whitespace, malformed brackets).
    • correctness of offsets/lengths calculations and safe use of uint16_t (overflow risks for long patterns).
    • semantics and error paths of the new find_first_matching_prefix() overload (matching precedence, partial matches).
    • pattern-index advancement logic in marshal/parse flows to avoid off-by-one or overlap issues.
    • tests updated to ensure they cover both short and full-name variants and error cases.

Possibly related PRs

Suggested reviewers

  • wraymo
  • LinZhihao-723

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: adding support for locale-dependent weekday and month names in the timestamp parser.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (1)

99-99: Use false == instead of ! for negation.

As per coding guidelines, prefer false == <expression> rather than !<expression>.

-    REQUIRE_FALSE(timestamp_pattern_result.has_error());
+    REQUIRE(false == timestamp_pattern_result.has_error());

Note: This applies to multiple REQUIRE_FALSE usages throughout the file. However, since REQUIRE_FALSE is a Catch2 test macro that provides better assertion messages than REQUIRE(false == ...), you may choose to keep it as-is for test readability. Based on coding guidelines, prefer false == <expression>.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b43e8f and 84d3ad6.

📒 Files selected for processing (3)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (13 hunks)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (3 hunks)
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
  • components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (1)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
  • nodiscard (43-43)
  • nodiscard (45-48)
  • nodiscard (50-52)
  • nodiscard (54-54)
  • nodiscard (56-59)
  • nodiscard (61-64)
  • nodiscard (66-68)
  • nodiscard (70-72)
  • pattern (30-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (21)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)

5-5: LGTM!

Addition of <cstdint> is appropriate for the new uint16_t member types.


56-72: LGTM!

The new accessors are well-designed:

  • Return const& for vector members to avoid copies.
  • Use [[nodiscard]] consistently with the existing codebase style.
  • Use uint16_t for bracket pattern lengths which aligns with the size constraints in extract_bracket_pattern_list.

76-93: LGTM!

Constructor properly initializes new members using std::move for vectors and value semantics for uint16_t lengths. The initializer list order matches the member declaration order.


98-101: LGTM!

New private members are appropriately typed and initialized with default values. The uint16_t type for lengths is consistent with the constraints enforced during pattern extraction.


128-132: LGTM!

Documentation correctly describes the new parameterized \B{...} and \A{...} format specifiers, replacing the old \B and \a patterns.

components/core/src/clp_s/timestamp_parser/test/test_TimestampParser.cpp (6)

54-57: LGTM!

The function signature change from char to std::string appropriately supports the new bracket-pattern specifiers while maintaining backward compatibility with single-character specifiers through implicit conversion.

Also applies to: 90-93


143-147: LGTM!

Test cases properly updated to use string-based specifiers for year formats.


163-186: LGTM!

Comprehensive test coverage for both full month names (B{January,...,December}) and abbreviated month names (B{Jan,...,Dec}).


199-223: LGTM!

The day-in-week test correctly includes additional date information to ensure the day of week matches the actual date, as the parser validates this constraint.


470-498: LGTM!

Expected parsing results are correctly updated to use the new bracket-pattern specifiers. The patterns match the timestamp strings and the expected epoch timestamps.


527-546: LGTM!

Timezone-related test cases correctly use the new \B{...} bracket-pattern specifier for month names combined with timezone offsets.

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (10)

58-59: LGTM!

Semantic constants cDaysInWeek and cMonthsInYear improve readability and enable validation of bracket pattern lists.


90-115: LGTM!

Default date-time patterns updated to use the new bracket-pattern specifiers for month and weekday names. The patterns are comprehensive and cover various timestamp formats.


606-612: LGTM!

The 'B' case correctly retrieves month name offsets from the pattern metadata and emits the appropriate month name by extracting it from the raw pattern string.


629-641: LGTM!

The 'A' case correctly handles weekday name emission, mirroring the month name logic.


863-866: LGTM!

New local variables for storing bracket pattern metadata are properly initialized.


886-907: LGTM!

The 'B' case in pattern creation correctly:

  1. Extracts the bracket pattern
  2. Parses the comma-separated month names
  3. Validates the count matches cMonthsInYear (12)
  4. Stores offsets, lengths, and bracket pattern length
  5. Advances pattern_idx appropriately

914-932: LGTM!

The 'A' case mirrors the 'B' case logic correctly, validating against cDaysInWeek (7) instead.


1117-1130: LGTM!

The 'B' parsing case correctly uses the new overload of find_first_matching_prefix with bracket pattern metadata, and properly advances both pattern_idx and timestamp_idx.


1186-1199: LGTM!

The 'A' parsing case correctly handles weekday name parsing using bracket pattern metadata, mirroring the month name logic.


1025-1034: LGTM!

The TimestampPattern construction correctly passes all new metadata fields to the constructor.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking the way to store the month/day name using an offset list lgtm. Had some questions of whether we should wrap it for cleaness, lol. Check the comment for details.

Comment on lines +98 to +101
std::vector<std::pair<uint16_t, uint16_t>> m_month_name_offsets_and_lengths;
std::vector<std::pair<uint16_t, uint16_t>> m_weekday_name_offsets_and_lengths;
uint16_t m_month_name_bracket_pattern_length{};
uint16_t m_weekday_name_bracket_pattern_length{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some design level comment: I might write a class of DateTypeRepresentation to wrap these, and make this class optional and drop m_uses_date_type_representation flag as we can use the optional value to determine whether it's a daty-type representation. This ensures that m_uses_data_type_representation can't be false when the date-related members are provided.
In terms of parsing: I may move find_first_matching_prefix as a member function of TimestampPattern or DateTypeRepresentation as sth like find_matching_month or find_matching_week_day, which wraps the underlying offset list inside the class itself (working on the offset in the parsing function is a bit hard to read imo)
In terms of marshaling: we can add a member method to return a result of month/day string's view given by the offset, which also avoids directly operating on the offset list.
Open to discussion if u think the current way is preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of adding a find_first_matching_month/find_first_matching_weekday/get_month/get_weekday function to TimestampPattern to make the parsing/marshalling code more readable.

I'm not as sure about the encapsulation inside an optional class -- main reasons being that:

  1. The m_uses_date_type_representation flag is independent from those other date-time timestamp related properties, in the sense that date time timestamps don't necessarily need to have a timezone, month name, etc.
  2. I think it will make the create function/members of TimestampPattern more complicated

Tentatively going to go ahead with just adding the helpers to simplify the parsing/marshalling code, but open to discussing the encapsulation approach more.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (3)

167-171: Const reference applied per previous feedback.

The candidate_substrings parameter is now passed by const reference, avoiding unnecessary copies.


327-339: Implementation correctly uses const reference.

The implementation matches the declaration with const reference parameter.


492-523: Break statement added per previous feedback.

The case ',' now properly has an explicit break statement, and the default case also uses break instead of continue, making the switch behaviour clear.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46c17c0 and e57bbb6.

📒 Files selected for processing (2)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (13 hunks)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
🧠 Learnings (3)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
  • nodiscard (43-43)
  • nodiscard (45-48)
  • nodiscard (50-52)
  • nodiscard (54-54)
  • pattern (30-31)
  • pattern_idx (65-68)
  • pattern_idx (79-82)
  • pattern_idx (94-95)
  • pattern_idx (107-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (15)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (10)

58-59: LGTM!

Constants correctly defined for the number of days in a week and months in a year, used for validating bracket pattern lists.


90-115: Updated default patterns look correct.

The patterns now properly use the new bracket-pattern syntax \B{...} and \A{...} with comma-separated month and weekday names.


607-614: Month marshalling correctly delegates to pattern helper.

The implementation properly uses get_month_and_advance_pattern_idx and calculates the 0-based month index from the 1-based date library value.


630-640: Weekday marshalling correctly delegates to pattern helper.

The weekday index calculation using date::Sunday as the reference correctly produces a 0-6 range matching the Sunday-first ordering in the bracket pattern.


886-906: Month bracket pattern extraction looks correct.

The validation for exactly 12 month entries and proper pattern index advancement are correctly implemented. The offset extraction logic properly handles the {content} structure.


912-930: Weekday bracket pattern extraction looks correct.

The validation for exactly 7 weekday entries and proper pattern index advancement mirror the month handling appropriately.


1035-1061: LGTM - Proper pattern index advancement.

The +2ULL offset correctly skips the specifier character and opening brace, and the advancement by m_*_bracket_pattern_length + 2ULL properly accounts for the content plus braces.


1063-1090: LGTM - Bounds checking and string view extraction.

The bounds checks on month_idx and weekday_idx prevent out-of-bounds access, and the returned string_view correctly references the pattern substring.


1172-1181: Parsing case 'B' correctly uses the new helper.

The month index is properly converted to 1-based parsed_month and the timestamp index is advanced by the matched month name length.


1237-1246: Parsing case 'A' correctly uses the new helper.

The weekday index is stored for later validation against the computed date, maintaining the Sunday=0 convention.

components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (5)

5-5: LGTM - necessary include for uint16_t.


56-68: Documentation and declaration look correct.

The method properly documents the expected input (pattern_idx pointing at 'B' in \B{}) and return value (month index and length pair).


110-129: Constructor properly initializes new members.

The constructor signature and initializer list correctly handle the new offset/length vectors and bracket pattern length fields using move semantics for vectors.


134-137: LGTM - new private members appropriately typed.

Using uint16_t for offsets and lengths is appropriate given the validation in extract_bracket_pattern_list that ensures content fits within these bounds.


164-172: Documentation correctly reflects new bracket-pattern syntax.

The format specifier documentation properly describes the \B{January,...,December} and \A{Sunday,...,Saturday} patterns with ordering and count requirements.

Comment on lines +56 to +64
/**
* Finds the first matching month as a prefix of `timestamp`, and advances `pattern_idx` to the
* closing bracket of the month name format specifier if found.
* @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format
* specifier.
* @return A result containing the index and length of the first matching month, or an error
* code indicating the failure:
* - ErrorCodeEnum::IncompatibleTimestampPattern if no matching month is found.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Suggested change
/**
* Finds the first matching month as a prefix of `timestamp`, and advances `pattern_idx` to the
* closing bracket of the month name format specifier if found.
* @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format
* specifier.
* @return A result containing the index and length of the first matching month, or an error
* code indicating the failure:
* - ErrorCodeEnum::IncompatibleTimestampPattern if no matching month is found.
*/
/**
* Finds the first matching month as a prefix of `timestamp`.
* @param timestamp
* @param pattern_idx An index into the raw pattern starting at the "B" in a `\B{}` format
* specifier. On success, it returns the index of the closing bracket in the raw pattern.
* @return A result containing the index and length of the first matching month, or an error
* code indicating the failure:
* - Forwards `find_first_matching_prefix`'s return values on failure.
*/
  • @param timestamp is missing.
    • Notice that we should order pattern_idx after timestamp, as it's an output parameter. I think this is a doxygen convention that we follow.
  • I move the advaces ... part to pattern_idx.
  • The error documentation should go with Forwards xxx. It should still be fine even the function is private.

U can apply it to other methods if it looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might do a little bit of rewording, but looks good to me. I'll update the docstrings.

Comment on lines +134 to +135
std::vector<std::pair<uint16_t, uint16_t>> m_month_name_offsets_and_lengths;
std::vector<std::pair<uint16_t, uint16_t>> m_weekday_name_offsets_and_lengths;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the helpers, it might be worth to store these as a vector of string views for simplicity. It also means we don't need to maintain two versions of find_first_matching_prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that a vector of string views breaks whenever the TimestampPattern is moved/copied since the address of the underlying string changes (can change even with move because of the small string optimization). Simpler to just store the offsets and derive the string views when needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh true I forgot that. My bad.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (1)

97-108: Documentation typo: month_idx should be weekday_idx.

Line 104 states "IncompatibleTimestampPattern if month_idx is out of bounds" but the function parameter is weekday_idx.

      * @param weekday_idx An index into the weekdays in a `\A{}` format specifier.
      * @return A result containing the weekday, or an error code indicating the failure:
-     * - ErrorCodeEnum::IncompatibleTimestampPattern if `month_idx` is out of bounds.
+     * - ErrorCodeEnum::IncompatibleTimestampPattern if `weekday_idx` is out of bounds.
      */
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a3e2aa and 16b86c0.

📒 Files selected for processing (2)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (13 hunks)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (6)
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2025-05-02T22:27:59.347Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-11-12T18:56:31.067Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.hpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (2)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (9)
  • nodiscard (43-43)
  • nodiscard (45-48)
  • nodiscard (50-52)
  • nodiscard (54-54)
  • month_idx (94-95)
  • pattern (30-31)
  • weekday_idx (107-108)
  • timestamp (65-68)
  • timestamp (79-82)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (1)
  • static_cast (243-243)
🔇 Additional comments (21)
components/core/src/clp_s/timestamp_parser/TimestampParser.hpp (7)

5-5: LGTM!

The addition of <cstdint> is appropriate for the new uint16_t types used in the member variables.


56-68: LGTM!

The documentation and signature for find_first_matching_month_and_advance_pattern_idx are well-structured. The docstring clarifies the parameter ordering and return semantics as discussed in past reviews.


70-82: LGTM!

The documentation and signature for find_first_matching_weekday_and_advance_pattern_idx follow the same consistent pattern as the month variant.


84-95: LGTM!

The get_month_and_advance_pattern_idx function is well-documented with clear error semantics.


112-129: LGTM!

The expanded constructor signature cleanly incorporates the new bracket-pattern metadata. Moving the parameters via std::move is appropriate for the vectors.


134-137: LGTM!

The new member variables appropriately store the bracket-pattern metadata. Storing offsets/lengths rather than std::string_view avoids lifetime issues after move/copy operations, as discussed in past reviews.


164-172: LGTM!

The updated documentation for \B{...} and \A{...} format specifiers clearly describes the expected format and ordering requirements.

components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (14)

58-59: LGTM!

The new constants cDaysInWeek and cMonthsInYear are appropriately defined and used for validation in the pattern creation logic.


90-115: LGTM!

The default date-time patterns have been updated to use the new bracket-pattern format (\B{...} and \A{...}). The patterns cover various locale-dependent month and weekday name formats.


158-171: LGTM!

The new overload declaration for find_first_matching_prefix correctly takes candidate_substrings by const reference, addressing the past review comment.


327-339: LGTM!

The implementation correctly iterates through candidate substrings and matches against the input string prefix. The const reference parameter addresses the past review comment.


492-523: LGTM!

The extract_bracket_pattern_list implementation correctly:

  • Validates the string length fits in uint16_t
  • Rejects invalid characters (' ' and '\\')
  • Rejects empty elements
  • Adds explicit break after the comma case (addressing past review comment)

607-613: LGTM!

The month marshalling case correctly uses the new get_month_and_advance_pattern_idx helper to retrieve the month name string, making the code more readable as discussed in past design comments.


630-641: LGTM!

The weekday marshalling case correctly uses the new get_weekday_and_advance_pattern_idx helper, following the same pattern as the month case.


861-906: LGTM!

The case 'B' handling in TimestampPattern::create correctly:

  • Extracts the bracket pattern
  • Parses the month names list
  • Validates exactly 12 entries exist
  • Stores the bracket pattern length and advances the pattern index

912-930: LGTM!

The case 'A' handling follows the same correct pattern as case 'B', validating exactly 7 weekday entries.


1023-1032: LGTM!

The TimestampPattern constructor call correctly passes the new metadata fields.


1035-1047: LGTM!

The find_first_matching_month_and_advance_pattern_idx implementation correctly extracts the month names substring from the pattern and delegates to find_first_matching_prefix. The + 2ULL offset accounts for the B{ prefix.


1049-1061: LGTM!

The find_first_matching_weekday_and_advance_pattern_idx implementation follows the same correct pattern as the month variant.


1172-1181: LGTM!

The month parsing case correctly uses the new helper and extracts both the month index and length for timestamp advancement.


1237-1246: LGTM!

The weekday parsing case correctly uses the new helper and stores the day-of-week index for later validation.

LinZhihao-723
LinZhihao-723 previously approved these changes Dec 2, 2025
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nit comment otherwise lgtm.
The PR title also lgtm.

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (6)

158-171: LGTM!

The function declaration correctly takes candidate_substrings by const reference, which avoids unnecessary copying. The past review comment suggesting this change has been addressed.


327-339: LGTM!

The implementation correctly takes candidate_substrings by const reference and efficiently matches prefixes using starts_with. The past review comment has been addressed.


492-523: LGTM!

The implementation correctly parses comma-separated lists from bracket patterns with appropriate validation:

  • Rejects backslash and space characters in entries
  • Rejects empty entries
  • Ensures no trailing commas
  • Validates size constraints

The missing break statement flagged in the past review has been addressed.


228-239: Fix typos in docstring.

The docstring contains two typos on line 231: "results" should be "result" and "offets" should be "offsets".

Apply this diff:

- * @return A results containing the offets and lengths of every element in the list, or an error
+ * @return A result containing the offsets and lengths of every element in the list, or an error

1061-1073: Consider using operator[] instead of .at() to avoid redundant bounds check.

Line 1070 uses .at(month_idx) after explicitly validating month_idx is in bounds at line 1064. Using operator[] would be consistent with the bounds check and avoid redundant checking.

Apply this diff:

     auto const month_names{
             m_pattern.substr(pattern_idx + 2ULL, m_month_name_bracket_pattern_length)
     };
-    auto const [month_offset, month_length] = m_month_name_offsets_and_lengths.at(month_idx);
+    auto const [month_offset, month_length] = m_month_name_offsets_and_lengths[month_idx];
     pattern_idx += m_month_name_bracket_pattern_length + 2ULL;
     return month_names.substr(month_offset, month_length);

1075-1088: Consider using operator[] instead of .at() to avoid redundant bounds check.

Lines 1084-1085 use .at(weekday_idx) after explicitly validating weekday_idx is in bounds at line 1078. Using operator[] would be consistent with the bounds check and avoid redundant checking.

Apply this diff:

     auto const weekday_names{
             m_pattern.substr(pattern_idx + 2ULL, m_weekday_name_bracket_pattern_length)
     };
     auto const [weekday_offset, weekday_length]
-            = m_weekday_name_offsets_and_lengths.at(weekday_idx);
+            = m_weekday_name_offsets_and_lengths[weekday_idx];
     pattern_idx += m_weekday_name_bracket_pattern_length + 2ULL;
     return weekday_names.substr(weekday_offset, weekday_length);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a451fc and dc90ebf.

📒 Files selected for processing (1)
  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (13 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
🧠 Learnings (3)
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2024-10-10T05:46:35.188Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:299-307
Timestamp: 2024-10-10T05:46:35.188Z
Learning: In the C++ function `get_schema_subtree_bitmap` in `KeyValuePairLogEvent.cpp`, when a loop uses `while (true)` with an internal check on `optional.has_value()`, and comments explain that this structure is used to silence `clang-tidy` warnings about unchecked optional access, this code is acceptable and should not be refactored to use `while (optional.has_value())`.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp_s/timestamp_parser/TimestampParser.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (9)
components/core/src/clp_s/timestamp_parser/TimestampParser.cpp (9)

58-59: LGTM!

The constants cDaysInWeek and cMonthsInYear are appropriately defined and will be used for validation of bracket pattern lists.


90-114: LGTM!

The default patterns have been correctly updated to use the new bracket-pattern syntax for locale-dependent month and weekday names. All lists contain the correct number of entries (12 months, 7 weekdays).


607-612: LGTM!

The month name marshalling correctly uses the new pattern-driven approach with the TRYX macro directly in the append call, which is cleaner than using a temporary variable.


629-638: LGTM!

The weekday name marshalling correctly follows the same pattern-driven approach as the month case, using the TRYX macro directly in the append call.


859-862: LGTM!

The new member variables appropriately store bracket pattern metadata with suitable types. The uint16_t length fields align with the size validation in extract_bracket_pattern_list.


884-904: LGTM!

The bracket pattern extraction and validation for month names is correctly implemented:

  • Properly strips brackets from the pattern
  • Validates that exactly 12 month names are provided
  • Stores metadata for later use during parsing/marshalling

910-928: LGTM!

The bracket pattern extraction and validation for weekday names follows the same correct pattern as the month case, validating that exactly 7 weekday names are provided.


1170-1179: LGTM!

The month name parsing correctly uses the new pattern-driven approach, properly converting the 0-based index to a 1-based month value and advancing the timestamp index by the matched length.


1235-1244: LGTM!

The weekday name parsing correctly uses the new pattern-driven approach, storing the weekday index for later validation against the computed date.

@gibber9809 gibber9809 merged commit e2f1754 into y-scope:main Dec 3, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants