fix(clp-s): Simplify timestamp range index evaluation code; Fix conversion utility used to compare float AST literals to integer values (fixes #1375). #1369

Merged
gibber9809 merged 10 commits into y-scope:main from gibber9809:fix-timestamp-index-eval
Oct 3, 2025

Conversation

@gibber9809
Contributor

@gibber9809 gibber9809 commented Oct 2, 2025

Description

This PR fixes a bug in the double_as_int utility, which the AST code uses to convert floating-point literals into integers so that they can be compared against integer values.

Nominally this code is meant to do the following sorts of conversions:

1 < 1.1 -> 1 < 2
1 > 0.9 -> 1 > 0

Where depending on the kind of operation being performed, we may have to take the floor or ceiling of a given floating-point value in order to ensure correct comparison against an integer value.

Unfortunately, this conversion code had a bug causing the returned integer to always be the floor of the floating-point number for all operations besides == and !=.
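The intended rounding can be sketched roughly as follows. This is an illustrative reconstruction rather than the actual double_as_int code: the `Op` enum, the `to_comparable_int` name, and the `check_examples` helper are hypothetical, and the sketch assumes the value fits in an `int64_t`. The explicit `break` statements are what keep each case from falling through into the next one.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the AST's comparison operators.
enum class Op { EQ, NEQ, LT, GT, LTE, GTE };

// Rewrites `x OP value` (x being an integer) into an equivalent `x OP out`.
// Returns false when no integer makes the rewrite exact.
// Assumption: `value` is finite and within the range of int64_t.
bool to_comparable_int(double value, Op op, int64_t& out) {
    switch (op) {
        case Op::LT:
        case Op::GTE:
            // x < 1.1 holds exactly when x < 2, so round up.
            out = static_cast<int64_t>(std::ceil(value));
            break;  // without this, control would fall into the next case
        case Op::GT:
        case Op::LTE:
            // x > 0.9 holds exactly when x > 0, so round down.
            out = static_cast<int64_t>(std::floor(value));
            break;
        default:
            // EQ and NEQ only convert exactly for whole numbers.
            out = static_cast<int64_t>(value);
            return static_cast<double>(out) == value;
    }
    return true;
}

// Spot checks mirroring the two conversions listed in the description.
void check_examples() {
    int64_t out{};
    bool const lt_ok = to_comparable_int(1.1, Op::LT, out);
    assert(lt_ok && 2 == out);
    bool const gt_ok = to_comparable_int(0.9, Op::GT, out);
    assert(gt_ok && 0 == out);
}
```

Returning false for a non-whole value under EQ or NEQ is a design choice of this sketch; the real utility may signal that case differently.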

We discovered this bug while investigating a case where the timestamp range index incorrectly failed to match certain queries. For archives with a double-encoded epoch range [a, b], we always tried to convert a literal c in a query like timestamp < c into an integer before evaluating the timestamp range index. So even if a < c was true, the bug in the AST code would turn the comparison into a < floor(c), which may not be true.
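A small numeric sketch of that failure mode, using made-up values (a = 10.2, c = 10.7) and hypothetical helper names that do not appear in the codebase:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical pruning check for `timestamp < literal` against an archive whose
// timestamps start at `begin`: a match is possible only if begin < literal.
bool may_match_lt(double begin, double literal) {
    return begin < literal;
}

// Models the buggy path, where the literal is floored before the comparison.
bool may_match_lt_floored(double begin, double literal) {
    return begin < std::floor(literal);
}

// With begin = 10.2 and literal = 10.7 the archive can contain matches,
// but the floored comparison (10.2 < 10) wrongly rules it out.
void check_example() {
    assert(may_match_lt(10.2, 10.7));
    assert(!may_match_lt_floored(10.2, 10.7));
}
```

Under these assumed values, the archive would be skipped even though it may hold timestamps between 10.2 and 10.7.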

Without the AST bug the timestamp range index evaluation code would technically be correct, but the way it was written (to allow float and integer literals to be compared with double-encoded and integer-encoded timestamp ranges interchangeably) was unnecessarily complex.

As a result, this PR:

  • Fixes the AST float conversion bug
  • Adds a search unit test covering the float conversion bug
  • Simplifies the timestamp range index evaluation code by always comparing integer-encoded ranges with integer interpretations of literals and double-encoded ranges with double interpretations of literals
  • Adds search unit tests dedicated to timestamp filtering
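The simplified evaluation described in the third item might look something like the sketch below. All type and function names here (`Range`, `Literal`, `evaluate`, and so on) are hypothetical and not taken from TimestampEntry or EvaluateTimestampIndex; the three-valued result is assumed to mean "every timestamp matches", "none match", or "cannot tell".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>

enum class Op { LT, GT, LTE, GTE };
enum class Result { True, False, Unknown };

template <typename T>
struct Range {
    T begin;
    T end;
};

// An archive's timestamp range is either integer-encoded or double-encoded.
using TimestampRange = std::variant<Range<int64_t>, Range<double>>;

// Hypothetical literal: an interpretation is present only if the literal
// converts to that type for the operation being evaluated.
struct Literal {
    std::optional<int64_t> as_int;
    std::optional<double> as_double;
};

// Compares a range with a value of the same type as its encoding.
template <typename T>
Result evaluate_range(Range<T> const& r, Op op, T value) {
    bool all = false;
    bool none = false;
    switch (op) {
        case Op::LT:  all = r.end < value;    none = r.begin >= value; break;
        case Op::LTE: all = r.end <= value;   none = r.begin > value;  break;
        case Op::GT:  all = r.begin > value;  none = r.end <= value;   break;
        case Op::GTE: all = r.begin >= value; none = r.end < value;    break;
    }
    if (all) {
        return Result::True;
    }
    return none ? Result::False : Result::Unknown;
}

// Integer-encoded ranges use the integer interpretation and double-encoded
// ranges use the double interpretation; a missing interpretation is Unknown.
Result evaluate(TimestampRange const& range, Op op, Literal const& lit) {
    if (auto const* int_range = std::get_if<Range<int64_t>>(&range)) {
        return lit.as_int ? evaluate_range(*int_range, op, *lit.as_int) : Result::Unknown;
    }
    auto const& double_range = std::get<Range<double>>(range);
    return lit.as_double ? evaluate_range(double_range, op, *lit.as_double) : Result::Unknown;
}
```

Because each encoding is only ever compared with a value of its own type, no rounding happens at evaluation time in this sketch.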

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added dedicated unit tests for search against the timestamp column, some of which fail before this change
  • Added test case exercising floating-point conversions that fail before this change

Summary by CodeRabbit

  • New Features

    • Improved timestamp search with support for both floating-point and epoch-based formats.
    • Stricter parsing aligned to data encoding for more accurate comparisons and range queries.
  • Bug Fixes

    • Fixed potential mis-evaluation in timestamp comparisons caused by switch fall-through.
    • Standardized handling when timestamp literals don’t match the expected format, yielding consistent results.
  • Tests

    • Added comprehensive tests for float and integer timestamp searches with new datasets.
    • Expanded general search dataset to cover additional scenarios.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners October 2, 2025 16:16
@coderabbitai
Contributor

coderabbitai bot commented Oct 2, 2025

Walkthrough

Unifies timestamp filter evaluation into a single encoding-guarded switch in TimestampEntry, adds a public timestamp encoding accessor, changes search parsing to depend on the entry's encoding (float vs int), fixes switch fall-through in SearchUtils, and adds/initializes timestamp tests and test data for float and epoch timestamps.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TimestampEntry refactor: components/core/src/clp_s/TimestampEntry.cpp, components/core/src/clp_s/TimestampEntry.hpp | Consolidates per-encoding branching into one encoding-guarded switch for evaluate_filter; adds inline accessor auto get_timestamp_encoding() const -> TimestampEncoding. |
| Search timestamp evaluation: components/core/src/clp_s/search/EvaluateTimestampIndex.cpp | Parses timestamp literals according to the range entry's encoding: DoubleEpoch → require float literal, Epoch → require integer literal; non-conforming literals return Unknown. |
| AST switch fall-through fix: components/core/src/clp_s/search/ast/SearchUtils.cpp | Adds explicit break statements to prevent unintended fall-through in double_as_int switch cases (GT/LTE, LT/GTE, default). |
| Test harness init: components/core/tests/clp_s_test_utils.cpp | Adds #include for clp_s/TimestampPattern.hpp and calls clp_s::TimestampPattern::init() before constructing clp_s::JsonParser. |
| Tests: new timestamp cases: components/core/tests/test-clp_s-search.cpp | Adds float-timestamp and epoch-timestamp TEST_CASEs; introduces constants for new input files and timestamp key; adds an extra numeric query to an existing test. |
| Test data additions/changes: components/core/tests/test_log_files/test_search.jsonl, components/core/tests/test_log_files/test_search_float_timestamp.jsonl, components/core/tests/test_log_files/test_search_int_timestamp.jsonl | Appends one record to test_search.jsonl; adds test_search_float_timestamp.jsonl (float timestamps) and test_search_int_timestamp.jsonl (epoch-millisecond timestamps). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as clp-s-search
  participant Engine as Search Engine
  participant Index as TimestampIndex
  participant Entry as TimestampEntry
  participant Parser as Literal Parser

  User->>CLI: submit timestamp query
  CLI->>Engine: execute(query)
  Engine->>Index: evaluate(range, expr)
  Index->>Entry: get_timestamp_encoding()
  alt Entry.encoding == DoubleEpoch
    Index->>Parser: parse literal as float
    Parser-->>Index: float or Unknown
    alt float
      Index->>Entry: evaluate_filter(op, float)
      Entry-->>Index: True/False/Unknown
    else Unknown
      Note over Index: non-float literal → Unknown
    end
  else Entry.encoding == Epoch
    Index->>Parser: parse literal as int
    Parser-->>Index: int or Unknown
    alt int
      Index->>Entry: evaluate_filter(op, int)
      Entry-->>Index: True/False/Unknown
    else Unknown
      Note over Index: non-int literal → Unknown
    end
  else Other
    Note over Index: unsupported encoding → Unknown
  end
  Index-->>Engine: evaluation result
  Engine-->>CLI: filtered results
  CLI-->>User: output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title clearly summarizes the key changes by indicating both the simplification of timestamp range index evaluation and the correction to the float-to-integer conversion utility, while also specifying the affected component and related issue, making it informative and directly relevant to the pull request. |
Contributor

@wraymo wraymo left a comment

Thanks for the PR! Just two questions

std::vector<std::pair<std::string, std::vector<int64_t>>> queries_and_results{
{R"aa(timestamp < 1759417024400)aa", {0, 1, 2}},
{R"aa(timestamp > 1759417023100)aa", {0, 1, 2}},
{R"aa(timestamp > 1759417024000)aa", {0, 1, 2}},
Contributor

@wraymo wraymo Oct 2, 2025

Should we add a test for comparing timestamp with a floating point value?

Contributor Author

Sure, I can add one.

{"idx": 10, "ambiguous_varstring": "abcde"}
{"idx": 11, "ambiguous_varstring": "ae"}
{"idx": 12, "ambiguous_varstring": "a*e"}
{"idx": 13, "one": 1}
Contributor

What's this case for?

Contributor Author

The AST bug I mentioned -- it is exercised with the one < 1.1 AND one > 0.9 AND one: 1.0 test case.

@gibber9809 gibber9809 requested a review from wraymo October 2, 2025 20:53
@kirkrodrigues
Member

Nice PR description. Can we file an issue for the bug (it's user-facing, right?) and refer to it in the title? Otherwise, it's a bit awkward to refer users to the PR, and the title itself doesn't (or can't) indicate what the actual bug was from a user perspective (which, in turn, makes writing release notes a bit harder).

@gibber9809 gibber9809 changed the title from "fix(clp-s): Simplify timestamp range index evaluation code; Fix conversion utility used to compare float AST literals to integer values." to "fix(clp-s): Simplify timestamp range index evaluation code; Fix conversion utility used to compare float AST literals to integer values (fixes #1375)." on Oct 3, 2025
Member

@kirkrodrigues kirkrodrigues left a comment

Deferring to @wraymo's review.

@gibber9809 gibber9809 merged commit 67276c0 into y-scope:main Oct 3, 2025
31 checks passed
Development

Successfully merging this pull request may close these issues.

clp-s: Incorrect rounding in double_as_int when converting floating-point numbers into integers for comparison against integer values.

3 participants