Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

jainankitk · 2025-05-07T22:43:54Z

Description

When lowerPoint is equal to upperPoint. In fact, there is no need to compare lowerPoint and upperPoint at the same time. The number of comparisons can be reduced by half when collecting document ids to construct bitsets and deeply traversing the bkd tree.
During my reading of Elasticsearch related code, I found that when executing term or terms queries on the date field, the query is rewritten, and a single term is rewritten as a rang query (lowerTerm is the same as upperTerm). The terms query uses BooleanQuery to encapsulate multiple range queries. Therefore, it is more suitable for this scenario, reducing the number of comparisons and improving performance when collecting a large number of documents.

Related #14267

jainankitk · 2025-05-07T22:46:00Z

Addressed review comments, and benchmark results below:

                              TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value                                                                                                                                                                                                                                              
                            IntSet      437.40     (12.5%)      420.59      (7.3%)   -3.8% ( -21% -   18%) 0.235                                                                                                                                                                                                                                                           
             BrowseMonthTaxoFacets        2.55      (7.1%)        2.48      (6.7%)   -2.7% ( -15% -   11%) 0.219                                                                                                                                                                                                                                                           
                          PKLookup      170.61      (5.2%)      166.40      (9.8%)   -2.5% ( -16% -   13%) 0.322                                                                                                                                                                                                                                                           
             BrowseMonthSSDVFacets        3.63     (19.5%)        3.54     (13.1%)   -2.4% ( -29% -   37%) 0.641                                                                                                                                                                                                                                                           
                            Fuzzy1       72.71      (7.1%)       71.37      (6.8%)   -1.8% ( -14% -   13%) 0.403                                                                                                                                                                                                                                                           
           AndHighMedDayTaxoFacets       77.25      (4.2%)       76.05      (5.1%)   -1.5% ( -10% -    8%) 0.295                                                                                                                                                                                                                                                           
               LowIntervalsOrdered       23.31      (7.5%)       23.01      (7.7%)   -1.3% ( -15% -   15%) 0.593                                                                                                                                                                                                                                                           
              MedTermDayTaxoFacets       17.96      (4.7%)       17.77      (5.3%)   -1.1% ( -10% -    9%) 0.495                                                                                                                                                                                                                                                           
                      HighSpanNear        9.69      (4.6%)        9.60      (4.6%)   -1.0% (  -9% -    8%) 0.506                                                                                                                                                                                                                                                           
                       AndHighHigh       84.54      (4.0%)       83.74      (3.6%)   -1.0% (  -8% -    6%) 0.425                                                                                                                                                                                                                                                           
                       LowSpanNear       25.14      (7.8%)       24.91      (8.3%)   -0.9% ( -15% -   16%) 0.721                                                                                                                                                                                                                                                           
                 HighTermMonthSort     1133.10      (6.1%)     1123.71      (4.9%)   -0.8% ( -11% -   10%) 0.637                                                                                                                                                                                                                                                           
                         MedPhrase      281.29      (4.5%)      279.24      (5.0%)   -0.7% (  -9% -    9%) 0.629                                                                                                                                                                                                                                                           
              HighIntervalsOrdered       12.78      (7.1%)       12.72      (6.5%)   -0.5% ( -13% -   14%) 0.827                                                                                                                                                                                                                                                           
            OrHighMedDayTaxoFacets        4.93      (5.5%)        4.91      (8.0%)   -0.5% ( -13% -   13%) 0.830                                                                                                                                                                                                                                                           
                        AndHighMed      137.39      (2.1%)      136.76      (2.4%)   -0.5% (  -4% -    4%) 0.519                                                                                                                                                                                                                                                           
              BrowseDateSSDVFacets        0.63     (10.7%)        0.62     (13.3%)   -0.4% ( -22% -   26%) 0.914                                                                                                                                                                                                                                                           
                      OrNotHighMed      360.87      (5.1%)      359.45      (3.8%)   -0.4% (  -8% -    8%) 0.782                                                                                                                                                                                                                                                           
                  HighSloppyPhrase       13.07      (5.7%)       13.03      (5.0%)   -0.3% ( -10% -   10%) 0.864                                                                                                                                                                                                                                                           
                        HighPhrase       95.78      (3.2%)       95.60      (2.4%)   -0.2% (  -5% -    5%) 0.836                                                                                                                                                                                                                                                           
                      OrNotHighLow     1009.27      (4.0%)     1007.45      (4.3%)   -0.2% (  -8% -    8%) 0.891                                                                                                                                                                                                                                                           
         BrowseDayOfYearSSDVFacets        3.36     (16.2%)        3.35      (8.5%)   -0.1% ( -21% -   29%) 0.972                                                                                                                                                                                                                                                           
                        TermDTSort      145.69      (2.9%)      145.68      (3.5%)   -0.0% (  -6% -    6%) 0.992                                                                                                                                                                                                                                                           
                           Prefix3      660.06      (4.4%)      661.23      (4.1%)    0.2% (  -8% -    9%) 0.896                                                                                                                                                                                                                                                           
                          Wildcard      112.87      (3.7%)      113.14      (3.1%)    0.2% (  -6% -    7%) 0.828                                                                                                                                                                                                                                                           
              HighTermTitleBDVSort       16.47      (5.8%)       16.51      (3.8%)    0.3% (  -8% -   10%) 0.860                                                                                                                                                                                                                                                           
                         OrHighLow      472.58      (3.0%)      473.97      (3.1%)    0.3% (  -5% -    6%) 0.760                                                                                                                                                                                                                                                           
                            IntNRQ      471.18      (3.0%)      472.58      (3.1%)    0.3% (  -5% -    6%) 0.760                                                                                                                                                                                                                                                           
       BrowseRandomLabelTaxoFacets        1.89      (4.4%)        1.90      (4.2%)    0.3% (  -7% -    9%) 0.815                                                                                                                                                                                                                                                           
          AndHighHighDayTaxoFacets       20.08      (4.3%)       20.16      (3.6%)    0.4% (  -7% -    8%) 0.769                                                                                                                                                                                                                                                           
              BrowseDateTaxoFacets        2.30      (9.1%)        2.31     (10.7%)    0.4% ( -17% -   22%) 0.891                                                                                                                                                                                                                                                           
                           Respell       45.34      (5.1%)       45.55      (6.9%)    0.5% ( -11% -   13%) 0.805                                                                                                                                                                                                                                                           
                            Fuzzy2       45.72      (6.7%)       45.94      (6.2%)    0.5% ( -11% -   14%) 0.814                                                                                                                                                                                                                                                           
             HighTermDayOfYearSort      152.97      (3.0%)      153.83      (2.1%)    0.6% (  -4% -    5%) 0.494                                                                                                                                                                                                                                                           
                        AndHighLow     1031.24      (4.7%)     1037.08      (4.0%)    0.6% (  -7% -    9%) 0.683                                                                                                                                                                                                                                                           
                           LowTerm      576.86      (2.6%)      580.90      (3.8%)    0.7% (  -5% -    7%) 0.493                                                                                                                                                                                                                                                           
         BrowseDayOfYearTaxoFacets        2.31      (9.1%)        2.33      (9.5%)    0.8% ( -16% -   21%) 0.784                                                                                                                                                                                                                                                           
                     OrNotHighHigh      343.60      (4.9%)      346.55      (6.0%)    0.9% (  -9% -   12%) 0.617                                                                                                                                                                                                                                                           
               MedIntervalsOrdered        6.80      (5.8%)        6.88      (6.9%)    1.2% ( -10% -   14%) 0.555                                                                                                                                                                                                                                                           
                        OrHighHigh       80.13      (7.8%)       81.08      (7.3%)    1.2% ( -12% -   17%) 0.618                                                                                                                                                                                                                                                           
                         LowPhrase      107.91     (14.1%)      109.76     (16.3%)    1.7% ( -25% -   37%) 0.723                                                                                                                                                                                                                                                           
                         OrHighMed      204.76      (3.0%)      208.43      (2.6%)    1.8% (  -3% -    7%) 0.044                                                                                                                                                                                                                                                           
                 HighTermTitleSort       59.78      (3.9%)       61.12      (3.2%)    2.2% (  -4% -    9%) 0.048                                                                                                                                                                                                                                                           
                           MedTerm      564.17      (4.8%)      577.08      (5.2%)    2.3% (  -7% -   12%) 0.150                                                                                                                                                                                                                                                           
       BrowseRandomLabelSSDVFacets        2.14      (5.2%)        2.19      (5.4%)    2.3% (  -7% -   13%) 0.169                                                                                                                                                                                                                                                           
                   LowSloppyPhrase        9.31      (9.1%)        9.53      (8.4%)    2.4% ( -13% -   21%) 0.392                                                                                                                                                                                                                                                           
                          HighTerm      405.20      (3.1%)      415.04      (5.3%)    2.4% (  -5% -   11%) 0.075                                                                                                                                                                                                                                                           
                      OrHighNotLow      561.56      (4.4%)      577.04      (5.0%)    2.8% (  -6% -   12%) 0.064                                                                                                                                                                                                                                                           
                       MedSpanNear       48.54      (9.5%)       49.97      (8.1%)    3.0% ( -13% -   22%) 0.290                                                                                                                                                                                                                                                           
                     OrHighNotHigh      303.50      (4.7%)      313.29      (4.6%)    3.2% (  -5% -   13%) 0.028                                                                                                                                                                                                                                                           
                   MedSloppyPhrase       22.60      (9.1%)       23.41      (6.9%)    3.6% ( -11% -   21%) 0.164                                                                                                                                                                                                                                                           
                             range     1881.74      (9.2%)     1951.73     (11.7%)    3.7% ( -15% -   27%) 0.264                                                                                                                                                                                                                                                           
                      OrHighNotMed      337.02      (6.3%)      350.53      (7.1%)    4.0% (  -8% -   18%) 0.059

This reverts commit b83f193.

jpountz · 2025-05-15T20:33:56Z

PointRangeQuery bottlenecks of adding doc IDs to a bit set rather than on comparisons, so I wonder if this optimization really helps in practice. On your benchmark run, I see no query with a speedup and a low p-value?

jainankitk · 2025-05-23T19:35:35Z

On your benchmark run, I see no query with a speedup and a low p-value?

I primarily wanted to ensure there is no regression. Do we have any benchmark queries with low == high?

PointRangeQuery bottlenecks of adding doc IDs to a bit set rather than on comparisons, so I wonder if this optimization really helps in practice.

I can see that it is hard to measure the impact of this change. Will try adding some low level benchmark and see if this really helps.

github-actions · 2025-06-07T00:26:35Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

hanbj and others added 5 commits April 27, 2025 18:23

PointInSetQuery early exit on non-matching segments (apache#14268)

50bfd06

Reduce comparisons for equalPoint case

245cac7

Merge branch 'main' into prq-sv

63e5695

Merge branch 'main' into prq-sv

6f3e400

Addressing review comments

64e3182

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking May 7, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking May 7, 2025

github-actions bot added the module:core/search label May 7, 2025

jainankitk mentioned this pull request May 7, 2025

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14267

Closed

jainankitk added 2 commits May 7, 2025 15:50

Making inner classes static

b83f193

Revert "Making inner classes static"

687aa6c

This reverts commit b83f193.

github-actions bot added the Stale label Jun 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

Uh oh!

jainankitk commented May 7, 2025

Uh oh!

jainankitk commented May 7, 2025

Uh oh!

jpountz commented May 15, 2025

Uh oh!

jainankitk commented May 23, 2025

Uh oh!

github-actions bot commented Jun 7, 2025

Uh oh!

Uh oh!

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

Are you sure you want to change the base?

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

Uh oh!

Conversation

jainankitk commented May 7, 2025

Description

Uh oh!

jainankitk commented May 7, 2025

Uh oh!

jpountz commented May 15, 2025

Uh oh!

jainankitk commented May 23, 2025

Uh oh!

github-actions bot commented Jun 7, 2025

Uh oh!

Uh oh!