Skip to content

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jainankitk
Copy link
Contributor

Description

When lowerPoint is equal to upperPoint. In fact, there is no need to compare lowerPoint and upperPoint at the same time. The number of comparisons can be reduced by half when collecting document ids to construct bitsets and deeply traversing the bkd tree.
During my reading of Elasticsearch related code, I found that when executing term or terms queries on the date field, the query is rewritten, and a single term is rewritten as a rang query (lowerTerm is the same as upperTerm). The terms query uses BooleanQuery to encapsulate multiple range queries. Therefore, it is more suitable for this scenario, reducing the number of comparisons and improving performance when collecting a large number of documents.

Related #14267

@jainankitk
Copy link
Contributor Author

Addressed review comments, and benchmark results below:

                              TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value                                                                                                                                                                                                                                              
                            IntSet      437.40     (12.5%)      420.59      (7.3%)   -3.8% ( -21% -   18%) 0.235                                                                                                                                                                                                                                                           
             BrowseMonthTaxoFacets        2.55      (7.1%)        2.48      (6.7%)   -2.7% ( -15% -   11%) 0.219                                                                                                                                                                                                                                                           
                          PKLookup      170.61      (5.2%)      166.40      (9.8%)   -2.5% ( -16% -   13%) 0.322                                                                                                                                                                                                                                                           
             BrowseMonthSSDVFacets        3.63     (19.5%)        3.54     (13.1%)   -2.4% ( -29% -   37%) 0.641                                                                                                                                                                                                                                                           
                            Fuzzy1       72.71      (7.1%)       71.37      (6.8%)   -1.8% ( -14% -   13%) 0.403                                                                                                                                                                                                                                                           
           AndHighMedDayTaxoFacets       77.25      (4.2%)       76.05      (5.1%)   -1.5% ( -10% -    8%) 0.295                                                                                                                                                                                                                                                           
               LowIntervalsOrdered       23.31      (7.5%)       23.01      (7.7%)   -1.3% ( -15% -   15%) 0.593                                                                                                                                                                                                                                                           
              MedTermDayTaxoFacets       17.96      (4.7%)       17.77      (5.3%)   -1.1% ( -10% -    9%) 0.495                                                                                                                                                                                                                                                           
                      HighSpanNear        9.69      (4.6%)        9.60      (4.6%)   -1.0% (  -9% -    8%) 0.506                                                                                                                                                                                                                                                           
                       AndHighHigh       84.54      (4.0%)       83.74      (3.6%)   -1.0% (  -8% -    6%) 0.425                                                                                                                                                                                                                                                           
                       LowSpanNear       25.14      (7.8%)       24.91      (8.3%)   -0.9% ( -15% -   16%) 0.721                                                                                                                                                                                                                                                           
                 HighTermMonthSort     1133.10      (6.1%)     1123.71      (4.9%)   -0.8% ( -11% -   10%) 0.637                                                                                                                                                                                                                                                           
                         MedPhrase      281.29      (4.5%)      279.24      (5.0%)   -0.7% (  -9% -    9%) 0.629                                                                                                                                                                                                                                                           
              HighIntervalsOrdered       12.78      (7.1%)       12.72      (6.5%)   -0.5% ( -13% -   14%) 0.827                                                                                                                                                                                                                                                           
            OrHighMedDayTaxoFacets        4.93      (5.5%)        4.91      (8.0%)   -0.5% ( -13% -   13%) 0.830                                                                                                                                                                                                                                                           
                        AndHighMed      137.39      (2.1%)      136.76      (2.4%)   -0.5% (  -4% -    4%) 0.519                                                                                                                                                                                                                                                           
              BrowseDateSSDVFacets        0.63     (10.7%)        0.62     (13.3%)   -0.4% ( -22% -   26%) 0.914                                                                                                                                                                                                                                                           
                      OrNotHighMed      360.87      (5.1%)      359.45      (3.8%)   -0.4% (  -8% -    8%) 0.782                                                                                                                                                                                                                                                           
                  HighSloppyPhrase       13.07      (5.7%)       13.03      (5.0%)   -0.3% ( -10% -   10%) 0.864                                                                                                                                                                                                                                                           
                        HighPhrase       95.78      (3.2%)       95.60      (2.4%)   -0.2% (  -5% -    5%) 0.836                                                                                                                                                                                                                                                           
                      OrNotHighLow     1009.27      (4.0%)     1007.45      (4.3%)   -0.2% (  -8% -    8%) 0.891                                                                                                                                                                                                                                                           
         BrowseDayOfYearSSDVFacets        3.36     (16.2%)        3.35      (8.5%)   -0.1% ( -21% -   29%) 0.972                                                                                                                                                                                                                                                           
                        TermDTSort      145.69      (2.9%)      145.68      (3.5%)   -0.0% (  -6% -    6%) 0.992                                                                                                                                                                                                                                                           
                           Prefix3      660.06      (4.4%)      661.23      (4.1%)    0.2% (  -8% -    9%) 0.896                                                                                                                                                                                                                                                           
                          Wildcard      112.87      (3.7%)      113.14      (3.1%)    0.2% (  -6% -    7%) 0.828                                                                                                                                                                                                                                                           
              HighTermTitleBDVSort       16.47      (5.8%)       16.51      (3.8%)    0.3% (  -8% -   10%) 0.860                                                                                                                                                                                                                                                           
                         OrHighLow      472.58      (3.0%)      473.97      (3.1%)    0.3% (  -5% -    6%) 0.760                                                                                                                                                                                                                                                           
                            IntNRQ      471.18      (3.0%)      472.58      (3.1%)    0.3% (  -5% -    6%) 0.760                                                                                                                                                                                                                                                           
       BrowseRandomLabelTaxoFacets        1.89      (4.4%)        1.90      (4.2%)    0.3% (  -7% -    9%) 0.815                                                                                                                                                                                                                                                           
          AndHighHighDayTaxoFacets       20.08      (4.3%)       20.16      (3.6%)    0.4% (  -7% -    8%) 0.769                                                                                                                                                                                                                                                           
              BrowseDateTaxoFacets        2.30      (9.1%)        2.31     (10.7%)    0.4% ( -17% -   22%) 0.891                                                                                                                                                                                                                                                           
                           Respell       45.34      (5.1%)       45.55      (6.9%)    0.5% ( -11% -   13%) 0.805                                                                                                                                                                                                                                                           
                            Fuzzy2       45.72      (6.7%)       45.94      (6.2%)    0.5% ( -11% -   14%) 0.814                                                                                                                                                                                                                                                           
             HighTermDayOfYearSort      152.97      (3.0%)      153.83      (2.1%)    0.6% (  -4% -    5%) 0.494                                                                                                                                                                                                                                                           
                        AndHighLow     1031.24      (4.7%)     1037.08      (4.0%)    0.6% (  -7% -    9%) 0.683                                                                                                                                                                                                                                                           
                           LowTerm      576.86      (2.6%)      580.90      (3.8%)    0.7% (  -5% -    7%) 0.493                                                                                                                                                                                                                                                           
         BrowseDayOfYearTaxoFacets        2.31      (9.1%)        2.33      (9.5%)    0.8% ( -16% -   21%) 0.784                                                                                                                                                                                                                                                           
                     OrNotHighHigh      343.60      (4.9%)      346.55      (6.0%)    0.9% (  -9% -   12%) 0.617                                                                                                                                                                                                                                                           
               MedIntervalsOrdered        6.80      (5.8%)        6.88      (6.9%)    1.2% ( -10% -   14%) 0.555                                                                                                                                                                                                                                                           
                        OrHighHigh       80.13      (7.8%)       81.08      (7.3%)    1.2% ( -12% -   17%) 0.618                                                                                                                                                                                                                                                           
                         LowPhrase      107.91     (14.1%)      109.76     (16.3%)    1.7% ( -25% -   37%) 0.723                                                                                                                                                                                                                                                           
                         OrHighMed      204.76      (3.0%)      208.43      (2.6%)    1.8% (  -3% -    7%) 0.044                                                                                                                                                                                                                                                           
                 HighTermTitleSort       59.78      (3.9%)       61.12      (3.2%)    2.2% (  -4% -    9%) 0.048                                                                                                                                                                                                                                                           
                           MedTerm      564.17      (4.8%)      577.08      (5.2%)    2.3% (  -7% -   12%) 0.150                                                                                                                                                                                                                                                           
       BrowseRandomLabelSSDVFacets        2.14      (5.2%)        2.19      (5.4%)    2.3% (  -7% -   13%) 0.169                                                                                                                                                                                                                                                           
                   LowSloppyPhrase        9.31      (9.1%)        9.53      (8.4%)    2.4% ( -13% -   21%) 0.392                                                                                                                                                                                                                                                           
                          HighTerm      405.20      (3.1%)      415.04      (5.3%)    2.4% (  -5% -   11%) 0.075                                                                                                                                                                                                                                                           
                      OrHighNotLow      561.56      (4.4%)      577.04      (5.0%)    2.8% (  -6% -   12%) 0.064                                                                                                                                                                                                                                                           
                       MedSpanNear       48.54      (9.5%)       49.97      (8.1%)    3.0% ( -13% -   22%) 0.290                                                                                                                                                                                                                                                           
                     OrHighNotHigh      303.50      (4.7%)      313.29      (4.6%)    3.2% (  -5% -   13%) 0.028                                                                                                                                                                                                                                                           
                   MedSloppyPhrase       22.60      (9.1%)       23.41      (6.9%)    3.6% ( -11% -   21%) 0.164                                                                                                                                                                                                                                                           
                             range     1881.74      (9.2%)     1951.73     (11.7%)    3.7% ( -15% -   27%) 0.264                                                                                                                                                                                                                                                           
                      OrHighNotMed      337.02      (6.3%)      350.53      (7.1%)    4.0% (  -8% -   18%) 0.059 

@jpountz
Copy link
Contributor

jpountz commented May 15, 2025

PointRangeQuery bottlenecks of adding doc IDs to a bit set rather than on comparisons, so I wonder if this optimization really helps in practice. On your benchmark run, I see no query with a speedup and a low p-value?

@jainankitk
Copy link
Contributor Author

On your benchmark run, I see no query with a speedup and a low p-value?

I primarily wanted to ensure there is no regression. Do we have any benchmark queries with low == high?

PointRangeQuery bottlenecks of adding doc IDs to a bit set rather than on comparisons, so I wonder if this optimization really helps in practice.

I can see that it is hard to measure the impact of this change. Will try adding some low level benchmark and see if this really helps.

Copy link

github-actions bot commented Jun 7, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants