-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Pre-calculate minRequiredScore to speedup filterCompetitiveHits #14827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice speedup! I wonder if we can make it work in the general case with something like that (untested):
diff --git a/lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java b/lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java
index 5c677d7a464..0b1e2f30a6e 100644
--- a/lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java
+++ b/lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java
@@ -128,15 +128,20 @@ class ScorerUtil {
double maxRemainingScore,
float minCompetitiveScore,
int numScorers) {
- if ((float) MathUtil.sumUpperBound(maxRemainingScore, numScorers) >= minCompetitiveScore) {
+
+ // Compute minRequiredScore as the greatest float value so that (float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, numScorers) <= minCompetitiveScore.
+ float minRequiredScore = (float) (minCompetitiveScore - maxRemainingScore);
+ while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, numScorers) > minCompetitiveScore) {
+ minRequiredScore = Math.nextDown(minRequiredScore);
+ }
+
+ if (minRequiredScore <= 0) {
return;
}
int newSize = 0;
for (int i = 0; i < buffer.size; ++i) {
- float maxPossibleScore =
- (float) MathUtil.sumUpperBound(buffer.scores[i] + maxRemainingScore, numScorers);
- if (maxPossibleScore >= minCompetitiveScore) {
+ if (buffer.scores[i] >= minRequiredScore) {
buffer.docs[newSize] = buffer.docs[i];
buffer.scores[newSize] = buffer.scores[i];
newSize++;
FWIW I could confirm a speedup as well with this change on wiibigall. |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
This suggests that the |
Got it, I'll dig into this, thanks for your explaination! It helps me a lot :) |
I played a bit more with your change, and it looks like we could make things even further as a follow-up through vectorization by using |
Thinking a bit more about this problem, finding the smallest float that meets the condition may not always be cheap. However, finding a good-enough float should be easy, and it's fine if the hit count changes. |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
I'll run another full task luceneutil to verify the performance impact tonight |
Thanks for your advice! I think i understand what you mean, will try to find a proper way to utilize the vectorization these days. |
luceneutil result under same setup with lastest code:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool to see speedups on queries that have more than 2 clauses as well! I left some comments. When we want to vectorize this (if it helps, but I think it may), we'll want minRequiredScore
to be a float
again, but we can figure it out in a follow-up.
Maybe extract this computation of minRequiredScore
to a pkg-private helper function so that we can unit-test it?
filterCompetitiveHits
when have exact 2 clauses
FWIW, I was a little bit worried about the cost of final patch vs baseline
cached
|
int numScorers = random().nextInt(1, 10); | ||
|
||
double minRequiredScore = | ||
ScorerUtil.minRequiredScore(maxRemainingScore, minCompetitiveScore, numScorers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a way to track the number of iterations and fail the test if it took more than X (e.g. 10) iterations to converge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to assert it in a way that relies on the internal implemenation detail of this function, not sure whether it's accecptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or should we assert it inside minRequiredScore
, still a little bit ugly though
lucene/core/src/test/org/apache/lucene/search/TestScorerUtil.java
Outdated
Show resolved
Hide resolved
The change from luceneutil
|
Thank you, we may want to look into making this better in follow-ups, but this looks good enough for me so I merged. 👍 |
Co-authored-by: gesong.samuel <gesong.samuel@bytedance.com>
Builds are failing after this has been merged? |
I'm taking a look now. |
The seed did not reproduce for me, but I think I understand the problem. The code assumes that if I pushed a fix. |
Description
This PR propose to specialize function
filterCompetitiveHits
when we have exact 2 scorers, in order to reduce float calculation and potential function callsLuceneutil result on
wikimediumall
withsearchConcurrency=0
,taskCountPerCat=5
,taskRepeatCount=50
after 20 iterations