-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add a rescorer that uses DoubleValuesSource values to re-score first pass hits #14776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
@Override | ||
public Explanation explain(IndexSearcher searcher, Explanation firstPassExplanation, int docID) | ||
throws IOException { | ||
Explanation first = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be moved to when it is needed, such as L120?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My flow was to produce first pass explanation, then second explanation, then combined score and final explanation.
} | ||
} | ||
if (leafWithDoc == null) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are throwing exception here instead of Explanation.noMatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a tricky one. Since this explain API can take any docId, I can see the argument of surfacing a "noMatch" here. However, my thinking is that Rescorer#explain
API is meant to explain how the Rescorer would score this docId for given searcher, when rescore()
is called.
If we hit null here, it means that the docId was not matched by the provided searcher in the first-pass itself. This is likely because a different searcher was passed than the one that produced provided docId, which is an error that we should surface.
Suppose we say noMatch
for first pass, and the docId does have a value in DoubleValuesSource. Then the explain API will return a score for the docId, However, the actual rescore API will never return a score for this (searcher, docId) combination because they don't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thanks for the explanation. Maybe we can add some nuance you mentioned above to the error message, if the only possible case is a different IndexSearcher was passed? Anyhow, no strong opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
As a follow up, I think we should be able to re-implement QueryRescorer using this class and DoubleValuesSource#fromQuery(Query)
b65f6de
to
0a42fee
Compare
A
Rescorer
that uses values from providedDoubleValuesSource
to re-score top N hits of a query.Enables us to rescore hits from ANN vector queries using full-precision or late-interaction DoubleValuesSources (#14729 and #14708).