Skip to content

Add a rescorer that uses DoubleValuesSource values to re-score first pass hits #14776

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 15, 2025

Conversation

vigyasharma
Copy link
Contributor

A Rescorer that uses values from provided DoubleValuesSource to re-score top N hits of a query.

Enables us to rescore hits from ANN vector queries using full-precision or late-interaction DoubleValuesSources (#14729 and #14708).

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@Override
public Explanation explain(IndexSearcher searcher, Explanation firstPassExplanation, int docID)
throws IOException {
Explanation first =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be moved to when it is needed, such as L120?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My flow was to produce first pass explanation, then second explanation, then combined score and final explanation.

}
}
if (leafWithDoc == null) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we are throwing exception here instead of Explanation.noMatch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a tricky one. Since this explain API can take any docId, I can see the argument of surfacing a "noMatch" here. However, my thinking is that Rescorer#explain API is meant to explain how the Rescorer would score this docId for given searcher, when rescore() is called.

If we hit null here, it means that the docId was not matched by the provided searcher in the first-pass itself. This is likely because a different searcher was passed than the one that produced provided docId, which is an error that we should surface.

Suppose we say noMatch for first pass, and the docId does have a value in DoubleValuesSource. Then the explain API will return a score for the docId, However, the actual rescore API will never return a score for this (searcher, docId) combination because they don't match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks for the explanation. Maybe we can add some nuance you mentioned above to the error message, if the only possible case is a different IndexSearcher was passed? Anyhow, no strong opinion here.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

As a follow up, I think we should be able to re-implement QueryRescorer using this class and DoubleValuesSource#fromQuery(Query)

@vigyasharma vigyasharma merged commit c8147c9 into apache:main Jun 15, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants