-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Merge PostingsEnum and ImpactsEnum. #14716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When impacts were introduced, I tried to avoid touching `PostingsEnum`, which had a relatively stable API while impacts were rather experimental. Impacts have been quite successful, so I think it's now time to make things simpler by adding support to impacts directly into `PostingsEnum`? More specifically: - `PostingsEnum` now has `advanceShallow` and `getImpacts` methods. - You can request impacts on a `PostingsEnum` by using the `IMPACTS` flag. - When the `IMPACTS` flag is not set, `PostingsEnum` may return dummy impacts (as for term frequencies when not requested).
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification!
*/ | ||
public abstract ImpactsEnum impacts(int flags) throws IOException; | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove this and deprecate in 10x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I was hoping to remove in a subsequent PR.
* Flag to pass to {@link TermsEnum#postings(PostingsEnum, int)} to get impacts in the returned | ||
* enum. | ||
*/ | ||
public static final short IMPACTS = FREQS | 1 << 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wild thought: can we make impacts work on PostingEnum.FREQ
instead introducing this flag? We can delay the reading of impact bytes (where is needsImpacts
mostly used today) to getImpacts
, so that it won't cause any noticeable overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea. Let me check the performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a few iterations on wikibigall, it seems to work fine with the current codec. However it looks challenging to apply correctly to old codecs, which use a different PostingsEnum impl when impacts are requested, and that I'd like to avoid touching too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine that having a flag also has some benefits, e.g. some impls could decide to store impacts in a separate file in the future and only open this file if impacts are requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying!
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
When impacts were introduced, I tried to avoid touching
PostingsEnum
, which had a relatively stable API while impacts were rather experimental. Impacts have been quite successful, so I think it's now time to make things simpler by adding support to impacts directly intoPostingsEnum
?More specifically:
PostingsEnum
now hasadvanceShallow
andgetImpacts
methods.PostingsEnum
by using theIMPACTS
flag.IMPACTS
flag is not set,PostingsEnum
may return dummy impacts (as for term frequencies when not requested).