Skip to content

Merge PostingsEnum and ImpactsEnum. #14716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented May 26, 2025

When impacts were introduced, I tried to avoid touching PostingsEnum, which had a relatively stable API while impacts were rather experimental. Impacts have been quite successful, so I think it's now time to make things simpler by adding support to impacts directly into PostingsEnum?

More specifically:

  • PostingsEnum now has advanceShallow and getImpacts methods.
  • You can request impacts on a PostingsEnum by using the IMPACTS flag.
  • When the IMPACTS flag is not set, PostingsEnum may return dummy impacts (as for term frequencies when not requested).

jpountz added 2 commits May 26, 2025 16:21
When impacts were introduced, I tried to avoid touching `PostingsEnum`, which
had a relatively stable API while impacts were rather experimental. Impacts
have been quite successful, so I think it's now time to make things simpler by
adding support to impacts directly into `PostingsEnum`?

More specifically:
 - `PostingsEnum` now has `advanceShallow` and `getImpacts` methods.
 - You can request impacts on a `PostingsEnum` by using the `IMPACTS` flag.
 - When the `IMPACTS` flag is not set, `PostingsEnum` may return dummy impacts
   (as for term frequencies when not requested).
Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification!

*/
public abstract ImpactsEnum impacts(int flags) throws IOException;
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this and deprecate in 10x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I was hoping to remove in a subsequent PR.

* Flag to pass to {@link TermsEnum#postings(PostingsEnum, int)} to get impacts in the returned
* enum.
*/
public static final short IMPACTS = FREQS | 1 << 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wild thought: can we make impacts work on PostingEnum.FREQ instead introducing this flag? We can delay the reading of impact bytes (where is needsImpacts mostly used today) to getImpacts, so that it won't cause any noticeable overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. Let me check the performance impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a few iterations on wikibigall, it seems to work fine with the current codec. However it looks challenging to apply correctly to old codecs, which use a different PostingsEnum impl when impacts are requested, and that I'd like to avoid touching too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that having a flag also has some benefits, e.g. some impls could decide to store impacts in a separate file in the future and only open this file if impacts are requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying!

Copy link

github-actions bot commented Jun 2, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants