Skip to content

Update the IOContext on IndexInput rather than the ReadAdvice #14702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

thecoop
Copy link
Contributor

@thecoop thecoop commented May 22, 2025

Update the IOContext on IndexInput implementations rather than the ReadAdvice. This change means that ReadAdvice is now only used by MMapDirectory and friends. Note that this is a refactoring only, it does not change behaviour.

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@@ -124,7 +124,7 @@ public abstract void search(
*
* <p>The default implementation returns {@code this}
*/
public KnnVectorsReader getMergeInstance() {
public KnnVectorsReader getMergeInstance() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat tangential to this change:

I believe that getMergeInstance() initially didn't throw because the mental model was that it was akin to clone() but specialized for merging, and clone() doesn't throw an IOException. But we now want to run operations that may throw an IOException in getMergeInstance() impls. I think we need to decide if it's best to make all XXXFormat#getMergeInstance() throw an IOException (only a couple of them seem modified in this PR) or if impls should rethrow as an UncheckedIOException. (I don't thave an opinion on this question at this point.)

Another problem is that getMergeInstance() is no longer an independent clone (via other PRs, not yours), as the read advice is updated for everyone, not just for the merge instance. So it feels like this would be better done by a setter, to indicate that this affects the current instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing an IOException is ok here - the top-level MergeState constructor already throws IOException, so no core semantics are being modified here. And it seems reasonable for a io-related method to throw an IOException. The alternative is to throw an UncheckedIOException, which may break the caller of MergeState. I can modify the other getMergeInstance methods to declare throws IOException for consistency here.

For the setter idea, that's a tricky one. The other getMergeInstance methods do create an independent instance, and there are also some implementations of KnnVectorsReader.getMergeInstance which do create a new instance (albeit to wrap other getMergeInstance objects). But we can't do that for Lucene99FlatVectorsReader, due to only having one underlying memory segment that we have to share between the IndexInput objects we use.

Ideally, we would clone - but we can't for this specific implementation. My hunch is that this is ok as-is, given we have AssertingKnnVectorsReader ensuring the calls happen as expected so that nothing breaks (I've tightened up the assertions for this). Any other changes in this area and we would need to re-evaluate though.

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Copy link

github-actions bot commented Jun 7, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 7, 2025
@thecoop thecoop requested a review from jpountz June 9, 2025 08:32
@github-actions github-actions bot removed the Stale label Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants