
Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues #14792


Open
wants to merge 1 commit into
base: main
from


Pulkitg64
Contributor

Description

Introduce getQuantizedVectorValues method in LeafReader to access QuantizedVectorValues.

In a search architecture where the searchers and the writer run on separate machines, it is wasteful to keep raw float vectors on the searcher machines when vector quantization is enabled. This PR adds a getQuantizedVectorValues method to LeafReader, which makes it possible to read the quantized byte vectors directly without reading the raw float vectors.

Partially solving #13158


This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@benwtrent
Member

@Pulkitg64 I don't understand how this is part of #13158

I would have thought the APIs stay the same. Quantization should be able to "rehydrate" the quantized vectors into floating point (or whatever the original values were).

So, the segment, depending on what data it has access to, will:

  • Return the original doc value floating point vectors
  • Rehydrate the quantized values.

Either way, users should still be able to call float[] vectorValue(int ord).

I would think there would be a subclass called QuantizedFloatVectorValues that satisfies the FloatVectorValues interface.

But maybe we add an isApproximate() or an extractQuantizedValues() method that returns either null or the QuantizedFloatVectorValues interface.

But it is likely useless for the user to have access to the quantized bytes directly as they don't provide much value without knowing how to use them.

@Pulkitg64
Contributor Author

Thanks @benwtrent for the quick review and comments.

Regarding your comment about how this relates to issue #13158: I agree that, in a way, this PR doesn't directly help create a "read-only" index as described in the issue. Let me clarify the motivation:

This PR addresses a scenario where:

  • Raw (unquantized) vectors are removed from the index since they aren't needed for searching
  • The architecture has searcher and writer running on separate machines

Currently, there's no way to directly access quantized vectors - we can only access raw vectors. But if raw vectors are dropped, this causes errors. This PR adds methods to access ByteQuantizedVectors in such cases.

As for the usefulness of accessing quantized bytes directly - we have specific use cases, such as returning the vectors themselves when requested in a query.

Please let me know your thoughts.

Regarding accessing quantized vectors directly - we could also consider using the QuantizedVectorValues class, which is currently returned by the getFloatVectorValues method. While this class wraps both raw and quantized vectors, its members are private, preventing direct access to the quantized vectors like we're doing in this PR.

Would it make more sense to make the relevant members public in QuantizedVectorValues rather than adding getQuantizedVectorValues to LeafReader?

@benwtrent
Member

As for the usefulness of accessing quantized bytes directly - we have specific use cases, such as returning the vectors themselves when requested in a query.

I would assume the caller would want something akin to the float values. What would a caller be expected to do with the quantized bytes directly?

I am saying that returning the quantized bytes without knowing all the other information (quantization technique, the technique's parameters, etc.) is pretty useless.
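To make the point about parameters concrete, here is a minimal Java sketch assuming a simplified linear scalar quantizer with one global min/max pair. This is a hypothetical scheme for illustration; Lucene's actual scalar quantization chooses its bounds and stores its corrections differently. The same bytes decode to very different floats depending on which min/max are used, so the bytes alone tell a caller very little.

```java
import java.util.Arrays;

/** Hypothetical linear scalar quantizer: floats in [min, max] <-> bytes in [0, 127]. */
class ScalarQuantizationDemo {

  // q = round((v - min) * 127 / (max - min)), with v clamped into [min, max]
  static byte[] quantize(float[] v, float min, float max) {
    float scale = 127f / (max - min);
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      out[i] = (byte) Math.round((clamped - min) * scale);
    }
    return out;
  }

  // v' = q * (max - min) / 127 + min; only meaningful with the SAME min/max
  static float[] dequantize(byte[] q, float min, float max) {
    float step = (max - min) / 127f;
    float[] out = new float[q.length];
    for (int i = 0; i < q.length; i++) {
      out[i] = q[i] * step + min;
    }
    return out;
  }

  public static void main(String[] args) {
    float[] original = {-0.5f, 0.0f, 0.25f, 1.0f};
    byte[] q = quantize(original, -1f, 1f);
    // Correct parameters: close to the original (error at most half a step).
    System.out.println(Arrays.toString(dequantize(q, -1f, 1f)));
    // Same bytes, wrong parameters: a very different vector.
    System.out.println(Arrays.toString(dequantize(q, 0f, 10f)));
  }
}
```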

@Pulkitg64
Contributor Author

We are experimenting with large vector indexes, and since raw (unquantized) vectors consume significant disk space (4x more than quantized vectors), we want to drop the raw vectors from the searcher machines. We currently use vector values for the following use cases:

  1. Calculating dot-product scores and returning them in search results
  2. Returning the vectors in search results
  3. Vector counting for metrics
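The 4x figure follows from storing 4 bytes per dimension for float32 versus 1 byte per dimension for int8 quantization. A rough estimate for a hypothetical corpus of 10 million 768-dimensional vectors, ignoring any per-vector metadata or file overhead the format may add:

```java
/** Back-of-envelope storage estimate (ignores per-vector metadata and file overhead). */
class VectorStorageEstimate {

  /** float32: 4 bytes per dimension. */
  static long rawFloatBytes(long numVectors, int dims) {
    return numVectors * dims * Float.BYTES;
  }

  /** int8 scalar quantization: 1 byte per dimension. */
  static long int8Bytes(long numVectors, int dims) {
    return numVectors * dims;
  }

  public static void main(String[] args) {
    long numVectors = 10_000_000L; // hypothetical corpus size
    int dims = 768;                // hypothetical dimensionality
    long raw = rawFloatBytes(numVectors, dims);
    long quantized = int8Bytes(numVectors, dims);
    System.out.println("raw float32 bytes: " + raw);         // 30720000000
    System.out.println("int8 bytes: " + quantized);          // 7680000000
    System.out.println("ratio: " + (raw / quantized) + "x"); // 4x
  }
}
```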

For use case 1, we have started to use vectorScorer, which uses quantized vectors to compute scores, so we are good there. For use cases 2 and 3, we currently use floatVectorValues via getFloatVectorValues but need to switch to quantizedVectorValues since searchers won't have float vectors anymore and we are okay in accepting the accuracy loss from float-to-byte quantization.

To address these use cases, we have two options:

  • Introduce a new API: getQuantizedVectorValues to access quantizedByteVector OR
  • Use our local workaround: Make the QuantizedVectorValues class and its members public to directly access quantized vectors

I would like to know your thoughts on whether we should create such an API, and if you think the above use cases don't justify a new API, what are your thoughts on implementing the workaround solution and pushing it upstream?

@benwtrent
Member

Returning the vectors in search results

Why would you need to do this?

Generally, I would assume that any access to the vector would be "Give me what I gave you", and the best we can do with quantized vectors is the dequantized vector.

I don't fully understand how you are serializing a read-only segment that is missing files (e.g. the "vec" file), but the format should do the right thing: see that the file isn't there and provide an approximate view of the floating point vectors.

Vector counting for metrics

I don't understand what this means really. Just counting how many vectors there are? This should be doable via the FloatVectorValues interface.

but need to switch to quantizedVectorValues since searchers won't have float vectors anymore and we are okay in accepting the accuracy loss from float-to-byte quantization.

Again, I think we should do the nice thing, de-quantize the vectors as the user asks for them.

It should fully satisfy the FloatVectorValues API, de-quantizing the vectors and indicating that the vector returned is an approximation.

Getting access to the raw quantized bytes is basically useless without all the other parameters that were used to quantize the vector.
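A minimal sketch of the contract described here, using simplified stand-in types rather than Lucene's real classes: both sources return float[] through the same method, and an isApproximate() flag (the name floated earlier in this thread, not an existing Lucene method) lets the caller label de-quantized vectors when returning them in results. The linear min/step decoding is an illustrative assumption.

```java
import java.util.Arrays;

class ApproximateVectorsDemo {

  /** Simplified stand-in for the FloatVectorValues contract (not the Lucene class). */
  abstract static class FloatVectorSource {
    abstract int size();

    abstract float[] vectorValue(int ord);

    /** Hypothetical flag: true when vectors are reconstructed rather than the originals. */
    boolean isApproximate() {
      return false;
    }
  }

  /** Serves the original float vectors. */
  static final class RawSource extends FloatVectorSource {
    private final float[][] vectors;

    RawSource(float[][] vectors) {
      this.vectors = vectors;
    }

    @Override int size() { return vectors.length; }

    @Override float[] vectorValue(int ord) { return vectors[ord]; }
  }

  /** Serves de-quantized vectors using one assumed linear mapping: q * step + min. */
  static final class DequantizedSource extends FloatVectorSource {
    private final byte[][] quantized;
    private final float min;
    private final float step;

    DequantizedSource(byte[][] quantized, float min, float step) {
      this.quantized = quantized;
      this.min = min;
      this.step = step;
    }

    @Override int size() { return quantized.length; }

    @Override float[] vectorValue(int ord) {
      byte[] q = quantized[ord];
      float[] out = new float[q.length];
      for (int i = 0; i < q.length; i++) {
        out[i] = q[i] * step + min;
      }
      return out;
    }

    @Override boolean isApproximate() { return true; }
  }

  /** Caller code is identical for both sources; it only reports the flag. */
  static String render(FloatVectorSource source, int ord) {
    String vec = Arrays.toString(source.vectorValue(ord));
    return source.isApproximate() ? vec + " (approximate)" : vec;
  }

  public static void main(String[] args) {
    FloatVectorSource raw = new RawSource(new float[][] {{-1f, 1f}});
    FloatVectorSource dq = new DequantizedSource(new byte[][] {{0, 127}}, -1f, 2f / 127f);
    System.out.println(render(raw, 0));
    System.out.println(render(dq, 0));
  }
}
```

The caller never decides up front which representation it is reading; the segment picks one, and the flag is only informational.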

@benwtrent
Member

benwtrent commented Jun 16, 2025

@Pulkitg64

Basically, I don't think callers should know if they are hitting quantized vectors or raw. Or at least have to make that decision up front.

Requiring the user to pick the right thing seems unnecessary when we already have the appropriate interfaces. It's all about determining how the format itself knows that it's missing the vec file.

@msokolov
Contributor

For the return values use case, another choice is to disable it in the case the original vectors were not "stored" in the searchable index. Otherwise, I agree with Ben that we could support "rehydration" in the codec. For example, suppose we see that we have zero full-precision vectors, but nonzero quantized vectors; then we could fall back to "rehydration".

For the counting case (get total number of vectors), should we always use the quantized count where today we use the full-precision count?
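One way the reader-side fallback could be expressed, combining the "zero full-precision vectors but nonzero quantized vectors" check with the missing-vec-file check discussed above. FieldVectorStats and its fields are hypothetical names for illustration, not Lucene APIs. In this sketch the count reported for metrics comes from whichever representation is actually served, which is one possible answer to the counting question.

```java
class VectorSourceSelection {

  /** Which representation a reader would serve for a field. */
  enum Source { RAW, DEQUANTIZED, NONE }

  /** Hypothetical per-field facts a reader could know when opening a segment. */
  static final class FieldVectorStats {
    final boolean rawFilePresent; // e.g. whether the "vec" file exists
    final int rawCount;           // full-precision vectors available
    final int quantizedCount;     // quantized vectors available

    FieldVectorStats(boolean rawFilePresent, int rawCount, int quantizedCount) {
      this.rawFilePresent = rawFilePresent;
      this.rawCount = rawCount;
      this.quantizedCount = quantizedCount;
    }
  }

  /** Prefer the originals; otherwise fall back to rehydrating quantized vectors. */
  static Source choose(FieldVectorStats s) {
    if (s.rawFilePresent && s.rawCount > 0) {
      return Source.RAW;
    }
    if (s.quantizedCount > 0) {
      return Source.DEQUANTIZED;
    }
    return Source.NONE;
  }

  /** Count for metrics, taken from whichever representation is served. */
  static int vectorCount(FieldVectorStats s) {
    switch (choose(s)) {
      case RAW:
        return s.rawCount;
      case DEQUANTIZED:
        return s.quantizedCount;
      default:
        return 0;
    }
  }

  public static void main(String[] args) {
    FieldVectorStats writerSide = new FieldVectorStats(true, 1000, 1000);
    FieldVectorStats searcherSide = new FieldVectorStats(false, 0, 1000);
    System.out.println(choose(writerSide) + " " + vectorCount(writerSide));     // RAW 1000
    System.out.println(choose(searcherSide) + " " + vectorCount(searcherSide)); // DEQUANTIZED 1000
  }
}
```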

@Pulkitg64
Contributor Author

Thanks @benwtrent @msokolov

Again, I think we should do the nice thing, de-quantize the vectors as the user asks for them.

Sorry, I am surely missing something. But if we can't access the quantized vectors, how can we do this? We still need the byte vectors to rehydrate the float vectors, right?

@benwtrent
Member

@Pulkitg64 I think the format has access to the quantized vectors. I am saying we shouldn't add a new getQuantizedVectors method to the leaf or kNN APIs.

When the vec file isn't present, the format should still return a FloatVectorValues, and when folks call float[] getVector(int ord), they get the de-quantized representation of the vector.

This should require no new public facing APIs anywhere.

Here is the idea.

public FloatVectorValues getFloatVectorValues(String field) throws IOException {
    return OffHeapQuantizedFloatVectorValues.load(...);
}

...

static class OffHeapQuantizedFloatVectorValues extends FloatVectorValues {
  
  final float[] scratch;
  int curOrd = -1;

  ...

  @Override
  public float[] vectorValue(int ord) throws IOException {
    if (ord == curOrd) return scratch; // same ord as last call: reuse the decoded vector
    dequantize(ord);
    curOrd = ord;
    return scratch;
  }

  private void dequantize(int ord) throws IOException {
    // read quantized values and parameters
    // dequantize into scratch
  }
  
}
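Below is a self-contained, runnable variant of the sketch above. To stay independent of Lucene, it assumes the quantized bytes sit in an in-memory byte[] (standing in for the off-heap data) and that one linear min/step pair reverses the quantization; the class name InMemoryQuantizedFloatVectorValues is hypothetical, and a real format would read its own stored parameters. It keeps the same scratch-buffer and curOrd caching.

```java
import java.util.Arrays;

class DequantizingVectorValuesDemo {

  /** Hypothetical in-memory stand-in for OffHeapQuantizedFloatVectorValues. */
  static final class InMemoryQuantizedFloatVectorValues {
    private final byte[] data;     // vector `ord` occupies data[ord*dim .. (ord+1)*dim)
    private final int dimension;
    private final float min;       // assumed linear mapping: value = q * step + min
    private final float step;
    private final float[] scratch; // reused across calls, as in the sketch
    private int curOrd = -1;       // ord currently decoded into scratch

    InMemoryQuantizedFloatVectorValues(byte[] data, int dimension, float min, float step) {
      this.data = data;
      this.dimension = dimension;
      this.min = min;
      this.step = step;
      this.scratch = new float[dimension];
    }

    int size() {
      return data.length / dimension;
    }

    /** Returns the de-quantized vector; the array is overwritten when a different ord is requested. */
    float[] vectorValue(int ord) {
      if (ord == curOrd) {
        return scratch; // already decoded: skip the work
      }
      dequantize(ord);
      curOrd = ord;
      return scratch;
    }

    private void dequantize(int ord) {
      int offset = ord * dimension;
      for (int i = 0; i < dimension; i++) {
        scratch[i] = data[offset + i] * step + min;
      }
    }
  }

  public static void main(String[] args) {
    // Two 3-dim vectors, assumed quantized with min = -1 and step = 2/127.
    byte[] data = {0, 64, 127, 127, 64, 0};
    InMemoryQuantizedFloatVectorValues values =
        new InMemoryQuantizedFloatVectorValues(data, 3, -1f, 2f / 127f);
    float[] first = values.vectorValue(0).clone(); // copy: scratch is reused
    float[] second = values.vectorValue(1);
    System.out.println(values.size() + " vectors");
    System.out.println(Arrays.toString(first));
    System.out.println(Arrays.toString(second));
  }
}
```

Because scratch is reused, a caller that keeps a vector across calls with different ords has to copy it, which is the trade-off of avoiding a fresh allocation per call.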

@Pulkitg64
Contributor Author

Oh, I understand now what you meant in your comments. This approach is much cleaner and doesn't require any new API. I will raise a new revision soon.

3 participants