Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues #14792

Pulkitg64 · 2025-06-16T11:30:41Z

Description

Introduce getQuantizedVectorValues method in LeafReader to access QuantizedVectorValues.

In a search architecture where searchers and the writer run on separate machines, it is wasteful to keep raw float vectors on the searcher machines when vector quantization is enabled. This PR adds getQuantizedVectorValues to LeafReader, which makes it possible to read the quantized byte vectors directly without reading the raw float vectors.

Partially solving #13158

github-actions · 2025-06-16T11:31:31Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

benwtrent · 2025-06-16T12:15:53Z

@Pulkitg64 I don't understand how this is part of #13158

I would have thought the APIs stay the same. Quantization should be able to "rehydrate" the quantized vectors into floating point (or whatever the original values were).

So, the segment, depending on what data it has access to, will:

Return the original doc value floating point vectors
Rehydrate the quantized values.

Either way, users should still be able to call float[] vectorValue(int ord).

I would think there is a sub-class called QuantizedFloatVectorValues, that satisfies the FloatVectorValues interface.

But maybe we add an isApproximate() or an extractQuantizedValues() that returns null, or the QuantizedFloatVectorValues interface.

But it is likely useless for the user to have access to the quantized bytes directly as they don't provide much value without knowing how to use them.

Pulkitg64 · 2025-06-16T17:30:58Z

Thanks @benwtrent for the quick review and comments.

Regarding your comment about how this relates to issue #13158 - I agree in a way this PR doesn't directly help create a "read-only" index as mentioned in the issue. Let me clarify the motivation:

This PR addresses a scenario where:

Raw (unquantized) vectors are removed from the index since they aren't needed for searching
The architecture has searcher and writer running on separate machines

Currently, there's no way to directly access quantized vectors - we can only access raw vectors. But if raw vectors are dropped, this causes errors. This PR adds methods to access ByteQuantizedVectors in such cases.

As for the usefulness of accessing quantized bytes directly - we have specific use cases, such as returning the vectors themselves when requested in a query.

Please let me know your thoughts.

Regarding accessing quantized vectors directly - we could also consider using the QuantizedVectorValues class, which is currently returned by the getFloatVectorValues method. While this class wraps both raw and quantized vectors, its members are private, preventing direct access to the quantized vectors like we're doing in this PR.

Would it make more sense to make the relevant members public in QuantizedVectorValues rather than adding getQuantizedVectorValues to LeafReader?

benwtrent · 2025-06-16T17:45:32Z

> As for the usefulness of accessing quantized bytes directly - we have specific use cases, such as returning the vectors themselves when requested in a query.

I would assume the caller would want something akin to the float values. What would a caller be expected to do with the quantized bytes directly?

I am saying that returning the quantized bytes, without knowing all the other information (quantization technique, the technique's parameters, etc.), is pretty useless.
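
To illustrate why the bytes alone are not enough, here is a minimal, self-contained sketch of scalar dequantization. The QuantParams record, its min/max/bits fields, and the linear mapping are assumptions made for illustration only; they are not Lucene's actual quantization scheme or on-disk format.

```java
import java.util.Arrays;

public final class DequantizeSketch {

  /** Hypothetical quantization parameters; not Lucene's real layout. */
  record QuantParams(float min, float max, int bits) {}

  /** Maps each stored code back to an approximate float inside [min, max]. */
  static float[] dequantize(byte[] codes, QuantParams p) {
    int levels = (1 << p.bits()) - 1; // number of steps between min and max
    float step = (p.max() - p.min()) / levels;
    float[] out = new float[codes.length];
    for (int i = 0; i < codes.length; i++) {
      out[i] = p.min() + (codes[i] & 0xFF) * step; // treat the byte as unsigned
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] codes = {0, 64, 127};
    // The same bytes decode to different vectors under different parameters.
    System.out.println(Arrays.toString(dequantize(codes, new QuantParams(-1f, 1f, 7))));
    System.out.println(Arrays.toString(dequantize(codes, new QuantParams(0f, 10f, 7))));
  }
}
```

A caller holding only the byte array cannot tell which of the two decoded vectors was intended, which is the argument for keeping dequantization inside the format.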

Pulkitg64 · 2025-06-16T19:35:32Z

We are experimenting with large vector indexes, and since raw (unquantized) vectors consume significant disk space (4x more than quantized vectors), we want to drop the raw vectors from searcher machines. We currently use vector values for the use cases below:

1. Calculating the dot-product scores and returning them in search results
2. Returning the vectors in search results
3. Vector counting for metrics

For use case 1 we have started using vectorScorer, which uses quantized vectors to compute scores, so we are good there. For use cases 2 and 3, we currently use floatVectorValues via getFloatVectorValues but need to switch to quantizedVectorValues since searchers won't have float vectors anymore and we are okay in accepting the accuracy loss from float-to-byte quantization.

To address these use cases, we have two options:

Introduce a new API: getQuantizedVectorValues to access quantizedByteVector OR
Use our local workaround: Make the QuantizedVectorValues class and its members public to directly access quantized vectors

I would like to know your thoughts on whether we should create such an API, and if you think the above use cases don't justify a new API, what are your thoughts on implementing the workaround solution and pushing it upstream?

benwtrent · 2025-06-16T19:53:13Z

> Returning the vectors in search results

Why would you need to do this?

Generally, I would assume that any access to the vector would be "Give me what I gave you", and the best we can do with quantized vectors is the dequantized vector.

I don't fully understand how serializing a read-only segment that is missing files (e.g. the "vec" file) would work, but the format should do the right thing: see that the file isn't there and provide an approximate view of the floating point vectors.

> Vector counting for metrics

I don't understand what this means really. Just counting how many vectors there are? This should be doable via the FloatVectorValues interface.

> but need to switch to quantizedVectorValues since searchers won't have float vectors anymore and we are okay in accepting the accuracy loss from float-to-byte quantization.

Again, I think we should do the nice thing, de-quantize the vectors as the user asks for them.

It should fully satisfy the FloatVectorValues API, de-quantizing the vectors and indicating that the vector returned is an approximation.

Getting access to the raw quantized bytes is basically useless without all the other parameters that were used to quantize the vector.

benwtrent · 2025-06-16T20:00:59Z

@Pulkitg64

Basically, I don't think callers should know if they are hitting quantized vectors or raw. Or at least have to make that decision up front.

Requiring the user to pick the right thing seems unnecessary when we already have the appropriate interfaces. It's just about determining how the format itself knows that it's missing the vec file.

msokolov · 2025-06-17T17:24:53Z

For the return values use case, another choice is to disable it in the case the original vectors were not "stored" in the searchable index. Otherwise, I agree with Ben that we could support "rehydration" in the codec. For example, suppose we see that we have zero full-precision vectors, but nonzero quantized vectors; then we could fall back to "rehydration".

For the counting case (get total number of vectors), should we always use the quantized count where today we use the full-precision count?
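
The fallback described above could look roughly like the following self-contained sketch. FloatView, open, and the 7-bit linear mapping are hypothetical stand-ins invented for this example, not Lucene classes or the real codec logic. The count comes from whichever representation is present, so callers do not need to know which one they are reading.

```java
import java.util.function.IntFunction;

public final class FallbackSketch {

  /** Hypothetical stand-in for a float vector view; not Lucene's FloatVectorValues. */
  interface FloatView {
    int size();
    float[] vectorValue(int ord);
    boolean isApproximate();
  }

  /** Serves raw vectors when present, otherwise rehydrates from quantized codes. */
  static FloatView open(float[][] raw, byte[][] codes, float min, float max) {
    if (raw != null && raw.length > 0) {
      return view(raw.length, ord -> raw[ord], false);
    }
    int count = codes == null ? 0 : codes.length;
    float step = (max - min) / 127f; // assumes non-negative 7-bit codes
    return view(count, ord -> {
      float[] out = new float[codes[ord].length];
      for (int i = 0; i < out.length; i++) {
        out[i] = min + codes[ord][i] * step;
      }
      return out;
    }, true);
  }

  private static FloatView view(int size, IntFunction<float[]> reader, boolean approximate) {
    return new FloatView() {
      @Override public int size() { return size; }
      @Override public float[] vectorValue(int ord) { return reader.apply(ord); }
      @Override public boolean isApproximate() { return approximate; }
    };
  }

  public static void main(String[] args) {
    // No raw vectors on this "searcher": the view falls back to rehydration.
    FloatView v = open(null, new byte[][] {{0, 127}}, 0f, 1f);
    System.out.println(v.size() + " vector(s), approximate=" + v.isApproximate());
  }
}
```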

Pulkitg64 · 2025-06-18T10:52:09Z

Thanks @benwtrent @msokolov

> Again, I think we should do the nice thing, de-quantize the vectors as the user asks for them.

Sorry, I am surely missing something. But if we can't access the quantized vectors, then how can we do this? We still need the byte vectors to rehydrate the float vectors, right?

benwtrent · 2025-06-18T11:40:56Z

@Pulkitg64 I think the format has access to the quantized vectors. I am saying we shouldn't add a new getQuantizedVectorValues to the leaf or kNN APIs.

When the vec file isn't present, the format should still return a FloatVectorValues, and when folks call float[] vectorValue(int ord), they get the de-quantized representation of the vector.

This should require no new public facing APIs anywhere.

Here is the idea.

public FloatVectorValues getFloatVectorValues(String field) throws IOException {
  return OffHeapQuantizedFloatVectorValues.load(...);
}

...

static class OffHeapQuantizedFloatVectorValues extends FloatVectorValues {

  // Reused across calls; holds the dequantized vector for curOrd.
  final float[] scratch;
  int curOrd = -1;

  ...

  @Override
  public float[] vectorValue(int ord) throws IOException {
    // Skip the work when the same ordinal is requested twice in a row.
    if (ord == curOrd) return scratch;
    dequantize(ord);
    curOrd = ord;
    return scratch;
  }

  private void dequantize(int ord) throws IOException {
    // read quantized values and parameters
    // dequantize into scratch
  }

}
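
One consequence of the scratch-buffer idea above is that the returned array is shared between calls, so a caller that wants to keep a vector has to copy it. The following self-contained sketch shows that behavior under stated assumptions: ScratchReuseSketch, the in-memory codes array, and the 7-bit linear mapping are hypothetical stand-ins for the off-heap reader and the real quantization parameters.

```java
import java.util.Arrays;

public final class ScratchReuseSketch {
  private final byte[][] codes;  // stand-in for quantized vectors read from disk
  private final float min;       // hypothetical quantization parameters
  private final float step;
  private final float[] scratch; // shared output buffer
  private int curOrd = -1;
  int dequantizeCalls = 0;       // exposed only to show the caching effect

  ScratchReuseSketch(byte[][] codes, float min, float max) {
    this.codes = codes;
    this.min = min;
    this.step = (max - min) / 127f; // assumes non-negative 7-bit codes
    this.scratch = new float[codes[0].length];
  }

  float[] vectorValue(int ord) {
    if (ord != curOrd) {
      for (int i = 0; i < scratch.length; i++) {
        scratch[i] = min + codes[ord][i] * step;
      }
      curOrd = ord;
      dequantizeCalls++;
    }
    return scratch;
  }

  public static void main(String[] args) {
    ScratchReuseSketch values =
        new ScratchReuseSketch(new byte[][] {{0, 127}, {127, 0}}, 0f, 1f);
    float[] first = values.vectorValue(0);
    float[] kept = Arrays.copyOf(first, first.length); // copy before the next call
    values.vectorValue(1);                             // overwrites the shared buffer
    System.out.println(Arrays.toString(first) + " vs " + Arrays.toString(kept));
  }
}
```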

Pulkitg64 · 2025-06-18T13:32:36Z

Oh, I understand now what you meant in your comments. This approach is much cleaner and doesn't require any new API additions. I will raise a new revision in some time.

Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues

fa7f349

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Jun 16, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Jun 16, 2025

github-actions bot added module:core/index module:highlighter module:test-framework labels Jun 16, 2025
