This is a tracking issue for adding TurboQuant quantization as a feature in Vortex. Notably, it will not be a new physical encoding for the `Vector` type, but rather a new logical type.
## Motivation
We would like to support the lossy compression (quantization) of data in Vortex. However, it was not entirely clear how we would achieve this in a file format, where many of our invariants rely on the assumption that our physical encodings are all lossless.
After some internal discussions, we have realized that "lossy" data and compression MUST live at the logical layer, not at the physical layer. It is logical because losing data is a full modification of the data, not just a different way of storing it.
For TurboQuant in particular, this means that it cannot be an encoding of the `Vector` type. Too many assumptions break if it were an encoding: unit-normalization breaks for quantized vectors, and scalar functions and canonicalization no longer return the same results as they would on the unquantized vectors.
However, this does not mean we want to remove TQ entirely. There is value in building tooling for users who want to purposefully write TQ-quantized vectors into Vortex files and read them back, knowing that some of the information in the original data has been lost.
Users should be able to write whatever data they want into Vortex, structured however they want. But if they want to use a lossy compression / quantization scheme, they need to transform the data themselves before writing it to Vortex, and they additionally need to make sure that the default Vortex compressor does not recompress the data that they have specially modified.
## Design
It is unclear if this is the ideal way to generally design lossy schemes, but it is certainly the easiest in terms of moving forward.
TurboQuant will live in a new `vortex-turboquant` crate outside of the top-level `vortex` dependency tree, mimicking a third-party crate. It will carry a new `TurboQuant` extension type over a struct array holding all of the components needed to quantize vector data: the norms of the vectors and the codes (indices) into the centroid book (values). The centroids themselves can be constructed at read time (cached on the number of dimensions and the bit width).
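To make the storage layout concrete, here is a minimal sketch in plain Rust (deliberately not using the Vortex API). It assumes, purely for illustration, a scalar quantizer whose centroid book is a uniform grid over `[-1.0, 1.0]` determined entirely by the bit width; the actual TurboQuant centroid construction may differ.

```rust
/// Hypothetical storage for one column of TQ-quantized vectors,
/// mirroring the struct-array layout described above: per-vector
/// norms plus per-component codes. Centroids are NOT stored.
struct TurboQuantArray {
    dims: usize,      // vector dimensionality
    bits: u32,        // code bit width
    norms: Vec<f32>,  // one L2 norm per vector
    codes: Vec<u8>,   // `dims` codes per vector (assumes bits <= 8 here)
}

/// Rebuild the centroid book at read time: a uniform grid of
/// 2^bits points over [-1.0, 1.0]. In practice this would be
/// cached, keyed on (dims, bits).
fn centroid_book(bits: u32) -> Vec<f32> {
    let n = 1usize << bits;
    (0..n)
        .map(|i| -1.0 + 2.0 * (i as f32) / ((n - 1) as f32))
        .collect()
}

fn quantize(vectors: &[Vec<f32>], bits: u32) -> TurboQuantArray {
    let dims = vectors[0].len();
    let book = centroid_book(bits);
    let mut norms = Vec::new();
    let mut codes = Vec::new();
    for v in vectors {
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        norms.push(norm);
        for &x in v {
            // Unit-normalize, then pick the nearest centroid index.
            let u = if norm > 0.0 { x / norm } else { 0.0 };
            let code = book
                .iter()
                .enumerate()
                .min_by(|a, b| {
                    (a.1 - u).abs().partial_cmp(&(b.1 - u).abs()).unwrap()
                })
                .map(|(i, _)| i as u8)
                .unwrap();
            codes.push(code);
        }
    }
    TurboQuantArray { dims, bits, norms, codes }
}
```

Note that only `norms` and `codes` would be persisted in the struct array; the centroid book is derivable from metadata, which is what lets the file stay small.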
Creating a new extension type has the benefit that we do not need to customize the behavior of the compressor: the default compressor will simply compress the inner storage `StructArray` and not canonicalize into a `Vector` array.
But on the flip side, this means that we can no longer canonicalize into a `Vector` array, and we have to reimplement all of the scalar functions on vectors (inner product, cosine similarity, etc.) specifically for TQ.
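As an illustration of what such a reimplementation could look like, here is a hedged sketch of an inner product computed directly on the quantized representation, decoding codes on the fly instead of materializing full vectors first. The uniform centroid grid and the function names are assumptions for the example, not the actual Vortex or TurboQuant API.

```rust
/// Hypothetical decode of a single code back to unit space: a
/// uniform grid of 2^bits centroids over [-1.0, 1.0]. The real
/// TurboQuant centroid book may be constructed differently.
fn decode(code: u8, bits: u32) -> f32 {
    let n = (1usize << bits) as f32;
    -1.0 + 2.0 * (code as f32) / (n - 1.0)
}

/// Approximate inner product of two quantized vectors, given their
/// stored norms and per-component codes. Since quantization
/// normalized each vector, the dot product of the decoded unit
/// vectors is rescaled by both norms.
fn tq_inner_product(
    norm_a: f32,
    codes_a: &[u8],
    norm_b: f32,
    codes_b: &[u8],
    bits: u32,
) -> f32 {
    let dot: f32 = codes_a
        .iter()
        .zip(codes_b)
        .map(|(&a, &b)| decode(a, bits) * decode(b, bits))
        .sum();
    norm_a * norm_b * dot
}
```

The key design point this illustrates: the result is an approximation of the true inner product, which is exactly why this computation must not masquerade as a lossless `Vector` kernel.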
We can still mimic canonicalization by having an `unpack` method that takes a `TurboQuant` extension array and converts it into a `Vector` array. This is actually ideal: we do not want to introduce the idea of computing on "lossy" data into the canonicalization system in Vortex, but we still have this functionality available if we don't want to reimplement scalar function logic on `TurboQuant` arrays.
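A minimal sketch of what `unpack` could do, again in plain Rust with the same assumed uniform centroid grid (the real `unpack` signature and centroid scheme are assumptions here): decode each code back to unit space and undo the normalization with the stored norm. The reconstruction is lossy by design — the output only approximates the original vectors.

```rust
/// Hypothetical `unpack`: reconstruct full vectors from stored
/// norms and codes. `codes` holds `dims` codes per vector, in
/// row-major order, matching the sketched storage layout.
fn unpack(norms: &[f32], codes: &[u8], dims: usize, bits: u32) -> Vec<Vec<f32>> {
    let n = (1usize << bits) as f32;
    norms
        .iter()
        .zip(codes.chunks(dims))
        .map(|(&norm, chunk)| {
            chunk
                .iter()
                // Decode to a unit-space centroid, then rescale by
                // the vector's original L2 norm.
                .map(|&c| norm * (-1.0 + 2.0 * (c as f32) / (n - 1.0)))
                .collect()
        })
        .collect()
}
```

In the real design this would return a canonical `Vector` array rather than `Vec<Vec<f32>>`, so downstream scalar functions work unchanged on the (approximated) data.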
## Steps
## Unresolved questions
TODO
## Implementation history
A lot carried over from #7297
- `Vector` extension type: Vector Extension Type #6964
- `cosine_similarity`: Vortex Fixed-Shape Tensor #6812
- `l2_norm`: Vector Extension Type #6964
- `l2_denorm`: L2 Denorm expression #7329
- `inner_product`: TurboQuant encoding for Vectors #7269
- `sorf` or some "make random" reversible expression: Pull out `L2Denorm` from TurboQuant #7349
- `TurboQuant` metadata to be protobuf #7301
- `L2Denorm(norms, Sorf(matrix, Dict(centroids, codes)))`: Pull out `L2Denorm` from TurboQuant #7349
- `Constant` children #7394
- `InnerProduct` optimizations #7396
- `vector-search-bench` benchmarking crate #7458
- `vortex-tensor` #7525
- `vortex-tensor` even more #7610