Tracking Issue: TurboQuant #7830

@connortsui20

Description

This is a tracking issue for adding TurboQuant quantization as a feature in Vortex. Notably, it will not be a new physical encoding for the Vector type, but rather a new logical type.

Motivation

We would like to support lossy compression (quantization) of data in Vortex. However, it was not entirely clear how to achieve this in a file format where many of our invariants rely on the assumption that all of our physical encodings are lossless.

After some internal discussions, we realized that "lossy" data and compression MUST live at the logical layer, not at the physical layer. It is logical because losing data is a genuine modification of the data, not just a different way of storing it.

For TurboQuant in particular, this means that it cannot be an encoding of the Vector type. Too many assumptions break if it were an encoding: unit-normalization no longer holds for quantized vectors, and scalar functions and canonicalization do not return the same results as they would on the unquantized vectors.

However, this does not mean we want to remove TQ entirely. There is value in building tooling for users who want to purposefully write TQ-quantized vectors into Vortex files and read them back, knowing that some of the information in the original data has been lost.

Users should be able to write whatever data they want into Vortex, structured however they like. But if they want to use a lossy compression / quantization scheme, they need to transform the data themselves before writing it to Vortex, and they additionally need to make sure that the default Vortex compressor does not recompress the data they have specially modified.

Design

It is unclear if this is the ideal way to generally design lossy schemes, but it is certainly the easiest in terms of moving forward.

TurboQuant will live in a new vortex-turboquant crate outside the top-level vortex dependency tree, mimicking a third-party crate. This crate will carry a new TurboQuant extension type over a struct array that holds all of the components needed to quantize vector data: the norms of the vectors and the codes (indices) into the centroid book (values). The centroids themselves can be constructed at read time (cached on the number of dimensions and bit width).
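
The storage layout described above can be sketched as follows. This is a minimal illustration, not the actual vortex-turboquant API: the struct name, the uniform centroid book over [-1, 1], and the per-dimension nearest-centroid quantization are all assumptions made for the sake of a self-contained example.

```rust
/// Illustrative stand-in for the storage carried by the hypothetical
/// TurboQuant extension type: one norm per vector, plus `dims` low-bit
/// codes per vector indexing into a centroid book.
struct TurboQuantStorage {
    dims: usize,
    bits: u32,
    norms: Vec<f32>, // L2 norm of each original vector
    codes: Vec<u8>,  // centroid-book indices, `dims` per vector
}

/// A trivial uniform centroid book over [-1, 1]. The real centroids would
/// be constructed (and cached) at read time keyed on (dims, bits).
fn centroid_book(bits: u32) -> Vec<f32> {
    let levels = 1usize << bits;
    (0..levels)
        .map(|i| -1.0 + 2.0 * i as f32 / (levels - 1) as f32)
        .collect()
}

/// Quantize unit-normalized components of each vector to the nearest
/// centroid, remembering the norm so magnitude is not lost entirely.
fn quantize(vectors: &[Vec<f32>], bits: u32) -> TurboQuantStorage {
    let dims = vectors[0].len();
    let book = centroid_book(bits);
    let mut norms = Vec::with_capacity(vectors.len());
    let mut codes = Vec::with_capacity(vectors.len() * dims);
    for v in vectors {
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        norms.push(norm);
        for &x in v {
            let unit = if norm > 0.0 { x / norm } else { 0.0 };
            // Index of the centroid closest to this component.
            let code = book
                .iter()
                .enumerate()
                .min_by(|a, b| {
                    (a.1 - unit)
                        .abs()
                        .partial_cmp(&(b.1 - unit).abs())
                        .unwrap()
                })
                .map(|(i, _)| i as u8)
                .unwrap();
            codes.push(code);
        }
    }
    TurboQuantStorage { dims, bits, norms, codes }
}
```

Everything in the struct is plain, losslessly-encodable data (floats and small integers), which is what lets the default compressor handle the inner StructArray without any special casing.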

Creating a new extension type has the benefit that we do not need to customize the behavior of the compressor, as the default compressor will just compress the inner storage StructArray and not canonicalize into a Vector array.

But on the flip side, this means that we can no longer canonicalize into a Vector array, and we have to reimplement all of the scalar functions on vectors (inner product, cosine similarity, etc.) specifically for TQ.
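
As one example of what such a reimplementation might look like, here is a hedged sketch of an inner product computed directly on the stored norm and codes, without reconstructing a Vector array first. The uniform centroid book and every name here are illustrative assumptions, not the real vortex-turboquant code.

```rust
/// Illustrative uniform centroid book over [-1, 1] for `bits`-bit codes.
fn centroid_book(bits: u32) -> Vec<f32> {
    let levels = 1usize << bits;
    (0..levels)
        .map(|i| -1.0 + 2.0 * i as f32 / (levels - 1) as f32)
        .collect()
}

/// Approximate <query, v> for one quantized vector `v`, using only its
/// stored norm and codes: norm * sum(query[i] * book[codes[i]]).
fn inner_product(query: &[f32], norm: f32, codes: &[u8], bits: u32) -> f32 {
    let book = centroid_book(bits);
    norm * query
        .iter()
        .zip(codes)
        .map(|(q, &c)| q * book[c as usize])
        .sum::<f32>()
}
```

Note that the result only approximates the inner product of the original vectors, which is exactly why this logic cannot silently replace the lossless scalar functions on Vector arrays.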

We can still mimic canonicalization with an unpack method that takes a TurboQuant extension array and converts it into a Vector array. This is actually ideal: we do not want to introduce the idea of computing on "lossy" data into Vortex's canonicalization system, but we still have this functionality available if we don't want to reimplement scalar function logic on TurboQuant arrays.
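
The unpack idea can be sketched like so, again under the assumption of a uniform centroid book and hypothetical names; plain nested Vecs stand in for the TurboQuant and Vector arrays.

```rust
/// Illustrative uniform centroid book over [-1, 1] for `bits`-bit codes.
fn centroid_book(bits: u32) -> Vec<f32> {
    let levels = 1usize << bits;
    (0..levels)
        .map(|i| -1.0 + 2.0 * i as f32 / (levels - 1) as f32)
        .collect()
}

/// Reconstruct approximate vectors from the stored (norms, codes).
/// Information lost during quantization is NOT recovered, which is why
/// this is an explicit `unpack` rather than true canonicalization.
fn unpack(norms: &[f32], codes: &[u8], dims: usize, bits: u32) -> Vec<Vec<f32>> {
    let book = centroid_book(bits);
    norms
        .iter()
        .zip(codes.chunks(dims))
        .map(|(&norm, chunk)| {
            // Scale each centroid back up by the vector's original norm.
            chunk.iter().map(|&c| norm * book[c as usize]).collect()
        })
        .collect()
}
```

Because unpack is an ordinary user-invoked conversion rather than part of the canonicalization machinery, callers opt in to the approximation explicitly.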

Steps

  • Initial prototype (see implementation history)
  • Initial benchmarking (see implementation history)
  • Refactor: TurboQuant again! #7829
  • Block decomposition?
  • PDX?
  • Documentation
  • Public API stabilization

Unresolved questions

TODO

Implementation history

A lot of this work was carried over from #7297.
