Overall design and process of the Gamma engine

Gamma is the core vector storage and retrieval engine in Vearch. It is mainly responsible for vector storage, indexing and retrieval. To fit practical application scenarios, it also supports storing and indexing the scalar fields of documents. Vectors can be stored either fully in memory or on disk. If the application is not sensitive to query latency and the data occupies a lot of storage space, disk storage can be selected. Disk storage of vectors is backed by the RocksDB engine or a customized mmap method. You can choose among these options when creating a database table, according to your needs. The overall logic of the Gamma engine is shown in Figure 1.

The Gamma engine currently supports four retrieval models: IVFPQ, IVFFLAT, BINARY and HNSW. It independently implements real-time indexing for all four, so each index can be dynamically updated and queried (written and read) concurrently and efficiently. See the paper for implementation details. Vector computation relies mainly on the SIMD-based routines in Faiss, the open-source project from Facebook AI Research.

The IVFPQ index stores PQ-quantized codes of the original high-dimensional vectors, so it occupies less space and is suitable for storing and querying datasets of more than 100 million vectors. The IVFFLAT index clusters the vectors and stores them in buckets according to their cluster centers. Increasing nprobe, the number of buckets traversed during search, can bring the recall rate close to 100%. However, because the probed buckets are searched by brute force, the amount of computation is large, and on CPU this index is generally applicable only to datasets of a few million vectors or fewer. HNSW is a practical graph index proposed in the ANN field in recent years. It offers fast retrieval and a high recall rate. However, because it stores the raw vectors together with the neighbor edges of each node, it occupies a large amount of memory and is generally suitable for datasets on the order of 10 million vectors. The BINARY index supports binary vectors compared by Hamming distance. Among the four retrieval models, IVFPQ supports distributed GPU retrieval.
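The effect of nprobe on an IVFFLAT-style index can be illustrated with a small sketch. This is a toy in pure Python, not Gamma's or Faiss's implementation: the function names (squared_l2, build_ivf, ivf_search) are hypothetical, and the cluster centers are simply sampled from the data instead of being trained with k-means.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector id into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(vec, centroids[c]))
        buckets[nearest].append(vid)
    return buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, k):
    """Brute-force scan of only the nprobe buckets closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in buckets[c]]
    candidates.sort(key=lambda vid: squared_l2(query, vectors[vid]))
    return candidates[:k]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = random.sample(vectors, 16)  # assumption: stand-in for trained centers
buckets = build_ivf(vectors, centroids)
query = [random.random() for _ in range(8)]

# Exact top-10 by scanning every vector, used as ground truth for recall.
exact = sorted(range(len(vectors)),
               key=lambda vid: squared_l2(query, vectors[vid]))[:10]

for nprobe in (1, 4, 16):
    found = ivf_search(query, vectors, centroids, buckets, nprobe, 10)
    recall = len(set(found) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:2d} recall={recall:.2f}")
```

With nprobe equal to the number of buckets, the search degenerates into a full scan and returns the exact result. Smaller values scan fewer vectors and may miss some true neighbors.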

In practical application scenarios, a document often carries scalar fields that need to be stored and queried alongside its embedding vectors. The Gamma engine therefore implements storage and indexing of common scalar fields internally, which avoids cross-process or network-intensive calls when associating scalar fields with vectors in real applications. In addition, the scalar field index helps avoid redundant vector calculations.
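One way a scalar index can save vector computation is to narrow the candidate set before any distance is calculated. The sketch below illustrates that idea only. TinyStore, its "category" field and the distance_calls counter are hypothetical names invented for this example and do not reflect Gamma's internal data structures.

```python
from collections import defaultdict

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class TinyStore:
    """Hypothetical in-process store: vectors plus an index on one scalar field."""

    def __init__(self):
        self.vectors = []
        self.by_value = defaultdict(list)  # scalar value -> list of doc ids
        self.distance_calls = 0            # counts distance computations

    def add(self, vector, category):
        doc_id = len(self.vectors)
        self.vectors.append(vector)
        self.by_value[category].append(doc_id)
        return doc_id

    def search(self, query, k, category=None):
        # With a filter, only documents found through the scalar index are scored.
        if category is None:
            candidates = range(len(self.vectors))
        else:
            candidates = self.by_value.get(category, [])
        scored = []
        for doc_id in candidates:
            self.distance_calls += 1
            scored.append((squared_l2(query, self.vectors[doc_id]), doc_id))
        scored.sort()
        return [doc_id for _, doc_id in scored[:k]]

store = TinyStore()
store.add([0.0, 0.0], "shoes")
store.add([0.1, 0.1], "books")
store.add([0.9, 0.9], "shoes")
store.add([0.2, 0.0], "books")

hits = store.search([0.0, 0.1], k=1, category="books")
print(hits, store.distance_calls)
```

Because the vectors and the scalar index live in the same process, the filter is a dictionary lookup and no network call is involved. Only the two "books" documents are scored against the query.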

Figure 1