Skip to content

Scaling Vector Search: Numba, Cppyy, PeachPy, and SimSIMD

Compare
Choose a tag to compare
@ashvardanian ashvardanian released this 24 Jun 13:30
· 0 commits to 34ab912f153cd450bbf54bf9b1a0c5d0e75ad1ec since this release

USearch has now been used in two internet-scale companies/projects 馃帀

In one case, it's only 1 Billion vectors. The second is closer to 10 Billion per node. So it feels like a good time to enter the Big ANN competition track 馃悗 Still, we aim for an even larger scale and introduce a few improvements, such as the new CompiledMetric interface that allows one to mix C/C++, Python, and Assembly code for different hardware architectures in the same USearch configuration file 馃く It should come in handy, given the rise in the number of exotic AI hardware vendors 馃槈

SIMD and JIT

The common idea across Unum projects is to use SIMD for hardware acceleration when we hit the wall.
Writing SIMD can be challenging, but distributing it may be even more difficult. You are compiling for modern hardware and modern libraries, and all of a sudden, someone deploys it on an old machine...

illegal instruction

A good balance can be achieved by compiling the core of USearch for higher compatibility and leaving plugs for hardware-specific optimizations. That is where Just-in-Time Compilation comes in. USearch allows combining the pre-compiled HNSW structure with JIT-ed distance metrics, which would tune the resulting assembly for the current target architecture. Such functionality was already supported with Numba JIT, but now you have more flexibility to choose different backends and define functions with different signatures:

  • MetricSignature.ArrayArray, aka float distance(scalar *, scalar *).
  • MetricSignature.ArrayArraySize, aka float distance(scalar *, scalar *, size_t).
  • MetricSignature.ArraySizeArraySize, aka float distance(scalar *, size_t, scalar *, size_t).

This enum is a core part of CompiledMetric and is passed down to our CPython binding.

class CompiledMetric(NamedTuple):
    pointer: int
    kind: MetricKind
    signature: MetricSignature

Numba JIT

Just to remind you, a minimal Numba + USearch example would look like this:

from numba import cfunc, types, carray

ndim = 256
signature = types.float32(
    types.CPointer(types.float32),
    types.CPointer(types.float32))

@cfunc(signature)
def inner_product(a, b):
    a_array = carray(a, ndim)
    b_array = carray(b, ndim)
    c = 0.0
    for i in range(ndim):
        c += a_array[i] * b_array[i]
    return 1 - c

index = Index(ndim=ndim, metric=CompiledMetric(
    pointer=inner_product.address,
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArray,
))

No scary C++ or Assembly involved.

Cppyy

Similarly, you can use Cppyy with Cling to JIT-compile native C or C++ code and pass it to USearch, which may be a good idea if you want to explicitly request loop-unrolling or other low-level optimizations!

import cppyy
import cppyy.ll

cppyy.cppdef("""
float inner_product(float *a, float *b) {
    float result = 0;
#pragma unroll
    for (size_t i = 0; i != ndim; ++i)
        result += a[i] * b[i];
    return 1 - result;
}
""".replace("ndim", str(ndim)))

function = cppyy.gbl.inner_product
index = Index(ndim=ndim, metric=CompiledMetric(
    pointer=cppyy.ll.addressof(function),
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArraySize,
))

PeachPy

Instead of writing in C or C++, you can go one level down and directly assemble a kernel for x86. Below is an example of constructing the "Inner Product" distance for 8-dimensional f32 vectors for x86 using PeachPy.

from peachpy import (
    Argument,
    ptr,
    float_,
    const_float_,
)
from peachpy.x86_64 import (
    abi,
    Function,
    uarch,
    isa,
    GeneralPurposeRegister64,
    LOAD,
    YMMRegister,
    VSUBPS,
    VADDPS,
    VHADDPS,
    VMOVUPS,
    VFMADD231PS,
    VPERM2F128,
    VXORPS,
    RETURN,
)

a = Argument(ptr(const_float_), name="a")
b = Argument(ptr(const_float_), name="b")

with Function(
    "inner_product", (a, b), float_, target=uarch.default + isa.avx2
) as asm_function:
    
    # Request two 64-bit general-purpose registers for addresses
    reg_a, reg_b = GeneralPurposeRegister64(), GeneralPurposeRegister64()
    LOAD.ARGUMENT(reg_a, a)
    LOAD.ARGUMENT(reg_b, b)

    # Load the vectors
    ymm_a = YMMRegister()
    ymm_b = YMMRegister()
    VMOVUPS(ymm_a, [reg_a])
    VMOVUPS(ymm_b, [reg_b])

    # Prepare the accumulator
    ymm_c = YMMRegister()
    ymm_one = YMMRegister()
    VXORPS(ymm_c, ymm_c, ymm_c)
    VXORPS(ymm_one, ymm_one, ymm_one)

    # Accumulate A and B products into C
    VFMADD231PS(ymm_c, ymm_a, ymm_b)

    # Reduce the contents of a YMM register
    ymm_c_permuted = YMMRegister()
    VPERM2F128(ymm_c_permuted, ymm_c, ymm_c, 1)
    VADDPS(ymm_c, ymm_c, ymm_c_permuted)
    VHADDPS(ymm_c, ymm_c, ymm_c)
    VHADDPS(ymm_c, ymm_c, ymm_c)

    # Negate the values from "similarity" to "distance"
    VSUBPS(ymm_c, ymm_one, ymm_c)

    # A common convention is to return floats in XMM registers
    RETURN(ymm_c.as_xmm)

python_function = asm_function.finalize(abi.detect()).encode().load()
metric = CompiledMetric(
    pointer=python_function.loader.code_address,
    kind=MetricKind.IP,
    signature=MetricSignature.ArrayArray,
)
index = Index(ndim=ndim, metric=metric)

SimSIMD

Last but not the list, for the most common metrics and vector spaces, I am going to provide pre-compiled optimized CPython Capsules as part of SimSIMD. Let us know which embeddings you use the most - OpenAI, BERT, ViT, or maybe UForm?

Release Notes: 0.19.0 (2023-06-24)

Add

Fix

  • Re-indexing vectors on restart (529c137)

Improve

  • Preserve metric_kind_t in snapshots (5ec0b65)
  • Support Cppyy JIT compilation (80f99cd)
  • Support signatures for JIT-ed metrics (c89e145)

Hashes

  • Source code (zip): 19c2e9f41134c17ac5db20567b268ee1fce1b4d5dbb42a8b6e5a252ac9e242de
  • Source code (tar.gz): 34e5bd00639504d752f0ab5eb96e62bc34ab18537e6e31823f3e487a4d8139a3